Site Reliability Engineer - Vault

New York, NY

Posted Feb 11, 2020 - Requisition No. 79122

The Bloomberg Vault Cloud team, whose platform processes over 300 million messages daily and archives more than 90 billion objects, is looking for a Site Reliability Engineer.
You will help define and improve our entire compute and web infrastructure. Because we are beginning to re-architect our platform, you will have the opportunity to make a grassroots impact. We need your help keeping our systems reliable, including as they scale with the ever-increasing flow of enterprise data.

What we are working on:

Building a robust monitoring platform as we migrate from a legacy big data and web application platform to one built on Bloomberg-managed cloud services (Kafka, Spark, ZooKeeper, etc., all “as a service”)
Supporting key Vault end-user applications with extensive end-to-end monitoring services that report metrics against our Service Level Objectives
Transforming the wider Vault department to an SRE mindset, focusing on customer-centric metrics while reducing KTLO (keep-the-lights-on) work and tech debt

We'll trust you to:

Work with development teams to highlight recurring issues and ensure they are addressed consistently across all application teams
Use your excellent SDLC skills to identify and improve development and engineering practices throughout the organization
Automate away manual processes
Help us establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that we can use to measure our quality as an organization, and contribute to engineering projects aimed at ensuring we meet those standards

You’ll need to have:

Full-time software development experience and comfort working in multiple languages
Experience with automation/configuration management systems such as Chef, Puppet, or Ansible
Experience with monitoring, metrics, and log analysis tools
Confidence working with Linux
Excellent communication skills and the ability to effectively collaborate with developers

We'd love to see:

Prior SRE experience, or a strong interest in the field
Experience transforming teams to focus on system reliability by applying software engineering principles to systems management
Experience deploying and operating Continuous Integration and Continuous Deployment (CI/CD) environments
Knowledge of cloud-native applications and infrastructure, including Docker and Kubernetes

Bloomberg is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.