System Reliability Engineer - Vault
New York, NY
Posted Feb 11, 2020 - Requisition No. 79122
The Bloomberg Vault Cloud team, whose platform processes over 300 million messages daily and archives more than 90 billion objects, is looking for a Site Reliability Engineer.
You will be working to define and improve our entire compute and web infrastructure. As we are beginning to rearchitect our platform, you will have the opportunity to make a grassroots impact. We need your help ensuring our systems are reliable, which includes scaling with the ever-increasing flow of enterprise data.
What we are working on:
- Building a robust monitoring platform as we migrate from a legacy big data and web application platform to one built on top of Bloomberg managed cloud services (Kafka, Spark, Zookeeper, etc., all “as a service”)
- Supporting key Vault end-user applications with extensive end-to-end monitoring services that provide metrics against our Service Level Objectives
- Moving the greater Vault department to an SRE mindset, with a focus on customer-centric metrics while reducing keep-the-lights-on (KTLO) work and tech debt
We'll trust you to:
- Work with the development teams to highlight recurring issues and ensure they are addressed consistently across all application teams
- Use your excellent SDLC skills to identify and optimize development and engineering practices throughout the organization
- Automate away manual processes
- Help us establish Service Level Objectives and Service Level Indicators that we can use to measure our quality as an organization, and contribute to engineering projects aimed at ensuring we meet those standards
You’ll need to have:
- Full-time development experience and comfort working in multiple programming languages
- Experience with automation/configuration management systems like Chef, Puppet, or Ansible
- Experience with monitoring, log analysis, and metrics tools
- Confidence working with Linux
- Excellent communication skills and the ability to effectively collaborate with developers
We'd love to see:
- Prior SRE experience, or excitement about the field
- Experience transforming teams to focus on system reliability by applying software engineering principles to systems management
- Experience operating and deploying Continuous Integration and Continuous Deployment (CI/CD) environments
- Knowledge of Cloud Native Applications and Infrastructure, including Docker and Kubernetes
Bloomberg is an equal opportunities employer and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.