Systems Reliability Engineer - Bloomberg Law
New York, NY
Posted Oct 5, 2020 - Requisition No. 81822
Bloomberg Law SRE combines software and systems engineering to champion the use of sound engineering principles, operational discipline, and automation. We focus on improving the reliability, stability, and scalability of the Bloomberg Law (BLAW) product, with an interest in fault-tolerant distributed system design. Our culture of diversity, intellectual curiosity, methodical problem solving, and openness in a blameless environment is key to our success.
What's in it for you:
As an SRE at Bloomberg Law, your mission is to improve the reliability, stability, and performance of the BLAW Platform by implementing and promoting industry-wide SRE best practices. You will be empowered to identify stability gaps and to influence and drive solutions that improve the overall reliability of BLAW. You will promote optimal availability, latency, and scalability of client-facing applications as well as data processing pipelines. You will have the opportunity to work alongside application engineers across a full stack built on modern open-source web and data processing technologies.
We'll trust you to:
- Provide application teams with self-serve tools to deploy and manage applications and to run their production environments
- Help us establish Service Level Objectives (SLOs) to measure reliability, and initiate projects aimed at meeting those objectives
- Work alongside application engineers to code, deploy, and troubleshoot production problems as they occur, and drive the post-mortem process
- Measure current capacity, predict future capacity needs and make suggestions accordingly
You need to have:
- 3+ years of experience working on highly available, fault-tolerant distributed systems
- A mindset focused on ensuring the stability of production environments by applying software engineering solutions to run and manage applications
- Understanding of Unix/Linux operating systems, shell scripting and tools
- BS/MS/PhD in Computer Science, Engineering, or a related technology field
We'd love to see:
- A deep understanding of stability & reliability engineering (SRE) principles and practices
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems
- Ability to create project ideas and implement them through effective collaboration and communication
- Knowledge of cloud computing, storage and networking infrastructures
- Familiarity with Kubernetes, Docker, and containers
- Ability to work with diverse teams and personalities
Bloomberg is an equal opportunities employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.