Posted Mar 21, 2018 - Requisition No. 66139
A Service Reliability Engineer (SRE) at Bloomberg is a hybrid systems and software engineer who is trusted to improve the stability and availability of the production environment through automation. SREs are responsible for Monitoring, Provisioning / Configuration / Orchestration, Capacity Management, Deployment and Rollback, Incident Management, and SDLC practices.
The Cloud Stability group is trusted to support Bloomberg's private cloud infrastructure. This infrastructure runs on our own open-source OpenStack distribution, built on OpenStack, Ubuntu, Chef, Ansible and Ceph. It spans Bloomberg's own world-class data centers and global private network, hosting business-critical applications and services. You'll focus on ensuring the high availability and scalability of this environment.
You'll work with modern open-source tooling while maintaining mission-critical systems hosting a wide array of applications. We'll depend on you to advise on the design, architecture, and scaling of our virtual farms, which use several different technologies with different SLAs. In addition, you'll play a critical role in improving the stability of all cloud systems to help ensure we have a solid platform as we scale.