Cloud Infrastructure SRE
New York, NY
Posted Feb 24, 2017 - Requisition No. 57191
About Engineering Compute Infrastructure:
Bloomberg has an impressive internal cloud that enables our applications to prosper. We’re the team of systems engineers focused on scaling it. Every day we use our system architecture and engineering skills to dig deep into the OpenStack architecture and improve its reliability and supportability. Our mission is to automate away highly operational tasks such as break/fix, cluster migrations, new service build-outs, and abuse handling. On any given day we work with engineering, product and project managers, and test and automation teams on everything from architecture to developing strategic and tactical solutions.
Join us and you’ll contribute to services that can shrink and expand based on demand, self-heal, and roll out automatically.
What’s in it for you:
On the Cloud Services team you’ll get to work with OpenStack and Ceph operations. You’ll regularly troubleshoot system issues, educate our customers, deploy hardware and software, and improve system reliability. You’ll also contribute to our public open source OpenStack cloud and software-defined storage (Ceph) distributions.
We’ll trust you to:
- Automate operation, installation and monitoring of OpenStack ecosystem components, including Prometheus, Calico, BIRD, vCenter, clustered MySQL, Zabbix, OpenStack, Ceph, Ubuntu, RabbitMQ, Apache httpd, and others
- Carry out performance analysis and capacity planning for clusters
- Troubleshoot and debug cloud ecosystem run-time issues, including a week of on-call duty every 5-6 weeks
- Provide developer and operations documentation
- Make OS- and hardware-level optimizations
You need to have:
- Proven experience building, scaling out, and managing large distributed services
- 3+ years of experience configuring, deploying and operating OpenStack and Ceph with both block and S3 configurations
- 2+ years of experience with Chef and/or Ansible
- 3+ years of scripting, ideally in Python and Ruby, including use of the Nova API
- 4+ years of experience running a large infrastructure service with at least four nines (99.99%) of availability
- 3+ years of DevOps or system administration experience (you should be conversant in Unix networking and C system calls)
- Experience in Java, Python or Ruby development is a plus (including testing with standard test frameworks, using dependency management systems, and knowing Java garbage collection fundamentals)