Site Reliability Engineer - Data Technologies

Posted Sep 22, 2016 - Requisition No. 54485

As an SRE (or "Site Reliability Engineer") at Bloomberg, you will work on our biggest, most critical services. Your mission will be to ensure Bloomberg is fast, highly available, scalable, and able to withstand unprecedented increases in load. In this role you will be at the heart of managing production, with a scope from the kernel to the application. That means the position requires the flexibility and creativity to take an all-round approach to troubleshooting.

We are building a new infrastructure from the ground up. On top of this, you will design and build automation tools for system health and production acceptance tests to validate production changes, and you will ensure the system is well instrumented and highly fault tolerant. Strong attention to detail will be needed, as you will dive deep into specific issues when required.

This is a new project where we are reimagining the way systems are currently developed across our entire company. It is our aim to entirely change the way our business operates. You will be part of a small, focused team with the autonomy and flexibility to make bold choices.

We'll trust you to:

  • Ensure optimal availability, latency, scalability and efficiency of Bloomberg application development. You will do this by advocating for reliability engineering throughout our development life cycle, with a focus on fault-tolerant approaches
  • Respond to and resolve unexpected and potential service problems. You will write software to prevent the same problem happening again
  • Drive capacity planning, performance analysis, instrumentation and other non-functional systems requirements
  • Review and influence on-going design, architecture, standards and methods for improving operating services
  • Own system releases, write production software acceptance tests and coordinate all aspects of the release including coverage and communication plans

You'll need to have:

  • A background as a Software Engineer or in developing customer-facing, high-availability, large-scale distributed applications
  • In-depth knowledge of Linux/Unix
  • Extensive exposure to working with fault-tolerant approaches in large-scale distributed environments and high-performance systems
  • Proficiency in C, C++ or Python
  • Understanding of a variety of scripting languages

We'd love to see:

  • Knowledge of how Docker/rkt containers work at scale
  • Exposure to Kubernetes/Swarm/Mesos/Spark
  • An understanding of how complex systems environments work
  • A deep understanding of internet and networking protocols
  • A passion for performance excellence, robustness and an engineering mind-set
  • The ability to analyse and troubleshoot large-scale distributed systems

If this sounds like you:

Apply if you think we're a good match! We'll get in touch with you to let you know what the next steps are. In the meantime, check us out at

Bloomberg is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
