Systems Reliability Engineer

London

Posted Mar 22, 2016 - Requisition No. 49198

As an SRE (or "Systems Reliability Engineer") at Bloomberg, you will work on our largest and most critical services. Your mission will be to ensure Bloomberg is fast, highly available, scalable, and able to withstand unprecedented increases in load. In this role you will be at the heart of solving production problems, with a scope ranging from the kernel to the application. The position therefore requires the flexibility and creativity to take an all-round approach to troubleshooting. Strong attention to detail is essential, as you will need to dive deep into specific issues when required.

The team is co-located with the various application development teams and reports directly to the Systems Reliability Engineering organisation for oversight, strategic direction, training and career development. You will build automation tools for system health and production acceptance tests to validate production changes, and you will ensure the system is well instrumented and highly fault tolerant.

We'll trust you to:

Ensure optimal availability, latency, scalability and efficiency of Bloomberg applications. You will do this by advocating for reliability engineering throughout our development life cycle, with a focus on fault-tolerant approaches
Respond to and resolve unexpected and potential service problems. You will write software to prevent the same problem from happening again
Drive capacity planning, performance analysis, instrumentation and other non-functional systems requirements
Review and influence ongoing design, architecture, standards and methods for improving how services are operated
Own system releases, write production software acceptance tests and coordinate all aspects of the release including coverage and communication plans

You'll need to have:

Bachelor's degree in Computer Science or equivalent experience
Experience as a Software Engineer or Developer of customer-facing, high-availability, large-scale distributed applications
In-depth knowledge of Linux/Unix
Exposure to C or C++ and Java technologies
Understanding of a variety of scripting languages

We'd love to see:

Extensive experience applying fault-tolerant approaches in large-scale distributed environments and high-performance systems
Understanding of how complex systems environments work
Deep understanding of internet and networking protocols
A passion for performance excellence and robustness, and an engineering mindset
Ability to analyse and troubleshoot large-scale distributed systems
Ability to handle periodic on-call duty as well as out-of-band requests

If this sounds like you:

Apply if you think we're a good match! We'll get in touch with you to let you know what the next steps are. In the meantime, check us out at http://www.techatbloomberg.com/

Bloomberg is an equal opportunity employer and values diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.