Systems Reliability Engineer

Careers at Bloomberg

New York

Posted May 5, 2016 - Requisition No. 49669

As a Systems Reliability Engineer (SRE), you will work on our most critical services. You will ensure that Bloomberg's financial product, the Terminal, is fast, highly available, scalable and able to withstand unprecedented increases in load. You will be at the heart of solving production problems that span everything from the kernel to the application. Flexibility and creativity are essential for taking a holistic approach to troubleshooting, and strong attention to detail is needed because you will dig deeper into specific issues when required.

The team is spread across multiple locations and works with the various application development teams, but it reports directly to the SRE organization for oversight, strategic direction, training and career development. You will build automation tools for system health, as well as production acceptance tests that validate production changes, to ensure the system is well instrumented and highly fault tolerant.

We'll trust you to:

  • Ensure optimal availability, latency, scalability and efficiency of Bloomberg applications
  • Respond to and resolve unexpected and potential service problems
  • Drive capacity planning, performance analysis, instrumentation and other non-functional systems requirements
  • Review and influence ongoing design, architecture, standards and methodology for improving operating services
  • Write production software acceptance tests and own system releases, including coordinating coverage and communication plans

You'll need to have:

  • A Bachelor's degree in Computer Science or equivalent experience
  • Experience developing customer-facing, high availability, large-scale distributed applications
  • In-depth knowledge of Linux/UNIX
  • Exposure to C/C++ or Java technologies
  • An understanding of a variety of scripting languages

We'd love to see:

  • Extensive experience working with fault-tolerant approaches in a large-scale distributed environment and high performance systems
  • Familiarity with complex systems environments
  • A solid understanding of Internet and networking protocols
  • A passion for performance excellence and robustness
  • The ability to analyze and troubleshoot large-scale distributed systems
  • The ability to handle periodic on-call duty as well as out-of-band requests