Software Engineer/SRE Team Leader - Application Middleware Team | New York, NY

Posted Feb 13, 2018 - Requisition No. 65230

Our Team:

It is not every day that you get the opportunity to lead the software engineering team that is responsible for the reliability and production quality of some of the most critical application frameworks at Bloomberg.

Our group builds middleware - the software infrastructure designed to create large-scale, fault-tolerant applications that run on thousands of machines throughout the world. This infrastructure includes a handful of systems such as BAS (Bloomberg Application Services, a rich framework for micro-services), MBUS (Message Bus, a publish/subscribe system using multicast) and BMQ (Bloomberg Message Queues, a high-performance MQ system). Our end users are software engineers whose needs vary widely, and we’re trusted to make architectural decisions that will scale across a wide range of use cases.

With thousands of clients depending on our infrastructure solutions, we are looking for a Team Lead for our System Reliability Engineering team. That’s where you come in.

What's in it for you:

Our application frameworks run on tens of thousands of machines and are used every day by over 5,000 engineers, so your work will have an impact across the entire organization. You’ll be trusted to define the processes that make the system as reliable and scalable as possible, with high volume, high performance, high throughput, low latency and self-healing characteristics. On any given day, you'll make decisions that impact some of the most critical systems at Bloomberg.

You will be a key member of a development team that our clients rely on, leading a highly technical team and influencing the products' technical direction. The job is very hands-on, and all team members spend the majority of their time writing code.

We'll trust you to:

Inspire and motivate a high-performing team to achieve great results, while supporting individual growth and development
Establish best practices that result in the highest quality in our products and services
Review and influence the design and standards of our software
Respond to and resolve unexpected service problems. Your team will write software to prevent the same problem from happening again
Manage system releases, write production software acceptance tests and coordinate all aspects of each release, including coverage and communication plans
Create dashboards and instrument the code to capture and publish essential metrics, and use this data to define alerts
Build data analysis tools to keep track of important service level indicators, predict future capacity needs and audit application configurations
Automate everything end-to-end, from deployment and configuration management to mitigation of outages

You need to have:

Demonstrated experience leading a team of software engineers
An ability to cultivate a collaborative environment through driving a strong culture of teamwork and taking advantage of team diversity
Strong programming ability
3+ years of experience with C++ and Python (or other scripting languages)
A solid understanding of data structures, algorithms and complexity analysis
Experience in all phases of the software development lifecycle
Good knowledge of Linux/Unix
Excellent problem-solving skills
Ability to handle periodic on-call duty as well as urgent requests
Excellent stakeholder relationship management
The ability to effectively listen, communicate, challenge and influence team members, peers and senior managers
The desire to take ownership of and responsibility for issues and handle them effectively through to resolution
BA, BS, MS or PhD in Computer Science, Engineering or related technology field

We'd love to see:

Extensive experience with fault-tolerant approaches in large-scale distributed environments and high-performance systems
Good understanding of internet and networking protocols
Experience with Git, CMake, Jenkins, DPKG, RPM, Docker, Chef, Terraform, OpenStack, VMware, Grafana, time-series databases

Check out more about how we work and what it means to be an SRE at Bloomberg in our blog post: https://www.techatbloomberg.com/blog/bloomberg-bets-big-on-sres/