Senior Software Engineer/SRE - Communication Channels
New York, NY
Posted Apr 20, 2021 - Requisition No. 82275
The Communication Channels team builds products used by the Bloomberg community for real-time communication, such as exchanging price quotes, trade ideas, news, and other financial information. Our email (MSG) and instant message (IB) products deliver more than 2 billion messages across millions of chat rooms per day. We have a broad user base unlike any other company, including asset managers, brokers, traders, financial analysts, and desks across all asset classes. Our users rely on these products’ real-time performance, massive scale, ironclad security, tight integration with financial data and applications on the Bloomberg Terminal, and, most importantly, their singular access to the Bloomberg network of 350,000 financial professionals.
Given the criticality of our products to the daily workflow of the financial community, and the scale at which they are used, the Reliability Engineering team is one of the most visible teams across Bloomberg. Our products are continuously evolving and have experienced more than 100% growth in usage over the last year, which means we have very high standards for reliability, stability, and scalability - our goal is to ensure that IB and MSG are up 24/7.
That’s where you come in!
- As a member of the team, you'll be trusted to ensure that our production systems are well-monitored, healthy, automated, and designed to scale.
- We'll depend on you to improve resiliency of our infrastructure through stress tests and chaos engineering, confirming the effectiveness of failover systems and auto-recovery.
- You’ll be building and standardizing observability tools for MSG and IB systems for engineering as well as business partners.
- We’ll trust you to define standards for testing, monitoring, logging, alarming, and provisioning across 90+ developers, and build tools to automate our release processes.
- You’ll be involved from design to deployment, to ensure our infrastructure is reliable, performant and scalable.
- You’ll help build and standardize our performance and capacity planning environment, so that we can easily answer questions about the health and capacity of our system as we continue adding features and users.
What’s in it for you?
- A critical part of our mission is fostering a culture of system reliability across Engineering teams in CC - you’ll be able to make a significant impact on the design choices and decisions that go into developing MSG and IB infrastructure.
- This is an opportunity to forge your own path and drive the engineering culture forward. Making our infrastructure best-in-class will be your main mission, so you’ll have many opportunities to create and implement your own improvements.
- We’ll send you to professional conferences and meetups so you can keep up with the technology space outside Bloomberg and apply that knowledge to building and improving our processes and products.
Our projects include:
- Building downstream and upstream caller reports to quickly identify bottlenecks and dependencies of our system using Apache Spark and distributed tracing infrastructure
- Creating black-box health testing frameworks to monitor the health of IB and MSG
- Building a comprehensive performance testing framework that will be utilized by all teams in Communication Channels for stress-testing and capacity measurement of key pieces of infrastructure
- Building tools to track the availability and uptime of our products
- Establishing procedures around scalability, failover, Service Level Objectives (SLOs), cluster provisioning, deployment strategies, etc. with the goal of improving the robustness of our infrastructure
- Establishing standards and building dashboards, libraries and tools for metric collection, visualization and alarming
You’ll need to have:
- 3+ years of professional work experience in a software engineer or SRE role
- Proven experience with at least one object-oriented language, with a preference for C++, Python or Java
- Demonstrated experience with design and implementation of large-scale distributed systems
- Experience with one or more of: system design, production monitoring, capacity management, deployment and rollback, provisioning, configuration and orchestration
- Strong communication skills
- BA, BS, MS, or PhD in Computer Science, Engineering or related technology field
We’d love to see:
- Experience creating and implementing new processes and workflows related to SDLC
- Exposure to observability tools such as Graphite, Splunk and Humio, and to distributed tracing
- Exposure to containers and orchestration frameworks
- A track record of open-source contributions
Bloomberg is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.