Senior Software Engineer/SRE - Ticker Plant
New York, NY
Posted Aug 23, 2022 - Requisition No. 105942
Who We Are
What is Ticker Plant?
Bloomberg is the global leader in business and financial data. Providing real-time and historical market data to our customers – reliably, accurately, and quickly – is at the heart of what we do, and the Ticker Plant system is the core that makes it happen. Our system processes hundreds of billions of unique market events each day. We ingest and process events from hundreds of exchanges and thousands of other financial institutions, 24 hours a day, around the world, on millions of financial instruments across all asset classes, including stocks, bonds, commodities, currencies, and crypto. We disseminate corresponding updates to our customers in real-time, after the events have been normalized and enriched by our systems. In addition, we respond to billions of requests for current snapshot and historical data every day, retrieved from our petabytes of recorded market history, to which we add terabytes of new data every day.
What is Ticker Plant SRE?
The SRE team is central to Ticker Plant's success. We are engineers whose expertise centers on the emergent properties of a massively distributed, large-scale, real-time market data system. Our mission aligns with our customers' expectations, and we focus on the characteristics of the system they care about, namely:
- Correctness - the data a customer sees should accurately reflect the marketplace
- Performance - real-time latencies should be minimized; requests should be served without delay
- Availability - System components will fail; in a sufficiently large system, parts of it fail all the time. But the system as a whole should not fail. Deployments and upgrades should not affect availability
At the scale at which we operate, we cannot achieve these goals without sophisticated monitoring, proactive management, and automated response mechanisms. Thus, we concern ourselves with latency analysis, capacity management, cluster organization, deployment and configuration, fault tolerance, and telemetry. We work across the globe, with SRE engineers located in New York, London, and Tokyo. The regions have localized expertise in specific software and systems engineering domains, as well as broad knowledge of the entire Ticker Plant system. In addition to developing software, we also advise our partner component teams on the development of resilient software and we analyze and correct system failures as they happen.
What's in It for You?
This is a rare opportunity to work on some of Bloomberg's core technology. The scale of the Ticker Plant system, the demand for reliability, and its real-time nature all combine to create unique challenges that will be yours to solve. You will have the opportunity to work with, and therefore learn about, many different parts of the system. There are always people available in person and online, eager to help with problems, brainstorm ideas, or just catch up. You will be able to improve your software engineering skills through extensive review, support, and training. We like to continuously improve our software development practices and hope you do too. Among other things, you will have the opportunity to
- Design and develop predictive data models for our system capacity
- Build systems capable of early detection of issues through metrics and signals, and develop automated correction and remediation strategies
- Help set standards and partner closely with other engineers to ensure that all products meet those standards
- Develop Python/C++ services, libraries and tools that implement our designs
- Proactively scale our services to stay ahead of ever-increasing market data demands by driving capacity planning, instrumentation and performance analysis
- Ensure service issues do not recur by architecting automation and remediation strategies employing signal detection and orchestration frameworks
- Define service level objectives and drive measurable service improvement
- Review and influence ongoing design, architecture, standards and methods for improving operating services
- Manage entire projects, including meeting with stakeholders and figuring out how to execute on plans
- Join internal guilds to meet and collaborate with others who share your interests
- Share your accomplishments at internal forums and speak at industry conferences (e.g., SRECon)
We'll Trust You To
- Code – to read code, to debug code, and to write production-quality code. We provide software that defines the environment in which the component software runs. We build dashboards that are used to analyze current load and latency and plan for the impact of anticipated changes, but we also build the toolkits that allow for the software to be instrumented in the first place. Our platform technology provides self-service for many routine operations, and supports our Incident Response responsibilities.
- Design – We write code that integrates with components across the entire system. Often this work is done in collaboration with component teams. Our work with the component owners involves assessing workflows and designing appropriate interfaces that provide consistent access to the necessary functionality, and then building the applications that can perform many workflows via self-service or even automatically. Also, as components and product features move to production, we ensure that the features that are being introduced are supportable in production, without increasing operational load. Sometimes, this means advising the component team of existing solutions; in other cases, it means recognizing gaps and filling those accordingly. This results in improving the designs of our features and our system.
- Analyze – SRE is concerned with the behavior of the system. Often, we are asked to consider the impact of required or desired changes to the system before those changes make it to production. At other times, the system is simply not behaving as we expect, with no immediate obvious cause. We are the ones who are often asked to figure out what is going on.
- Be able to do research and present findings
- Propose solutions and be able to handle feedback, even rejection – it's how we learn
- Collaborate with others
- Teach us something we don't already know
- Push us to get better
You'll Need To Have
- A demonstrated ability to write code that is logical, modular, clear and maintainable. We primarily use C++ and Python, but an ability to present working solutions in any high-level programming language is sufficient to get started. We want to be clear here - this is a development position. Here are some quotes from our engineers and managers:
"We should be willing to have this person coding alongside of us in our team"
"This person could just as well be a developer in one of our component teams"
"Our phone screens and initial in-house interview rounds are coding questions and conducted by SRE and non-SRE developers alike, same as would be done for our component teams"
- A demonstrated ability to read and analyze code that you didn't write and that may not be documented
- An understanding of risk as it applies to software systems
- An ability to write technical documentation, including runbooks and design documents
- An ability to communicate clearly regarding issues encountered (e.g., post-mortem analysis) and reasoning for decisions made
- Curiosity and an eagerness to solve problems
- BA, BS, MS, or PhD in Computer Science, Engineering or a related technology field
What We'd Like to See
The role of Ticker Plant SRE is broad. Therefore, we don't expect expertise across everything. Instead, we are trying to build teams that cover the breadth of what we do. That means that we are happy if a candidate demonstrates depth in only some of the following areas:
- Systems knowledge: operating systems - processes, threads, and scheduling, file systems, memory management, performance tuning; knowledge of Linux or another POSIX-based system is especially useful
- SRE domain knowledge: cluster management - clusters, deployments, staging, configuration management, A/B testing, machine lifecycle (startup/shutdown)
- networking - not hardware, but protocols and network stacks; TCP, UDP, multicast, implications of wide-area networking, global topologies
- databases - schema design, data replication concerns, storage concerns
- at-scale machine management and capacity management
- monitoring - assessing system health and performance, understanding SLIs and SLOs, alerting mechanisms, what works and what does not
- distributed systems - heterogeneity, fault tolerance, network and node failure, local inconsistencies (delays in convergence of shared state)
We would expect a senior SRE candidate to be proficient in at least one of these. However, we also recognize that people have widely different backgrounds. We still welcome strong developers who bring a fresh perspective and curiosity to the challenges of SRE, with an interest in developing such expertise, recognizing that such individuals are capable of elevating us all.
If this sounds like something you would be passionate about, apply!
Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or maternity/parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.
Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email firstname.lastname@example.org.