Senior Software Engineer/SRE - Telemetry SRE
New York, NY
Posted Jan 6, 2023 - Requisition No. 113196
A System Reliability Engineer (SRE) at Bloomberg is a software engineer who specializes in solving infrastructure and operations problems with software engineering solutions. They are trusted to improve the stability and availability of production environments through telemetry, product features, and automation. They are responsible for building systems that perform monitoring, orchestration, capacity management, deployment, incident management, and SDLC practices. These systems consist of both internally developed software in various programming languages and Open Source software.
We are the SRE team for the Bloomberg Engineering Telemetry organization. In addition to us, the organization consists of multiple application and infrastructure development teams. As an organization, we are the official central source of telemetry data and services used by Bloomberg engineering and operations.
As the Telemetry SRE team, we work with our sister teams to develop the software and solutions to discover, collect, enrich, store, present, and alarm on telemetry data in the form of logs, metrics, and traces from all Bloomberg devices and applications. The insight and observability we provide are crucial for the stability and reliability of the services Bloomberg provides for our customers globally.
What’s in it for you:
As a member of the Telemetry SRE team, you will have direct influence on the stability and resilience of Bloomberg systems. You will get to learn and experiment with new technologies, help drive best practices across engineering teams, work with SMEs across various disciplines, and implement changes to improve the developer experience within your own team. Other opportunities include:
- Join a group of dedicated and motivated systems and software engineers working on the backbone of Bloomberg’s Telemetry system
- Learn what it takes, from the application level down to the network level, to maintain highly reliable, scalable distributed Telemetry ingestion, enrichment, alarming, and visualization
- Manage software and hardware infrastructure that processes billions of data points every day from Bloomberg data centers, client data centers, and public clouds
- Work in a highly autonomous and impact-driven environment
- Get involved in and attend industry conferences, where you will learn from and contribute to communities that care about observability
We’ll trust you to:
- Understand the current system capacity and load, predict future demand, and make appropriate scaling recommendations
- Define standards and best practices with respect to logging, latency, troubleshooting, and monitoring
- Work with application teams to review and influence the design of software to improve its reliability
- Facilitate continuous integration / continuous deployment to automate deployment and quality control (including functional and capacity testing)
- Investigate and triage production problems as they occur
- Work with application teams deploying software both internally and to the Cloud to ensure proper observability
- Help to create dashboards, monitoring rules, and alerting rules to track the health of the live system
The technologies you’ll use:
Languages: Python, Ruby, Go, C++
Platforms: Linux
Cloud Providers: Google, Microsoft, Amazon
Infrastructure: Kafka, Kubernetes, Elasticsearch, ScyllaDB
Telemetry Visualization: Humio, Splunk, Grafana
You’ll need to have:
- 4+ years working with an object-oriented programming language (C/C++, Python, Java, etc.)
- A degree in Computer Science, Engineering, Mathematics, or a similar field of study, or equivalent work experience
- A desire to work with high-performance, high-availability distributed systems
- Curiosity and the ability to dig into systemic software problems, from the application layer down to the network layer
- Experience with Linux
We’d love to see:
- Familiarity with high-performance, high-availability distributed systems
- Experience building infrastructure and tooling to be used by other Engineering teams
- Experience working with telemetry
- Experience working with Google, Microsoft, and Amazon Cloud providers
- Experience with containerization and orchestration technologies (Docker, Kubernetes)
- Working knowledge of Chef, Prometheus, Grafana, Humio, Splunk, Elasticsearch, Kafka
- Experience with continuous integration and deployment tools (Jenkins)
- Deep understanding of TCP/IP and Unix networking
Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age, ancestry, color, gender identity or expression, genetic predisposition or carrier status, marital status, national or ethnic origin, race, religion or belief, sex, sexual orientation, sexual and other reproductive health decisions, parental or caring status, physical or mental disability, pregnancy or parental leave, protected veteran status, status as a victim of domestic violence, or any other classification protected by applicable law.
Bloomberg is a disability inclusive employer. Please let us know if you require any reasonable adjustments to be made for the recruitment process. If you would prefer to discuss this confidentially, please email amer_recruit@bloomberg.net.
Salary Range: 160,000 - 240,000 USD Annually + Benefits + Bonus