Senior Software Engineer - Serverless Infrastructure for Data Science
New York, NY
Posted Aug 30, 2021 - Requisition No. 93755
Bloomberg runs on data. It's our business and our product. From the biggest banks to elite hedge funds, financial institutions need timely, accurate data to capture opportunities and evaluate risk in fast-moving markets. With petabytes of data available, a platform to transform and analyze the data is critical to our success.
Bloomberg’s Data Science Platform was established to support development efforts around data-driven science, machine learning, and business analytics on Bloomberg's many datasets.
The platform was developed to provide a standard set of tooling for the Model Development Life Cycle (MDLC), spanning the early stages of development and data exploration, through experimentation and large-scale training, all the way to live inference. Through access to scalable compute and specialized hardware, Data Science Platform users have access to ML training jobs and inference services, analytics and ETL using Spark, and data exploration with Jupyter. The platform is built on Kubernetes, leveraging containerization, container orchestration, and a cloud architecture built on 100% open source foundations.
Model prediction, or inference, is the last critical step in the MDLC, the point at which the business value of model-driven applications is realized. Our inference solution is powered by the open source project KFServing, a “no code” serverless solution that is ready to use with standard model types.
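To illustrate the “no code” idea: deploying a standard model on KFServing takes only a short Kubernetes manifest. The sketch below uses the scikit-learn example model from KFServing’s public samples; exact fields can vary by KFServing version.

```yaml
# Minimal KFServing InferenceService (v1beta1) - a sketch, not a
# production config. The storageUri points at KFServing's public
# sklearn example model; swap in your own model artifact location.
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    sklearn:
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applying this manifest is enough to get a versioned, autoscaled HTTP prediction endpoint; the serverless scale-to-zero and request routing come from the Knative and Istio layers underneath.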
Delivering performance to latency-sensitive, throughput-heavy, model-driven applications means making the right choices at every layer, from hardware to ingress. As a member of the Data Science Platform’s Infrastructure team with a focus on the serverless components, you’ll have the opportunity to work on the open source serverless technologies underlying KFServing, such as Knative and Istio, as well as evaluate the latest hardware on the market to serve hundreds to thousands of models in a scalable way.
As founding members of KFServing, we regularly upstream the features we develop, present at conferences, collaborate with our peers in the industry, and stay in tune with the surrounding Kubernetes community. Open source is at the heart of our team. It's not just something we do in our free time; it is how we work.
We’ll trust you to:
- Interact with Data Science Platform Inference users and understand how their requirements can translate to common features provided by the platform.
- Interact and coordinate with various Bloomberg infrastructure teams to make use of other components when possible.
- Automate operation and improve telemetry of the inference platform by integrating with systems for metrics and distributed tracing.
- Look for optimization opportunities across the stack, from hardware to application.
- Build tools that enable other engineers to debug and understand the performance of complicated systems.
You’ll need to be able to:
- Innovate and design solutions that meet strict production SLAs: low latency/high throughput, multi-tenancy, high availability, reliability across clusters/data centers, etc.
- Troubleshoot and optimize model inference performance
- Provide developer and operational documentation
- Provide performance analysis and capacity planning for clusters
- Be organized and multi-task in a fast-paced environment
- Have a passion for providing reliable and scalable infrastructure
You’ll need to have:
- Experience designing and implementing low-latency, high-scalability systems
- Experience working in a multi-tenancy and multi-cluster environment
- Experience with distributed systems, e.g., Kubernetes, Kafka, ZooKeeper/etcd, Spark
- Experience debugging performance issues with distributed tracing and benchmark tools
- Strong knowledge of data structures and algorithms
- Linux systems experience (network, OS, filesystems)
We’d love to see:
- Experience with serverless frameworks or infrastructure, such as Knative, AWS Lambda, or Google Cloud Run.
- Experience working with Service Mesh, authentication & authorization.
- Experience with deep learning inference frameworks, such as TorchServe, TFServing, Triton Inference Server, ONNX Runtime.
- Experience working with GPU compute software and hardware
- Ability to identify and perform OS and hardware-level optimizations
- Open source involvement such as a well-curated blog, accepted contribution, or community presence
- Experience with cloud providers such as AWS, GCP or Azure
- Experience with configuration management systems (Chef, Puppet, Ansible, or Salt)
- Experience with continuous integration tools and technologies (Jenkins, Git, Chat-ops)
If this sounds like you, apply! You can also learn more about our work using the links below:
- Machine Learning the Kubernetes Way - https://www.youtube.com/watch?v=ncED2EMcxZ8
- Inference with KFServing - https://www.youtube.com/watch?v=saMkA4fIOH8
- ML at Bloomberg - https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9810-machine+learning+%40+bloomberg%3a+building+on+kubernetes
- Scaling Spark on Kubernetes - https://www.youtube.com/watch?v=GbpMOaSlMJ4
Our Open Source Commitment
Bloomberg sits at the intersection of high availability, low latency, and large-scale computing. We have a decade-long track record of using open source software to build data infrastructure and applications that address the unique constraints of the finance industry. We also support a broad open source ecosystem to empower others to solve similar real-world problems. From technical governance to upstream collaboration, we are committed to enhancing the impact and sustainability of open source.
In this role, you’ll be encouraged to interact with global open source project teams and communities. If you have a desire to use, develop, and lead open source software projects, we encourage you to apply. To learn more about our activities in the open source community, head over to our Tech at Bloomberg site.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.