Senior Software Engineer - BQuant Data Science Runtimes | New York, NY

Posted Feb 12, 2021 - Requisition No. 88824

Bloomberg runs on data. It's our business and our product. From the biggest banks to elite hedge funds, financial institutions need timely, accurate data to capture opportunities and evaluate risk in fast-moving markets. With petabytes of data available, a platform to transform and analyze the data is critical to our success.

Bloomberg’s BQuant platform enables users to develop sophisticated financial applications on top of Bloomberg’s data and services. Customers can programmatically access Bloomberg’s data; build and analyze factors; screen securities for investable ideas; backtest custom trading strategies; and much more, all through BQuant’s unique portal.

The BQuant platform has since evolved to support data-driven science, machine learning, and business analytics in a cloud-native way. Customers can integrate data science and distributed analytics into their quantitative workflows. To support this, the platform works to provide scalable compute, specialized hardware, and first-class support for a variety of workloads such as Spark, TensorFlow, and PyTorch. It was developed to provide a standard set of tooling for the Model Development Life Cycle, from experimentation and training to inference. The platform is built on containerization, container orchestration, and cloud architecture, on top of 100% open source foundations.

The platform is poised for enormous user growth this year and has an ambitious roadmap in terms of new features as well as improved user experience. That’s where you come in. As a member of the multidisciplinary BQuant Runtimes team, you’ll have the opportunity to make key technical decisions to keep this platform moving forward.

Our team makes extensive use of open source software (e.g., Kubernetes, TensorFlow, Spark, and Jupyter) and is deeply involved in a number of communities. As part of that, we regularly upstream features we develop, present at conferences, and collaborate with our peers in the industry. We are contributors to the Kubeflow project as well as founding members of the KFServing subproject to standardize ML inference within the Kubernetes ecosystem. For Spark, we have implemented a scalable and resilient external shuffle service for dynamic resource allocation, a pluggable interface for secure worker creation, and a token renewal service that handles privacy and security across jobs, all in line with our effort to improve security and elasticity for Spark on Kubernetes. Open source is at the heart of our team. It's not just something we do in our free time; it is how we work.

We’ll trust you to:

Interact with quantitative and data scientists to understand their workflows and requirements to inform the next set of features for the platform
Design solutions for problems such as elastic load distribution, GPU sharing, and guaranteed scheduling
Automate operation and improve telemetry of data science platform components in our infrastructure stack

You’ll need to be able to:

Troubleshoot and debug run-time issues
Provide developer and operational documentation
Provide performance analysis and capacity planning for clusters
Stay organized and multitask in a fast-paced environment
Bring a passion for providing reliable and scalable infrastructure

You’ll need to have:

Experience with distributed systems, e.g., Kubernetes, Spark, MPI, TensorFlow, PyTorch, Kafka
Proficiency in two or more languages (Python, Go, C++, Java, Scala, or JavaScript) and willingness to learn more as needed
Linux systems experience (Network, OS, Filesystems)

We’d love to see:

Experience building and scaling Docker-based systems using Kubernetes, Swarm, Rancher, or Mesos
Experience working with authentication & authorization systems such as Kerberos and LDAP
Ability to identify and perform OS and hardware-level optimizations
Open source involvement such as a well-curated blog, accepted contribution, or community presence
Experience with cloud providers such as AWS, GCP, or Azure
Experience with configuration management systems (Chef, Puppet, Ansible, or Salt)
Experience with continuous integration tools and technologies (Jenkins, Git, ChatOps)
Experience working with GPU compute software and hardware

If this sounds like you, apply! You can also learn more about our work using the links below:

Machine Learning the Kubernetes Way - https://www.youtube.com/watch?v=ncED2EMcxZ8
Inference with KFServing - https://www.youtube.com/watch?v=saMkA4fIOH8
ML at Bloomberg - https://on-demand-gtc.gputechconf.com/gtcnew/sessionview.php?sessionName=s9810-machine+learning+%40+bloomberg%3a+building+on+kubernetes
Introducing KFServing - https://www.youtube.com/watch?v=saMkA4fIOH8
Scaling Spark on Kubernetes - https://www.youtube.com/watch?v=GbpMOaSlMJ4
Serverless Inferencing on Kubernetes - https://arxiv.org/pdf/2007.07366.pdf
Serverless ML Inference - https://www.youtube.com/watch?v=HlKOOgY5OyA
Kubeflow for Machine Learning - https://learning.oreilly.com/library/view/kubeflow-for-machine/9781492050117/