Senior Data Engineer - Media Data Services
New York, NY
Posted Sep 2, 2021 - Requisition No. 92262
Bloomberg Media empowers global business leaders with breaking news, expert opinion and proprietary data distributed on every platform and across every time zone, reaching over 80 million unique visitors a month through its digital properties. Our products require scalable and performant data services to deliver the best content and experiences to our users.
Media Data Services is responsible for the real-time services that power all news on Bloomberg.com, serving complex publishing workflows and handling tens of thousands of content queries a minute. We also maintain a data pipeline comprising dozens of ETL jobs that aggregate datasets in our Data Lake. Using a mix of open-source and public cloud technologies, we provide a consistent query interface over hundreds of terabytes of data. This data empowers teams of analysts and data scientists to improve our customers' experiences through machine learning, A/B testing, and data-driven decision making. Finally, we support product managers, marketers, and analysts with data reporting tools and Quorum, our internal customer data platform.
What's in it for you:
As a member of the team, you will work with a wide range of stakeholders, including Product, Editorial, Ad Operations, Marketing, and other engineering teams. You will maintain and expand the features of real-time APIs that leverage Kubernetes and AWS to give other application teams the data they need to deliver engaging experiences to our users. You'll also work closely with analysts and data scientists to provide the data and tooling they need to generate insights that will improve our products. With the help of Google Cloud technologies like BigQuery and Dataproc, you'll give them the compute platform they need to train machine learning models and generate predictions.
Additionally, as the owner of applications that lie in the critical path for every team in Bloomberg Media, you'll play a central role in empowering other teams. This dynamic will provide you with opportunities to grow your network within the organization, and we'll trust you to leverage that network to maximize your impact and your team's.
We'll trust you to:
- Manage and expand multiple DR-compliant software architectures, adhering to strict requirements concerning performance, stability, and availability
- Build efficient code to transform raw data into datasets for analysis, reporting and machine learning models
- Develop and maintain data pipelines that move terabytes of new data every day
- Develop the domain knowledge and strategic thinking necessary to identify opportunities to grow the impact of our team through ideation, project proposals, and technical discussions
- Develop real-time systems that ingest 100k+ document updates a day and serve tens of thousands of queries a minute
You need to have:
- Experience developing scalable, real-time APIs to power consumer-facing products
- Experience using Java/Scala, Python/PySpark
- Experience with data processing frameworks such as Spark and exposure to orchestration platforms such as Airflow or Kubeflow
- Experience with RDBMSs like MySQL, PostgreSQL, etc.
- Experience with NoSQL technologies like Elasticsearch, Solr, Cassandra, HBase, etc.
- Experience developing data extraction, transformation, and load (ETL) pipelines
- BA, BS, MS, or PhD in Computer Science, Engineering, or a related technology field
We'd love to see:
- Experience with public cloud platforms like AWS, GCP, or Azure
- Experience supporting data scientist workflows
- Experience integrating with third-party APIs
- Experience managing compute instances, databases, etc. to drive data wrangling and delivery
Bloomberg is an equal opportunities employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.