Senior Data Classification Engineer - Business Intelligence
New York, NY
Posted Mar 10, 2022 - Requisition No. 100908
Note: This position can be based in either New York, NY or Princeton, NJ.
Bloomberg runs on data. As the Data Management team within Engineering, we support our organization’s needs in managing data efficiently. One of the team's critical focus areas is discovering and classifying different types of data across the organization. The team plans to achieve this by building innovative products and providing a consistent framework and processes to help secure and protect sensitive data. The team also drives data quality, data governance and data analytics, partnering with multiple business teams to help them make better decisions.
We’ll trust you to:
- Lead end-to-end technical implementation of data classification using an open-source technology stack and vendor-based solutions such as Informatica EDC (Enterprise Data Catalog), BigID, Microsoft AIP, Alation, etc.
- Collaborate with Business (Risk, Privacy, Legal, CTO, etc.) and Engineering teams to gather classification requirements, design data solutions, develop prototypes, gain alignment and productionize the process.
- Implement data classification rules for metadata and data using regex and reference data.
- Optimize the classification process and rules to continually improve classification accuracy.
- Develop and automate resource configuration, connectivity testing, job execution and monitoring within Informatica EDC/BigID/Alation.
- Integrate Informatica EDC with Informatica Data Quality and other solutions.
- Support production issues, perform root cause analysis and exception handling.
- Automate the end-to-end process using Python libraries, REST API services, etc.
- Design, develop and test data classification web services using Swagger.
- Develop programs leveraging NLP/ML tools (such as spaCy) to detect data classifications, and identify and fix false positives and false negatives.
- Identify, document, and escalate critical findings that affect the capability of the data classification tools, including opportunities for performance improvement.
- Work with the vendors to resolve any issues and apply any emergency patches/fixes.
- Integrate data classification solutions with LDAP and manage user security and role accounts.
Need to have:
- 5+ years of experience using an open-source technology stack and vendor-based solutions such as Informatica EDC (Enterprise Data Catalog), BigID, Microsoft AIP, Alation, etc.
- 3+ years of Python programming experience designing and developing API services.
- At least one end-to-end data classification technical implementation.
- Data classification application administration and troubleshooting.
- Experience working with semi-structured and unstructured data at petabyte scale.
- Advanced SQL capabilities. Good understanding of data modeling, database design, and ETL processing is required.
- Experience with software engineering best practices (e.g. unit testing, code reviews, design documentation).
- Experience programming in a Linux/UNIX environment.
- Bachelor’s degree in Computer Science or equivalent experience.
Nice to have:
- Experience integrating with Informatica EDC, DPM, Axon and Data Quality.
- Knowledge of reporting tools such as Qlik Sense or Tableau.
- Experience contributing to open-source software and/or metadata systems.
- Knowledge of data privacy regulations such as GDPR and CCPA.
- Master’s degree in Computer Science.
Bloomberg is an equal opportunities employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.