What’s in it for you:
- Be part of a global enterprise and build AI solutions at scale.
- Work alongside a highly skilled and technically strong team.
- Contribute to solving high-complexity, high-impact challenges in data transformation and machine learning.
Responsibilities:
- Build production-ready data acquisition and transformation pipelines, from ideation to deployment.
- Act as a hands-on problem solver and developer, helping to extend and manage the data platforms.
- Apply best practices in data modeling and build ETL pipelines (streaming and batch) using cloud-native solutions.
- Model development: Design, develop, and evaluate state-of-the-art machine learning models for information extraction, leveraging techniques from NLP, computer vision (where applicable), and other relevant domains.
- Data preprocessing and feature engineering: Develop robust pipelines for data cleaning, preprocessing, and feature engineering to prepare data for model training.
- Model training and evaluation: Train, tune, and evaluate machine learning models, ensuring high accuracy, efficiency, and scalability.
- Deployment and monitoring: Deploy and maintain machine learning models in a production environment, monitoring their performance and ensuring their reliability.
- Research and innovation: Stay up to date with the latest advancements in machine learning and NLP, and explore new techniques and technologies to improve the extraction process.
- Collaboration: Work closely with product managers, data scientists, and other engineers to understand project requirements and deliver effective solutions.
- Code quality and best practices: Ensure high code quality and adherence to best practices for software development.
- Communication: Effectively communicate technical concepts and project updates to both technical and non-technical audiences.
What We’re Looking For:
- 6-10 years of professional software work experience, with a strong focus on Machine Learning, Natural Language Processing (NLP) for information extraction, and MLOps.
- Expertise in Python and related NLP libraries (e.g., spaCy, NLTK, Transformers, Hugging Face)
- Experience with Apache Spark or other distributed computing frameworks for large-scale data processing.
- AWS/GCP Cloud expertise, particularly in deploying and scaling ML pipelines for NLP tasks.
- Solid understanding of the Machine Learning model lifecycle, including data preprocessing, feature engineering, model training, evaluation, deployment, and monitoring, specifically for information extraction models.
- Experience with CI/CD pipelines for ML models, including automated testing and deployment.
- Docker & Kubernetes experience for containerization and orchestration.
- OOP design patterns, test-driven development, and enterprise system design.
- SQL (any variant; bonus if it is a big-data variant).
- Linux OS (e.g., bash and related command-line utilities).
- Version control system experience with Git, GitHub, or Azure DevOps.
- Excellent problem-solving, code review, and debugging skills.
- Software craftsmanship, adherence to Agile principles, and pride in writing good code.
- Techniques for communicating change to non-technical audiences.
Nice to have:
- Core Java 17+, preferably Java 21+, and associated toolchain
- Apache Avro
- Apache Kafka
- Other JVM-based languages (e.g., Kotlin, Scala)
“I find that the harder I work, the more luck I seem to have.” – Thomas Jefferson