Senior Data Engineer (Python/PySpark/Kafka) - Full Remote Portugal at HumanIT Digital Consulting

About the opportunity

Join a digital healthcare company revolutionizing physical therapy through AI and wearable technology. As a Senior Data Engineer, you’ll architect the lakehouse infrastructure that powers virtual physical therapy platforms helping patients recover from musculoskeletal conditions through personalized, remotely-guided exercise programs. This full remote position from Portugal offers the chance to build mission-critical data systems that directly improve patient outcomes, reduce pain, and lower healthcare costs. You’ll spearhead the migration to Apache Iceberg format, establish robust data pipelines, and create AI-ready data infrastructure that powers machine learning models across the platform, while working with cutting-edge technologies in a healthcare environment where data quality and governance are paramount.

Project & context

You’ll lead the migration of existing workloads to the Iceberg format, establishing and maturing the foundational lakehouse architecture that will serve as the backbone for data-driven decision making. Your responsibilities include architecting and building robust batch and streaming data pipelines using Spark and Flink, collaborating closely with Backend Engineering teams on API integrations and formal data contract establishment, and contributing to a unified lineage and governance framework using DataHub. You’ll provide comprehensive support to the Core Team in adopting new data platform capabilities, ensuring solutions are platform-oriented and designed for broad organizational use.

Building AI-ready data infrastructure is central: you’ll ensure clean, governed, and accessible data pipelines that power machine learning models and AI-driven products across the platform, while leveraging AI coding assistants and LLMs to accelerate development and improve code quality.

What we’re looking for (Required)

Demonstrated proficiency with Python and PySpark for data processing at scale

Hands-on experience with data lake formats: Iceberg, Delta Lake, or Hudi

Solid understanding of Kafka and event-driven architectures

Proven experience building and orchestrating data pipelines at scale

Strong SQL proficiency with comprehensive data modeling knowledge

Familiarity with workflow orchestration tools: Airflow, Dagster, or similar

Platform-oriented mindset: developing solutions for broad organizational use, not individual purposes

Ownership mentality: committed to seeing problems through to resolution

Clear communication skills: ability to articulate complex technical concepts to non-technical stakeholders

Highly collaborative: excels working alongside backend engineers, data engineers, and analysts

Pragmatic approach: balancing ideal solutions with practical delivery timelines

Experience building and maintaining AI-ready data infrastructure

Ability to leverage AI coding assistants and LLMs to accelerate development

English proficiency at B2 Upper Intermediate level minimum

Nice to have (Preferred)

Demonstrated expertise with Flink or comparable streaming frameworks

Proficiency in DBT and familiarity with the modern data stack

Experience with modern data platforms: BigQuery, Trino, Snowflake, or Databricks

Proven background developing self-service data platforms

Experience working in regulated healthcare or compliance-sensitive environments

Knowledge of data governance frameworks and metadata management

Understanding of healthcare data standards (HL7, FHIR)

Familiarity with DataHub or similar data catalog/lineage tools

Experience with infrastructure-as-code and CI/CD for data pipelines

Work model

Full Remote; must be based in Portugal.

Experience level

Senior

ABOUT THE OPPORTUNITY

Join a forward-thinking technology company building cutting-edge machine learning solutions that power real-world applications at scale. This position offers the chance to work on production-grade ML systems, taking ownership of the full lifecycle from model development through deployment, monitoring, and continuous optimization. You'll be part of a team that values technical excellence, innovation, and the practical application of machine learning to solve meaningful business challenges.

This is a fully remote role based in Portugal.

You'll work in an environment that emphasizes MLOps best practices, ensuring your models don't just work in notebooks but deliver measurable value in production environments. The role offers excellent growth opportunities as you'll gain exposure to diverse ML use cases, modern deployment architectures, and the latest tools in the ML engineering ecosystem.

PROJECT & CONTEXT

You'll be building and deploying production-ready machine learning models across various applications including recommendation engines, predictive analytics systems, and classification solutions. Your work will span the complete ML lifecycle, from collaborating with Data Engineers on data preparation and feature engineering, through model training and optimization, to building robust MLOps pipelines that ensure your models perform reliably in production.

The role emphasizes the engineering side of machine learning: you'll spend significant time on model deployment, versioning, monitoring for data drift and model decay, automated retraining pipelines, and infrastructure management. You'll work with cloud-native architectures and containerization technologies to ensure your models can scale effectively and maintain high availability. This is an ideal opportunity for ML practitioners who want to bridge the gap between data science experimentation and production engineering.

Core Tech Stack: Python, TensorFlow, PyTorch, Scikit-learn, XGBoost
MLOps Toolkit: MLflow, Kubeflow, Docker, Kubernetes
Cloud Platforms: AWS SageMaker, Azure ML, or Google AI Platform
Focus Areas: Model deployment, monitoring, performance optimization, and production reliability

WHAT WE'RE LOOKING FOR (Required)

ML Fundamentals: Strong background in machine learning algorithms, statistics, and model evaluation techniques

Python Expertise: Proficiency in Python for ML development, including experience with data manipulation libraries (pandas, NumPy)

TensorFlow/PyTorch: Deep expertise in at least one major ML framework (TensorFlow or PyTorch) with production experience

Model Development: Proven experience designing, training, and evaluating supervised, unsupervised, and deep learning models

Feature Engineering: Strong skills in feature selection, transformation, and engineering to improve model performance

MLOps Practices: Hands-on experience building ML pipelines for deployment, versioning, and monitoring

Containerization: Working experience with Docker for model packaging and deployment

Cloud Platforms: Practical experience with at least one cloud ML platform (AWS SageMaker, Azure ML, or Google AI Platform)

Data Engineering Basics: Understanding of data pipelines, ETL processes, and working with databases

Model Optimization: Experience with hyperparameter tuning, model performance optimization, and efficiency improvements

Production Mindset: Track record of deploying and maintaining models in production environments with monitoring and alerting

Language: B2+ English level (Upper Intermediate minimum) for technical communication and documentation

Location: Based in Portugal with availability for fully remote work

NICE TO HAVE (Preferred)

Kubernetes Orchestration: Experience deploying ML workloads on Kubernetes with autoscaling and resource management

MLOps Tools: Hands-on experience with Kubeflow, MLflow, or similar MLOps platforms for experiment tracking and model registry

Advanced ML Frameworks: Experience with XGBoost, LightGBM, CatBoost for gradient boosting applications

Multiple Frameworks: Proficiency in both TensorFlow and PyTorch with understanding of their respective strengths

Alternative Languages: Knowledge of R or Scala for specific ML workflows

Model Monitoring: Experience implementing data drift detection, model decay monitoring, and automated retraining triggers

CI/CD for ML: Building automated testing and deployment pipelines specifically for ML models

Distributed Training: Experience with distributed model training across multiple GPUs or nodes

Feature Stores: Familiarity with feature store solutions (Feast, Tecton, AWS Feature Store)

A/B Testing: Experience with experimentation frameworks and statistical methods for model evaluation in production

GPU Optimization: Understanding of GPU acceleration and optimization techniques for deep learning

Streaming ML: Experience with real-time model serving and online learning scenarios

AutoML Tools: Familiarity with automated machine learning platforms and neural architecture search

Model Explainability: Experience with interpretability tools (SHAP, LIME) and model transparency practices

Big Data Tools: Experience with Spark MLlib or Dask for large-scale ML workflows

Certifications (Advantageous):

AWS Certified Machine Learning - Specialty

Google Professional Machine Learning Engineer

Microsoft Certified: Azure AI Engineer Associate

TensorFlow Developer Certificate

Location: Portugal (Fully Remote)

About HumanIT Digital Consulting:

www.humanit.pt/