Senior Data Engineer (Python/PySpark/Kafka) - Full Remote Portugal at HumanIT Digital Consulting | Torre

You'll architect AI-ready data infrastructure, directly improving patient outcomes in digital healthcare.
Full-time

Legal agreement: Employment

Provide your expected compensation while applying
Remote (for Portugal residents)
Shared by Emma of Torre.ai about 1 month ago

Requirements and responsibilities


About the opportunity
Join a digital healthcare company that is transforming physical therapy through AI and wearable technology. As a Senior Data Engineer, you'll architect the lakehouse infrastructure behind virtual physical therapy platforms that help patients recover from musculoskeletal conditions through personalized, remotely guided exercise programs. This fully remote position, open to candidates based in Portugal, offers the chance to build mission-critical data systems that directly improve patient outcomes, reduce pain, and lower healthcare costs. You'll spearhead the migration to the Apache Iceberg table format, establish robust data pipelines, and create AI-ready data infrastructure that powers machine learning models across the platform, all while working with cutting-edge technologies in a healthcare environment where data quality and governance are paramount.

Project & context
You'll lead the migration of existing workloads to the Iceberg format, establishing and maturing the foundational lakehouse architecture that will serve as the backbone for data-driven decision-making. Your responsibilities include architecting and building robust batch and streaming data pipelines with Spark and Flink, collaborating closely with Backend Engineering teams on API integrations and on establishing formal data contracts, and contributing to a unified lineage and governance framework built on DataHub. You'll also support the Core Team in adopting new data platform capabilities, ensuring that solutions are platform-oriented and designed for broad organizational use.
Building AI-ready data infrastructure is central to the role: you'll ensure clean, governed, and accessible data pipelines that power machine learning models and AI-driven products across the platform, while using AI coding assistants and LLMs to accelerate development and improve code quality.

What we're looking for (Required)
- Demonstrated proficiency with Python and PySpark for data processing at scale
- Hands-on experience with data lake formats: Iceberg, Delta Lake, or Hudi
- Solid understanding of Kafka and event-driven architectures
- Proven experience building and orchestrating data pipelines at scale
- Strong SQL skills and comprehensive data modeling knowledge
- Familiarity with workflow orchestration tools: Airflow, Dagster, or similar
- Platform-oriented mindset: developing solutions for broad organizational use rather than for individual use cases
- Ownership mentality: committed to seeing problems through to resolution
- Clear communication skills: able to explain complex technical concepts to non-technical stakeholders
- Highly collaborative: works well alongside backend engineers, data engineers, and analysts
- Pragmatic approach: balances ideal solutions against practical delivery timelines
- Experience building and maintaining AI-ready data infrastructure
- Ability to use AI coding assistants and LLMs to accelerate development
- English proficiency at B2 (upper intermediate) level or above

Nice to have (Preferred)
- Demonstrated expertise with Flink or comparable streaming frameworks
- Proficiency in dbt and familiarity with the modern data stack
- Experience with modern data platforms: BigQuery, Trino, Snowflake, or Databricks
- Proven background developing self-service data platforms
- Experience working in regulated healthcare or compliance-sensitive environments
- Knowledge of data governance frameworks and metadata management
- Understanding of healthcare data standards (HL7, FHIR)
- Familiarity with DataHub or similar data catalog/lineage tools
- Experience with infrastructure-as-code and CI/CD for data pipelines

Work model
Full remote; candidates must be based in Portugal.

Experience level
Senior