ABOUT THE OPPORTUNITY

We are partnering with an innovative international technology company focused on building scalable AI-native platforms that support high-performance digital products used worldwide. This is an opportunity for a senior engineering leader who wants to shape the future of enterprise AI infrastructure, autonomous agent workflows, and cloud-native distributed systems.

As a Lead AI Platform Engineer, you will work closely with executive leadership and engineering teams to define and implement the architecture behind advanced AI solutions. The role combines hands-on technical leadership with platform strategy, reliability engineering, and internal developer experience improvements. You will play a critical role in enabling secure, scalable, and production-ready AI ecosystems while mentoring engineering teams and driving technical excellence across the organization.

This is a fully remote position based in Portugal, with occasional national and international travel requirements estimated at 0%–15%.

PROJECT & CONTEXT

The project focuses on designing and evolving a modern AI platform ecosystem powered by AWS cloud technologies and agent-based architectures. The engineering environment is highly collaborative, fast-paced, and centered around cloud automation, observability, and AI workload orchestration.

You will lead initiatives involving AWS Bedrock AgentCore, AWS Step Functions, multi-tenant AI infrastructure, vector databases, and automated CI/CD pipelines for AI workloads. The platform supports intelligent agent execution, evaluation pipelines, retrieval-augmented generation (RAG), and internal developer platforms (IDP).

The stack includes AWS Networking and IAM, Terraform v1.x, GitHub Actions, Kubernetes, Docker, Python 3.x, Bash scripting, JavaScript/TypeScript, OpenSearch, Pinecone, Milvus, Datadog, CloudWatch, LangSmith, Vault, Artifactory, Backstage, and policy enforcement frameworks such as OPA and Cedar.

English is required for daily communication with international stakeholders and distributed engineering teams.

WHAT WE'RE LOOKING FOR (Required)

Strong experience designing and operating cloud-native platforms on AWS

Proven expertise with AWS Bedrock AgentCore and AWS Step Functions in production environments

Experience building custom Agent/Tool Gateways and AI orchestration workflows

Advanced knowledge of Infrastructure as Code using Terraform v1.x

Hands-on experience with CI/CD automation using GitHub Actions

Strong containerization and orchestration experience with Docker and Kubernetes

Experience building scalable microservices architectures

Solid understanding of AI observability, monitoring, and tracing tools such as Datadog, CloudWatch, or LangSmith

Experience with vector databases including OpenSearch, Pinecone, or Milvus

Strong understanding of RAG architectures and AI knowledge retrieval strategies

Experience implementing secure multi-tenant environments and IAM policies

Familiarity with policy and governance frameworks such as OPA or Cedar

Strong scripting and automation skills using Python 3.x, Bash, or JavaScript/TypeScript

Experience collaborating with senior stakeholders and translating technical concepts into business value

Excellent communication skills in English (written and spoken)

NICE TO HAVE (Preferred)

Experience with Internal Developer Platforms (IDP) and developer enablement initiatives

Familiarity with Backstage for platform engineering and developer experience improvements

Experience implementing AI model evaluation frameworks (Evals)

Knowledge of non-deterministic AI agent behavior analysis and reliability engineering practices

Previous experience mentoring engineering teams or acting as a technical lead

Exposure to enterprise security tooling such as Vault and Artifactory

Experience working in distributed international teams

Portuguese language skills are considered a plus

Experience supporting large-scale AI-native or autonomous agent ecosystems

ABOUT THE OPPORTUNITY

Join a forward-thinking technology company building cutting-edge machine learning solutions that power real-world applications at scale. This position offers the chance to work on production-grade ML systems, taking ownership of the full lifecycle from model development through deployment, monitoring, and continuous optimization. You'll be part of a team that values technical excellence, innovation, and the practical application of machine learning to solve meaningful business challenges.

This is a fully remote role based in Portugal.

You'll work in an environment that emphasizes MLOps best practices, ensuring your models don't just work in notebooks but deliver measurable value in production environments. The role offers excellent growth opportunities as you'll gain exposure to diverse ML use cases, modern deployment architectures, and the latest tools in the ML engineering ecosystem.

PROJECT & CONTEXT

You'll be building and deploying production-ready machine learning models across various applications including recommendation engines, predictive analytics systems, and classification solutions. Your work will span the complete ML lifecycle, from collaborating with Data Engineers on data preparation and feature engineering, through model training and optimization, to building robust MLOps pipelines that ensure your models perform reliably in production.

The role emphasizes the engineering side of machine learning: you'll spend significant time on model deployment, versioning, monitoring for data drift and model decay, automated retraining pipelines, and infrastructure management. You'll work with cloud-native architectures and containerization technologies to ensure your models can scale effectively and maintain high availability. This is an ideal opportunity for ML practitioners who want to bridge the gap between data science experimentation and production engineering.

Core Tech Stack: Python, TensorFlow, PyTorch, Scikit-learn, XGBoost
MLOps Toolkit: MLflow, Kubeflow, Docker, Kubernetes
Cloud Platforms: AWS SageMaker, Azure ML, or Google AI Platform
Focus Areas: Model deployment, monitoring, performance optimization, and production reliability

WHAT WE'RE LOOKING FOR (Required)

ML Fundamentals: Strong background in machine learning algorithms, statistics, and model evaluation techniques

Python Expertise: Proficiency in Python for ML development, including experience with data manipulation libraries (pandas, NumPy)

TensorFlow/PyTorch: Deep expertise in at least one major ML framework (TensorFlow or PyTorch) with production experience

Model Development: Proven experience designing, training, and evaluating supervised, unsupervised, and deep learning models

Feature Engineering: Strong skills in feature selection, transformation, and engineering to improve model performance

MLOps Practices: Hands-on experience building ML pipelines for deployment, versioning, and monitoring

Containerization: Working experience with Docker for model packaging and deployment

Cloud Platforms: Practical experience with at least one cloud ML platform (AWS SageMaker, Azure ML, or Google AI Platform)

Data Engineering Basics: Understanding of data pipelines, ETL processes, and working with databases

Model Optimization: Experience with hyperparameter tuning, model performance optimization, and efficiency improvements

Production Mindset: Track record of deploying and maintaining models in production environments with monitoring and alerting

Language: B2+ English level (Upper Intermediate minimum) for technical communication and documentation

Location: Based in Portugal with availability for fully remote work

NICE TO HAVE (Preferred)

Kubernetes Orchestration: Experience deploying ML workloads on Kubernetes with autoscaling and resource management

MLOps Tools: Hands-on experience with Kubeflow, MLflow, or similar MLOps platforms for experiment tracking and model registry

Advanced ML Frameworks: Experience with XGBoost, LightGBM, CatBoost for gradient boosting applications

Multiple Frameworks: Proficiency in both TensorFlow and PyTorch with understanding of their respective strengths

Alternative Languages: Knowledge of R or Scala for specific ML workflows

Model Monitoring: Experience implementing data drift detection, model decay monitoring, and automated retraining triggers

CI/CD for ML: Building automated testing and deployment pipelines specifically for ML models

Distributed Training: Experience with distributed model training across multiple GPUs or nodes

Feature Stores: Familiarity with feature store solutions (Feast, Tecton, Amazon SageMaker Feature Store)

A/B Testing: Experience with experimentation frameworks and statistical methods for model evaluation in production

GPU Optimization: Understanding of GPU acceleration and optimization techniques for deep learning

Streaming ML: Experience with real-time model serving and online learning scenarios

AutoML Tools: Familiarity with automated machine learning platforms and neural architecture search

Model Explainability: Experience with interpretability tools (SHAP, LIME) and model transparency practices

Big Data Tools: Experience with Spark MLlib or Dask for large-scale ML workflows

Certifications (Advantageous):

AWS Certified Machine Learning - Specialty

Google Professional Machine Learning Engineer

Microsoft Certified: Azure AI Engineer Associate

TensorFlow Developer Certificate

Location: Portugal (Fully Remote)

Lead AI Platform Engineer - Remote Portugal

Emma

About HumanIT Digital Consulting:

www.humanit.pt/