Python Developer: AI Benchmark Task Construction, Review & Validation

Biz-Tech Analytics

You'll build and validate AI benchmarks, directly shaping the future of AI model capabilities.

Freelance

Recurrent

USD75.4K - 100K/year

~COP150M - 200M/year

+ Equity

+ Bonuses

Remote (anywhere)

Closed

Shared 10 days ago

What you'll actually do: review and validate AI benchmark tasks in Python repos, and build new ones. That includes writing tasks with working solutions that are hard enough to make current AI coding agents fail. You'll also run Docker-based test suites, verify oracle solutions, debug flaky tests, and assess task quality for reproducibility and correctness.

This is an evaluation and task-design role, not a feature-building role. If you'd rather ship product than construct hard problems and find what's broken, this isn't for you.

Must haves:
• 3+ years of production Python
• Deep pytest knowledge
• Docker (building images, debugging containers)
• Linux CLI fluency
• Ability to read large open-source repos quickly

Note: there's more to the role than Python alone. Docker, Linux, debugging depth, task design, and a few bonus areas (security, Kubernetes, async) all factor in.

We're also open to staffing/recruitment consultants who place contract tech roles, and happy to discuss terms.

User since: Jun 2026

About Biz-Tech Analytics:

Biz-Tech Analytics - Your Partner for Building & Applying AI at Scale

Whether you're an enterprise building next-gen AI models, an MLOps platform serving major AI labs, or an industrial business enhancing productivity, Biz-Tech Analytics powers your AI journey from data to deployment.

Fueling AI Success with Expert Data Services: We offer specialized, human-in-the-loop annotation, RLHF, and dataset creation, leveraging our network of 500+ vetted developers, STEM professionals, linguists, and medical experts. Recently, our teams evaluated and enhanced complex AI models for top global platforms, delivering precision feedback at scale.

Accelerating Operational Intelligence: From manufacturing productivity and automated quality control to AI copilots and agents for healthcare and retail, our applied AI solutions streamline operations, cut costs, and unlock measurable business outcomes. Our solutions have transformed shop floors, QSR customer interactions, and patient engagement workflows.

AI Consulting & Advisory: We take a consultative approach: understanding your needs deeply, crafting tailored AI strategies, and rapidly deploying scalable solutions that adapt as your business grows.

Industries We Serve:
• Manufacturing & Supply Chain (Operational Intelligence & Productivity Solutions)
• Healthcare (AI Copilots for Patient Engagement)
• Retail & QSR (AI-driven Customer Experience & Automation)
• AI Platform Providers & Enterprise AI Labs (Expert Data Annotation, RLHF & Model Evaluation)

Why Biz-Tech Analytics?
• Deep AI Expertise: End-to-end AI services from data to deployment.
• Specialized Workforce: Expert annotators, developers, and domain specialists with proven project experience.
• Trusted by Industry Leaders: 100+ successfully completed projects globally across enterprises, scale-ups, and leading AI labs.
• Commitment to Responsible AI: Ethical data handling, fairness, transparency, and accountability built into every project.

Reputation:

Domain created: Sep 2021

biztechanalytics.com/

Team members:

Samarth Srivastava

Job admins:

Samarth Srivastava
