SafetyTech Client #1 | Adversarial Task Writer for AI Security RL Gyms at SD Solutions | Torre

You'll fortify AI safety by designing adversarial prompt injection scenarios for frontier models.
Full-time
Provide your expected compensation while applying
Remote (for Serbia residents)
Remote (for Armenia residents)
Remote (for Bulgaria residents)
Shared by
Emma of Torre.ai
4 days ago

Requirements and responsibilities


On behalf of SafetyTech Client #1, SD Solutions is looking for a talented Adversarial Task Writer for AI Security RL Gyms.

SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.

Responsibilities:
You design prompt injection scenarios in YAML, run them against frontier models, validate success rates, and submit passing tasks. 5 high-quality tasks per week (full-time equivalent). Per-task compensation, paid on acceptance.

Requirements:
- Adversarial mindset: you think like an attacker and understand how to exploit an AI agent’s helpfulness, authority assumptions, or trust in its environment
- Prompt injection expertise: direct (role-play, encoding, context flooding) and indirect/environment-embedded (poisoned tool responses, malicious content in documents, cross-context leakage)
- Technical writing in YAML
- Comfortable with Docker, CLI tools, and running systematic tests against multiple models
- Domain realism in at least one vertical: e-commerce, finance, HR, enterprise SaaS, healthcare, travel
- Background in pentesting, appsec, LLM security research, or red teaming strongly preferred

The Task:
You build adversarial prompt injection tasks for Alice’s RL Gym platform. Each task is a self-contained YAML scenario simulating a realistic AI agent deployment, testing whether the agent can be manipulated into violating its safety policies.

About the company:
A company building specialized evaluation infrastructure for AI safety and robustness testing. Their platform simulates adversarial conditions used by AI development teams to validate agent behavior before deployment. Currently expanding a freelance contributor pool for scenario and environment development.