Code-Data Eval Author - Machine Learning Engineer (Pilot)

Remote (Americas & Europe) · Posted Jun 9

$45 to $140/hr

**Code-Data Eval Author, Machine Learning Engineer** (our client · remote contract)

Our client partners with frontier AI labs to build the evaluations their models are trained and measured against. You'll design ML/LLM evaluation tasks and rubrics and grade model/agent outputs. Your training-side knowledge directly shapes reward and eval signals.

**What you'll do**

Design ML/LLM evaluation tasks, rubrics, and metrics

Grade model/agent outputs and improve eval quality through review

Bring training-side judgment (SFT / RLHF / reward modeling) to eval design

**You are**

5+ years as an MLE at a product organization, with hands-on training/fine-tuning and evals

Ideally fluent in SFT / RLHF / reward modeling / eval metrics (rare, high-leverage here)

PyTorch/JAX, Hugging Face, experiment tracking; clear written communication

**Engagement & pay**

Remote contract, flexible 30+ hrs/week

Hourly rate set to your local market (e.g., US/Canada $100 to $140/hr; Europe and LatAm scaled to region)

**Hiring process, paid**
A short Technical Screen with our client, a live Code Review Session, and a Domain Expert Interview. You're paid $200 for completing all three, regardless of outcome.

How it works: apply here and we connect you to our hiring partner for this role. By continuing you agree we may forward your application.