You enjoy rapidly building prototypes that demonstrate the boundaries of what LLMs are capable of, and you have developed resources to measure those capabilities
You have spent dozens of hours reviewing complex data and LLM outputs to ensure high data quality
You are obsessive about rigorously measuring AI capabilities, and also about making sure your measurements actually align with the capabilities you care about
You have strong software engineering skills
If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!
What the job involves
Evaluation is critical to making progress in scaling intelligence
As models continue to become superhuman in many real-world use cases, we must continue to develop new evaluation techniques that accurately reflect what models are already capable of, as well as set the agenda for what future models sho...