Lead end-to-end quality engineering for enterprise AI applications, including LLM-powered products, RAG pipelines, and agentic workflows
Design and execute prompt validation strategies, evaluating LLM responses for accuracy, semantic relevance, hallucination risk, and safety compliance
Build automated evaluation pipelines for AI model outputs using metrics such as BLEU, ROUGE, embedding-based similarity, precision, recall, and F1-score
Validate agentic systems (tool use, multi-step reasoning, planner-executor workflows) for correctness, determinism, and failure mode handling
Architect and maintain Python-based automation frameworks for AI/ML model evaluation, regression testing, and continuous model quality monitoring