70% precision
- Has implemented a golden dataset or ML evaluation dataset management system with versioning and lineage
- Has built multi-format document ingestion pipelines processing 10K+ documents reliably
- Has implemented physical data separation for a compliance-sensitive system
Project-Specific Skills and Domain Knowledge
Must-Have:
- Experience implementing semantic chunking strategies for different document types (transcripts, reports, code) with measurable retrieval quality impact
- Experience building data pipelines that integrate with LLM APIs for extraction and augmentation tasks
- Experience implementing physical data separation (separate storage, separate schemas) for compliance-sensitive ML datasets
- Experience with TimescaleDB or equivalent time-series databases for metrics and cost tracking
PREFERRED QUALIFICATIONS
- Experience with knowledge graph data models and entity resolution pipelines
- Experience operating data infrastruc...