Design and build a data architecture that transforms raw and processed omics data into harmonized, AI-consumable layers
Build and optimize ETL/ELT pipelines that produce denormalized views, pre-computed aggregations, embedding-ready text representations, and feature stores for AI consumption
Implement data quality monitoring, automated profiling, and validation checks across harmonization layers
Create versioned, reproducible data snapshots that support model training, evaluation, and audit requirements in a regulated environment
Partner with teams to extend harmonization patterns as modalities expand beyond genomics and proteomics into spatial transcriptomics, Perturb-Seq, single-cell, and digital pathology
Design and maintain a semantic layer over multi-omics databases that enables AI systems