CTO · MateBio
GraphRAG for a TechBio Startup
I led the transformation of raw biomedical knowledge-graph data into a scalable, defensible, commercially viable platform. MateBio provides AI-powered tools that let wet-lab researchers at biotech and pharma companies explore complex biological relationships through natural-language queries and interactive visualizations.
- Architected the core GraphRAG pipeline — a multi-tool LLM agent that turns natural-language biomedical questions into validated Cypher over a Neo4j knowledge graph spanning 80+ integrated data sources, with entity recognition, provenance tracking, and confidence scoring.
- Built a hybrid entity-resolution engine that maps imprecise researcher terms ("p53", "breast cancer") to exact graph identifiers, fusing heuristic biomedical NER, type-scoped full-text and vector search, and small-model disambiguation.
- Added a systems-biology analytics layer — Graph Data Science centrality (PageRank and personalized PageRank over pre-computed projections) plus a synthetic-EHR cohort service producing "Spoke signatures" that rank genes and pathways from clinical cohorts.
- Built multi-modal data ingestion so researchers could bring their own data alongside the graph: PDF and URL document RAG, plus CSV and omics analysis with auto-generated Mermaid diagrams.
- Stood up the platform across two clouds — AWS and GCP, each running Neo4j and PostgreSQL — with Terraform-driven CI/CD, automated database migrations, and a tested cross-cloud migration path.
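
To illustrate the personalized PageRank ranking mentioned in the analytics bullet above, here is a minimal pure-Python sketch using power iteration. It is not the production code, which ran on Neo4j Graph Data Science over pre-computed projections. The function name `personalized_pagerank`, the damping value, the iteration count, and the toy node labels are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank over a directed edge list.

    edges: iterable of (source, target) pairs
    seeds: nodes that receive all of the teleport probability
    """
    seeds = set(seeds)
    if not seeds:
        raise ValueError("at least one seed node is required")

    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    # Teleport distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        dangling_mass = 0.0
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Nodes without outgoing edges return their mass to the seeds.
                dangling_mass += damping * rank[n]
        for n in nodes:
            new_rank[n] += dangling_mass * teleport[n]
        rank = new_rank
    return rank


# Hypothetical toy graph; the labels are placeholders, not real graph entities.
toy_edges = [
    ("gene_a", "gene_b"),
    ("gene_a", "pathway_x"),
    ("gene_b", "pathway_x"),
    ("pathway_x", "gene_a"),
    ("gene_d", "gene_a"),  # gene_d points into the seed but is not reachable from it
]
ranks = personalized_pagerank(toy_edges, seeds=["gene_a"])
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

Seeding the walk from a cohort's genes is what makes the ranking "personalized": all teleport probability returns to the seeds, so scores reflect proximity to them, and nodes the seeds cannot reach score zero.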