An open-source framework that extracts structured data from PDFs. Bring your own models and extend it to any file type.
Bring your own OCR or LLM. open-xtract abstracts the extraction layer so you can swap engines at will.
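
One way to picture a swappable extraction layer is a small interface that every engine satisfies. The sketch below is illustrative only and is not open-xtract's actual API: `ExtractionEngine`, `PassthroughEngine`, `UppercaseEngine`, and `run_extraction` are hypothetical names, and the two engines are trivial stand-ins for a real OCR engine or LLM.

```python
from dataclasses import dataclass
from typing import Protocol


class ExtractionEngine(Protocol):
    """Hypothetical interface: anything that turns raw bytes into text."""

    def extract(self, data: bytes) -> str: ...


@dataclass
class PassthroughEngine:
    """Stand-in engine: decodes the bytes as UTF-8 and returns them unchanged."""

    def extract(self, data: bytes) -> str:
        return data.decode("utf-8")


@dataclass
class UppercaseEngine:
    """Second stand-in engine, used only to show that engines are interchangeable."""

    def extract(self, data: bytes) -> str:
        return data.decode("utf-8").upper()


def run_extraction(engine: ExtractionEngine, data: bytes) -> str:
    # The caller depends only on the interface, so any engine can be swapped in.
    return engine.extract(data)
```

Because `run_extraction` depends only on the `extract` method, switching engines is a one-argument change, for example `run_extraction(UppercaseEngine(), data)` instead of `run_extraction(PassthroughEngine(), data)`.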
Drop in a PDF and receive clean, layout-aware text that's ready for embeddings.
Every chunk is embedded and stored in a vector DB. At query time, the retrieved chunks are reranked and served via RAG with inline citations.
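
The embed, retrieve, rerank, and cite flow can be sketched with the standard library alone. This is a toy illustration under stated assumptions, not how open-xtract implements it: the bag-of-words `embed` stands in for a real embedding model, an in-memory list stands in for the vector DB, the term-coverage `rerank` stands in for a real reranker, and every name here (`Chunk`, `retrieve`, `rerank`, `cite`) is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str   # source file name, used in the citation
    page: int  # page number, used in the citation
    text: str


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would call a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """First stage: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[Chunk]) -> list[Chunk]:
    """Toy reranker: order candidates by the fraction of query terms they cover."""
    q_terms = set(embed(query))

    def coverage(c: Chunk) -> float:
        return len(q_terms & set(embed(c.text))) / len(q_terms) if q_terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)


def cite(chunks: list[Chunk]) -> str:
    """Attach an inline citation to each passage handed to the generator."""
    return " ".join(f"{c.text} [{c.doc}, p. {c.page}]" for c in chunks)
```

In a real pipeline the string produced by `cite` would be placed in the LLM prompt so that the generated answer can carry the same `[file, p. N]` markers back to the reader.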