OPEN-XTRACT

"EXTRACT"

FROM RAW PDFS TO CLEAN, QUERYABLE TEXT

Open-source framework that extracts structured data from PDFs. Bring your own models and extend to any file type.

"CAPABILITIES"

Bring your own OCR or LLM. open-xtract abstracts the extraction layer so you can swap engines at will.
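A swappable extraction layer can be pictured as a small interface that every engine implements. The sketch below is illustrative only: `ExtractionEngine`, `WhitespaceEngine`, `UppercaseEngine`, and `extract_text` are hypothetical names chosen for this example, not open-xtract's actual API, and the two engines are trivial stand-ins for a real OCR or LLM backend.

```python
from typing import Protocol


class ExtractionEngine(Protocol):
    """Hypothetical interface: any object with an extract() method qualifies."""

    def extract(self, data: bytes) -> str: ...


class WhitespaceEngine:
    """Stand-in engine: decodes the bytes and collapses runs of whitespace."""

    def extract(self, data: bytes) -> str:
        return " ".join(data.decode("utf-8", errors="replace").split())


class UppercaseEngine:
    """Second stand-in engine, present only to show that engines are swappable."""

    def extract(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace").upper()


def extract_text(data: bytes, engine: ExtractionEngine) -> str:
    """Run whichever engine the caller supplies; the calling code stays the same."""
    return engine.extract(data)
```

Swapping engines then means passing a different object, for example `extract_text(raw, WhitespaceEngine())` versus `extract_text(raw, UppercaseEngine())`, with no change to the surrounding code.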

Drop in a PDF and receive clean, layout-aware text that's ready for embeddings.

Every chunk is embedded and stored in a vector database. At query time, the retrieved chunks are reranked and served through retrieval-augmented generation (RAG) with inline citations.
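The embed, retrieve, rerank, and cite steps can be sketched end to end with the standard library alone. Everything here is an assumption made for illustration: a bag-of-words `Counter` stands in for a learned embedding model, a plain list stands in for the vector DB, the reranker is a simple term-overlap score, and names such as `Chunk`, `build_index`, and `cite` are hypothetical rather than part of open-xtract.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # hypothetical citation label, e.g. "a.pdf p.1"


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[Chunk]) -> list[tuple[Counter, Chunk]]:
    """In-memory list standing in for a vector DB."""
    return [(embed(c.text), c) for c in chunks]


def retrieve(query: str, index: list[tuple[Counter, Chunk]], k: int = 2) -> list[Chunk]:
    """First-stage retrieval: top-k chunks by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def rerank(query: str, chunks: list[Chunk]) -> list[Chunk]:
    """Toy reranker: order by the fraction of query terms found in each chunk."""
    terms = set(query.lower().split())

    def score(chunk: Chunk) -> float:
        if not terms:
            return 0.0
        return len(terms & set(chunk.text.lower().split())) / len(terms)

    return sorted(chunks, key=score, reverse=True)


def cite(chunks: list[Chunk]) -> str:
    """Join chunk texts, tagging each with a numbered inline citation."""
    return " ".join(f"{c.text} [{i}: {c.source}]" for i, c in enumerate(chunks, 1))
```

A production pipeline would replace `embed` with a real embedding model and pass the cited chunks to an LLM as context; the sketch only shows how the stages hand data to one another.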

MIT: OPEN SOURCE

PDF: FIRST FORMAT

FAST: EXTRACTION

OPEN: EXTENSIBLE
