Overview

LangExtract is an open-source Python library that uses large language models to turn unstructured text, such as clinical notes, literary works, or other free-form documents, into structured data that follows a schema you define. Every extraction is linked to its exact location in the source text, so results can be traced back and checked, and an interactive HTML viewer supports visual inspection. Long documents are handled through chunking and parallel processing. With built-in support for cloud models such as Google Gemini and for local LLMs via Ollama, you define an extraction task with just a few examples and no model fine-tuning. Whether you are building a medication database, structuring radiology reports, or analyzing literary characters, the same workflow applies across domains and stays simple and reproducible.
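To make the idea of source grounding concrete, here is a minimal standard-library sketch. It is not LangExtract's implementation or API: the names `GroundedSpan` and `ground` are hypothetical, and exact string matching is an assumed, simplified stand-in for whatever alignment the library actually performs. It only shows what "linking an extraction to its exact location" means in terms of character offsets.

```python
from dataclasses import dataclass


@dataclass
class GroundedSpan:
    """Hypothetical record tying an extracted string to its source location."""
    text: str
    start: int  # inclusive character offset in the source
    end: int    # exclusive character offset in the source


def ground(source: str, extracted: list[str]) -> list[GroundedSpan]:
    """Locate each extracted string in the source by exact match (an assumption)."""
    spans = []
    cursor = 0
    for item in extracted:
        # Prefer a match after the previous one, then fall back to the whole text.
        pos = source.find(item, cursor)
        if pos == -1:
            pos = source.find(item)
        if pos == -1:
            continue  # cannot be grounded: skip rather than guess a location
        spans.append(GroundedSpan(item, pos, pos + len(item)))
        cursor = pos + len(item)
    return spans


note = "Patient was given 250 mg amoxicillin, then 400 mg ibuprofen."
spans = ground(note, ["250 mg", "amoxicillin", "400 mg", "ibuprofen", "aspirin"])
for span in spans:
    # Each grounded span can be checked directly against the source text.
    assert note[span.start:span.end] == span.text
```

In this sketch, "aspirin" does not occur in the note, so it is dropped instead of being assigned a made-up position. That is the property grounding is meant to give you: every result that is kept can be verified against the original text.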
