LangExtract

Information Extraction NLP Library Python

Overall Score: 3.4

Community: 2.9

Tech: 3.8

Security: 3.4

Overview

LangExtract is an open-source Python library that uses large language models to turn unstructured text, such as clinical notes, literary works, or other free-form documents, into structured data that follows a user-defined schema. It grounds each extraction in the source by mapping it to its exact location in the text, supports visual inspection through an interactive HTML viewer, and handles long documents through chunking and parallel processing. It supports cloud models such as Google Gemini and local LLMs via Ollama, and extraction tasks are defined with a few examples rather than model fine-tuning. Typical uses include building a medication database, structuring radiology reports, and analyzing literary characters; the same few-example workflow carries across domains.
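To make the ideas of chunking and source grounding concrete, here is a minimal standard-library sketch. It is not LangExtract's implementation or API: the names `GroundedSpan`, `chunk_with_offsets`, and `ground` are hypothetical, the candidate extractions are hard-coded where a real pipeline would get them from a language model, and grounding is done by simple exact-substring matching.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class GroundedSpan:
    """An extraction tied to character offsets in the full document."""
    label: str
    text: str
    start: int  # inclusive offset into the original document
    end: int    # exclusive offset into the original document


def chunk_with_offsets(document: str, max_chars: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs, preferring to break after a space."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    pos = 0
    while pos < len(document):
        end = min(pos + max_chars, len(document))
        if end < len(document):
            # Back up to the last space inside the window, if there is one.
            split = document.rfind(" ", pos, end)
            if split > pos:
                end = split + 1
        yield pos, document[pos:end]
        pos = end


def ground(label: str, extraction_text: str, chunk: str,
           chunk_offset: int) -> Optional[GroundedSpan]:
    """Map an extracted string back to document offsets, or None if absent."""
    idx = chunk.find(extraction_text)
    if idx == -1:
        return None
    start = chunk_offset + idx
    return GroundedSpan(label, extraction_text, start, start + len(extraction_text))


# Hypothetical input: in practice these candidates would come from an LLM.
note = ("Patient was given 250 mg amoxicillin twice daily. "
        "Ibuprofen 400 mg as needed for pain.")
candidates = [("medication", "amoxicillin"), ("dosage", "250 mg"),
              ("medication", "Ibuprofen")]

chunks: List[Tuple[int, str]] = list(chunk_with_offsets(note, max_chars=50))
spans: List[GroundedSpan] = []
for offset, chunk in chunks:
    for label, text in candidates:
        span = ground(label, text, chunk, offset)
        if span is not None:
            spans.append(span)

for span in spans:
    print(span.label, repr(note[span.start:span.end]), span.start, span.end)
```

Because each span stores offsets into the original document, slicing the source with `start` and `end` recovers the extracted text exactly, which is what makes a highlight-style viewer possible. This sketch would miss an extraction that straddles a chunk boundary; handling that case (for example with overlapping chunks) is left out for brevity.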
