Skip to content

Extraction Pipeline

The LLM-powered entity and relation extraction system.

ExtractionPipeline

ExtractionPipeline

Extracts entities and relations from conversational messages using an LLM.

Usage

pipeline = ExtractionPipeline(llm_client) result = pipeline.extract("My wife's name is Lena")

extract async

extract(message: str) -> ExtractionResult

Extract entities and relations from a user message.

Parameters:

Name Type Description Default
message str

The user's conversational message.

required

Returns:

Type Description
ExtractionResult

ExtractionResult with entities, relations, and timing info.

ExtractionResult

On failure, returns an empty result (never raises).

Result Types

ExtractionResult

ExtractionResult dataclass

Complete result from processing a single message.

ExtractedEntity

ExtractedEntity dataclass

An entity extracted from a user message.

ExtractedRelation

ExtractedRelation dataclass

A relation extracted between two entities.

LLM Clients

LLMClient Protocol

LLMClient

Bases: Protocol

Protocol for LLM clients used by the extraction pipeline.

extract async

extract(system_prompt: str, user_message: str) -> str

Send extraction prompt to LLM and return raw text response.

Parameters:

Name Type Description Default
system_prompt str

System instructions for extraction.

required
user_message str

The user's conversational message to extract from.

required

Returns:

Type Description
str

Raw LLM response text (expected to be JSON).

Raises:

Type Description
LLMError

If the LLM call fails.

MockLLMClient

MockLLMClient

Mock LLM that returns predetermined extraction results.

Register responses with set_response(message_substring, json_response). Falls back to an empty extraction if no match is found.

call_count property

call_count: int

last_system_prompt property

last_system_prompt: str

last_user_message property

last_user_message: str

set_response

set_response(message_contains: str, response: dict[str, Any]) -> None

Register a canned response for messages containing the given substring.

extract async

extract(system_prompt: str, user_message: str) -> str

AnthropicLLMClient

AnthropicLLMClient

LLM client using the Anthropic async API (Claude Haiku for extraction).

extract async

extract(system_prompt: str, user_message: str) -> str

Utilities

repair_llm_json

repair_llm_json

repair_llm_json(raw_output: str) -> dict[str, Any] | list[Any] | None

Attempt to parse and repair common LLM JSON output issues.

Handles
  • Markdown fences / prose around fenced blocks
  • Preamble / trailing extra text (extracts first complete JSON object/array)
  • Trailing commas before closing braces/brackets
  • Unclosed top-level brackets/braces (best-effort)

Returns:

Type Description
dict[str, Any] | list[Any] | None

Parsed dict/list or None if repair fails.