`rath.llm`#

Provider options, request/response types, OpenAI and Anthropic clients, streaming deltas, embedding/VLM clients, retry, budget accounting, and response normalization.

Source#

Module	Source
`rath.llm.provider`	`src/rath/llm/provider.py`
`rath.llm.base`	`src/rath/llm/base.py`
`rath.llm.registry`	`src/rath/llm/registry.py`
`rath.llm.embedding`	`src/rath/llm/embedding.py`
`rath.llm.vlm`	`src/rath/llm/vlm.py`
`rath.llm.openai.client`	`src/rath/llm/openai/client.py`
`rath.llm.anthropic.client`	`src/rath/llm/anthropic/client.py`
`rath.llm.chat_request`	`src/rath/llm/chat_request.py`
`rath.llm.chat_response`	`src/rath/llm/chat_response.py`
`rath.llm.openai.create_kwargs`	`src/rath/llm/openai/create_kwargs.py`
`rath.llm.openai.normalize`	`src/rath/llm/openai/normalize.py`
`rath.llm.anthropic.create_kwargs`	`src/rath/llm/anthropic/create_kwargs.py`
`rath.llm.anthropic.normalize`	`src/rath/llm/anthropic/normalize.py`

Public contract#

`Provider`#

Provider stores OpenAI-compatible client identity plus model, sampling, tool, and provider-specific parameters required by the loop. It does not contain messages or tools; the session loop constructs those.

Field category	Fields
client identity	`api_key`, `base_url`, `provider_kind`
model	`model`
sampling	`temperature`, `top_p`, `max_completion_tokens`, `max_tokens`, `stop`, `n`, `seed`
penalties	`frequency_penalty`, `presence_penalty`, `logit_bias`
tools/output	`tool_choice`, `parallel_tool_calls`, `response_format`
OpenAI options	`reasoning_effort`, `verbosity`, `metadata`, `user`, `store`, `service_tier`, `extra_create_args`
retry/budget	`retry_max_attempts`, `retry_base_seconds`, `budget_total_tokens`, `on_budget_exceeded`

Provider.from_config(name=None, **overrides) builds a provider from ~/.openrath/config.json; explicit overrides win over the file.

Client#

from rath.llm import Provider, RathOpenAIChatClient, chat_client_for

provider = Provider(api_key="sk-...", base_url=None, model="gpt-5.5")
client = RathOpenAIChatClient(provider)
response = client.complete(request)

anthropic = Provider(provider_kind="anthropic", model="claude-sonnet-4-5")
client = chat_client_for(anthropic)

chat_client_for(provider) dispatches through the registry. Built-in kinds are OpenAI-compatible (None or "openai") and Anthropic ("anthropic"). Third-party adapters can call register_chat_client(kind, factory).

Provider dispatch registry — `Provider.provider_kind` selects a registered chat-client factory; new provider kinds integrate at the registry boundary instead of changing the session loop.#

Request and response DTOs#

Type	Description
`RathLLMMessage`	Chat `messages[]` element.
`RathLLMFunctionTool`	Function-style tool schema.
`RathLLMChatRequest`	OpenAI-compatible request kwargs.
`RathLLMChatResponse`	Normalized completion response.
`RathLLMStreamDelta`	Normalized streaming delta.
`RathLLMChatChoice`	Single choice.
`RathLLMAssistantMessage`	Assistant message, including tool calls.
`RathLLMToolCallPart` / `RathLLMToolCallFunction`	Tool call structure.
`RathLLMTokenUsage`	Usage statistics.

Embeddings and VLM#

v1.2 adds first-class provider wrappers for non-chat model calls. They use the same config style as Provider, but keep their public surface narrow so memory backends and visual tools do not depend on chat-completion internals.

API	Config key	Default behavior
`EmbeddingProvider.from_config(name=None, **overrides)`	`llm.embedding_provider`	Falls back through the configured default chat provider credentials and uses `text-embedding-3-small` when no embedding model is set.
`RathOpenAIEmbeddingClient(provider)`	OpenAI-compatible embedding endpoint	Returns embedding vectors for text input.
`VLMProvider.from_config(name=None, **overrides)`	`llm.vlm_provider`	Requires an explicit VLM provider entry or overrides.
`RathOpenAIVLMClient(provider)`	OpenAI-compatible vision/chat endpoint	Sends text plus image inputs through a VLM-compatible model.

Create arguments#

to_create_kwargs(req, default_model=...) converts the internal request to non-streaming OpenAI SDK kwargs. RathOpenAIChatClient.complete_stream(...) uses the streaming sibling and yields RathLLMStreamDelta chunks.

Streaming loop deltas — Streaming forwards deltas to `on_event` while the session loop still appends one durable assistant chunk per completed model round.#

Behavior	Description
model selection	Uses `req.model`; otherwise uses `default_model`. Raises `ValueError` if both are empty.
tool schema	Converts `RathLLMFunctionTool` to `{"type": "function", "function": ...}`.
stream	Non-streaming kwargs force `stream=False`; streaming kwargs force `stream=True`.
extra args	Merges `req.extra_create_args` last.

Environment and config fallback#

Client	Resolution order
OpenAI API key	`Provider.api_key` → Azure-aware env vars → matching config provider.
OpenAI base URL	`Provider.base_url` → `OPENAI_BASE_URL` → `AZURE_OPENAI_ENDPOINT` → config.
OpenAI model	`Provider.model` → `OPENAI_DEFAULT_MODEL` → config default provider model.
Anthropic API key	`Provider.api_key` → `ANTHROPIC_API_KEY` → matching config provider.
Anthropic base URL	`Provider.base_url` → `ANTHROPIC_BASE_URL` → config.
Anthropic model	`Provider.model` → `ANTHROPIC_DEFAULT_MODEL` → config provider model.

Legacy Azure endpoints are routed through openai.AzureOpenAI; /openai/v1 endpoints use the standard OpenAI client.

LLM retry, usage, and budget guard flow — Retries, usage aggregation, and budget checks sit around provider calls without changing the public `Session` and `Provider` API shape.#

Autodoc#

LLM routing for run_session_loop (no messages / tools).

base_url, api_key, and model configure the HTTP client built from provider_kind (OpenAI-compatible or Anthropic). Other fields mirror RathLLMChatRequest (excluding what the loop fills in).

api_key may be omitted when callers supply a custom executor that never instantiates a default RathOpenAIChatClient or RathAnthropicChatClient.

classmethod from_config(name: str | None = None, *, store: ConfigStore | None = None, **overrides: Any) → Provider[source]#

Build a Provider from ~/.openrath/config.json.

Looks up name (or llm.default_provider when name=None) under llm.providers, then constructs a Provider whose fields come from the entry. Any explicit overrides win — pass e.g. Provider.from_config("openai-main", api_key="ad-hoc") to rotate one field without touching the on-disk file.

Lazy-imports rath.config so that import rath.llm never touches the filesystem.

Raises KeyError when the named provider is missing; the message lists what is available.

class rath.llm.RathOpenAIChatClient(provider: Provider)[source]#

Thin client around openai.OpenAI chat completions (sync + streaming).

Empty Provider.api_key / Provider.base_url fall back to environment variables (set them in the shell or via rath.config):

base_url: OPENAI_BASE_URL then AZURE_OPENAI_ENDPOINT.
api_key: OPENAI_API_KEY for OpenAI-compatible endpoints; for *.azure.com endpoints the order becomes AZURE_OPENAI_API_KEY → AZURE_API_KEY → OPENAI_API_KEY.

Azure endpoints exposing the new /openai/v1 surface speak plain OpenAI Chat Completions, so the vanilla SDK is used. Legacy Azure endpoints (/openai without /v1) are routed through openai.AzureOpenAI with api_version taken from OPENAI_API_VERSION (default 2024-10-21).

complete(req: RathLLMChatRequest) → RathLLMChatResponse[source]#

Run chat.completions.create and normalize the response.

Transient errors (rate limit, connection, timeout, server 5xx) are retried with exponential backoff per Provider.retry_max_attempts and Provider.retry_base_seconds.

complete_stream(req: RathLLMChatRequest) → Iterator[RathLLMStreamDelta][source]#

Yield RathLLMStreamDelta for each chunk of a streaming completion.

Transient errors during the initial create call are retried; once the iterator starts producing chunks, retries are no longer possible (the stream is committed).

class rath.llm.RathAnthropicChatClient(provider: Provider)[source]#

Thin client around anthropic.Anthropic messages API (sync + streaming).

complete(req: RathLLMChatRequest) → RathLLMChatResponse[source]#

Run messages.create and normalize the response.

Transient errors are retried per Provider.retry_max_attempts / Provider.retry_base_seconds. The retryable set is the Anthropic-flavored quadruple (RateLimitError, APIConnectionError, APITimeoutError, InternalServerError).

complete_stream(req: RathLLMChatRequest) → Iterator[RathLLMStreamDelta][source]#

Yield RathLLMStreamDelta for each event from messages.stream.

Transient errors during the initial stream open are retried; once the iterator starts producing events, retries are no longer possible.

Routing + credentials for an OpenAI-compatible embeddings endpoint.

The chat Provider (in rath.llm.provider) is intentionally not reused: embedding endpoints frequently live under a different base_url / model namespace even when the api_key is shared.

dimensions: int | None#: When set, request a truncated/projected embedding vector. The OpenAI SDK passes this as dimensions=. None means use the model’s native dimension.

retry_max_attempts: int | None#: Same retry knobs as Provider; None uses built-in defaults.

classmethod from_config(name: str | None = None, *, store: ConfigStore | None = None, **overrides: Any) → EmbeddingProvider[source]#

Build an EmbeddingProvider from ~/.openrath/config.json.

Lookup order:

name if given.
llm.embedding_provider if set.
llm.default_provider (chat fallback) — uses its credentials but replaces model with DEFAULT_EMBEDDING_MODEL since the chat model is unsuitable for embeddings.

Raises KeyError only when name is given explicitly and the entry is missing.

class rath.llm.RathOpenAIEmbeddingClient(provider: EmbeddingProvider)[source]#

Thin wrapper around openai.OpenAI().embeddings.create.

Construct once per EmbeddingProvider; the underlying SDK client is created up-front and reused across calls.

embed(texts: Sequence[str]) → tuple[tuple[float, ...], ...][source]#

Embed an arbitrary number of texts; returns one vector per input.

An empty texts short-circuits to () without an API call.

embed_one(text: str) → tuple[float, ...][source]#: Convenience for the single-text case.

Routing + credentials for an OpenAI-compatible vision endpoint.

classmethod from_config(name: str | None = None, *, store: ConfigStore | None = None, **overrides: Any) → VLMProvider[source]#

Build a VLMProvider from ~/.openrath/config.json.

Lookup order:

name if given.
llm.vlm_provider if set.

Unlike EmbeddingProvider, there is no fallback to llm.default_provider: a chat model is rarely a vision model, and silently falling back would produce confusing 400 errors at first use. Raises KeyError instead.

class rath.llm.RathOpenAIVLMClient(provider: VLMProvider)[source]#

Thin wrapper turning (image, prompt) -> caption into a chat call.

describe(image_bytes: bytes, *, prompt: str, mime: str = 'image/png') → str[source]#: Send a single image + text prompt; return the model’s reply text.

describe_path(path: Path, *, prompt: str) → str[source]#: Load an image from disk and call describe().

class rath.llm.ChatClient(*args, **kwargs)[source]#

Minimal synchronous chat-completion contract.

Implementations must keep complete blocking and side-effect-free beyond the network call itself; retries / token accounting / budget handling are layered above in the session loop.

class rath.llm.StreamingChatClient(*args, **kwargs)[source]#

A ChatClient that also supports streaming completions.

run_session_loop() accepts any object satisfying this Protocol when on_event is provided. Both OpenAI and Anthropic adapters implement it.

rath.llm.chat_client_for(provider: Provider) → ChatClient[source]#

Return the ChatClient for provider.provider_kind.

provider.provider_kind=None defaults to "openai". Unknown kinds raise ValueError listing what is currently registered.

rath.llm.register_chat_client(kind: str, factory: Callable[[Provider], ChatClient]) → None[source]#

Overwrites any previous registration silently — late imports therefore win. Built-in kinds ("openai", "anthropic") are registered when their subpackages are imported by rath.llm.

rath.llm.registered_kinds() → tuple[str, ...][source]#: Snapshot of currently registered kinds (useful for diagnostics / tests).

rath.llm.to_create_kwargs(req: RathLLMChatRequest, *, default_model: str | None) → dict[str, Any][source]#

Map RathLLMChatRequest to OpenAI.chat.completions.create kwargs.

Non-streaming only: stream is forced to False after extra_create_args are merged. stream=True in extras raises ValueError.

rath.llm.normalize_chat_completion(completion: ChatCompletion) → RathLLMChatResponse[source]#: Convert an SDK ChatCompletion into RathLLMChatResponse.

rath.llm.build_anthropic_kwargs(req: RathLLMChatRequest, *, default_model: str | None) → dict[str, Any][source]#

Translate RathLLMChatRequest into messages.create kwargs.

default_model mirrors to_create_kwargs(): it’s used when neither the request nor the provider supplies a model name.

rath.llm.build_anthropic_stream_kwargs(req: RathLLMChatRequest, *, default_model: str | None) → dict[str, Any][source]#

Same kwargs as build_anthropic_kwargs() for messages.stream.

Anthropic’s messages.stream(**kwargs) uses the same shape as messages.create; there is no stream=True flag. Named entrypoint parallel to rath.llm.openai.create_kwargs.to_create_kwargs_stream().

rath.llm.normalize_anthropic_response(payload: Mapping[str, Any]) → RathLLMChatResponse[source]#

Map an Anthropic Message-shaped dict to RathLLMChatResponse.

payload is expected to be the result of message.model_dump(mode='json') on the SDK return value (or an equivalent fixture dict). Defending via dict lookups keeps the adapter compatible across minor SDK upgrades.

Maps to keyword arguments passed to the vendor chat API.

model=None falls back to model on the Provider held by the chat client.

class rath.llm.RathLLMMessage(role: str, content: str | None = None, name: str | None = None, tool_call_id: str | None = None, tool_calls: tuple[Mapping[str, Any], ...] | None = None)[source]#

One messages[] element for chat completions.create.

tool_calls is set only for assistant turns in tool-using conversations.

class rath.llm.RathLLMFunctionTool(name: str, parameters: dict[str, Any], description: str | None = None, strict: bool | None = None)[source]#: A function-style tool definition (type: function).

class rath.llm.RathLLMChatResponse(id: str, choices: tuple[RathLLMChatChoice, ...], created: int, model: str, object_type: Literal['chat.completion'] = 'chat.completion', service_tier: str | None = None, system_fingerprint: str | None = None, usage: RathLLMTokenUsage | None = None, raw: Mapping[str, Any] | None = None)[source]#

Normalized non-streaming ChatCompletion.

property primary_choice: RathLLMChatChoice#: The first choice (typical when n is 1).

class rath.llm.RathLLMStreamDelta(content_delta: str | None = None, tool_call_index: int | None = None, tool_call_id: str | None = None, tool_call_name_delta: str | None = None, tool_call_args_delta: str | None = None, finish_reason: Literal['stop', 'length', 'tool_calls', 'content_filter', 'function_call'] | None = None, usage: RathLLMTokenUsage | None = None)[source]#

One chunk emitted by a streaming completion.

Fields are independent and any subset may be populated:

content_delta carries an assistant text fragment.

tool_call_index / tool_call_id / tool_call_name_delta / tool_call_args_delta extend an in-progress assistant tool_call. Multiple tool calls in one stream are distinguished by tool_call_index.

finish_reason is set on the terminal chunk for a choice.

usage is populated only on the final stream event (and only when the underlying API agreed to report it, e.g. OpenAI’s stream_options={"include_usage": True}).

class rath.llm.RathLLMChatChoice(index: int, finish_reason: Literal['stop', 'length', 'tool_calls', 'content_filter', 'function_call'], message: RathLLMAssistantMessage, logprobs: Mapping[str, Any] | None = None)[source]#: One element of choices.

class rath.llm.RathLLMAssistantMessage(role: Literal['assistant'] = 'assistant', content: str | None = None, refusal: str | None = None, reasoning_content: str | None = None, tool_calls: tuple[RathLLMToolCallPart, ...] | None = None, function_call: Mapping[str, Any] | None = None, annotations: tuple[Mapping[str, Any], ...] | None = None)[source]#: Assistant message on a choice (content, optional tool calls, provider extras).

class rath.llm.RathLLMToolCallPart(id: str, type: str, function: RathLLMToolCallFunction)[source]#: One entry from message.tool_calls.

class rath.llm.RathLLMToolCallFunction(name: str, arguments: str, arguments_parsed: dict[str, Any] | None, arguments_parse_error: bool)[source]#: function payload inside a tool call (name + arguments string).

class rath.llm.RathLLMTokenUsage(prompt_tokens: int, completion_tokens: int, total_tokens: int, completion_tokens_details: Mapping[str, Any] | None = None, prompt_tokens_details: Mapping[str, Any] | None = None)[source]#: Token counts from usage; optional detail dicts stay JSON-shaped.

rath.llm.add_usage(a: RathLLMTokenUsage | None, b: RathLLMTokenUsage | None) → RathLLMTokenUsage | None[source]#

Sum two token usages.

Returns None only when both inputs are None (so callers can detect that no provider in the chain reported usage). Detail dicts are not merged - they are dropped on the accumulated total because per-call breakdowns don’t sum cleanly.

exception rath.llm.BudgetExceededError[source]#

Raised by user code from Provider.on_budget_exceeded to abort a loop.

The session loop itself does not raise this automatically when budget_total_tokens is exceeded — it only invokes the callback (or logs a warning if no callback is set). Raising this from the callback is the documented way to stop the loop on overrun.

← API Reference

rath.llm#