LLM#

OpenRath uses registry-dispatched chat clients. The default kind is OpenAI-compatible; provider_kind="anthropic" selects the Anthropic adapter. v1.2 also adds embedding and VLM provider wrappers for memory and visual-model use cases. Advanced integrations can register a new chat client kind or replace SessionLoopExecutor to take over model calls, response parsing, and tool dispatch.

This page explains how OpenRath builds provider requests, normalizes responses, streams deltas, accounts for token usage, and replaces the client or executor.

The diagram below places Provider in the request path. Provider options, session messages, and FlowToolCall schemas become one OpenAI-compatible chat request; the normalized response is written back into the session loop.

Overview#

The LLM layer is deliberately narrow. It does not own workflow state and it does not execute tools. Its job is to carry provider options, build a request, call an OpenAI-compatible client, normalize the response, and hand the result back to the session loop.

Source map#

File	Responsibility
`src/rath/llm/provider.py`	`Provider` request options.
`src/rath/llm/base.py`	`ChatClient` and `StreamingChatClient` protocols.
`src/rath/llm/registry.py`	`chat_client_for(...)` and adapter registration.
`src/rath/llm/openai/client.py`	`RathOpenAIChatClient`, including streaming and Azure fallback.
`src/rath/llm/anthropic/client.py`	`RathAnthropicChatClient`, including streaming.
`src/rath/llm/embedding.py`	`EmbeddingProvider` and OpenAI-compatible embedding client.
`src/rath/llm/vlm.py`	`VLMProvider` and OpenAI-compatible VLM client.
`src/rath/llm/chat_request.py`	Request dataclasses.
`src/rath/llm/chat_response.py`	Normalized response and stream-delta dataclasses.
`src/rath/llm/openai/create_kwargs.py`	Conversion from internal request to OpenAI SDK kwargs.
`src/rath/llm/openai/normalize.py`	Conversion from OpenAI completion to internal response.
`src/rath/llm/anthropic/create_kwargs.py`	Conversion from internal request to Anthropic kwargs.
`src/rath/llm/anthropic/normalize.py`	Conversion from Anthropic response to internal response.
`src/rath/session/provider_builtin.py`	Default `SessionLoopExecutor`.

Provider Parameters#

Provider is the request options object. It stores the API key, optional base URL, model name, sampling parameters, tool choice, response format, and passthrough arguments.

import os

from rath.llm import Provider

provider = Provider(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL") or None,
    model=os.environ.get("OPENAI_DEFAULT_MODEL") or "gpt-5.5",
    temperature=0.2,
    parallel_tool_calls=False,
)

provider_into_chat_request(...) merges Provider into RathLLMChatRequest. The session loop builds messages and tools. Provider.from_config(...) can also load named providers from ~/.openrath/config.json; explicit kwargs override the config entry.

Default Client#

chat_client_for(provider) chooses a built-in adapter or a registered third-party adapter.

`Provider.provider_kind`	Client
`None` or `"openai"`	`RathOpenAIChatClient`
`"anthropic"`	`RathAnthropicChatClient`

RathOpenAIChatClient wraps openai.OpenAI(...).chat.completions.create(...); legacy Azure endpoints use openai.AzureOpenAI. RathAnthropicChatClient wraps anthropic.Anthropic(...).messages.create(...).

Environment variable	Purpose
`OPENAI_API_KEY`	API key for OpenAI or a compatible gateway.
`OPENAI_BASE_URL`	OpenAI-compatible endpoint.
`OPENAI_DEFAULT_MODEL`	OpenAI-compatible fallback model.
`AZURE_OPENAI_ENDPOINT`	Azure endpoint fallback.
`AZURE_OPENAI_API_KEY` / `AZURE_API_KEY`	Azure key fallbacks.
`ANTHROPIC_API_KEY`	Anthropic API key.
`ANTHROPIC_BASE_URL`	Anthropic endpoint override.
`ANTHROPIC_DEFAULT_MODEL`	Anthropic fallback model.

Config fallback comes after explicit Provider fields and environment variables. The OpenAI client looks for provider_kind="openai" config entries; the Anthropic client looks for provider_kind="anthropic".

Streaming#

OpenAI-compatible and Anthropic streaming are exposed through run_session_loop(on_event=...). The loop builds a StreamingExecutor around a StreamingChatClient, forwards every RathLLMStreamDelta to the callback, and appends one accumulated assistant chunk per model round.

Third-party chat clients that implement only complete(req) still work for non-streaming calls. Passing on_event with a non-streaming client raises TypeError before the loop registers sessions.

Embedding And VLM Providers#

Embedding and VLM wrappers deliberately sit beside chat providers instead of overloading Provider.

Provider	Main use	Config key
`EmbeddingProvider`	Vectorizing text for memory backends and retrieval.	`llm.embedding_provider`
`RathOpenAIEmbeddingClient`	OpenAI-compatible embeddings endpoint.	Uses the embedding provider fields.
`VLMProvider`	Vision-language requests.	`llm.vlm_provider`
`RathOpenAIVLMClient`	OpenAI-compatible image/text model endpoint.	Uses the VLM provider fields.

EmbeddingProvider.from_config(...) can reuse credentials from llm.default_provider and defaults to text-embedding-3-small when the model is omitted. VLMProvider.from_config(...) is stricter and expects an explicit VLM provider or overrides, because a chat default is not necessarily image-capable.

SessionLoopExecutor#

SessionLoopExecutor is the replacement point for the loop.

class SessionLoopExecutor(Protocol):
    def complete(self, req: RathLLMChatRequest) -> RathLLMChatResponse:
        ...

    def dispatch_tool(self, session, tool, arguments):
        ...

    def tool_schemas(self):
        ...

Method	Purpose
`complete(req)`	Runs one chat completion.
`dispatch_tool(session, tool, arguments)`	Executes a `FlowToolCall`.
`tool_schemas()`	Returns tool schemas; when it returns an empty tuple, the loop builds schemas from the local tool table.

DefaultSessionLoopExecutor uses chat_client_for(agent_provider) for model requests and directly calls tool(session, arguments) for tool execution.

Requests And Responses#

OpenRath uses normalized dataclasses internally:

Type	Purpose
`RathLLMChatRequest`	messages, tools, model, sampling parameters, and extra args.
`RathLLMChatResponse`	Normalized completion.
`RathLLMStreamDelta`	Normalized streaming content/tool-call/usage delta.
`RathLLMMessage`	system/user/assistant/tool message.
`RathLLMFunctionTool`	OpenAI-style function tool schema.
`RathLLMTokenUsage`	Prompt/completion/total token usage.

Integration Points#

Need	Extension point
Change OpenAI-compatible gateway	Set `Provider.base_url` (often from `OPENAI_BASE_URL`).
Change model and sampling parameters	Set `Provider(...)` before passing it to the loop or client.
Use a local model service	Implement `SessionLoopExecutor.complete(...)`.
Customize tool dispatch policy	Implement `SessionLoopExecutor.dispatch_tool(...)`.
Test fixed model responses	Use a scripted executor.
Call Anthropic (`claude-*`)	`Provider(provider_kind="anthropic", model="...")`.
Stream assistant deltas	Pass `on_event=` to `run_session_loop(...)`; the resolved client must satisfy `StreamingChatClient`.
Configure embeddings	Set `llm.embedding_provider` or pass `EmbeddingProvider(...)` directly.
Configure VLM calls	Set `llm.vlm_provider` or pass `VLMProvider(...)` directly.
Wire an MCP server’s tools	`from rath.flow.tool.mcp_adapter import mcp_tools_from_server` — `mcp` ships as a core dependency.
Per-session token accounting	`Session.cumulative_usage` (the loop / compress accumulate automatically).
Token budget guardrail	`Provider(budget_total_tokens=..., on_budget_exceeded=callback)`; the callback can `raise BudgetExceededError` to abort the loop.
Register a new provider	`register_chat_client("kind", factory)` and set `Provider(provider_kind="kind")`.

Call Path#

Default session loop LLM call path:

run_session_loop
  -> provider_into_chat_request(messages, tools, Provider, default_tool_choice="auto")
  -> DefaultSessionLoopExecutor.complete(req)
  -> chat_client_for(provider).complete(req)
  -> provider-specific create kwargs
  -> provider SDK call
  -> provider-specific response normalization

The compress path uses the same client and request/response DTOs, but passes tools=None and default_tool_choice="none".

Edge Cases#

Behavior	Current implementation
missing API key	Built-in clients raise `ValueError` after Provider/env/config fallback is exhausted.
missing model	Provider-specific create-kwargs builders raise `ValueError` when both `req.model` and the default model are empty.
streaming	Built-in OpenAI-compatible and Anthropic clients support `complete_stream`; third-party clients without it fail the protocol check when `on_event` is requested.
tool argument parsing	`normalize_chat_completion(...)` attempts to parse arguments as JSON and records a parse error flag.
empty choices	`RathLLMChatResponse.primary_choice` raises `IndexError`.
token budget	The guard fires only on the first completion that crosses `budget_total_tokens` for the output session.

Test Coverage#

Behavior	Tests
request/response wire shape	`tests/session/test_llm_message_wire.py`
live OpenAI-compatible client	`tests/llm/test_openai_chat_real.py`
OpenAI streaming chunks	`tests/llm/test_openai_stream_chunks.py`
Anthropic adapter and streaming	`tests/llm/test_anthropic_client.py`, `tests/llm/test_anthropic_normalize.py`, `tests/llm/test_anthropic_stream_deltas.py`
embedding/VLM wrappers	`tests/llm/test_embedding_client.py`, `tests/llm/test_vlm_client.py`
provider registry	`tests/llm/test_registry.py`
scripted loop executor	`tests/session/scripted_loop_executor.py`
integration loop/compress	`tests/integration/test_session_loop_real.py`, `tests/integration/test_session_compress_real.py`