(pkg-llm)=
# `rath.llm`

Provider options, request/response types, OpenAI and Anthropic clients, streaming deltas, embedding/VLM clients, retry, budget accounting, and response normalization.

## Source
| Module | Source |
| --- | --- |
| `rath.llm.provider` | `src/rath/llm/provider.py` |
| `rath.llm.base` | `src/rath/llm/base.py` |
| `rath.llm.registry` | `src/rath/llm/registry.py` |
| `rath.llm.embedding` | `src/rath/llm/embedding.py` |
| `rath.llm.vlm` | `src/rath/llm/vlm.py` |
| `rath.llm.openai.client` | `src/rath/llm/openai/client.py` |
| `rath.llm.anthropic.client` | `src/rath/llm/anthropic/client.py` |
| `rath.llm.chat_request` | `src/rath/llm/chat_request.py` |
| `rath.llm.chat_response` | `src/rath/llm/chat_response.py` |
| `rath.llm.openai.create_kwargs` | `src/rath/llm/openai/create_kwargs.py` |
| `rath.llm.openai.normalize` | `src/rath/llm/openai/normalize.py` |
| `rath.llm.anthropic.create_kwargs` | `src/rath/llm/anthropic/create_kwargs.py` |
| `rath.llm.anthropic.normalize` | `src/rath/llm/anthropic/normalize.py` |

## Public contract
### `Provider`

`Provider` stores OpenAI-compatible client identity plus model, sampling, tool, and provider-specific parameters required by the loop. It does not contain messages or tools; the session loop constructs those.

| Field category | Fields |
| --- | --- |
| client identity | `api_key`, `base_url`, `provider_kind` |
| model | `model` |
| sampling | `temperature`, `top_p`, `max_completion_tokens`, `max_tokens`, `stop`, `n`, `seed` |
| penalties | `frequency_penalty`, `presence_penalty`, `logit_bias` |
| tools/output | `tool_choice`, `parallel_tool_calls`, `response_format` |
| OpenAI options | `reasoning_effort`, `verbosity`, `metadata`, `user`, `store`, `service_tier`, `extra_create_args` |
| retry/budget | `retry_max_attempts`, `retry_base_seconds`, `budget_total_tokens`, `on_budget_exceeded` |

`Provider.from_config(name=None, **overrides)` builds a provider from `~/.openrath/config.json`; explicit overrides win over the file.

### Client
```python
from rath.llm import Provider, RathOpenAIChatClient, chat_client_for

provider = Provider(api_key="sk-...", base_url=None, model="gpt-5.5")
client = RathOpenAIChatClient(provider)
response = client.complete(request)

anthropic = Provider(provider_kind="anthropic", model="claude-sonnet-4-5")
client = chat_client_for(anthropic)
```

`chat_client_for(provider)` dispatches through the registry. Built-in kinds are OpenAI-compatible (`None` or `"openai"`) and Anthropic (`"anthropic"`). Third-party adapters can call `register_chat_client(kind, factory)`.

```{figure} ../_static/provider-dispatch-registry.png
:alt: Provider dispatch registry

`Provider.provider_kind` selects a registered chat-client factory; new provider
kinds integrate at the registry boundary instead of changing the session loop.
```

### Request and response DTOs
| Type | Description |
| --- | --- |
| `RathLLMMessage` | Chat `messages[]` element. |
| `RathLLMFunctionTool` | Function-style tool schema. |
| `RathLLMChatRequest` | OpenAI-compatible request kwargs. |
| `RathLLMChatResponse` | Normalized completion response. |
| `RathLLMStreamDelta` | Normalized streaming delta. |
| `RathLLMChatChoice` | Single choice. |
| `RathLLMAssistantMessage` | Assistant message, including tool calls. |
| `RathLLMToolCallPart` / `RathLLMToolCallFunction` | Tool call structure. |
| `RathLLMTokenUsage` | Usage statistics. |

### Embeddings and VLM

v1.2 adds first-class provider wrappers for non-chat model calls. They use the same config style as `Provider`, but keep their public surface narrow so memory backends and visual tools do not depend on chat-completion internals.

| API | Config key | Default behavior |
| --- | --- | --- |
| `EmbeddingProvider.from_config(name=None, **overrides)` | `llm.embedding_provider` | Falls back through the configured default chat provider credentials and uses `text-embedding-3-small` when no embedding model is set. |
| `RathOpenAIEmbeddingClient(provider)` | OpenAI-compatible embedding endpoint | Returns embedding vectors for text input. |
| `VLMProvider.from_config(name=None, **overrides)` | `llm.vlm_provider` | Requires an explicit VLM provider entry or overrides. |
| `RathOpenAIVLMClient(provider)` | OpenAI-compatible vision/chat endpoint | Sends text plus image inputs through a VLM-compatible model. |

### Create arguments
`to_create_kwargs(req, default_model=...)` converts the internal request to non-streaming OpenAI SDK kwargs. `RathOpenAIChatClient.complete_stream(...)` uses the streaming sibling and yields `RathLLMStreamDelta` chunks.

```{figure} ../_static/streaming-loop-deltas.png
:alt: Streaming loop deltas

Streaming forwards deltas to `on_event` while the session loop still appends one
durable assistant chunk per completed model round.
```

| Behavior | Description |
| --- | --- |
| model selection | Uses `req.model`; otherwise uses `default_model`. Raises `ValueError` if both are empty. |
| tool schema | Converts `RathLLMFunctionTool` to `{"type": "function", "function": ...}`. |
| stream | Non-streaming kwargs force `stream=False`; streaming kwargs force `stream=True`. |
| extra args | Merges `req.extra_create_args` last. |

### Environment and config fallback
| Client | Resolution order |
| --- | --- |
| OpenAI API key | `Provider.api_key` → Azure-aware env vars → matching config provider. |
| OpenAI base URL | `Provider.base_url` → `OPENAI_BASE_URL` → `AZURE_OPENAI_ENDPOINT` → config. |
| OpenAI model | `Provider.model` → `OPENAI_DEFAULT_MODEL` → config default provider model. |
| Anthropic API key | `Provider.api_key` → `ANTHROPIC_API_KEY` → matching config provider. |
| Anthropic base URL | `Provider.base_url` → `ANTHROPIC_BASE_URL` → config. |
| Anthropic model | `Provider.model` → `ANTHROPIC_DEFAULT_MODEL` → config provider model. |

Legacy Azure endpoints are routed through `openai.AzureOpenAI`; `/openai/v1` endpoints use the standard OpenAI client.

```{figure} ../_static/llm-resilience-budget.png
:alt: LLM retry, usage, and budget guard flow

Retries, usage aggregation, and budget checks sit around provider calls without
changing the public `Session` and `Provider` API shape.
```

## Autodoc
```{eval-rst}
.. autoclass:: rath.llm.Provider
   :members:

.. autoclass:: rath.llm.RathOpenAIChatClient
   :members:

.. autoclass:: rath.llm.RathAnthropicChatClient
   :members:

.. autoclass:: rath.llm.EmbeddingProvider
   :members:

.. autoclass:: rath.llm.RathOpenAIEmbeddingClient
   :members:

.. autoclass:: rath.llm.VLMProvider
   :members:

.. autoclass:: rath.llm.RathOpenAIVLMClient
   :members:

.. autoclass:: rath.llm.ChatClient
   :members:

.. autoclass:: rath.llm.StreamingChatClient
   :members:

.. autofunction:: rath.llm.chat_client_for

.. autofunction:: rath.llm.register_chat_client

.. autofunction:: rath.llm.registered_kinds

.. autofunction:: rath.llm.to_create_kwargs

.. autofunction:: rath.llm.normalize_chat_completion

.. autofunction:: rath.llm.build_anthropic_kwargs

.. autofunction:: rath.llm.build_anthropic_stream_kwargs

.. autofunction:: rath.llm.normalize_anthropic_response

.. autoclass:: rath.llm.RathLLMChatRequest
   :members:

.. autoclass:: rath.llm.RathLLMMessage
   :members:

.. autoclass:: rath.llm.RathLLMFunctionTool
   :members:

.. autoclass:: rath.llm.RathLLMChatResponse
   :members:

.. autoclass:: rath.llm.RathLLMStreamDelta
   :members:

.. autoclass:: rath.llm.RathLLMChatChoice
   :members:

.. autoclass:: rath.llm.RathLLMAssistantMessage
   :members:

.. autoclass:: rath.llm.RathLLMToolCallPart
   :members:

.. autoclass:: rath.llm.RathLLMToolCallFunction
   :members:

.. autoclass:: rath.llm.RathLLMTokenUsage
   :members:

.. autofunction:: rath.llm.add_usage

.. autoexception:: rath.llm.BudgetExceededError
```

[← API Reference](index.md)