
Models

List LLM Models
client.models.list(query?: ModelListParams { provider_category, provider_name, provider_type }, options?: RequestOptions): ModelListResponse { context_window, max_context_window, model, 21 more }
GET /v1/models/
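
For example, models can be listed and filtered with the optional query parameters above. This is a minimal sketch: the package name, client constructor, and environment variable are assumptions (check your SDK's install instructions), and the response is assumed to iterate as an array of Model objects.

```typescript
// Minimal sketch — import path and constructor options are assumptions; see your SDK's README.
import { LettaClient } from '@letta-ai/letta-client';

const client = new LettaClient({ token: process.env.LETTA_API_KEY });

async function main() {
  // Optional filters mirror ModelListParams above.
  const models = await client.models.list({
    provider_category: 'base',
    provider_type: 'openai',
  });

  for (const m of models) {
    console.log(`${m.handle ?? m.name}: ${m.max_context_window} token context window`);
  }
}

main();
```
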
Models
EmbeddingConfig { embedding_dim, embedding_endpoint_type, embedding_model, 7 more }

Configuration for embedding model connection and processing parameters.

embedding_dim: number

The dimension of the embedding.

embedding_endpoint_type: "openai" | "anthropic" | "bedrock" | 16 more

The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"bedrock"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"pinecone"
embedding_model: string

The model for the embedding.

azure_deployment?: string | null

The Azure deployment for the model.

azure_endpoint?: string | null

The Azure endpoint for the model.

azure_version?: string | null

The Azure version for the model.

batch_size?: number

The maximum batch size for processing embeddings.

embedding_chunk_size?: number | null

The chunk size of the embedding.

embedding_endpoint?: string | null

The endpoint for the model (None if local).

handle?: string | null

The handle for this config, in the format provider/model-name.
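
For reference, here is an illustrative EmbeddingConfig object assembled from the fields above. The model name, dimension, and endpoint are placeholder values, not recommendations.

```typescript
// Illustrative EmbeddingConfig — placeholder values drawn from the schema above.
const embeddingConfig = {
  embedding_dim: 1536,                             // required
  embedding_endpoint_type: 'openai',               // required
  embedding_model: 'text-embedding-3-small',       // required
  embedding_endpoint: 'https://api.openai.com/v1', // null if local
  embedding_chunk_size: 300,
  batch_size: 32,
  handle: 'openai/text-embedding-3-small',         // format: provider/model-name
};
```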

EmbeddingModel { display_name, embedding_dim, embedding_endpoint_type, 12 more }
display_name: string

Display name for the model shown in the UI

embedding_dim: number

The dimension of the embedding

Deprecated embedding_endpoint_type: "openai" | "anthropic" | "bedrock" | 16 more

Deprecated: Use 'provider_type' field instead. The endpoint type for the embedding model.

Accepts one of the following:
"openai"
"anthropic"
"bedrock"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"pinecone"
Deprecated embedding_model: string

Deprecated: Use 'name' field instead. Embedding model name.

name: string

The actual model name used by the provider

provider_name: string

The name of the provider

provider_type: ProviderType

The type of the provider

Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"xai"
Deprecated azure_deployment?: string | null

Deprecated: The Azure deployment for the model.

Deprecated azure_endpoint?: string | null

Deprecated: The Azure endpoint for the model.

Deprecated azure_version?: string | null

Deprecated: The Azure version for the model.

Deprecated batch_size?: number

Deprecated: The maximum batch size for processing embeddings.

Deprecated embedding_chunk_size?: number | null

Deprecated: The chunk size of the embedding.

Deprecated embedding_endpoint?: string | null

Deprecated: The endpoint for the model.

handle?: string | null

The handle for this config, in the format provider/model-name.

model_type?: "embedding"

Type of model (llm or embedding)

Accepts one of the following:
"embedding"
LlmConfig { context_window, model, model_endpoint_type, 17 more }

Configuration for Language Model (LLM) connection and generation parameters.

context_window: number

The context window size for the model.

model: string

LLM model name.

model_endpoint_type: "openai" | "anthropic" | "google_ai" | 18 more

The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"lmstudio-chatcompletions"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"bedrock"
"deepseek"
"xai"
compatibility_type?: "gguf" | "mlx" | null

The framework compatibility type for the model.

Accepts one of the following:
"gguf"
"mlx"
display_name?: string | null

A human-friendly display name for the model.

enable_reasoner?: boolean

Whether the model should use extended thinking if it is a 'reasoning'-style model

frequency_penalty?: number | null

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. From OpenAI: Number between -2.0 and 2.0.

handle?: string | null

The handle for this config, in the format provider/model-name.

max_reasoning_tokens?: number

Configurable thinking budget for extended thinking. Used for enable_reasoner and also for Google Vertex models like Gemini 2.5 Flash. Minimum value is 1024 when used with enable_reasoner.

max_tokens?: number | null

The maximum number of tokens to generate. If not set, the model will use its default value.

model_endpoint?: string | null

The endpoint for the model.

model_wrapper?: string | null

The wrapper for the model.

parallel_tool_calls?: boolean | null

If set to True, enables parallel tool calling. Defaults to False.

provider_category?: ProviderCategory | null

The provider category for the model.

Accepts one of the following:
"base"
"byok"
provider_name?: string | null

The provider name for the model.

put_inner_thoughts_in_kwargs?: boolean | null

Puts 'inner_thoughts' as a kwarg in the function call if this is set to True. This helps with function calling performance and also the generation of inner thoughts.

reasoning_effort?: "minimal" | "low" | "medium" | "high" | null

The reasoning effort to use when generating text with reasoning models

Accepts one of the following:
"minimal"
"low"
"medium"
"high"
temperature?: number

The temperature to use when generating text with the model. A higher temperature will result in more random text.

tier?: string | null

The cost tier for the model (cloud only).

verbosity?: "low" | "medium" | "high" | null

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
Model { context_window, max_context_window, model, 21 more }
Deprecated context_window: number

Deprecated: Use 'max_context_window' field instead. The context window size for the model.

max_context_window: number

The maximum context window for the model

Deprecated model: string

Deprecated: Use 'name' field instead. LLM model name.

Deprecated model_endpoint_type: "openai" | "anthropic" | "google_ai" | 18 more

Deprecated: Use 'provider_type' field instead. The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"lmstudio-chatcompletions"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"bedrock"
"deepseek"
"xai"
name: string

The actual model name used by the provider

provider_type: ProviderType

The type of the provider

Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"xai"
Deprecated compatibility_type?: "gguf" | "mlx" | null

Deprecated: The framework compatibility type for the model.

Accepts one of the following:
"gguf"
"mlx"
display_name?: string | null

A human-friendly display name for the model.

Deprecated enable_reasoner?: boolean

Deprecated: Whether the model should use extended thinking if it is a 'reasoning'-style model.

Deprecated frequency_penalty?: number | null

Deprecated: Positive values penalize new tokens based on their existing frequency in the text so far.

handle?: string | null

The handle for this config, in the format provider/model-name.

Deprecated max_reasoning_tokens?: number

Deprecated: Configurable thinking budget for extended thinking.

Deprecated max_tokens?: number | null

Deprecated: The maximum number of tokens to generate.

Deprecated model_endpoint?: string | null

Deprecated: The endpoint for the model.

model_type?: "llm"

Type of model (llm or embedding)

Accepts one of the following:
"llm"
Deprecated model_wrapper?: string | null

Deprecated: The wrapper for the model.

Deprecated parallel_tool_calls?: boolean | null

Deprecated: If set to True, enables parallel tool calling.

Deprecated provider_category?: ProviderCategory | null

Deprecated: The provider category for the model.

Accepts one of the following:
"base"
"byok"
provider_name?: string | null

The provider name for the model.

Deprecated put_inner_thoughts_in_kwargs?: boolean | null

Deprecated: Puts 'inner_thoughts' as a kwarg in the function call.

Deprecated reasoning_effort?: "minimal" | "low" | "medium" | "high" | null

Deprecated: The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"minimal"
"low"
"medium"
"high"
Deprecated temperature?: number

Deprecated: The temperature to use when generating text with the model.

Deprecated tier?: string | null

Deprecated: The cost tier for the model (cloud only).

Deprecated verbosity?: "low" | "medium" | "high" | null

Deprecated: Soft control for how verbose model output should be.

Accepts one of the following:
"low"
"medium"
"high"
ProviderCategory = "base" | "byok"
Accepts one of the following:
"base"
"byok"
ProviderType = "anthropic" | "azure" | "bedrock" | 14 more
Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"xai"

Models / Embeddings

List Embedding Models
client.models.embeddings.list(options?: RequestOptions): EmbeddingListResponse { display_name, embedding_dim, embedding_endpoint_type, 12 more }
GET /v1/models/embedding
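
Usage mirrors the LLM listing above, but this endpoint takes no query parameters. A sketch assuming the same client instance as the earlier example and a response that iterates as an array of EmbeddingModel objects:

```typescript
// Sketch — assumes `client` from the List LLM Models example above.
const embeddingModels = await client.models.embeddings.list();

for (const em of embeddingModels) {
  console.log(`${em.display_name}: ${em.embedding_dim}-dim via ${em.provider_type}`);
}
```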