Multi-modal (image inputs)

Letta agents support image inputs, enabling richer conversations and more powerful agent capabilities.

Multi-modal capabilities depend on the underlying language model. To see which models support image inputs, check each provider's individual model pages:

  • OpenAI: GPT-4.1, o1, o3, o4, GPT-4o
  • Anthropic: Claude Opus 4, Claude Sonnet 4
  • Gemini: Gemini 2.5 Pro, Gemini 2.5 Flash

If the provider you’re using doesn’t support image inputs, your images will still appear in the context window, but as a text message telling the agent that an image was provided.

You can pass images to your agents by dragging and dropping them into the chat window, or by clicking the image icon to upload a file manually.

Passing an image by URL:

```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });

const response = await client.agents.messages.create(agentState.id, {
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe this image.",
        },
        {
          type: "image",
          source: {
            type: "url",
            url: "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg",
          },
        },
      ],
    },
  ],
});
```
Passing an image as base64-encoded data:

```typescript
import { LettaClient } from "@letta-ai/letta-client";

const client = new LettaClient({ token: "LETTA_API_KEY" });

// Fetch the image and base64-encode its bytes before sending.
const imageUrl =
  "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg";
const imageResponse = await fetch(imageUrl);
const imageBuffer = await imageResponse.arrayBuffer();
const imageData = Buffer.from(imageBuffer).toString("base64");

const response = await client.agents.messages.create(agentState.id, {
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe this image.",
        },
        {
          type: "image",
          source: {
            type: "base64",
            mediaType: "image/jpeg",
            data: imageData,
          },
        },
      ],
    },
  ],
});
```
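For local files (rather than remote URLs), the same base64 source type applies. As a minimal sketch, a small helper can read a file and produce the `mediaType` and `data` fields; the `encodeImageFile` name and the extension-to-MIME-type mapping below are illustrative assumptions, not part of the Letta SDK:

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Map common image extensions to MIME types (assumed list; extend as needed).
const MEDIA_TYPES: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Read a local image and return the fields a base64 image source expects.
function encodeImageFile(path: string): { mediaType: string; data: string } {
  const mediaType = MEDIA_TYPES[extname(path).toLowerCase()];
  if (!mediaType) {
    throw new Error(`Unsupported image extension: ${extname(path)}`);
  }
  const data = readFileSync(path).toString("base64");
  return { mediaType, data };
}
```

The returned object can then be spread into the `source` of an `image` content part, e.g. `source: { type: "base64", ...encodeImageFile("ant.jpg") }`.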