I'm having trouble getting my pydantic_ai agent to respond with structured output when using my local llama3.1 model. The agent correctly uses the tools registered via the tools attribute during agent creation, but it never calls the final_result tool that pydantic_ai creates automatically for structured output. My question is whether llama3.1 is simply not capable of consistently returning tool calls to the final_result function, or whether there are pydantic_ai settings I can use to force this behavior with the llama3.1 model.
I saw that pydantic_ai provides the Prompted Output mode, but the output mode documentation recommends the default Tool Output as the more stable option, hence this post asking whether I'm doing something wrong in the setup.
For context, here is my agent definition. The registered tools at the moment are get_number_of_entries and get_entry_details; they have docstrings but are mostly placeholders for now:
from typing import Literal, Union

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.ollama import OllamaProvider


class SyncResult(BaseModel):
    message: str


class AsyncResult(BaseModel):
    workflow_name: str


class MainAgentOutput(BaseModel):
    mode: Literal["sync", "async"]
    result: Union[SyncResult, AsyncResult]


main_agent = Agent(
    model=OpenAIChatModel(
        model_name="llama3.1:latest",
        provider=OllamaProvider(
            base_url="http://localhost:11434/v1",
            api_key="somekey",
        ),
    ),
    output_type=MainAgentOutput,
    tools=get_tools(),
    system_prompt=build_system_prompt(),
)
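As a sanity check of the schema itself (pure pydantic, no agent involved), both modes validate the way I'd expect the final_result tool arguments to look. The workflow name "bulk_import" is just a made-up example here:

```python
from typing import Literal, Union

from pydantic import BaseModel


class SyncResult(BaseModel):
    message: str


class AsyncResult(BaseModel):
    workflow_name: str


class MainAgentOutput(BaseModel):
    mode: Literal["sync", "async"]
    result: Union[SyncResult, AsyncResult]


# These dicts mirror the JSON payloads a final_result tool call would carry.
sync = MainAgentOutput.model_validate(
    {"mode": "sync", "result": {"message": "There are 42 entries."}}
)
async_ = MainAgentOutput.model_validate(
    {"mode": "async", "result": {"workflow_name": "bulk_import"}}  # hypothetical workflow name
)
```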
The system prompt currently looks like this; {workflow_descriptions} is replaced by a list of name: routing-instruction pairs for each workflow that can be selected.
You are a routing agent.
You can either:
1. Execute one or more synchronous tools (mode="sync")
2. Select an async workflow (mode="async")
Information available to you:
- Synchronous tools: Their names, parameters, and docstrings are provided separately by the system.
Use those tool descriptions to understand exactly what each tool can do.
- Asynchronous workflows (with routing descriptions):
{workflow_descriptions}
Output contract:
- When you choose mode="sync":
- Call the appropriate synchronous tools as needed.
- Populate result as a SyncResult.
- Set result.message to a clear, user-facing text answer that summarizes
the tool outputs and directly answers the request.
- When you choose mode="async":
- Populate result as an AsyncResult.
- Set result.workflow_name to exactly one of the keys listed in the async workflows.
- Do not start or run the workflow yourself; you only suggest which
workflow should be used.
Routing rules:
- Read the user request and compare it to:
• The capabilities described by the synchronous tools (their names, arguments, and docstrings).
• The routing_description of each async workflow above.
- Choose mode="sync" when one or more synchronous tools can fully satisfy the request.
- Choose mode="async" when the request clearly matches a workflow's routing_description
or requires the type of processing that workflow is designed for.
- Do NOT base the decision purely on whether the task seems long or short;
use the described capabilities instead.
- Never invent workflow names.
- If both a tool and a workflow could handle the request, prefer mode="sync" unless
the workflow description explicitly makes it the better fit.
When running evals for the prompt "How many entries are in the database?" I get the following message log structure:
ModelRequest(SystemPrompt + User Prompt)
ModelResponse(Tool Call to registered tool as expected with correct parameters, matching user prompt)
ModelRequest(Tool call response)
ModelResponse(Text answer <- here the final_result tool call would be expected)
ModelRequest(Formatting Error Message)
ModelResponse(Same tool call as above again, probably because of misinterpretation of the error)
ModelRequest(Tool call response)
ModelResponse(Text answer again)
After the last response the test throws an error: Exceeded maximum retries (1) for output validation, which is to be expected because the model never returns the structured response.
Here is the more complete message log for context. I removed provider details etc.; the system prompt is as given above, and the text responses contain a human-readable answer with an attempt at some structured output appended to the end of the string.
[
ModelRequest(
parts=[
SystemPromptPart(
content=...
),
UserPromptPart(
content="How many proteins are in the database?",
),
],
),
ModelResponse(
parts=[
ToolCallPart(
tool_name="get_number_of_entries",
args='{"entry_type": ...}',
)
],
model_name="llama3.1:latest",
provider_name="ollama",
provider_url="http://localhost:11434/v1/",
provider_details={
"finish_reason": "tool_calls",
},
finish_reason="tool_call",
),
ModelRequest(
parts=[
ToolReturnPart(
tool_name="get_number_of_entries",
content=42,
)
],
),
ModelResponse(
parts=[
TextPart(
content=...
)
],
provider_details={
"finish_reason": "stop",
},
finish_reason="stop",
),
ModelRequest(
parts=[
RetryPromptPart(
content=[
{
"type": "json_invalid",
"loc": (),
"msg": "Invalid JSON: expected ident at line 1 column 2",
"input": ...,
"ctx": {"error": "expected ident at line 1 column 2"},
}
],
tool_call_id="pyd_ai_286c9b4a75114705b6dbfc44f0f585ed",
)
],
),
ModelResponse(
parts=[
ToolCallPart(
tool_name="get_number_of_entries",
args=...,
tool_call_id="call_9ono7lfu",
)
],
usage=RequestUsage(input_tokens=969, output_tokens=20),
model_name="llama3.1:latest",
provider_name="ollama",
provider_url="http://localhost:11434/v1/",
provider_details={
"finish_reason": "tool_calls",
},
finish_reason="tool_call",
),
ModelRequest(
parts=[
ToolReturnPart(
tool_name="get_number_of_entries",
content=42,
)
],
),
ModelResponse(
parts=[
TextPart(
content=...
)
],
finish_reason="stop",
),
]