12  Using Ollama in Python

The goal of this brief tutorial is to demonstrate how to use the Ollama Python library to programmatically generate responses from a local, and therefore private, LLM.

You will learn how to:

  • install the ollama Python package
  • generate a free text response from a local model
  • generate a structured response that conforms to a Pydantic schema

12.1 Requirements

  • Ollama
  • Python and the ollama python package

12.2 Install the ollama package using uv

uv add ollama

12.3 Import the ollama library

import json
import ollama

12.4 Generate a free text response

out = ollama.generate(
  model="qwen3:8b",
  system="You are a meticulous research assistant.",
  prompt="What is your name and who made you?"
)

This generates an object of type GenerateResponse:

type(out)
ollama._types.GenerateResponse

If the model used is a “reasoning” model, like qwen3:8b, the response will contain a “thinking” field. All models include a “response” field:

if hasattr(out, "thinking") and out.thinking:
    print("--- Reasoning ---")
    print(out.thinking + "\n")
print("--- Response ---")
print(out.response)
--- Reasoning ---
Okay, the user is asking about my name and who created me. I need to make sure I answer accurately. First, my name is Qwen, and I was developed by Alibaba Cloud. I should mention that I'm part of the Qwen series, which includes different models like Qwen, Qwen2, Qwen3, and QwenTurbo. It's important to highlight that I'm a large language model designed for various tasks. Also, I should note that my development team is part of Alibaba Cloud's research efforts. I need to keep the response clear and concise, avoiding any technical jargon. Let me structure the answer to first state my name and creator, then provide additional context about the series and development team. That should cover the user's question effectively.


--- Response ---
My name is Qwen, and I was developed by Alibaba Cloud. I am part of the Qwen series, which includes models like Qwen, Qwen2, Qwen3, and QwenTurbo. My development is led by Alibaba Cloud's research team, focusing on advancing large language model technology for diverse applications. Let me know if you'd like details about my capabilities or specific features!

12.5 Generate a structured response

There are many scenarios, especially in research, where we want a structured response instead of free text. Many users try to achieve this with instructions added to the user prompt, and LLMs are increasingly good at following such instructions. However, Ollama has native support for defining an output schema, including the required field names and their descriptions. This approach is cleaner, easier, and more likely to succeed, without requiring laborious and extensive prompting.

We begin by defining the output schema as a Pydantic BaseModel, which we will then convert to a JSON schema and include in the request:

from pydantic import BaseModel, Field

class LLMInfoSchema(BaseModel):
    """
    LLM Info Schema
    """
    name: str = Field(description="Your name")
    manufacturer: str = Field(description="The name of the person, group, or company that built you.")
    knowledge_cutoff: str = Field(description="Your knowledge cutoff date.")
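
Before sending this to Ollama, it can be useful to inspect the JSON schema that Pydantic derives from the class. A minimal sketch (the class is repeated here so the snippet is self-contained):

```python
import json
from pydantic import BaseModel, Field

class LLMInfoSchema(BaseModel):
    """
    LLM Info Schema
    """
    name: str = Field(description="Your name")
    manufacturer: str = Field(description="The name of the person, group, or company that built you.")
    knowledge_cutoff: str = Field(description="Your knowledge cutoff date.")

# model_json_schema() returns a plain dict describing each field's
# type and description, and marks all three fields as required.
schema = LLMInfoSchema.model_json_schema()
print(json.dumps(schema, indent=4))
```

This dict is exactly what the format argument of ollama.generate expects.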

Let’s rerun the previous query, but this time we will include the output schema in the request.

out = ollama.generate(
    model="qwen3:8b",
    system="You are a meticulous research assistant.",
    prompt="What is your name and who made you?",
    format=LLMInfoSchema.model_json_schema()
)

Now the “response” field will contain a JSON string that conforms to the defined schema. We can print the JSON string directly or use the json package to pretty-print it:

print(json.dumps(json.loads(out.response), indent=4))
{
    "name": "Qwen",
    "manufacturer": "Alibaba Group",
    "knowledge_cutoff": "2024\u5e7404\u6708"
}
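
Because the response is guaranteed to conform to the schema, we can also parse it back into a typed Python object with Pydantic's model_validate_json, instead of working with raw dicts. A sketch, using a hard-coded JSON string as a stand-in for the "response" field returned above:

```python
from pydantic import BaseModel, Field

class LLMInfoSchema(BaseModel):
    """
    LLM Info Schema
    """
    name: str = Field(description="Your name")
    manufacturer: str = Field(description="The name of the person, group, or company that built you.")
    knowledge_cutoff: str = Field(description="Your knowledge cutoff date.")

# In practice, raw would be the "response" field of the generate() call.
raw = '{"name": "Qwen", "manufacturer": "Alibaba Group", "knowledge_cutoff": "2024-04"}'

# Parse and validate the JSON string against the schema in one step.
info = LLMInfoSchema.model_validate_json(raw)
print(info.name)           # typed attribute access
print(info.manufacturer)   # instead of dict key lookups
```

Validation fails with a ValidationError if the JSON is malformed or a required field is missing, which makes downstream code safer than indexing into a parsed dict.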

Ollama does not currently support returning both the reasoning trace and a structured response in a single call. This may change in future versions.