13  Using Ollama's OpenAI-Compatible API in Python

The goal of this brief tutorial is to demonstrate how to work with an OpenAI-compatible API in Python to interact with local Ollama-provided models.

You will learn how to:

  • connect to Ollama's OpenAI-compatible endpoint,
  • generate a free-text response, and
  • generate structured output using a predefined schema.

There are two main ways to access Ollama from Python: the native ollama package, or a client for its OpenAI-compatible API.

In this tutorial, we will use the openai Python package, which can be used for accessing any OpenAI-compatible API.

The OpenAI-compatible API provided by Ollama does not yet support all options included in the native Ollama API; support is likely to be added in a future release. See GitHub issues #2963 and #11012.

13.1 Requirements

  • A running local Ollama installation with at least one model pulled
  • openai Python package

13.2 Setup

13.2.1 Install packages

If not already installed, use uv to install the required package:

uv add openai

13.2.2 Load packages

import json
import keyring
from openai import OpenAI

13.2.3 Ollama OpenAI API endpoint

Ollama does not require an API key. However, the openai client requires that one be provided (the argument cannot be empty), so you can use any placeholder string, such as "ollama-api-key".

ollama_openai_url = "http://localhost:11434/v1/"
model_name = "qwen3:8b"
api_key = "ollama-api-key"

13.3 Generate a free text response

13.3.1 Instantiate the OpenAI client

Instantiate the OpenAI client with the Ollama base URL:

client = OpenAI(
    base_url=ollama_openai_url,
    api_key=api_key)

13.3.2 Generate response

resp = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a meticulous research assistant."},
        {"role": "user", "content": "What is your name and who made you?"},
    ],
    temperature=0.2,
)
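The response object follows the standard OpenAI chat-completions shape: the assistant's reply lives at resp.choices[0].message. As a minimal, server-free sketch of that access pattern, here is a hypothetical dict shaped like resp.model_dump() (the content string is illustrative, not an actual model reply):

```python
# Hypothetical dict mirroring the shape of resp.model_dump()
resp_dict = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I am Qwen, developed by Alibaba Cloud.",
            },
        }
    ]
}

# The reply text sits on the first choice's message
content = resp_dict["choices"][0]["message"]["content"]
print(content)
```

With the real client object, the equivalent attribute access is resp.choices[0].message.content.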

13.4 Generate structured output

There are many scenarios, especially in research, where we want to generate a structured response instead of free text. Many users try to achieve this with instructions added to the user prompt, and LLMs are increasingly good at following such instructions. However, there is native support for defining an output schema, including the required field names and their descriptions. This approach is much cleaner, easier, and more likely to succeed, without requiring laborious and extensive prompting.

We begin by defining the output schema as a Pydantic BaseModel that we will then pass along with the request:

from pydantic import BaseModel, Field


class LLMInfoSchema(BaseModel):
    name: str = Field(description="Your name")
    manufacturer: str = Field(
        description="The name of the person, group, or company that built you."
    )
    knowledge_cutoff: str = Field(description="Your knowledge cutoff date.")

Let’s rerun the previous query, but this time we will include the output schema in the request.

resp_structured = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a meticulous research assistant."},
        {"role": "user", "content": "What is your name and who made you?"},
    ],
    temperature=0.2,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Response",  # Optional but recommended
            "schema": LLMInfoSchema.model_json_schema(),
        },
    },
)

Access the message as before:

message_structured = resp_structured.choices[0].message

We can print the JSON string directly or use the json package to pretty-print it:

if hasattr(message_structured, "reasoning") and message_structured.reasoning:
    print("--- Reasoning ---")
    print(message_structured.reasoning + "\n")

print("--- Response ---")
print(json.dumps(json.loads(message_structured.content), indent=4))
--- Reasoning ---
Okay, the user is asking for my name and who made me. Let me start by confirming my name. I should mention that I'm Qwen, developed by Alibaba Cloud. But wait, I need to make sure I'm not mixing up any details. Let me check the official information again. Yes, Qwen is the correct name, and the developer is Alibaba Cloud. I should also mention that I'm part of the Tongyi Lab. Oh, and maybe add a bit about my purpose to help with various tasks. Keep the response friendly and informative. Let me structure it clearly: first name, then developer, and a brief purpose. Avoid any technical jargon. Make sure it's concise but covers all necessary points. Alright, that should cover it.


--- Response ---
{
    "name": "Qwen",
    "manufacturer": "Alibaba Cloud",
    "knowledge_cutoff": "2024\u5e7404\u6708"
}

13.5 References