14 Using Ollama in Julia

The goal of this brief tutorial is to demonstrate how to use Ollama from Julia to generate responses from a local LLM.

You will learn how to:

Generate a free text response directly from an LLM.
Control important parameters like temperature.
Define an output schema and generate structured outputs from an LLM.
Use “reasoning” models and extract their reasoning output.

14.1 Requirements

Ollama
HTTP for sending requests to the Ollama API
JSON3 for creating the request body and parsing the response
JSONSchema for validating structured outputs (optional)

14.2 Install packages

If not already installed, use Pkg.add() to install the required packages:

using Pkg
Pkg.add("HTTP")
Pkg.add("JSON3")
Pkg.add("JSONSchema")

14.3 Load packages

using HTTP
using JSON3
using JSONSchema

14.4 Generate a free text response

14.4.1 Define the API endpoint and the request body

First, define the Ollama API endpoint URL:

ollama_url = "http://localhost:11434/api/generate";

Then, define the request body.
This example sets the following fields:

model: The name of the model to use.
system: The system message, which sets the model’s overall behavior for the request.
prompt: The prompt to send to the model. This is often called the “user” message or prompt.
stream: Whether to stream the response token by token. Streaming is useful for interactive applications, like chat interfaces.
options: A dictionary of options that control the model’s behavior, like temperature.

Here, we construct the request body as a Dict and convert it to a JSON string using JSON3.write():

request_body = JSON3.write(
  Dict(
    "model" => "qwen3:8b",
    "system" => "You are a meticulous research assistant.",
    "prompt" => "What is your name and who made you?",
    "stream" => false,
    "options" => Dict("temperature" => 0.2)
  )
);
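The request above sets "stream" to false, so the whole answer arrives as a single JSON object. With "stream" set to true, Ollama instead returns newline-delimited JSON: one object per chunk, each carrying a text fragment in "response" and a "done" flag. The following is a minimal sketch of reassembling such a stream. The sample text and the function name collect_stream are illustrative assumptions, and the sample is parsed offline rather than fetched from a running server.

```julia
using JSON3

# Hypothetical sample of a streamed body: one JSON object per line.
sample_stream = """
{"model":"qwen3:8b","response":"My name ","done":false}
{"model":"qwen3:8b","response":"is Qwen.","done":false}
{"model":"qwen3:8b","response":"","done":true,"done_reason":"stop"}
"""

# Concatenate the "response" fragments until a chunk reports done == true.
function collect_stream(body::AbstractString)
  io = IOBuffer()
  for line in eachline(IOBuffer(body))
    isempty(strip(line)) && continue  # skip blank lines
    chunk = JSON3.read(line)
    print(io, chunk.response)
    chunk.done && break
  end
  return String(take!(io))
end

println(collect_stream(sample_stream))
```

In an interactive application you would print each fragment as it arrives instead of collecting them first.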

14.4.2 Perform the request

We use HTTP.post() to send the request to the Ollama API endpoint, specifying the Content-Type header as application/json:

resp = HTTP.post(
  ollama_url,
  ["Content-Type" => "application/json"],
  request_body
);

Test that the request was successful by checking the status code:

if resp.status == 200
  println("Request successful!")
else
  error("Request failed with status code: $(resp.status)")
end

Request successful!
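Note that, by default, HTTP.jl throws an exception for response statuses of 300 and above (the status_exception keyword defaults to true), so the else branch above is rarely reached; a failed request usually surfaces as an exception instead. The sketch below, under that assumption, turns off the exception and handles both an unreachable server and a non-200 status explicitly. The helper name post_json is hypothetical.

```julia
using HTTP, JSON3

# Hypothetical helper: POST a Dict as JSON and return the parsed reply,
# or `nothing` if the server is unreachable or replies with a non-200 status.
function post_json(url::AbstractString, body::AbstractDict)
  resp = try
    HTTP.post(
      url,
      ["Content-Type" => "application/json"],
      JSON3.write(body);
      status_exception = false,  # return non-2xx responses instead of throwing
      retry = false              # fail fast if Ollama is not running
    )
  catch err
    @warn "Could not reach the server" url exception = err
    return nothing
  end
  if resp.status != 200
    @warn "Request failed" status = resp.status body = String(resp.body)
    return nothing
  end
  return JSON3.read(String(resp.body))
end
```

Returning nothing keeps the calling code simple; in a longer pipeline you may prefer to rethrow the error so that failures are not silently skipped.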

14.4.2.1 Process the response

Parse the response to a JSON3 object:

resp_json = JSON3.read(String(resp.body))

JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}} with 13 entries:
  :model                => "qwen3:8b"
  :created_at           => "2025-11-02T20:17:25.475614Z"
  :response             => "My name is Qwen, and I was developed by Alibaba Clo…
  :thinking             => "Okay, the user is asking for my name and who made m…
  :done                 => true
  :done_reason          => "stop"
  :context              => [151644, 8948, 271, 2610, 525, 264, 95178, 3412, 178…
  :total_duration       => 13014683959
  :load_duration        => 149210917
  :prompt_eval_count    => 31
  :prompt_eval_duration => 2326238708
  :eval_count           => 220
  :eval_duration        => 10401640370
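
The duration fields in the response are reported in nanoseconds. As a small worked example, the values shown above can be converted into seconds and into a generation speed in tokens per second:

```julia
# Values copied from the response shown above; durations are in nanoseconds.
total_duration = 13014683959
eval_count = 220
eval_duration = 10401640370

total_seconds = total_duration / 1e9
tokens_per_second = eval_count / eval_duration * 1e9

println("Total time: ", round(total_seconds; digits = 2), " s")
println("Generation speed: ", round(tokens_per_second; digits = 2), " tokens/s")
```

For this run, that is roughly 13 seconds in total at about 21 tokens per second.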

If the model is a “reasoning” model, like qwen3:8b, the response contains a “thinking” field in addition to the “response” field that all models return.

In this case, since we used a reasoning model, we can print the reasoning step followed by the final response:

println("--- Reasoning ---")
println(resp_json["thinking"])
println("--- Response ---")
println(resp_json["response"])

--- Reasoning ---
Okay, the user is asking for my name and who made me. Let me start by confirming my name. I should mention that I'm Qwen, developed by Alibaba Cloud. But wait, I need to make sure I'm not mixing up any details. Let me check the official information again. Yes, Qwen is the correct name, and the developer is Alibaba Cloud. I should also mention that I'm part of the Tongyi Lab. Oh, and maybe add a bit about my purpose to help with various tasks. Keep it friendly and informative. Let me structure the response clearly: first name, then developer, and a brief purpose. Avoid any technical jargon. Make sure it's concise but covers all necessary points. Alright, that should answer the user's question accurately.

--- Response ---
My name is Qwen, and I was developed by Alibaba Cloud's Tongyi Lab. I am a large language model designed to assist with a wide range of tasks, from answering questions and creating content to providing explanations and engaging in conversations. Let me know how I can help you!
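
If the same code has to work with models that do not return a “thinking” field, indexing the missing key fails. A minimal sketch of a safer lookup, using get() with a default; the two sample replies and the helper name reasoning are illustrative assumptions:

```julia
using JSON3

# Hypothetical replies: one from a reasoning model, one without a "thinking" field.
with_thinking = JSON3.read("""{"response":"Hello","thinking":"The user greets me."}""")
without_thinking = JSON3.read("""{"response":"Hello"}""")

# Return the reasoning text if present, otherwise `nothing`.
reasoning(reply) = get(reply, :thinking, nothing)

for reply in (with_thinking, without_thinking)
  r = reasoning(reply)
  println(isnothing(r) ? "(no reasoning returned)" : r)
  println(reply.response)
end
```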

14.5 Generate a structured response

There are many scenarios, especially in research, where we want a structured response instead of free text. Many users try to achieve this with instructions added to the user prompt, and LLMs are increasingly good at following them. However, Ollama natively supports an output schema that defines the required field names and their descriptions. This is cleaner, easier, and more likely to succeed than laborious and extensive prompting.

We begin by defining the output schema. To do this, we create a Dict that follows the JSON Schema format:

"type" => "object": Indicates that the output is a JSON object.
properties: A Dict that defines the fields we want in the output. Each key is a field name, and its value is another Dict defining the field’s type (e.g. number, string, etc.) and its description.
required: A vector of strings listing the names of the fields that must be present in the output.

LLMinfo_schema = Dict(
  "type" => "object",
  "properties" => Dict(
    "name" => Dict(
      "type" => "string",
      "description" => "Your name"
    ),
    "manufacturer" => Dict(
      "type" => "string",
      "description" => "The name of the person, group, or company that built you."
    ),
    "knowledge_cutoff" => Dict(
      "type" => "string",
      "description" => "Your knowledge cutoff date."
    )
  ),
  "required" => ["name", "manufacturer", "knowledge_cutoff"]
);
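
For schemas with many fields, the nested Dicts become repetitive. The sketch below shows one way to build the same structure with two small helpers. Both names, schema_field and object_schema, are hypothetical, and the second one assumes that every listed field is required.

```julia
# Hypothetical helper: one schema property with a type and a description.
schema_field(json_type::String, description::String) =
  Dict("type" => json_type, "description" => description)

# Hypothetical helper: an object schema in which every listed field is required.
function object_schema(fields::Pair{String}...)
  return Dict(
    "type" => "object",
    "properties" => Dict(fields...),
    "required" => [name for (name, _) in fields]
  )
end

schema = object_schema(
  "name" => schema_field("string", "Your name"),
  "manufacturer" => schema_field("string", "The name of the person, group, or company that built you."),
  "knowledge_cutoff" => schema_field("string", "Your knowledge cutoff date.")
)
```

The resulting schema has the same keys and values as LLMinfo_schema above.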

Let’s rerun the previous query, but this time we will include the output schema in the request.

When generating structured output, you might choose to adjust the prompt to explicitly ask for responses that conform to the schema. This is usually not necessary, as models, especially those designed for structured outputs, tend to adhere to the schema without additional prompting. Note that we are purposely using the same prompt as before, even though the schema requests a third field, knowledge_cutoff, which the prompt does not mention.

request_body_structured = JSON3.write(
  Dict(
    "model" => "qwen3:8b",
    "system" => "You are a meticulous research assistant.",
    "prompt" => "What is your name and who made you?",
    "stream" => false,
    "format" => LLMinfo_schema,
    "options" => Dict("temperature" => 0.2)
  )
);

Perform the request:

resp_structured = HTTP.post(
  ollama_url,
  ["Content-Type" => "application/json"],
  request_body_structured
);

If you want, you can combine the above into a single expression:

resp_structured = HTTP.post(
  ollama_url,
  ["Content-Type" => "application/json"],
  JSON3.write(
    Dict(
      "model" => "qwen3:8b",
      "system" => "You are a meticulous research assistant.",
      "prompt" => "What is your name and who made you?",
      "stream" => false,
      "format" => LLMinfo_schema,
      "options" => Dict("temperature" => 0.2)
    )
  )
);
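
If you send many requests, it can be convenient to wrap these steps in a function. The following is a minimal sketch rather than a definitive implementation: generate_body and ollama_generate are hypothetical names, and the defaults simply mirror the values used in this tutorial.

```julia
using HTTP, JSON3

# Hypothetical helper: build the request body.
# "system" and "format" are only included when they are provided.
function generate_body(prompt; model = "qwen3:8b", system = nothing,
                       format = nothing, temperature = 0.2)
  body = Dict{String,Any}(
    "model" => model,
    "prompt" => prompt,
    "stream" => false,
    "options" => Dict("temperature" => temperature)
  )
  isnothing(system) || (body["system"] = system)
  isnothing(format) || (body["format"] = format)
  return body
end

# Hypothetical helper: send the request and return the parsed reply.
function ollama_generate(prompt; url = "http://localhost:11434/api/generate", kwargs...)
  resp = HTTP.post(
    url,
    ["Content-Type" => "application/json"],
    JSON3.write(generate_body(prompt; kwargs...))
  )
  return JSON3.read(String(resp.body))
end
```

With Ollama running, a structured request then becomes ollama_generate("What is your name and who made you?"; format = LLMinfo_schema).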

Test that the request was successful by checking the status code:

if resp_structured.status == 200
  println("Request successful!")
else
  error("Request failed with status code: $(resp_structured.status)")
end

Request successful!

14.5.1 Process the structured response

resp_structured_json = JSON3.read(String(resp_structured.body));

Now the “response” field will contain a JSON string that conforms to the defined schema. We can use JSON3.pretty() to pretty print it:

JSON3.pretty(resp_structured_json["response"])

{
    "name": "Qwen",
    "manufacturer": "Alibaba Group",
    "knowledge_cutoff": "2024年04月"
}
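
Because the “response” field is itself a string containing JSON, it needs a second parse before its fields can be accessed. A short sketch, using the output shown above as a literal so that it runs without a server:

```julia
using JSON3

# The "response" string as shown in the output above.
response_text = """{"name": "Qwen", "manufacturer": "Alibaba Group", "knowledge_cutoff": "2024年04月"}"""

info = JSON3.read(response_text)  # second parse: JSON string -> JSON3.Object

println(info.name)
println(info["manufacturer"])

# copy() converts the read-only JSON3.Object into a regular Dict{Symbol, Any}.
info_dict = copy(info)
println(info_dict[:knowledge_cutoff])
```

Note that the schema only constrains the field to be a string, so the model is free to choose the date format, as the knowledge_cutoff value shows.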

14.5.1.1 Validate the response

We can optionally validate the response:

JSONSchema.validate(
  JSONSchema.Schema(LLMinfo_schema),
  JSON3.read(resp_structured_json["response"])
)

Note that JSONSchema.validate() returns nothing if the response conforms to the schema, and an object describing the first issue found otherwise.
Alternatively, the result of the validation can be captured as a Boolean as follows:

is_valid = JSONSchema.isvalid(
  JSONSchema.Schema(LLMinfo_schema),
  JSON3.read(resp_structured_json["response"])
)

true
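
To see what a failed validation looks like, the sketch below checks a complete and an incomplete reply against a small schema. The schema and both replies are made up for illustration and are defined here so that the example runs on its own.

```julia
using JSONSchema

# A small schema in the same shape as LLMinfo_schema.
schema = Schema(Dict(
  "type" => "object",
  "properties" => Dict(
    "name" => Dict("type" => "string"),
    "manufacturer" => Dict("type" => "string")
  ),
  "required" => ["name", "manufacturer"]
))

complete = Dict("name" => "Qwen", "manufacturer" => "Alibaba Group")
incomplete = Dict("name" => "Qwen")  # "manufacturer" is missing

println(isvalid(schema, complete))
println(isvalid(schema, incomplete))

# validate() returns `nothing` on success and an issue description otherwise.
issue = validate(schema, incomplete)
isnothing(issue) || println(issue)
```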

Produce a warning if the response does not conform to the schema:

if is_valid
  println("The response conforms to the schema")
else
  @warn "The response does not conform to the schema!"
end

The response conforms to the schema
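
When a response fails validation, one simple option is to ask again. The sketch below shows the retry logic only, under the assumption that you supply a function that performs the request and a function that checks the result. The helper retry_until_valid is hypothetical, and the stand-in generator replaces a real call to Ollama.

```julia
# Hypothetical helper: call `generate()` until `is_ok(result)` is true,
# giving up after `max_attempts` tries. Returns the result or `nothing`.
function retry_until_valid(generate, is_ok; max_attempts = 3)
  for attempt in 1:max_attempts
    result = generate()
    is_ok(result) && return result
    @warn "Attempt $attempt did not conform to the schema"
  end
  return nothing
end

# Stand-in generator for illustration: fails once, then returns a complete reply.
replies = [
  Dict("name" => "Qwen"),
  Dict("name" => "Qwen", "manufacturer" => "Alibaba Group")
]
next_reply() = popfirst!(replies)
has_required(reply) = all(haskey(reply, k) for k in ("name", "manufacturer"))

result = retry_until_valid(next_reply, has_required)
println(result)
```

In practice, generate would send the request and parse the “response” field, and is_ok would call JSONSchema.isvalid() with your schema.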