1 Introduction
1.1 Programmatic access to LLMs via APIs
Large Language Models (LLMs) can be accessed programmatically via Application Programming Interfaces (APIs). These are commonly REST APIs (see below), though other types exist, particularly in high-performance and latency-critical systems (e.g. WebSockets and RPC frameworks such as gRPC). Programmatic access allows us to incorporate LLMs into any data pipeline or application. While chatbots are great for interactive use and general assistance in many tasks, programmatic access via APIs is essential for research, development, and production use cases.
Some of the advantages of using APIs over chat interfaces include:
- Batch processing of large numbers of requests.
- Integration with other software systems and services into workflows of arbitrary complexity.
- Customization of prompts and parameters for specific use cases beyond what is available in most chat interfaces.
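The first advantage, batch processing, can be sketched as follows. Here `query_llm` is a hypothetical placeholder for any function that sends a single prompt to an LLM API and returns the generated text; a real implementation would issue an HTTP request to a provider's endpoint.

```python
def query_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a provider's API.
    # Here we just echo the prompt for illustration.
    return f"[model output for: {prompt}]"

def run_batch(prompts: list[str]) -> list[str]:
    # Process many prompts in one pass -- something a chat
    # interface cannot easily do.
    return [query_llm(p) for p in prompts]

results = run_batch(["Summarize topic A.", "Translate sentence B."])
```

In practice the loop would typically be parallelized or rate-limited, but the structure stays the same: a list of prompts in, a list of completions out.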
1.1.1 What Is an API?
An API (Application Programming Interface) is a structured way for one piece of software to communicate with another. It defines a set of rules and formats that specify how requests should be made, what data can be sent or received, and how responses are delivered.
In practice, an API allows developers to use the functionality or data of a system without needing to understand its internal workings.
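A familiar example of this idea is any standard-library call: we use the documented interface without knowing the internals. The snippet below uses Python's `hashlib` purely as an illustration of consuming an API.

```python
import hashlib

# We call sha256() according to its documented interface, without
# needing to understand how the hash is computed internally.
digest = hashlib.sha256(b"hello").hexdigest()
```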
1.1.2 What Is a REST API?
A REST API (Representational State Transfer API) is a common type of web API that follows a specific architectural style based on HTTP — the same protocol used by web browsers.
REST APIs organize interactions around resources, which are typically represented as URLs. Clients perform operations on these resources using standard HTTP methods such as:
- GET – retrieve data
- POST – create new data
- PUT or PATCH – update existing data
- DELETE – remove data
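The mapping between methods and resources can be illustrated with Python's standard `urllib`. The resource URL below is hypothetical, and the requests are only constructed, never sent:

```python
import urllib.request

# Hypothetical resource URL for illustration only.
base = "https://api.example.com/notes"

# Each HTTP method expresses an operation on the resource.
get_req = urllib.request.Request(base, method="GET")                 # retrieve data
post_req = urllib.request.Request(base, data=b"{}", method="POST")   # create new data
delete_req = urllib.request.Request(f"{base}/42", method="DELETE")   # remove data
```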
REST APIs exchange data in lightweight formats like JSON, making them simple, fast, and widely compatible across platforms.
In the context of large language models (LLMs), REST APIs provide a standardized way to send a prompt to a model and receive its generated text, regardless of where the model is hosted — for example, on OpenAI's or Anthropic's servers, or on your own local model server.
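A typical LLM request is therefore just an HTTP POST with a JSON body. The sketch below is illustrative, not a definitive implementation: the URL, payload fields, and auth header loosely follow the common "chat completions" convention, but every provider's schema differs, so consult its documentation. The request is only constructed, not sent, so the example runs without network access or a real API key.

```python
import json
import urllib.request

# Hypothetical endpoint and model name, for illustration only.
url = "https://api.example.com/v1/chat/completions"
payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Explain REST in one sentence."}],
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    },
    method="POST",
)

# Actually sending it would be: urllib.request.urlopen(request)
# (omitted here so the sketch runs offline).
body = json.loads(request.data)
```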