Communication with an LLM occurs through an API—typically a JSON-based protocol that structures every interaction. Understanding this structure demystifies what happens when you “talk” to a model.
The Request Anatomy
A typical API request contains:
- Endpoint: The URL you’re addressing (e.g., /v1/chat/completions)
- Headers: Authentication and content-type metadata
- Body: The payload containing your actual request
The body carries three essential components:
- model: Which LLM you’re addressing
- parameters: Generation settings (temperature, max tokens, etc.)
- messages: The conversation itself
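To make the anatomy concrete, here is a minimal Python sketch that assembles the endpoint, headers, and body without sending anything. The base URL, the API key placeholder, the model name, and the helper name build_request are all hypothetical. The Bearer-style Authorization header and the choice to pass generation settings as top-level body fields are assumptions that match many chat-completions-style APIs but not all of them, so check your provider's documentation.

```python
import json

# Hypothetical placeholders; substitute your provider's real values.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_request(model, messages, **parameters):
    """Assemble endpoint, headers, and JSON body for a chat request (not sent)."""
    endpoint = BASE_URL + "/v1/chat/completions"
    headers = {
        # Assumption: Bearer-token auth; some providers use a different header.
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    # Assumption: generation settings sit beside model and messages in the body.
    body = {"model": model, "messages": messages, **parameters}
    return endpoint, headers, json.dumps(body)


endpoint, headers, payload = build_request(
    "example-model",
    [{"role": "user", "content": "Please review this paragraph for clarity."}],
    temperature=0.2,
    max_tokens=256,
)
print(endpoint)
print(payload)
```

Keeping construction separate from sending makes it easy to inspect exactly what would go over the wire before any HTTP client is involved.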
The Messages Array
The messages array is where interaction lives. It is an ordered list of message objects, each with a role and content:
{
"messages": [
{
"role": "system",
"content": "You are a careful scientific editor..."
},
{
"role": "user",
"content": "Please review this paragraph for clarity."
},
{
"role": "assistant",
"content": "I notice three areas where precision could improve..."
},
{
"role": "user",
"content": "Can you elaborate on the second point?"
}
]
}
The Three Roles
system
The system message appears first (when present) and establishes context for the entire conversation. The model treats it as persistent background: instructions that color every subsequent response. You typically set it once, at the start of the conversation; it doesn’t appear in the visible conversation flow.
user
Your messages. Questions, requests, input text, corrections. Each user message advances the conversation and prompts a response.
assistant
The model’s responses. These accumulate in the array alongside user messages, forming the conversation history that the model sees when generating its next response.
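The conventions described above can be expressed as a small checker. This is only a sketch under simplifying assumptions: it recognizes just the three roles discussed here, expects content to be a plain string, and treats a system message anywhere but first as a problem. Real APIs may allow additional roles and richer content types. The names VALID_ROLES and validate_messages are hypothetical.

```python
VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return a list of problems found in a messages array (empty if none)."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: content must be a string")
        # Simplifying assumption: a system message is only valid in first position.
        if role == "system" and i != 0:
            problems.append(f"message {i}: system message should come first")
    return problems


conversation = [
    {"role": "system", "content": "You are a careful scientific editor..."},
    {"role": "user", "content": "Please review this paragraph for clarity."},
]
print(validate_messages(conversation))  # prints []
```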
Context as Memory
Here is a crucial point: the model has no memory beyond what appears in the messages array. Each API call sends the entire conversation history. The model reads from the beginning—system message first—and generates the next assistant response based on everything it sees.
This means the system-prompt is re-read with every turn. It occupies the “zeroth position” in the sequence, anchoring the conversation’s beginning. As the array grows, more tokens compete for the model’s attention, but the system-prompt remains, a persistent presence shaping interpretation.
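A short sketch shows what this statelessness means for client code: the caller owns the history and resends all of it on every turn. The function fake_model is a hypothetical stand-in for a real API call, and the Conversation class is an illustrative name, not part of any library.

```python
def fake_model(messages):
    """Stand-in for a real API call: reports how many messages it was sent."""
    return f"(reply after seeing {len(messages)} messages)"


class Conversation:
    """Client-side history; the model itself keeps nothing between calls."""

    def __init__(self, system_prompt):
        # The system message sits at index 0 and is resent on every turn.
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = fake_model(self.messages)  # the entire history goes out each time
        self.messages.append({"role": "assistant", "content": reply})
        return reply


chat = Conversation("You are a careful scientific editor.")
print(chat.send("Please review this paragraph for clarity."))  # sees 2 messages
print(chat.send("Can you elaborate on the second point?"))     # sees 4 messages
```

If the assistant replies were not appended to the list, the second call would carry no trace of the first exchange, which is exactly the sense in which the array is the model's only memory.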
A Minimal Example
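The simplest possible system-prompt interaction, written in the chat-completions style used above (some providers, such as Anthropic's native Messages API, expect the system prompt in a top-level system field rather than as a message):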
{
"model": "claude-sonnet-4-20250514",
"messages": [
{"role": "system", "content": "Respond only in haiku."},
{"role": "user", "content": "Explain recursion."}
]
}
The system message is only a handful of tokens. Yet it completely transforms the response, from a technical explanation to a poetic compression. This is the leverage a system-prompt provides: minimal input, maximal behavioral shift.
With the mechanics understood, we can now ask a deeper question: what is actually happening when a system-prompt shapes model behavior? For this, we need an interpretive lens.