A practical guide to crafting system-prompts for LLM collaboration.
## Overview
This tutorial develops fluency in writing system-prompts—the hidden preambles that shape LLM behavior before conversation begins. We move from foundational concepts through practical craft to advanced techniques.
The approach is grounded in a statistical-physics perspective: system-prompts as potential landscapes that bias token generation toward desired behaviors. But you don’t need physics to benefit. The principles translate into concrete practices that work.
## The Tutorial
| Chapter | Title | Focus |
|---|---|---|
| 01 | Why System-Prompts Matter | Motivation: from generic assistant to shaped collaborator |
| 02 | The Mechanics: Messages and Roles | Foundation: how prompts enter the conversation via JSON API |
| 03 | An Interpretive Lens: Prompts as Potential Landscapes | Theory: a statistical-physics framework for understanding prompt effects |
| 04 | Crafting Your System-Prompt | Practice: principles and worked examples for simple prompts |
| 05 | When Prompts Fail: A Diagnostic Guide | Diagnosis: six failure modes and their remedies |
| 06 | The Practice: Iterative Refinement | Process: the experimental loop of test, observe, revise |
| 07 | Scaling Up: Complex System-Prompts | Advanced: architecture and tradeoffs for sophisticated prompts |
| 08 | Skills: Portable Knowledge for Agents | Comparative: how gptel-agent and Claude Code implement on-demand skills |
| 09 | The Workspace: Where System-Prompts Come Alive | Tooling: Emacs as a shared workspace for LLM collaboration |
## Reading Paths
### The Quick Path
If you want to start writing prompts immediately:
- 01-why — understand the stakes
- 04-craft — learn the principles
- 05-failures — know the pitfalls
### The Complete Path
For a thorough understanding, read sequentially from 01-why through 09-emacs-llm. Each chapter builds on the previous.
### The Practitioner’s Path
If you’re ready to build a complex prompt for a real collaboration:
- Skim 07-scaling to understand the architecture
- Study the System-Prompt Engineering framework
- Test and refine using 06-iterate as your guide
If you want to set up the workspace where system-prompt craft happens:
- 08-skills — understand how prompts become deployable skills
- 09-emacs-llm — build the Emacs workspace where it all converges
The previous chapters developed a craft: how to write system-prompts that shape LLM behavior with precision. But craft requires a medium. A sculptor needs clay, not a description of clay. This chapter concerns the environment where system-prompts are authored, tested, deployed, and refined—the workspace in which the collaboration actually unfolds.
The thesis is specific: Emacs, a programmable text environment, has become the most capable platform for operationalizing the system-prompt craft developed in this tutorial. Not because Emacs is trendy (it is nearly fifty years old), but because its architecture—transparent, extensible, text-native—aligns with what LLM collaboration demands. The medium shapes the practice.
...
Tools let an agent act. System-prompts shape how it thinks. But there is a gap between the two: domain knowledge that is too specific for a system-prompt yet too procedural for a tool. A commit workflow. A code review checklist. A deployment runbook. Knowledge that says not “here is a capability” but “here is how to do this particular thing well.”
This is what skills address. A skill is a packet of specialized instructions that an agent loads on demand—expanding its competence for a specific task without permanently consuming context. If system-prompts are the agent’s character and tools are its hands, skills are its training manuals, pulled from the shelf when the task requires them.
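A minimal sketch of the load-on-demand idea (the `skills/` layout and `load_skill` helper are hypothetical illustrations, not the actual gptel-agent or Claude Code mechanism): the skill lives on disk and enters the context only for the task that needs it.

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # hypothetical layout: skills/<name>.md

def load_skill(name: str) -> dict:
    """Read a skill file and wrap it as a message injected for this task only."""
    text = (SKILLS_DIR / f"{name}.md").read_text()
    return {"role": "system", "content": f"Skill: {name}\n\n{text}"}

# The commit workflow enters the context now, for this task, and can be
# dropped afterward; it never bloats the permanent system-prompt.
messages = [
    {"role": "system", "content": "You are a careful coding agent."},
    load_skill("commit-workflow"),
    {"role": "user", "content": "Commit the staged changes."},
]
```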
...
The prompts we’ve crafted so far have been compact—under 100 tokens, focused on a single role with a few behavioral constraints. This suffices for many purposes. But some collaborations demand more: explicit priority orderings, detailed epistemic standards, nuanced interaction patterns that can’t compress into a sentence or two.
This section explores when and how to scale up, using a substantial real-world prompt as our case study.
### When Simple Isn’t Enough

A simple prompt fails to meet your needs when you observe:
...
You have used an LLM. You typed a question, received an answer—perhaps useful, perhaps generic. The exchange felt transactional: you asked, it responded, the conversation drifted wherever momentum carried it.
But there is another mode of interaction. Before your first message, before you even arrive, a hidden preamble can shape everything that follows. This is the *system-prompt*—a message the model receives as context, yet which you, as user, never see in the conversation flow. It establishes who the model is, how it should behave, what it should prioritize, and what it should avoid.
...
Communication with an LLM occurs through an API—typically a JSON-based protocol that structures every interaction. Understanding this structure demystifies what happens when you “talk” to a model.
### The Request Anatomy

A typical API request contains:
- Endpoint: The URL you’re addressing (e.g., `/v1/chat/completions`)
- Headers: Authentication and content-type metadata
- Body: The payload containing your actual request

The body carries three essential components:
- `model`: Which LLM you’re addressing
- `parameters`: Generation settings (temperature, max tokens, etc.)
- `messages`: The conversation itself

### The Messages Array

The messages array is where interaction lives. It is an ordered list of message objects, each with a `role` and `content`:
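For concreteness, here is what such a body might look like as a Python literal (field names follow the common chat-completions shape; the model name is illustrative):

```python
request_body = {
    "model": "example-model",  # which LLM you're addressing (illustrative name)
    "temperature": 0.7,        # a generation parameter
    "messages": [
        # The system-prompt arrives as the first message, before the user speaks.
        {"role": "system", "content": "You are a concise technical editor."},
        {"role": "user", "content": "Tighten this paragraph without losing meaning."},
    ],
}
```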
...
You need not read this section to write effective system-prompts. But if you wish to understand what you’re doing—to develop intuition rather than follow recipes—a conceptual framework helps. We offer one drawn from statistical physics.
### Token Generation as Random Walk

An LLM generates text one token at a time. At each step, it computes a probability distribution over all possible next tokens, then samples from that distribution. The sequence of choices traces a path through a high-dimensional space of possibilities.
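As a sketch of that sampling step (a toy vocabulary with hand-picked logits, not a real model):

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """One generation step: softmax over logits, then sample."""
    scaled = {tok: l / temperature for tok, l in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(l - peak) for tok, l in scaled.items()}  # stable softmax
    tokens = list(weights)
    return random.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Toy vocabulary; a real model scores on the order of 100k tokens per step.
path = [sample_next_token({"the": 2.0, "a": 1.0, "its": 0.2}) for _ in range(5)]
print(path)  # e.g. ['the', 'the', 'a', 'the', 'its']
```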
...
Theory informs; practice teaches. Here we construct system-prompts from first principles, developing intuition through concrete examples.
### The Core Principles

#### Economy

The context window is finite. Your system-prompt competes with conversation history for the model’s attention. Every unnecessary token dilutes the signal. Be concise—not terse, but dense. Say what matters; omit what doesn’t.
#### Semantic Density

Maximize meaning per token. Prefer “Respond with scientific rigor” over “Make sure your responses are accurate and based on scientific evidence.” The first is five tokens; the second is twelve. Both convey similar intent, but the first leaves more room for conversation.
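A rough way to keep yourself honest while drafting (word count as a crude proxy; exact counts require the provider’s tokenizer):

```python
verbose = "Make sure your responses are accurate and based on scientific evidence."
dense = "Respond with scientific rigor."

# Word count is a crude but serviceable proxy for token count in English prose.
for prompt in (verbose, dense):
    print(f"{len(prompt.split()):2d} words | {prompt}")
```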
...
A system-prompt that works perfectly on first draft is rare. More often, you’ll observe behaviors that diverge from your intent. This section catalogs common failure modes and their remedies—a diagnostic toolkit for prompt refinement.
### Failure Mode 1: Conflicting Instructions

#### Symptoms

The model oscillates between behaviors, produces incoherent compromises, or seems to ignore parts of your prompt. Responses feel inconsistent across turns.
#### Cause

Your prompt asks for incompatible things. “Be thorough and comprehensive” conflicts with “Keep responses under 100 words.” “Always ask clarifying questions” conflicts with “Respond immediately to requests.” The probability landscape has multiple competing minima; the model bounces between them.
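One common remedy, sketched below with illustrative prompt text: state which goal wins when they collide, so the landscape has a single minimum.

```python
# Conflicting: two goals, no stated priority; the model bounces between them.
conflicting = "Be thorough and comprehensive. Keep responses under 100 words."

# Reconciled: same goals, explicit ordering plus an escape hatch.
reconciled = (
    "Keep responses under 100 words. When brevity and thoroughness conflict, "
    "prefer brevity and offer to expand on request."
)
```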
...
A system-prompt is not written; it is evolved. The process resembles experimental science more than engineering—you form hypotheses, test them empirically, and refine based on observation. This section describes the practice.
### The Experimental Loop

#### 1. Draft

Begin with a candidate prompt based on the principles in Crafting Your System-Prompt. Don’t aim for perfection; aim for a reasonable starting point. Explicit is better than clever. Clear is better than complete.
#### 2. Test

Engage in representative conversations. Don’t just try your best-case scenarios—probe the edges. Ask questions that might reveal weaknesses. Push into areas where you’re uncertain how the model will behave. Vary your interaction style.
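A minimal harness for that kind of probing might look like the following (assumes the OpenAI Python client and an illustrative model name; any chat API with a messages array works the same way):

```python
from openai import OpenAI  # assumes: pip install openai, OPENAI_API_KEY set

client = OpenAI()
system_prompt = "You are a concise technical editor."

# Probe the edges, not just the happy path.
test_cases = [
    "Summarize this paragraph in one sentence.",
    "Write me a 2,000-word essay.",            # conflicts with brevity?
    "What's your opinion on tabs vs spaces?",  # off-role request
]

for question in test_cases:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; use whatever model you test against
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    print(f"Q: {question}\nA: {response.choices[0].message.content}\n")
```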
...