Understanding Chat Templates

Introduction

If you have worked with LLMs, you have probably worked with lists of messages. When you send a request to an LLM, the request is called a user message and the response is called an assistant message. A conversation is then a list that alternates between user messages and assistant messages. However, an LLM takes one text as input and outputs another text, so you might wonder what input text is actually being passed to the LLM. So what’s going on?

A user message will be something like this:

user_message = [{"role": "user", "content": "Hi!"}]

You can pass the user message to the LLM to obtain an assistant message (it will take a few minutes the first time since it will download a small LLM):

The assistant message will be something like this:

assistant_message = [{"role": "assistant", "content": "Hello! How can I assist you today?"}]

To have a conversation, you can pass it a list of messages:

messages = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What's the capital of France?"},
]

How do we convert a list of messages to an input text?

The first time I encountered this was with Thomas Capelle from W&B. Our conversation went something like this:

Thomas: An LLM has no idea about user or assistant messages, it is just an autocompletion program.

Alonso: So what’s going on with the list of messages I’m sending it? Are they just concatenated or what?

Thomas: No, no, no, it’s more complex than that, especially when you work with tools. You should check the model’s chat template.

Alonso: How do I do that?

Thomas: It’s stored in the tokenizer, let me show you.

The Chat Template

The chat template takes a list of messages as input (and tools, but we will talk about those later) and converts them into a single string. Let’s see what it does with an example.

You should get the following:

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi!<|im_end|>

Notice that when you don’t provide a system message, the chat template of Qwen/Qwen2.5-0.5B-Instruct adds a default one (“You are Qwen, created by Alibaba Cloud. You are a helpful assistant.”). Perhaps you want to change that to a different system message:
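To make the default system message less mysterious, here is a rough imitation in plain Python. It is a hand-written sketch that only reproduces the simple case shown here, not the model’s real template. The names `render_chatml` and `default_system` are made up for this example.

```python
# Hand-written imitation of the Qwen2.5 output shown above (not the real template).
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, default_system=DEFAULT_SYSTEM):
    """Turn a list of {"role", "content"} dicts into one prompt string."""
    # If the conversation has no system message, prepend the default one.
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": default_system}] + messages
    # Wrap every message in <|im_start|>role ... <|im_end|> markers.
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


user_message = [{"role": "user", "content": "Hi!"}]
default_prompt = render_chatml(user_message)

# Supplying your own system message replaces the default one.
custom_prompt = render_chatml(
    [{"role": "system", "content": "You are a helpful assistant."}] + user_message
)
print(custom_prompt)
```

With the real tokenizer the idea is the same: put a `{"role": "system", ...}` message first in the list you pass to `tokenizer.apply_chat_template`.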

We can also see how the chat template converts a conversation like this one:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What's the capital of France?"},
]

Each model has its own chat template. Let’s take a look at the Mistral-7B-v0.3 chat template:

You should get:

<s>[INST] Hi! [/INST]Hello! How can I assist you today?</s>[INST] What's the capital of France? [/INST]

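To see how different this format is from the previous one, here is a hand-written sketch that rebuilds the string above from the same list of messages. It only covers user and assistant turns and is not the model’s real template. The function name `render_mistral_like` is made up for this example.

```python
def render_mistral_like(messages):
    """Imitate the [INST] format shown above for user/assistant turns only."""
    prompt = "<s>"  # beginning-of-sequence token
    for message in messages:
        if message["role"] == "user":
            prompt += f"[INST] {message['content']} [/INST]"
        elif message["role"] == "assistant":
            # Each finished assistant turn is closed with the end-of-sequence token.
            prompt += f"{message['content']}</s>"
        else:
            raise ValueError(f"role not handled in this sketch: {message['role']}")
    return prompt


messages = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What's the capital of France?"},
]
mistral_prompt = render_mistral_like(messages)
print(mistral_prompt)
```

Same messages, completely different string: no role names, no `<|im_start|>` markers, just `[INST]` blocks.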
After sending a user message you expect an assistant message, so you can help the model by basically saying “Now, it’s your turn!”: the prompt ends with the opening of an assistant turn. This is so useful that it has been incorporated into the chat template itself.
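To make the idea concrete, here is a small sketch in plain Python of what “Now, it’s your turn!” amounts to for a ChatML-style prompt. It is a simplified imitation, not the real template, and `render_chatml` is a made-up name.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Simplified ChatML rendering with an optional "your turn" cue."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn and leave it unfinished: whatever the model
        # writes next becomes the assistant message.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
without_cue = render_chatml(messages)
with_cue = render_chatml(messages, add_generation_prompt=True)
print(with_cue)
```

The only difference between the two strings is the trailing `<|im_start|>assistant` line.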

The instruction is:

tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

You should get:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi!<|im_end|>
<|im_start|>assistant

What’s interesting to me is that when you suggest forcing a tool call, or forcing a specific tool, people look at you as if you were crazy, even though it’s exactly the same idea: you write the beginning of the assistant’s turn for the model. It’s probably what OpenAI already does with tool_choice="required" and tool_choice={"type": "function", "function": {"name": "my_function"}}.

Tool calls

The chat template also handles tool calls. That means that we can provide a list of tools (let’s do one as an example):

tools = [
    {
        "type": "function",
        "function": {
            "name": "Python_REPL",
            "description": "A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
            "parameters": {
                "properties": {
                    "python_code": {
                        "description": "Valid python command.",
                        "type": "string",
                    }
                },
                "required": ["python_code"],
                "type": "object",
            },
        },
    }
]

The instruction is:

tokenizer.apply_chat_template(messages, tokenize=False, tools=tools, add_generation_prompt=True)

You should get the following:

<|im_start|>system
You are a helpful assistant.

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "Python_REPL", "description": "A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.", "parameters": {"properties": {"python_code": {"description": "Valid python command.", "type": "string"}}, "required": ["python_code"], "type": "object"}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
What's 2 to the power of 5?<|im_end|>
<|im_start|>assistant

Quite a complex string indeed.
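The string looks less intimidating once you notice that it is mostly fixed text with the tools dumped as JSON in the middle. Here is a hand-written sketch that rebuilds the system block above. It imitates the output and is not the real template, and `render_tools_system_block` is a made-up name.

```python
import json


def render_tools_system_block(system_content, tools):
    """Rebuild the system block shown above: fixed text plus one JSON line per tool."""
    lines = [
        "<|im_start|>system",
        system_content,
        "",
        "# Tools",
        "",
        "You may call one or more functions to assist with the user query.",
        "",
        "You are provided with function signatures within <tools></tools> XML tags:",
        "<tools>",
    ]
    # Each tool is serialized to JSON on its own line.
    lines += [json.dumps(tool) for tool in tools]
    lines += [
        "</tools>",
        "",
        "For each function call, return a json object with function name and "
        "arguments within <tool_call></tool_call> XML tags:",
        "<tool_call>",
        '{"name": <function-name>, "arguments": <args-json-object>}',
        "</tool_call><|im_end|>",
    ]
    return "\n".join(lines) + "\n"


tools = [
    {
        "type": "function",
        "function": {
            "name": "Python_REPL",
            "description": "A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
            "parameters": {
                "properties": {
                    "python_code": {"description": "Valid python command.", "type": "string"}
                },
                "required": ["python_code"],
                "type": "object",
            },
        },
    }
]
system_block = render_tools_system_block("You are a helpful assistant.", tools)
print(system_block)
```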

Other models handle tools differently. Take NousResearch/Hermes-3-Llama-3.1-8B, for example:

You should get the following:

<|begin_of_text|><|im_start|>system
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "Python_REPL", "description": "Python_REPL(python_code: str) - A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.

    Args:
        python_code(str): Valid python command.", "parameters": {"properties": {"python_code": {"description": "Valid python command.", "type": "string"}}, "required": ["python_code"], "type": "object"}} </tools>Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}}
For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</tool_call><|im_end|>
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's 2 to the power of 5?<|im_end|>
<|im_start|>assistant

I don’t like this chat template: our messages end up with two different, consecutive system prompts. I much prefer the previous chat template, from Qwen/Qwen2.5-0.5B-Instruct.
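Both templates ask the model to wrap its calls in <tool_call></tool_call> tags, so the other half of the job is reading those calls back out of the generated text. Here is a minimal sketch of that step. The reply string is written by hand for this example (it is not real model output), and `parse_tool_calls` is a made-up helper.

```python
import json
import re

# Capture whatever sits between <tool_call> and </tool_call>, ignoring
# the surrounding whitespace and newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(assistant_text):
    """Return the list of {"name", "arguments"} dicts found in the reply."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(assistant_text)]


# Hypothetical model reply, written by hand for this example.
reply = (
    "<tool_call>\n"
    '{"name": "Python_REPL", "arguments": {"python_code": "print(2 ** 5)"}}\n'
    "</tool_call>"
)
calls = parse_tool_calls(reply)
print(calls)
```

A real parser would also need to deal with malformed JSON, which this sketch simply lets raise an exception.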

Thinking mode

A recent addition is the enable_thinking argument in some new reasoning models, where the model “thinks” between the XML tags <think>...</think>. For example, Qwen/Qwen3-4B reasons by default, but you can turn this off if you want to. The instruction is:

tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)

You should get:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
How many r's in strawberry?<|im_end|>
<|im_start|>assistant
<think>

</think>

I saw some tweets (with many retweets) claiming to have found a “hack” to stop the model from thinking: appending <think>\n\n</think>. That is exactly what the chat template does!
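In other words, turning thinking off just means the prompt already contains an empty, closed think block. Here is a tiny sketch of that logic in plain Python. It imitates the output shown above, and `generation_prompt` is a made-up name.

```python
def generation_prompt(enable_thinking=True):
    """Imitate the end of the prompt shown above."""
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is False:
        # Pre-fill an empty reasoning block so the model goes straight
        # to the answer instead of opening its own <think> section.
        prompt += "<think>\n\n</think>\n\n"
    return prompt


thinking_on = generation_prompt()
thinking_off = generation_prompt(enable_thinking=False)
print(repr(thinking_off))
```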

How does the chat template handle this?

The chat template is written in Jinja, a templating language commonly used in web development. You can see the chat template with the following command:

You should see the following:

{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}

Once you have spent some time understanding what’s going on, you can write your own chat template if you want to change this behavior.
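One way to build that understanding is to translate a branch of the template into plain Python. Jinja lets the template call Python string methods, so the part that separates the reasoning from the answer in an assistant message carries over almost word for word. The function name `split_reasoning` and the sample message are made up for this example.

```python
def split_reasoning(content):
    """Python version of the template branch that handles '</think>' in content."""
    reasoning_content = ""
    if "</think>" in content:
        # Text before </think>, minus the opening <think> tag and stray newlines.
        reasoning_content = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        # Text after </think> is the visible answer.
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning_content, content


# Hand-written assistant message, for illustration only.
raw = "<think>\nLet me count the letters.\n</think>\n\nThere are 3."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```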