Understanding LLM Memory

Categories: code, analysis
Author: Alonso Silva
Published: June 28, 2025

Despite what some people think (even some researchers I’ve met), language models don’t have any memory. The confusion comes, I suppose, from the fact that most people interact with models through some user interface like www.chatgpt.com, which handles the memory for them (incidentally, in some interesting ways as explained below).

Let’s explore this further using transformers_js_py, which allows us to run language models in the browser.

Let’s download a small model and its tokenizer (it takes a few minutes):
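A minimal sketch of what that can look like (the pipeline-based loading and the exact checkpoint, onnx-community/Qwen2.5-0.5B-Instruct, are assumptions here; any small chat model from the Hugging Face Hub works):

from transformers_js_py import import_transformers_js

# transformers_js_py bridges to transformers.js, so everything below runs in the browser
transformers = await import_transformers_js()
pipeline = transformers.pipeline

# the text-generation pipeline downloads both the model weights and its tokenizer
generator = await pipeline("text-generation", "onnx-community/Qwen2.5-0.5B-Instruct")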

We can ask the question "What's 2 + 2?" and get the following response:
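Something along these lines (the generation options and the exact output format are assumptions):

messages = [{"role": "user", "content": "What's 2 + 2?"}]
output = await generator(messages, {"max_new_tokens": 128})
print(output[0]["generated_text"][-1]["content"])  # the model answers that 2 + 2 = 4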

If we then ask the model to add 2 more to the result, we get the following response:
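Again as a sketch, with no history passed in (the exact wording of the follow-up question is an assumption):

messages = [{"role": "user", "content": "Add 2 more to the result."}]
output = await generator(messages, {"max_new_tokens": 128})
print(output[0]["generated_text"][-1]["content"])  # the model cannot tell which "result" we mean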

The model has no recollection of our previous conversation. Now, that’s completely expected, since we haven’t given it any way to access that information.

Handling Memory

The simplest way to handle memory is to provide our previous conversation within a list of messages:
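For example (the wording of the stored assistant reply is illustrative):

messages = [
    {"role": "user", "content": "What's 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4"},
    {"role": "user", "content": "Add 2 more to the result."},
]
output = await generator(messages, {"max_new_tokens": 128})
print(output[0]["generated_text"][-1]["content"])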

With all these messages we obtain the following response:

That works! The problem with this approach is that we need to be mindful of the model’s context length. We could, for example, keep only the last 10 messages or so, plus the system message if there is one (in this example, we don’t have one), as sketched below.
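A minimal sketch of that truncation (the helper name and the cut-off of 10 come straight from the sentence above):

def truncate_history(messages, max_messages=10):
    # keep the system message (if any) plus only the most recent messages
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-max_messages:]
    return system + recent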

More sophisticated approaches could store the messages and their responses in a vector database and retrieve the most closely related ones, as in the sketch below. Similarly, we could store the information as a graph in a graph database and retrieve the most closely related nodes and edges.
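As a toy sketch of the vector-database idea (the hashed bag-of-words embedding is only a stand-in for a real embedding model, and the helper names are mine):

import numpy as np

def embed(text, dim=256):
    # stand-in embedding; in practice use a sentence-embedding model
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

memory = []  # (embedding, message) pairs acting as a tiny vector store

def remember(message):
    memory.append((embed(message["content"]), message))

def recall(query, k=3):
    # return the k stored messages most similar to the query (cosine similarity)
    q = embed(query)
    ranked = sorted(memory, key=lambda pair: float(np.dot(q, pair[0])), reverse=True)
    return [message for _, message in ranked[:k]]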

ChatGPT’s memory feature is very interesting because it uses memory as a tool. Let’s take a look at that.

We can define the tools as follows:

tools = [
    {
        "type": "function",
        "function": {
            "name": "biography",
            "description": "The biography tool allows you to persist user information across conversations. Use this tool and write whatever user information you want to remember. The information will appear in the model set context below in future conversations.",
            "parameters": {
                "properties": {
                    "user_information": {
                        "description": "Information from the user you want to remember across conversations.",
                        "type": "string",
                    }
                },
                "required": ["user_information"],
                "type": "object",
            },
        },
    }
]

We can then store the content of the tool call in a text file and provide it in the context of future conversations.
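A minimal sketch of that loop (the file name, the helper names, and the exact shape of the tool call are assumptions; here the arguments arrive as a JSON string in the usual function-calling format):

import json
from pathlib import Path

MEMORY_FILE = Path("model_set_context.txt")

def handle_tool_call(tool_call):
    # append whatever the model asked to remember to the text file
    arguments = json.loads(tool_call["function"]["arguments"])
    with MEMORY_FILE.open("a") as f:
        f.write(arguments["user_information"] + "\n")

def build_system_message():
    # inject the stored information into the context of a new conversation
    remembered = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    return {
        "role": "system",
        "content": "Model set context (information about the user):\n" + remembered,
    }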

I really like that idea. It’s very simple and clean!