In this article I discuss how to incorporate memory into LLMs and how to do it seamlessly with the LangChain framework. One inherent limitation of modern LLMs is their stateless nature: each call to an LLM is treated as an independent interaction. This lack of memory poses a challenge when it comes to maintaining context across multiple turns of a conversation.

Overall approach

Since LLMs are stateless, the most common way to keep track of previous conversation turns is to include them as context in every new call to the LLM. In the future there may be models that keep track of memory natively, but with ChatGPT and GPT-4 this is the most common approach today.

Frameworks like LangChain, however, allow developers to abstract this limitation away and make memory handling seamless. The framework manages the memory and “automatically” includes it in every call.
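
Before diving into LangChain's abstractions, here is a minimal sketch of what the manual approach looks like, assuming an OpenAI completion model: the full history is simply prepended to the prompt on every call. The chat helper and prompt format below are my own illustrative choices, not part of any library.

from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
history = []  # (speaker, text) pairs from previous turns

def chat(user_input: str) -> str:
    # Prepend all previous turns so the stateless LLM "remembers" them
    past = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    prompt = f"{past}\nHuman: {user_input}\nAI:"
    reply = llm(prompt)
    history.append(("Human", user_input))
    history.append(("AI", reply.strip()))
    return reply

print(chat("My name is Maksud"))
print(chat("What is my name?"))

This is exactly the bookkeeping that LangChain's memory classes take off your hands.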

In this article I’ll explore examples of the basic LLM memory abstractions:

  • ChatMessageHistory
  • ConversationBufferMemory
  • ConversationBufferWindowMemory
  • ConversationSummaryBufferMemory
  • ConversationEntityMemory

LangChain abstractions

Let’s review several classes that I have used over the last few weeks.

The core class behind all of these abstractions is ChatMessageHistory, which is simply a collection of past user and AI messages.


from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()
history.add_user_message("hello, chatgpt")
history.add_ai_message("hello")

print(history.messages)

Output is:

[HumanMessage(content='hello, chatgpt', additional_kwargs={}, example=False),
 AIMessage(content='hello', additional_kwargs={}, example=False)]

ConversationBufferMemory

ConversationBufferMemory is one abstraction above ChatMessageHistory and can be plugged into OpenAI models directly. It automatically records the entire conversation with the model.

from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory


# Pre-populate the memory with a user message
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("My name is Maksud")

conversation_with_memory = ConversationChain(
    llm=OpenAI(temperature=0),
    verbose=True,
    memory=memory
)

conversation_without_memory = ConversationChain(
    llm=OpenAI(temperature=0),
    verbose=True
)

print(conversation_without_memory.predict(input="What is my name?"))

print(conversation_with_memory.predict(input="What is my name?"))

The output below shows the memoryless model first and the model with memory second. A nice touch from the model, reminding us that it doesn’t store personal information :)

> I apologize, but as an AI language model,
I don't have access to personal information...


> Your name is Maksud.

ConversationBufferWindowMemory

This class works like ConversationBufferMemory, but it only passes a window of the last k interactions to the LLM.

Let’s take a look at an example:

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last k=2 interactions in memory
memory = ConversationBufferWindowMemory(k=2)
memory.save_context({"input": "a=1"}, {"output": "The value of the variable a is 1."})
memory.save_context({"input": "b=5"}, {"output": "The value of the variable b is 5."})
memory.save_context({"input": "c=11"}, {"output": "The value of the variable c is 11."})

conversation_with_window_memory = ConversationChain(llm=OpenAI(temperature=0), verbose=True, memory=memory)

print(conversation_with_window_memory.predict(input="what is value of c?"))
print(conversation_with_window_memory.predict(input="what is value of a?"))

In the first case the LLM knows the value of c because it was recent, but in the second case it has “forgotten” the value of a, since it falls outside the buffer window of 2.

> The value of the variable "c" is 11.
Is there anything specific you would like to know or do with this information?


> Based on the information provided so far,
there is no mention of the variable "a" or its value.

ConversationSummaryBufferMemory

The message history sent to the LLM accumulates over time and can become expensive token-wise. A nice trick to overcome this is to summarise the previous chats: the summary retains most of the information while compressing the content passed to the LLM on every call.

LangChain’s ConversationSummaryBufferMemory does exactly that: it keeps a buffer of recent messages and uses the LLM itself to summarise the older ones.
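
Here is a short sketch of how it can be used. The max_token_limit value below is an arbitrary choice of mine; it controls when older messages get folded into the summary.

from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryBufferMemory

# Older interactions are summarised once the buffer exceeds max_token_limit tokens
memory = ConversationSummaryBufferMemory(llm=OpenAI(temperature=0), max_token_limit=40)
memory.save_context({"input": "hi, my name is Maksud"}, {"output": "Hello Maksud, how can I help?"})
memory.save_context({"input": "I am researching LLM memory"}, {"output": "That sounds like an interesting topic."})

# The history now contains a generated summary plus the most recent messages
print(memory.load_memory_variables({}))

As with the other memory classes, it can be passed to a ConversationChain via the memory argument.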

ConversationEntityMemory

A more advanced version of summarisation-based LLM memory is ConversationEntityMemory. This class helps the model keep track of multiple entities in the conversation.

from langchain.llms import OpenAI
from langchain.memory import ConversationEntityMemory

llm = OpenAI(temperature=0)

memory = ConversationEntityMemory(llm=llm, return_messages=True)
_input = {"input": "Alice and Bob talk over the phone about their gym class"}
memory.load_memory_variables(_input)
memory.save_context(
    _input,
    {"output": "Great to know about Alice and Bob. Sounds like an exciting class"}
)

print(memory.load_memory_variables({"input": 'what is Bob doing?'}))

LangChain uses the LLM to extract entities from the conversation, in this case Alice and Bob, and stores information about each of them.

{'history':
[HumanMessage(content='Alice and Bob talk over the phone about their gym class', additional_kwargs={}),
  AIMessage(content='Great to know about Alice and Bob. Sounds like an exciting class', additional_kwargs={})],
 'entities': {
   'Alice': 'Alice is talking to Bob about their gym class.',
   'Bob': 'Bob is talking to Alice about their gym class.',
   }}

Other LangChain memory classes

In this article I covered some of the basic LangChain classes for emulating memory in LLMs. Some other classes worth considering are:

  • VectorStore-Backed Memory (a short sketch follows this list)
  • Memory backed by databases such as DynamoDB, Cassandra, Zep, and others
  • Motörhead Memory
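
As an illustration of the first option, here is a rough sketch of a vector-store-backed memory using VectorStoreRetrieverMemory with a FAISS index. The embedding dimension, the k value and the exact wiring are assumptions based on common LangChain usage and may differ between versions.

import faiss
from langchain.docstore import InMemoryDocstore
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import FAISS

# An empty FAISS index; 1536 is the dimension of OpenAI embeddings
embeddings = OpenAIEmbeddings()
index = faiss.IndexFlatL2(1536)
vectorstore = FAISS(embeddings.embed_query, index, InMemoryDocstore({}), {})

# Retrieve the most relevant past snippets rather than the most recent ones
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
memory = VectorStoreRetrieverMemory(retriever=retriever)

memory.save_context({"input": "My favourite sport is football"}, {"output": "Good to know."})
memory.save_context({"input": "I work as a data scientist"}, {"output": "Noted."})

# Returns the stored snippet most relevant to the question
print(memory.load_memory_variables({"prompt": "what is my favourite sport?"}))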

If you are interested, I am happy to connect with you to talk about the right memory class for your LLM project.


If you like what I write, consider subscribing to my newsletter, where I share weekly practical AI tips and write about my thoughts on AI and experiments.


This article reflects my personal views and opinions only, which may differ from those of the companies and employers I am associated with.