How can I save a chat session with a local LLM via Ollama, and make the LLM remember it in the next session after a restart?
To save a chat session with a local LLM (e.g., via Ollama) and make the model "remember" it across sessions, you need a mechanism for storing the conversation history and reloading it when a new session starts. Ollama itself does not provide built-in memory persistence, but you can achieve the same effect by managing the conversation history externally and including it in future prompts.
Here’s a step-by-step guide on how to save and restore chat sessions with a local LLM using Ollama:
1. Understand the Problem
LLMs served via Ollama process each request independently and do not inherently "remember" past interactions unless you explicitly provide that context. To simulate memory:
- Store the conversation history in a persistent format (e.g., a file or database).
- Reload the conversation history at the start of a new session and append it to the prompt before sending it to the model.
2. Approach: Save and Restore Chat History
Define a Persistent Storage Mechanism
You can store the conversation history in:
- A JSON file: Simple and human-readable.
- A database: For more complex use cases or larger-scale applications.
- Local storage: Use Python's `pickle` or similar libraries for serialization.
For simplicity, we’ll use a JSON file to store the chat history.
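(If you prefer `pickle`, a minimal equivalent could look like the sketch below; the `chat_history.pkl` filename is just illustrative.)

```python
import pickle

PICKLE_FILE = "chat_history.pkl"  # hypothetical filename

def save_history_pickle(history):
    # Serialize the message list to a binary file
    with open(PICKLE_FILE, "wb") as f:
        pickle.dump(history, f)

def load_history_pickle():
    # Return an empty history if no file exists yet
    try:
        with open(PICKLE_FILE, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []
```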
3. Implementation Steps
Step 1: Install Required Libraries
If you’re using Python to interact with Ollama, ensure you have the necessary libraries installed:
```
pip install requests
```
Step 2: Create a Script to Manage Chat History
Below is an example Python script that saves and restores chat sessions:
```python
import os
import json
import requests

# Path to the chat history file
HISTORY_FILE = "chat_history.json"

# Load chat history from file
def load_chat_history():
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE, "r") as f:
            return json.load(f)
    return []

# Save chat history to file
def save_chat_history(history):
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f)

# Function to interact with the LLM via Ollama
def chat_with_model(prompt, model_name="llama2"):
    # Load previous chat history
    chat_history = load_chat_history()

    # Append the user's input to the history
    chat_history.append({"role": "user", "content": prompt})

    # Combine the history into a single string
    full_prompt = "\n".join(
        f"{msg['role'].capitalize()}: {msg['content']}" for msg in chat_history
    ) + "\nModel:"

    # Send the full prompt to the model; stream=False makes Ollama return
    # a single JSON object instead of a stream of chunks
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": model_name,
        "prompt": full_prompt,
        "stream": False,
    })
    response.raise_for_status()

    # Extract the model's response
    model_response = response.json()["response"]

    # Append the model's response to the history
    chat_history.append({"role": "model", "content": model_response})

    # Save the updated chat history to file
    save_chat_history(chat_history)

    return model_response

# Example usage
if __name__ == "__main__":
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit"]:
            break
        response = chat_with_model(user_input)
        print(f"Model: {response}")
```
4. How It Works
- Load Chat History: At the start of each call, the chat history is loaded from the `chat_history.json` file (if it exists). This ensures that the model has access to previous conversations.
- Append User Input: Each time the user sends a message, it is appended to the chat history along with its role (`user`).
- Generate Full Prompt: The chat history is combined into a single string, where each message is prefixed with its role (`User:` or `Model:`). This full prompt is sent to the model.
- Save Model Response: After receiving the model's response, it is appended to the chat history along with its role (`model`), and the updated history is saved back to the `chat_history.json` file.
- Persist Across Sessions: Since the chat history is stored in a file, it persists even after the script or Ollama server is restarted. When the script runs again, it reloads the chat history and continues the conversation seamlessly.
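As a side note, recent Ollama versions also expose a structured `/api/chat` endpoint that accepts the message list directly, so you don't have to flatten the history into one prompt string yourself. Below is a sketch of the same flow adapted to it, assuming your Ollama version supports `/api/chat` (which expects the role `assistant` rather than the `model` label used above):

```python
import requests

def chat_with_model_chat_api(chat_history, prompt, model_name="llama2"):
    # /api/chat consumes the message list directly; roles must be
    # "user" / "assistant" instead of the "model" label used above
    chat_history.append({"role": "user", "content": prompt})
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": model_name,
        "messages": chat_history,
        "stream": False,  # return one JSON object instead of a stream
    })
    response.raise_for_status()
    reply = response.json()["message"]["content"]
    chat_history.append({"role": "assistant", "content": reply})
    return reply
```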
5. Example Workflow
First Session
- Start the script and chat with the model:

```
You: Hi, what's your name?
Model: Hello! I'm a large language model created by Meta.
You: What can you do?
Model: I can answer questions, write stories, create emails, and more!
```

- Exit the script. The chat history is saved to `chat_history.json`.

Second Session
- Restart the script. It loads the chat history from `chat_history.json`.
- Continue the conversation:

```
You: Can you summarize our previous conversation?
Model: Sure! In our previous conversation, you asked about my name and capabilities. I mentioned that I'm a large language model created by Meta and can perform tasks like answering questions and writing content.
```
6. Advanced Enhancements
6.1. Truncate Long Histories
If the chat history becomes too long, it may exceed the model's token limit. To handle this:
- Limit the number of messages stored in the history.
- Summarize older parts of the conversation.
Example:

```python
MAX_HISTORY_LENGTH = 10  # Maximum number of messages to keep

def truncate_chat_history(history):
    # Keep only the most recent messages
    return history[-MAX_HISTORY_LENGTH:]
```
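To apply this, call `chat_history = truncate_chat_history(chat_history)` in `chat_with_model` right after loading the history, so only the most recent messages are included in the prompt.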
6.2. Use a Database
For more robust storage, consider using a database like SQLite or PostgreSQL to store chat history. This is especially useful for multi-user applications.
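For illustration, here is a minimal SQLite sketch using Python's built-in `sqlite3` module; the `messages` table schema and the `session_id` column are assumptions for a multi-user setup:

```python
import sqlite3

DB_FILE = "chat_history.db"  # hypothetical filename

def init_db():
    # One row per message; session_id separates different conversations
    with sqlite3.connect(DB_FILE) as conn:
        conn.execute("""CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL
        )""")

def save_message(session_id, role, content):
    with sqlite3.connect(DB_FILE) as conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )

def load_chat_history(session_id):
    # Rebuild the same list-of-dicts shape the JSON version uses
    with sqlite3.connect(DB_FILE) as conn:
        rows = conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]
```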
6.3. Add Timestamps
Enhance the chat history by adding timestamps to each message:
```python
import datetime

# Record when each message was sent
chat_history.append({
    "role": "user",
    "content": prompt,
    "timestamp": datetime.datetime.now().isoformat(),
})
```
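Since the prompt-building code only reads each message's `role` and `content`, extra fields such as `timestamp` are preserved in `chat_history.json` without affecting the prompt sent to the model.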
7. Conclusion
By saving the chat history to a persistent storage mechanism (e.g., a JSON file) and reloading it during subsequent sessions, you can make a local LLM (via Ollama) "remember" past conversations. This approach allows you to simulate memory and maintain continuity across sessions, even after restarting the model or the application.
Key takeaways:
- Store chat history: Use a file, database, or other storage mechanism.
- Reload history: Load the chat history at the start of each session.
- Truncate or summarize: Handle long histories to avoid exceeding token limits.
This method is flexible and can be adapted to various use cases, such as building conversational AI agents, chatbots, or personal assistants.