How do you save a chat session with a local LLM run via Ollama, and make the LLM remember it in the next session after a restart?

To save a chat session with a local LLM (e.g., one run via Ollama) and make the model "remember" it across sessions, you need a mechanism for storing the conversation history and reloading it when a new session starts. Ollama itself does not provide built-in memory persistence, but you can achieve the same effect by managing the conversation history externally and including it in future prompts.

Here’s a step-by-step guide on how to save and restore chat sessions with a local LLM using Ollama:


1. Understand the Problem

LLMs like those run via Ollama process each input independently and do not inherently "remember" past interactions unless explicitly provided with context. To simulate memory:

  • Store the conversation history in a persistent format (e.g., a file or database).
  • Reload the conversation history at the start of a new session and append it to the prompt before sending it to the model.

2. Approach: Save and Restore Chat History

Define a Persistent Storage Mechanism

You can store the conversation history in:

  • A JSON file: Simple and human-readable.
  • A database: For more complex use cases or larger-scale applications.
  • Serialized objects: Python's pickle module, for quick binary serialization (a sketch follows below).

For simplicity, we’ll use a JSON file to store the chat history.
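For reference, the pickle option would look roughly like this as a drop-in replacement for the JSON load/save helpers defined in the next section (note that pickle files are Python-specific and should never be loaded from untrusted sources):

import pickle

PICKLE_FILE = "chat_history.pkl"

def load_chat_history():
    # Return an empty history if no previous session exists
    try:
        with open(PICKLE_FILE, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []

def save_chat_history(history):
    with open(PICKLE_FILE, "wb") as f:
        pickle.dump(history, f)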


3. Implementation Steps

Step 1: Install Required Libraries

If you’re using Python to interact with Ollama, ensure you have the necessary libraries installed:

pip install requests
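If you prefer not to hand-roll HTTP calls, there is also an official Python client (pip install ollama). The rest of this guide uses plain requests so the raw API stays visible, but a single chat turn via the client looks roughly like this (assuming a pulled llama2 model):

import ollama

# One chat turn via the official client; messages use the same
# role/content dicts as the history format below
reply = ollama.chat(model="llama2", messages=[
    {"role": "user", "content": "Hello!"}
])
print(reply["message"]["content"])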

Step 2: Create a Script to Manage Chat History

Below is an example Python script that saves and restores chat sessions:

import os
import json
import requests

# Path to the chat history file
HISTORY_FILE = "chat_history.json"

# Load chat history from file
def load_chat_history():
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE, "r") as f:
            return json.load(f)
    return []

# Save chat history to file (indent=2 keeps the JSON human-readable)
def save_chat_history(history):
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f, indent=2)

# Function to interact with the LLM via Ollama
def chat_with_model(prompt, model_name="llama2"):
    # Load previous chat history
    chat_history = load_chat_history()

    # Append the user's input to the history
    chat_history.append({"role": "user", "content": prompt})

    # Combine the history into a single string
    full_prompt = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in chat_history]) + "\nModel:"

    # Send the full prompt to the model. "stream": False makes Ollama
    # return a single JSON object instead of newline-delimited chunks,
    # so response.json() below works as expected.
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": model_name,
        "prompt": full_prompt,
        "stream": False
    })
    response.raise_for_status()

    # Extract the model's response
    model_response = response.json()["response"]

    # Append the model's response to the history
    chat_history.append({"role": "model", "content": model_response})

    # Save the updated chat history to file
    save_chat_history(chat_history)

    return model_response

# Example usage
if __name__ == "__main__":
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit"]:
            break
        response = chat_with_model(user_input)
        print(f"Model: {response}")

4. How It Works

  1. Load Chat History: At the start of the script, the chat history is loaded from the chat_history.json file (if it exists), so the model has access to previous conversations.
  2. Append User Input: Each time the user sends a message, it is appended to the chat history along with its role (user).
  3. Generate Full Prompt: The chat history is combined into a single string in which each message is prefixed with its role (User: or Model:). This full prompt is sent to the model.
  4. Save Model Response: After the model replies, its response is appended to the chat history with the role model, and the updated history is written back to chat_history.json.
  5. Persist Across Sessions: Because the chat history is stored in a file, it survives restarts of both the script and the Ollama server. When the script runs again, it reloads the history and continues the conversation seamlessly.

5. Example Workflow

First Session

  1. Start the script and chat with the model:

    You: Hi, what's your name?
    Model: Hello! I'm a large language model created by Meta.
    You: What can you do?
    Model: I can answer questions, write stories, create emails, and more!
    
  2. Exit the script. The chat history is saved to chat_history.json.

Second Session

  1. Restart the script. It loads the chat history from chat_history.json.
  2. Continue the conversation:
    You: Can you summarize our previous conversation?
    Model: Sure! In our previous conversation, you asked about my name and capabilities. I mentioned that I'm a large language model created by Meta and can perform tasks like answering questions and writing content.
    

6. Advanced Enhancements

6.1. Truncate Long Histories

If the chat history grows too long, the combined prompt may exceed the model's context window. To handle this:

  • Limit the number of messages stored in the history.
  • Summarize older parts of the conversation (a sketch of this follows the truncation example below).

Example:

MAX_HISTORY_LENGTH = 10  # Maximum number of messages to keep

def truncate_chat_history(history):
    return history[-MAX_HISTORY_LENGTH:]
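For the summarization option, you can ask the model itself to condense the older messages into a single entry. A rough sketch, again against /api/generate (the summary role label and the prompt wording here are illustrative choices, not part of Ollama's API):

KEEP_RECENT = 10  # Keep this many recent messages verbatim

def summarize_old_messages(history, model_name="llama2"):
    if len(history) <= KEEP_RECENT:
        return history

    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)

    # Ask the model to compress the older turns into a short summary
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": model_name,
        "prompt": "Summarize this conversation in a few sentences:\n" + transcript,
        "stream": False
    })
    response.raise_for_status()
    summary = response.json()["response"]

    # Replace the old turns with a single summary entry
    return [{"role": "summary", "content": summary}] + recent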

6.2. Use a Database

For more robust storage, consider using a database like SQLite or PostgreSQL to store chat history. This is especially useful for multi-user applications.
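A minimal sketch using Python's built-in sqlite3 module (the messages table layout here is just one reasonable choice):

import sqlite3

DB_FILE = "chat_history.db"

def init_db():
    conn = sqlite3.connect(DB_FILE)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS messages (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            session TEXT NOT NULL,
            role    TEXT NOT NULL,
            content TEXT NOT NULL,
            created TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    return conn

def save_message(conn, session, role, content):
    conn.execute(
        "INSERT INTO messages (session, role, content) VALUES (?, ?, ?)",
        (session, role, content),
    )
    conn.commit()

def load_chat_history(conn, session):
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session = ? ORDER BY id",
        (session,),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]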

6.3. Add Timestamps

Enhance the chat history by adding timestamps to each message:

import datetime

chat_history.append({
    "role": "user",
    "content": prompt,
    "timestamp": datetime.datetime.now().isoformat()
})

7. Conclusion

By saving the chat history to a persistent storage mechanism (e.g., a JSON file) and reloading it during subsequent sessions, you can make a local LLM (via Ollama) "remember" past conversations. This approach allows you to simulate memory and maintain continuity across sessions, even after restarting the model or the application.

Key takeaways:

  • Store chat history: Use a file, database, or other storage mechanism.
  • Reload history: Load the chat history at the start of each session.
  • Truncate or summarize: Handle long histories to avoid exceeding token limits.

This method is flexible and can be adapted to various use cases, such as building conversational AI agents, chatbots, or personal assistants.