Chat Message Histories

LangChain ArangoDB provides persistent chat message history storage using ArangoDB’s document database capabilities. The ArangoChatMessageHistory class enables you to store, retrieve, and manage conversation history across sessions.

Overview

The ArangoChatMessageHistory class integrates with LangChain’s chat memory system to provide:

  • Persistent Storage: Chat messages are stored permanently in ArangoDB

  • Session Management: Organize conversations by session ID

  • Automatic Indexing: Efficient retrieval with automatic session-based indexing

  • Message Ordering: Messages are retrieved in chronological order

  • Memory Integration: Works seamlessly with LangChain’s memory components

Quick Start

from arango import ArangoClient
from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage

# Connect to ArangoDB
client = ArangoClient("http://localhost:8529")
db = client.db("langchain_demo", username="root", password="openSesame")

# Initialize chat history for a specific session
chat_history = ArangoChatMessageHistory(
    session_id="user_123",
    db=db,
    collection_name="chat_sessions"
)

# Add messages to the conversation
chat_history.add_message(HumanMessage(content="Hello, how are you?"))
chat_history.add_message(AIMessage(content="I'm doing well, thank you! How can I help you today?"))

# Retrieve all messages in the session
messages = chat_history.messages
for message in messages:
    print(f"{message.type}: {message.content}")

Configuration

Constructor Parameters

class ArangoChatMessageHistory(session_id, db, collection_name='ChatHistory', window=3)
Parameters:
  • session_id – Unique identifier for the chat session (string or int)

  • db – ArangoDB database instance from python-arango

  • collection_name – Name of the collection to store messages (default: “ChatHistory”)

  • window – Message window size for future windowing feature (default: 3)

The class automatically:

  • Creates the collection if it doesn’t exist

  • Creates a persistent index on session_id for efficient queries

  • Handles message serialization and deserialization
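The setup steps listed above can be approximated by hand with python-arango. This is a minimal sketch and not the library's actual implementation: the helper name `ensure_history_collection` is hypothetical, and `db` is assumed to be a python-arango database handle. Recent python-arango releases also provide `add_index` as the preferred way to create indexes.

```python
def ensure_history_collection(db, collection_name="ChatHistory"):
    """Create the collection and a session_id index if they are missing.

    Hypothetical helper for illustration; `db` is assumed to be a
    python-arango database handle (e.g. from ArangoClient(...).db(...)).
    """
    if not db.has_collection(collection_name):
        db.create_collection(collection_name)
    collection = db.collection(collection_name)

    # Skip index creation when some index already covers session_id.
    already_indexed = any(
        "session_id" in index.get("fields", [])
        for index in collection.indexes()
    )
    if not already_indexed:
        collection.add_persistent_index(fields=["session_id"], unique=False)
    return collection
```

Calling the helper twice is harmless: the second call finds the existing collection and index and changes nothing.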

Core Methods

Adding Messages

from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

# Add different types of messages
chat_history.add_message(HumanMessage(content="What is machine learning?"))
chat_history.add_message(AIMessage(content="Machine learning is a subset of AI..."))
chat_history.add_message(SystemMessage(content="System: Conversation started"))

# Messages are automatically timestamped and stored with session context

Retrieving Messages

# Get all messages for the current session
all_messages = chat_history.messages

# The database query fetches the most recent messages first; the results are
# then put into chronological order (oldest first) for LangChain
for i, message in enumerate(all_messages):
    print(f"Message {i+1}: [{message.type}] {message.content}")
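
Since the `window` parameter is described as reserved for a future feature, trimming the history to the last few messages can be done on the client side in the meantime. A minimal sketch: `last_messages` is a hypothetical helper, not part of the library, and it assumes the window counts individual messages rather than human/AI exchanges.

```python
def last_messages(messages, window=3):
    """Return the most recent `window` messages, oldest first.

    Hypothetical helper: `messages` is any chronologically ordered list,
    for example the value of chat_history.messages.
    """
    if window <= 0:
        return []
    return list(messages[-window:])
```

For example, `last_messages(chat_history.messages, window=4)` would keep only the four newest messages before they are passed to a prompt.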

Clearing History

# Clear all messages for the current session
chat_history.clear()

# Verify the session is cleared
print(f"Messages after clear: {len(chat_history.messages)}")

Integration with LangChain Memory

Conversation Buffer Memory

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory

# Create chat history
chat_history = ArangoChatMessageHistory(
    session_id="conversation_1",
    db=db,
    collection_name="conversations"
)

# Create memory with persistent storage
memory = ConversationBufferMemory(
    chat_memory=chat_history,
    return_messages=True,
    memory_key="chat_history"
)

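# Use with any LangChain chain
llm = ChatOpenAI(model="gpt-3.5-turbo")

# Each saved exchange is written through to ArangoDB via chat_history
conversation_input = {"input": "Tell me about Python programming"}
memory.save_context(conversation_input, {"output": "Python is a general-purpose language..."})

# Load the stored history back under the configured memory_key
print(memory.load_memory_variables({})["chat_history"])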

Conversation Summary Memory

from langchain.memory import ConversationSummaryMemory

# Create summary memory with persistent storage
summary_memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    chat_memory=chat_history,
    return_messages=True
)

# The messages are persisted to ArangoDB; the running summary itself is held
# in summary_memory.buffer and is not written to the collection

Integration with Chains

QA Chain with Memory

from langchain_arangodb.chains import ArangoGraphQAChain
from langchain_arangodb.graphs import ArangoGraph

# Set up graph and chat history
graph = ArangoGraph(database=db)
chat_history = ArangoChatMessageHistory(
    session_id="qa_session_1",
    db=db,
    collection_name="qa_conversations"
)

# Create memory
memory = ConversationBufferMemory(
    chat_memory=chat_history,
    return_messages=True
)

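# Create QA chain with persistent memory
# allow_dangerous_requests=True acknowledges that the chain executes
# LLM-generated AQL against the database; use narrowly scoped credentials
qa_chain = ArangoGraphQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    graph=graph,
    memory=memory,
    allow_dangerous_requests=True,
    verbose=True
)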

# Conversations are automatically persisted
response1 = qa_chain.run("What entities are in our knowledge graph?")
response2 = qa_chain.run("Tell me more about the first one you mentioned")

Conversation Chain

from langchain.chains import ConversationChain

# Create a simple conversation chain with persistent memory
conversation = ConversationChain(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    memory=ConversationBufferMemory(
        chat_memory=ArangoChatMessageHistory(
            session_id="simple_chat",
            db=db
        ),
        return_messages=True
    ),
    verbose=True
)

# Each interaction is persisted
response1 = conversation.predict(input="Hi, I'm interested in learning about databases")
response2 = conversation.predict(input="What makes ArangoDB special?")
response3 = conversation.predict(input="Can you elaborate on the multi-model aspect?")

Advanced Usage

Multiple Sessions

# Manage different conversation sessions
user_sessions = {}

def get_chat_history(user_id: str) -> ArangoChatMessageHistory:
    if user_id not in user_sessions:
        user_sessions[user_id] = ArangoChatMessageHistory(
            session_id=f"user_{user_id}",
            db=db,
            collection_name="user_conversations"
        )
    return user_sessions[user_id]

# Use for different users
alice_history = get_chat_history("alice")
bob_history = get_chat_history("bob")

# Each user maintains separate conversation history
alice_history.add_message(HumanMessage(content="Hello from Alice"))
bob_history.add_message(HumanMessage(content="Hello from Bob"))

Custom Collection Management

# Use different collections for different purposes
support_history = ArangoChatMessageHistory(
    session_id="support_ticket_123",
    db=db,
    collection_name="customer_support"
)

training_history = ArangoChatMessageHistory(
    session_id="training_session_1",
    db=db,
    collection_name="ai_training_conversations"
)

# Each collection can have different retention policies or indexes

Session Analytics

# Query conversation statistics directly from ArangoDB
def get_session_stats(db, collection_name: str, session_id: str) -> dict:
    query = """
        FOR doc IN @@collection
            FILTER doc.session_id == @session_id
            COLLECT WITH COUNT INTO length
            RETURN {
                message_count: length,
                session_id: @session_id
            }
    """

    bind_vars = {
        "@collection": collection_name,
        "session_id": session_id
    }

    result = list(db.aql.execute(query, bind_vars=bind_vars))
    return result[0] if result else {"message_count": 0, "session_id": session_id}

# Get conversation statistics
stats = get_session_stats(db, "chat_sessions", "user_123")
print(f"Session user_123 has {stats['message_count']} messages")
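
The same pattern extends to other statistics. As an illustration, the sketch below counts a session's messages per role. The helper name `get_role_counts` is hypothetical, `db` is assumed to be a python-arango database handle, and documents are assumed to carry the `session_id` and `role` fields described under Storage Format.

```python
def get_role_counts(db, collection_name, session_id):
    """Count a session's messages per role (human, ai, system, ...).

    Illustrative helper, not part of the library.
    """
    query = """
        FOR doc IN @@collection
            FILTER doc.session_id == @session_id
            COLLECT role = doc.role WITH COUNT INTO total
            RETURN { role: role, total: total }
    """
    bind_vars = {"@collection": collection_name, "session_id": session_id}
    cursor = db.aql.execute(query, bind_vars=bind_vars)
    # Collapse the result rows into a {role: count} dictionary.
    return {row["role"]: row["total"] for row in cursor}
```

A call such as `get_role_counts(db, "chat_sessions", "user_123")` returns a dictionary keyed by role, which is empty when the session has no messages.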

Data Structure

Storage Format

Messages are stored in ArangoDB with the following structure:

{
    "_key": "auto_generated_key",
    "_id": "collection_name/auto_generated_key",
    "_rev": "revision_id",
    "session_id": "user_123",
    "role": "human",
    "content": "Hello, how are you?",
    "time": "2024-01-01T12:00:00Z"
}

Field Descriptions:

  • session_id: Groups messages by conversation session

  • role: Message type (human, ai, system, etc.)

  • content: The actual message content

  • time: Timestamp for message ordering (automatically added by ArangoDB)
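
To make the relationship between a stored document and a LangChain message concrete, the sketch below maps a document to the `{"type": ..., "data": {...}}` layout that `langchain_core.messages.messages_from_dict` accepts. The helper `document_to_message_dict` is hypothetical and only illustrates the field mapping; it is not claimed to be how the class deserializes messages internally.

```python
def document_to_message_dict(doc):
    """Map a stored document to the dict layout LangChain deserializes.

    Illustrative helper (not part of the library). A list of these dicts
    can be passed to langchain_core.messages.messages_from_dict.
    """
    return {"type": doc["role"], "data": {"content": doc["content"]}}


stored = {
    "_key": "auto_generated_key",
    "session_id": "user_123",
    "role": "human",
    "content": "Hello, how are you?",
    "time": "2024-01-01T12:00:00Z",
}
print(document_to_message_dict(stored))
# {'type': 'human', 'data': {'content': 'Hello, how are you?'}}
```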

Indexing Strategy

The class automatically creates a persistent index on session_id to ensure efficient retrieval:

// Equivalent index definition in arangosh (the class creates it through the Python driver)
db.ChatHistory.ensureIndex({ type: "persistent", fields: ["session_id"], unique: false });

This index enables fast filtering of messages by session while maintaining good performance even with large message volumes.
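
One way to check that a session query actually benefits from the index is to inspect its execution plan. The sketch below assumes the plan is the dictionary returned by python-arango's `db.aql.explain`, which lists its steps under `"nodes"`; the helper name `plan_uses_index` is hypothetical.

```python
def plan_uses_index(plan):
    """Return True if an AQL execution plan contains an IndexNode.

    `plan` is assumed to be the dict returned by python-arango's
    db.aql.explain(query, bind_vars=...).
    """
    return any(node.get("type") == "IndexNode" for node in plan.get("nodes", []))


# Usage sketch against a live database (not executed here):
# plan = db.aql.explain(
#     "FOR doc IN @@collection FILTER doc.session_id == @session_id RETURN doc",
#     bind_vars={"@collection": "ChatHistory", "session_id": "user_123"},
# )
# print(plan_uses_index(plan))
```

A plan that shows an `EnumerateCollectionNode` instead suggests a full collection scan, in which case the index may be missing.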

Best Practices

Session ID Management

  1. Use descriptive session IDs: Include user context or conversation type

  2. Avoid special characters: Stick to alphanumeric characters and underscores

  3. Include timestamps for analytics: Consider formats like user_123_2024_01_01

from datetime import datetime

# Good session ID patterns (user_id, ticket_id, etc. are your own values)
session_id = f"user_{user_id}_{datetime.now().strftime('%Y_%m_%d')}"
session_id = f"support_ticket_{ticket_id}"
session_id = f"training_{model_version}_{session_counter}"
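
These guidelines can be enforced before a history object is created. The following is a minimal sketch with a hypothetical helper, `is_safe_session_id`. It is deliberately stricter than the class itself, which also accepts integers, and it only encodes the "alphanumeric characters and underscores" recommendation above.

```python
import re

# Letters, digits, and underscores only, at least one character.
_SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_safe_session_id(session_id):
    """Check a session ID against the guidelines in this section.

    Hypothetical helper: accepts only non-empty strings made of ASCII
    letters, digits, and underscores.
    """
    return isinstance(session_id, str) and bool(
        _SESSION_ID_PATTERN.fullmatch(session_id)
    )
```

An application might call this on user-supplied identifiers and reject or normalize anything that fails the check.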

Memory Management

  1. Choose appropriate memory types based on conversation length

  2. Implement session cleanup for privacy or storage management

  3. Monitor collection size and implement archiving if needed

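from datetime import datetime, timedelta, timezone

# Cleanup old sessions
# Assumes each document has an ISO 8601 UTC `time` string as shown in Storage Format
def cleanup_old_sessions(db, collection_name: str, days_old: int = 30):
    cutoff_date = datetime.now(timezone.utc) - timedelta(days=days_old)

    # The null check protects documents without a `time` field: in AQL,
    # null compares lower than any string and would otherwise match
    query = """
        FOR doc IN @@collection
            FILTER doc.time != null AND doc.time < @cutoff_date
            REMOVE doc IN @@collection
    """

    bind_vars = {
        "@collection": collection_name,
        # Same format as the stored timestamps, so string comparison orders correctly
        "cutoff_date": cutoff_date.strftime("%Y-%m-%dT%H:%M:%SZ")
    }

    db.aql.execute(query, bind_vars=bind_vars)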

Error Handling

from arango.exceptions import ArangoError

try:
    chat_history = ArangoChatMessageHistory(
        session_id="test_session",
        db=db,
        collection_name="chat_test"
    )

    chat_history.add_message(HumanMessage(content="Test message"))
    messages = chat_history.messages

except ValueError as e:
    print(f"Invalid session ID: {e}")
except ArangoError as e:
    print(f"Database error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Performance Considerations

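  1. Session ID indexing: The persistent index on session_id lets ArangoDB find a session's messages with an index lookup (roughly O(log n)) instead of scanning the whole collection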

  2. Message ordering: Uses ArangoDB’s built-in sorting capabilities

  3. Batch operations: Consider bulk operations for high-volume scenarios

  4. Collection sizing: Monitor and archive old conversations as needed
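
For the batch-operations point, one common approach is to split a large list of documents into fixed-size batches and write each batch in a single call. The sketch below shows only the batching step; `chunked` is a hypothetical helper, and the commented usage assumes python-arango's `insert_many` and documents that already follow the Storage Format section.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage sketch (not executed here): write documents in batches instead of
# issuing one insert per message.
# for batch in chunked(documents, 500):
#     db.collection("chat_sessions").insert_many(batch)
```

The batch size of 500 is an arbitrary illustration; a suitable value depends on document size and server configuration.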

Example: Complete Chat Application

from arango import ArangoClient
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory

class ChatApplication:
    def __init__(self, db_url: str, username: str, password: str):
        # Initialize ArangoDB connection
        self.client = ArangoClient(db_url)
        self.db = self.client.db("chat_app", username=username, password=password)

        # Initialize LLM
        self.llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

        # Session storage
        self.sessions = {}

    def get_conversation(self, session_id: str) -> ConversationChain:
        """Get or create a conversation for a session."""
        if session_id not in self.sessions:
            # Create persistent chat history
            chat_history = ArangoChatMessageHistory(
                session_id=session_id,
                db=self.db,
                collection_name="app_conversations"
            )

            # Create memory with chat history
            memory = ConversationBufferMemory(
                chat_memory=chat_history,
                return_messages=True
            )

            # Create conversation chain
            conversation = ConversationChain(
                llm=self.llm,
                memory=memory,
                verbose=True
            )

            self.sessions[session_id] = conversation

        return self.sessions[session_id]

    def chat(self, session_id: str, message: str) -> str:
        """Send a message and get a response."""
        conversation = self.get_conversation(session_id)
        return conversation.predict(input=message)

    def get_history(self, session_id: str) -> list:
        """Get conversation history for a session."""
        chat_history = ArangoChatMessageHistory(
            session_id=session_id,
            db=self.db,
            collection_name="app_conversations"
        )
        return chat_history.messages

    def clear_session(self, session_id: str):
        """Clear a conversation session."""
        if session_id in self.sessions:
            del self.sessions[session_id]

        chat_history = ArangoChatMessageHistory(
            session_id=session_id,
            db=self.db,
            collection_name="app_conversations"
        )
        chat_history.clear()

# Usage example
app = ChatApplication("http://localhost:8529", "root", "openSesame")

# Start conversations with different users
response1 = app.chat("user_alice", "Hello, I need help with Python programming")
response2 = app.chat("user_bob", "What's the weather like?")
response3 = app.chat("user_alice", "Can you explain list comprehensions?")

# Get conversation history
alice_history = app.get_history("user_alice")
print(f"Alice has {len(alice_history)} messages in her conversation")

# Clear a session when done
app.clear_session("user_bob")

Troubleshooting

Common Issues

ValueError: Please ensure that the session_id parameter is provided
  • Ensure session_id is not None, empty string, or 0

  • Use descriptive, non-empty session identifiers

Database connection errors
  • Verify ArangoDB is running and accessible

  • Check connection credentials and database permissions

  • Ensure the database exists or the user has create permissions

Index creation failures
  • Verify the user has index creation permissions

  • Check if the collection already has conflicting indexes

  • Ensure adequate disk space for index creation

Message retrieval issues
  • Verify session_id matches exactly (case-sensitive)

  • Check whether messages exist in the collection using the ArangoDB web interface

  • Ensure proper message format in the database
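
When checking message format, it can help to scan a session's documents for missing fields. A minimal sketch: `find_malformed` is a hypothetical helper, and the list of required fields is an assumption taken from the Storage Format section rather than from the library's source.

```python
# Fields assumed necessary, based on the Storage Format section.
REQUIRED_FIELDS = ("session_id", "role", "content")


def find_malformed(documents):
    """Return the documents that lack any of the required fields.

    Illustrative helper, not part of the library.
    """
    return [
        doc for doc in documents
        if any(field not in doc for field in REQUIRED_FIELDS)
    ]


# Usage sketch against a live database (not executed here):
# docs = db.collection("chat_sessions").find({"session_id": "user_123"})
# print(find_malformed(docs))
```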