Chat Message Histories
======================

LangChain ArangoDB provides persistent chat message history storage using ArangoDB's document database capabilities. The ``ArangoChatMessageHistory`` class lets you store, retrieve, and manage conversation history across sessions.

Overview
--------

The ``ArangoChatMessageHistory`` class integrates with LangChain's chat memory system to provide:

- **Persistent Storage**: Chat messages persist in ArangoDB across application restarts
- **Session Management**: Organize conversations by session ID
- **Automatic Indexing**: Efficient retrieval with automatic session-based indexing
- **Message Ordering**: Messages are retrieved in chronological order
- **Memory Integration**: Works with LangChain's memory components

Quick Start
-----------

.. code-block:: python

    from arango import ArangoClient
    from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory
    from langchain_core.messages import HumanMessage, AIMessage

    # Connect to ArangoDB
    client = ArangoClient("http://localhost:8529")
    db = client.db("langchain_demo", username="root", password="openSesame")

    # Initialize chat history for a specific session
    chat_history = ArangoChatMessageHistory(
        session_id="user_123",
        db=db,
        collection_name="chat_sessions"
    )

    # Add messages to the conversation
    chat_history.add_message(HumanMessage(content="Hello, how are you?"))
    chat_history.add_message(AIMessage(content="I'm doing well, thank you! How can I help you today?"))

    # Retrieve all messages in the session
    messages = chat_history.messages
    for message in messages:
        print(f"{message.type}: {message.content}")

Configuration
-------------

Constructor Parameters
~~~~~~~~~~~~~~~~~~~~~~

.. py:class:: ArangoChatMessageHistory(session_id, db, collection_name="ChatHistory", window=3)

    :param session_id: Unique identifier for the chat session (string or int)
    :param db: ArangoDB database instance from python-arango
    :param collection_name: Name of the collection to store messages (default: "ChatHistory")
    :param window: Message window size for future windowing feature (default: 3)

The class automatically:

- Creates the collection if it doesn't exist
- Creates a persistent index on ``session_id`` for efficient queries
- Handles message serialization and deserialization

Core Methods
------------

Adding Messages
~~~~~~~~~~~~~~~

.. code-block:: python

    from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

    # Add different types of messages
    chat_history.add_message(HumanMessage(content="What is machine learning?"))
    chat_history.add_message(AIMessage(content="Machine learning is a subset of AI..."))
    chat_history.add_message(SystemMessage(content="System: Conversation started"))

    # Messages are automatically timestamped and stored with session context

Retrieving Messages
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Get all messages for the current session
    all_messages = chat_history.messages

    # Messages are returned in chronological order (the database query sorts
    # most recent first, and the result is converted to chronological order
    # for LangChain)
    for i, message in enumerate(all_messages):
        print(f"Message {i+1}: [{message.type}] {message.content}")

Clearing History
~~~~~~~~~~~~~~~~

.. code-block:: python

    # Clear all messages for the current session
    chat_history.clear()

    # Verify the session is cleared
    print(f"Messages after clear: {len(chat_history.messages)}")

Integration with LangChain Memory
---------------------------------

Conversation Buffer Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from langchain.memory import ConversationBufferMemory
    from langchain_openai import ChatOpenAI
    from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory

    # Create chat history
    chat_history = ArangoChatMessageHistory(
        session_id="conversation_1",
        db=db,
        collection_name="conversations"
    )

    # Create memory with persistent storage
    memory = ConversationBufferMemory(
        chat_memory=chat_history,
        return_messages=True,
        memory_key="chat_history"
    )

    # Use with any LangChain chain
    llm = ChatOpenAI(model="gpt-3.5-turbo")

    # The memory will automatically persist conversations
    conversation_input = {"input": "Tell me about Python programming"}

Conversation Summary Memory
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from langchain.memory import ConversationSummaryMemory

    # Create summary memory with persistent storage
    summary_memory = ConversationSummaryMemory(
        llm=ChatOpenAI(model="gpt-3.5-turbo"),
        chat_memory=chat_history,
        return_messages=True
    )

    # Conversation summaries are also persisted

Integration with Chains
-----------------------

QA Chain with Memory
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from langchain.memory import ConversationBufferMemory
    from langchain_openai import ChatOpenAI
    from langchain_arangodb.chains import ArangoGraphQAChain
    from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory
    from langchain_arangodb.graphs import ArangoGraph

    # Set up graph and chat history
    graph = ArangoGraph(database=db)
    chat_history = ArangoChatMessageHistory(
        session_id="qa_session_1",
        db=db,
        collection_name="qa_conversations"
    )

    # Create memory
    memory = ConversationBufferMemory(
        chat_memory=chat_history,
        return_messages=True
    )

    # Create QA chain with persistent memory
    qa_chain = ArangoGraphQAChain.from_llm(
        llm=ChatOpenAI(model="gpt-3.5-turbo"),
        graph=graph,
        memory=memory,
        verbose=True
    )

    # Conversations are automatically persisted
    response1 = qa_chain.run("What entities are in our knowledge graph?")
    response2 = qa_chain.run("Tell me more about the first one you mentioned")

Conversation Chain
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from langchain.chains import ConversationChain

    # Create a simple conversation chain with persistent memory
    conversation = ConversationChain(
        llm=ChatOpenAI(model="gpt-3.5-turbo"),
        memory=ConversationBufferMemory(
            chat_memory=ArangoChatMessageHistory(
                session_id="simple_chat",
                db=db
            ),
            return_messages=True
        ),
        verbose=True
    )

    # Each interaction is persisted
    response1 = conversation.predict(input="Hi, I'm interested in learning about databases")
    response2 = conversation.predict(input="What makes ArangoDB special?")
    response3 = conversation.predict(input="Can you elaborate on the multi-model aspect?")

Advanced Usage
--------------

Multiple Sessions
~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Manage different conversation sessions
    user_sessions = {}

    def get_chat_history(user_id: str) -> ArangoChatMessageHistory:
        if user_id not in user_sessions:
            user_sessions[user_id] = ArangoChatMessageHistory(
                session_id=f"user_{user_id}",
                db=db,
                collection_name="user_conversations"
            )
        return user_sessions[user_id]

    # Use for different users
    alice_history = get_chat_history("alice")
    bob_history = get_chat_history("bob")

    # Each user maintains separate conversation history
    alice_history.add_message(HumanMessage(content="Hello from Alice"))
    bob_history.add_message(HumanMessage(content="Hello from Bob"))

Custom Collection Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Use different collections for different purposes
    support_history = ArangoChatMessageHistory(
        session_id="support_ticket_123",
        db=db,
        collection_name="customer_support"
    )

    training_history = ArangoChatMessageHistory(
        session_id="training_session_1",
        db=db,
        collection_name="ai_training_conversations"
    )

    # Each collection can have different retention policies or indexes

Session Analytics
~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Query conversation statistics directly from ArangoDB
    def get_session_stats(db, collection_name: str, session_id: str) -> dict:
        query = """
            FOR doc IN @@collection
                FILTER doc.session_id == @session_id
                COLLECT WITH COUNT INTO length
                RETURN {
                    message_count: length,
                    session_id: @session_id
                }
        """
        bind_vars = {
            "@collection": collection_name,
            "session_id": session_id
        }
        result = list(db.aql.execute(query, bind_vars=bind_vars))
        return result[0] if result else {"message_count": 0, "session_id": session_id}

    # Get conversation statistics
    stats = get_session_stats(db, "chat_sessions", "user_123")
    print(f"Session user_123 has {stats['message_count']} messages")

Data Structure
--------------

Storage Format
~~~~~~~~~~~~~~

Messages are stored in ArangoDB with the following structure:

.. code-block:: json

    {
        "_key": "auto_generated_key",
        "_id": "collection_name/auto_generated_key",
        "_rev": "revision_id",
        "session_id": "user_123",
        "role": "human",
        "content": "Hello, how are you?",
        "time": "2024-01-01T12:00:00Z"
    }

**Field Descriptions:**

- ``session_id``: Groups messages by conversation session
- ``role``: Message type (human, ai, system, etc.)
- ``content``: The actual message content
- ``time``: Timestamp used for message ordering (added automatically when the message is stored)

Indexing Strategy
~~~~~~~~~~~~~~~~~

The class automatically creates a persistent index on ``session_id`` to ensure efficient retrieval. The call below is a rough python-arango equivalent of that step, shown for illustration only:

.. code-block:: python

    # Roughly equivalent to the automatic index creation; the class performs
    # this step for you, so you do not need to run it yourself
    db.collection("ChatHistory").add_persistent_index(
        fields=["session_id"],
        unique=False,
        name="session_idx"
    )

This index enables fast filtering of messages by session and keeps lookups efficient even with large message volumes.

Best Practices
--------------

Session ID Management
~~~~~~~~~~~~~~~~~~~~~

1. **Use descriptive session IDs**: Include user context or conversation type
2. **Avoid special characters**: Stick to alphanumeric characters and underscores
3. **Include timestamps for analytics**: Consider formats like ``user_123_2024_01_01``

.. code-block:: python

    from datetime import datetime

    # Good session ID patterns (user_id, ticket_id, model_version, and
    # session_counter are placeholders for values from your application)
    session_id = f"user_{user_id}_{datetime.now().strftime('%Y_%m_%d')}"
    session_id = f"support_ticket_{ticket_id}"
    session_id = f"training_{model_version}_{session_counter}"

Memory Management
~~~~~~~~~~~~~~~~~

1. **Choose appropriate memory types** based on conversation length
2. **Implement session cleanup** for privacy or storage management
3. **Monitor collection size** and implement archiving if needed

.. code-block:: python

    from datetime import datetime, timedelta

    # Cleanup old sessions (assumes doc.time holds an ISO 8601 string)
    def cleanup_old_sessions(db, collection_name: str, days_old: int = 30):
        cutoff_date = datetime.now() - timedelta(days=days_old)
        query = """
            FOR doc IN @@collection
                FILTER doc.time < @cutoff_date
                REMOVE doc IN @@collection
        """
        bind_vars = {
            "@collection": collection_name,
            "cutoff_date": cutoff_date.isoformat()
        }
        db.aql.execute(query, bind_vars=bind_vars)

Error Handling
~~~~~~~~~~~~~~

.. code-block:: python

    from arango.exceptions import ArangoError

    try:
        chat_history = ArangoChatMessageHistory(
            session_id="test_session",
            db=db,
            collection_name="chat_test"
        )
        chat_history.add_message(HumanMessage(content="Test message"))
        messages = chat_history.messages
    except ValueError as e:
        print(f"Invalid session ID: {e}")
    except ArangoError as e:
        print(f"Database error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")

Performance Considerations
--------------------------

1. **Session ID indexing**: Automatic indexing ensures O(log n) lookup performance
2. **Message ordering**: Uses ArangoDB's built-in sorting capabilities
3. **Batch operations**: Consider bulk operations for high-volume scenarios
4. **Collection sizing**: Monitor and archive old conversations as needed

Example: Complete Chat Application
----------------------------------

.. code-block:: python

    from arango import ArangoClient
    from langchain_openai import ChatOpenAI
    from langchain.chains import ConversationChain
    from langchain.memory import ConversationBufferMemory
    from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory

    class ChatApplication:
        def __init__(self, db_url: str, username: str, password: str):
            # Initialize ArangoDB connection
            self.client = ArangoClient(db_url)
            self.db = self.client.db("chat_app", username=username, password=password)

            # Initialize LLM
            self.llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

            # Session storage
            self.sessions = {}

        def get_conversation(self, session_id: str) -> ConversationChain:
            """Get or create a conversation for a session."""
            if session_id not in self.sessions:
                # Create persistent chat history
                chat_history = ArangoChatMessageHistory(
                    session_id=session_id,
                    db=self.db,
                    collection_name="app_conversations"
                )

                # Create memory with chat history
                memory = ConversationBufferMemory(
                    chat_memory=chat_history,
                    return_messages=True
                )

                # Create conversation chain
                conversation = ConversationChain(
                    llm=self.llm,
                    memory=memory,
                    verbose=True
                )

                self.sessions[session_id] = conversation

            return self.sessions[session_id]

        def chat(self, session_id: str, message: str) -> str:
            """Send a message and get a response."""
            conversation = self.get_conversation(session_id)
            return conversation.predict(input=message)

        def get_history(self, session_id: str) -> list:
            """Get conversation history for a session."""
            chat_history = ArangoChatMessageHistory(
                session_id=session_id,
                db=self.db,
                collection_name="app_conversations"
            )
            return chat_history.messages

        def clear_session(self, session_id: str):
            """Clear a conversation session."""
            if session_id in self.sessions:
                del self.sessions[session_id]

            chat_history = ArangoChatMessageHistory(
                session_id=session_id,
                db=self.db,
                collection_name="app_conversations"
            )
            chat_history.clear()

    # Usage example
    app = ChatApplication("http://localhost:8529", "root", "openSesame")

    # Start conversations with different users
    response1 = app.chat("user_alice", "Hello, I need help with Python programming")
    response2 = app.chat("user_bob", "What's the weather like?")
    response3 = app.chat("user_alice", "Can you explain list comprehensions?")

    # Get conversation history
    alice_history = app.get_history("user_alice")
    print(f"Alice has {len(alice_history)} messages in her conversation")

    # Clear a session when done
    app.clear_session("user_bob")

Troubleshooting
---------------

Common Issues
~~~~~~~~~~~~~

**ValueError: Please ensure that the session_id parameter is provided**

- Ensure ``session_id`` is not ``None``, an empty string, or ``0``
- Use descriptive, non-empty session identifiers

**Database connection errors**

- Verify ArangoDB is running and accessible
- Check connection credentials and database permissions
- Ensure the database exists or the user has create permissions

**Index creation failures**

- Verify the user has index creation permissions
- Check if the collection already has conflicting indexes
- Ensure adequate disk space for index creation

**Message retrieval issues**

- Verify ``session_id`` matches exactly (case-sensitive)
- Check whether messages exist in the collection using the ArangoDB web interface
- Ensure the stored documents follow the expected message format
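
The session ID checks listed under the common issues can also be applied before constructing ``ArangoChatMessageHistory``. The following is a minimal sketch, not part of langchain-arangodb: ``validate_session_id`` is a hypothetical helper, and the letters, digits, and underscores rule is an assumption taken from the best practices in this guide rather than a constraint the library enforces.

.. code-block:: python

    import re
    from typing import Union

    # Assumed convention from this guide: letters, digits, and underscores only.
    _SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


    def validate_session_id(session_id: Union[str, int, None]) -> str:
        """Hypothetical helper: return session_id as a string or raise ValueError."""
        # Reject the values called out in the troubleshooting notes: None, "", and 0.
        if session_id is None or session_id == "" or session_id == 0:
            raise ValueError("session_id must not be None, empty, or 0")
        text = str(session_id)
        # Reject spaces, dashes, and other special characters.
        if not _SESSION_ID_PATTERN.fullmatch(text):
            raise ValueError(f"session_id contains unsupported characters: {text!r}")
        return text


    print(validate_session_id("user_123"))  # prints user_123
    print(validate_session_id(42))          # prints 42

The returned string can then be passed as ``session_id`` when creating the history object, so invalid identifiers fail early instead of surfacing as a database or retrieval problem later.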