ArangoGraphQAChain

This guide demonstrates how to use the ArangoGraphQAChain for question-answering against an ArangoDB graph database.

Basic Setup

First, let’s set up the necessary imports and create a basic instance:

from langchain_arangodb import ArangoGraphQAChain, ArangoGraph, ArangoChatMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from arango import ArangoClient

# Initialize ArangoDB connection
client = ArangoClient()
db = client.db("your_database", username="user", password="pass")

# Create graph instance
graph = ArangoGraph(db)

# Initialize LLM
llm = ChatOpenAI(temperature=0)

# Create the chain
chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True  # Be cautious with this setting
)

Individual Method Usage

1. Basic Query Execution

The simplest way to use the chain is with a direct query:

response = chain.invoke("Who starred in Pulp Fiction?")
print(response["result"])

2. Using Custom Input/Output Keys

You can customize the input and output keys:

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    input_key="question",
    output_key="answer"
)

response = chain.invoke("Who directed Inception?")
print(response["answer"])

3. Limiting Results

Control the number of results returned:

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    top_k=5,  # Return only top 5 results
    output_list_limit=16,  # Limit list length in response
    output_string_limit=128  # Limit string length in response
)
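
To make the two output limits concrete, here is a minimal sketch of the kind of truncation they describe. The helper truncate_result is hypothetical and exists only for illustration; it is not the library's implementation, and the real chain may summarize or drop oversized values rather than slice them.

```python
from typing import Any


def truncate_result(value: Any, list_limit: int = 16, string_limit: int = 128) -> Any:
    """Recursively cap list lengths and string lengths in a query result.

    Hypothetical helper for illustration only; not part of langchain_arangodb.
    """
    if isinstance(value, str):
        return value[:string_limit]
    if isinstance(value, list):
        # Keep only the first list_limit items, then truncate each of them.
        return [truncate_result(v, list_limit, string_limit) for v in value[:list_limit]]
    if isinstance(value, dict):
        return {k: truncate_result(v, list_limit, string_limit) for k, v in value.items()}
    return value


rows = [{"title": "x" * 300, "cast": list(range(50))}]
trimmed = truncate_result(rows)
print(len(trimmed[0]["title"]), len(trimmed[0]["cast"]))  # prints: 128 16
```

Capping the result before it reaches the LLM keeps the prompt small, at the cost of possibly hiding rows or text that the final answer would need.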

4. Query Explanation Mode

Get the query's execution plan without running the query:

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    execute_aql_query=False,  # Only explain, don't execute
    return_aql_result=True  # Include "aql_result" in the response
)

explanation = chain.invoke("Find all movies released after 2020")
print(explanation["aql_result"])  # Contains the query plan

5. Read-Only Mode

Enforce read-only operations:

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    force_read_only_query=True  # Prevents write operations
)
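
As a rough illustration of what a read-only guard has to detect, the sketch below scans an AQL string for data-modification keywords. The function looks_read_only is hypothetical and is not how the chain is known to implement force_read_only_query; it is deliberately conservative and will also reject queries that merely mention these words inside string literals.

```python
import re

# AQL data-modification keywords; a query using any of these writes data.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "REPLACE", "REMOVE", "UPSERT")


def looks_read_only(aql: str) -> bool:
    """Return True if no AQL write keyword appears as a whole word.

    Hypothetical check for illustration only; not part of langchain_arangodb.
    """
    pattern = r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b"
    return re.search(pattern, aql, flags=re.IGNORECASE) is None


print(looks_read_only("FOR m IN Movies RETURN m.title"))       # prints: True
print(looks_read_only("FOR m IN Movies REMOVE m IN Movies"))   # prints: False
```

A keyword check like this is a safety net, not a substitute for database credentials that lack write permissions.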

6. Custom AQL Examples

Provide example AQL queries for better generation:

example_queries = """
FOR m IN Movies
    FILTER m.year > 2020
    RETURN m.title

FOR a IN Actors
    FILTER a.awards > 0
    RETURN a.name
"""

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    aql_examples=example_queries
)
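
If you keep examples as question/query pairs, one possible way to build the aql_examples string is sketched below. The helper format_aql_examples and the comment-per-question layout are assumptions made for illustration; the source only shows that aql_examples accepts a plain string of AQL.

```python
from typing import Iterable, Tuple


def format_aql_examples(pairs: Iterable[Tuple[str, str]]) -> str:
    """Join (question, AQL) pairs into one string, one block per example.

    Hypothetical helper; each question becomes an AQL line comment (//).
    """
    blocks = [f"// {question}\n{aql.strip()}" for question, aql in pairs]
    return "\n\n".join(blocks)


example_queries = format_aql_examples([
    ("Which movies were released after 2020?",
     "FOR m IN Movies\n    FILTER m.year > 2020\n    RETURN m.title"),
    ("Which actors have won awards?",
     "FOR a IN Actors\n    FILTER a.awards > 0\n    RETURN a.name"),
])
print(example_queries)
```

Pairing each query with the question it answers may help the LLM map phrasing to query shape, though the effect depends on the model and prompt.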

7. Detailed Output

Get more detailed output including AQL query and results:

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    return_aql_query=True,
    return_aql_result=True
)

response = chain.invoke("Who acted in The Matrix?")
print("Query:", response["aql_query"])
print("Raw Results:", response["aql_result"])
print("Final Answer:", response["result"])

8. Query Cache

Enable query caching to reuse past queries, reducing response time and LLM cost:

# Initialize Embedding Model (required for query cache)
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    use_query_cache=True, # Enables query caching (default: False)
    embedding=embedding, # Required if use_query_cache is True
    query_cache_collection_name="Queries", # Optional (default: "Queries")
    query_cache_similarity_threshold=0.80, # Only fetch cached queries with similarity >= 0.80 (default: 0.80)
)

query1 = "Who directed The Matrix?"
response1 = chain.invoke({"query": query1, "use_query_cache": False}) # Disable query cache to force fresh LLM generation
print(response1["result"])

# Cache the query and its result if you are satisfied
chain.cache_query() # Caches the most recent query by default

# Alternatively, you can cache a query-AQL pair manually
chain.cache_query(
    text="Who directed The Matrix?",
    aql="FOR m IN Movies FILTER m.title == 'The Matrix' RETURN m.director"
)

# Similar query: uses exact match or vector similarity to fetch a cached AQL query and its result
query2 = "Who is the director of The Matrix?"
response2 = chain.invoke({
    "query": query2,
    "query_cache_similarity_threshold": 0.90
}) # Adjust threshold if needed
print(response2["result"])

# Clear all cached queries
chain.clear_query_cache()

# Or, clear a specific cached query
chain.clear_query_cache(text="Who directed The Matrix?")
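
To show how a similarity threshold gates cache hits, here is a minimal sketch of a vector lookup. It assumes cosine similarity over in-memory embeddings; the functions cosine_similarity and lookup_cached_aql are hypothetical, and the chain's actual storage and similarity metric may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup_cached_aql(
    query_vec: Sequence[float],
    cache: List[Tuple[Sequence[float], str]],
    threshold: float = 0.80,
) -> Optional[str]:
    """Return the AQL of the most similar cached entry, or None below the threshold.

    Hypothetical helper for illustration only; cache holds (embedding, aql) pairs.
    """
    best_score, best_aql = -1.0, None
    for vec, aql in cache:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_aql = score, aql
    return best_aql if best_score >= threshold else None


# Toy 2-dimensional "embeddings" standing in for real model output.
cache = [([1.0, 0.0], "AQL for directors"), ([0.0, 1.0], "AQL for actors")]
hit = lookup_cached_aql([0.9, 0.1], cache)   # close to the first entry
miss = lookup_cached_aql([1.0, 1.0], cache)  # equally far from both entries
print(hit, miss)
```

Raising the threshold trades cache hits for precision: fewer questions reuse a cached query, but those that do are closer in meaning to the original.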

9. Chat History

Enable context-aware responses by including chat history:

# Initialize chat message history (required for chat history)
history = ArangoChatMessageHistory(
    session_id="user_123",
    db=db,
    collection_name="chat_sessions"
)

chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    include_history=True, # Enables chat history (default: False)
    chat_history_store=history, # Instance of ArangoChatMessageHistory. Required if include_history is True
    max_history_messages=10  # Optional: maximum number of messages to include (default: 10)
)

query = "What movies were released in 1999?"
response = chain.invoke({"query": query, "include_history": False}) # Disable chat history for this call only
print(response["result"])

query = "Among all those movies, which one is directed by Lana Wachowski?"
# include_history is already True on the chain, so the LLM can resolve what "those movies" refers to
response = chain.invoke({"query": query})
print(response["result"])
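
The effect of max_history_messages can be pictured as a sliding window over the conversation. The sketch below assumes the most recent messages are the ones kept; recent_history is a hypothetical helper, not the chain's own code, and the real prompt format may differ.

```python
from typing import List, Tuple


def recent_history(messages: List[Tuple[str, str]], max_messages: int = 10) -> str:
    """Format the last max_messages (role, text) pairs as prompt context.

    Hypothetical helper for illustration only.
    """
    # Guard against max_messages == 0, because messages[-0:] is the whole list.
    recent = messages[-max_messages:] if max_messages > 0 else []
    return "\n".join(f"{role}: {text}" for role, text in recent)


conversation = [
    ("human", "What movies were released in 1999?"),
    ("ai", "The Matrix was released in 1999."),
    ("human", "Among all those movies, which one is directed by Lana Wachowski?"),
]
print(recent_history(conversation, max_messages=2))
```

A smaller window keeps prompts short but can drop the very message a follow-up question refers to.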

Complete Workflow Example

Here’s a complete workflow showing how to use multiple features together:

from langchain_arangodb import ArangoGraphQAChain, ArangoGraph, ArangoChatMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from arango import ArangoClient

# 1. Setup Database Connection
client = ArangoClient()
db = client.db("movies_db", username="user", password="pass")

# 2. Initialize Graph
graph = ArangoGraph(db)

# 3. Create Collections and Sample Data
if not db.has_collection("Movies"):
    movies = db.create_collection("Movies")
    movies.insert({"_key": "matrix", "title": "The Matrix", "year": 1999})

if not db.has_collection("Actors"):
    actors = db.create_collection("Actors")
    actors.insert({"_key": "keanu", "name": "Keanu Reeves"})

if not db.has_collection("ActedIn"):
    acted_in = db.create_collection("ActedIn", edge=True)
    acted_in.insert({
        "_from": "Actors/keanu",
        "_to": "Movies/matrix"
    })

# 4. Refresh Schema
graph.refresh_schema()

# 5. Initialize LLM
llm = ChatOpenAI(temperature=0)

# 6. Initialize Embedding Model (required for query cache)
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# 7. Initialize chat message history (required for chat history)
history = ArangoChatMessageHistory(
    session_id="user_123",
    db=db,
    collection_name="chat_sessions"
)

# 8. Initialize Chain with Advanced Features
chain = ArangoGraphQAChain.from_llm(
    llm=llm,
    graph=graph,
    allow_dangerous_requests=True,
    top_k=5,
    force_read_only_query=True,
    return_aql_query=True,
    return_aql_result=True,
    output_list_limit=20,
    output_string_limit=200,
    use_query_cache=True,
    embedding=embedding,
    query_cache_collection_name="Queries",
    query_cache_similarity_threshold=0.80,
    include_history=True,
    chat_history_store=history,
    max_history_messages=10
)

# 9. Run Multiple Queries
queries = [
    "Who acted in The Matrix?",
    "Who starred in The Matrix?",
    "What is the last name of this actor?",
    "What movies were released in 1999?",
    "List all actors in the database"
]

for query in queries:
    print(f"\nProcessing query: {query}")
    response = chain.invoke(query)

    print("AQL Query:", response["aql_query"])
    print("Raw Results:", response["aql_result"])
    print("Final Answer:", response["result"])
    chain.cache_query()
    print("-" * 50)

Security Considerations

  1. Always use appropriate database credentials with minimal required permissions

  2. Be cautious with allow_dangerous_requests=True: it lets the chain run LLM-generated AQL against your database

  3. Use force_read_only_query=True when only read operations are needed

  4. Monitor and log query execution in production environments

  5. Regularly review and update AQL examples to prevent injection risks
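
One way to act on the first point is to keep credentials out of source code. The sketch below reads connection settings from environment variables; the variable names (ARANGO_HOSTS, ARANGO_DATABASE, ARANGO_USERNAME, ARANGO_PASSWORD) and the helper load_arango_settings are assumptions chosen for this example, not names the library requires.

```python
import os
from typing import Dict, Mapping


def load_arango_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read connection settings from the environment.

    Hypothetical helper; the variable names are illustrative assumptions.
    """
    missing = [k for k in ("ARANGO_USERNAME", "ARANGO_PASSWORD") if k not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "hosts": env.get("ARANGO_HOSTS", "http://localhost:8529"),
        "database": env.get("ARANGO_DATABASE", "_system"),
        "username": env["ARANGO_USERNAME"],
        "password": env["ARANGO_PASSWORD"],
    }


# Usage with python-arango (not executed here):
#   settings = load_arango_settings()
#   client = ArangoClient(hosts=settings["hosts"])
#   db = client.db(settings["database"],
#                  username=settings["username"],
#                  password=settings["password"])
```

Pair this with a database user that has only the permissions the chain needs, for example read-only access when force_read_only_query=True is in use.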

Error Handling

The chain raises a ValueError when it cannot produce a usable query, so wrap calls in a try/except block:

try:
    response = chain.invoke("Find all movies")
except ValueError as e:
    if "Maximum amount of AQL Query Generation attempts" in str(e):
        print("Failed to generate valid AQL after multiple attempts")
    elif "Write operations are not allowed" in str(e):
        print("Attempted write operation in read-only mode")
    else:
        print(f"Other error: {e}")

The chain will automatically attempt to fix invalid AQL queries up to max_aql_generation_attempts times (default: 3) before raising an error.
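
The retry behavior described above can be pictured as a generate-validate loop. This is a minimal sketch under stated assumptions: generate_with_retries, its callbacks, and the exact error message are hypothetical stand-ins, not the chain's internals.

```python
from typing import Callable, List, Optional


def generate_with_retries(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Call generate until validate accepts the query or attempts run out.

    generate receives the previous error (None on the first attempt);
    validate returns None on success or an error message on failure.
    Hypothetical helper for illustration only.
    """
    error: Optional[str] = None
    for _ in range(max_attempts):
        aql = generate(error)
        error = validate(aql)
        if error is None:
            return aql
    raise ValueError(
        f"Maximum amount of AQL Query Generation attempts reached ({max_attempts})"
    )


# Stub callbacks standing in for the LLM and the database's query validation.
seen_errors: List[Optional[str]] = []


def fake_generate(previous_error: Optional[str]) -> str:
    seen_errors.append(previous_error)
    # The first attempt is malformed; the retry, given the error, is valid.
    return "FOR m IN Movies RETURN m" if previous_error else "FOR m IN"


def fake_validate(aql: str) -> Optional[str]:
    return None if aql.endswith("RETURN m") else "syntax error"


aql = generate_with_retries(fake_generate, fake_validate)
print(aql)  # prints: FOR m IN Movies RETURN m
```

Feeding the previous error back into the next generation attempt is what lets a retry differ from the failed attempt instead of repeating it.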