ArangoGraph
===========

The ``ArangoGraph`` class provides an interface to interact with ArangoDB for graph operations in LangChain.

Installation
------------

.. code-block:: bash

    pip install langchain-arangodb

Basic Usage
-----------

.. code-block:: python

    from langchain_arangodb.graphs.arangodb_graph import ArangoGraph, get_arangodb_client

    # Connect to ArangoDB
    db = get_arangodb_client(
        url="http://localhost:8529",
        dbname="_system",
        username="root",
        password="password"
    )

    # Initialize ArangoGraph
    graph = ArangoGraph(db)

Factory Methods
---------------

get_arangodb_client
~~~~~~~~~~~~~~~~~~~~

Creates a connection to ArangoDB.

.. code-block:: python

    from langchain_arangodb.graphs.arangodb_graph import get_arangodb_client

    # Using direct credentials
    db = get_arangodb_client(
        url="http://localhost:8529",
        dbname="_system",
        username="root",
        password="password"
    )

    # Using environment variables
    # ARANGODB_URL
    # ARANGODB_DBNAME
    # ARANGODB_USERNAME
    # ARANGODB_PASSWORD
    db = get_arangodb_client()

from_db_credentials
~~~~~~~~~~~~~~~~~~~

Alternative constructor that creates an ``ArangoGraph`` instance directly from credentials.

.. code-block:: python

    graph = ArangoGraph.from_db_credentials(
        url="http://localhost:8529",
        dbname="_system",
        username="root",
        password="password"
    )

Core Methods
------------

add_graph_documents
~~~~~~~~~~~~~~~~~~~

Adds graph documents to the database.

.. code-block:: python

    from langchain_core.documents import Document
    from langchain_arangodb.graphs.graph_document import GraphDocument, Node, Relationship

    # Create nodes and relationships
    nodes = [
        Node(id="1", type="Person", properties={"name": "Alice"}),
        Node(id="2", type="Company", properties={"name": "Acme"})
    ]

    relationship = Relationship(
        source=nodes[0],
        target=nodes[1],
        type="WORKS_AT",
        properties={"since": 2020}
    )

    # Create graph document
    doc = GraphDocument(
        nodes=nodes,
        relationships=[relationship],
        source=Document(page_content="Employee record")
    )

    # Add to database
    graph.add_graph_documents(
        graph_documents=[doc],
        include_source=True,
        graph_name="EmployeeGraph",
        update_graph_definition_if_exists=True,
        capitalization_strategy="lower"
    )

Example: Using LLMGraphTransformer

.. code-block:: python

    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    # Text to transform into a graph
    text = "Bob knows Alice, John knows Bob."

    # Initialize transformer with ChatOpenAI
    transformer = LLMGraphTransformer(
        llm=ChatOpenAI(temperature=0)
    )

    # Create graph documents from text (the transformer takes a list of
    # Document objects and returns a list of graph documents)
    graph_docs = transformer.convert_to_graph_documents(
        [Document(page_content=text)]
    )

    # Add to ArangoDB with embeddings
    graph.add_graph_documents(
        graph_docs,
        graph_name="people_graph",
        use_one_entity_collection=False,  # Creates 'Person' node collection and 'KNOWS' edge collection
        update_graph_definition_if_exists=True,
        include_source=True,
        embeddings=OpenAIEmbeddings(),
        embed_nodes=True  # Embeds the extracted nodes (e.g. 'Alice' and 'Bob')
    )

query
~~~~~

Executes AQL queries against the database.

.. code-block:: python

    # Simple query
    result = graph.query("FOR doc IN users RETURN doc")

    # Query with parameters
    result = graph.query(
        "FOR u IN users FILTER u.age > @min_age RETURN u",
        params={"min_age": 21}
    )

explain
~~~~~~~

Gets the query execution plan.

.. code-block:: python

    plan = graph.explain(
        "FOR doc IN users RETURN doc"
    )

Schema Management
-----------------

refresh_schema
~~~~~~~~~~~~~~

Updates the internal schema representation.

.. code-block:: python

    graph.refresh_schema(
        sample_ratio=0.1,  # Sample 10% of documents
        graph_name="MyGraph",
        include_examples=True
    )

generate_schema
~~~~~~~~~~~~~~~

Generates a schema representation of the database.

.. code-block:: python

    schema = graph.generate_schema(
        sample_ratio=0.1,
        graph_name="MyGraph",
        include_examples=True,
        list_limit=32,
        schema_include_views=True
    )

set_schema
~~~~~~~~~~

Sets a custom schema.

.. code-block:: python

    custom_schema = {
        "collections": {
            "users": {"fields": ["name", "age"]},
            "products": {"fields": ["name", "price"]}
        }
    }
    graph.set_schema(custom_schema)

Schema Properties
-----------------

schema
~~~~~~

Gets the current schema as a dictionary.

.. code-block:: python

    current_schema = graph.schema

schema_json
~~~~~~~~~~~~

Gets the schema as a JSON string.

.. code-block:: python

    schema_json = graph.schema_json

schema_yaml
~~~~~~~~~~~

Gets the schema as a YAML string.

.. code-block:: python

    schema_yaml = graph.schema_yaml

get_structured_schema
~~~~~~~~~~~~~~~~~~~~~

Gets the schema in a structured format.

.. code-block:: python

    structured_schema = graph.get_structured_schema

Internal Utility Methods
------------------------

These methods are used internally but may be useful for advanced use cases:

_sanitize_collection_name
~~~~~~~~~~~~~~~~~~~~~~~~~

Sanitizes collection names to be valid in ArangoDB.

.. code-block:: python

    safe_name = graph._sanitize_collection_name("My Collection!")
    # Returns: "My_Collection_"

_sanitize_input
~~~~~~~~~~~~~~~~

Sanitizes input data by truncating long strings and lists.

.. code-block:: python

    sanitized = graph._sanitize_input(
        {"list": [1, 2, 3, 4, 5, 6]},
        list_limit=5,
        string_limit=100
    )

_hash
~~~~~

Generates a hash string for a value.

.. code-block:: python

    hash_str = graph._hash("some value")

_process_source
~~~~~~~~~~~~~~~~

Processes a source document for storage.

.. code-block:: python

    from langchain_core.documents import Document

    source = Document(
        page_content="test content",
        metadata={"author": "Alice"}
    )

    source_id = graph._process_source(
        source=source,
        source_collection_name="sources",
        source_embedding=[0.1, 0.2, 0.3],
        embedding_field="embedding",
        insertion_db=db
    )

_import_data
~~~~~~~~~~~~~

Bulk imports data into collections.

.. code-block:: python

    data = {
        "users": [
            {"_key": "1", "name": "Alice"},
            {"_key": "2", "name": "Bob"}
        ]
    }

    graph._import_data(db, data, is_edge=False)

Example Workflow
----------------

Here's a complete example demonstrating a typical workflow using ``ArangoGraph`` to create a knowledge graph from documents:

.. code-block:: python

    from arango.exceptions import ArangoServerError
    from langchain_core.documents import Document
    from langchain_arangodb.graphs.arangodb_graph import ArangoGraph, get_arangodb_client
    from langchain_arangodb.graphs.graph_document import GraphDocument, Node, Relationship

    # 1. Setup embeddings (example using OpenAI - you can use any embeddings model)
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()

    # 2. Connect to ArangoDB and initialize graph
    db = get_arangodb_client(
        url="http://localhost:8529",
        dbname="knowledge_base",
        username="root",
        password="password"
    )
    graph = ArangoGraph(db)

    # 3. Create sample documents with relationships
    documents = [
        Document(
            page_content="Alice is a software engineer at Acme Corp.",
            metadata={"source": "employee_records", "date": "2024-01-01"}
        ),
        Document(
            page_content="Bob is a project manager working with Alice on Project X.",
            metadata={"source": "project_docs", "date": "2024-01-02"}
        )
    ]

    # 4. Create nodes and relationships for each document
    graph_documents = []
    for doc in documents:
        # Extract entities and relationships (simplified example)
        if "Alice" in doc.page_content:
            alice_node = Node(id="alice", type="Person", properties={"name": "Alice", "role": "Software Engineer"})
            company_node = Node(id="acme", type="Company", properties={"name": "Acme Corp"})
            works_at_rel = Relationship(
                source=alice_node,
                target=company_node,
                type="WORKS_AT"
            )
            graph_doc = GraphDocument(
                nodes=[alice_node, company_node],
                relationships=[works_at_rel],
                source=doc
            )
            graph_documents.append(graph_doc)

        if "Bob" in doc.page_content:
            bob_node = Node(id="bob", type="Person", properties={"name": "Bob", "role": "Project Manager"})
            project_node = Node(id="project_x", type="Project", properties={"name": "Project X"})
            manages_rel = Relationship(
                source=bob_node,
                target=project_node,
                type="MANAGES"
            )
            works_with_rel = Relationship(
                source=bob_node,
                target=alice_node,
                type="WORKS_WITH"
            )
            graph_doc = GraphDocument(
                nodes=[bob_node, project_node],
                relationships=[manages_rel, works_with_rel],
                source=doc
            )
            graph_documents.append(graph_doc)

    # 5. Add documents to the graph with embeddings
    graph.add_graph_documents(
        graph_documents=graph_documents,
        include_source=True,  # Store original documents
        graph_name="CompanyGraph",
        update_graph_definition_if_exists=True,
        embed_source=True,  # Generate embeddings for documents
        embed_nodes=True,  # Generate embeddings for nodes
        embed_relationships=True,  # Generate embeddings for relationships
        embeddings=embeddings,
        batch_size=100,
        capitalization_strategy="lower"
    )

    # 6. Query the graph
    # ENTITY and ENTITY_EDGE are the collection names used in this example;
    # they must match the vertex and edge collections in your database.

    # Find all people who work at Acme Corp.
    # WORKS_AT edges point from the person to the company, so the
    # traversal starts at the company and follows edges INBOUND.
    employees = graph.query("""
        FOR c IN ENTITY
            FILTER c.type == 'Company' AND c.name == 'Acme Corp'
            FOR v, e IN 1..1 INBOUND c ENTITY_EDGE
                FILTER e.type == 'WORKS_AT'
                RETURN { name: v.name, role: v.role, company: c.name }
    """)

    # Find all projects and their managers
    projects = graph.query("""
        FOR p IN ENTITY
            FILTER p.type == 'Project'
            FOR v, e IN 1..1 INBOUND p ENTITY_EDGE
                FILTER e.type == 'MANAGES'
                RETURN { project: p.name, manager: v.name }
    """)

    # 7. Generate and inspect schema
    schema = graph.generate_schema(
        sample_ratio=1.0,  # Use all documents for schema
        graph_name="CompanyGraph",
        include_examples=True,
        schema_include_views=True
    )
    print("Schema:", schema)

    # 8. Error handling for queries
    try:
        # Complex query with potential for errors
        result = graph.query("""
            FOR start IN ENTITY
                FILTER start.name == 'Alice'
                FOR v, e, p IN 1..3 OUTBOUND start ENTITY_EDGE
                    RETURN p
        """)
    except ArangoServerError as e:
        print(f"Query error: {e}")

This workflow demonstrates:

1. Setting up the environment with embeddings
2. Connecting to ArangoDB
3. Creating documents with structured relationships
4. Adding documents to the graph with embeddings
5. Querying the graph using AQL
6. Schema management
7. Error handling

The example creates a simple company knowledge graph with:

- People (employees)
- Companies
- Projects
- Various relationships (WORKS_AT, MANAGES, WORKS_WITH)
- Document sources with embeddings

Key Features Used:

- Document embedding
- Node and relationship embedding
- Source document storage
- Graph schema management
- AQL queries
- Error handling
- Batch processing

Best Practices
--------------

1. Always use an appropriate capitalization strategy for consistency
2. Use batch operations for large data imports
3. Consider using embeddings for semantic search capabilities
4. Implement proper error handling for database operations
5. Use schema management for better data organization

Error Handling
--------------

.. code-block:: python

    from arango.exceptions import ArangoServerError

    try:
        result = graph.query("FOR doc IN nonexistent RETURN doc")
    except ArangoServerError as e:
        print(f"Database error: {e}")

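
The ``try``/``except`` pattern above could be wrapped in a small reusable helper. The following is a minimal sketch, not part of ``langchain-arangodb``: ``safe_query``, ``FailingGraph`` and ``EchoGraph`` are hypothetical names introduced here for illustration, and the two stand-in classes only mimic the ``query(aql, params=...)`` call so the snippet runs without a database server.

.. code-block:: python

    from typing import Any, Optional, Tuple, Type


    def safe_query(
        graph: Any,
        aql: str,
        params: Optional[dict] = None,
        default: Any = None,
        handled: Tuple[Type[BaseException], ...] = (Exception,),
    ) -> Any:
        """Run ``graph.query`` and return ``default`` if a handled error is raised."""
        try:
            return graph.query(aql, params=params)
        except handled as exc:
            # Report the failure and fall back instead of propagating it.
            print(f"Database error: {exc}")
            return default


    class FailingGraph:
        """Hypothetical stand-in whose query always fails."""

        def query(self, aql: str, params: Optional[dict] = None) -> list:
            raise RuntimeError("collection or view not found: nonexistent")


    class EchoGraph:
        """Hypothetical stand-in that returns the query it was given."""

        def query(self, aql: str, params: Optional[dict] = None) -> list:
            return [{"aql": aql, "params": params}]


    # The failing call falls back to the supplied default.
    failed = safe_query(FailingGraph(), "FOR doc IN nonexistent RETURN doc", default=[])

    # The successful call passes the result through unchanged.
    succeeded = safe_query(
        EchoGraph(),
        "FOR u IN users FILTER u.age > @min_age RETURN u",
        params={"min_age": 21},
    )

With a real ``ArangoGraph`` instance, passing ``handled=(ArangoServerError,)`` would limit the fallback to database errors, so that unrelated exceptions still surface.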