API Reference

This section provides detailed API documentation for all modules in LangChain ArangoDB.

Vector Stores

class langchain_arangodb.vectorstores.arangodb_vector.SearchType(*values)

Bases: str, Enum

Enumerator of the search types.

VECTOR = 'vector'
HYBRID = 'hybrid'
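Because SearchType derives from both str and Enum, its members should behave like plain strings in comparisons and can be looked up from their string values. The sketch below uses a local replica of the enum as documented above rather than importing the library, so it only illustrates the general str/Enum behaviour.

```python
from enum import Enum


class SearchType(str, Enum):
    """Local replica of the documented enum, for illustration only."""

    VECTOR = "vector"
    HYBRID = "hybrid"


# Members compare equal to plain strings because of the str mixin.
is_equal = SearchType.HYBRID == "hybrid"

# A member can be recovered from its string value, e.g. one read from a config file.
from_config = SearchType("vector")

# .value exposes the underlying string.
raw_value = SearchType.HYBRID.value
```

This makes it straightforward to store the search type in configuration as a string and convert it back when constructing the vector store.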
class langchain_arangodb.vectorstores.arangodb_vector.ArangoVector(embedding: Embeddings, embedding_dimension: int, database: StandardDatabase, collection_name: str = 'documents', search_type: SearchType = SearchType.VECTOR, embedding_field: str = 'embedding', text_field: str = 'text', vector_index_name: str = 'vector_index', distance_strategy: DistanceStrategy = DistanceStrategy.COSINE, num_centroids: int = 1, relevance_score_fn: Callable[[float], float] | None = None, keyword_index_name: str = 'keyword_index', keyword_analyzer: str = 'text_en', rrf_constant: int = 60, rrf_search_limit: int = 100)

Bases: VectorStore

ArangoDB vector store implementation for LangChain.

This class provides a vector store implementation using ArangoDB as the backend. It supports both vector similarity search and hybrid search (vector + keyword) capabilities.

Parameters:
  • embedding (langchain.embeddings.base.Embeddings) – The embedding function to use for converting text to vectors. Must implement the langchain.embeddings.base.Embeddings interface.

  • embedding_dimension (int) – The dimensionality of the embedding vectors. Must match the output dimension of the embedding function.

  • database (arango.database.StandardDatabase) – The ArangoDB database instance to use for storage and retrieval.

  • collection_name (str) – The name of the ArangoDB collection to store documents. Defaults to “documents”.

  • search_type (SearchType) – The type of search to perform. Can be either SearchType.VECTOR for pure vector similarity search or SearchType.HYBRID for combining vector and keyword search. Defaults to SearchType.VECTOR.

  • embedding_field (str) – The field name in the document to store the embedding vector. Defaults to “embedding”.

  • text_field (str) – The field name in the document to store the text content. Defaults to “text”.

  • vector_index_name (str) – The name of the vector index to create in ArangoDB. This index enables efficient vector similarity search. Defaults to “vector_index”.

  • distance_strategy (DistanceStrategy) – The distance metric to use for vector similarity. Can be either DistanceStrategy.COSINE or DistanceStrategy.EUCLIDEAN_DISTANCE. Defaults to DistanceStrategy.COSINE.

  • num_centroids (int) – The number of centroids to use for the vector index. Higher values can improve search accuracy but increase memory usage. Defaults to 1.

  • relevance_score_fn (Optional[Callable[[float], float]]) – Optional function to normalize the relevance score. If not provided, uses the default normalization for the distance strategy.

  • keyword_index_name (str) – The name of the ArangoDB View created to enable full-text search. Only used if search_type is set to SearchType.HYBRID. Defaults to “keyword_index”.

  • keyword_analyzer (str) – The text analyzer to use for keyword search. Must be one of the supported analyzers in ArangoDB. Defaults to “text_en”.

  • rrf_constant (int) – The constant used in Reciprocal Rank Fusion (RRF) for hybrid search. Higher values give more weight to lower-ranked results. Defaults to 60.

  • rrf_search_limit (int) – The maximum number of results to consider in RRF scoring. Defaults to 100.
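To make the rrf_constant parameter more concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. It assumes the commonly used formula weight / (rrf_constant + rank) with 1-based ranks; the function name rrf_fuse is hypothetical, and the query the library actually runs inside ArangoDB may differ in details.

```python
from typing import Dict, List, Tuple


def rrf_fuse(
    vector_ranked: List[str],
    keyword_ranked: List[str],
    rrf_constant: int = 60,
    vector_weight: float = 1.0,
    keyword_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Fuse two ranked lists of document keys into one scored ranking."""
    scores: Dict[str, float] = {}
    ranked_lists = ((vector_weight, vector_ranked), (keyword_weight, keyword_ranked))
    for weight, ranked in ranked_lists:
        # Assumption: ranks start at 1 and each list contributes weight / (c + rank).
        for rank, key in enumerate(ranked, start=1):
            scores[key] = scores.get(key, 0.0) + weight / (rrf_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# "b" appears in both lists, so it collects a contribution from each.
fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
fused_keys = [key for key, _ in fused]
```

Under this formula, a larger rrf_constant flattens the difference between adjacent ranks, which is why lower-ranked results gain relative weight.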

property embeddings: Embeddings

Access the query embedding object if available.

retrieve_vector_index() -> dict[str, Any] | None

Retrieve the vector index from the collection.

create_vector_index() -> None

Create the vector index on the collection.

delete_vector_index() -> None

Delete the vector index from the collection.

retrieve_keyword_index() -> dict[str, Any] | None

Retrieve the keyword index from the collection.

create_keyword_index() -> None

Create the keyword index on the collection.

delete_keyword_index() -> None

Delete the keyword index from the collection.

add_embeddings(texts: Iterable[str], embeddings: List[List[float]], metadatas: List[dict] | None = None, ids: List[str] | None = None, batch_size: int = 500, use_async_db: bool = False, insert_text: bool = True, **kwargs: Any) -> List[str]

Add embeddings to the vectorstore.
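The source does not describe how batch_size is applied. A plausible reading is that documents are written in groups of at most batch_size. The sketch below shows that chunking pattern in plain Python; the helper name batched and the document layout are assumptions made for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(items: Iterable[Dict[str, Any]], batch_size: int = 500) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 1200 hypothetical documents split with the documented default of 500.
docs = [{"_key": str(i), "text": f"doc {i}"} for i in range(1200)]
batch_sizes = [len(batch) for batch in batched(docs, batch_size=500)]
```

With this pattern, a smaller batch_size lowers the size of each write request at the cost of more round trips.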

add_texts(texts: Iterable[str], metadatas: List[dict] | None = None, ids: List[str] | None = None, **kwargs: Any) -> List[str]

Add texts to the vector store.

This method embeds the provided texts using the embedding function and stores them in ArangoDB along with their embeddings and metadata.

Parameters:
  • texts (Iterable[str]) – An iterable of text strings to add to the vector store.

  • metadatas (Optional[List[dict]]) – Optional list of metadata dictionaries to associate with each text. Each dictionary can contain arbitrary key-value pairs that will be stored alongside the text and embedding.

  • ids (Optional[List[str]]) – Optional list of unique identifiers for each text. If not provided, IDs will be generated using a hash of the text content.

  • kwargs (Any) – Additional keyword arguments passed to add_embeddings.

Returns:

List of document IDs that were added to the vector store.

Return type:

List[str]

# Add simple texts
texts = ["hello world", "hello arango", "test document"]
ids = vector_store.add_texts(texts)
print(f"Added {len(ids)} documents")

# Add texts with metadata
texts = ["Machine learning tutorial", "Python programming guide"]
metadatas = [
    {"category": "AI", "difficulty": "beginner"},
    {"category": "Programming", "difficulty": "intermediate"}
]
ids = vector_store.add_texts(texts, metadatas=metadatas)

# Add texts with custom IDs
texts = ["Document 1", "Document 2"]
custom_ids = ["doc_001", "doc_002"]
ids = vector_store.add_texts(texts, ids=custom_ids)
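When ids is omitted, the IDs are derived from a hash of the text content. The source does not say which hash function is used, so the sketch below substitutes SHA-256 purely as an assumption. It is meant only to show the consequence of content-based IDs: identical texts map to identical IDs.

```python
import hashlib
from typing import List


def ids_from_texts(texts: List[str]) -> List[str]:
    """Derive a deterministic ID from each text (hypothetical helper)."""
    # Assumption: SHA-256, truncated to 16 hex characters. The library may use a different hash.
    return [hashlib.sha256(text.encode("utf-8")).hexdigest()[:16] for text in texts]


generated = ids_from_texts(["hello world", "hello arango", "hello world"])
duplicate_text_shares_id = generated[0] == generated[2]
```

If duplicate texts need to be stored as separate documents, passing explicit ids as in the example above avoids the collision.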
similarity_search(query: str, k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, embedding: List[float] | None = None, filter_clause: str = '', search_type: SearchType | None = None, vector_weight: float = 1.0, keyword_weight: float = 1.0, keyword_search_clause: str = '', metadata_clause: str = '', stream: bool | None = None, **kwargs: Any) -> List[Document]

Search for similar documents using vector similarity or hybrid search.

This method performs a similarity search using either pure vector similarity or a hybrid approach combining vector and keyword search. The search type can be overridden for individual queries.

Parameters:
  • query (str) – The text query to search for.

  • k (int) – The number of most similar documents to return. Defaults to 4.

  • return_fields (set[str]) – Set of additional document fields to return in results. The _key and text fields are always returned.

  • use_approx (bool) – Whether to use approximate nearest neighbor search. Enables faster but potentially less accurate results. Defaults to True.

  • embedding (Optional[List[float]]) – Optional pre-computed embedding for the query. If not provided, the query will be embedded using the embedding function.

  • filter_clause (str) – Optional AQL filter clause to apply to the search. Can be used to filter results based on document properties.

  • search_type (Optional[SearchType]) – Override the default search type for this query. Can be either SearchType.VECTOR or SearchType.HYBRID.

  • vector_weight (float) – Weight to apply to vector similarity scores in hybrid search. Only used when search_type is SearchType.HYBRID. Defaults to 1.0.

  • keyword_weight (float) – Weight to apply to keyword search scores in hybrid search. Only used when search_type is SearchType.HYBRID. Defaults to 1.0.

  • keyword_search_clause (str) – Optional AQL filter clause used for the full-text search. If empty, a default search clause is used.

  • metadata_clause (str) – Optional AQL clause to return additional metadata once the top k results are retrieved. If specified, the metadata will be added to the Document.metadata field.

  • stream (Optional[bool]) – If True, returns an iterator that yields results one at a time. This reduces memory usage for large k values. If None or False, returns all results as a list. Defaults to None (batch mode).

  • kwargs (Any) – Additional keyword arguments.

Returns:

List of Document objects if stream is None or False, Iterator if stream=True.

Return type:

Union[List[Document], Iterator[Document]]

# Simple vector search (batch mode)
results = vector_store.similarity_search("hello", k=1)
print(results[0].page_content)

# Search with metadata filtering (batch mode)
results = vector_store.similarity_search(
    "machine learning",
    k=2,
    filter_clause="doc.category == 'AI'",
    return_fields={"category", "difficulty"}
)

# Hybrid search with custom weights (batch mode)
results = vector_store.similarity_search(
    "neural networks",
    k=3,
    search_type=SearchType.HYBRID,
    vector_weight=0.8,
    keyword_weight=0.2
)

# Streaming mode (memory efficient for large k)
for doc in vector_store.similarity_search(
    "query", k=10000, stream=True
):
    process_document(doc)
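The practical difference between batch and streaming mode is the difference between a list and an iterator. The sketch below uses a hypothetical stand-in function named search, not the library method, to show what callers can and cannot do with each return type.

```python
from typing import Iterator, List, Optional, Union


def search(k: int, stream: Optional[bool] = None) -> Union[List[int], Iterator[int]]:
    """Stand-in for a search call: returns a list, or an iterator when stream=True."""

    def results() -> Iterator[int]:
        for i in range(k):
            yield i  # produced lazily, one result at a time

    if stream:
        return results()
    return list(results())


batch = search(3)                 # list: supports len() and indexing
streamed = search(3, stream=True) # iterator: can be consumed only once
first_pass = list(streamed)
second_pass = list(streamed)      # already exhausted
```

Code that needs len(results) or results[0] should therefore stay in batch mode, while streaming suits a single pass over a large result set.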
similarity_search_with_score(query: str, k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, embedding: List[float] | None = None, filter_clause: str = '', search_type: SearchType | None = None, vector_weight: float = 1.0, keyword_weight: float = 1.0, keyword_search_clause: str = '', metadata_clause: str = '', stream: bool = True) -> Iterator[tuple[Document, float]]
similarity_search_with_score(query: str, k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, embedding: List[float] | None = None, filter_clause: str = '', search_type: SearchType | None = None, vector_weight: float = 1.0, keyword_weight: float = 1.0, keyword_search_clause: str = '', metadata_clause: str = '', stream: bool | None = None) -> List[tuple[Document, float]]

Search for similar documents and return their similarity scores.

Similar to similarity_search but returns a tuple of (Document, score) for each result. The score represents the similarity between the query and the document.

Parameters:
  • query (str) – The text query to search for.

  • k (int) – The number of most similar documents to return. Defaults to 4.

  • return_fields (set[str]) – Set of additional document fields to return in results. The _key and text fields are always returned.

  • use_approx (bool) – Whether to use approximate nearest neighbor search. Enables faster but potentially less accurate results. Defaults to True.

  • embedding (Optional[List[float]]) – Optional pre-computed embedding for the query. If not provided, the query will be embedded using the embedding function.

  • filter_clause (str) – Optional AQL filter clause to apply to the search. Can be used to filter results based on document properties.

  • search_type (Optional[SearchType]) – Override the default search type for this query. Can be either SearchType.VECTOR or SearchType.HYBRID.

  • vector_weight (float) – Weight to apply to vector similarity scores in hybrid search. Only used when search_type is SearchType.HYBRID. Defaults to 1.0.

  • keyword_weight (float) – Weight to apply to keyword search scores in hybrid search. Only used when search_type is SearchType.HYBRID. Defaults to 1.0.

  • keyword_search_clause (str) – Optional AQL filter clause used for the full-text search. If empty, a default search clause is used.

  • metadata_clause (str) – Optional AQL clause to return additional metadata once the top k results are retrieved.

  • stream (Optional[bool]) – If True, returns an iterator that yields results one at a time. This reduces memory usage for large k values. If None or False, returns all results as a list. Defaults to None (batch mode).

Returns:

List of tuples containing (Document, score) pairs if stream is None or False, Iterator if stream=True.

Return type:

Union[List[tuple[Document, float]], Iterator[tuple[Document, float]]]

# Batch mode (default)
results = vector_store.similarity_search_with_score("query", k=100)
for doc, score in results:
    print(f"Score: {score}, Content: {doc.page_content[:50]}")

# Streaming mode (memory efficient)
for doc, score in vector_store.similarity_search_with_score(
    "query", k=10000, stream=True
):
    process_document(doc, score)
similarity_search_by_vector(embedding: List[float], k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, filter_clause: str = '', metadata_clause: str = '', stream: bool = True, **kwargs: Any) -> Iterator[Document]
similarity_search_by_vector(embedding: List[float], k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, filter_clause: str = '', metadata_clause: str = '', stream: bool | None = None, **kwargs: Any) -> List[Document]

Return docs most similar to embedding vector.

Parameters:
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • return_fields (set[str]) – Fields to return in the result. For example, {“foo”, “bar”} will return the “foo” and “bar” fields of the document, in addition to the _key and text fields. Defaults to an empty set.

  • use_approx (bool) – Whether to use approximate vector search via ANN. Defaults to True. If False, exact vector search will be used.

  • filter_clause (str) – Filter clause to apply to the query.

  • metadata_clause (str) – Optional AQL clause to return additional metadata once the top k results are retrieved. If specified, the metadata will be added to the Document.metadata field.

  • stream (Optional[bool]) – If True, returns an iterator that yields results one at a time. This reduces memory usage for large k values. If None or False, returns all results as a list. Defaults to None (batch mode).

  • kwargs (Any) – Additional keyword arguments.

Returns:

List of Documents if stream is None or False, Iterator if stream=True.

Return type:

Union[List[Document], Iterator[Document]]

# Batch mode (default)
docs = vector_store.similarity_search_by_vector(embedding, k=100)

# Streaming mode (memory efficient)
for doc in vector_store.similarity_search_by_vector(
    embedding, k=10000, stream=True
):
    process_document(doc)
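Conceptually, exact search (use_approx=False) compares the query embedding against every stored embedding and keeps the k best. The following is a minimal pure-Python sketch of that idea using cosine similarity. The helper names are hypothetical, and in the library the comparison runs as a query inside ArangoDB rather than in Python.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: List[float], stored: Dict[str, List[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k most similar."""
    scored = [(key, cosine_similarity(query, vector)) for key, vector in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = exact_top_k([1.0, 0.1], stored, k=2)
top_keys = [key for key, _ in top]
```

Approximate search trades this exhaustive scan for an index lookup, which is faster on large collections but may miss some of the true nearest neighbours.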
similarity_search_by_vector_and_keyword(query: str, embedding: List[float], k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, filter_clause: str = '', vector_weight: float = 1.0, keyword_weight: float = 1.0, keyword_search_clause: str = '', metadata_clause: str = '', stream: bool = True) -> Iterator[Document]
similarity_search_by_vector_and_keyword(query: str, embedding: List[float], k: int = 4, return_fields: set[str] = set(), use_approx: bool = True, filter_clause: str = '', vector_weight: float = 1.0, keyword_weight: float = 1.0, keyword_search_clause: str = '', metadata_clause: str = '', stream: bool | None = None) -> List[Document]

Return docs most similar to query using hybrid search.

Parameters:
  • query (str) – Query text to search for.

  • embedding (List[float]) – Embedding vector for the query.

  • k (int) – Number of Documents to return. Defaults to 4.

  • return_fields (set[str]) – Fields to return in the result. For example, {“foo”, “bar”} will return the “foo” and “bar” fields of the document, in addition to the _key and text fields. Defaults to an empty set.

  • use_approx (bool) – Whether to use approximate vector search via ANN. Defaults to True. If False, exact vector search will be used.

  • filter_clause (str) – Filter clause to apply to the query.

  • vector_weight (float) – Weight to apply to vector similarity scores in hybrid search. Defaults to 1.0.

  • keyword_weight (float) – Weight to apply to keyword search scores in hybrid search. Defaults to 1.0.

  • keyword_search_clause (str) – Optional AQL filter clause used for the full-text search. If empty, a default search clause is used.

  • metadata_clause (str) – Optional AQL clause to return additional metadata once the top k results are retrieved. If specified, the metadata will be added to the Document.metadata field.

  • stream (Optional[bool]) – If True, returns an iterator that yields results one at a time. This reduces memory usage for large k values. If None or False, returns all results as a list. Defaults to None (batch mode).

Returns:

List of Documents if stream is None or False, Iterator if stream=True.

Return type:

Union[List[Document], Iterator[Document]]

# Batch mode (default)
docs = vector_store.similarity_search_by_vector_and_keyword(
    query, embedding, k=100
)

# Streaming mode (memory efficient)
for doc in vector_store.similarity_search_by_vector_and_keyword(
    query, embedding, k=10000, stream=True
):
    process_document(doc)
similarity_search_by_vector_with_score(embedding: List[float], k: int = 4, return_fields: set[str] = {}, use_approx: bool = True, filter_clause: str = '', metadata_clause: str = '', stream: bool | None = None) -> List[tuple[Document, float]] | Iterator[tuple[Document, float]]

Return docs most similar to embedding vector with scores.

Parameters:
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • return_fields (set[str]) – Fields to return in the result. For example, {“foo”, “bar”} will return the “foo” and “bar” fields of the document, in addition to the _key and text fields. Defaults to an empty set.

  • use_approx (bool) – Whether to use approximate vector search via ANN. Defaults to True. If False, exact vector search will be used.

  • filter_clause (str) – Filter clause to apply to the query.

  • metadata_clause (str) – Optional AQL clause to return additional metadata once the top k results are retrieved. If specified, the metadata will be added to the Document.metadata field.

  • stream (Optional[bool]) – If True, returns an iterator that yields results one at a time. This reduces memory usage for large k values. If None or False, returns all results as a list. Defaults to None (batch mode).

Returns:

List of tuples containing (Document, score) pairs if stream is None or False, Iterator if stream=True.

Return type:

Union[List[tuple[Document, float]], Iterator[tuple[Document, float]]]

# Batch mode (default)
results = vector_store.similarity_search_by_vector_with_score(
    embedding, k=100
)

# Streaming mode (memory efficient)
for doc, score in vector_store.similarity_search_by_vector_with_score(
    embedding, k=10000, stream=True
):
    process_document(doc, score)
similarity_search_by_vector_and_keyword_with_score(query: str, embedding: List[float], k: int = 4, return_fields: set[str] = {}, use_approx: bool = True, filter_clause: str = '', vector_weight: float = 1.0, keyword_weight: float = 1.0, keyword_search_clause: str = '', metadata_clause: str = '', stream: bool | None = None) -> List[tuple[Document, float]] | Iterator[tuple[Document, float]]

Run hybrid similarity search combining vector and keyword search with scores.

Parameters:
  • query (str) – Query text to search for.

  • embedding (List[float]) – Embedding vector for the query.

  • k (int) – Number of results to return. Defaults to 4.

  • return_fields (set[str]) – Fields to return in the result. For example, {“foo”, “bar”} will return the “foo” and “bar” fields of the document, in addition to the _key and text fields. Defaults to an empty set.

  • use_approx (bool) – Whether to use approximate vector search via ANN. Defaults to True. If False, exact vector search will be used.

  • filter_clause (str) – Filter clause to apply to the query.

  • vector_weight (float) – Weight to apply to vector similarity scores in hybrid search. Defaults to 1.0.

  • keyword_weight (float) – Weight to apply to keyword search scores in hybrid search. Defaults to 1.0.

  • keyword_search_clause (str) – Optional AQL filter clause used for the full-text search. If empty, a default search clause is used.

  • metadata_clause (str) – Optional AQL clause to return additional metadata once the top k results are retrieved. If specified, the metadata will be added to the Document.metadata field.

  • stream (Optional[bool]) – If True, returns an iterator that yields results one at a time. This reduces memory usage for large k values. If None or False, returns all results as a list. Defaults to None (batch mode).

Returns:

List of tuples containing (Document, score) pairs if stream is None or False, Iterator if stream=True.

Return type:

Union[List[tuple[Document, float]], Iterator[tuple[Document, float]]]

# Batch mode (default)
results = vector_store.similarity_search_by_vector_and_keyword_with_score(
    query, embedding, k=100
)

# Streaming mode (memory efficient)
for doc, score in (
    vector_store.similarity_search_by_vector_and_keyword_with_score(
        query, embedding, k=10000, stream=True
    )
):
    process_document(doc, score)
delete(ids: List[str] | None = None, **kwargs: Any) -> bool | None

Delete by vector ID or other criteria.

Parameters:
  • ids (Optional[List[str]]) – List of ids to delete.

  • kwargs (Any) – Other keyword arguments that can be used to delete vectors.

Returns:

True if the deletion succeeds, or None if no ids are provided. An exception is raised if an error occurs.

Return type:

Optional[bool]

get_by_ids(ids: Sequence[str], /) -> list[Document]

Get documents by their IDs.

Parameters:

ids (Sequence[str]) – List of ids to get.

Returns:

List of Documents with the given ids.

Return type:

list[Document]

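max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, return_fields: set[str] = set(), use_approx: bool = True, embedding: List[float] | None = None, **kwargs: Any) -> List[Document]

Search for documents using Maximal Marginal Relevance (MMR).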

MMR optimizes for both similarity to the query and diversity among the results. It helps avoid returning redundant or very similar documents by balancing relevance and diversity in the selection process.

Parameters:
  • query (str) – The text query to search for.

  • k (int) – The number of documents to return. Defaults to 4.

  • fetch_k (int) – The number of documents to fetch for MMR selection. Should be larger than k to allow for diversity selection. Defaults to 20.

  • lambda_mult (float) – Controls the diversity vs relevance tradeoff. Values between 0 and 1, where 0 = maximum diversity, 1 = maximum relevance. Defaults to 0.5.

  • return_fields (set[str]) – Set of additional document fields to return in results. The _key and text fields are always returned.

  • use_approx (bool) – Whether to use approximate nearest neighbor search. Enables faster but potentially less accurate results. Defaults to True.

  • embedding (Optional[List[float]]) – Optional pre-computed embedding for the query. If not provided, the query will be embedded using the embedding function.

  • kwargs (Any) – Additional keyword arguments passed to the search methods.

Returns:

List of Document objects selected by MMR algorithm.

Return type:

List[Document]

# Search with balanced diversity and relevance
results = vector_store.max_marginal_relevance_search(
    "machine learning",
    k=3,
    fetch_k=10
)

# Emphasize diversity over relevance
diverse_results = vector_store.max_marginal_relevance_search(
    "neural networks",
    k=5,
    fetch_k=20,
    lambda_mult=0.1  # More diverse results
)

# Emphasize relevance over diversity
relevant_results = vector_store.max_marginal_relevance_search(
    "deep learning",
    k=3,
    fetch_k=15,
    lambda_mult=0.9  # More relevant results
)
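The effect of lambda_mult is easier to see in a small standalone implementation. The sketch below follows the standard greedy MMR formulation, scoring each candidate as lambda_mult * relevance minus (1 - lambda_mult) * redundancy. The function names are hypothetical, and the library's own selection routine may differ in details.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: List[float], candidates: List[List[float]], k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick k candidate indices, balancing relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for idx in remaining:
            relevance = cosine(query, candidates[idx])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max((cosine(candidates[idx], candidates[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
candidates = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
relevance_only = mmr_select([1.0, 0.0], candidates, k=2, lambda_mult=1.0)
diversity_leaning = mmr_select([1.0, 0.0], candidates, k=2, lambda_mult=0.3)
```

With lambda_mult=1.0 the two near-duplicates are returned, while a lower value swaps the second one for the more distinct candidate. In the documented method, the candidates would correspond to the fetch_k documents retrieved before selection.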
classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: List[dict] | None = None, database: StandardDatabase | None = None, collection_name: str = 'documents', search_type: SearchType = SearchType.VECTOR, embedding_field: str = 'embedding', text_field: str = 'text', vector_index_name: str = 'vector_index', distance_strategy: DistanceStrategy = DistanceStrategy.COSINE, num_centroids: int = 1, ids: List[str] | None = None, overwrite_index: bool = False, insert_text: bool = True, keyword_index_name: str = 'keyword_index', keyword_analyzer: str = 'text_en', rrf_constant: int = 60, rrf_search_limit: int = 100, **kwargs: Any) -> ArangoVector

Create an ArangoVector instance from a list of texts.

This is a convenience method that creates a new ArangoVector instance, embeds the provided texts, and stores them in ArangoDB.

Parameters:
  • texts (List[str]) – List of text strings to add to the vector store.

  • embedding (langchain.embeddings.base.Embeddings) – The embedding function to use for converting text to vectors.

  • metadatas (Optional[List[dict]]) – Optional list of metadata dictionaries to associate with each text.

  • database (Optional[arango.database.StandardDatabase]) – The ArangoDB database instance to use.

  • collection_name (str) – The name of the ArangoDB collection to use. Defaults to “documents”.

  • search_type (SearchType) – The type of search to perform. Can be either SearchType.VECTOR or SearchType.HYBRID. Defaults to SearchType.VECTOR.

  • embedding_field (str) – The field name to store embeddings. Defaults to “embedding”.

  • text_field (str) – The field name to store text content. Defaults to “text”.

  • vector_index_name (str) – The name of the vector index. Defaults to “vector_index”.

  • distance_strategy (DistanceStrategy) – The distance metric to use. Can be DistanceStrategy.COSINE or DistanceStrategy.EUCLIDEAN_DISTANCE. Defaults to DistanceStrategy.COSINE.

  • num_centroids (int) – Number of centroids for vector index. Defaults to 1.

  • ids (Optional[List[str]]) – Optional list of unique identifiers for each text.

  • overwrite_index (bool) – Whether to delete and recreate existing indexes. Defaults to False.

  • insert_text (bool) – Whether to store the text content in the database. Required for hybrid search. Defaults to True.

  • keyword_index_name (str) – Name of the keyword search index. Defaults to “keyword_index”.

  • keyword_analyzer (str) – Text analyzer for keyword search. Defaults to “text_en”.

  • rrf_constant (int) – Constant for RRF scoring in hybrid search. Defaults to 60.

  • rrf_search_limit (int) – Maximum results for RRF scoring. Defaults to 100.

  • kwargs (Any) – Additional keyword arguments passed to the constructor.

Returns:

A new ArangoVector instance with the texts embedded and stored.

Return type:

ArangoVector

from arango import ArangoClient
from langchain_arangodb.vectorstores import ArangoVector
from langchain_arangodb.vectorstores.arangodb_vector import SearchType
from langchain_community.embeddings import OpenAIEmbeddings

# Connect to ArangoDB
client = ArangoClient("http://localhost:8529")
db = client.db("test", username="root", password="openSesame")

# Create vector store from texts
texts = ["hello world", "hello arango", "test document"]
metadatas = [{"source": "doc1"}, {"source": "doc2"}, {"source": "doc3"}]

vector_store = ArangoVector.from_texts(
    texts=texts,
    embedding=OpenAIEmbeddings(),
    metadatas=metadatas,
    database=db,
    collection_name="test_collection"
)

# Create hybrid search store
hybrid_store = ArangoVector.from_texts(
    texts=["Machine learning algorithms", "Deep neural networks"],
    embedding=OpenAIEmbeddings(),
    database=db,
    search_type=SearchType.HYBRID,
    collection_name="hybrid_docs",
    overwrite_index=True  # Clean start
)
classmethod from_existing_collection(collection_name: str, text_properties_to_embed: List[str], embedding: Embeddings, database: StandardDatabase, embedding_field: str = 'embedding', text_field: str = 'text', vector_index_name: str = 'vector_index', batch_size: int = 1000, aql_return_text_query: str = '', insert_text: bool = False, skip_existing_embeddings: bool = False, search_type: SearchType = SearchType.VECTOR, keyword_index_name: str = 'keyword_index', keyword_analyzer: str = 'text_en', rrf_constant: int = 60, rrf_search_limit: int = 100, **kwargs: Any) -> ArangoVector

Create an ArangoVector instance from an existing ArangoDB collection.

This method reads documents from an existing collection, extracts specified text properties, embeds them, and creates a new vector store.

Parameters:
  • collection_name (str) – Name of the existing ArangoDB collection.

  • text_properties_to_embed (List[str]) – List of document properties containing text to embed. These properties will be concatenated to create the text for embedding.

  • embedding (Embeddings) – The embedding function to use for converting text to vectors.

  • database (StandardDatabase) – The ArangoDB database instance to use.

  • embedding_field (str) – The field name to store embeddings. Defaults to “embedding”.

  • text_field (str) – The field name to store text content. Defaults to “text”. Only used if insert_text is True.

  • vector_index_name (str) – The name of the vector index. Defaults to “vector_index”.

  • batch_size (int) – Number of documents to process in each batch. Defaults to 1000.

  • aql_return_text_query (str) – Custom AQL query to extract text from properties. If empty (the default), “RETURN doc[p]” is used.

  • insert_text (bool) – Whether to store the concatenated text in the database. Required for hybrid search. Defaults to False.

  • skip_existing_embeddings (bool) – Whether to skip documents that already have embeddings. Defaults to False.

  • search_type (SearchType) – The type of search to perform. Can be either SearchType.VECTOR or SearchType.HYBRID. Defaults to SearchType.VECTOR.

  • keyword_index_name (str) – Name of the keyword search index. Defaults to “keyword_index”.

  • keyword_analyzer (str) – Text analyzer for keyword search. Defaults to “text_en”.

  • rrf_constant (int) – Constant for RRF scoring in hybrid search. Defaults to 60.

  • rrf_search_limit (int) – Maximum results for RRF scoring. Defaults to 100.

  • kwargs (Any) – Additional keyword arguments passed to the constructor.

Returns:

A new ArangoVector instance with embeddings created from the collection.

Return type:

ArangoVector
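As a rough illustration of what text_properties_to_embed and skip_existing_embeddings imply, the sketch below builds the text to embed from selected properties and skips documents that already carry an embedding. The helper names, the space separator, and the handling of missing properties are all assumptions. The library performs the extraction with AQL rather than in Python.

```python
from typing import Any, Dict, Iterable, Iterator, List


def build_text(doc: Dict[str, Any], properties: List[str], separator: str = " ") -> str:
    """Join the named properties of one document into a single string."""
    # Assumption: properties are joined with a space and missing ones are skipped.
    return separator.join(str(doc[p]) for p in properties if p in doc)


def docs_to_embed(
    docs: Iterable[Dict[str, Any]],
    embedding_field: str = "embedding",
    skip_existing_embeddings: bool = False,
) -> Iterator[Dict[str, Any]]:
    """Yield the documents that still need an embedding."""
    for doc in docs:
        if skip_existing_embeddings and doc.get(embedding_field):
            continue
        yield doc


docs = [
    {"_key": "1", "title": "Arya Stark", "bio": "Youngest daughter", "embedding": [0.1, 0.2]},
    {"_key": "2", "title": "Jon Snow", "bio": "Lord Commander"},
]
pending = list(docs_to_embed(docs, skip_existing_embeddings=True))
texts = [build_text(doc, ["title", "bio"]) for doc in pending]
```

Skipping existing embeddings is mainly useful when re-running the method on a collection that was only partially processed.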

find_entity_clusters(threshold: float = 0.8, k: int = 4, use_approx: bool = True, use_subset_relations: bool = False, merge_similar_entities: bool = False) -> List[Dict[str, Any]] | Dict[str, List[Dict[str, Any]]]

Find similar documents within the collection for entity resolution.

This method compares documents within the collection to each other and returns entities grouped with their most similar documents. Each entity is returned with a list of the top k most similar entities based on the chosen similarity function.

Supported similarity functions: COSINE, EUCLIDEAN_DISTANCE, JACCARD, APPROX_NEAR_COSINE, APPROX_NEAR_L2.

Note: for JACCARD, use_approx is automatically set to False.

Parameters:
  • threshold (float) – Minimum similarity score for documents to be considered similar. Defaults to 0.8.

  • k (int) – Number of similar documents to return for each entity. Defaults to 4.

  • use_approx (bool) – Whether to use approximate nearest neighbor search. Defaults to True.

  • use_subset_relations (bool) – Whether to analyze subset relations. Defaults to False.

  • merge_similar_entities (bool) – Whether to merge similar entities based on subset relationships. Only effective when use_subset_relations=True. When True, merges subset groups into their superset groups to create consolidated, non-overlapping clusters. Defaults to False.

Returns:

Return format depends on parameters:

  • Basic clustering (use_subset_relations=False and merge_similar_entities=False): List[Dict] with format: {‘entity’: entity_key, ‘similar’: [list_of_keys]}

  • With subset analysis (use_subset_relations=True, merge_similar_entities=False): Dict with keys: ‘similar_entities’, ‘subset_relationships’

  • With merging (use_subset_relations=True, merge_similar_entities=True): Dict with keys: ‘similar_entities’, ‘subset_relationships’, ‘merged_entities’

Return type:

Union[List[Dict[str, Any]], Dict[str, List[Dict[str, Any]]]]
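The basic clustering mode can be pictured as a pairwise comparison that keeps, for each entity, up to k others whose similarity meets the threshold. The sketch below reproduces the documented {'entity': ..., 'similar': [...]} shape in plain Python with cosine similarity. The function names are hypothetical, and leaving out entities with no match is an assumption rather than documented behaviour.

```python
import math
from typing import Any, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_clusters(embeddings: Dict[str, List[float]], threshold: float = 0.8, k: int = 4) -> List[Dict[str, Any]]:
    """For each entity, list up to k other entities with similarity >= threshold."""
    clusters: List[Dict[str, Any]] = []
    for key, vector in embeddings.items():
        scored = [
            (other, cosine(vector, other_vector))
            for other, other_vector in embeddings.items()
            if other != key
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        similar = [other for other, score in scored if score >= threshold][:k]
        if similar:  # assumption: entities without matches are omitted
            clusters.append({"entity": key, "similar": similar})
    return clusters


embeddings = {
    "acme_inc": [1.0, 0.0],
    "acme": [0.98, 0.05],
    "globex": [0.0, 1.0],
}
clusters = find_clusters(embeddings, threshold=0.8, k=4)
```

Raising threshold makes the groups stricter, while k only caps how many matches are listed per entity.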

Chat Message Histories

class langchain_arangodb.chat_message_histories.arangodb.ArangoChatMessageHistory(session_id: str | int, db: StandardDatabase, collection_name: str = 'ChatHistory', window: int = 3, *args: Any, **kwargs: Any)

Bases: BaseChatMessageHistory

Chat message history stored in an ArangoDB database.

This class provides persistent storage for chat message histories using ArangoDB as the backend. It supports session-based message storage with automatic collection creation and indexing.

Parameters:
  • session_id (Union[str, int]) – Unique identifier for the chat session.

  • db (arango.database.StandardDatabase) – ArangoDB database instance for storing chat messages.

  • collection_name (str) – Name of the ArangoDB collection to store messages. Defaults to “ChatHistory”.

  • window (int) – Maximum number of messages to keep in memory (currently unused). Defaults to 3.

  • args (Any) – Additional positional arguments passed to BaseChatMessageHistory.

  • kwargs (Any) – Additional keyword arguments passed to BaseChatMessageHistory.

from arango import ArangoClient
from langchain_arangodb.chat_message_histories import ArangoChatMessageHistory

# Connect to ArangoDB
client = ArangoClient("http://localhost:8529")
db = client.db("test", username="root", password="openSesame")

# Create chat message history
history = ArangoChatMessageHistory(
    session_id="user_123",
    db=db,
    collection_name="chat_sessions"
)

# Add messages
history.add_user_message("Hello! How are you?")
history.add_ai_message("I'm doing well, thank you!")

# Add QA message
history.add_qa_message(
    user_input="Who is the first character?",
    aql_query="FOR doc IN Characters LIMIT 1 RETURN doc",
    result="The first character is Arya Stark."
)

# Retrieve messages
messages = history.messages
print(f"Found {len(messages)} messages")

# Retrieve messages by role
human_messages = history.get_messages(role="human")
ai_messages = history.get_messages(role="ai")
qa_messages = history.get_messages(role="qa")

# Clear session
history.clear()
property messages: List[BaseMessage]

Retrieve the messages from ArangoDB.

Retrieves all messages for the current session from the ArangoDB collection, sorted by timestamp in descending order (most recent first).

Returns:

List of chat messages for the current session, sorted in reverse chronological order (most recent first).

Return type:

List[BaseMessage]

# Get all messages for the session
messages = history.messages
for msg in messages:
    print(f"{msg.type}: {msg.content}")

# Check if session has any messages
if history.messages:
    print(f"Session has {len(history.messages)} messages")
else:
    print("No messages in this session")
get_messages(role: str | None = None, n_messages: int = 10, excluded_fields: list[str] = ['_id', '_key', '_rev', 'session_id', 'time']) list[source]

Retrieve messages from ArangoDB, optionally filtered by role.

Parameters:
  • role (Optional[str]) – Optional filter to retrieve messages of a specific role.

  • n_messages (int) – Number of messages to retrieve.

  • excluded_fields (list[str]) – Fields to exclude from the returned messages.

# Get all types of messages, default is 10 messages
messages = history.get_messages()

# Get the first 20 human messages
messages = history.get_messages(role="human", n_messages=20)

# Get the first 20 AI messages
messages = history.get_messages(role="ai", n_messages=20)
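How the three parameters interact can be sketched without a database. The function below is a hypothetical stand-in that works on plain dicts; the stored field names ("role", "content") are assumptions, and only the default excluded fields come from the signature above.

```python
from typing import Any, Dict, List, Optional

# Default taken from the get_messages signature.
DEFAULT_EXCLUDED = ["_id", "_key", "_rev", "session_id", "time"]


def filter_messages(
    docs: List[Dict[str, Any]],
    role: Optional[str] = None,
    n_messages: int = 10,
    excluded_fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: filter by role, cap the count, strip fields."""
    excluded = set(DEFAULT_EXCLUDED if excluded_fields is None else excluded_fields)
    # Assumption: each stored document carries its role in a "role" field.
    selected = [d for d in docs if role is None or d.get("role") == role]
    return [
        {k: v for k, v in d.items() if k not in excluded}
        for d in selected[:n_messages]
    ]


docs = [
    {"_key": "1", "session_id": "user_123", "role": "human", "content": "Hello!", "time": 1},
    {"_key": "2", "session_id": "user_123", "role": "ai", "content": "Hi there!", "time": 2},
    {"_key": "3", "session_id": "user_123", "role": "human", "content": "Bye", "time": 3},
]
human_only = filter_messages(docs, role="human", n_messages=1)
print(human_only)
```

The sketch uses a None sentinel for excluded_fields rather than a mutable list default, which avoids sharing one list object between calls.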
add_message(message: BaseMessage) None[source]

Append the message to the record in ArangoDB.

Stores a single chat message in the ArangoDB collection associated with the current session. The message is stored with its type, content, and session identifier.

Parameters:

message (BaseMessage) – The chat message to add to the history.

from langchain_core.messages import HumanMessage, AIMessage

# Add user message
user_msg = HumanMessage(content="What is the weather like?")
history.add_message(user_msg)

# Add AI response
ai_msg = AIMessage(content="I don't have access to current weather data.")
history.add_message(ai_msg)

# Or use convenience methods
history.add_user_message("Hello!")
history.add_ai_message("Hi there!")
add_qa_message(user_input: str, aql_query: str, result: str) None[source]

Add a QA message to the chat history.

Parameters:
  • user_input (str) – The user’s input.

  • aql_query (str) – The AQL query to execute.

  • result (str) – The result of the AQL query.

history.add_qa_message(
    user_input="Who is the first character?",
    aql_query="FOR doc IN Characters LIMIT 1 RETURN doc",
    result="The first character is Arya Stark."
)
clear() None[source]

Clear session memory from ArangoDB.

Removes all messages associated with the current session from the ArangoDB collection. The collection itself is preserved for future use.

# Add some messages
history.add_user_message("Hello")
history.add_ai_message("Hi!")
print(f"Messages before clear: {len(history.messages)}")  # Output: 2

# Clear all messages for this session
history.clear()
print(f"Messages after clear: {len(history.messages)}")   # Output: 0

# Collection still exists for future messages
history.add_user_message("Starting fresh conversation")
print(f"New message count: {len(history.messages)}")      # Output: 1

Graph Stores

langchain_arangodb.graphs.arangodb_graph.get_arangodb_client(url: str | None = None, dbname: str | None = None, username: str | None = None, password: str | None = None) Any[source]

Get the Arango DB client from credentials.

Parameters:
  • url (str) – Arango DB url. Can be passed in as named arg or set as environment var ARANGODB_URL. Defaults to “http://localhost:8529”.

  • dbname (str) – Arango DB name. Can be passed in as named arg or set as environment var ARANGODB_DBNAME. Defaults to “_system”.

  • username (str) – Can be passed in as named arg or set as environment var ARANGODB_USERNAME. Defaults to “root”.

  • password (str) – Can be passed in as named arg or set as environment var ARANGODB_PASSWORD. Defaults to “”.

Returns:

An arango.database.StandardDatabase.

Return type:

Any

Raises:
  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.
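The argument, environment variable, and default fallback described above can be sketched as follows. This is an illustration only: resolve_credentials is a hypothetical helper, and the precedence (explicit argument first, then environment variable, then default) is an assumption based on the parameter descriptions.

```python
import os
from typing import Dict, Optional


def resolve_credentials(
    url: Optional[str] = None,
    dbname: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: explicit argument, else env var, else default."""

    def pick(value: Optional[str], env_var: str, default: str) -> str:
        if value is not None:
            return value
        return os.environ.get(env_var, default)

    return {
        "url": pick(url, "ARANGODB_URL", "http://localhost:8529"),
        "dbname": pick(dbname, "ARANGODB_DBNAME", "_system"),
        "username": pick(username, "ARANGODB_USERNAME", "root"),
        "password": pick(password, "ARANGODB_PASSWORD", ""),
    }


# The database name comes from the environment, the username from the argument.
os.environ["ARANGODB_DBNAME"] = "test"
creds = resolve_credentials(username="reader")
print(creds["dbname"], creds["username"])
```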

class langchain_arangodb.graphs.arangodb_graph.ArangoGraph(db: StandardDatabase, generate_schema_on_init: bool = True, schema_sample_ratio: float = 0, schema_graph_name: str | None = None, schema_include_examples: bool = True, schema_list_limit: int = 32, schema_string_limit: int = 256, schema_include_views: bool = False, schema_include_indexes: bool = False)[source]

Bases: GraphStore

ArangoDB wrapper for graph operations.

Parameters:
  • db (StandardDatabase) – The ArangoDB database instance.

  • generate_schema_on_init (bool) – Whether to generate the graph schema on initialization. Defaults to True.

  • schema_sample_ratio (float) – The ratio of documents/edges to sample in relation to the Collection size to generate each Collection Schema. If 0, one document/edge is used per Collection. Defaults to 0.

  • schema_graph_name (Optional[str]) – The name of an existing ArangoDB Graph to specifically use to generate the schema. If None, the entire database will be used. Defaults to None.

  • schema_include_examples (bool) – Whether to include example values fetched from sample documents as part of the schema. Lists longer than schema_list_limit are excluded from the schema even if schema_include_examples is True. Defaults to True.

  • schema_list_limit (int) – The maximum list size the schema will include as part of the example values. If the list is longer than this limit, a string describing the list will be used in the schema instead. Default is 32.

  • schema_string_limit (int) – The maximum number of characters to include in a string. If the string is longer than this limit, a string describing the string will be used in the schema instead. Default is 256.

  • schema_include_views (bool) – Whether to include ArangoDB Views and Analyzers as part of the schema passed to the AQL Generation prompt. Default is False.

  • schema_include_indexes (bool) – Whether to include ArangoDB Indexes as part of the collection schema passed to the AQL Generation prompt. Default is False.

Raises:
  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.

Security note: Make sure that the database connection uses credentials that are narrowly-scoped to only include necessary permissions. Failure to do so may result in data corruption or loss, since the calling code may attempt commands that would result in deletion, mutation of data if appropriately prompted or reading sensitive data if such data is present in the database. The best way to guard against such negative outcomes is to (as appropriate) limit the permissions granted to the credentials used with this tool.

See https://python.langchain.com/docs/security for more information.

property db: StandardDatabase
property schema: Dict[str, Any]

Returns the schema of the Graph Database as a structured object

property get_structured_schema: Dict[str, Any]

Returns the schema of the Graph Database as a structured object

property schema_json: str

Returns the schema of the Graph Database as a JSON string

Returns:

The schema of the Graph Database as a JSON string

Return type:

str

property schema_yaml: str

Returns the schema of the Graph Database as a YAML string

Returns:

The schema of the Graph Database as a YAML string

Return type:

str

set_schema(schema: Dict[str, Any]) None[source]

Sets a custom schema for the ArangoDB Database.

Parameters:

schema (Dict[str, Any]) – The schema to set.

Returns:

None

Return type:

None

refresh_schema(sample_ratio: float = 0, graph_name: str | None = None, include_examples: bool = True, list_limit: int = 32) None[source]

Refresh the graph schema information.

Parameters:
  • sample_ratio (float) – A float (0 to 1) to determine the ratio of documents/edges sampled in relation to the Collection size to generate each Collection Schema. If 0, one document/edge is used per Collection. Defaults to 0.

  • graph_name (Optional[str]) – The name of an existing ArangoDB Graph to specifically use to generate the schema. If None, the entire database will be used. Defaults to None.

  • include_examples (bool) – Whether to include example values fetched from sample documents as part of the schema. Lists longer than list_limit are excluded from the schema even if include_examples is True. Defaults to True.

  • list_limit (int) – The maximum list size the schema will include as part of the example values. If the list is longer than this limit, a string describing the list will be used in the schema instead. Default is 32.

Returns:

None

Return type:

None

Raises:
  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.

generate_schema(sample_ratio: float = 0, graph_name: str | None = None, include_examples: bool = True, list_limit: int = 32, schema_string_limit: int = 256, schema_include_views: bool = False, schema_include_indexes: bool = False) Dict[str, List[Dict[str, Any]]][source]

Generates the schema of the ArangoDB Database and returns it

Parameters:
  • sample_ratio (float) – A ratio (0 to 1) to determine the ratio of documents/edges used (in relation to the Collection size) to render each Collection Schema. If 0, one document/edge is used per Collection.

  • graph_name (Optional[str]) – The name of the graph to use to generate the schema. If None, the entire database will be used.

  • include_examples (bool) – A flag whether to scan the database for example values and use them in the graph schema. Default is True.

  • list_limit (int) – The maximum number of elements to include in a list. If the list is longer than this limit, a string describing the list will be used in the schema instead. Default is 32.

  • schema_string_limit (int) – The maximum number of characters to include in a string. If the string is longer than this limit, a string describing the string will be used in the schema instead. Default is 256.

  • schema_include_views (bool) – Whether to include ArangoDB Views and Analyzers as part of the schema passed to the AQL Generation prompt. Default is False.

  • schema_include_indexes (bool) – Whether to include ArangoDB Indexes as part of the collection schema passed to the AQL Generation prompt. Default is False.

Returns:

A dictionary containing the graph schema and collection schema.

Return type:

Dict[str, List[Dict[str, Any]]]

Raises:
  • ValueError – If the sample ratio is not between 0 and 1.

  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.
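The effect of sample_ratio can be made concrete with a short sketch. The helper sample_size is hypothetical, and rounding up to a whole document is an assumption; only the 0-to-1 range check and the "one document per collection when the ratio is 0" rule come from the description above.

```python
import math


def sample_size(collection_count: int, sample_ratio: float = 0) -> int:
    """Hypothetical helper: number of documents/edges to sample."""
    if not 0 <= sample_ratio <= 1:
        raise ValueError("sample_ratio must be between 0 and 1")
    if sample_ratio == 0:
        # A ratio of 0 means a single document/edge per collection.
        return 1
    # Assumption: partial documents are rounded up, with a minimum of one.
    return max(1, math.ceil(collection_count * sample_ratio))


print(sample_size(1000, 0))
print(sample_size(1000, 0.25))
print(sample_size(3, 0.5))
```

A ratio outside the range, such as 1.5, raises ValueError in this sketch, mirroring the documented behavior.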

query(query: str, params: dict = {}) List[Any][source]

Execute an AQL query and return the results.

Parameters:
  • query (str) – The AQL query to execute.

  • params (dict) – Additional arguments piped to the function. Defaults to an empty dict ({}).

  • list_limit (Optional[int]) – Removes lists above list_limit size that have been returned from the AQL query.

  • string_limit (Optional[int]) – Removes strings above string_limit size that have been returned from the AQL query.

  • remaining_params (Optional[dict]) – Remaining params are passed to the AQL query execution. Defaults to None.

Returns:

A list of dictionaries containing the query results.

Return type:

List[Any]

Raises:
  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.
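What list_limit and string_limit do to a result set can be sketched in plain Python. This is a hedged illustration, not the library's code: sanitize is a hypothetical helper, and dropping oversized values recursively inside nested dicts and lists is an assumption.

```python
from typing import Any

_DROPPED = object()  # sentinel marking a value that exceeded a limit


def sanitize(value: Any, list_limit: int, string_limit: int) -> Any:
    """Hypothetical helper: remove lists and strings above the given sizes."""
    if isinstance(value, str):
        return value if len(value) <= string_limit else _DROPPED
    if isinstance(value, list):
        if len(value) > list_limit:
            return _DROPPED
        cleaned = (sanitize(v, list_limit, string_limit) for v in value)
        return [v for v in cleaned if v is not _DROPPED]
    if isinstance(value, dict):
        cleaned_items = (
            (k, sanitize(v, list_limit, string_limit)) for k, v in value.items()
        )
        return {k: v for k, v in cleaned_items if v is not _DROPPED}
    # Numbers, booleans and None pass through unchanged.
    return value


rows = [
    {"name": "Arya", "bio": "x" * 300, "tags": list(range(40)), "aliases": ["No One"]}
]
clean_rows = sanitize(rows, list_limit=32, string_limit=256)
print(clean_rows)
```

The 300-character "bio" and the 40-element "tags" list are removed, while the short fields survive.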

explain(query: str, params: dict = {}) List[Dict[str, Any]][source]

Explain an AQL query without executing it.

Parameters:
  • query (str) – The AQL query to explain.

  • params (dict) – Additional arguments piped to the function. Defaults to an empty dict ({}).

Returns:

A list of dictionaries containing the query explanation.

Return type:

List[Dict[str, Any]]

Raises:
  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.

add_graph_documents(graph_documents: List[GraphDocument], include_source: bool = False, graph_name: str | None = None, update_graph_definition_if_exists: bool = False, batch_size: int = 1000, use_one_entity_collection: bool = True, insert_async: bool = False, source_collection_name: str | None = None, source_edge_collection_name: str | None = None, entity_collection_name: str | None = None, entity_edge_collection_name: str | None = None, embeddings: Embeddings | None = None, embedding_field: str = 'embedding', embed_source: bool = False, embed_nodes: bool = False, embed_relationships: bool = False, capitalization_strategy: str = 'none') None[source]

Constructs nodes & relationships in the graph based on the provided GraphDocument objects.

Parameters:
  • graph_documents (List[GraphDocument]) – The GraphDocument objects to add to the graph.

  • include_source (bool) – Whether to include the source document in the graph.

  • graph_name (Optional[str]) – The name of the graph to add the documents to.

  • update_graph_definition_if_exists (bool) – Whether to update the graph definition if it already exists.

  • batch_size (int) – The number of documents to process in each batch.

  • use_one_entity_collection (bool) – Whether to use one entity collection for all nodes.

  • insert_async (bool) – Whether to insert the documents asynchronously.

  • source_collection_name (Union[str, None]) – The name of the source collection.

  • source_edge_collection_name (Union[str, None]) – The name of the source edge collection.

  • entity_collection_name (Union[str, None]) – The name of the entity collection.

  • entity_edge_collection_name (Union[str, None]) – The name of the entity edge collection.

  • embeddings (Union[Embeddings, None]) – The embeddings model to use.

  • embedding_field (str) – The field to use for the embedding.

  • embed_source (bool) – Whether to embed the source document.

  • embed_nodes (bool) – Whether to embed the nodes.

  • embed_relationships (bool) – Whether to embed the relationships.

  • capitalization_strategy (str) – The capitalization strategy to use.

Returns:

None

Return type:

None

Raises:
  • ValueError – If the capitalization strategy is not ‘lower’, ‘upper’, or ‘none’.

  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

  • ArangoCollectionError – If the collection cannot be created.
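Two of the parameters above, capitalization_strategy and batch_size, lend themselves to a short sketch. Both helpers are hypothetical; that the strategy is applied to entity names, and that documents are split into consecutive slices, are assumptions rather than confirmed library behavior.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def apply_capitalization(name: str, strategy: str = "none") -> str:
    """Hypothetical helper: normalize a name with one of the three strategies."""
    if strategy == "lower":
        return name.lower()
    if strategy == "upper":
        return name.upper()
    if strategy == "none":
        return name
    raise ValueError("capitalization_strategy must be 'lower', 'upper', or 'none'")


def batched(items: Sequence[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


print(apply_capitalization("Arya Stark", "lower"))
print(list(batched(list(range(5)), batch_size=2)))
```

Normalizing case before insertion is one way to keep "Arya Stark" and "arya stark" from becoming two separate entities.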

classmethod from_db_credentials(url: str | None = None, dbname: str | None = None, username: str | None = None, password: str | None = None) Any[source]

Convenience constructor that builds Arango DB from credentials.

Parameters:
  • url (str) – Arango DB url. Can be passed in as named arg or set as environment var ARANGODB_URL. Defaults to “http://localhost:8529”.

  • dbname (str) – Arango DB name. Can be passed in as named arg or set as environment var ARANGODB_DBNAME. Defaults to “_system”.

  • username (str) – Can be passed in as named arg or set as environment var ARANGODB_USERNAME. Defaults to “root”.

  • password (str) – Can be passed in as named arg or set as environment var ARANGODB_PASSWORD. Defaults to “”.

Returns:

An arango.database.StandardDatabase.

Return type:

Any

Raises:
  • ArangoClientError – If the ArangoDB client cannot be created.

  • ArangoServerError – If the ArangoDB server cannot be reached.

Chains

Question answering over a graph.

class langchain_arangodb.chains.graph_qa.arangodb.ArangoGraphQAChain(*, name: str | None = None, memory: BaseMemory | None = None, callbacks: list[BaseCallbackHandler] | BaseCallbackManager | None = None, verbose: bool = <factory>, tags: list[str] | None = None, metadata: dict[str, Any] | None = None, callback_manager: BaseCallbackManager | None = None, graph: ArangoGraph, embedding: Embeddings | None = None, query_cache_collection_name: str = 'Queries', aql_generation_chain: Runnable[Dict[str, Any], Any], aql_fix_chain: Runnable[Dict[str, Any], Any], qa_chain: Runnable[Dict[str, Any], Any], input_key: str = 'query', output_key: str = 'result', use_query_cache: bool = False, query_cache_similarity_threshold: float = 0.8, include_history: bool = False, max_history_messages: int = 10, chat_history_store: ArangoChatMessageHistory | None = None, top_k: int = 10, aql_examples: str = '', return_aql_query: bool = False, return_aql_result: bool = False, max_aql_generation_attempts: int = 3, execute_aql_query: bool = True, allow_dangerous_requests: bool = False, output_list_limit: int = 32, output_string_limit: int = 256, force_read_only_query: bool = False)[source]

Bases: Chain

Chain for question-answering against a graph by generating AQL statements.

Security note: Make sure that the database connection uses credentials that are narrowly-scoped to only include necessary permissions. Failure to do so may result in data corruption or loss, since the calling code may attempt commands that would result in deletion, mutation of data if appropriately prompted or reading sensitive data if such data is present in the database. The best way to guard against such negative outcomes is to (as appropriate) limit the permissions granted to the credentials used with this tool.

See https://python.langchain.com/docs/security for more information.

graph: ArangoGraph

The ArangoGraph instance to use for the chain.

embedding: Embeddings | None

Embedding model to use for the chain.

query_cache_collection_name: str

Name of the collection for storing queries.

aql_generation_chain: Runnable[Dict[str, Any], Any]

Chain to use for AQL generation.

aql_fix_chain: Runnable[Dict[str, Any], Any]

Chain to use for AQL fix.

qa_chain: Runnable[Dict[str, Any], Any]

Chain to use for QA.

input_key: str

Key to use for the input.

output_key: str

Key to use for the output.

use_query_cache: bool

Whether to use the query cache.

query_cache_similarity_threshold: float

Similarity threshold for matching cached queries.

include_history: bool

Whether to include the chat history in the prompt.

max_history_messages: int

Maximum number of messages to include in the chat history.

chat_history_store: ArangoChatMessageHistory | None

ArangoChatMessageHistory instance to store chat history.

top_k: int

Number of results to return from the query.

aql_examples: str

AQL query examples to include in the prompt for few-shot learning.

return_aql_query: bool

Whether to return the AQL query in the output dictionary.

return_aql_result: bool

Whether to return the AQL JSON result in the output dictionary.

max_aql_generation_attempts: int

Maximum number of AQL generation attempts.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

execute_aql_query: bool

If False, the AQL query is only explained and returned, not executed.

allow_dangerous_requests: bool

Forced user opt-in to acknowledge that the chain can make dangerous requests.

output_list_limit: int

Maximum list length to include in the response prompt. Truncated if longer.

output_string_limit: int

Maximum string length to include in the response prompt. Truncated if longer.

force_read_only_query: bool

If True, the query is checked for write operations and raises an error if a write operation is detected.
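A read-only check of this kind can be approximated with a keyword scan. The sketch below is an assumption about how such a check might look, not the chain's actual logic: find_write_operation is a hypothetical helper, and the keyword list covers AQL's data-modification operations (INSERT, UPDATE, REPLACE, REMOVE, UPSERT).

```python
import re
from typing import Optional

# AQL data-modification keywords.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "REPLACE", "REMOVE", "UPSERT")
_WRITE_PATTERN = re.compile(
    r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE
)


def find_write_operation(aql: str) -> Optional[str]:
    """Hypothetical helper: return the first write keyword found, or None.

    This naive scan also matches keywords inside string literals or
    comments, so it can produce false positives.
    """
    match = _WRITE_PATTERN.search(aql)
    return match.group(1).upper() if match else None


print(find_write_operation("FOR doc IN Characters LIMIT 1 RETURN doc"))
print(find_write_operation("FOR d IN Characters REMOVE d IN Characters"))
```

A keyword scan is no substitute for narrowly scoped database credentials; it only catches obvious write statements before they are sent.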

property input_keys: List[str]

Get the input keys for the chain.

property output_keys: List[str]

Get the output keys for the chain.

classmethod from_llm(llm: BaseLanguageModel, *, qa_prompt: BasePromptTemplate | None = None, aql_generation_prompt: BasePromptTemplate | None = None, aql_fix_prompt: BasePromptTemplate | None = None, **kwargs: Any) ArangoGraphQAChain[source]

Initialize from LLM.

Parameters:
  • llm (BaseLanguageModel) – The large language model to use.

  • embedding (Embeddings) – The embedding model to use.

  • use_query_cache (bool) – If True, enables reuse of similar past queries from cache.

  • query_cache_similarity_threshold (float) – The similarity threshold to consider a query as a match in the cache.

  • query_cache_collection_name (str) – Name of the collection for storing queries.

  • include_history (bool) – If True, includes recent chat history in the prompt to provide context for query generation.

  • max_history_messages (int) – The maximum number of messages to include in the chat history.

  • chat_history_store (ArangoChatMessageHistory) – ArangoChatMessageHistory instance to store chat history.

  • qa_prompt (BasePromptTemplate) – The prompt to use for the QA chain.

  • aql_generation_prompt (BasePromptTemplate) – The prompt to use for the AQL generation chain.

  • aql_fix_prompt (BasePromptTemplate) – The prompt to use for the AQL fix chain.

  • kwargs (Any) – Additional keyword arguments.

Returns:

The initialized ArangoGraphQAChain.

Return type:

ArangoGraphQAChain

Raises:

ValueError – If the LLM is not provided.

cache_query(text: str | None = None, aql: str | None = None) str[source]

Cache a query generated by the LLM only if it’s not already stored.

Parameters:
  • text (Optional[str]) – The text of the query to cache.

  • aql (Optional[str]) – The AQL query to cache.

Returns:

A message indicating the result of the operation.

clear_query_cache(text: str | None = None) str[source]

Clear the query cache.

Parameters:

text (str) – The text of the query to delete from the cache.

Returns:

A message indicating the result of the operation.
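The interplay of cache_query, clear_query_cache, and query_cache_similarity_threshold can be illustrated with an in-memory stand-in. Everything here is hypothetical: the class QueryCacheSketch, its method names, and the choice that clearing without a text removes all entries are assumptions, and the real chain stores queries in an ArangoDB collection rather than a dict.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


class QueryCacheSketch:
    """Hypothetical in-memory cache keyed by question text."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._entries: Dict[str, Tuple[List[float], str]] = {}

    def cache(self, text: str, embedding: List[float], aql: str) -> bool:
        """Store the query only if the text is not already cached."""
        if text in self._entries:
            return False
        self._entries[text] = (embedding, aql)
        return True

    def lookup(self, embedding: List[float]) -> Optional[str]:
        """Return the AQL of the most similar entry at or above the threshold."""
        best_score, best_aql = -1.0, None
        for vec, aql in self._entries.values():
            score = cosine(embedding, vec)
            if score > best_score:
                best_score, best_aql = score, aql
        return best_aql if best_score >= self.threshold else None

    def clear(self, text: Optional[str] = None) -> int:
        """Remove one entry, or all entries when text is None (assumption)."""
        if text is None:
            removed = len(self._entries)
            self._entries.clear()
            return removed
        return 1 if self._entries.pop(text, None) is not None else 0


cache = QueryCacheSketch(threshold=0.8)
aql = "FOR doc IN Characters LIMIT 1 RETURN doc"
first = cache.cache("Who is the first character?", [1.0, 0.0], aql)
second = cache.cache("Who is the first character?", [1.0, 0.0], aql)
print(first, second, cache.lookup([0.9, 0.1]), cache.lookup([0.0, 1.0]))
```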

Query Constructors

Utilities

class langchain_arangodb.vectorstores.utils.DistanceStrategy(*values)[source]

Bases: str, Enum

Enumerator of the Distance strategies for calculating distances between vectors.

Note that use_approx is not supported for the following distance strategies:
  • JACCARD

  • MAX_INNER_PRODUCT

  • DOT_PRODUCT

EUCLIDEAN_DISTANCE = 'EUCLIDEAN_DISTANCE'
MAX_INNER_PRODUCT = 'MAX_INNER_PRODUCT'
DOT_PRODUCT = 'DOT_PRODUCT'
JACCARD = 'JACCARD'
COSINE = 'COSINE'
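For orientation, the measures behind these strategy names can be written out in plain Python. This is a sketch of the textbook definitions, not of how ArangoDB or the library computes them; the function names are hypothetical, treating MAX_INNER_PRODUCT and DOT_PRODUCT as the same dot product is an assumption, and Jaccard is computed here over the sets of values.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: List[float], b: List[float]) -> float:
    """Inner product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return dot_product(a, b) / norm


def jaccard(a: Sequence, b: Sequence) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(dot_product([1.0, 2.0], [3.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(jaccard([1, 2, 3], [2, 3, 4]))
```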