Retrievers Gallery¶
Explore the retrievers supported by the ResearchAgent of the AI Refinery SDK. Each one fetches relevant information from a different kind of source based on user queries. Supported retrievers include:
- WebSearchRetriever: Access real-time web data.
- AzureAISearchRetriever: Perform semantic search over an Azure-hosted vector database index.
- ElasticSearchRetriever: Employ Elasticsearch for scalable search solutions.
- CustomRetriever: Create your own retrievers, tailored to specific needs.
WebSearchRetriever¶
The WebSearchRetriever is designed to perform web searches using external search engines. The currently supported search engine is Google Search. It is ideal for retrieving the latest public information from the internet.
Configuration Template¶
Here is the configuration template for the WebSearchRetriever:
- retriever_name: <your-retriever-name> # Required: A custom name for this retriever instance
  retriever_class: WebSearchRetriever # Required: Specifies use of the web search retriever
  description: <optional-description> # Optional: Brief description of what this retriever is used for
  query_transformation_examples: # Optional: Helps transform complex user queries into effective web search queries
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>
  source_weight: <weight> # Optional: Importance weight relative to other retrievers (default: 1.0)
Use Case¶
The WebSearchRetriever is well-suited for retrieving publicly available information from the open internet, similar to a traditional search engine. Typical use cases include:
- General knowledge and fact-finding
- News updates and trending topics
- Technical explanations or documentation
- Comparative research on tools, services, or ideas
- Any query requiring up-to-date or web-accessible content
AzureAISearchRetriever¶
The AzureAISearchRetriever is designed to perform vector-based searches over an index hosted on Azure. It is ideal for retrieving information from pre-indexed datasets.
Configuration Template¶
Here is the configuration template for the AzureAISearchRetriever:
- retriever_name: <your-retriever-name> # Required: A custom name for this retriever instance
  retriever_class: AzureAISearchRetriever # Required: Use this retriever for Azure-hosted vector search
  description: <optional-description> # Optional: Brief explanation of what this retriever is used for
  aisearch_config:
    base_url: <your-base-url> # Required: Base URL of your Azure vector search endpoint
    api_key: <your-api-key> # Required: Azure AI Search service API key
    index: <your-index-name> # Required: Name of the vector index to search
    embedding_column: <embedding-column-name> # Required: Column in your index containing embedded data
    embedding_config:
      model: <embedding-model-name> # Required: Must match the model used during indexing
    top_k: <number-of-results> # Optional: Number of top documents to retrieve
    content_column: # Required: Column(s) containing retrievable content
      - <content-column-1>
      - <content-column-2>
    aggregate_column: <optional-aggregate-column> # Optional: Used to group chunks by document
    meta_data: # Optional: Metadata fields to enrich the response
      - column_name: <source-column-name> # Required within meta_data
        load_name: <display-name> # Required within meta_data
  query_transformation_examples: # Optional: User-to-search query examples for improved relevance
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>
  source_weight: <weight-value> # Optional: Importance weight relative to other retrievers (default: 1.0)
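The Required and Optional annotations in the template can be checked before a configuration is used. The following is a minimal sketch and not part of the AI Refinery SDK: the helper `missing_required_fields`, the `REQUIRED_PATHS` list, and the sample values are hypothetical, and the dotted paths assume that the fields from `base_url` through `meta_data` are nested under `aisearch_config`.

```python
from typing import Any, Dict, List

# Dotted paths of the fields marked "Required" in the template.
# Assumption: these fields are nested under `aisearch_config`.
REQUIRED_PATHS = [
    "retriever_name",
    "retriever_class",
    "aisearch_config.base_url",
    "aisearch_config.api_key",
    "aisearch_config.index",
    "aisearch_config.embedding_column",
    "aisearch_config.embedding_config.model",
    "aisearch_config.content_column",
]


def missing_required_fields(config: Dict[str, Any]) -> List[str]:
    """Return the dotted paths from REQUIRED_PATHS that are absent or empty."""
    missing = []
    for path in REQUIRED_PATHS:
        node: Any = config
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, "", []):
            missing.append(path)
    return missing


# Parsed form of one retriever entry; all values are placeholders.
example_config = {
    "retriever_name": "internal-docs",
    "retriever_class": "AzureAISearchRetriever",
    "aisearch_config": {
        "base_url": "https://example.invalid",  # not a real endpoint
        "api_key": "<your-api-key>",
        "index": "docs-index",
        "embedding_column": "embedding",
        "embedding_config": {},  # `model` deliberately left out
        "content_column": ["text"],
    },
}

print(missing_required_fields(example_config))
```

Running the check on a parsed configuration before starting a project surfaces a missing `embedding_config.model` early, which matters because the model must match the one used during indexing.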
Use Case¶
The AzureAISearchRetriever is ideal for retrieving information from pre-indexed datasets via semantic search. It's best used in scenarios such as:
- Internal knowledge base queries
- Organizational content search
- Semantic search over embedded data
ElasticSearchRetriever¶
The ElasticSearchRetriever is designed to perform vector-based searches over an index hosted in Elasticsearch. Like the AzureAISearchRetriever, it works well for retrieving information from structured or pre-indexed datasets.
Configuration Template¶
Here is the configuration template for the ElasticSearchRetriever:
- retriever_name: <your-retriever-name> # Required: A custom name for this retriever instance
  retriever_class: ElasticSearchRetriever # Required: Use this retriever for Elasticsearch-based vector search
  description: <optional-description> # Optional: Brief explanation of what this retriever is used for
  elasticsearch_config:
    base_url: <your-elasticsearch-url> # Required: Endpoint of your Elasticsearch service
    api_key: <your-api-key> # Required: Service API key
    index: <your-index-name> # Required: Name of the Elasticsearch index
    embedding_column: <embedding-column-name> # Required: Column storing vector embeddings
    embedding_config:
      model: <embedding-model-name> # Required: Must match the model used during data embedding
    top_k: <number-of-results> # Optional: Number of top documents to retrieve
    content_column: # Required: Column(s) containing content to retrieve
      - <content-column-1>
      - <content-column-2>
    aggregate_column: <optional-aggregate-column> # Optional: Group chunks by original document
    meta_data: # Optional: Metadata fields to include in results
      - column_name: <metadata-field> # Required within meta_data
        load_name: <display-label> # Required within meta_data
    threshold: <float-between-0-and-1> # Optional: Filters out low-quality chunks (default: 0.9)
  query_transformation_examples: # Optional: Transforms user queries for better search performance
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>
  source_weight: <weight-value> # Optional: Weight of this retriever relative to others (default: 1.0)
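The template does not spell out how `threshold` and `top_k` interact. One plausible reading, sketched here as an assumption and not as the SDK's actual implementation, is that chunks scoring below `threshold` are dropped and at most `top_k` of the highest-scoring survivors are returned. The function `filter_chunks`, the sample scores, and the default `top_k=5` are hypothetical.

```python
from typing import Any, Dict, List


def filter_chunks(
    chunks: List[Dict[str, Any]], threshold: float = 0.9, top_k: int = 5
) -> List[Dict[str, Any]]:
    """Keep chunks scoring at or above `threshold`, best first, at most `top_k`."""
    kept = [c for c in chunks if c["score"] >= threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:top_k]


# Hypothetical similarity scores in the range 0 to 1.
sample = [
    {"result": "chunk A", "score": 0.95},
    {"result": "chunk B", "score": 0.42},
    {"result": "chunk C", "score": 0.91},
    {"result": "chunk D", "score": 0.99},
]

for chunk in filter_chunks(sample, threshold=0.9, top_k=2):
    print(chunk["result"], chunk["score"])
```

Under this reading, a high default such as 0.9 favors precision: a query with no close matches returns nothing instead of padding the answer with weak chunks.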
Use Case¶
The ElasticSearchRetriever is ideal for retrieving semantically relevant information from Elasticsearch-hosted content repositories. It excels in use cases such as:
- Internal knowledge base queries
- Organizational content search
- Semantic search over embedded data
CustomRetriever¶
The CustomRetriever allows you to design retrievers tailored to your specific use cases, enabling retrieval of information from unique or specialized data sources.
Configuration Template¶
Below is an example configuration for setting up a CustomRetriever:
- retriever_name: <your-retriever-name> # Required: A custom name for this retriever instance
  retriever_class: CustomRetriever # Required: Use this retriever to run your own retrieval function
  description: <optional-description> # Optional: A description of the retriever
  # Any other arbitrary config that your CustomRetriever needs
  your_arbitrary_config_1: <config-value>
  your_arbitrary_config_2: <config-value>
  your_arbitrary_config_n: <config-value>
Implementation Instructions¶
Retriever Function Template¶
You need to implement the logic for your CustomRetriever within a Python function. Below is the template for that function:
from typing import Any, Dict, List


async def your_custom_retriever(
    query: str,
    your_arbitrary_config_1: Any,
    # ... one parameter per additional configuration key ...
    your_arbitrary_config_n: Any,
) -> List[Dict[str, Any]]:
    """
    Retrieves information based on the provided query.

    Args:
        query (str): The query string used to search for relevant information.
        your_arbitrary_config_1 (Any): An arbitrary configuration parameter with unspecified type.
        your_arbitrary_config_n (Any): Another arbitrary configuration parameter with unspecified type.

    Returns:
        List[Dict[str, Any]]: A list of dictionaries, each containing:
            - "result": A string representing the formatted result.
            - "score": A float representing the final score.
    """
    pass
Every arbitrary configuration key you specify in the retriever's YAML configuration is passed to this function as an input argument, so its value is available inside your retriever function.
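As an illustration of the required return format, here is a minimal sketch of a retriever that ranks a small in-memory corpus by keyword overlap. Everything in it is hypothetical: `keyword_retriever`, the `DOCUMENTS` list, and the `max_results` parameter, which stands in for an arbitrary key you would declare in the YAML configuration. A real retriever would query your own data source instead.

```python
import asyncio
from typing import Any, Dict, List

# Hypothetical in-memory corpus standing in for a real data source.
DOCUMENTS = [
    "Quarterly revenue grew in the cloud segment.",
    "The cloud migration guide covers network setup.",
    "Employee onboarding checklist for new hires.",
]


async def keyword_retriever(query: str, max_results: int = 2) -> List[Dict[str, Any]]:
    """Score each document by the fraction of query terms it contains."""
    terms = set(query.lower().split())
    results = []
    for doc in DOCUMENTS:
        words = set(doc.lower().replace(".", "").split())
        overlap = len(terms & words)
        if overlap:
            # Each entry carries the two keys the template requires.
            results.append({"result": doc, "score": overlap / len(terms)})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:max_results]


for item in asyncio.run(keyword_retriever("cloud migration", max_results=2)):
    print(item["score"], item["result"])
```

The function is declared `async` to match the template, even though this toy version never awaits anything. A retriever that calls a database or HTTP API would await those calls inside the loop.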
Integration with executor_dict¶
Once you've defined your retriever function, you need to incorporate it into the executor_dict of your project using the following format:
executor_dict = {
    "<name-of-your-research-agent>": {
        "<your-custom-retriever-name>": your_custom_retriever,
    }
}
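To make the hand-off concrete, the sketch below imitates how a caller could look up a retriever in `executor_dict` and pass the extra YAML keys as keyword arguments. It illustrates the behavior described in this section and is not the SDK's internal code: `dispatch`, `faq_retriever`, `RESERVED_KEYS`, the agent name, and the `greeting` option are all hypothetical, and the inner dictionary key is assumed to match `retriever_name`.

```python
import asyncio
from typing import Any, Dict, List


async def faq_retriever(query: str, greeting: str) -> List[Dict[str, Any]]:
    """Toy retriever: echoes the query using one custom config value."""
    return [{"result": f"{greeting}: {query}", "score": 1.0}]


executor_dict = {
    "Research Agent": {  # hypothetical agent name
        "FAQ Retriever": faq_retriever,  # assumed to match retriever_name
    }
}

# Parsed form of one retriever entry from the YAML configuration.
retriever_config = {
    "retriever_name": "FAQ Retriever",
    "retriever_class": "CustomRetriever",
    "description": "Toy example",
    "greeting": "You asked",  # arbitrary config forwarded to the function
}

# Keys that describe the retriever itself and are not forwarded (assumption).
RESERVED_KEYS = {"retriever_name", "retriever_class", "description"}


async def dispatch(agent: str, config: Dict[str, Any], query: str) -> List[Dict[str, Any]]:
    """Look up the retriever by name and forward the non-reserved config keys."""
    func = executor_dict[agent][config["retriever_name"]]
    extra = {k: v for k, v in config.items() if k not in RESERVED_KEYS}
    return await func(query=query, **extra)


print(asyncio.run(dispatch("Research Agent", retriever_config, "opening hours")))
```

Because the extra keys arrive as keyword arguments, the parameter names in the function signature need to match the key names in the YAML entry exactly.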
Use Case¶
CustomRetriever offers flexibility by allowing tailored data retrieval processes. As long as your retriever function returns results in the required format, it can integrate with your research agent. Key use cases include:
- Specialized Data Queries: Customize data access for unique structures and formats.
- Enhanced Search: Implement specific search algorithms for precise outcomes.
- API Integration: Fetch and incorporate data from external sources.
- Performance Optimization: Enhance speed and efficiency for large data volumes.
- Domain-Specific Logic: Utilize custom logic to meet specific criteria.
- Security and Compliance: Ensure data handling aligns with necessary standards.