Retrievers Gallery

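Explore the retrievers supported by the ResearchAgent of the AI Refinery SDK. Each retriever fetches relevant information from a particular kind of source based on user queries. Supported retrievers include:

  • WebSearchRetriever
  • AzureAISearchRetriever
  • ElasticSearchRetriever
  • CustomRetriever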


WebSearchRetriever

The WebSearchRetriever performs web searches using external search engines; Google Search is currently the only supported engine. It is ideal for retrieving the latest public information from the internet.

Configuration Template

Here is the configuration template for the WebSearchRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: WebSearchRetriever    # Required: Specifies use of the web search retriever
  description: <optional-description>    # Optional: Brief description of what this retriever is used for

  query_transformation_examples:         # Optional: Helps transform complex user queries into effective web search queries
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>

  source_weight: <weight>                # Optional: Importance weight relative to other retrievers (default: 1.0)
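
The template describes `source_weight` only as an importance weight relative to other retrievers (default 1.0). How the ResearchAgent actually applies it is not specified here. The sketch below assumes one plausible interpretation: multiply each retriever's scores by its weight, then rank all results together. The function `merge_weighted` and the retriever names are hypothetical and not part of the SDK.

```python
from typing import Dict, List, Tuple


def merge_weighted(
    results_by_retriever: Dict[str, List[Tuple[str, float]]],
    source_weights: Dict[str, float],
) -> List[Tuple[str, float, str]]:
    """Hypothetical: scale each retriever's scores by its source_weight and rank all results."""
    merged: List[Tuple[str, float, str]] = []
    for name, results in results_by_retriever.items():
        weight = source_weights.get(name, 1.0)  # fall back to the documented default of 1.0
        for text, score in results:
            merged.append((text, score * weight, name))
    # Highest weighted score first
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


if __name__ == "__main__":
    ranked = merge_weighted(
        {
            "web_search": [("Public news article", 0.8)],
            "internal_kb": [("Internal policy doc", 0.7)],
        },
        {"web_search": 1.0, "internal_kb": 2.0},
    )
    for text, score, source in ranked:
        print(f"{score:.2f}  {source}: {text}")
    # prints:
    # 1.40  internal_kb: Internal policy doc
    # 0.80  web_search: Public news article
```

Under this interpretation, doubling a retriever's weight lets a moderately relevant internal result outrank a stronger web result.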

Use Case

The WebSearchRetriever is well-suited for retrieving publicly available information from the open internet, similar to a traditional search engine. Typical use cases include:

  • General knowledge and fact-finding
  • News updates and trending topics
  • Technical explanations or documentation
  • Comparative research on tools, services, or ideas
  • Any query requiring up-to-date or web-accessible content

AzureAISearchRetriever

The AzureAISearchRetriever is designed to perform vector-based searches over an index hosted on Azure. It is ideal for retrieving information from pre-indexed datasets.

Configuration Template

Here is the configuration template for the AzureAISearchRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: AzureAISearchRetriever  # Required: Use this retriever for Azure-hosted vector search
  description: <optional-description>  # Optional: Brief explanation of what this retriever is used for

  aisearch_config:
    base_url: <your-base-url>  # Required: Base URL of your Azure vector search endpoint
    api_key: <your-api-key>  # Required: Azure AI Search service API key
    index: <your-index-name>  # Required: Name of the vector index to search

    embedding_column: <embedding-column-name>  # Required: Column in your index containing embedded data
    embedding_config:
      model: <embedding-model-name>  # Required: Must match the model used during indexing
    top_k: <number-of-results>  # Optional: Number of top documents to retrieve

    content_column:  # Required: Column(s) containing retrievable content
      - <content-column-1>
      - <content-column-2>

    aggregate_column: <optional-aggregate-column>  # Optional: Used to group chunks by document
    meta_data:  # Optional: Metadata fields to enrich the response
      - column_name: <source-column-name>  # Required within meta_data
        load_name: <display-name>  # Required within meta_data

  query_transformation_examples:  # Optional: User-to-search query examples for improved relevance
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>

  source_weight: <weight-value>  # Optional: Importance weight relative to other retrievers (default: 1.0)
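
To make the `aisearch_config` fields concrete, here is a minimal in-memory sketch of what a top-k vector search over such an index does. It scores each row's `embedding_column` against the query embedding, keeps the `top_k` best rows, joins the `content_column` values, and exposes each `meta_data` field under its `load_name`. It assumes cosine similarity and a query vector already produced by the configured embedding model. It does not call Azure AI Search, and `vector_search` is a hypothetical helper.

```python
import math
from typing import Any, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(
    rows: List[Dict[str, Any]],
    query_vector: List[float],
    embedding_column: str,
    content_column: List[str],
    meta_data: List[Dict[str, str]],
    top_k: int = 3,
) -> List[Dict[str, Any]]:
    """Hypothetical: rank indexed rows by similarity to the query embedding, return top_k."""
    hits = []
    for row in rows:
        hits.append({
            # Compare the query embedding with the row's stored embedding
            "score": cosine_similarity(query_vector, row[embedding_column]),
            # Concatenate every configured content column into one text block
            "content": "\n".join(str(row[col]) for col in content_column),
            # Expose each metadata field under its display name (load_name)
            "metadata": {m["load_name"]: row.get(m["column_name"]) for m in meta_data},
        })
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    index = [  # toy rows standing in for an indexed dataset
        {"vector": [1.0, 0.0], "title": "Document A", "body": "First body.", "url": "kb/a"},
        {"vector": [0.0, 1.0], "title": "Document B", "body": "Second body.", "url": "kb/b"},
    ]
    top = vector_search(
        index,
        query_vector=[0.9, 0.1],  # pretend output of the configured embedding model
        embedding_column="vector",
        content_column=["title", "body"],
        meta_data=[{"column_name": "url", "load_name": "Source"}],
        top_k=1,
    )
    print(top[0]["content"], top[0]["metadata"])
```

This is also why `embedding_config.model` must match the model used at indexing time: query and row vectors are only comparable if they come from the same embedding space.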

Use Case

The AzureAISearchRetriever is ideal for retrieving information from pre-indexed datasets via semantic search. It's best used in scenarios such as:

  • Internal knowledge base queries
  • Organizational content search
  • Semantic search over embedded data

ElasticSearchRetriever

The ElasticSearchRetriever is designed to perform vector-based searches over an index hosted in ElasticSearch. It also works well for retrieving information from structured or pre-indexed datasets.

Configuration Template

Here is the configuration template for the ElasticSearchRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: ElasticSearchRetriever  # Required: Use this retriever for ElasticSearch-based vector search
  description: <optional-description>  # Optional: Brief explanation of what this retriever is used for

  elasticsearch_config:
    base_url: <your-elasticsearch-url>  # Required: Endpoint of your ElasticSearch service
    api_key: <your-api-key>  # Required: Service API key
    index: <your-index-name>  # Required: Name of the ElasticSearch index

    embedding_column: <embedding-column-name>  # Required: Column storing vector embeddings
    embedding_config:
      model: <embedding-model-name>  # Required: Must match the model used during data embedding
    top_k: <number-of-results>  # Optional: Number of top documents to retrieve

    content_column:  # Required: Column(s) containing content to retrieve
      - <content-column-1>
      - <content-column-2>

    aggregate_column: <optional-aggregate-column>  # Optional: Group chunks by original document
    meta_data:  # Optional: Metadata fields to include in results
      - column_name: <metadata-field>  # Required within meta_data
        load_name: <display-label>  # Required within meta_data

  threshold: <float-between-0-and-1>  # Optional: Filters out low-quality chunks (default: 0.9)

  query_transformation_examples:  # Optional: Transforms user queries for better search performance
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>

  source_weight: <weight-value>  # Optional: Weight of this retriever relative to others (default: 1.0)
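
The `threshold` and `aggregate_column` options can be illustrated with a small sketch. It assumes each retrieved chunk carries a similarity `score` in [0, 1], that chunks scoring below `threshold` are dropped, and that surviving chunks sharing the same `aggregate_column` value belong to one source document. The helper `filter_and_group` is hypothetical; the retriever's actual scoring and grouping may differ.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def filter_and_group(
    chunks: List[Dict[str, Any]],
    threshold: float = 0.9,
    aggregate_column: Optional[str] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical: drop chunks scoring below threshold, then group the rest by document."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for i, chunk in enumerate(chunks):
        if chunk["score"] < threshold:
            continue  # assumption: "low-quality" means similarity below the threshold
        # Without an aggregate_column, every chunk is treated as its own document
        key = str(chunk[aggregate_column]) if aggregate_column else f"chunk-{i}"
        groups[key].append(chunk)
    return dict(groups)


if __name__ == "__main__":
    retrieved = [
        {"doc_id": "handbook", "text": "Section 1 ...", "score": 0.95},
        {"doc_id": "handbook", "text": "Section 2 ...", "score": 0.91},
        {"doc_id": "faq", "text": "Q3 ...", "score": 0.62},
    ]
    for doc, parts in filter_and_group(retrieved, threshold=0.9, aggregate_column="doc_id").items():
        print(doc, [p["text"] for p in parts])
    # prints: handbook ['Section 1 ...', 'Section 2 ...']
```

With the default of 0.9, only close matches survive; lowering `threshold` trades precision for recall.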

Use Case

The ElasticSearchRetriever is ideal for retrieving semantically relevant information from ElasticSearch-hosted content repositories. It excels in use cases such as:

  • Internal knowledge base queries
  • Organizational content search
  • Semantic search over embedded data

CustomRetriever

The CustomRetriever lets you design retrievers tailored to your specific use cases, enabling retrieval of information from unique or specialized data sources.

Configuration Template

Below is an example configuration for setting up a CustomRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: CustomRetriever  # Required: Use this retriever to run your own retrieval function
  description: <optional-description>  # Optional: Brief explanation of what this retriever is used for

  # Any other arbitrary config that your CustomRetriever needs
  your_arbitrary_config_1: <config-value>
  your_arbitrary_config_2: <config-value>
  your_arbitrary_config_n: <config-value>

Implementation Instructions

Retriever Function Template

You need to implement the logic for your CustomRetriever within a Python function. Below is the template for that function:

from typing import Any, Dict, List


async def your_custom_retriever(
    query: str,
    your_arbitrary_config_1: Any,
    # ...add one parameter per additional config key...
    your_arbitrary_config_n: Any,
) -> List[Dict[str, Any]]:
    """
    Retrieves information based on the provided query.

    Args:
        query (str): The query string used to search for relevant information.
        your_arbitrary_config_1 (Any): An arbitrary configuration parameter with unspecified type.
        your_arbitrary_config_n (Any): Another arbitrary configuration parameter with unspecified type.

    Returns:
        List[Dict[str, Any]]: A list of dictionaries, each containing:
            - "result": A string representing the formatted result.
            - "score": A float representing the final score.
    """
    results: List[Dict[str, Any]] = []
    # Replace with your retrieval logic, appending {"result": ..., "score": ...} entries
    return results

Every arbitrary key you add to the retriever's YAML configuration is passed to this function as an input argument, so those values are available inside your retriever.
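
As a concrete illustration of the template, the sketch below implements a hypothetical keyword-overlap retriever over an in-memory list of documents. It assumes the YAML entry declares two arbitrary keys, `documents` and `max_results`, which would then arrive as the function's keyword arguments. The scoring (fraction of query words found in a document) is a deliberately simple stand-in for real search logic, and the name `keyword_retriever` is illustrative only.

```python
import asyncio
import re
from typing import Any, Dict, List, Set


def _tokens(text: str) -> Set[str]:
    """Lower-case word tokens used for a simple overlap score."""
    return set(re.findall(r"\w+", text.lower()))


async def keyword_retriever(
    query: str,
    documents: List[str],  # hypothetical arbitrary config key
    max_results: int = 3,  # hypothetical arbitrary config key
) -> List[Dict[str, Any]]:
    """Score each document by the fraction of query tokens it contains."""
    query_tokens = _tokens(query)
    if not query_tokens:
        return []
    results: List[Dict[str, Any]] = []
    for doc in documents:
        overlap = len(query_tokens & _tokens(doc)) / len(query_tokens)
        if overlap > 0:
            # Follow the required output format: "result" (str) and "score" (float)
            results.append({"result": doc, "score": overlap})
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:max_results]


if __name__ == "__main__":
    docs = [
        "Refund policy: refunds are issued within 30 days.",
        "Shipping takes 5 business days.",
        "Contact support for refund questions.",
    ]
    for item in asyncio.run(keyword_retriever("refund policy", docs, max_results=2)):
        print(f'{item["score"]:.2f}  {item["result"]}')
    # prints:
    # 1.00  Refund policy: refunds are issued within 30 days.
    # 0.50  Contact support for refund questions.
```

Because the function is `async`, a real implementation can await network or database calls without blocking the agent.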

Integration into executor_dict

Once you've defined your retriever function, you need to incorporate it into the executor_dict of your project using the following format:

executor_dict = {
    "<name-of-your-research-agent>": {
        "<your-custom-retriever-name>": your_custom_retriever,
    }
}
This step ensures that your function is properly registered and can be executed within the project's framework.
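
Before wiring the function into a project, it can help to exercise it locally the way the configuration describes: look it up in `executor_dict` by agent name and retriever name, forward the arbitrary YAML keys as keyword arguments, and check the documented return shape. The harness below is a hypothetical test helper, not the SDK's dispatcher. In particular, the set of keys it treats as reserved (`retriever_name`, `retriever_class`, `description`) and the names `call_custom_retriever`, `echo_retriever`, and `my_research_agent` are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

RetrieverFn = Callable[..., Awaitable[List[Dict[str, Any]]]]

# Assumption: these template keys describe the retriever itself and are not
# forwarded to your function; every other key is passed as a keyword argument.
RESERVED_KEYS = {"retriever_name", "retriever_class", "description"}


async def call_custom_retriever(
    executor_dict: Dict[str, Dict[str, RetrieverFn]],
    agent_name: str,
    retriever_config: Dict[str, Any],
    query: str,
) -> List[Dict[str, Any]]:
    """Hypothetical local harness: look up, call, and validate a custom retriever."""
    fn = executor_dict[agent_name][retriever_config["retriever_name"]]
    kwargs = {k: v for k, v in retriever_config.items() if k not in RESERVED_KEYS}
    results = await fn(query, **kwargs)
    # Validate the documented return format: a list of {"result": str, "score": float}
    for item in results:
        if not isinstance(item.get("result"), str) or not isinstance(item.get("score"), (int, float)):
            raise ValueError(f"Unexpected retriever output: {item!r}")
    return results


async def echo_retriever(query: str, prefix: str) -> List[Dict[str, Any]]:
    """Trivial example retriever that echoes the query with a configured prefix."""
    return [{"result": f"{prefix}{query}", "score": 1.0}]


executor_dict = {"my_research_agent": {"echo_retriever": echo_retriever}}
config = {  # mirrors a parsed YAML entry for a CustomRetriever
    "retriever_name": "echo_retriever",
    "retriever_class": "CustomRetriever",
    "description": "Demo retriever",
    "prefix": ">> ",  # an arbitrary config key forwarded to echo_retriever
}

if __name__ == "__main__":
    print(asyncio.run(call_custom_retriever(executor_dict, "my_research_agent", config, "hello")))
    # prints [{'result': '>> hello', 'score': 1.0}]
```

A mismatch between the YAML keys and the function's parameter names surfaces here as a `TypeError`, which is easier to debug locally than inside a running agent.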

Use Case

CustomRetriever offers flexibility by letting you tailor the retrieval process. As long as your function returns results in the required format (a list of dictionaries with "result" and "score" keys), it integrates with your research agent. Key use cases include:

  • Specialized Data Queries: Customize data access for unique structures and formats.
  • Enhanced Search: Implement specific search algorithms for precise outcomes.
  • API Integration: Fetch and incorporate data from external sources.
  • Performance Optimization: Improve speed and efficiency for large data volumes.
  • Domain-Specific Logic: Apply custom logic to meet specific criteria.
  • Security and Compliance: Ensure data handling aligns with necessary standards.