Retrievers Gallery¶

Explore the retrievers supported by the ResearchAgent of the AI Refinery SDK, designed to fetch relevant information from various sources based on user queries. Supported retrievers include:

WebSearchRetriever: Access real-time web data.
AzureAISearchRetriever: Perform semantic search over an Azure-hosted vector database index.
ElasticSearchRetriever: Employ Elasticsearch for scalable search solutions.
CustomRetriever: Create your own retrievers, tailored to specific needs.

`WebSearchRetriever`¶

The WebSearchRetriever performs web searches using external search engines. The currently supported search engine is Google Search. It is ideal for retrieving the latest public information from the internet.

Configuration Template¶

Here is the configuration template for the WebSearchRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: WebSearchRetriever    # Required: Specifies use of the web search retriever
  description: <optional-description>    # Optional: Brief description of what this retriever is used for

  query_transformation_examples:         # Optional: Helps transform complex user queries into effective web search queries
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>

  source_weight: <weight>                # Optional: Importance weight relative to other retrievers (default: 1.0)

Use Case¶

The WebSearchRetriever is well-suited for retrieving publicly available information from the open internet, similar to a traditional search engine. Typical use cases include:

General knowledge and fact-finding
News updates and trending topics
Technical explanations or documentation
Comparative research on tools, services, or ideas
Any query requiring up-to-date or web-accessible content

`AzureAISearchRetriever`¶

The AzureAISearchRetriever is designed to perform vector-based searches over an index hosted on Azure. It is ideal for retrieving information from pre-indexed datasets.

Configuration Template¶

Here is the configuration template for the AzureAISearchRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: AzureAISearchRetriever  # Required: Use this retriever for Azure-hosted vector search
  description: <optional-description>  # Optional: Brief explanation of what this retriever is used for

  aisearch_config:
    base_url: <your-base-url>  # Required: Base URL of your Azure vector search endpoint
    api_key: <your-api-key>  # Required: Azure AISearch service API key
    index: <your-index-name>  # Required: Name of the vector index to search

    embedding_column: <embedding-column-name>  # Required: Column in your index containing embedded data
    embedding_config:
      model: <embedding-model-name>  # Required: Must match the model used during indexing
    top_k: <number-of-results>  # Optional: Number of top documents to retrieve

    content_column:  # Required: Column(s) containing retrievable content
      - <content-column-1>
      - <content-column-2>

    aggregate_column: <optional-aggregate-column>  # Optional: Used to group chunks by document
    meta_data:  # Optional: Metadata fields to enrich the response
      - column_name: <source-column-name>  # Required within meta_data
        load_name: <display-name>  # Required within meta_data

  query_transformation_examples:  # Optional: User-to-search query examples for improved relevance
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>

  source_weight: <weight-value>  # Optional: Importance weight relative to other retrievers (default: 1.0)

Use Case¶

The AzureAISearchRetriever is ideal for retrieving information from pre-indexed datasets via semantic search. It's best used in scenarios such as:

Internal knowledge base queries
Organizational content search
Semantic search over embedded data

`ElasticSearchRetriever`¶

The ElasticSearchRetriever is designed to perform vector-based searches over an index hosted in ElasticSearch. It also works well for retrieving information from structured or pre-indexed datasets.

Configuration Template¶

Here is the configuration template for the ElasticSearchRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: ElasticSearchRetriever  # Required: Use this retriever for ElasticSearch-based vector search
  description: <optional-description>  # Optional: Brief explanation of what this retriever is used for

  elasticsearch_config:
    base_url: <your-elasticsearch-url>  # Required: Endpoint of your ElasticSearch service
    api_key: <your-api-key>  # Required: Service API key
    index: <your-index-name>  # Required: Name of the ElasticSearch index

    embedding_column: <embedding-column-name>  # Required: Column storing vector embeddings
    embedding_config:
      model: <embedding-model-name>  # Required: Must match the model used during data embedding
    top_k: <number-of-results>  # Optional: Number of top documents to retrieve

    content_column:  # Required: Column(s) containing content to retrieve
      - <content-column-1>
      - <content-column-2>

    aggregate_column: <optional-aggregate-column>  # Optional: Group chunks by original document
    meta_data:  # Optional: Metadata fields to include in results
      - column_name: <metadata-field>  # Required within meta_data
        load_name: <display-label>  # Required within meta_data

  threshold: <float-between-0-and-1>  # Optional: Filters out low-quality chunks (default: 0.9)

  query_transformation_examples:  # Optional: Transforms user queries for better search performance
    - user_query: <example-user-query>
      query:
        - <transformed-query-1>
        - <transformed-query-2>

  source_weight: <weight-value>  # Optional: Weight of this retriever relative to others (default: 1.0)
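
To make the `threshold` and `top_k` options more concrete, here is a minimal sketch of how a score cutoff followed by a top-k cut could behave. This only illustrates the general idea: the function name `filter_chunks`, the chunk dictionary shape, the `>=` comparison, and the order of the two steps are all assumptions, not the SDK's documented behavior.

```python
from typing import Any, Dict, List


def filter_chunks(
    chunks: List[Dict[str, Any]],
    threshold: float = 0.9,
    top_k: int = 5,
) -> List[Dict[str, Any]]:
    """Keep chunks scoring at least `threshold`, then return the best `top_k`.

    Hypothetical helper: assumes each chunk carries a similarity "score"
    between 0 and 1, where higher means more relevant.
    """
    kept = [chunk for chunk in chunks if chunk["score"] >= threshold]
    kept.sort(key=lambda chunk: chunk["score"], reverse=True)
    return kept[:top_k]


chunks = [
    {"text": "a", "score": 0.95},
    {"text": "b", "score": 0.50},
    {"text": "c", "score": 0.92},
]
best = filter_chunks(chunks, threshold=0.9, top_k=1)
```

Under these assumptions, a high threshold such as the default 0.9 trades recall for precision: fewer chunks reach the agent, but those that do are close matches.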

Use Case¶

The ElasticSearchRetriever is ideal for retrieving semantically relevant information from ElasticSearch-hosted content repositories. It excels in use cases such as:

Internal knowledge base queries
Organizational content search
Semantic search over embedded data

`CustomRetriever`¶

The CustomRetriever lets you design retrievers tailored to your specific use cases, so you can retrieve information from unique or specialized data sources.

Configuration Template¶

Below is an example configuration for setting up a CustomRetriever:

- retriever_name: <your-retriever-name>  # Required: A custom name for this retriever instance
  retriever_class: CustomRetriever  # Required: Specifies use of a custom retriever backed by your own function
  description: <optional-description>  # Optional: Brief description of what this retriever is used for

  # Any other arbitrary config that your CustomRetriever needs
  your_arbitrary_config_1: <config-value>
  your_arbitrary_config_2: <config-value>
  your_arbitrary_config_n: <config-value>

Implementation Instructions¶

Retriever Function Template¶

You need to implement the logic for your CustomRetriever within a Python function. Below is the template for that function:

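from typing import Any, Dict, List


async def your_custom_retriever(
    query: str,
    your_arbitrary_config_1: Any,
    # ... one parameter for each additional config key in the YAML ...
    your_arbitrary_config_n: Any,
) -> List[Dict[str, Any]]: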
    """  
    Retrieves information based on the provided query.  

    Args:  
        query (str): The query string used to search for relevant information.  
        your_arbitrary_config_1 (Any): An arbitrary configuration parameter with unspecified type.  
        your_arbitrary_config_n (Any): Another arbitrary configuration parameter with unspecified type.  

    Returns:  
        List[Dict[str, Any]]: A list of dictionaries, each containing:  
            - "result" (str): A string representing the retrieved text content.  
            - "score" (int or float): A numeric relevance score indicating how well the result matches the query.  
            - "source" (str or None): A string representing an identifier for the source of the retrieved item, or None if not available.  

        Note: If an error occurs or no documents are found, return [{"result": "", "score": 0, "source": None}].
    """  
    pass

All the arbitrary configurations you specified in the retriever's YAML configuration will be passed as input arguments to this function. You will have access to these configurations within your retriever function.
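
As an illustration of the template, here is a minimal sketch of a retriever that scores a small in-memory corpus by keyword overlap. Everything specific in it is hypothetical: the `DOCUMENTS` corpus, the function name `keyword_retriever`, and the `top_k` and `min_score` parameters, which stand in for arbitrary keys you would declare in the YAML configuration. A real retriever would query your own database or API instead.

```python
import asyncio
from typing import Any, Dict, List

# Hypothetical in-memory corpus; a real retriever would query a database or API.
DOCUMENTS = {
    "doc-1": "Elasticsearch supports vector search over dense embeddings.",
    "doc-2": "Azure AI Search hosts semantic and vector indexes.",
    "doc-3": "Bread needs flour, water, salt, and yeast.",
}


async def keyword_retriever(
    query: str,
    top_k: int = 2,          # hypothetical config key from the YAML
    min_score: float = 0.0,  # hypothetical config key from the YAML
) -> List[Dict[str, Any]]:
    """Score each document by the fraction of query words it contains."""
    empty = [{"result": "", "score": 0, "source": None}]
    words = set(query.lower().split())
    if not words:
        return empty

    hits = []
    for doc_id, text in DOCUMENTS.items():
        doc_words = set(text.lower().replace(".", "").replace(",", "").split())
        score = len(words & doc_words) / len(words)
        if score > min_score:
            hits.append({"result": text, "score": score, "source": doc_id})

    # Best matches first; fall back to the documented empty result.
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k] or empty


results = asyncio.run(keyword_retriever("vector search", top_k=1))
```

The sketch returns the documented fallback entry both for an empty query and when no document matches, so the caller always receives a list in the expected shape.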

⚠️ Warning: The previous output format, with only the "result" and "score" fields, is still supported for existing implementations. Update to the new format when you can, because the old format may be deprecated in a future version.
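
One low-effort way to migrate an existing retriever is to post-process its output so that every item carries a "source" key. The helper below is a sketch with a hypothetical name, `add_missing_source`; it is not part of the SDK and simply fills in `None` where no source is present.

```python
from typing import Any, Dict, List


def add_missing_source(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the results with a "source" key, defaulting to None."""
    return [{**item, "source": item.get("source")} for item in results]


legacy = [{"result": "some text", "score": 0.8}]
upgraded = add_missing_source(legacy)
```

Calling this at the end of an older retriever function keeps its logic unchanged while producing the three-field format described in the template.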

Integration to `executor_dict`¶

Once you've defined your retriever function, you need to incorporate it into the executor_dict of your project using the following format:

executor_dict = {
    "<name-of-your-research-agent>": {
        "<your-custom-retriever-name>": your_custom_retriever,
    }
}

This step ensures that your function is properly registered and can be executed within the project's framework.
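
A filled-in version of this mapping might look like the sketch below. The agent name "Research Agent" and the retriever name "my_custom_retriever" are hypothetical, and the sketch assumes that the inner key should match the `retriever_name` declared in your YAML configuration. The `missing_retrievers` helper is also hypothetical, not part of the SDK; it is a quick sanity check you could run before starting the project.

```python
from typing import Any, Dict, List


async def your_custom_retriever(query: str) -> List[Dict[str, Any]]:
    """Placeholder retriever that always reports no results."""
    return [{"result": "", "score": 0, "source": None}]


# Hypothetical names: assumed to match the agent name and retriever_name in the YAML.
executor_dict = {
    "Research Agent": {
        "my_custom_retriever": your_custom_retriever,
    }
}


def missing_retrievers(agent_name: str, retriever_names: List[str]) -> List[str]:
    """List configured retriever names that have no function registered."""
    registered = executor_dict.get(agent_name, {})
    return [name for name in retriever_names if name not in registered]


unregistered = missing_retrievers("Research Agent", ["my_custom_retriever", "other"])
```

An empty list from `missing_retrievers` would indicate that every custom retriever named in the configuration has a matching function.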

Use Case¶

CustomRetriever offers flexibility by allowing tailored data retrieval processes. As long as your retriever function is correctly written to return results in the required format, it can effectively integrate with your research agent. Key use cases include:

Specialized Data Queries: Customize data access for unique structures and formats.
Enhanced Search: Implement specific search algorithms for precise outcomes.
API Integration: Seamlessly fetch and incorporate data from external sources.
Performance Optimization: Enhance speed and efficiency for large data volumes.
Domain-Specific Logic: Utilize custom logic to meet specific criteria.
Security and Compliance: Ensure data handling aligns with necessary standards.
