
Analytics Agent

The AnalyticsAgent is a built-in utility agent in the AI Refinery SDK for handling queries that require data analysis. For instance, if you have a dataset with information on breast cancer, and your query is "What is the distribution of radius over benign vs malignant tumors?", the AnalyticsAgent will

  • generate a pandas/Postgres command (depending on your data source)
  • send the command to the AI Refinery SDK backend to be executed
  • receive the results from the command execution
  • generate a textual answer based on the results received

This documentation outlines the workflow and configurations needed to use the AnalyticsAgent.

Workflow Overview

The workflow of the AnalyticsAgent consists of five components:

  1. Data hosting: You have the flexibility to store your dataset in CSV files or a PostgreSQL database. There is no need to upload your data to the AI Refinery service backend: all data analytics commands generated by AI Refinery are executed directly on the SDK backend on your end. You can also host your data on remote servers and databases, as long as you provide the necessary configuration details for access.

  2. Executor configuration: To use the AnalyticsAgent, you must configure either a Postgres or a Pandas executor. These executors process the commands sent from the AI Refinery service (generated by the AnalyticsAgent) on the SDK backend. For both executor types, you must define the database tables (for Postgres) or DataFrames (for Pandas) that the executor can access. Optionally, you can also specify column names along with their descriptions. To set up your executor, you have two options:

    • Automatic Instantiation via Project YAML File: Include all required configuration details for your executor in the project's YAML file. This approach allows the system to automatically instantiate the executor based on the specified parameters.

    • Manual Instantiation: Alternatively, you can manually create an instance of the executor. Once instantiated, add it to the custom_agent_gallery to integrate it with your project.

  3. Command generation: When the AnalyticsAgent, hosted on the AI Refinery service, receives a user query, it generates a command tailored to the type of executor you've configured, either Pandas or Postgres (see the sketch after this list). This command is then dispatched to the AI Refinery SDK backend for execution.

  4. Command execution: The executor you configured receives the command from the AnalyticsAgent and carries out the execution. Once completed, the results are returned to the AnalyticsAgent hosted on the AI Refinery service.

  5. Answer generation: The AnalyticsAgent will then interpret the results and generate a textual response to the original user query.
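
For intuition, here is a minimal sketch of what steps 3 and 4 might look like with a PandasExecutor, using the breast-cancer query from the introduction. The DataFrame name and file path match the Quickstart below; the pd.read_csv call and the column names "diagnosis" and "radius_mean" are assumptions for illustration only.

import pandas as pd

# The PandasExecutor exposes each configured table as a DataFrame under its
# configured name ("BreastCancer" in the Quickstart below); reading the CSV
# directly here is only a stand-in for that mechanism.
BreastCancer = pd.read_csv("analytics_data/BreastCancer.csv")

# For "What is the distribution of radius over benign vs malignant tumors?",
# the AnalyticsAgent could generate a command of roughly this shape.
# NOTE: "diagnosis" and "radius_mean" are hypothetical column names.
result = BreastCancer.groupby("diagnosis")["radius_mean"].describe()

# Step 4: the executor returns this result to the AnalyticsAgent, which then
# generates a textual answer (step 5).
print(result)

With a PostgresExecutor, the generated command would instead be a SQL query executed against your configured database tables.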

Usage

As a built-in utility agent in the AI Refinery SDK, AnalyticsAgent can be easily integrated into your project by adding the necessary configurations to your project YAML file. Specifically, ensure the following configurations are included:

  • Add a utility agent with agent_class: AnalyticsAgent under utility_agents.
  • Ensure the agent_name you chose for your AnalyticsAgent is listed in the agent_list under orchestrator.
  • Add the PandasExecutor or PostgresExecutor configuration under executor_config if you have not manually created the executor and added it to your custom_agent_gallery.

Quickstart

This section presents an example project with two different AnalyticsAgents: one utilizing a PandasExecutor for CSV files and another using a PostgresExecutor for PostgreSQL databases. Here is the YAML config for this project:

utility_agents:
  - agent_class: AnalyticsAgent
    agent_name: "Breast Cancer Analytics Agent" # A name that you choose for your AnalyticsAgent. This needs to be listed under orchestrator.
    agent_description: "An analytics agent capable of performing in-depth analyses on breast cancer data and providing insightful answers to related questions." # A description for your AnalyticsAgent.
    config: # Required. Configuration of your AnalyticsAgent
      contexts: # Optional. Recommended. Useful if your AnalyticsAgent needs to refer to previous history.
        - "date"
        - "chat_history"
      executor_config: # Required. Configuration of your executor.
        type: PandasExecutor # Required. Type of the executor.
        tables: # Required. The list of tables, i.e., DataFrames.
          - name: "BreastCancer" # Required. The name you choose for your DataFrame.
            file_path: "analytics_data/BreastCancer.csv" # Required. The path where the file is hosted. Both CSV and parquet files are currently supported, and the path can be either local or a remote URL.

  - agent_class: AnalyticsAgent
    agent_name: "City Information Analytics Agent"  # A name that you choose for your AnalyticsAgent. This needs to be listed under orchestrator.
    agent_description: "An analytics agent designed to process and analyze city data, providing insightful answers to related questions." # A description for your AnalyticsAgent.
    config: # Required. Configuration of your AnalyticsAgent
      contexts: # Optional. Recommended. Useful if your AnalyticsAgent needs to refer to previous history.
        - "date"
        - "chat_history"
      executor_config: # Required. Configuration of your executor.
        type: PostgresExecutor # Required. Type of the executor.
        db_config: # Required if you do not manually create the PostgresExecutor and add it to your custom_agent_gallery
          "host": "localhost"
          "port": "5432"
          "user": "postgres"
          "password": "mypassword"
          "database": "city_information"
        tables: # Required. List of database tables with their schema_names
          - name: "world_cities" # Required. Name of your database table
            schema_name: "public" # Required. Schema name of the table
          - name: "city_mayors" # Required. Name of your database table
            schema_name: "public" # Required. Schema name of the table
          - name: "attractions" # Required. Name of your database table
            schema_name: "city_tourism" # Required. Schema name of the table

orchestrator:
  agent_list:
    - agent_name: "Breast Cancer Analytics Agent"
    - agent_name: "City Information Analytics Agent"

If you prefer to create the executors yourself and avoid sharing your credentials, you can do so in a Python file and add them to your custom_agent_gallery. See the example below, which assumes the YAML config above is saved as analytics.yaml:

import os
import asyncio
from aioconsole import aprint

from air import login, DistillerClient
from air.api import PandasAPI, PostgresAPI


auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")

# db_config for creating the PostgresExecutor. No need to add the db_config in your project configuration if you create your executor here.
analytics_db_config: dict = {
    "host": "localhost",
    "port": "5432",
    "user": "postgres",
    "password": "mypassword",
    "database": "city_information",
}
analytics_db_client = PostgresAPI(analytics_db_config) # creating the database client

# Table information for creating the PandasExecutor
tables = [
    {
        "name": "BreastCancer",
        "file_path": "analytics_data/BreastCancer.csv",
    },
]

analytics_df_client = PandasAPI(tables)  # creating the dataframe client


async def analytics_demo():
    """
    Function for testing analytics agents
    """

    # create a distiller client
    distiller_client = DistillerClient(base_url=base_url)  # instantiating the DistillerClient 

    # creating the project
    distiller_client.create_project(
        config_path="analytics.yaml", project="analytics_example"
    )

    custom_agent_gallery = {
        "Breast Cancer Analytics Agent": {
            "PandasExecutor": analytics_df_client.execute_query, # the execute_query function of the dataframe client needs to be added as the PandasExecutor
        },
        "City Information Analytics Agent": {
            "PostgresExecutor": analytics_db_client.execute_query, # the execute_query function of the database client needs to be added as the PostgresExecutor
        }
    }

    async with distiller_client(
        project="analytics_example",
        uuid="test_user",
        custom_agent_gallery=custom_agent_gallery,
    ) as dc:
        queries = [
            "How many patients are benign?",
            "What is the distribution of radius over benign vs malignant tumors?",
        ]
        for query in queries:
            responses = await dc.query(query=query)
            print(f"----\nQuery: {query}")
            async for response in responses:
                await aprint(f"Response: {response['content']}")


if __name__ == "__main__":
    asyncio.run(analytics_demo())
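
Before running this script, make sure the ACCOUNT and API_KEY environment variables (read by login) and AIREFINERY_ADDRESS (the base URL of the AI Refinery service) are set in your environment.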

Template YAML Configuration of AnalyticsAgent

In addition to the configurations mentioned for the example above, the AnalyticsAgent supports several other configurable options. See the template YAML configuration below for all available settings.

agent_class: AnalyticsAgent
agent_name: <name of the agent> # A name that you choose for your analytics agent
agent_description: <description of the agent> # Optional.
config:
  executor_config: # Required. Configuration of your executor.
    type: <PandasExecutor or PostgresExecutor> # Required. The type of the executor.
    db_config: # Required ONLY for PostgresExecutor, and ONLY if you do not manually create the executor and add it to your custom_agent_gallery in your Python file.
      "host": "<database_host>"  # Required. The address of your database server (e.g., "localhost" or IP address)  
      "port": "<database_port>"  # Required. The port number your database server is listening on (typically "5432" for PostgreSQL)  
      "user": "<database_user>"  # Required. The username you use to connect to the database  
      "password": "<database_password>" # Required. The password associated with the username  
      "database": "<database_name>"  # Required. The name of the specific database you want to connect to  
    tables: # Required. The list of tables  
      - name: <The name of your table.> # Required.
        file_path: <The file path where the csv or parquet file is hosted.> # Required only for PandasExecutor.
        schema_name: <The name of schema under which the table exists> # Required only for PostgresExecutor.
        columns: # Optional. List of columns if you want to provide a description (for all columns or a subset)
          - name: <name of the column> 
            desc: <description of the column> # A description of the column to help the agents better understand your data. The description is generated by the agents if you do not provide it.
  output_style: <"markdown" or "conversational" or "html">  # Optional field
  contexts:  # Optional field
    - "date"
    - "chat_history" # the chat history upto a certain number of rounds
    - "env_variable"
    - "relevant_chat_history" # the chat history that is relevant to the current query
  llm_config: # Optional. Customized LLM config (if you want the analytics agent to use a different LLM than the one in your base config).
    model: <model_name>