# Analytics Agent

The `AnalyticsAgent` is a built-in utility agent in the AI Refinery SDK for handling queries that require data analysis. For instance, if you have a dataset with information on breast cancer and your query is "What is the distribution of radius over benign vs malignant tumors?", the `AnalyticsAgent` will:

- generate a pandas/postgres command (depending on your data source)
- send the command to the AI Refinery SDK backend to be executed
- receive the results from the command execution
- generate a textual answer based on the results received

This documentation outlines the workflow and configurations needed to use the `AnalyticsAgent`.
## Workflow Overview

The workflow of the `AnalyticsAgent` consists of five components:

1. **Data hosting**: You have the flexibility to store your dataset in CSV files or a PostgreSQL database. There is no need to upload your data to the AI Refinery service backend: all data analytics commands generated by AI Refinery are executed directly on the SDK backend on your end. Additionally, you can host your data on remote servers and databases, simply providing us with the necessary configuration details for access.

2. **Executor configuration**: To use the `AnalyticsAgent`, you must configure either a Postgres or Pandas executor. These executors process commands sent from the AI Refinery service (generated by the `AnalyticsAgent`) on the SDK backend. For both executor types, it is essential to define the database tables (for Postgres) or dataframes (for Pandas) that the executor can access. Additionally, you have the option to specify column names along with their descriptions. To set up your executor, you have two options:
    - **Automatic instantiation via project YAML file**: Include all required configuration details for your executor in the project's YAML file. This approach allows the system to automatically instantiate the executor based on the specified parameters.
    - **Manual instantiation**: Alternatively, you can manually create an instance of the executor. Once instantiated, add it to the `custom_agent_gallery` to integrate it with your project.

3. **Command generation**: When a user query is received by the `AnalyticsAgent`, hosted on the AI Refinery service, it generates a command tailored to the type of executor you have configured, either Pandas or Postgres. This command is then dispatched to the AI Refinery SDK backend for execution.

4. **Command execution**: The executor you configured receives the command from the `AnalyticsAgent` and carries out the execution. Once completed, the results are returned to the `AnalyticsAgent` hosted on the AI Refinery service.

5. **Answer generation**: The `AnalyticsAgent` will then interpret the results and generate a textual response to the original user query.
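To make the command-generation and execution steps concrete, here is a sketch of the kind of pandas command the agent might produce for the radius-distribution query from the introduction. The toy DataFrame, its column names, and the specific command are illustrative assumptions, not the agent's actual output:

```python
import pandas as pd

# Toy stand-in for the BreastCancer table; the column names below
# (diagnosis, radius_mean) are hypothetical.
df = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "B", "M", "B"],
        "radius_mean": [20.1, 12.3, 11.8, 18.7, 13.0],
    }
)

# A generated command for "What is the distribution of radius over benign
# vs malignant tumors?" could summarize the radius column per class.
summary = df.groupby("diagnosis")["radius_mean"].describe()
print(summary)
```

The executor would run such a command locally and return the resulting summary statistics, which the agent then turns into a textual answer.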
## Usage

As a built-in utility agent in the AI Refinery SDK, the `AnalyticsAgent` can be easily integrated into your project by adding the necessary configurations to your project YAML file. Specifically, ensure the following configurations are included:

- Add a utility agent with `agent_class: AnalyticsAgent` under `utility_agents`.
- Ensure the `agent_name` you chose for your `AnalyticsAgent` is listed in the `agent_list` under `orchestrator`.
- Add the `PandasExecutor` or `PostgresExecutor` configurations under `executor_config` if you have not manually created them and added them to your `custom_agent_gallery`.
## Quickstart

This section presents an example project with two different `AnalyticsAgent`s: one utilizing a `PandasExecutor` for CSV files and another using a `PostgresExecutor` for PostgreSQL databases. Here is the YAML config for this project:
```yaml
utility_agents:
  - agent_class: AnalyticsAgent
    agent_name: "Breast Cancer Analytics Agent" # A name that you choose for your AnalyticsAgent. This needs to be listed under orchestrator.
    agent_description: "An analytics agent capable of performing in-depth analyses on breast cancer data and providing insightful answers to related questions." # A description for your AnalyticsAgent.
    config: # Required. Configuration of your AnalyticsAgent
      contexts: # Optional. Recommended. Useful if your AnalyticsAgent needs to refer to previous history.
        - "date"
        - "chat_history"
      executor_config: # Required. Configuration of your executor.
        type: PandasExecutor # Required. Type of the executor.
        tables: # Required. The list of tables, i.e., DataFrames
          - name: "BreastCancer" # Required. The name that you choose for your DataFrame.
            file_path: "analytics_data/BreastCancer.csv" # Required. The path where the file is hosted. Currently, both CSV and parquet files are supported. The path can be either local or a remote URL.

  - agent_class: AnalyticsAgent
    agent_name: "City Information Analytics Agent" # A name that you choose for your AnalyticsAgent. This needs to be listed under orchestrator.
    agent_description: "An analytics agent designed to process and analyze city data, providing insightful answers to related questions." # A description for your AnalyticsAgent.
    config: # Required. Configuration of your AnalyticsAgent
      contexts: # Optional. Recommended. Useful if your AnalyticsAgent needs to refer to previous history.
        - "date"
        - "chat_history"
      executor_config: # Required. Configuration of your executor.
        type: PostgresExecutor
        db_config: # Required if you do not manually create the PostgresExecutor and add it to your custom_agent_gallery
          "host": "localhost"
          "port": "5432"
          "user": "postgres"
          "password": "mypassword"
          "database": "city_information"
        tables: # Required. List of database tables with their schema names
          - name: "world_cities" # Required. Name of your database table
            schema_name: "public" # Required. Schema name of the table
          - name: "city_mayors" # Required. Name of your database table
            schema_name: "public" # Required. Schema name of the table
          - name: "attractions" # Required. Name of your database table
            schema_name: "city_tourism" # Required. Schema name of the table

orchestrator:
  agent_list:
    - agent_name: "Breast Cancer Analytics Agent"
    - agent_name: "City Information Analytics Agent"
```
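Since `file_path` accepts both local paths and remote URLs, and parquet files as well as CSV, a `PandasExecutor` table entry can also point at remote parquet data. The URL below is purely hypothetical, for illustration only:

```yaml
executor_config:
  type: PandasExecutor
  tables:
    - name: "BreastCancer"
      file_path: "https://example.com/data/BreastCancer.parquet" # hypothetical remote parquet URL
```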
If you prefer to create the executors yourself and avoid sharing your credentials, you can do so in a Python file and add them to your `custom_agent_gallery`. See the example below:
```python
import asyncio
import os

from aioconsole import aprint

from air import DistillerClient, login
from air.api import PandasAPI, PostgresAPI

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")

# db_config for creating the PostgresExecutor. No need to add the db_config
# in your project configuration if you create your executor here.
analytics_db_config: dict = {
    "host": "localhost",
    "port": "5432",
    "user": "postgres",
    "password": "mypassword",
    "database": "city_information",
}
analytics_db_client = PostgresAPI(analytics_db_config)  # creating the database client

# Table information for creating the PandasExecutor
tables = [
    {
        "name": "BreastCancer",
        "file_path": "analytics_data/BreastCancer.csv",
    },
]
analytics_df_client = PandasAPI(tables)  # creating the dataframe client


async def analytics_demo():
    """
    Function for testing analytics agents
    """
    # create a distiller client
    distiller_client = DistillerClient(base_url=base_url)

    # create the project
    distiller_client.create_project(
        config_path="analytics.yaml", project="analytics_example"
    )

    custom_agent_gallery = {
        "Breast Cancer Analytics Agent": {
            # the execute_query function of the dataframe client needs to be
            # added as the PandasExecutor
            "PandasExecutor": analytics_df_client.execute_query,
        },
        "City Information Analytics Agent": {
            # the execute_query function of the database client needs to be
            # added as the PostgresExecutor
            "PostgresExecutor": analytics_db_client.execute_query,
        },
    }

    async with distiller_client(
        project="analytics_example",
        uuid="test_user",
        custom_agent_gallery=custom_agent_gallery,
    ) as dc:
        queries = [
            "How many patients are benign?",
            "What is the distribution of radius over benign vs malignant tumors?",
        ]
        for query in queries:
            responses = await dc.query(query=query)
            print(f"----\nQuery: {query}")
            async for response in responses:
                await aprint(f"Response: {response['content']}")


if __name__ == "__main__":
    asyncio.run(analytics_demo())
```
## Template YAML Configuration of `AnalyticsAgent`

In addition to the configurations shown in the example above, the `AnalyticsAgent` supports several other configurable options. See the template YAML configuration below for all available settings.
```yaml
agent_class: AnalyticsAgent
agent_name: <name of the agent> # A name that you choose for your analytics agent
agent_description: <description of the agent> # Optional
config:
  executor_config: # Required. Configuration of your executor.
    type: <Type of the executor. PandasExecutor or PostgresExecutor> # Required.
    db_config: # Required ONLY for PostgresExecutor and ONLY if you do not manually create it and add it to your custom_agent_gallery in your Python file
      "host": "<database_host>" # Required. The address of your database server (e.g., "localhost" or an IP address)
      "port": "<database_port>" # Required. The port number your database server is listening on (typically "5432" for PostgreSQL)
      "user": "<database_user>" # Required. The username you use to connect to the database
      "password": "<database_password>" # Required. The password associated with the username
      "database": "<database_name>" # Required. The name of the specific database you want to connect to
    tables: # Required. The list of tables
      - name: <The name of your table> # Required.
        file_path: <The file path where the CSV or parquet file is hosted> # Required only for PandasExecutor.
        schema_name: <The name of the schema under which the table exists> # Required only for PostgresExecutor.
        columns: # Optional. List of columns if you want to provide a description (for all columns or a subset)
          - name: <name of the column>
            desc: <description of the column> # a description of the column to help the agents better understand your data. The description is generated by the agents if you do not provide it.
  output_style: <"markdown" or "conversational" or "html"> # Optional field
  contexts: # Optional field
    - "date"
    - "chat_history" # the chat history up to a certain number of rounds
    - "env_variable"
    - "relevant_chat_history" # the chat history that is relevant to the current query
  llm_config: # Optional. Customized LLM config (if you want the analytics agent to use a different LLM than the one in your base config)
    model: <model_name>
```
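As an illustration of the optional `columns` field, the entry below documents two columns of the BreastCancer DataFrame from the Quickstart. The column names and descriptions are illustrative assumptions about that dataset, not part of the SDK:

```yaml
executor_config:
  type: PandasExecutor
  tables:
    - name: "BreastCancer"
      file_path: "analytics_data/BreastCancer.csv"
      columns: # hypothetical column names and descriptions
        - name: "diagnosis"
          desc: "Tumor class label: 'M' for malignant, 'B' for benign"
        - name: "radius_mean"
          desc: "Mean radius of the tumor cell nuclei"
```

Providing descriptions like these can help the agent generate more accurate commands; any columns you omit are described automatically by the agents.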