Employ Genie Agents using the Databricks Agent
With AI Refinery, you can seamlessly access agents from third-party vendors and integrate them into your agentic team and workflow. In this tutorial, we provide an example of using a DatabricksAgent that acts as a Database Assistant and lets you talk with data stored in a Databricks database.
Objective
Build a database assistant that lets you "talk to your data" using one of AI Refinery's supported third-party agents, the DatabricksAgent.
Tutorial Description
Follow this tutorial for a step-by-step walkthrough of how to build a Genie agent on the Databricks platform, provide it with a database, and connect to it through the DatabricksAgent of the AI Refinery SDK to talk with your data.
Tutorial Workflow
The tutorial explains the following steps in detail:
- Accessing a Databricks account: Connecting to an existing Databricks account or creating a new free trial account.
- Granting authenticated account access: Setting up authenticated access to the Databricks account and acquiring the required credentials.
- Creating a Databricks Genie agent: Setting up a Genie agent on the Databricks platform to enable LLM-backed communication with your data.
- Setting up a database: Connecting the Genie agent with an existing database or creating a new sample database on the Databricks platform to experiment with through the AI Refinery.
- Granting database access permissions: Setting up permissions to access the Genie agent, the connected database, and the associated tables to allow for a realistic demonstration of the AI Refinery-Databricks communication capabilities.
- Configuration and execution examples: Example code to configure a DatabricksAgent on AI Refinery and interact with it, bringing the advanced analytics capabilities of Genie agents into the AI Refinery platform.
Creating Databricks Agents
Users can customize a DatabricksAgent through the Databricks platform. The following steps walk you through the creation of an agent, assuming that you have admin privileges on your (personal) account. If you are using a corporate account, you may need to request permissions from your account admin.
- Access the Databricks platform: You can sign up for a free Databricks trial account (full access to compute for 14 days, suitable if you are aiming for production applications). If you already have a corporate account, you can log in with it to access your data that already lives in Databricks.
- Create Service Principal: Proceed (or ask your account admin) to set up a Service Principal for your account to allow external connections. To do so, go to your account settings, select the Identity and Access tab, click Add service principal, and name your service principal (e.g., distiller-client).
- Obtain your account credentials: Three credentials are required:
  - First, to obtain your Client ID and Client Secret, click on your created Service Principal, select the Secrets tab, and generate a new key/secret pair. Save these credentials in a safe location; you will need them to configure your agent in AIR later.
  - Then, to obtain your Host URL, go to your Databricks account homepage and copy the first part of the URL (the workspace host), along with the https:// prefix.
  You will need to export these credentials as environment variables in the terminal where your code will run, so save the values in your local .env file.
- Create Genie agent: Set up a Genie Space to connect a Genie agent to your data. If you are using a corporate account and you already have data on the Databricks platform, you can skip this step. If you are using the free trial account and need data, you can set up a demo database as follows: on your Databricks homepage, click the New button, create a new notebook, and run the demo-setup code in it.
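The notebook code is not reproduced on this page. As a hedged sketch (the demo name is a placeholder you must fill in yourself), Databricks' dbdemos library can install a sample dataset for you; this must run inside a Databricks notebook, not locally:

```python
# Cell 1 (notebook magic, its own cell): %pip install dbdemos

# Cell 2: install a sample demo dataset. The demo name is a placeholder;
# browse the available demos with dbdemos.list_demos() and pick one that
# creates the schema you want to explore.
import dbdemos

dbdemos.install("<demo-name>")
```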
This will set up a sample demo database with a few tables for your Genie agent to connect to. You can further customize your Genie workspace with additional business-specific context and example SQL queries for database management and exploration.
- Obtain your Genie space ID: You can find it in the URL of your Genie space after you have set it up. The format of that URL is as follows:
  https://<host-url>/genie/rooms/<genie-space-id>?o=<workspace-id>
  Carefully copy your Genie space ID from there, between the /rooms/ field and the ?o= separator. You will also need it to configure your Databricks agent in AIR later, so save it in your local .env file, just as you did for the Client ID and Client Secret.
- Test your Genie agent (optional): Try the chat interface on the Databricks platform to talk to your data using your Genie agent, while viewing the data tables or unstructured data sources it has access to, to understand its capabilities.
- Grant permission for external connections: To allow AI Refinery to connect to your Genie agent, set up the following permissions:
  - First, give your Service Principal permission to access your Genie space. To do so, click the Genie tab on the left side, then click the Share button, and grant the "Can Manage" permission to your Service Principal.
  - Then, give your Service Principal permission to access the underlying SQL warehouse. To do so, click the SQL Warehouses tab on the left side, select the underlying warehouse (the trial account only allows one warehouse, named Serverless), click the Permissions button, and grant the "Can Use" permission to your Service Principal.
  - Lastly, give your Service Principal permission to access the underlying schema. To do so, click the Catalog tab on the left, select your database (dbdemos_customer_support if you are using the free trial account), click the Permissions button, and grant your Service Principal the four permissions required for data access, as listed in the Permissions dialog.
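As a small aside, the Genie space ID extraction described in the steps above (the segment between /rooms/ and ?o= in the space URL) can be sketched in Python; the URL below is a made-up example:

```python
from urllib.parse import urlparse


def extract_genie_space_id(url: str) -> str:
    """Pull the Genie space ID out of a Genie space URL.

    Expected shape (an assumption based on this tutorial):
    https://<host-url>/genie/rooms/<genie-space-id>?o=<workspace-id>
    """
    path = urlparse(url).path  # drops the ?o=... query string
    marker = "/rooms/"
    start = path.index(marker) + len(marker)
    return path[start:].split("/")[0]


example = "https://dbc-example.cloud.databricks.com/genie/rooms/01ef0a1b2c3d?o=1234567890"
print(extract_genie_space_id(example))  # -> 01ef0a1b2c3d
```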
Execution Steps
1. Configuration file
To use the DatabricksAgent, you need to create a YAML file with the required configuration; you can customize your assistant agent in this file. As a first step, allow the orchestrator to invoke the DatabricksAgent by listing the agent in the orchestrator's agent_list. See the YAML template below for the DatabricksAgent configuration.
```yaml
orchestrator:
  agent_list:
    - agent_name: "Database Assistant"

utility_agents:
  - agent_class: DatabricksAgent
    agent_name: "Database Assistant"
    agent_description: "The Database Assistant has access to the tables of an Accenture database and can answer questions about the data contained."
    config:
      client_id: "DATABRICKS_CLIENT_ID" # Required: Environment variable holding the Databricks client ID
      client_secret: "DATABRICKS_CLIENT_SECRET" # Required: Environment variable holding the Databricks client secret
      host_url: "DATABRICKS_HOST" # Required: Environment variable holding the Databricks host URL
      genie_space_id: "GENIE_SPACE_ID" # Required: Environment variable holding the Databricks Genie space ID
      contexts: # Optional
        - "date"
        - "chat_history"
```
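The key constraint in this file is that every agent named in the orchestrator's agent_list must also be defined under utility_agents. A minimal sketch of that consistency check, assuming PyYAML is installed (the embedded YAML mirrors the template above, trimmed to the relevant keys):

```python
import yaml  # PyYAML; assumed available in your environment

CONFIG = """
orchestrator:
  agent_list:
    - agent_name: "Database Assistant"
utility_agents:
  - agent_class: DatabricksAgent
    agent_name: "Database Assistant"
"""

cfg = yaml.safe_load(CONFIG)
orchestrated = {a["agent_name"] for a in cfg["orchestrator"]["agent_list"]}
defined = {a["agent_name"] for a in cfg["utility_agents"]}

# Any name listed here would be invokable by the orchestrator but undefined.
print(sorted(orchestrated - defined))  # -> []
```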
2. Python file
Before running the demo code, make sure that you have the following environment variables in your local .env file:
```shell
# AIR variables
export ACCOUNT="YOUR-ACCOUNT-ID"
export API_KEY="YOUR-API-KEY"

# Databricks variables
export DATABRICKS_HOST="YOUR-HOST-URL"
export DATABRICKS_CLIENT_ID="YOUR-CLIENT-ID"
export DATABRICKS_CLIENT_SECRET="YOUR-CLIENT-SECRET"
export GENIE_SPACE_ID="YOUR-GENIE-SPACE-ID"
```
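A missing or empty variable is the most common cause of authentication failures, so it can be worth verifying the environment before starting the demo. This is a minimal sketch (not part of the AI Refinery SDK):

```python
import os

# The variable names expected by this tutorial's configuration.
REQUIRED = [
    "ACCOUNT",
    "API_KEY",
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
    "GENIE_SPACE_ID",
]


def missing_env_vars(names=REQUIRED):
    """Return the variable names that are unset or empty in the environment."""
    return [name for name in names if not os.getenv(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```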
The following code is an example of the AI Refinery interacting with your Databricks agent to talk with the sample database you set up in the previous steps.
```python
import asyncio
import csv
import datetime
import json
import os

from dotenv import load_dotenv

from air import DistillerClient, login

load_dotenv()  # Load your environment variables from the '.env' file

# Authenticate using the environment variables loaded above.
AUTH = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)


def parse_db_output(response):
    """Pretty-print the Databricks agent's response in the chat
    and save tabular responses locally as CSV."""
    try:
        # Tabular responses arrive as a JSON array of rows
        rows = json.loads(response)
        # Save locally in a CSV file (timestamp formatted to be a valid filename)
        name = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        with open(f"./{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerows(rows)
        # Print in a tabular format in the chat
        for row in rows:
            print(*row, sep=" ")
    except json.JSONDecodeError:
        # Otherwise, print the plain-text response
        print(response)


async def databricks_agent_demo():
    """
    Simple demo for a Databricks agent.

    We send a set of sample queries that highlight the capabilities of the
    agent. The overall objective is talking with your data. In a
    workspace-agnostic way, we first ask for an overview of the database.
    Then, we ask for information that triggers SQL code generation and
    execution to return numerical and tabular data from Databricks.
    """
    distiller_client = DistillerClient()
    distiller_client.create_project(
        config_path="example.yaml", project="example-databricks"
    )
    queries = [
        "Give me an overview of this dataset",
        "How many rows does the first table have?",
        "Show me the first ten rows of the first table",
        "Show me the first ten rows of the first table corresponding to 2nd line support tickets",
        "How many tickets did Michele Whyatt handle?",
    ]
    async with distiller_client(  # Initialize the distiller client instance
        project="example-databricks",
        uuid="test_user",
    ) as dc:
        for query in queries:  # Send the above queries one by one
            responses = await dc.query(query=query)
            print(f"----\nQuery: {query}")
            async for response in responses:  # Collect the responses
                parse_db_output(response["content"])  # Format and print them


if __name__ == "__main__":
    print("\nDatabricks Agent Demo")
    asyncio.run(databricks_agent_demo())
```
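The branching inside parse_db_output hinges on one assumption worth calling out: a response is treated as tabular only if it parses as JSON, and anything else is printed verbatim. This standalone sketch mirrors that logic with made-up sample responses:

```python
import json


def rows_or_text(response: str):
    """Return parsed rows for tabular (JSON) responses, or the raw text otherwise."""
    try:
        return json.loads(response)  # tabular responses arrive as a JSON array of rows
    except json.JSONDecodeError:
        return response


print(rows_or_text('[["ticket_id", "status"], [101, "open"]]'))  # -> [['ticket_id', 'status'], [101, 'open']]
print(rows_or_text("The first table has 1,000 rows."))  # -> The first table has 1,000 rows.
```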