Skip to content

Employ Genie Agents using the Databricks Agent

With AI Refinery, you can seamlessly access agents from third-party vendors and integrate them into your agentic team and workflow. In this tutorial, we provide an example of using a DatabricksAgent that acts as a Database Assistant and allows the user to talk with their data stored in a Databricks database.

Objective

Build a database assistant that allows you to "talk to your data" using one of the AI Refinery's supported third-party agents, the DatabricksAgent.

Tutorial Description

Follow this tutorial for a step-by-step walkthrough of how to build a Genie agent on the Databricks platform, provide it with a database, and connect to it using the DatabricksAgent of the AI Refinery SDK to talk with your data.

Tutorial Workflow

The tutorial explains the following steps in detail:

  • Accessing a Databricks account: Connecting to an existing Databricks account or creating a new free trial account.
  • Granting authenticated account access: Setting up authenticated access the Databricks account, and acquiring the required credentials to access it.
  • Creating Databricks Genie Agent: Setting up a Genie agent on the Databricks platform to allow for an LLM-backed communication with the data.
  • Setting up a database: Connecting the Genie agent with an existing database or creating a new sample database on the Databricks platform to experiment with through the AI Refinery.
  • Granting database access permissions: Setting up permissions to access the Genie agent, the connected database, and the associated tables to allow for a realistic demonstration of the AI Refinery-Databricks communication capabilities.
  • Configuration and Execution examples: Example code to configure a Databricks Agent on AI Refinery and interact with it to introduce the advanced analytics capabilities of the Genie agents into the AI Refinery platform.

Creating Databricks Agents

Users can customize a Databricks Agent through the Databricks platform. The following steps walk you through the creation of an agent, assuming that you have admin privileges on your (personal) account. If you are using a corporate account, you might need to get some permissions from your account admin.

  1. Access the Databricks Platform: You can sign up for a free Databricks account (full access to compute for 14 days, suitable if you are aiming for production applications). If you already have a corporate account, you can log in using that one to gain access to your data that already live in Databricks.
  2. Create Service Principal: Proceed (or ask your account admin) to set up a Service Principal for your account to allow external connections. To do that, go to your account settings, select the Identity and Access tab, click on Add service principal, and name your service principal (e.g., distiller-client).

    Service principal creation

  3. Obtain your account credentials: Three credentials are required:

    • First, to obtain your Client ID and Client Secret, click on your created Service Principal, select the Secrets tab, and generate a new key/secret. pair. Save these credentials to a safe location, you will need these to configure your agent in AIR later. Client Key/Secret creation
    • Then to obtain your Host URL, go to your databricks account homepage and copy the first part of the URL as shown below, along with the https:// prefix: Host URL

    You will need to export these credentials as environment variables in the terminal where your code will run. Therefore, save the values in your local .env file:

    export DATABRICKS_HOST=<YOUR-HOST-URL>
    export DATABRICKS_CLIENT_ID=<YOUR-CLIENT-ID>
    export DATABRICKS_CLIENT_SECRET=<YOUR-CLIENT-SECRET>
    

  4. Create Genie Agent: Set up a Genie Space to connect a Genie agent to your data. If you are using a corporate account and you already have data on the Databricks platform, you can skip this step. If you are using the free trial account and you need data, then you can set up a demo database as follows.

    On your Databricks homepage, click on the New button and create a new Notebook:
    Demo database setup

    Then, run the following code in your notebook:

    %pip install dbdemos
    import dbdemos
    dbdemos.install('aibi-customer-support', catalog='accenture', schema='dbdemos_customer_support')
    
    This will set up a sample demo database with a few tables for your Genie agent to connect to. You can further customize your Genie workspace with additional business-specific context, and exemplary SQL queries for database management and exploration.

  5. Obtain your Genie space ID: You can find that from the URL of your Genie space after you have set it up. The format of that URL is as follows:

    https://{your-workspace-id}.cloud.databricks.com/genie/rooms/YOUR-GENIE-SPACE-ID?o=1234567890
    
    Carefully copy your Genie space ID from there, between the /rooms/ field and the ?o= separator. Obtaining Genie ID

    You will also need that to configure your Databricks agent in AIR later, so save it in your local .env file, in the same way that you did for the Client Key and Client Secret.

    export GENIE_SPACE_ID=<YOUR-GENIE-SPACE-ID>
    
  6. Test your Genie agent (Optional): Try the chat interface in the Databricks platform to talk to your data using your Genie agent, while viewing the data tables or the unstructured data sources it has access to and understand its capabilities.

  7. Grant permission for external connections: To allow for the AI Refinery to conect to your genie agent, set up the following permissions:

    • First, give your service account permission to access your Genie space. To do so, click on the Genie tab on the left side, then click on the Share button, and grant "Can Manage" permission to your Service Principal.
      Access of service account to genie

    • Then, give your service account permission to access the underlying SQL warehouse. To do so, click on the SQL Warehouse tab on the left side, select the underlying warehouse (your trial account only allows for one warehouse named Serverless), then click on the Permissions button, and grant "Can Use" permission to your Service Principal. Access of service account to warehouse

    • Lastly, give your service account permission to access the underlying schema. To do so, click on Catalog tab on the left, then select your database (dbdemos_customer_support if you are using the free trial account), then click on the Permissions button, and grant your Service Principal the four permissions shown below: Access of service account to schema

Execution Steps

1. Configuration file

To use the DatabricksAgent, you need to create a yaml file with all the required configuration. You can customize your assistant agent in this simple YAML configuration file.

As a first step, you need to allow the orchestrator to invoke the given DatabricksAgent by listing the agent in the orchestrator's agent_list.

See the YAML template below for the DatabricksAgent configuration.

orchestrator:
  agent_list:
    - agent_name: "Database Assistant"

utility_agents:
  - agent_class: DatabricksAgent
    agent_name: "Database Assistant"
    agent_description: "The Database Assistant has access to the tables of an Accenture database and can answer questions about the data contained."
    config:
      client_id: "DATABRICKS_CLIENT_ID" # Required: Environment variable holding Databricks client ID
      client_secret: "DATABRICKS_CLIENT_SECRET" # Required: Environment variable holding Databricks client secret
      host_url: "DATABRICKS_HOST" # Required: Environment variable holding Databricks host URL
      genie_space_id: "GENIE_SPACE_ID" # Required: Environment variable holding Databricks Genie space ID
      contexts: # Optional
        - "date"
        - "chat_history"

2. Python file

Before running the demo code, make sure that you have the following environment variables in your local .env file:

  # AIR variables
  export ACCOUNT="YOUR-ACCOUNT-ID"
  export API_KEY=<YOUR-API-KEY>

  # Databricks variables
  export DATABRICKS_HOST=<YOUR-HOST-URL>
  export DATABRICKS_CLIENT_ID=<YOUR-CLIENT-ID>
  export DATABRICKS_CLIENT_SECRET=<YOUR-CLIENT-SECRET>
  export GENIE_SPACE_ID=<YOUR-GENIE-SPACE-ID>

The following code is an example of the AI Refinery interacting with your Databricks agent to talk with the sample database you set up in the previous steps.

import os
import asyncio
import json
import csv
import datetime
from dotenv import load_dotenv

from air import login, DistillerClient

load_dotenv() # This loads your environment variables from the '.env' file

# Authenticate using environment variables or fallback values.
AUTH = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)


def parse_db_output(response):
    """
    Helper function to beauty-print Databricks agent's
    response in chat and save it locally in csv.
    """
    # If tabular data in response of DB agent
    try:
        # Read as JSON
        df = json.loads(response)

        # Save locally in a csv file
        name = datetime.datetime.now()
        with open(f"./{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerows(df)

        # Print in a tabular format in the chat
        for row in df:
            print(*row, sep="   ")

    # Else, print plain text
    except json.JSONDecodeError:
        print(response)


async def databricks_agent_demo():
    """
    Simple demo for a Databricks agent.
    We send a set of sample queries that highlight the capabilities of the agent.
    The overall objective is talking with your data.
    In a workspace-agnostic way, we first ask for an overview of the database.
    Then, we ask for information that trigger SQL code generation and execution
    to return numerical and tabular data from Databricks.
    """
    distiller_client = DistillerClient(base_url=BASE_URL)
    distiller_client.create_project(
        config_path="example.yaml", project="example-databricks"
    )

    queries = [
        "Give me an overview of this dataset",
        "How many rows does the first table have?",
        "Show me the first ten rows rows of the first table",
        "Show me the first ten rows rows of the first table corresponding to 2nd line support tickets",
        "How many tickets did Michele Whyatt handle?",
    ]

    async with distiller_client( # Initialize the distiller client instance
        project="example-databricks",
        uuid="test_user",
    ) as dc:
        for query in queries: # Send in the above queries one by one
            responses = await dc.query(query=query)
            print(f"----\nQuery: {query}")
            async for response in responses: # Collect the responses to the queries
                parse_db_output(response["content"]) # Format and print them in the chat


if __name__ == "__main__":
    print("\nDatabricks Agent Demo")
    asyncio.run(databricks_agent_demo())