Deep Research Agent¶
The DeepResearchAgent is a built-in utility agent within the AI Refinery SDK, designed to handle complex user queries through multi-step, structured research and produce comprehensive, citation-supported reports that emphasize clarity, depth, and reliability. Unlike the more general ResearchAgent, it specializes in delivering well-structured, in-depth reports while ensuring traceability through references.
Workflow Overview¶
The DeepResearchAgent follows a multi-stage workflow that turns a user query into a comprehensive, citation-supported report:
- Query Clarification (optional): Asks follow-up questions when the original query is unclear or missing context. Responses can be provided via a terminal or a custom input handler.
- Research Planning: Decomposes the query into structured research questions, defining the scope and direction of the investigation.
- Iterative Research: Investigates each sub-question, collecting supporting evidence and references.
- Report Synthesis: Drafts findings into a coherent, well-structured report with inline citations and references.
- Audio Generation (optional): Converts the final report into an audio narration.

Usage¶
As a built-in utility agent in the AI Refinery SDK, DeepResearchAgent can be integrated into your project by adding the necessary configuration to your project YAML file. Specifically, ensure the following configurations are included:
- Add a utility agent with agent_class: DeepResearchAgent under utility_agents.
- Ensure the agent_name you chose for your DeepResearchAgent is listed in the agent_list under orchestrator.
Quickstart¶
To quickly set up a project with a DeepResearchAgent, use the following YAML configuration.
utility_agents:
  - agent_class: DeepResearchAgent
    agent_name: "Deep Research Agent" # Required. Descriptive name for the agent
    config:
      return_intermediate_results: true # Optional. If true, return intermediate steps and reasoning (default: false)
      human_in_the_loop: true # Optional. If true, agent may ask clarifying questions (default: true)
      strategy_mode: "balanced" # Optional. Strategy mode: "exploratory" | "focused" | "balanced" (default)
      speech_synthesis_config:
        mode: "dual_podcast_overview" # Optional. Audio generation mode:
                                      # - "extended_audio"
                                      # - "single_podcast_overview" (default)
                                      # - "dual_podcast_overview"
      human_agent_config:
        user_input_method: "Terminal" # Optional. Input method: "Terminal" (default) | "Custom"
orchestrator:
  agent_list:
    - agent_name: "Deep Research Agent" # Must match the name defined above
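The two requirements listed under Usage can be checked mechanically before a project is launched. The following is a minimal sketch, not part of the AI Refinery SDK: it assumes the YAML has already been parsed into a plain Python dictionary, and the helper name find_unlisted_agents is hypothetical.

```python
def find_unlisted_agents(config: dict) -> list[str]:
    """Return DeepResearchAgent names missing from the orchestrator's agent_list."""
    listed = {
        entry.get("agent_name")
        for entry in config.get("orchestrator", {}).get("agent_list", [])
    }
    return [
        agent.get("agent_name")
        for agent in config.get("utility_agents", [])
        if agent.get("agent_class") == "DeepResearchAgent"
        and agent.get("agent_name") not in listed
    ]


# Dictionary form of the quickstart configuration (config options omitted).
quickstart = {
    "utility_agents": [
        {"agent_class": "DeepResearchAgent", "agent_name": "Deep Research Agent"}
    ],
    "orchestrator": {"agent_list": [{"agent_name": "Deep Research Agent"}]},
}

missing = find_unlisted_agents(quickstart)
print(missing)  # an empty list means every DeepResearchAgent is registered
```

A non-empty result names the agents whose agent_name does not appear in the orchestrator's agent_list.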
Template YAML Configuration of DeepResearchAgent¶
utility_agents:
  - agent_class: DeepResearchAgent
    agent_name: <Name of the Agent> # Required. A descriptive name for the agent.
    config:
      return_intermediate_results: <true or false>
      # Optional. If true, return intermediate steps and reasoning. Defaults to false.
      human_in_the_loop: <true or false>
      # Optional. If true, the agent may ask follow-up or clarifying questions
      # based on the user query. Defaults to true.
      strategy_mode: <"exploratory" | "focused" | "balanced">
      # Optional. Determines the research strategy:
      # - "exploratory": broad coverage across many aspects
      # - "focused": deeper investigation into fewer aspects
      # - "balanced": balance between breadth and depth to ensure both coverage
      #   and meaningful detail (default)
      speech_synthesis_config:
        mode: <"extended_audio" | "single_podcast_overview" | "dual_podcast_overview">
        # Optional. Configures speech synthesis for audio output of the final report.
        # Selects the audio generation mode:
        # - "extended_audio": full-length narration (single speaker)
        # - "single_podcast_overview": short podcast-style summary (single speaker) (default)
        # - "dual_podcast_overview": short podcast-style summary (two speakers)
      human_agent_config:
        user_input_method: <"Terminal" | "Custom">
        # Optional. Configures how the agent collects user input for query clarification.
        # Supported modes:
        # - "Terminal": command-line input (default)
        # - "Custom": integrate your own input method (e.g., a web UI)
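Because three of the options accept only a fixed set of values, a typo in the YAML is easy to catch before the agent runs. The sketch below is illustrative only: the allowed values are copied from the template above, while the checker invalid_options, the dotted-path convention, and the sample value "thorough" are hypothetical and not part of the SDK.

```python
# Allowed values copied from the template; the checker itself is hypothetical.
ALLOWED_VALUES = {
    "strategy_mode": ("exploratory", "focused", "balanced"),
    "speech_synthesis_config.mode": (
        "extended_audio",
        "single_podcast_overview",
        "dual_podcast_overview",
    ),
    "human_agent_config.user_input_method": ("Terminal", "Custom"),
}


def invalid_options(agent_config: dict) -> dict:
    """Return {dotted option path: offending value} for unsupported values."""
    problems = {}
    for path, allowed in ALLOWED_VALUES.items():
        value = agent_config
        # Walk the nested dictionaries one key at a time.
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        # Options that are absent are optional, so only set values are checked.
        if value is not None and value not in allowed:
            problems[path] = value
    return problems


config = {
    "strategy_mode": "thorough",  # not one of the three documented modes
    "speech_synthesis_config": {"mode": "dual_podcast_overview"},
}
print(invalid_options(config))
```

The function receives the contents of the agent's config block, already parsed into a dictionary.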
Message Schema for SDK Integration¶
This section explains how the Deep Research Agent communicates with the airefinery-sdk during execution.
If return_intermediate_results is set to true, the backend streams messages continuously to report progress, reasoning, discovered references, final results, and other information about the agent's execution. These messages follow a common schema, which keeps their structure consistent across all pipeline stages and makes them easy to parse and process.
Schema Overview¶
Each message has a consistent status + payload structure:
- status → High-level classification of the message type (e.g., pipeline_step, reference).
- payload → A structured object, carried in the message's content field, with the detailed information for that status.
Note: Each status value has its own corresponding payload schema.
This schema serves several purposes:
- Ensures type-safe communication between pipeline components and clients.
- Makes it easier for developers to filter and process both intermediate and final results by standardizing message formats.
Detailed definitions of each status and its corresponding payload are provided in the Status and Payload Schemas sections below.
Example¶
When streaming results, each message contains both status and content:
{
  "status": "ir_progress", // Message status
  "content": {
    "type": "ir_progress", // Discriminator for this payload type
    "processed_tasks": 3, // Number of completed Iterative Research tasks so far
    "total_tasks": 10 // Total number of Iterative Research tasks planned
  }
}
Accessing Status and Payload¶
In client code, you can read these fields directly:
# message received from DeepResearchAgent
status = message["status"] # "ir_progress"
payload = message["content"] # structured payload
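Once status and content are separated, a client typically routes each message to code that understands that payload. The following is one possible sketch of such routing. It assumes that status arrives as a plain string, as in the JSON example above; the registry HANDLERS, the decorator on_status, and the function dispatch are hypothetical names, not SDK APIs.

```python
from typing import Callable

# Registry of handlers keyed by the status string carried in each message.
HANDLERS: dict[str, Callable[[dict], str]] = {}


def on_status(status: str):
    """Decorator that registers a handler for one status value."""
    def register(func):
        HANDLERS[status] = func
        return func
    return register


@on_status("ir_progress")
def handle_progress(payload: dict) -> str:
    return f"{payload['processed_tasks']} iterative research tasks completed"


def dispatch(message: dict) -> str:
    """Route a message to its handler; unknown statuses are reported, not raised."""
    status = message["status"]
    handler = HANDLERS.get(status)
    if handler is None:
        return f"unhandled status: {status}"
    return handler(message["content"])


message = {
    "status": "ir_progress",
    "content": {"type": "ir_progress", "processed_tasks": 3, "total_tasks": 10},
}
print(dispatch(message))  # prints "3 iterative research tasks completed"
```

Reporting unknown statuses instead of raising keeps a client running if the backend sends a message type the client does not handle yet.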
Status¶
The status field is defined as DeepResearchStatus, an enum that provides a predefined set of constant values for categorizing messages in a type-safe way.
It represents the high-level categories of messages sent to the client, and each value determines which payload schema is expected in the message.
Status | Meaning | Payload Schema |
---|---|---|
PIPELINE_STEP | A major stage in the research pipeline. | DeepResearchPipelineStepPayload |
IR_PROGRESS | Progress updates for iterative research tasks. | DeepResearchIRProgressPayload |
RESEARCH_QUESTIONS | Research questions generated by the planning step. | DeepResearchResearchQuestionsPayload |
THOUGHT_STATUS | Updates on reasoning steps or intermediate thought processes. | DeepResearchThoughtStatusPayload |
REFERENCE | References or sources discovered during research. | DeepResearchReferencePayload |
SUMMARY_STATISTICS | Final statistics summarizing runtime and resource usage. | DeepResearchSummaryStatisticsPayload |
Example¶
Checking the status against an enum value:
if response["status"] == DeepResearchStatus.PIPELINE_STEP:
    print("This message is a pipeline step.")
elif response["status"] == DeepResearchStatus.REFERENCE:
    print("This message contains reference data.")
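If status arrives as a plain string, as in the JSON example earlier, a comparison like the one above succeeds only when the enum members are themselves strings. How DeepResearchStatus is actually defined is not stated here, so the sketch below uses a hypothetical stand-in, StatusSketch, whose lowercase values are an assumption based on the type discriminators shown in the payload examples. It also shows how the status table can be turned into a lookup.

```python
from enum import Enum


class StatusSketch(str, Enum):
    """Hypothetical stand-in for DeepResearchStatus; the values are assumed."""
    PIPELINE_STEP = "pipeline_step"
    IR_PROGRESS = "ir_progress"
    RESEARCH_QUESTIONS = "research_questions"
    THOUGHT_STATUS = "thought_status"
    REFERENCE = "reference"
    SUMMARY_STATISTICS = "summary_statistics"


# Payload schema names taken from the status table.
PAYLOAD_SCHEMA = {
    StatusSketch.PIPELINE_STEP: "DeepResearchPipelineStepPayload",
    StatusSketch.IR_PROGRESS: "DeepResearchIRProgressPayload",
    StatusSketch.RESEARCH_QUESTIONS: "DeepResearchResearchQuestionsPayload",
    StatusSketch.THOUGHT_STATUS: "DeepResearchThoughtStatusPayload",
    StatusSketch.REFERENCE: "DeepResearchReferencePayload",
    StatusSketch.SUMMARY_STATISTICS: "DeepResearchSummaryStatisticsPayload",
}


def expected_schema(raw_status: str) -> str:
    """Convert a raw status string to the enum, then look up its schema name."""
    # StatusSketch(...) raises ValueError for a status that is not defined.
    return PAYLOAD_SCHEMA[StatusSketch(raw_status)]


response = {"status": "reference"}
print(response["status"] == StatusSketch.REFERENCE)  # True for a str-based enum
print(expected_schema(response["status"]))
```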
Payload Schemas¶
Each payload corresponds to a DeepResearchStatus and carries structured data for rendering or logging.
DeepResearchPipelineStepPayload¶
Status
This payload corresponds to PIPELINE_STEP.
Description
Reports a high-level pipeline step. Contains a step_key (from DeepResearchStep) and a human-readable info message that describes the agent's current stage.
Example Payload
{
  "type": "pipeline_step", // Discriminator for this payload type
  "step_key": DeepResearchStep.START_FOLLOW_UP, // DeepResearchStep enum value
  "info": "Checking if follow-up is needed..." // Human-readable status message
}
Fields
Field | Type | Description |
---|---|---|
step_key | DeepResearchStep | One of the enum values representing the step |
info | str | Human-readable description of the step |
DeepResearchStep¶
DeepResearchStep is an enum that defines fine-grained identifiers for specific pipeline stages.
These values populate the step_key field of the payload, providing detailed visibility into the agent's execution flow.
Value | Workflow Stage | Description |
---|---|---|
START_FOLLOW_UP | Query Clarification | Begin clarification stage |
END_FOLLOW_UP_POS | Query Clarification | Clarification successful |
END_FOLLOW_UP_NEG | Query Clarification | Clarification not required |
FAIL_CLARIFICATION | Query Clarification | Clarification failed |
START_RESEARCH_PLANNER | Research Planning | Begin planning research tasks |
FAIL_RESEARCH_PLANNER | Research Planning | Planning failed |
START_QUERY_REWRITER | Research Planning | Begin rewriting the query |
END_QUERY_REWRITER | Research Planning | Query successfully rewritten |
END_QUERY_REWRITER_NO_FEEDBACK | Research Planning | Rewriting skipped (no user feedback given) |
START_SEARCH_BACKGROUND | Research Planning | Begin background search |
END_SEARCH_BACKGROUND | Research Planning | Background search complete |
FAIL_SEARCH_BACKGROUND | Research Planning | Background search failed |
START_ITERATIVE_RESEARCH | Iterative Research | Begin iterative research process |
ITERATIVE_RESEARCH_TASK_FAILED | Iterative Research | An iterative research task failed |
ITERATIVE_RESEARCH_PIPELINE_ABORTED | Iterative Research | Iterative research process aborted |
START_AUTHOR | Report Synthesis | Begin drafting report |
END_AUTHOR | Report Synthesis | Report drafting complete |
FAIL_AUTHOR | Report Synthesis | Report drafting failed |
START_AUDIO | Audio Generation | Begin generating audio narration |
END_AUDIO | Audio Generation | Audio generation complete |
FAIL_AUDIO | Audio Generation | Audio generation failed |
Using DeepResearchStep enums ensures type safety and allows clients to respond precisely to each stage in the pipeline. For example, you can check the payload.step_key with a condition like if payload.step_key == DeepResearchStep.START_FOLLOW_UP:.
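The step names in the table follow a naming pattern (START_, END_, FAIL_, plus two iterative research names ending in _FAILED and _ABORTED) that a client can use to group steps without listing all of them. The following sketch is one way to do that. It works on the member names as strings, for example obtained through the standard name attribute of a Python enum member, and the function step_outcome is a hypothetical helper rather than an SDK function.

```python
def step_outcome(step_name: str) -> str:
    """Classify a DeepResearchStep member name as started, finished, or failed."""
    if step_name.startswith("START_"):
        return "started"
    if step_name.startswith("FAIL_") or step_name.endswith(("_FAILED", "_ABORTED")):
        return "failed"
    if step_name.startswith("END_"):
        return "finished"
    return "unknown"  # a name that does not follow the pattern in the table


for name in (
    "START_FOLLOW_UP",
    "END_QUERY_REWRITER_NO_FEEDBACK",
    "ITERATIVE_RESEARCH_TASK_FAILED",
    "FAIL_AUDIO",
):
    print(f"{name}: {step_outcome(name)}")
```

A user interface could use this grouping to show a spinner for started steps and an error banner for failed ones.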
DeepResearchIRProgressPayload¶
Status
This payload corresponds to IR_PROGRESS.
Description
Provides progress updates during iterative research, showing how many tasks have been completed out of the total.
Example Payload
{
  "type": "ir_progress", // Discriminator for this payload type
  "processed_tasks": 3, // Number of completed Iterative Research tasks so far
  "total_task": 10 // Total number of Iterative Research tasks planned
}
Fields
Field | Type | Description |
---|---|---|
processed_tasks | int | Number of iterative research tasks completed |
total_task | int | Total number of planned iterative research tasks |
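A progress payload is usually shown as a percentage. The sketch below shows one way to compute it from a payload that has already been read as a dictionary. The total appears as total_task in the field table here and as total_tasks in the earlier streaming example, so the sketch accepts either spelling rather than assume which one is emitted; the helper progress_percent is hypothetical.

```python
def progress_percent(payload: dict) -> float:
    """Return completion as a percentage, or 0.0 when the total is missing or zero."""
    # Assumption: accept either spelling of the total field.
    total = payload.get("total_task", payload.get("total_tasks", 0))
    if not total:
        return 0.0  # avoids division by zero before any task has been planned
    return round(100.0 * payload["processed_tasks"] / total, 1)


payload = {"type": "ir_progress", "processed_tasks": 3, "total_task": 10}
print(f"{progress_percent(payload):.1f}% of iterative research tasks done")
```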
DeepResearchResearchQuestionsPayload¶
Status
This payload corresponds to RESEARCH_QUESTIONS.
Description
Generated during the planning stage, containing the research questions that guide later steps in the pipeline.
Example Payload
{
  "type": "research_questions", // Discriminator for this payload type
  "questions": [ // List of generated research questions
    "What are the latest advancements in renewable energy storage?",
    "How does grid stability change with high solar penetration?"
  ]
}
Fields
Field | Type | Description |
---|---|---|
questions | list[str] | List of generated research questions |
DeepResearchThoughtStatusPayload¶
Status
This payload corresponds to THOUGHT_STATUS.
Description
Provides updates on reasoning steps for a specific research question while iterative research is in progress.
Example Payload
{
  "type": "thought_status", // Discriminator for this payload type
  "question_id": 2, // The research question this thought belongs to
  "thought": "Analyzing the economic impact of subsidies..." // Brief reasoning summary
}
Fields
Field | Type | Description |
---|---|---|
question_id | int | ID of the related research question |
thought | str | Human-readable summary of reasoning data |
DeepResearchReferencePayload¶
Status
This payload corresponds to REFERENCE.
Description
Streams references discovered during research, linked to the relevant research question.
Example Payload
{
  "type": "reference", // Discriminator for this payload type
  "question_id": 1, // The research question these references support
  "references": { // Map of source URL -> short description/title
    "https://example.com/study1": "Study on renewable energy storage",
    "https://example.com/report2": "Government policy report"
  }
}
Fields
Field | Type | Description |
---|---|---|
question_id | int | ID of the related research question. |
references | dict[str, str] | Mapping of URL → description. |
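Since references are streamed while research is in progress, a client that wants one list of sources per research question has to merge several messages. The sketch below shows one way to do that. It assumes each message is a dictionary whose status is the plain string "reference" and whose content holds the payload; the function collect_references is a hypothetical helper.

```python
from collections import defaultdict


def collect_references(messages: list[dict]) -> dict[int, dict[str, str]]:
    """Merge all reference payloads into {question_id: {url: description}}."""
    by_question: dict[int, dict[str, str]] = defaultdict(dict)
    for message in messages:
        if message.get("status") != "reference":
            continue  # skip progress, thought, and other message types
        payload = message["content"]
        # dict.update de-duplicates URLs; a later description replaces an earlier one.
        by_question[payload["question_id"]].update(payload["references"])
    return dict(by_question)


stream = [
    {"status": "reference", "content": {
        "type": "reference", "question_id": 1,
        "references": {"https://example.com/study1": "Study on renewable energy storage"},
    }},
    {"status": "ir_progress", "content": {"type": "ir_progress", "processed_tasks": 1}},
    {"status": "reference", "content": {
        "type": "reference", "question_id": 1,
        "references": {
            "https://example.com/study1": "Study on renewable energy storage",
            "https://example.com/report2": "Government policy report",
        },
    }},
]

collected = collect_references(stream)
print(len(collected[1]))  # 2 unique URLs for question 1
```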
DeepResearchSummaryStatisticsPayload¶
Status
This payload corresponds to SUMMARY_STATISTICS.
Description
Summarizes overall runtime and resource usage after the entire DeepResearchAgent run.
Example Payload
{
  "type": "summary_statistics", // Discriminator for this payload type
  "used_time": 12.5, // Total runtime (in minutes)
  "website_num": 42 // Number of unique websites visited
}
Fields
Field | Type | Description |
---|---|---|
used_time | float | Total runtime (in minutes) |
website_num | int | Number of unique websites visited |
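Because used_time is a fractional number of minutes, it reads better once converted to minutes and seconds. The following sketch formats the example payload for display; the function format_summary and its output wording are hypothetical.

```python
def format_summary(payload: dict) -> str:
    """Render a summary_statistics payload as a one-line, human-readable string."""
    # used_time is given in minutes, so convert to whole seconds first.
    total_seconds = round(payload["used_time"] * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds:02d} s, {payload['website_num']} websites visited"


payload = {"type": "summary_statistics", "used_time": 12.5, "website_num": 42}
print(format_summary(payload))  # prints "12 min 30 s, 42 websites visited"
```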
Unified Payload Type¶
All payloads are wrapped in a discriminated union under DeepResearchPayloadType. This guarantees type-safe parsing: validators pick the correct model automatically from the type field, keeping client handling simple and reliable.
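from typing import Annotated, Union

from pydantic import Field

# Discriminated union: the "type" field of the data selects the payload model.
DeepResearchPayloadType = Annotated[
    Union[
        DeepResearchPipelineStepPayload,
        DeepResearchIRProgressPayload,
        DeepResearchResearchQuestionsPayload,
        DeepResearchThoughtStatusPayload,
        DeepResearchReferencePayload,
        DeepResearchSummaryStatisticsPayload,
    ],
    Field(discriminator="type"),
]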
This means:
- Every payload has a type field (e.g., "pipeline_step", "reference").
- The type value determines which schema should be applied.
Example¶
import json
from pydantic import TypeAdapter, ValidationError

try:
    status = response["status"]
    raw_content = response["content"]  # the raw JSON payload
    # Create a TypeAdapter that knows about all payload schemas
    payload_adapter = TypeAdapter(DeepResearchPayloadType)
    # Validate the payload
    payload = payload_adapter.validate_python(json.loads(raw_content))
    if isinstance(payload, DeepResearchReferencePayload):
        print("Received DeepResearchReferencePayload")
    elif isinstance(payload, DeepResearchPipelineStepPayload):
        print("Received DeepResearchPipelineStepPayload")
except ValidationError:
    # Schema is wrong (unknown type / missing fields)
    print("Invalid payload received:", raw_content[:100], "...")
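To make the role of the discriminator concrete without depending on the SDK or pydantic, the sketch below mimics the same lookup using only the standard library. It is not how the SDK validates payloads: ReferenceSketch and SummarySketch are hypothetical stand-ins for two of the payload models, and parse_payload imitates only the step where the type value selects the model.

```python
from dataclasses import dataclass


@dataclass
class ReferenceSketch:
    """Stand-in for DeepResearchReferencePayload (fields from its table)."""
    question_id: int
    references: dict


@dataclass
class SummarySketch:
    """Stand-in for DeepResearchSummaryStatisticsPayload (fields from its table)."""
    used_time: float
    website_num: int


# Discriminator value -> stand-in model.
MODELS = {"reference": ReferenceSketch, "summary_statistics": SummarySketch}


def parse_payload(raw: dict):
    """Pick the model named by raw["type"] and build it from the remaining fields."""
    fields = dict(raw)  # copy so the caller's dictionary is not modified
    kind = fields.pop("type", None)
    model = MODELS.get(kind)
    if model is None:
        raise ValueError(f"unknown payload type: {kind!r}")
    try:
        return model(**fields)
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"invalid {kind} payload: {exc}") from exc


parsed = parse_payload(
    {"type": "summary_statistics", "used_time": 12.5, "website_num": 42}
)
print(type(parsed).__name__)  # prints "SummarySketch"
```

Unlike a real validator, this sketch does not check field types; it only rejects unknown discriminators and missing or extra fields.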