Image Generation Agent
This documentation provides an overview of the ImageGenerationAgent class, its configuration, and example usage.
The ImageGenerationAgent class is a utility agent within the AI Refinery SDK, designed to generate images based on user queries. Users can provide either:
- a textual description of the image they want to generate, or
- an image to use as a reference, along with a textual description of the desired image.
The former is referred to as text-to-image, and the latter as image-to-image. Example use cases include:
- "An inspiring image that evokes adventure and dreams, perfect for career motivation" (text-to-image), and
- "Generate an image of a Wikipedia soccer ball concept design" (provided with an image of the Wikipedia logo, for image-to-image).
Usage
ImageGenerationAgent is a built-in utility agent in the AI Refinery SDK. To integrate it into your project, update your project YAML file with the following configurations:
- Add a utility agent with agent_class: ImageGenerationAgent under utility_agents.
- Ensure the agent_name you chose for your ImageGenerationAgent is listed in the agent_list under orchestrator.
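The two requirements above can be checked mechanically before a project is created. The following is a minimal sketch, not part of the AI Refinery SDK: find_unregistered_agents is a hypothetical helper, and it assumes the project YAML file has already been loaded into a plain Python dict by a YAML parser of your choice.

```python
def find_unregistered_agents(project_config: dict) -> list[str]:
    """Return names of ImageGenerationAgent entries missing from orchestrator.agent_list.

    Hypothetical helper for illustration only; it is not an SDK function.
    """
    utility_agents = project_config.get("utility_agents") or []
    orchestrator = project_config.get("orchestrator") or {}
    # Collect every agent_name the orchestrator knows about.
    listed = {
        entry.get("agent_name")
        for entry in orchestrator.get("agent_list") or []
    }
    return [
        agent.get("agent_name")
        for agent in utility_agents
        if agent.get("agent_class") == "ImageGenerationAgent"
        and agent.get("agent_name") not in listed
    ]


# This dict mirrors what a YAML loader would return for a small project file.
project_config = {
    "utility_agents": [
        {
            "agent_class": "ImageGenerationAgent",
            "agent_name": "Image Generation Agent",
        }
    ],
    "orchestrator": {
        "agent_list": [{"agent_name": "Image Generation Agent"}],
    },
}

missing = find_unregistered_agents(project_config)
print(missing or "All ImageGenerationAgent entries are listed in agent_list.")
```

An empty result means every ImageGenerationAgent entry is reachable by the orchestrator. A non-empty result lists the agent names that still need to be added to agent_list.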
For a tutorial of this agent, visit this link.
Quickstart
To quickly set up a project with an ImageGenerationAgent, use the following YAML configuration. You can add more agents and retrievers as needed. Refer to the next section for a detailed overview of the configurable options for ImageGenerationAgent.
utility_agents:
  - agent_class: ImageGenerationAgent
    agent_name: "Image Generation Agent"
    agent_description: "This agent can help you generate an image from a prompt."
    config:
      text2image_config:
        model: flux_schnell/text2image # The name of the model for text-to-image generation
      image2image_config:
        model: flux_schnell/image2image # The name of the model for text-guided image-to-image generation
      rewriter_config: True # Use prompt rewriter for image-to-image generation
orchestrator:
  agent_list:
    - agent_name: "Image Generation Agent" # The name you chose for your ImageGenerationAgent above.
The rewriter_config option enables automatic enhancement of your input query for image-to-image generation. It refines the prompt, making it more descriptive based on the provided image, which can lead to improved image generation results. This feature is designed to help developers create more detailed and accurate prompts for image-to-image generation.
Template YAML Configuration of ImageGenerationAgent
In addition to the configurations mentioned for the example above, the ImageGenerationAgent supports several other configurable options. See the template YAML configuration below for all available settings.
agent_class: ImageGenerationAgent
agent_name: <name of the agent> # A name that you choose for your ImageGenerationAgent
agent_description: <description of the agent> # Optional
config:
  # Optional configurations for ImageGenerationAgent
  output_style: <"markdown" or "conversational" or "html"> # Optional field
  contexts: # Optional field
    - "date"
    - "chat_history"
    - "chat_summary"
  text2image_config:
    model: <model_name_for_text2img>
  image2image_config:
    model: <model_name_for_img2img>
  rewriter_config: <True or False>
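A filled-in template can be sanity-checked against the fields shown above before it is used. The sketch below is illustrative only: check_agent_entry and ALLOWED_OUTPUT_STYLES are hypothetical names, not SDK features. It assumes that agent_name is required (the template marks only agent_description as optional), that the three output_style values in the template are the only accepted ones, and that any text2image_config or image2image_config section that is present must name a model.

```python
# Output styles as listed in the template; assumed here to be exhaustive.
ALLOWED_OUTPUT_STYLES = {"markdown", "conversational", "html"}


def check_agent_entry(agent: dict) -> list[str]:
    """Return a list of problems found in one utility agent entry.

    Hypothetical helper for illustration only; it is not an SDK function.
    """
    problems = []
    if agent.get("agent_class") != "ImageGenerationAgent":
        problems.append("agent_class must be ImageGenerationAgent")
    if not agent.get("agent_name"):
        problems.append("agent_name is required")

    config = agent.get("config") or {}
    style = config.get("output_style")
    if style is not None and style not in ALLOWED_OUTPUT_STYLES:
        problems.append(f"unsupported output_style: {style!r}")

    # Each model section is only checked when it is present.
    for section in ("text2image_config", "image2image_config"):
        if section in config and not (config[section] or {}).get("model"):
            problems.append(f"{section} is missing a model name")
    return problems


agent_entry = {
    "agent_class": "ImageGenerationAgent",
    "agent_name": "Image Generation Agent",
    "config": {
        "output_style": "markdown",
        "text2image_config": {"model": "flux_schnell/text2image"},
        "image2image_config": {"model": "flux_schnell/image2image"},
    },
}

for problem in check_agent_entry(agent_entry):
    print(problem)
```

The function returns an empty list when nothing looks wrong, so the loop prints nothing for the entry above. The SDK may accept values beyond those this sketch allows, so treat a reported problem as a prompt to re-read the template rather than as a definitive error.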