Image Generation Agent

This documentation provides an overview of the ImageGenerationAgent class, its configuration, and example usage.

The ImageGenerationAgent class is a utility agent within the AI Refinery SDK, designed to generate images based on user queries. Users can provide either:

  1. a textual description of the image they want to generate, or
  2. an image to use as a reference, along with a textual description of the desired image.

The former is referred to as text-to-image, and the latter as image-to-image. Example use cases include:

  • "An inspiring image that evokes adventure and dreams, perfect for career motivation" (text-to-image), and
  • "Generate an image of a Wikipedia soccer ball concept design" (provided with an image of the Wikipedia logo, for image-to-image).

Usage

ImageGenerationAgent is a built-in utility agent in the AI Refinery SDK, so you can integrate it into your project by updating your project YAML file with the following configurations:

  • Add a utility agent with agent_class: ImageGenerationAgent under utility_agents.
  • Ensure the agent_name you chose for your ImageGenerationAgent is listed in the agent_list under orchestrator.
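
The two requirements above can be checked mechanically before a project is created. The following is a minimal sketch, not part of the AI Refinery SDK: `missing_from_orchestrator` is a hypothetical helper, and it assumes the project YAML file has already been parsed into a plain Python dictionary with the `utility_agents` and `orchestrator` keys described above.

```python
def missing_from_orchestrator(project: dict) -> list:
    """Return utility agent names that the orchestrator's agent_list omits.

    Hypothetical helper for illustration; it only inspects a parsed config dict.
    """
    listed = {
        entry["agent_name"]
        for entry in project.get("orchestrator", {}).get("agent_list", [])
    }
    return [
        agent["agent_name"]
        for agent in project.get("utility_agents", [])
        if agent["agent_name"] not in listed
    ]


# Dictionary mirroring the structure of a parsed project YAML file.
project = {
    "utility_agents": [
        {
            "agent_class": "ImageGenerationAgent",
            "agent_name": "Image Generation Agent",
        },
    ],
    "orchestrator": {
        "agent_list": [{"agent_name": "Image Generation Agent"}],
    },
}

# An empty list means every utility agent is listed under the orchestrator.
print(missing_from_orchestrator(project))
```

A non-empty result names the agents that still need an entry in agent_list.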

For a tutorial of this agent, visit this link.

Quickstart

To quickly set up a project with an ImageGenerationAgent, use the following YAML configuration. You can add more agents and retrievers as needed. Refer to the next section for a detailed overview of configurable options for ImageGenerationAgent.

utility_agents:
  - agent_class: ImageGenerationAgent
    agent_name: "Image Generation Agent"
    agent_description: "This agent can help you generate an image from a prompt."
    config:
      text2image_config:
        model: flux_schnell/text2image # The name of the model for text-to-image generation
      image2image_config:
        model: flux_schnell/image2image # The name of the model for text-guided image-to-image generation
      rewriter_config: True # Use prompt rewriter for image-to-image generation

orchestrator:
  agent_list:
    - agent_name: "Image Generation Agent" # The name you chose for your ImageGenerationAgent above.

The rewriter_config option enables automatic enhancement of your input query for image-to-image generation. When enabled, the rewriter refines the prompt to make it more descriptive based on the provided image, which can improve the generated results.

Template YAML Configuration of ImageGenerationAgent

In addition to the configurations mentioned for the example above, the ImageGenerationAgent supports several other configurable options. See the template YAML configuration below for all available settings.

agent_class: ImageGenerationAgent
agent_name: <name of the agent> # A name that you choose for your ImageGenerationAgent
agent_description: <description of the agent> # Optional
config:
  # Optional configurations for ImageGenerationAgent
  output_style: <"markdown" or "conversational" or "html">  # Optional field
  contexts:  # Optional field
  - "date"
  - "chat_history"
  - "chat_summary"
  text2image_config:
    model: <model_name_for_text2img>
  image2image_config:
    model: <model_name_for_img2img>
  rewriter_config: <True or False>
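
As a rough illustration of how the template's constraints could be checked before use, the sketch below validates a parsed `config` section. It is not part of the AI Refinery SDK: `check_image_agent_config` and `ALLOWED_OUTPUT_STYLES` are hypothetical names, the allowed output styles are taken only from the template above, and every field is treated as optional, which is an assumption based on the template's comments.

```python
# Output styles as listed in the template; assumed to be the complete set.
ALLOWED_OUTPUT_STYLES = {"markdown", "conversational", "html"}


def check_image_agent_config(config: dict) -> list:
    """Return human-readable problems found in a parsed config section.

    Hypothetical helper for illustration; an empty list means no problem found.
    """
    problems = []

    style = config.get("output_style")
    if style is not None and style not in ALLOWED_OUTPUT_STYLES:
        problems.append(f"output_style {style!r} is not a supported style")

    # Each model section, when present, is expected to name a model.
    for key in ("text2image_config", "image2image_config"):
        section = config.get(key)
        if section is not None and "model" not in section:
            problems.append(f"{key} has no 'model' entry")

    rewriter = config.get("rewriter_config")
    if rewriter is not None and not isinstance(rewriter, bool):
        problems.append("rewriter_config should be True or False")

    return problems


# Values taken from the Quickstart configuration.
quickstart_config = {
    "text2image_config": {"model": "flux_schnell/text2image"},
    "image2image_config": {"model": "flux_schnell/image2image"},
    "rewriter_config": True,
}

for problem in check_image_agent_config(quickstart_config):
    print(problem)
```

Run against the Quickstart values, the check reports nothing; a misspelled output style or a quoted "True" string would each produce one message.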