Utilize the Image Generation Agent for Soccer Ball Concept Design

Overview

The Image Generation Agent is a utility agent designed to generate an image based on user queries. Users can provide either:

  1. a textual description of the image they want to generate, or
  2. an image to use as a reference, along with a textual description of the desired image.

The former is referred to as text-to-image, and the latter as text-guided image-to-image. In this tutorial, we show how to leverage the agent to create a concept design.
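
The difference between the two modes comes down to whether a reference image accompanies the prompt. The minimal sketch below shows that choice; `build_query_kwargs` is a hypothetical helper (not part of the SDK), and the `query` and `image` keyword names are taken from the script in step 2 of this tutorial.

```python
from typing import Optional


def build_query_kwargs(prompt: str, image: Optional[str] = None) -> dict:
    """Build keyword arguments for a query (hypothetical helper).

    With no image, the request is text-to-image. With an image
    (base64 string or URL), it is text-guided image-to-image.
    """
    kwargs = {"query": prompt}
    if image is not None:
        kwargs["image"] = image
    return kwargs


# Text-to-image: only a prompt.
text_only = build_query_kwargs("Generate an image of a soccer ball concept design")

# Text-guided image-to-image: a prompt plus a reference image.
with_reference = build_query_kwargs(
    "Generate an image of a wikipedia soccer ball concept design",
    image="https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2@2x.png",
)
```

Under these assumptions, the resulting dictionary could be unpacked into the query call shown in step 2, for example `await dc.query(**with_reference)`.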

Steps

1. Configuration

To use the Image Generation Agent, define its configuration in a YAML file. For simplicity, this example uses an agentic framework that includes only the Image Generation Agent. The configuration is as follows:

utility_agents:
  - agent_class: ImageGenerationAgent
    agent_name: "Image Generation Agent"
    agent_description: "This agent can help you generate an image from a prompt."
    config:
      text2image_config:
        model: flux_schnell/text2image # The name of the model for text-to-image generation
      image2image_config:
        model: flux_schnell/image2image # The name of the model for text-guided image-to-image generation
      rewriter_config: False # Set to True to enable the prompt rewriter for image-to-image generation

orchestrator:
  agent_list:
    - agent_name: "Image Generation Agent" # The name you chose for your ImageGenerationAgent above.

The rewriter_config option enables automatic enhancement of your input query for image-to-image generation. When it is enabled, the agent refines the prompt, making it more descriptive based on the provided image, which can lead to improved image generation results. The feature is intended to help developers produce more detailed and accurate prompts for image-to-image generation.

In this tutorial, we will test the agent with and without rewriter_config enabled and compare the results.
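
Because the orchestrator refers to agents by name, a mismatch between agent_list and the agent_name values under utility_agents is an easy mistake to make. The sketch below checks for it on a plain Python dictionary that mirrors the YAML above. It assumes that every agent the orchestrator lists must be defined under utility_agents, which holds for this configuration but may not hold for other agent types; `undefined_agents` is a hypothetical helper, not part of the SDK.

```python
def undefined_agents(config: dict) -> list:
    """Return orchestrator agent names that have no utility_agents entry."""
    defined = {agent["agent_name"] for agent in config.get("utility_agents", [])}
    listed = [
        entry["agent_name"]
        for entry in config.get("orchestrator", {}).get("agent_list", [])
    ]
    return [name for name in listed if name not in defined]


# Dictionary form of the configuration shown above.
config = {
    "utility_agents": [
        {
            "agent_class": "ImageGenerationAgent",
            "agent_name": "Image Generation Agent",
            "agent_description": "This agent can help you generate an image from a prompt.",
            "config": {
                "text2image_config": {"model": "flux_schnell/text2image"},
                "image2image_config": {"model": "flux_schnell/image2image"},
                "rewriter_config": False,
            },
        }
    ],
    "orchestrator": {"agent_list": [{"agent_name": "Image Generation Agent"}]},
}

missing = undefined_agents(config)
if missing:
    print("Agents listed but not defined:", missing)
```

If the YAML file is loaded into a dictionary with a YAML parser of your choice, the same function can be applied to the loaded result.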

2. Python file

Next, ask the framework to generate an image of a Wikipedia soccer ball concept design, using the Wikipedia logo at this URL as the reference image: https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2@2x.png. The Python script containing the request and the image is as follows:

import os
import asyncio

from air import login, DistillerClient
from air import utils


auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

async def image_generation():
    # create a distiller client
    distiller_client = DistillerClient()

    # upload your config file to register a new distiller project
    distiller_client.create_project(config_path="example.yaml", project="example")

    async with distiller_client(
        project="example",
        uuid="test_user",
    ) as dc:
        # For text-to-image, omit the image argument; for image-to-image, pass the image as a base64 string or a URL
        responses = await dc.query(
            query="Generate an image of a wikipedia soccer ball concept design",
            image=utils.image_to_base64(
                "https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2@2x.png"
            ),
        )

        async for response in responses:
            # Save the image returned by the Image Generation Agent; print every other message
            if (response["role"] == "Image Generation Agent") and (response["image"]):
                generated_base64_image = response["image"]["image_data"]
                utils.save_base64_image(
                    generated_base64_image,
                    "<CHANGE_THIS_TO_THE_FILENAME>",
                )
            else:
                print(response)


if __name__ == "__main__":
    print("Image Generation")
    asyncio.run(image_generation())

Replace <CHANGE_THIS_TO_THE_FILENAME> with the local path and filename where you want to save the generated image.
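
The response loop in the script does two things: it picks out the message that carries the image, and it writes the base64 payload to disk. The sketch below isolates both steps using only the standard library so they can be tried without a running project. It is a stand-in, not the implementation of utils.save_base64_image; it assumes image_data is a plain base64 string with no data-URI prefix, and `extract_image_data` and `save_image` are hypothetical names.

```python
import base64
from pathlib import Path
from typing import Optional


def extract_image_data(
    response: dict, agent_name: str = "Image Generation Agent"
) -> Optional[str]:
    """Return the base64 image payload if this response carries one, else None."""
    if response.get("role") != agent_name:
        return None
    image = response.get("image") or {}
    return image.get("image_data")


def save_image(image_data: str, path: str) -> int:
    """Decode a base64 string, write the bytes to path, return the byte count."""
    raw = base64.b64decode(image_data)
    Path(path).write_bytes(raw)
    return len(raw)


# A fake response with the same keys the script reads (payload is dummy bytes).
fake_response = {
    "role": "Image Generation Agent",
    "image": {"image_data": base64.b64encode(b"not a real image").decode("ascii")},
}
payload = extract_image_data(fake_response)
```

Using `.get` rather than direct indexing means a message without an image key is skipped instead of raising a KeyError; whether such messages occur in practice depends on the framework.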

3. Output

The generated output without rewriter_config is as follows:

[Image: wiki soccer ball without rewriter]

To enable the rewriter_config, set rewriter_config: True in the configuration YAML file and rerun the Python script. The generated output with rewriter_config enabled is as follows:

[Image: wiki soccer ball with rewriter]

Toggling rewriter_config therefore gives you control over whether the agent uses the prompt exactly as the user wrote it or a more descriptive, automatically rewritten version.
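
To make the comparison repeatable, both configuration variants can be derived from one base text instead of editing the file by hand between runs. The sketch below does this with a regular expression from the standard library. `with_rewriter` and the two output filenames are hypothetical, and the approach assumes the configuration contains exactly one rewriter_config line written as True or False.

```python
import re

BASE_CONFIG = """\
utility_agents:
  - agent_class: ImageGenerationAgent
    agent_name: "Image Generation Agent"
    agent_description: "This agent can help you generate an image from a prompt."
    config:
      text2image_config:
        model: flux_schnell/text2image
      image2image_config:
        model: flux_schnell/image2image
      rewriter_config: False

orchestrator:
  agent_list:
    - agent_name: "Image Generation Agent"
"""

# Matches the flag line while keeping its indentation and any trailing comment.
_FLAG = re.compile(r"^([ \t]*rewriter_config:[ \t]*)(True|False)\b", re.MULTILINE)


def with_rewriter(config_text: str, enabled: bool) -> str:
    """Return config_text with the rewriter_config flag set to enabled."""
    new_text, count = _FLAG.subn(lambda m: m.group(1) + str(enabled), config_text)
    if count != 1:
        raise ValueError(f"expected exactly one rewriter_config line, found {count}")
    return new_text


# Hypothetical filenames, one per run of the comparison.
variants = {
    "example_no_rewriter.yaml": with_rewriter(BASE_CONFIG, False),
    "example_rewriter.yaml": with_rewriter(BASE_CONFIG, True),
}
```

Each variant can then be written to disk and passed as config_path when creating the project, with a different output filename per run so the two generated images can be compared side by side.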