Utilize the Image Generation Agent for Soccer Ball Concept Design
Overview
The Image Generation Agent is a utility agent designed to generate an image based on user queries. Users can provide either:
- a textual description of the image they want to generate, or
- an image to use as a reference, along with a textual description of the desired image.
The former is referred to as text-to-image, and the latter as text-guided image-to-image. In this tutorial, we show how to leverage the agent to create a concept design.
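The two modes differ only in whether a reference image accompanies the description. As a minimal sketch, the helper below (a hypothetical `build_query_kwargs`, not part of the SDK) assembles the arguments for a request; the `query` and `image` keyword names mirror the call used in the Python script in this tutorial.

```python
from typing import Optional


def build_query_kwargs(description: str, image_b64: Optional[str] = None) -> dict:
    """Assemble keyword arguments for a query call (hypothetical helper).

    Without an image the request is text-to-image; with one it is
    text-guided image-to-image.
    """
    if not description.strip():
        raise ValueError("A textual description is required in both modes.")
    kwargs = {"query": description}
    if image_b64 is not None:
        kwargs["image"] = image_b64
    return kwargs


def generation_mode(kwargs: dict) -> str:
    """Name the mode implied by the presence of a reference image."""
    return "image-to-image" if "image" in kwargs else "text-to-image"


print(generation_mode(build_query_kwargs("A soccer ball concept design")))
```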
Steps
1. Configuration
To utilize the Image Generation Agent, define its configuration in a YAML file. For simplicity, consider an agentic framework that includes only the Image Generation Agent. The configuration is as follows:
utility_agents:
  - agent_class: ImageGenerationAgent
    agent_name: "Image Generation Agent"
    agent_description: "This agent can help you generate an image from a prompt."
    config:
      text2image_config:
        model: flux_schnell/text2image # The name of the model for text-to-image generation
      image2image_config:
        model: flux_schnell/image2image # The name of the model for text-guided image-to-image generation
      rewriter_config: False # Use prompt rewriter for image-to-image generation

orchestrator:
  agent_list:
    - agent_name: "Image Generation Agent" # The name you chose for your ImageGenerationAgent above.
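Every name in the orchestrator's agent_list should correspond to an agent defined under utility_agents. The following sketch checks this on an already-parsed configuration. The function `check_agent_names` is a hypothetical helper, and the dictionary is assumed to be what a YAML parser would produce from the file above.

```python
def check_agent_names(config: dict) -> list:
    """Return orchestrator agent names that have no matching utility agent."""
    defined = {agent["agent_name"] for agent in config.get("utility_agents", [])}
    listed = [
        entry["agent_name"]
        for entry in config.get("orchestrator", {}).get("agent_list", [])
    ]
    return [name for name in listed if name not in defined]


# Assumed parsed form of the YAML configuration
example_config = {
    "utility_agents": [
        {
            "agent_class": "ImageGenerationAgent",
            "agent_name": "Image Generation Agent",
            "config": {
                "text2image_config": {"model": "flux_schnell/text2image"},
                "image2image_config": {"model": "flux_schnell/image2image"},
                "rewriter_config": False,
            },
        }
    ],
    "orchestrator": {"agent_list": [{"agent_name": "Image Generation Agent"}]},
}

undefined = check_agent_names(example_config)
print(undefined if undefined else "All orchestrator agents are defined.")
```

A non-empty result lists names that the orchestrator refers to but that no utility agent declares.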
The rewriter_config option enables automatic enhancement of your input query for image-to-image generation. It refines the prompt, making it more descriptive based on the provided image, which can lead to better generation results. The feature is intended to help developers obtain more detailed and accurate prompts for image-to-image generation.
In this tutorial, we test the agent with and without rewriter_config enabled and compare the results.
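To compare both settings without editing the file by hand, one option is to derive two configuration variants from a single base. This is only a sketch: `with_rewriter` is a hypothetical helper operating on the parsed configuration, and writing each variant back to a YAML file is left out.

```python
import copy


def with_rewriter(
    config: dict, enabled: bool, agent_name: str = "Image Generation Agent"
) -> dict:
    """Return a deep copy of config with rewriter_config set for one agent."""
    updated = copy.deepcopy(config)
    for agent in updated.get("utility_agents", []):
        if agent.get("agent_name") == agent_name:
            agent.setdefault("config", {})["rewriter_config"] = enabled
            return updated
    raise KeyError(f"No utility agent named {agent_name!r}")


# Assumed parsed form of the relevant part of the configuration
base_config = {
    "utility_agents": [
        {
            "agent_class": "ImageGenerationAgent",
            "agent_name": "Image Generation Agent",
            "config": {"rewriter_config": False},
        }
    ]
}

variants = {flag: with_rewriter(base_config, flag) for flag in (False, True)}
for flag, variant in variants.items():
    print(flag, variant["utility_agents"][0]["config"]["rewriter_config"])
```

The base dictionary is never modified, so both runs start from the same settings apart from the rewriter flag.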
2. Python file
The following Python script asks the framework to generate a Wikipedia-themed soccer ball concept design, using the Wikipedia logo as the reference image (URL: https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2@2x.png):
import os
import asyncio

from air import login, DistillerClient
from air import utils

# authenticate with credentials read from the environment
auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)


async def image_generation():
    # create a distiller client
    distiller_client = DistillerClient()

    # upload your config file to register a new distiller project
    distiller_client.create_project(config_path="example.yaml", project="example")

    async with distiller_client(
        project="example",
        uuid="test_user",
    ) as dc:
        # For text-to-image, remove the image parameter; for image-to-image,
        # pass in your image as a base64 string or a URL
        responses = await dc.query(
            query="Generate an image of a wikipedia soccer ball concept design",
            image=utils.image_to_base64(
                "https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2@2x.png"
            ),
        )
        async for response in responses:
            # save the image returned by the Image Generation Agent;
            # print every other response
            if (response["role"] == "Image Generation Agent") and (response["image"]):
                generated_base64_image = response["image"]["image_data"]
                utils.save_base64_image(
                    generated_base64_image,
                    "<CHANGE_THIS_TO_THE_FILENAME>",
                )
            else:
                print(response)


if __name__ == "__main__":
    print("Image Generation")
    asyncio.run(image_generation())

Replace <CHANGE_THIS_TO_THE_FILENAME> with the local path and filename where you want to save the generated image.
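The response handling in the script can also be written with the standard library alone, which is convenient for testing it offline. The sketch below makes two assumptions: responses are dictionaries shaped as in the script (a "role" key and an "image" key holding "image_data"), and "image_data" is a plain base64 string. Both `extract_image_data` and `save_image` are hypothetical helpers, not SDK functions.

```python
import base64
from pathlib import Path
from typing import Optional


def extract_image_data(
    response: dict, agent_name: str = "Image Generation Agent"
) -> Optional[str]:
    """Return the base64 image payload, or None if the response has none."""
    if response.get("role") != agent_name:
        return None
    image = response.get("image") or {}
    return image.get("image_data")


def save_image(image_b64: str, path: str) -> Path:
    """Decode a base64 string (assumed plain, no data-URI prefix) to a file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(base64.b64decode(image_b64))
    return target
```

Inside the response loop, a value other than None from `extract_image_data` can be passed to `save_image` together with the output path of your choice.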
3. Output
The generated output without rewriter_config is as follows:
To enable the rewriter, set rewriter_config: True in the configuration YAML file and rerun the Python script. The generated output with rewriter_config enabled is as follows:
Toggling rewriter_config therefore gives you finer control over prompt quality: you can send the user's prompt as written or let the agent enrich it based on the reference image.