Integrating Prompt Compression and Reranking in AIR

This tutorial demonstrates how to use the prompt compression and reranking capabilities within the AIR framework. It covers two key features: automatic document reranking (pre-configured) and configurable prompt compression.


Introduction

In complex AI systems, efficiently retrieving and processing information is crucial. This tutorial introduces two complementary capabilities:

  • Document Reranking: Automatically improves the relevance of retrieved documents by reordering them based on their pertinence to the query. The number of documents returned is pre-configured, but the reranker model can be selected.

  • Prompt Compression: Reduces the size of input prompts without losing essential information, enabling faster and more cost-effective processing. This feature is fully configurable, allowing fine-tuning of compression rates based on specific needs.

This tutorial showcases how to leverage these capabilities within a research agent in AIR, enhancing its ability to answer user queries through intelligent document processing.

Overview of the Flow

The process involves several steps:

  1. User Query Input: The user provides a query.
  2. Information Retrieval: The agent retrieves documents from various sources using the user's query.
  3. Reranking: The reranker API reorders the retrieved documents based on their relevance.
  4. Compression: The prompt compression API reduces the size of the top-ranked documents.
  5. Response Generation: The agent formats the compressed documents into a prompt and generates a comprehensive response with in-line citations and numbered references for source traceability.

Below is a textual representation of the flow:

User Query
  → Information Retrieval (from multiple sources)
  → Retrieved Documents
  → Reranker API
  → Ranked Documents
  → Prompt Compression API
  → Compressed Documents
  → Response Generation
  → Final Answer
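The flow above can be sketched in a few lines of Python. Every function here (`retrieve`, `rerank`, `compress`, `generate`) is an illustrative placeholder, not part of the AIR API; in the real framework the ResearchAgent drives these stages through the configured retrievers, the reranker API, and the compression API:

```python
# Illustrative pipeline sketch: each stage stands in for the corresponding
# AIR service (retrieval, reranker API, compression API, response generation).

def retrieve(query: str) -> list[str]:
    # Placeholder: in AIR, the configured retrievers fetch real documents.
    return [f"doc about {query} #{i}" for i in range(5)]

def rerank(query: str, docs: list[str]) -> list[str]:
    # Placeholder: the reranker API reorders documents by relevance to the query.
    return sorted(docs)  # stand-in for sorting by a relevance score

def compress(docs: list[str], rate: float) -> list[str]:
    # Placeholder: the compression API shrinks each document to ~rate of its size.
    return [d[: max(1, int(len(d) * rate))] for d in docs]

def generate(query: str, docs: list[str]) -> str:
    # Placeholder: the agent formats a prompt from the compressed docs and calls the LLM.
    return f"Answer to {query!r} based on {len(docs)} compressed documents."

def answer(query: str, compression_rate: float = 0.4) -> str:
    docs = retrieve(query)
    ranked = rerank(query, docs)
    compressed = compress(ranked, compression_rate)
    return generate(query, compressed)
```

Calling `answer("gen AI")` walks a query through all four stages in order, mirroring the flow diagram.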

Configuration Overview

The ResearchAgent is configured using a YAML configuration file. While document reranking operates automatically with optimal pre-configured settings, prompt compression can be customized through configuration parameters.

Here is the relevant configuration snippet:

```yaml
base_config:
  reranker_config:
    model: "BAAI/bge-reranker-large" # a reranker from our model catalog

  compression_config:
    model: "microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank" # a compression model from our model catalog

orchestrator:
  agent_list:
    - agent_name: "Research Agent"

utility_agents:
  - agent_class: ResearchAgent
    agent_name: "Research Agent"
    agent_description: "This agent can research the information the user needs on the internet."
    config:
      compression_rate: 0.4
      retriever_config_list:
        - retriever_name: "Internet Search" # A name you choose for your retriever
          retriever_class: WebSearchRetriever # WebSearchRetriever performs web search via Google.
          description: "This data source can collect the latest news / information from the open internet to answer any queries." # Optional. A description of the retriever
```

⚠️ Warning: The `reranker_top_k` parameter has been deprecated and is no longer supported; remove it from your YAML config.
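If a legacy config still sets that parameter, delete the line wherever it appears. The snippet below is a hypothetical before/after (the exact former location of `reranker_top_k` may differ in your file):

```yaml
# Before (deprecated): delete the reranker_top_k line
base_config:
  reranker_config:
    model: "BAAI/bge-reranker-large"
    reranker_top_k: 5 # no longer supported

# After
base_config:
  reranker_config:
    model: "BAAI/bge-reranker-large"
```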

Explanation of Configuration Parameters

  • reranker_config:

    • Purpose: Specifies which reranker model to use for document reranking. Though the number of top documents returned by the reranking API is pre-configured, the choice of reranker model can be customized to suit different use cases.
    • Usage: Select from available reranker models in the model catalog.
  • compression_config:

    • Purpose: Specifies which compression model to use for prompt compression.
    • Usage: Choose from available compression models in the model catalog.
  • compression_rate:

    • Purpose: Defines the proportion to which the retrieved documents should be compressed.
    • Usage: A value between 0 and 1. For example, 0.4 compresses the documents to 40% of their original size.
    • No Compression: Setting this to 1 means no compression will be applied.
  • retriever_config_list:
    • Purpose: Defines the retrievers (data sources) used by the research agent to find relevant information for user queries. Each retriever is configured with a name, a retriever class, and a description of its purpose.
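To make the `compression_rate` arithmetic concrete, here is a small sketch that measures the achieved ratio between an original and a compressed passage. It uses naive whitespace tokenization, which only approximates how a compression model actually counts tokens:

```python
def compression_ratio(original: str, compressed: str) -> float:
    """Approximate the achieved compression as a word-count ratio."""
    return len(compressed.split()) / len(original.split())

# Passages adapted from the sample output later in this tutorial
original = ("The advanced machine learning that powers gen AI enabled "
            "products has been decades in the making.")
compressed = "advanced machine learning gen AI decades making"

ratio = compression_ratio(original, compressed)
print(f"compressed to {ratio:.0%} of the original length")
```

A `compression_rate` of 0.4 asks the compression model to keep roughly 40% of the original tokens; a ratio close to that value indicates the target was met.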

Project Execution

Next, use our DistillerClient API to create a distiller client. This client will interface with the AI Refinery service to run your project. Below is a function that sets up the distiller client. Here's what it does:

  • Instantiates a DistillerClient.
  • Creates a project named example using the configuration specified in the example.yaml file.
  • Runs the project in interactive mode.
```python
import os

from air import DistillerClient
from dotenv import load_dotenv

load_dotenv()  # loads your API_KEY from your local '.env' file
api_key = str(os.getenv("API_KEY"))


def interactive():
    distiller_client = DistillerClient(api_key=api_key)

    # upload your config file to register a new distiller project
    distiller_client.create_project(config_path="example.yaml", project="example")

    distiller_client.interactive(
        project="example",
        uuid="test_user",
    )


if __name__ == "__main__":
    # Run Interactive Mode
    print("\nInteractive Mode")
    interactive()
```
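The script reads `API_KEY` from a local `.env` file via `load_dotenv()`. A minimal `.env` placed next to the script looks like this (the key value is a placeholder):

```shell
# .env file, loaded by python-dotenv; keep it out of version control
API_KEY=your_api_key_here
```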

Sample Output

Let's consider a sample user query and observe how the system processes it.

User Query:

"Research the future of generative AI in Customer Growth"

System Processing:

  1. Information Retrieval:
    • Retrieves documents from sources like industry reports, academic papers, and news articles using the user's query.
  2. Reranking:
    • Reranks the documents to prioritize the most relevant ones concerning the query.
  3. Compression:

    • Compresses the top-ranked documents to include only essential information, reducing the prompt size to 40% of the original.
    • Example:
      • Original Text:
        * Which industries stand to gain the most?
        * What activities will deliver the most value for organizations?
        * How do—and will—workers feel about the technology?
        * What safeguards are needed to ensure responsible use of gen AI?
        
        In this visual _Explainer_, we’ve compiled all the answers we have so far—in 15 charts. We expect this space to evolve rapidly and will continue to roll out our research as that happens. To stay up to date on this topic, register for our email alerts on “artificial intelligence” here.
        
        ## Gen AI finds its legs
        
        The advanced machine learning that powers gen AI–enabled products has been decades in the making. But since ChatGPT came off the starting block in late 2022, new iterations of gen AI technology have been released several times a month. In March 2023 alone, there were six major steps forward, including new customer relationship management solutions and support for the financial services industry.
        
        _Source: What every CEO should know about generative AI_
        
      • Compressed Text:
        industries gain most? activities value for organizations? workers feel technology? safeguards responsible use gen AI? compiled answers in 15 charts expect space to evolve rapidly roll out research. register email alerts on artificial intelligence. Gen AI finds legs advanced machine learning gen AI products decades in making ChatGPT late 2022 new iterations gen AI technology released. March 2023 six major steps forward new customer relationship management solutions support for financial services industry. every CEO know about generative
        
  4. Response Generation:

    • Formats the prompt with the compressed documents.
    • Generates a comprehensive report with numbered references that correspond to the original sources.

Final Answer:

# Future of Generative AI in Customer Growth

The future of generative AI in customer growth is promising, with significant shifts expected in the way businesses approach customer experience, marketing, and sales. According to recent studies, companies that apply generative AI to customer initiatives can expect a 25% higher revenue after five years, driven by increased productivity and improved customer relevance [1].

## Key Trends in Generative AI for Customer Growth

Several trends are expected to shape the future of generative AI in customer growth:

* **Hyper-Personalization**: Generative AI will enable businesses to create customized experiences for their customers, using data and analytics to tailor products and services to individual preferences and behaviors [2].
* **Multimodal AI**: The integration of multiple sensory modalities, such as text, image, audio, and video, will become increasingly important in generative AI, enabling more comprehensive and human-like interactions [3].
* **Agentic AI**: The shift from reactive to proactive AI systems will continue, with agentic AI capable of planning and executing tasks autonomously [4].
* **Generative Search**: The rise of generative search will revolutionize the way customers interact with businesses, providing instant answers and displacing traditional link-based search results [5].

## Benefits of Generative AI in Customer Growth

The adoption of generative AI in customer growth can bring numerous benefits, including:

* **Improved Customer Relevance**: Generative AI can help businesses create more personalized and relevant experiences for their customers, driving increased loyalty and engagement [6].
* **Increased Productivity**: Automating routine tasks and augmenting human capabilities, generative AI can significantly improve productivity and efficiency in customer-facing operations [7].
* **Enhanced Customer Experience**: Generative AI can enable businesses to create more immersive and interactive experiences for their customers, driving increased satisfaction and loyalty [8].

## Challenges and Risks

While the benefits of generative AI in customer growth are significant, there are also challenges and risks to consider, including:

* **Ethical Concerns**: The use of generative AI raises important ethical concerns, such as bias, transparency, and accountability [9].
* **Job Displacement**: The automation of routine tasks and the augmentation of human capabilities can lead to job displacement and the need for workers to develop new skills [10].

## Conclusion

The future of generative AI in customer growth is promising, with significant opportunities for businesses to improve customer relevance, increase productivity, and enhance customer experience. However, it is essential to address the challenges and risks associated with generative AI, including ethical concerns and job displacement.

## References

[1] Accenture. (2023). Generative AI and Customer Growth.

[2] Kellton. (2025). Generative AI Trends 2026: Transform Work Everyday.

[3] Bernard Marr. (2025). 10 Generative AI Trends in 2026 That Will Transform Work and Life.

[4] Kellton. (2025). Generative AI Trends 2026: Transform Work Everyday.

[5] Kellton. (2025). Generative AI Trends 2026: Transform Work Everyday.

[6] Accenture. (2023). Generative AI and Customer Growth.

[7] Accenture. (2023). Generative AI and Customer Growth.

[8] Kellton. (2025). Generative AI Trends 2026: Transform Work Everyday.

[9] Conference Board. (2025). HR Future of Generative AI.

[10] Wolters Kluwer. (2025). Artificial Intelligence Survey.

Conclusion

By integrating the prompt compression and reranking APIs, the AIR system efficiently processes user queries, retrieves and prioritizes relevant information, and generates detailed, high-quality responses.