PII Masking Module Documentation¶

Overview¶

The PII Masking Module is a lightweight yet robust wrapper around Microsoft Presidio that ensures certain categories of personally identifiable information (PII) are never exposed to backend systems or language model agents on AI Refinery. It is designed for conversational and agentic AI platforms, offering secure, frontend-based redaction of PII including emails, phone numbers, names, and more.

This module is fully configurable (the behavior and settings of the system can be customized by the user via a config file), reversible (masking can be undone through a placeholder mapping), and toggleable (the feature can be turned on/off by the user), making it adaptable for both production-grade privacy enforcement and local development needs.

Note: In this documentation, "PII" refers to the data types that can qualify as personally identifiable information or personal data as listed in Presidio's documentation.

Why Use It?¶

User Privacy by Default: Ensures that PII included in inputs (e.g., names, emails, IDs) is masked before it reaches any backend API, websocket, or agent runtime. No raw PII leaves the client without deliberate demasking.

Configurable via Project YAML File: PII masking is toggled and configured directly inside your project's YAML file (e.g., pii_example.yaml, pii_search_example.yaml). This centralizes privacy settings alongside agent orchestration and utility configs. Example:

base_config:
  pii_masking:
    enable: True
    config:
      common_entities: [EMAIL_ADDRESS, PHONE_NUMBER]
      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params:
            new_value: "[EMAIL]"

Plug-and-Play: The masking layer works seamlessly with all agents. Whether it's a stateless echo bot or a search agent, PII redaction is handled transparently at the client level — no changes needed in the agent logic.
Structured Placeholders: Every detected PII entity is replaced with a type-annotated placeholder such as [EMAIL_1] or [PERSON_2], ensuring clarity and traceability across multi-turn exchanges. This is customizable: you can choose whether to replace, redact, or hash the information (these are the 'operators').
Default Masking Entities: If users enable PII masking (enable: True) in their YAML file but do not specify any entities or operators, the system automatically falls back to the defaults in pii_handler.yaml. By default, the following PII entities are masked using the replace operator:
```
- PERSON
- PHONE_NUMBER
- EMAIL_ADDRESS
- CREDIT_CARD
- US_SSN
- US_BANK_NUMBER
- US_PASSPORT
- LOCATION
- DATE_TIME
- IP_ADDRESS
```
Each entity will be replaced with a structured placeholder like [EMAIL_1], [PERSON_2], etc., unless overridden.
Session-Based Metadata Tracking: Masking and unmasking operations share state within a session, not per query. This allows consistent unmasking of repeated entities across multiple messages — ideal for chat-based flows.
Dual Demo Modes (Interactive + Batch): You can explore the module either interactively or with predefined query samples:
- pii_example.py: A minimal interactive echoing agent demo that allows you to input queries and receive masked responses in real-time (see 'Example 1: pii_example.py and pii_example.yaml' under 'Examples')
- pii_search_example.py: A batch-style search agent demo that processes multiple sample queries. You can toggle between modes by commenting/uncommenting:
```
# asyncio.run(pii_demo())       # <- Batch demo
# interactive()                 # <- Interactive mode
```
  (see 'Example 2: pii_search_example.py and pii_search_example.yaml' under 'Examples')
Frontend-Only Rehydration: Original content is restorable only locally and only temporarily for display or user confirmation — never transmitted or persistently stored.
Privacy Enhancing Feature: Supports data minimization and security of PII that might be used in inputs, in line with global data privacy and protection standards, especially in production environments.

Core Design Philosophy¶

Backend-Neutral Privacy¶

PII redaction is performed on the client (SDK) side, before PII reaches:

agent functions,
REST or web-socket endpoints,
logging pipelines,
or persistent databases.

Each detected entity is substituted with a consistent, type-annotated placeholder (e.g., [EMAIL_1], [PERSON_2]) so that the surrounding context stays readable.

Reversible — But Only During Session¶

Masked outputs are reversible in memory for the duration of a single client session using PIIHandler.
This enables frontend-only rehydration of redacted content for display, verification, or QA purposes.
No PII is ever persisted or sent back to the server.

Microsoft Presidio Integration¶

The PII Masking Module is built on top of Microsoft's Presidio framework, providing robust, customizable, and language-aware detection and masking of PII.

Our system leverages three key components from Presidio:

AnalyzerEngine¶

Detects PII entities (e.g., names, emails, credit cards) in raw text using both pattern-based and ML-based recognizers.

AnonymizerEngine¶

Performs masking or redaction operations based on configuration. In this module, it generates structured placeholder tokens such as [EMAIL_1], [PHONE_2].

DeanonymizeEngine¶

Allows controlled, reversible recovery of original PII values using internally managed session-bound metadata.

YAML-Driven, Not Hardcoded¶

The module now fully adopts YAML-driven configuration. Instead of toggling flags in Python code, you (as the user) specify:

Whether masking is enabled (enable: True)
Which entities to monitor (common_entities)
How each entity should be masked (entity_operator_mapping)

Example:

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - EMAIL_ADDRESS
        - PHONE_NUMBER
      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params:
            new_value: "[EMAIL]"

This makes the system more declarative, scalable, and CI/CD-friendly.

One Masking Context Per Session¶

Unlike traditional systems that handle masking on a per-query basis, our implementation shares the masking state across the entire session. This enables:

Reuse of consistent placeholders across turns (e.g., the same phone number will always map to [PHONE_1])
Accurate demasking of multi-turn agent conversations
More natural and trust-preserving UX in chat environments
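
The session-wide reuse described above can be sketched with a small registry. This is a minimal illustration, not the SDK's PIIHandler: the class name `PlaceholderRegistry` and its method are hypothetical, and the labels are passed in by hand rather than detected.

```python
class PlaceholderRegistry:
    """Hypothetical session-scoped store; not the SDK's PIIHandler."""

    def __init__(self):
        self._by_value = {}   # (label, original value) -> placeholder
        self._counters = {}   # label -> last index handed out

    def placeholder_for(self, label, value):
        """Return the existing placeholder for a value, or mint the next one."""
        key = (label, value)
        if key not in self._by_value:
            self._counters[label] = self._counters.get(label, 0) + 1
            self._by_value[key] = f"[{label}_{self._counters[label]}]"
        return self._by_value[key]


registry = PlaceholderRegistry()
first = registry.placeholder_for("PHONE", "(212) 555-1234")   # turn 1
second = registry.placeholder_for("PHONE", "(646) 555-9876")  # turn 2, new number
repeat = registry.placeholder_for("PHONE", "(212) 555-1234")  # turn 3, repeated number
print(first, second, repeat)
```

Because the registry lives for the whole session, the repeated number in turn 3 gets the placeholder it was given in turn 1 instead of a fresh index.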

Agent-Agnostic By Design¶

Whether you're using:

a CustomAgent that simply echoes masked text,
a SearchAgent that performs document retrieval,
or a chain-of-thought multi-agent orchestration,

...no changes are needed within the agents. PII protection wraps around the full query life cycle — from input, through orchestration, to output — without interfering with agent logic.
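
To illustrate why agents need no changes, the sketch below wraps an unmodified agent function between a mask step and a demask step. The helper `run_masked` and the toy `mask`/`demask` lambdas are assumptions made for this example; in the SDK this wrapping is done by the distiller client.

```python
import asyncio


async def echoing_agent(query: str) -> str:
    """Same shape as the echoing agent in the examples: it only sees masked text."""
    return f"Processed query:\n{query}"


async def run_masked(agent, query, mask, demask):
    """Mask the query, call the agent unchanged, return (masked reply, local view)."""
    reply = await agent(mask(query))
    return reply, demask(reply)


# Toy mask/demask pair for one known value; a stand-in for real detection.
SECRET, TOKEN = "john.doe@company.com", "[EMAIL_1]"
reply, local_view = asyncio.run(
    run_masked(
        echoing_agent,
        "Send the report to john.doe@company.com.",
        mask=lambda text: text.replace(SECRET, TOKEN),
        demask=lambda text: text.replace(TOKEN, SECRET),
    )
)
print(reply)
```

The agent body never references masking at all, which is the property the section describes.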

System Flow¶

1. User Input Received¶

A query containing PII is submitted via a DistillerClient or AsyncDistillerClient instance.
The session is initialized with a YAML configuration (e.g., pii_example.yaml) that enables or disables masking, and defines which entities to protect.

2. PII Detection & Masking (Client-Side Only)¶

PIIHandler.mask_text() is invoked to scan the input for the configured common_entities.
For each match:
- A type-annotated placeholder is generated (e.g., [PHONE_1], [EMAIL_2])
- A mapping between the original value and the placeholder is recorded per session
If the same entity/value appears in multiple queries, the same placeholder will be reused.
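
The detect-then-substitute step can be sketched as follows. This is a simplified stand-in under stated assumptions: detection here uses two hand-written regular expressions instead of Presidio, so it covers only emails and US-style phone numbers (names such as PERSON need a real recognizer), and the function names `detect` and `mask` are hypothetical.

```python
import re

# Stand-in detectors; the real module delegates detection to Presidio.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def detect(text):
    """Return (start, end, label) spans, sorted by start offset."""
    spans = [
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans)


def mask(text, mapping, counters):
    """Replace each span with a numbered placeholder and record the mapping."""
    pieces, cursor = [], 0
    for start, end, label in detect(text):
        key = (label, text[start:end])
        if key not in mapping:
            counters[label] = counters.get(label, 0) + 1
            mapping[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(mapping[key])        # placeholder instead of the value
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


mapping, counters = {}, {}  # shared across calls, i.e. per session
masked = mask(
    "Hi, I'm John. Email me at john.doe@company.com or call (212) 555-1234.",
    mapping,
    counters,
)
print(masked)
```

Passing the same `mapping` and `counters` into later calls is what makes a repeated value resolve to the placeholder it received earlier.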

Example:

Original Input:
"Hi, I'm John. Email me at john.doe@company.com or call (212) 555-1234."

Masked Output:
"Hi, I'm [PERSON_1]. Email me at [EMAIL_1] or call [PHONE_1]."

3. Masked Query Sent to Agent(s)¶

The masked version of the query is passed to agents through the orchestrator defined in the YAML.
No raw PII reaches:
- Agent logic
- Backend APIs
- Database logs
- Internal storage
The agents operate entirely on placeholders.

4. Agent Produces Response (Still Masked)¶

Agent responses are not altered unless frontend demasking is explicitly triggered.
By default, responses that include placeholders (e.g., [EMAIL_1]) will remain masked when returned to the client.

5. Optional: Demasking for Display¶

If enabled by the client application (e.g., CLI, notebook, frontend), the response can be passed through PIIHandler.demask_text() to reverse placeholders back into original values.
This rehydration occurs:
- Locally only
- Temporarily in memory
- Without logging or persisting raw PII
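
A minimal sketch of local demasking, assuming the session keeps a placeholder-to-original dictionary. The function `demask` below is illustrative only and is not the signature of PIIHandler.demask_text().

```python
import re

# Matches placeholders such as [EMAIL_1] or [US_SSN_2].
PLACEHOLDER = re.compile(r"\[[A-Z_]+_\d+\]")


def demask(text, originals):
    """Swap known placeholders back to their originals; leave unknown ones as-is."""
    return PLACEHOLDER.sub(lambda m: originals.get(m.group(0), m.group(0)), text)


originals = {"[PHONE_1]": "(212) 555-8124", "[EMAIL_1]": "john.doe@company.com"}
reply = "Processed query:\nHey, please call me at [PHONE_1] and send the report to [EMAIL_1]."
restored = demask(reply, originals)
print(restored)
```

Doing the substitution in a single regex pass means restored values are never rescanned, so an original value that happens to look like a placeholder cannot trigger a second replacement.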

6. Session Ends → PII is Cleared¶

When the session ends (or the client is explicitly closed), the PIIHandler clears:
- The placeholder-to-PII mapping
- Metadata used for demasking
This ensures PII is never cached, stored, or retrievable after the session.
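
One way to guarantee the cleanup described above is to tie the mapping's lifetime to a context manager. This is a sketch of the idea only; `masking_session` is a hypothetical helper, and how the SDK actually clears its state is not shown in this document.

```python
from contextlib import contextmanager


@contextmanager
def masking_session():
    """Yield a fresh mapping and wipe it when the block exits, even on error."""
    mapping = {}
    try:
        yield mapping
    finally:
        mapping.clear()


with masking_session() as session:
    session["[EMAIL_1]"] = "john.doe@company.com"
    lingering_reference = session  # simulate a reference that outlives the session
    size_during = len(session)

size_after = len(lingering_reference)
print(size_during, size_after)
```

Clearing the dictionary in place, rather than just dropping the variable, empties it for any other code that still holds a reference.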

Enabling or Disabling PII Masking¶

The PII Masking Module is controlled entirely through your project's YAML configuration. This provides a clean, centralized, and declarative interface for enabling or disabling masking on a per-project basis.

How it Works¶

To enable masking, include the following in your YAML config where you define your agents (e.g., pii_example.yaml, pii_search_example.yaml):

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - PERSON
        - EMAIL_ADDRESS
        - PHONE_NUMBER
        ...
      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params:
            new_value: "[EMAIL]"
        ...

To disable masking, you can either not include the pii_masking block in your config file, or explicitly set:

base_config:
  pii_masking:
    enable: False

If pii_masking.enable is missing or set to False, PII masking will be skipped entirely — no detection, no substitution, no metadata tracking.

Runtime Behavior¶

When a project is registered via DistillerClient.create_project(config_path=...), the system:

Reads the pii_masking block from the provided YAML config
Initializes the PIIHandler accordingly
- Enables masking and loads overrides if enable: True
- Disables masking if enable: False or absent
- If enable: True is set but no entities (e.g., PERSON, PHONE_NUMBER) or operators (replace, redact, hash) are provided, the handler falls back to the defaults in pii_handler.yaml, which replace each of the following entities with a placeholder:
```
  - PERSON
  - PHONE_NUMBER
  - EMAIL_ADDRESS
  - CREDIT_CARD
  - US_SSN
  - US_BANK_NUMBER
  - US_PASSPORT
  - LOCATION
  - DATE_TIME
  - IP_ADDRESS
```

This behavior applies to both AsyncDistillerClient and DistillerClient.
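
The enable/disable and fallback rules above can be sketched as a small resolver over the parsed YAML (represented here as plain dictionaries). Several things are assumptions: the function name `resolve_pii_config` is hypothetical, `DEFAULTS` is an abbreviated stand-in for pii_handler.yaml, and falling back per key (entities and operators independently) is one plausible reading of the rule rather than confirmed SDK behavior.

```python
from copy import deepcopy

# Abbreviated stand-in for the defaults in pii_handler.yaml.
DEFAULTS = {
    "common_entities": ["PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS"],
    "entity_operator_mapping": {
        "DEFAULT": {"operator": "replace", "params": {"new_value": "<PII>"}},
    },
}


def resolve_pii_config(project_config):
    """Return the effective masking config, or None when masking is off."""
    block = (project_config.get("base_config") or {}).get("pii_masking") or {}
    if not block.get("enable", False):
        return None  # missing block or enable: False -> masking skipped
    overrides = block.get("config") or {}
    resolved = deepcopy(DEFAULTS)
    for key in ("common_entities", "entity_operator_mapping"):
        if overrides.get(key):  # assumption: each key falls back on its own
            resolved[key] = deepcopy(overrides[key])
    return resolved


enabled_only = resolve_pii_config({"base_config": {"pii_masking": {"enable": True}}})
print(enabled_only["common_entities"])
```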

Default Configuration File¶

Default PII YAML Configuration: `pii_handler.yaml`¶

pii_handler.yaml is the default configuration file used by the PIIHandler class to control how PII is detected and masked. It is embedded within the SDK (usually under air/distiller/pii_handler/pii_handler.yaml). It is loaded automatically when you enable masking with base_config.pii_masking.enable: true in your project config (such as pii_example.yaml) but provide no further customization under base_config.pii_masking.config.
pii_handler.yaml defines:
- What to detect (common_entities)
  
  A list of PII entity types (e.g., EMAIL_ADDRESS, PERSON, CREDIT_CARD) that should be scanned in user queries.
- How to mask each type (entity_operator_mapping)
  
  For each entity, you specify a masking strategy (e.g., replace, redact, or hash) and optionally define a custom placeholder.
This is what it looks like:

common_entities:
  - PERSON
  - PHONE_NUMBER
  - EMAIL_ADDRESS
  - CREDIT_CARD
  - US_SSN
  - US_BANK_NUMBER
  - US_PASSPORT
  - LOCATION
  - DATE_TIME
  - IP_ADDRESS

entity_operator_mapping:
  CREDIT_CARD:
    operator: replace
    params:
      new_value: "[CREDIT_CARD]"

  US_SSN:
    operator: replace
    params:
      new_value: "[US_SSN]"

  US_BANK_NUMBER:
    operator: replace
    params:
      new_value: "[US_BANK_NUMBER]"

  US_PASSPORT:
    operator: replace
    params:
      new_value: "[US_PASSPORT]"

  PERSON:
    operator: replace
    params:
      new_value: "[PERSON]"

  PHONE_NUMBER:
    operator: replace
    params:
      new_value: "[PHONE]"

  EMAIL_ADDRESS:
    operator: replace
    params:
      new_value: "[EMAIL]"

  LOCATION:
    operator: replace
    params:
      new_value: "[LOCATION]"

  DATE_TIME:
    operator: replace
    params:
      new_value: "[DATE]"

  IP_ADDRESS:
    operator: replace
    params:
      new_value: "[IP]"

  DEFAULT:
    operator: replace
    params:
      new_value: "<PII>"

Examples¶

Configuration: Authentication¶

To use the AI Refinery agents with the PII Masking Module, you first need to authenticate with an ACCOUNT and API_KEY that have been granted to you. Create an environment file (.env) in the same directory as the example files, containing:

ACCOUNT=<your_account_name>
API_KEY=<your_api_key_value>

In the examples below, pii_example.py (Example 1) and pii_search_example.py (Example 2) are set up to read this file.
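
The example scripts wrap os.getenv() in str(), which turns a missing variable into the literal string "None" instead of failing. A small guard like the one below can surface a missing .env entry earlier. The helper `require_env` is a suggestion for illustration, not part of the SDK, and it assumes load_dotenv() has already been called as in the examples.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


os.environ["ACCOUNT"] = "demo-account"  # stands in for a value loaded from .env
account = require_env("ACCOUNT")
print(account)
```

The returned values can then be passed to login(account=..., api_key=...) in place of the str(os.getenv(...)) calls.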

Example 1: pii_example.py and pii_example.yaml¶

Purpose¶

A minimal interactive demo that lets you enter queries via the terminal.

It's ideal for understanding how PII masking integrates into a live session and how placeholder substitution works in real-time.

This uses:

DistillerClient (synchronous wrapper)
A simple Echoing Agent
A project config defined in pii_example.yaml, including masking rules

How It Works¶

You authenticate and create a new project using pii_example.yaml.
You register an Echoing Agent, which simply returns your masked input.
You can interactively enter text, and the PII masking is handled before anything reaches the agent.
The masked response is printed, and frontend demasking (in memory only) restores original values if needed.

`pii_example.py`¶

# pii_example.py

import os
from typing import Any, Awaitable, Callable, Dict, Union, cast
from dotenv import load_dotenv
from air import DistillerClient, login

# Authenticate
load_dotenv()
auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

async def echoing_agent(query: str) -> str:
    """A minimal agent that just echoes queries. PII masking is handled by DistillerClient before this."""
    return f"Processed query:\n{query}"

def interactive():
    """Launch interactive demo with registered simple agent."""
    distiller_client = DistillerClient()
    distiller_client.create_project(config_path="pii_example.yaml", project="pii-demo")

    executor_dict = {"Echoing Agent": echoing_agent}

    distiller_client.interactive(
        project="pii-demo",
        uuid="some-uuid",
        executor_dict=cast(Dict[str, Union[Callable[..., Any], Dict[str, Callable[..., Any]]]], executor_dict),
    )

if __name__ == "__main__":
    print("\n[PII Demo] Interactive Mode")
    interactive()

`pii_example.yaml`¶

orchestrator:
  agent_list:
    - agent_name: "Echoing Agent"

utility_agents:
  - agent_class: CustomAgent
    agent_name: "Echoing Agent"
    agent_description: "This agent receives a query with PII already masked by the distiller client and either responds or echoes your query."
    config:
      output_style: "conversational"

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - PERSON
        - PHONE_NUMBER
        - EMAIL_ADDRESS
        - CREDIT_CARD
        - US_SSN
        - US_BANK_NUMBER
        - US_PASSPORT
        - LOCATION
        - DATE_TIME
        - IP_ADDRESS

      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params: { new_value: "[EMAIL]" }
        PERSON:
          operator: replace
          params: { new_value: "[PERSON]" }
        PHONE_NUMBER:
          operator: replace
          params: { new_value: "[PHONE]" }
        CREDIT_CARD:
          operator: replace
          params: { new_value: "[CREDIT_CARD]" }
        US_SSN:
          operator: replace
          params: { new_value: "[US_SSN]" }
        US_BANK_NUMBER:
          operator: replace
          params: { new_value: "[US_BANK_NUMBER]" }
        US_PASSPORT:
          operator: replace
          params: { new_value: "[US_PASSPORT]" }
        LOCATION:
          operator: replace
          params: { new_value: "[LOCATION]" }
        DATE_TIME:
          operator: replace
          params: { new_value: "[DATE]" }
        IP_ADDRESS:
          operator: replace
          params: { new_value: "[IP]" }

Example 2: pii_search_example.py and pii_search_example.yaml¶

Purpose¶

This example is designed for scripted testing, where a batch of hardcoded queries is sent to an agent.

You can observe how each PII element is masked, and how the system behaves across multiple PII types.

It uses:

AsyncDistillerClient
A simple SearchAgent
The same PII masking engine and configuration logic as Example 1

Flexible Modes¶

The script supports two modes:

Demo mode (enabled by default) — runs through sample queries programmatically
Interactive mode — comment out the demo and uncomment the interactive section at the bottom to run it live.

`pii_search_example.py`¶

# pii_search_example.py

import asyncio, os, uuid
from typing import Any, Awaitable, Callable, Dict, Union, cast
from dotenv import load_dotenv
from air import login
from air.distiller.client import AsyncDistillerClient

# Authenticate
load_dotenv()
auth = login(account=str(os.getenv("ACCOUNT")), api_key=str(os.getenv("API_KEY")))

async def search_agent(query: str) -> str:
    """Defining a search agent to test PII masking, which is handled by DistillerClient before this."""
    return f"Processed query:\n{query}"

async def pii_demo():
    queries = [
        "Hi, I'm Henry. My number is 4111 1111 1111 1111.",
        "Can you book a meeting with Dr. Jane Doe at (212) 555-7890 on May 4th?",
        "The IP address 192.168.0.1 should be allowed in the firewall.",
        "Email my updated resume to recruiter@company.com.",
        "Her SSN is 123-45-6789 and passport is X1234567.",
    ]

    distiller_client = AsyncDistillerClient()
    distiller_client.create_project(config_path="pii_search_example.yaml", project="pii-demo")
    session_id = str(uuid.uuid4())

    await distiller_client.connect(
        project="pii-demo",
        uuid=session_id,
        executor_dict={"Search Agent": search_agent},
    )

    print("\n[PII Demo] Running Sample Queries\n")

    for i, query in enumerate(queries, 1):
        print(f"Query {i}:\nOriginal: {query}")
        try:
            responses = await distiller_client.query(query)
            async for response in responses:
                print(f"Masked Output:\n{response['content']}\n{'-'*50}")
        except Exception as e:
            print(f"[ERROR] Failed to process query {i}: {e}")
            print("-" * 50)

    await distiller_client.close()

def interactive():
    distiller_client = AsyncDistillerClient()
    distiller_client.create_project(config_path="pii_search_example.yaml", project="pii-demo")
    executor_dict = {"Search Agent": search_agent}
    distiller_client.interactive(
        project="pii-demo",
        uuid="some-uuid",
        executor_dict=cast(Dict[str, Union[Callable[..., Any], Dict[str, Callable[..., Any]]]], executor_dict),
    )

if __name__ == "__main__":
    print("\n[PII Demo] Sample Queries")
    asyncio.run(pii_demo())

    # To try live interaction, comment out the line above and uncomment the next lines:
    # print("\n[PII Demo] Interactive Mode")
    # interactive()

`pii_search_example.yaml`¶

orchestrator:
  agent_list:
    - agent_name: "Search Agent"

utility_agents:
  - agent_class: SearchAgent
    agent_name: "Search Agent"
    agent_description: "This agent receives a query with or without PII already masked by the distiller client, performs searches and replies to user."
    config:
      output_style: "conversational"

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - PERSON
        - PHONE_NUMBER
        - EMAIL_ADDRESS
        - CREDIT_CARD
        - US_SSN
        - US_BANK_NUMBER
        - US_PASSPORT
        - LOCATION
        - DATE_TIME
        - IP_ADDRESS

      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params: { new_value: "[EMAIL]" }
        PERSON:
          operator: replace
          params: { new_value: "[PERSON]" }
        PHONE_NUMBER:
          operator: replace
          params: { new_value: "[PHONE]" }
        CREDIT_CARD:
          operator: replace
          params: { new_value: "[CREDIT_CARD]" }
        US_SSN:
          operator: replace
          params: { new_value: "[US_SSN]" }
        US_BANK_NUMBER:
          operator: replace
          params: { new_value: "[US_BANK_NUMBER]" }
        US_PASSPORT:
          operator: replace
          params: { new_value: "[US_PASSPORT]" }
        LOCATION:
          operator: replace
          params: { new_value: "[LOCATION]" }
        DATE_TIME:
          operator: replace
          params: { new_value: "[DATE]" }
        IP_ADDRESS:
          operator: replace
          params: { new_value: "[IP]" }

For reference¶

Example	Mode	Client Used	Purpose
`pii_example.py`	Interactive	`DistillerClient`	Try queries manually
`pii_search_example.py`	Scripted (or Interactive)	`AsyncDistillerClient`	Batch-test masking behavior across PII types + try queries manually with a more complex agent

Example Interaction¶

Input:

Hey, please call me at (212) 555-8124 and send the report to john.doe@company.com.

PII Identified:

[PII MASKING] Detected and masked the following PII types:
 - PHONE_NUMBER at [24:38] -> '(212) 555-8124' -> [PHONE_1]
 - EMAIL_ADDRESS at [67:89] -> 'john.doe@company.com' -> [EMAIL_1]

Masking by PIIHandler.mask_text():

Hey, please call me at [PHONE_1] and send the report to [EMAIL_1].

Agent Output:

Processed query:
Hey, please call me at [PHONE_1] and send the report to [EMAIL_1].

Unmasked View (frontend-only):

Processed query:
Hey, please call me at (212) 555-8124 and send the report to john.doe@company.com.

This view is reconstructed locally in-memory using metadata saved during masking. The demasking is only available for the session and is never persisted or sent to any backend.

Supported PII Types and Operators¶

Supported PII Types¶

The PII masking module leverages Microsoft Presidio to detect a broad range of commonly regulated or personal data types. All supported types must be explicitly listed in the YAML config under common_entities.

Entity Type	Placeholder Format	Example Match	Description
`EMAIL_ADDRESS`	`[EMAIL_1]`	`john.doe@example.com`	Email addresses
`PHONE_NUMBER`	`[PHONE_1]`	`(212) 555-8124`	US or international phone numbers
`PERSON`	`[PERSON_1]`	`Jane Doe`	First and last names
`CREDIT_CARD`	`[CREDIT_CARD_1]`	`4111 1111 1111 1111`	Visa/Mastercard/Amex credit cards
`US_SSN`	`[US_SSN_1]`	`123-45-6789`	U.S. Social Security Numbers
`US_BANK_NUMBER`	`[US_BANK_NUMBER_1]`	`987654321`	U.S. bank account numbers
`US_PASSPORT`	`[US_PASSPORT_1]`	`X1234567`	U.S. passport numbers
`LOCATION`	`[LOCATION_1]`	`1600 Amphitheatre Parkway`	Physical address, city, state, ZIP
`DATE_TIME`	`[DATE_1]`	`May 4th`, `01/01/2024`	Absolute or relative dates and times
`IP_ADDRESS`	`[IP_1]`	`192.168.0.1`, `2001:db8::1`	IPv4 and IPv6 addresses

To activate detection for a type, include it under common_entities in your YAML config. The default pii_handler.yaml and the examples already include all types above.

Supported PII Operators¶

Each entity type can be individually configured in the YAML using one of the supported operators below. You define the operator under entity_operator_mapping.

`replace`¶

Replaces the original PII with a structured placeholder (e.g., [EMAIL_1])
Default behavior if not specified

EMAIL_ADDRESS:
  operator: replace
  params:
    new_value: "[EMAIL]"

`redact`¶

Completely removes the PII from the text (no placeholder left behind)

PHONE_NUMBER:
  operator: redact

Input:

Call me at (212) 555-8124

Masked:

Call me at

`hash`¶

Replaces the original PII with a hashed representation (irreversible)

US_SSN:
  operator: hash

Input:

SSN is 123-45-6789

Masked:

SSN is 7e7cf1d9dcd21e...

`DEFAULT` Handler (Fallback)¶

To apply a global fallback to any undefined entity type, use the DEFAULT key:

DEFAULT:
  operator: replace
  params:
    new_value: "<PII>"

If Presidio detects an entity type not explicitly listed in entity_operator_mapping, this operator will apply.
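
The three operators and the DEFAULT fallback can be sketched in plain Python as below. This is an illustration of the lookup logic, not Presidio's implementation: the function `apply_operator` is hypothetical, SHA-256 is assumed as the hash algorithm, and the `<ENTITY_TYPE>` fallback for a replace rule without new_value is an assumption of this sketch.

```python
import hashlib

MAPPING = {
    "EMAIL_ADDRESS": {"operator": "replace", "params": {"new_value": "[EMAIL]"}},
    "PHONE_NUMBER": {"operator": "redact"},
    "US_SSN": {"operator": "hash"},
    "DEFAULT": {"operator": "replace", "params": {"new_value": "<PII>"}},
}


def apply_operator(entity_type, value, mapping=MAPPING):
    """Return the masked form of value for the given entity type."""
    # Entity types without an explicit rule fall through to DEFAULT.
    rule = mapping.get(entity_type, mapping["DEFAULT"])
    operator = rule["operator"]
    if operator == "replace":
        return rule.get("params", {}).get("new_value", f"<{entity_type}>")
    if operator == "redact":
        return ""  # value removed, nothing left behind
    if operator == "hash":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"Unsupported operator: {operator}")


print(apply_operator("EMAIL_ADDRESS", "john.doe@example.com"))
print(apply_operator("IBAN_CODE", "DE89370400440532013000"))
```

Note that only replace with a tracked placeholder can be reversed later; redacted and hashed values carry no route back to the original.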

Advanced Customization¶

The PII Masking Module is highly flexible and allows you to tailor both which entities to detect and how to handle them. All customizations are centralized in the same YAML configuration file used for the agent orchestration (e.g., pii_example.yaml or pii_search_example.yaml), under base_config.pii_masking.

Adding More Entities¶

If Presidio supports additional PII types (e.g., IBAN_CODE, MEDICAL_LICENSE, or custom recognizers), you can extend your config:

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - IBAN_CODE
        - MEDICAL_LICENSE
        - PERSON

Make sure to also define masking behavior:

entity_operator_mapping:
  IBAN_CODE:
    operator: hash
  MEDICAL_LICENSE:
    operator: redact

You can find the full list of built-in PII entity types in Presidio's documentation.
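
When extending the entity list, it is easy to add a type under common_entities and forget its operator. The check below lists such entities so you can decide whether the fallback behavior is what you want. It is a hypothetical helper operating on the parsed config as a dictionary, not an SDK function.

```python
def entities_without_operator(config):
    """List configured entities that have no explicit operator entry."""
    mapping = config.get("entity_operator_mapping") or {}
    return [e for e in (config.get("common_entities") or []) if e not in mapping]


# Mirrors the extended config shown in this section.
config = {
    "common_entities": ["IBAN_CODE", "MEDICAL_LICENSE", "PERSON"],
    "entity_operator_mapping": {
        "IBAN_CODE": {"operator": "hash"},
        "MEDICAL_LICENSE": {"operator": "redact"},
    },
}
unmapped = entities_without_operator(config)
print(unmapped)
```

Here PERSON is reported because the snippet above defines operators only for IBAN_CODE and MEDICAL_LICENSE.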

Defining Custom Operators or Placeholder Formats¶

You may redefine any placeholder format per entity by customizing the new_value:

EMAIL_ADDRESS:
  operator: replace
  params:
    new_value: "<<email>>"

Or enable hashing for irreversible masking:

CREDIT_CARD:
  operator: hash

Or remove PII altogether (no placeholder shown):

LOCATION:
  operator: redact

Creating Multiple YAML Variants¶

You can maintain multiple config files (e.g., pii_example.yaml, pii_search_example.yaml, pii_strict.yaml) with different combinations of:

Enabled/disabled masking
Different entity sets
Operator schemes
Agent configurations

Then pass the desired YAML to create_project(config_path=...) when registering your project.

Use Case Matrix¶

Below is a guide to help you decide when to use PII masking and how to configure it:

Use Case	Masking Enabled	Recommended Operator	Why This Matters
Production inference	Yes	`replace`	Prevents raw PII from reaching logs, models, or monitoring agents
Internal debugging	Optional	—	Devs can see original inputs for issue diagnosis
Compliance audits	Yes	`replace`, `hash`	Shows evidence of redaction while retaining traceability
External demo/showcases	Yes	`replace`	Guarantees privacy-safe interactions during live sessions
QA & annotation tooling	Optional	`replace`, `redact`	Keep PII masked during human reviews
Analytics dashboards	Yes	`replace`, `redact`	Prevents PII leakage into metrics or reporting tools
Sensitive search indexing	Yes	`hash`, `redact`	Allows indexing without storing PII
