
PII Masking Module Documentation

Overview

The PII Masking Module is a lightweight yet robust wrapper around Microsoft Presidio that ensures personally identifiable information (PII) is never exposed to backend systems or language model agents on AI Refinery. It is designed for conversational and agentic AI platforms, offering secure, frontend-based redaction of sensitive data including emails, phone numbers, names, and more.

This module is fully configurable (behavior and settings are customized through a config file), reversible (masking can be undone through a placeholder mapping), and toggleable (the feature can be turned on or off), making it adaptable for both production-grade privacy enforcement and local development needs.

Why Use It?

  • User Privacy by Default: Ensures that sensitive inputs (e.g., names, emails, IDs) are masked before hitting any backend API, websocket, or agent runtime. No raw PII ever leaves the client without deliberate demasking.
  • Configurable via Project YAML File: PII masking is now toggled and configured directly inside our project’s YAML file (e.g., pii_example.yaml, pii_search_example.yaml). This centralizes privacy settings alongside agent orchestration and utility configs. Example:

    base_config:
      pii_masking:
        enable: True
        config:
          common_entities: [EMAIL_ADDRESS, PHONE_NUMBER]
          entity_operator_mapping:
            EMAIL_ADDRESS:
              operator: replace
              params:
                new_value: "[EMAIL]"
    
  • Plug-and-Play: The masking layer works seamlessly with all agents. Whether it's a stateless echo bot or a search agent, PII redaction is handled transparently at the client level — no changes needed in the agent logic.

  • Structured Placeholders: Every detected entity is replaced with a type-annotated placeholder such as [EMAIL_1] or [PERSON_2], ensuring clarity and traceability across multi-turn exchanges. This is customizable: you choose whether each entity type is replaced, redacted, or hashed (these are what we call the ‘operators’).
  • Default Masking Entities: If you enable PII masking (enable: True) in your YAML file but do not specify any entities or operators, the system automatically falls back to the defaults in pii_handler.yaml. By default, the following PII entities are masked using the replace operator:

    - PERSON
    - PHONE_NUMBER
    - EMAIL_ADDRESS
    - CREDIT_CARD
    - US_SSN
    - US_BANK_NUMBER
    - US_PASSPORT
    - LOCATION
    - DATE_TIME
    - IP_ADDRESS
    

    Each entity will be replaced with a structured placeholder like [EMAIL_1], [PERSON_2], etc., unless overridden.

  • Session-Based Metadata Tracking: Masking and unmasking operations share state within a session, not per query. This allows consistent unmasking of repeated entities across multiple messages — ideal for chat-based flows.

  • Dual Demo Modes (Interactive + Batch): You can explore the module either interactively or with predefined query samples:

    • pii_example.py: A minimal interactive echoing agent demo that lets you input queries and receive masked responses in real time (see ‘Example 1: pii_example.py and pii_example.yaml’ under ‘Examples’)
    • pii_search_example.py: A batch-style search agent demo that processes multiple sample queries. You can toggle between modes by commenting/uncommenting:

      # asyncio.run(pii_demo())       # <- Batch demo
      # interactive()                 # <- Interactive mode
      

      (see ‘Example 2: pii_search_example.py and pii_search_example.yaml’ under ‘Examples’)

  • Frontend-Only Rehydration: Original content is restorable only locally and only temporarily for display or user confirmation; it is never transmitted or stored.

  • Regulatory Compliance Alignment: Supports data minimization and protection standards like GDPR, HIPAA, and CCPA, especially in production environments where sensitive inputs must be masked before processing.

Core Design Philosophy

Backend-Neutral Privacy

PII redaction is performed on the client (SDK) side, before any data reaches:

  • agent functions,
  • REST or web-socket endpoints,
  • logging pipelines,
  • or persistent databases.

Each detected entity is substituted with a consistent, type-annotated placeholder (e.g., [EMAIL_1], [PERSON_2]) to maintain context integrity while safeguarding privacy.

Reversible — But Only During Session

  • Masked outputs are reversible in memory for the duration of a single client session using PIIHandler.
  • This enables frontend-only rehydration of redacted content for display, verification, or QA purposes.
  • No sensitive information is ever persisted or sent back to the server.

Microsoft Presidio Integration

The PII Masking Module is built on top of Microsoft’s Presidio framework, providing robust, customizable, and language-aware detection and anonymization of personally identifiable information (PII).

Our system leverages three key components from Presidio:

AnalyzerEngine

Detects PII entities (e.g., names, emails, credit cards) in raw text using both pattern-based and ML-based recognizers.

AnonymizerEngine

Performs masking or redaction operations based on configuration. In this module, it generates structured placeholder tokens such as [EMAIL_1], [PHONE_2].

DeanonymizeEngine

Allows controlled, reversible recovery of original PII values using internally managed session-bound metadata.

YAML-Driven, Not Hardcoded

The module now fully adopts YAML-driven configuration. Instead of toggling flags in Python code, you (as the user) specify:

  • Whether masking is enabled (enable: True)
  • Which entities to monitor (common_entities)
  • How each entity should be masked (entity_operator_mapping)

Example:

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - EMAIL_ADDRESS
        - PHONE_NUMBER
      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params:
            new_value: "[EMAIL]"

This makes the system more declarative, scalable, and CI/CD-friendly.

One Masking Context Per Session

Unlike traditional systems that handle masking on a per-query basis, our implementation shares the masking state across the entire session. This enables:

  • Reuse of consistent placeholders across turns

    (e.g., the same phone number will always map to [PHONE_1])

  • Accurate demasking of multi-turn agent conversations

  • More natural and trust-preserving UX in chat environments
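
The placeholder reuse described above can be sketched in plain Python. This is a minimal illustration, not the SDK's PIIHandler: the SessionMasker class and the two regular expressions are hypothetical stand-ins for Presidio's recognizers, and the per-type numbering follows the [PHONE_1] / [EMAIL_1] examples in this document.

```python
import re
from collections import defaultdict

# Simplified regex detectors standing in for Presidio's recognizers (illustrative only).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}


class SessionMasker:
    """Hypothetical helper: one placeholder mapping shared by every turn of a session."""

    def __init__(self):
        self.value_to_placeholder = {}    # original value -> placeholder
        self.counters = defaultdict(int)  # running index per entity label

    def _placeholder(self, label, value):
        # A value seen earlier in the session keeps the placeholder it already has.
        if value not in self.value_to_placeholder:
            self.counters[label] += 1
            self.value_to_placeholder[value] = f"[{label}_{self.counters[label]}]"
        return self.value_to_placeholder[value]

    def mask(self, text):
        for label, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, label=label: self._placeholder(label, m.group()), text
            )
        return text


masker = SessionMasker()
turn_1 = masker.mask("Call me at (212) 555-1234.")
turn_2 = masker.mask("Use (212) 555-1234 or (646) 555-0000, or mail a@b.com.")
print(turn_1)
print(turn_2)
```

Because the mapping lives on the masker object rather than being rebuilt per query, the phone number from the first turn resolves to the same placeholder in the second turn, while the new number receives the next index.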

Agent-Agnostic By Design

Whether you're using:

  • a CustomAgent that simply echoes masked text,
  • a SearchAgent that performs document retrieval,
  • or a chain-of-thought multi-agent orchestration,

...no changes are needed within the agents. PII protection wraps around the full query life cycle — from input, through orchestration, to output — without interfering with agent logic.

System Flow

1. User Input Received

  • A query containing potentially sensitive information is submitted via a DistillerClient or AsyncDistillerClient instance.
  • The session is initialized with a YAML configuration (e.g., pii_example.yaml) that enables or disables masking, and defines which entities to protect.

2. PII Detection & Masking (Client-Side Only)

  • PIIHandler.mask_text() is invoked to scan the input for the configured common_entities.
  • For each match:
    • A type-annotated placeholder is generated (e.g., [PHONE_1], [EMAIL_2])
    • A mapping between the original value and the placeholder is recorded per session
  • If the same entity/value appears in multiple queries, the same placeholder will be reused.

Example:

Original Input:
"Hi, I'm John. Email me at john.doe@company.com or call (212) 555-1234."

Masked Output:
"Hi, I'm [PERSON_1]. Email me at [EMAIL_1] or call [PHONE_1]."

3. Masked Query Sent to Agent(s)

  • The masked version of the query is passed to agents through the orchestrator defined in the YAML.
  • No raw PII reaches:
    • Agent logic
    • Backend APIs
    • Database logs
    • Internal storage
  • The agents operate entirely on placeholders.

4. Agent Produces Response (Still Masked)

  • Agent responses are not altered unless frontend demasking is explicitly triggered.
  • By default, responses that include placeholders (e.g., [EMAIL_1]) will remain masked when returned to the client.

5. Optional: Demasking for Display

  • If enabled by the client application (e.g., CLI, notebook, frontend), the response can be passed through PIIHandler.demask_text() to reverse placeholders back into original values.
  • This rehydration occurs:
    • Locally only
    • Temporarily in memory
    • Without logging or persisting raw PII
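
As a rough illustration of the demasking step, the sketch below swaps placeholders back using an in-memory mapping. The demask_text function, the PLACEHOLDER pattern, and the session_mapping dictionary are hypothetical; the real PIIHandler.demask_text() may work differently internally.

```python
import re

# Matches placeholders of the form [TYPE_N], e.g. [EMAIL_1] or [US_SSN_2].
PLACEHOLDER = re.compile(r"\[[A-Z_]+_\d+\]")


def demask_text(masked_text, placeholder_to_value):
    """Restore original values in memory; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(
        lambda m: placeholder_to_value.get(m.group(), m.group()), masked_text
    )


# Mapping recorded during masking; it lives only in this process (illustrative values).
session_mapping = {
    "[PHONE_1]": "(212) 555-8124",
    "[EMAIL_1]": "john.doe@company.com",
}

agent_response = "Processed query:\nCall [PHONE_1] or write to [EMAIL_1]; cc [PERSON_9]."
restored = demask_text(agent_response, session_mapping)
print(restored)
```

Placeholders without an entry in the mapping stay masked, so a response that mentions a placeholder from another session cannot be rehydrated by accident.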

6. Session Ends → PII is Cleared

  • When the session ends (or the client is explicitly closed), the PIIHandler clears:
    • The placeholder-to-PII mapping
    • Metadata used for demasking
  • This ensures PII is never cached, stored, or retrievable after the session.
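
One way to picture the cleanup guarantee is a context manager that wipes its state on exit. This is only a sketch under assumptions: MaskingState and masking_session are hypothetical names, and the SDK's own teardown logic is not shown in this document.

```python
from contextlib import contextmanager


class MaskingState:
    """Hypothetical container for the per-session data described above."""

    def __init__(self):
        self.placeholder_to_value = {}  # placeholder-to-PII mapping
        self.metadata = []              # details needed for demasking

    def clear(self):
        self.placeholder_to_value.clear()
        self.metadata.clear()


@contextmanager
def masking_session():
    state = MaskingState()
    try:
        yield state
    finally:
        # Runs on normal exit and on exceptions, so PII never outlives the session.
        state.clear()


with masking_session() as state:
    state.placeholder_to_value["[EMAIL_1]"] = "john.doe@company.com"
    state.metadata.append({"entity_type": "EMAIL_ADDRESS"})
    entries_during = len(state.placeholder_to_value)

entries_after = len(state.placeholder_to_value)
print(entries_during, entries_after)
```

Putting the cleanup in a finally clause means the mapping is emptied even if a query raises midway through the session.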

Enabling or Disabling PII Masking

The PII Masking Module is now controlled entirely through our project YAML configuration. This provides a clean, centralized, and declarative interface for enabling or disabling masking on a per-project basis.

How it Works

To enable masking, include the following in your YAML config where you define your agents (e.g., pii_example.yaml, pii_search_example.yaml):

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - PERSON
        - EMAIL_ADDRESS
        - PHONE_NUMBER
        ...
      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params:
            new_value: "[EMAIL]"
        ...

To disable masking, you can either omit the pii_masking block from your config file, or explicitly set:

base_config:
  pii_masking:
    enable: False

If pii_masking.enable is missing or set to False, PII masking is skipped entirely: no detection, no substitution, no metadata tracking.
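
The toggle rule can be expressed as a small check on the parsed project config. This is a sketch, not SDK code: masking_enabled is a hypothetical helper, and the config is assumed to be already loaded into a Python dictionary.

```python
def masking_enabled(project_config):
    """Return True only when base_config.pii_masking.enable is present and truthy."""
    base_config = project_config.get("base_config") or {}
    pii_block = base_config.get("pii_masking") or {}
    return bool(pii_block.get("enable", False))


enabled = masking_enabled({"base_config": {"pii_masking": {"enable": True}}})
disabled = masking_enabled({"base_config": {"pii_masking": {"enable": False}}})
missing_flag = masking_enabled({"base_config": {"pii_masking": {}}})
missing_block = masking_enabled({"orchestrator": {"agent_list": []}})
print(enabled, disabled, missing_flag, missing_block)
```

Every path that lacks an explicit enable: True resolves to "off", which matches the opt-in behavior described above.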

Runtime Behavior

When a project is registered via DistillerClient.create_project(config_path=...), the system:

  1. Reads the pii_masking block from the provided YAML config
  2. Initializes the PIIHandler accordingly
    • Enables masking and loads overrides if enable: True
    • Disables masking if enable: False or absent
    • If you specify enable: True but provide no entities (e.g., PERSON, PHONE_NUMBER) or operators (replace, redact, hash), it falls back to the pii_handler.yaml defaults, which replace each of the following entities (the same list as above) with a placeholder:

        - PERSON
        - PHONE_NUMBER
        - EMAIL_ADDRESS
        - CREDIT_CARD
        - US_SSN
        - US_BANK_NUMBER
        - US_PASSPORT
        - LOCATION
        - DATE_TIME
        - IP_ADDRESS
      

This behavior applies to both AsyncDistillerClient and DistillerClient.
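
The fallback logic might look roughly like the sketch below. Several things here are assumptions: resolve_pii_config is a hypothetical function, DEFAULTS is an abbreviated stand-in for pii_handler.yaml, and the per-key fallback (user entities with default operators, or the reverse) is a guess, since this document only describes the case where neither is provided.

```python
import copy

# Abbreviated stand-in for the SDK's pii_handler.yaml defaults (illustrative only).
DEFAULTS = {
    "common_entities": [
        "PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD", "US_SSN",
        "US_BANK_NUMBER", "US_PASSPORT", "LOCATION", "DATE_TIME", "IP_ADDRESS",
    ],
    "entity_operator_mapping": {
        "DEFAULT": {"operator": "replace", "params": {"new_value": "<PII>"}},
    },
}


def resolve_pii_config(project_config):
    """Return the effective masking config, or None when masking is disabled."""
    block = (project_config.get("base_config") or {}).get("pii_masking") or {}
    if not block.get("enable", False):
        return None
    user = block.get("config") or {}
    resolved = copy.deepcopy(DEFAULTS)
    # Assumption: each key falls back to the defaults independently.
    for key in ("common_entities", "entity_operator_mapping"):
        if user.get(key):
            resolved[key] = copy.deepcopy(user[key])
    return resolved


only_flag = resolve_pii_config({"base_config": {"pii_masking": {"enable": True}}})
custom = resolve_pii_config(
    {"base_config": {"pii_masking": {"enable": True,
                                     "config": {"common_entities": ["EMAIL_ADDRESS"]}}}}
)
off = resolve_pii_config({"base_config": {"pii_masking": {"enable": False}}})
print(len(only_flag["common_entities"]), custom["common_entities"], off)
```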

Default Configuration File

Default PII YAML Configuration: pii_handler.yaml

  • pii_handler.yaml is the default configuration file used by the PIIHandler class to control how personally identifiable information (PII) is detected and masked. It is embedded within the SDK (usually under air/distiller/pii_handler/pii_handler.yaml) and automatically loaded when you enable masking by setting base_config.pii_masking.enable: true in your project config but provide no further customization in the base_config.pii_masking.config section of your project YAML file (such as pii_example.yaml).
  • pii_handler.yaml defines:

    • What to detect (common_entities)

      A list of PII entity types (e.g., EMAIL_ADDRESS, PERSON, CREDIT_CARD) that should be scanned for in user queries.

    • How to mask each type (entity_operator_mapping)

      For each entity, you specify a masking strategy (e.g., replace, redact, or hash) and optionally define a custom placeholder.

  • This is what it looks like:

common_entities:
  - PERSON
  - PHONE_NUMBER
  - EMAIL_ADDRESS
  - CREDIT_CARD
  - US_SSN
  - US_BANK_NUMBER
  - US_PASSPORT
  - LOCATION
  - DATE_TIME
  - IP_ADDRESS

entity_operator_mapping:
  CREDIT_CARD:
    operator: replace
    params:
      new_value: "[CREDIT_CARD]"

  US_SSN:
    operator: replace
    params:
      new_value: "[US_SSN]"

  US_BANK_NUMBER:
    operator: replace
    params:
      new_value: "[US_BANK_NUMBER]"

  US_PASSPORT:
    operator: replace
    params:
      new_value: "[US_PASSPORT]"

  PERSON:
    operator: replace
    params:
      new_value: "[PERSON]"

  PHONE_NUMBER:
    operator: replace
    params:
      new_value: "[PHONE]"

  EMAIL_ADDRESS:
    operator: replace
    params:
      new_value: "[EMAIL]"

  LOCATION:
    operator: replace
    params:
      new_value: "[LOCATION]"

  DATE_TIME:
    operator: replace
    params:
      new_value: "[DATE]"

  IP_ADDRESS:
    operator: replace
    params:
      new_value: "[IP]"

  DEFAULT:
    operator: replace
    params:
      new_value: "<PII>"

Examples

Example 1: pii_example.py and pii_example.yaml

Purpose

A minimal interactive demo that lets you enter queries via the terminal.

It's ideal for understanding how PII masking integrates into a live session and how placeholder substitution works in real time.

This uses:

  • DistillerClient (synchronous wrapper)
  • A simple Echoing Agent
  • A project config defined in pii_example.yaml, including masking rules

How It Works

  1. You authenticate and create a new project using pii_example.yaml.
  2. You register an Echoing Agent, which simply returns your masked input.
  3. You can interactively enter text, and the PII masking is handled before anything reaches the agent.
  4. The masked response is printed, and frontend demasking (in memory only) restores original values if needed.

pii_example.py

# pii_example.py

import os
from typing import Any, Awaitable, Callable, Dict, Union, cast
from air import DistillerClient, login

# Authenticate
auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

async def echoing_agent(query: str) -> str:
    """A minimal agent that just echoes queries. PII masking is handled by DistillerClient before this."""
    return f"Processed query:\n{query}"

def interactive():
    """Launch interactive demo with registered simple agent."""
    distiller_client = DistillerClient()
    distiller_client.create_project(config_path="pii_example.yaml", project="pii-demo")

    executor_dict = {"Echoing Agent": echoing_agent}

    distiller_client.interactive(
        project="pii-demo",
        uuid="some-uuid",
        executor_dict=cast(Dict[str, Union[Callable[..., Any], Dict[str, Callable[..., Any]]]], executor_dict),
    )

if __name__ == "__main__":
    print("\n[PII Demo] Interactive Mode")
    interactive()

pii_example.yaml

orchestrator:
  agent_list:
    - agent_name: "Echoing Agent"

utility_agents:
  - agent_class: CustomAgent
    agent_name: "Echoing Agent"
    agent_description: "This agent receives a query with sensitive information already masked by the distiller client and either responds or echoes your query."
    config:
      output_style: "conversational"

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - PERSON
        - PHONE_NUMBER
        - EMAIL_ADDRESS
        - CREDIT_CARD
        - US_SSN
        - US_BANK_NUMBER
        - US_PASSPORT
        - LOCATION
        - DATE_TIME
        - IP_ADDRESS

      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params: { new_value: "[EMAIL]" }
        PERSON:
          operator: replace
          params: { new_value: "[PERSON]" }
        PHONE_NUMBER:
          operator: replace
          params: { new_value: "[PHONE]" }
        CREDIT_CARD:
          operator: replace
          params: { new_value: "[CREDIT_CARD]" }
        US_SSN:
          operator: replace
          params: { new_value: "[US_SSN]" }
        US_BANK_NUMBER:
          operator: replace
          params: { new_value: "[US_BANK_NUMBER]" }
        US_PASSPORT:
          operator: replace
          params: { new_value: "[US_PASSPORT]" }
        LOCATION:
          operator: replace
          params: { new_value: "[LOCATION]" }
        DATE_TIME:
          operator: replace
          params: { new_value: "[DATE]" }
        IP_ADDRESS:
          operator: replace
          params: { new_value: "[IP]" }

Example 2: pii_search_example.py and pii_search_example.yaml

Purpose

This example is designed for scripted testing, where a batch of hardcoded queries is sent to an agent.

You can observe how each sensitive element is masked, and how the system behaves across multiple PII types.

It uses:

  • AsyncDistillerClient
  • A simple SearchAgent
  • The same PII masking engine and configuration logic as Example 1

Flexible Modes

The script supports two modes:

  • Demo mode (enabled by default) — runs through sample queries programmatically
  • Interactive mode — comment out the demo and uncomment the interactive section at the bottom to run it live.

pii_search_example.py

# pii_search_example.py

import asyncio, os, uuid
from typing import Any, Awaitable, Callable, Dict, Union, cast
from air import login
from air.distiller.client import AsyncDistillerClient

# Authenticate
auth = login(account=str(os.getenv("ACCOUNT")), api_key=str(os.getenv("API_KEY")))

async def search_agent(query: str) -> str:
    """Defining a search agent to test PII masking, which is handled by DistillerClient before this."""
    return f"Processed query:\n{query}"

async def pii_demo():
    queries = [
        "Hi, I'm Henry. My number is 4111 1111 1111 1111.",
        "Can you book a meeting with Dr. Jane Doe at (212) 555-7890 on May 4th?",
        "The IP address 192.168.0.1 should be allowed in the firewall.",
        "Email my updated resume to recruiter@company.com.",
        "Her SSN is 123-45-6789 and passport is X1234567.",
    ]

    distiller_client = AsyncDistillerClient()
    distiller_client.create_project(config_path="pii_search_example.yaml", project="pii-demo")
    session_id = str(uuid.uuid4())

    await distiller_client.connect(
        project="pii-demo",
        uuid=session_id,
        executor_dict={"Search Agent": search_agent},
    )

    print("\n[PII Demo] Running Sample Queries\n")

    for i, query in enumerate(queries, 1):
        print(f"Query {i}:\nOriginal: {query}")
        try:
            responses = await distiller_client.query(query)
            async for response in responses:
                print(f"Masked Output:\n{response['content']}\n{'-'*50}")
        except Exception as e:
            print(f"[ERROR] Failed to process query {i}: {e}")
            print("-" * 50)

    await distiller_client.close()

def interactive():
    distiller_client = AsyncDistillerClient()
    distiller_client.create_project(config_path="pii_search_example.yaml", project="pii-demo")
    executor_dict = {"Search Agent": search_agent}
    distiller_client.interactive(
        project="pii-demo",
        uuid="some-uuid",
        executor_dict=cast(Dict[str, Union[Callable[..., Any], Dict[str, Callable[..., Any]]]], executor_dict),
    )

if __name__ == "__main__":
    print("\n[PII Demo] Sample Queries")
    asyncio.run(pii_demo())

    # To try live interaction, comment out the line above and uncomment the next lines:
    # print("\n[PII Demo] Interactive Mode")
    # interactive()

pii_search_example.yaml

orchestrator:
  agent_list:
    - agent_name: "Search Agent"

utility_agents:
  - agent_class: SearchAgent
    agent_name: "Search Agent"
    agent_description: "This agent receives a query with or without sensitive information already masked by the distiller client, performs searches and replies to user."
    config:
      output_style: "conversational"

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - PERSON
        - PHONE_NUMBER
        - EMAIL_ADDRESS
        - CREDIT_CARD
        - US_SSN
        - US_BANK_NUMBER
        - US_PASSPORT
        - LOCATION
        - DATE_TIME
        - IP_ADDRESS

      entity_operator_mapping:
        EMAIL_ADDRESS:
          operator: replace
          params: { new_value: "[EMAIL]" }
        PERSON:
          operator: replace
          params: { new_value: "[PERSON]" }
        PHONE_NUMBER:
          operator: replace
          params: { new_value: "[PHONE]" }
        CREDIT_CARD:
          operator: replace
          params: { new_value: "[CREDIT_CARD]" }
        US_SSN:
          operator: replace
          params: { new_value: "[US_SSN]" }
        US_BANK_NUMBER:
          operator: replace
          params: { new_value: "[US_BANK_NUMBER]" }
        US_PASSPORT:
          operator: replace
          params: { new_value: "[US_PASSPORT]" }
        LOCATION:
          operator: replace
          params: { new_value: "[LOCATION]" }
        DATE_TIME:
          operator: replace
          params: { new_value: "[DATE]" }
        IP_ADDRESS:
          operator: replace
          params: { new_value: "[IP]" }

For reference

Example | Mode | Client Used | Purpose
pii_example.py | Interactive | DistillerClient | Try queries manually
pii_search_example.py | Scripted (or Interactive) | AsyncDistillerClient | Batch-test masking behavior across PII types; try queries manually with a more complex agent

Example Interaction

Input:

Hey, please call me at (212) 555-8124 and send the report to john.doe@company.com.

PII Identified:

[PII MASKING] Detected and masked the following PII types:
 - PHONE_NUMBER at [24:38] -> '(212) 555-8124' -> [PHONE_1]
 - EMAIL_ADDRESS at [67:89] -> 'john.doe@company.com' -> [EMAIL_1]

Masking by PIIHandler.mask_text():

Hey, please call me at [PHONE_1] and send the report to [EMAIL_1].

Agent Output:

Processed query:
Hey, please call me at [PHONE_1] and send the report to [EMAIL_1].

Unmasked View (frontend-only):

Processed query:
Hey, please call me at (212) 555-8124 and send the report to john.doe@company.com.

This view is reconstructed locally in memory using metadata saved during masking. Demasking is available only for the duration of the session, and the restored text is never persisted or sent to any backend.

Supported PII Types and Operators

Supported PII Types

The PII masking module leverages Microsoft Presidio to detect a broad range of commonly regulated or sensitive data types. All supported types must be explicitly listed in the YAML config under common_entities.

Entity Type | Placeholder Format | Example Match | Description
EMAIL_ADDRESS | [EMAIL_1] | john.doe@example.com | Email addresses
PHONE_NUMBER | [PHONE_1] | (212) 555-8124 | US or international phone numbers
PERSON | [PERSON_1] | Jane Doe | First and last names
CREDIT_CARD | [CREDIT_CARD_1] | 4111 1111 1111 1111 | Visa/Mastercard/Amex credit cards
US_SSN | [US_SSN_1] | 123-45-6789 | U.S. Social Security Numbers
US_BANK_NUMBER | [US_BANK_NUMBER_1] | 987654321 | U.S. bank account numbers
US_PASSPORT | [US_PASSPORT_1] | X1234567 | U.S. passport numbers
LOCATION | [LOCATION_1] | 1600 Amphitheatre Parkway | Physical address, city, state, ZIP
DATE_TIME | [DATE_1] | May 4th, 01/01/2024 | Absolute or relative dates and times
IP_ADDRESS | [IP_1] | 192.168.0.1, 2001:db8::1 | IPv4 and IPv6 addresses

To activate detection for a type, include it under common_entities in your YAML config. The default pii_handler.yaml and the examples already include all types above.

Supported PII Operators

Each entity type can be individually configured in the YAML using one of the supported operators below. You define the operator under entity_operator_mapping.

replace

  • Replaces the original PII with a structured placeholder (e.g., [EMAIL_1])
  • Default behavior if not specified

EMAIL_ADDRESS:
  operator: replace
  params:
    new_value: "[EMAIL]"

redact

  • Completely removes the PII from the text (no placeholder left behind)

PHONE_NUMBER:
  operator: redact

Input:

Call me at (212) 555-8124

Masked:

Call me at

hash

  • Replaces the original PII with a hashed representation (irreversible)

US_SSN:
  operator: hash

Input:

SSN is 123-45-6789

Masked:

SSN is 7e7cf1d9dcd21e...
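
To make the difference between the three operators concrete, here is a minimal sketch of what each one does to a single detected value. It is not the module's implementation: apply_operator is a hypothetical helper, SHA-256 is an assumed hash algorithm (this document does not say which one is used), and the session-indexed numbering of placeholders is left out.

```python
import hashlib


def apply_operator(value, operator, params=None):
    """Apply one masking operator to one detected PII value (illustrative only)."""
    params = params or {}
    if operator == "replace":
        # Substitute a fixed placeholder; falls back to "<PII>" when none is given.
        return params.get("new_value", "<PII>")
    if operator == "redact":
        # Remove the value entirely, leaving nothing behind.
        return ""
    if operator == "hash":
        # One-way digest: the original cannot be recovered from the output.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"Unsupported operator: {operator}")


masked_email = apply_operator("john.doe@company.com", "replace", {"new_value": "[EMAIL]"})
masked_phone = apply_operator("(212) 555-8124", "redact")
masked_ssn = apply_operator("123-45-6789", "hash")
print(masked_email, repr(masked_phone), masked_ssn[:14] + "...")
```

Only replace leaves something that a placeholder mapping can reverse; redact and hash discard the link back to the original value, which is why they suit indexing and audit scenarios rather than display.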

DEFAULT Handler (Fallback)

To apply a global fallback to any undefined entity type, use the DEFAULT key:

DEFAULT:
  operator: replace
  params:
    new_value: "<PII>"

If Presidio detects an entity type not explicitly listed in entity_operator_mapping, this operator will apply.
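
The lookup order implied here can be sketched as a two-step dictionary access. The operator_for function is a hypothetical illustration of the rule, not a function exposed by the SDK.

```python
def operator_for(entity_type, entity_operator_mapping):
    """Pick the entity-specific operator, else the DEFAULT entry, else None."""
    return entity_operator_mapping.get(
        entity_type, entity_operator_mapping.get("DEFAULT")
    )


mapping = {
    "EMAIL_ADDRESS": {"operator": "replace", "params": {"new_value": "[EMAIL]"}},
    "DEFAULT": {"operator": "replace", "params": {"new_value": "<PII>"}},
}

email_rule = operator_for("EMAIL_ADDRESS", mapping)
iban_rule = operator_for("IBAN_CODE", mapping)  # not listed, so DEFAULT applies
print(email_rule["params"]["new_value"], iban_rule["params"]["new_value"])
```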

Advanced Customization

The PII Masking Module is highly flexible and allows you to tailor both which entities to detect and how to handle them. All customizations are centralized in the same YAML configuration file used for the agent orchestration (e.g., pii_example.yaml or pii_search_example.yaml), under base_config.pii_masking.

Adding More Entities

If Presidio supports additional PII types (e.g., IBAN_CODE, MEDICAL_LICENSE, or custom recognizers), you can extend your config:

base_config:
  pii_masking:
    enable: True
    config:
      common_entities:
        - IBAN_CODE
        - MEDICAL_LICENSE
        - PERSON

Make sure to also define masking behavior:

entity_operator_mapping:
  IBAN_CODE:
    operator: hash
  MEDICAL_LICENSE:
    operator: redact

You can find the full list of built-in entity types in Presidio's documentation.
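
When extending the entity list, a quick consistency check on the config can catch entities that were added without a masking rule. The sketch below is a hypothetical lint-style helper (check_pii_config is not part of the SDK); whether the SDK itself treats an unmapped entity as an error is not stated here, so the function only reports it.

```python
SUPPORTED_OPERATORS = {"replace", "redact", "hash"}


def check_pii_config(config):
    """Return a list of human-readable problems found in a pii_masking config dict."""
    problems = []
    mapping = config.get("entity_operator_mapping", {})
    for entity in config.get("common_entities", []):
        if entity not in mapping and "DEFAULT" not in mapping:
            problems.append(f"{entity}: no operator mapping and no DEFAULT fallback")
    for entity, spec in mapping.items():
        if spec.get("operator") not in SUPPORTED_OPERATORS:
            problems.append(f"{entity}: unsupported operator {spec.get('operator')!r}")
    return problems


# Mirrors the two config fragments shown above, loaded as a dictionary.
extended_config = {
    "common_entities": ["IBAN_CODE", "MEDICAL_LICENSE", "PERSON"],
    "entity_operator_mapping": {
        "IBAN_CODE": {"operator": "hash"},
        "MEDICAL_LICENSE": {"operator": "redact"},
    },
}

problems = check_pii_config(extended_config)
print(problems)
```

In this example PERSON is flagged because it appears in common_entities but has neither its own operator nor a DEFAULT entry to fall back on.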

Defining Custom Operators or Placeholder Formats

You may redefine any placeholder format per entity by customizing new_value:

EMAIL_ADDRESS:
  operator: replace
  params:
    new_value: "<<email>>"

Or enable hashing for irreversible masking:

CREDIT_CARD:
  operator: hash

Or remove PII altogether (no placeholder shown):

LOCATION:
  operator: redact

Creating Multiple YAML Variants

You can maintain multiple config files (e.g., pii_example.yaml, pii_search_example.yaml, pii_strict.yaml) with different combinations of:

  • Enabled/disabled masking
  • Different entity sets
  • Operator schemes
  • Agent configurations

Then pass the desired YAML to create_project(config_path=...) when registering your project.
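
One possible way to switch between variants is to select the config path from a named profile before calling create_project. This is only a sketch: the profile names, the PII_PROFILE environment variable, and select_config_path are all hypothetical, and the file names are the examples mentioned above.

```python
import os

# Hypothetical profile names mapped to the example config files named above.
CONFIG_VARIANTS = {
    "echo": "pii_example.yaml",
    "search": "pii_search_example.yaml",
    "strict": "pii_strict.yaml",
}


def select_config_path(profile=None):
    """Resolve a profile name (argument first, then PII_PROFILE env var) to a YAML path."""
    profile = profile or os.getenv("PII_PROFILE", "strict")
    if profile not in CONFIG_VARIANTS:
        raise ValueError(f"Unknown PII profile: {profile!r}")
    return CONFIG_VARIANTS[profile]


config_path = select_config_path("search")
print(config_path)
# The result would then be passed along, e.g. create_project(config_path=config_path, ...)
```

Defaulting to the strictest variant when nothing is specified keeps an unconfigured environment on the safe side.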

Use Case Matrix

Below is a guide to help you decide when to use PII masking and how to configure it:

Use Case | Masking Enabled | Recommended Operator | Why This Matters
Production inference | Yes | replace | Prevents raw PII from reaching logs, models, or monitoring agents
Internal debugging | Optional | - | Devs can see original inputs for issue diagnosis
Compliance audits | Yes | replace, hash | Shows evidence of redaction while retaining traceability
External demo/showcases | Yes | replace | Guarantees privacy-safe interactions during live sessions
QA & annotation tooling | Optional | replace, redact | Keep data semi-anonymized during human reviews
Analytics dashboards | Yes | replace, redact | Prevents PII leakage into metrics or reporting tools
Sensitive search indexing | Yes | hash, redact | Allows indexing without storing personal data