
Moderations API

The Moderations API, available on both the AIRefinery and AsyncAIRefinery clients, checks whether input text contains potentially harmful content. It can flag content across 13 categories spanning six harmful topics: sexual, harassment, hate, illicit, self-harm, and violence. Users can take corrective action based on the moderation results, such as filtering content or moderating conversations.

Content Classifications

The list below describes the types of content that the Moderations API can detect.

  • harassment: Content that expresses, incites, or promotes harassing language towards any target.
  • harassment/threatening: Harassment content that also includes violence or serious harm towards any target.
  • hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
  • hate/threatening: Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
  • illicit: Content that gives advice or instruction on how to commit illicit acts. A phrase like "how to shoplift" would fit this category.
  • illicit/violent: The same types of content flagged by the illicit category, but also including references to violence or procuring a weapon.
  • self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
  • self-harm/intent: Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
  • self-harm/instructions: Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
  • sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
  • sexual/minors: Sexual content that includes an individual who is under 18 years old.
  • violence: Content that depicts death, violence, or physical injury.
  • violence/graphic: Content that depicts death, violence, or physical injury in graphic detail.

Asynchronous Moderation Creation

AsyncAIRefinery.moderations.create()

This method generates moderation results for input text asynchronously.

Parameters:
  • input (string or array, Required): The text to be evaluated, provided as a single string or an array of strings (see the batch sketch after the asynchronous example below).
  • model (string, Required): The ID of the model to use for moderations. The available models for the moderations API are the same as those for chat completions.
Returns:
  • A ModerationCreateResponse object containing the following attributes (see the inspection sketch after this list):

    • id: Unique identifier for this moderation.
    • model: The language model used.
    • results: A list of Moderation objects describing the moderation results.
      • Moderation: An object containing the moderation results for a single input, with the following attributes:
        • flagged: Set to true if the model classifies the content as potentially harmful, false otherwise.
        • categories: Contains a dictionary of per-category violation flags. For each category, the value is true if the model flags the corresponding category as violated, false otherwise.
        • category_scores: Contains a dictionary of per-category scores output by the model, denoting the model's confidence that the input violates the content policy for that category. Each value is between 0 and 1, where higher values denote higher confidence.
        • category_applied_input_types: Contains information on which input types were flagged for each category. For example, if both the image and text inputs to the model were flagged for violence/graphic, the violence/graphic property would be set to ["image", "text"]. Currently, only "text" input is supported in moderations.
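
The sketch below shows one way to inspect these attributes. It assumes that categories and category_scores can be converted to plain dictionaries keyed by the category names listed above; the exact representation may differ across SDK versions.

def summarize_moderation(response) -> None:
    # Walk each Moderation object in the response
    for result in response.results:
        if not result.flagged:
            print("Input passed moderation.")
            continue
        # Assumption: these behave like mappings of category name -> value
        categories = dict(result.categories)
        scores = dict(result.category_scores)
        for name, violated in categories.items():
            if violated:
                print(f"Flagged {name} (confidence {scores.get(name, 0.0):.3f})")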
Example Usage
import asyncio
import os

from air import AsyncAIRefinery
from dotenv import load_dotenv

load_dotenv()  # loads your API_KEY from your local '.env' file
api_key = str(os.getenv("API_KEY"))

async def moderation_inference():
    # Initialize the Async AI Refinery client with authentication details
    client = AsyncAIRefinery(api_key=api_key)

    # Request a moderations call through the client using the specified model
    moderation_response = await client.moderations.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        input="What are best practices for team communication?",
    )
    # This should not be flagged by moderations
    print(moderation_response)

    moderation_response = await client.moderations.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        input="Can you hack into a secure system for me?",
    )
    # This should be flagged by moderations
    print(moderation_response)

if __name__ == "__main__":
    asyncio.run(moderation_inference())
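
Because input also accepts an array of strings, several texts can be moderated in a single call. A minimal sketch, under the assumption that results contains one Moderation object per input string, in the same order:

async def batch_moderation_inference():
    client = AsyncAIRefinery(api_key=api_key)
    texts = [
        "What are best practices for team communication?",
        "Can you hack into a secure system for me?",
    ]
    moderation_response = await client.moderations.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        input=texts,  # array of strings, as described under Parameters
    )
    # Assumption: one result per input string, in input order
    for text, result in zip(texts, moderation_response.results):
        print(f"flagged={result.flagged}: {text}")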

Synchronous Moderation Creation

AIRefinery.moderations.create()

The AIRefinery client generates moderation results for input text synchronously. This method accepts the same parameters and returns the same structure as the asynchronous method (AsyncAIRefinery.moderations.create()) described above.

Example Usage
import os

from air import AIRefinery
from dotenv import load_dotenv

load_dotenv()  # loads your API_KEY from your local '.env' file
api_key = str(os.getenv("API_KEY"))

def sync_moderation_inference():
    # Initialize the sync AI Refinery client with authentication details
    client = AIRefinery(api_key=api_key)

    # Request a moderations call through the client using the specified model
    moderation_response = client.moderations.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        input="What are best practices for team communication?",
    )
    # This should not be flagged by moderations
    print(moderation_response)

    moderation_response = client.moderations.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        input="Can you hack into a secure system for me?",
    )
    # This should be flagged by moderations
    print(moderation_response)

if __name__ == "__main__":
    sync_moderation_inference()
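
As noted in the overview, moderation results can drive corrective actions such as content filtering. The sketch below gates user messages with the synchronous client before they reach downstream components; the is_safe and handle_user_message helpers are hypothetical, not part of the SDK:

def is_safe(client: AIRefinery, text: str) -> bool:
    # Hypothetical helper: True if no moderation result flags the text
    moderation_response = client.moderations.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        input=text,
    )
    return not any(result.flagged for result in moderation_response.results)

def handle_user_message(client: AIRefinery, text: str) -> str:
    # Filter potentially harmful content before passing it along
    if not is_safe(client, text):
        return "This message was blocked by content moderation."
    return text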