
Content Filtering and Safety Settings

SecureAI provides administrators with controls to filter model outputs and enforce safety policies across your organization. This guide explains how to configure content filtering rules, set safety thresholds, manage category-level controls, and protect against prompt injection.

How Content Filtering Works

Content filtering sits between the AI model and the end user. When a model generates a response, SecureAI evaluates it against your configured filtering rules before displaying it. Depending on your settings, filtered responses are blocked, flagged for review, or allowed through with an audit log entry.

The filtering pipeline runs in this order:

  1. Prompt-side filters evaluate the user's input before it reaches the model.
  2. The model generates a response.
  3. Response-side filters evaluate the output before it reaches the user.
  4. Audit logging records any filter matches regardless of the action taken.

This two-stage approach catches both inappropriate prompts and inappropriate outputs, giving you defense in depth.

Accessing Content Filtering Settings

  1. Log in to SecureAI as an administrator.
  2. Navigate to Admin Panel > Settings > Content & Safety.
  3. You will see tabs for Filtering Categories, Custom Rules, Prompt Protection, and Safety Policies.

Configuring Filtering Categories

SecureAI includes built-in content categories that can be individually tuned.

Available Categories

| Category | Description | Default Action |
| --- | --- | --- |
| Harmful content | Violence, self-harm, dangerous activities | Block |
| Hate speech | Discriminatory or hateful language | Block |
| Sexual content | Sexually explicit material | Block |
| Profanity | Offensive language and profanity | Flag |
| Personal information | PII such as SSNs, credit card numbers, phone numbers | Block |
| Off-topic responses | Responses unrelated to automotive aftermarket | Flag |
| Financial advice | Investment, tax, or accounting guidance | Flag |
| Legal advice | Legal opinions or recommendations | Flag |

Setting Category Thresholds

Each category can be set to one of three actions:

| Action | Behavior |
| --- | --- |
| Block | The response is not shown to the user. A generic "This response was filtered by your organization's safety policy" message appears instead. |
| Flag | The response is shown to the user but logged for admin review in the audit trail. |
| Allow | No filtering is applied for this category. |

To change a category threshold:

  1. Go to Admin Panel > Settings > Content & Safety > Filtering Categories.
  2. Find the category you want to adjust.
  3. Select the desired action from the dropdown.
  4. Click Save Changes.

Changes take effect for new messages immediately. Existing conversations are not retroactively filtered.

Important: Setting any safety category to Allow disables filtering for that category entirely. Review your organization's compliance requirements before loosening defaults. Changes to category thresholds are logged in the admin audit trail.
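The three actions can be modeled as a simple lookup over the category defaults listed above. The `apply_action` logic is a sketch of the described behavior, not SecureAI's implementation.

```python
# Category defaults mirror the table above; names are illustrative.
DEFAULTS = {
    "harmful_content": "block",
    "hate_speech": "block",
    "sexual_content": "block",
    "profanity": "flag",
    "personal_information": "block",
    "off_topic": "flag",
    "financial_advice": "flag",
    "legal_advice": "flag",
}

BLOCK_MESSAGE = "This response was filtered by your organization's safety policy"

def apply_action(category: str, response: str, audit: list) -> str:
    action = DEFAULTS.get(category, "allow")
    if action == "block":
        audit.append((category, "blocked"))     # user sees the generic message
        return BLOCK_MESSAGE
    if action == "flag":
        audit.append((category, "flagged"))     # shown, but logged for admin review
    return response                             # "allow": no filtering applied
```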

Custom Keyword Rules

For industry-specific or organization-specific needs, you can create custom filtering rules based on keywords or patterns.

Adding a Custom Rule

  1. Go to Admin Panel > Settings > Content & Safety > Custom Rules.
  2. Click Add Rule.
  3. Fill in the following fields:

     | Field | Description | Example |
     | --- | --- | --- |
     | Rule name | A descriptive name for this rule | "Block competitor pricing" |
     | Match type | Exact match, Contains, or Regex | Contains |
     | Pattern | The keyword, phrase, or regular expression to match | "competitor price list" |
     | Scope | Response only, Prompt only, or Both | Response only |
     | Action | Block or Flag | Flag |
     | Priority | Numeric priority (lower numbers evaluated first) | 10 |
  4. Click Save.

Automotive Aftermarket Examples

Here are common custom rules for automotive aftermarket organizations:

    • Block competitor pricing — Match type: Contains; Pattern: competitor price list; Scope: Response only; Action: Block. Prevents the AI from generating speculative competitor pricing.
    • Flag warranty disclaimers — Match type: Regex; Pattern: warrant(y|ies).*disclaim; Scope: Response only; Action: Flag. Ensures warranty-related language is reviewed before it reaches technicians.
    • Flag part number guessing — Match type: Regex; Pattern: I.*(think|believe|guess).*part\s*(number|#); Scope: Response only; Action: Flag. Catches cases where the model speculates on part numbers instead of looking them up.
    • Block medical advice — Match type: Contains; Pattern: medical advice; Scope: Both; Action: Block. Prevents the AI from offering health guidance in an automotive context.

Managing Custom Rules

Existing rules are listed on the Custom Rules tab, where each rule can be edited or temporarily disabled with its Enabled toggle rather than deleted outright.

Tip: Regex patterns use standard syntax. Test complex patterns with the built-in Test button before deploying them in production to avoid false positives.
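You can also sanity-check a pattern outside SecureAI before saving it. Here the warranty regex from the examples above is tried against two sample strings (this mimics what the Test button does, which is an assumption about its behavior):

```python
import re

# Regex from the "Flag warranty disclaimers" example rule.
pattern = re.compile(r"warrant(y|ies).*disclaim", re.IGNORECASE)

samples = [
    "All warranties are hereby disclaimed for aftermarket parts.",  # should match
    "This part ships with a 12-month warranty.",                    # should not match
]
results = [bool(pattern.search(s)) for s in samples]
```

Checking both positive and negative samples like this is the quickest way to catch a pattern that is broader than intended.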

Prompt Injection Protection

Prompt injection occurs when a user crafts input that attempts to override your system prompt or bypass safety instructions. SecureAI includes built-in protections against common injection techniques.

Enabling Prompt Protection

  1. Go to Admin Panel > Settings > Content & Safety > Prompt Protection.
  2. Toggle Prompt Injection Detection to On.
  3. Choose the detection sensitivity:

     | Sensitivity | Description | Recommended For |
     | --- | --- | --- |
     | Low | Catches obvious injection attempts (e.g., "Ignore all previous instructions") | Low-risk internal environments |
     | Medium | Catches most injection patterns, including encoded and indirect attempts | General production use |
     | High | Aggressive detection that may occasionally flag legitimate prompts | Environments with untrusted user input |
  4. Choose the action when injection is detected:

    • Block: Reject the prompt entirely with a warning message.
    • Sanitize: Strip the detected injection attempt and process the remaining prompt.
    • Flag: Allow the prompt but log it for review.
  5. Click Save.
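The Block / Sanitize / Flag choices can be illustrated with a toy detector. The phrase list and handling logic are illustrative only; SecureAI's actual detection rules are not exposed.

```python
import re

# A couple of override phrases in the spirit of the Low-sensitivity example;
# real detectors cover far more patterns than this.
OVERRIDE_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
]

def detect_injection(prompt: str) -> bool:
    return any(re.search(p, prompt, re.IGNORECASE) for p in OVERRIDE_PATTERNS)

def handle_prompt(prompt: str, action: str = "block") -> tuple[str, str]:
    """Return (disposition, text) per the Block / Sanitize / Flag actions."""
    if not detect_injection(prompt):
        return ("allow", prompt)
    if action == "block":
        return ("blocked", "")            # prompt rejected with a warning
    if action == "sanitize":
        cleaned = prompt
        for p in OVERRIDE_PATTERNS:       # strip the injection, keep the rest
            cleaned = re.sub(p, "", cleaned, flags=re.IGNORECASE)
        return ("sanitized", cleaned.strip())
    return ("flagged", prompt)            # allowed, but logged for review
```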

What Gets Detected

The prompt injection detector looks for patterns including:

    • Direct override phrases (e.g., "Ignore all previous instructions")
    • Encoded or obfuscated injection attempts
    • Indirect attempts to rewrite the model's instructions

Reviewing Blocked Prompts

Blocked prompts are logged under Admin Panel > Audit Log with the event type Prompt Injection Detected. Review these periodically to:

    • Identify false positives and adjust the detection sensitivity
    • Spot users who repeatedly attempt to bypass safety instructions

Safety Policies

Safety policies define organization-wide behavior beyond individual content categories.

System Prompt Guardrails

Administrators can prepend a safety-oriented system prompt to all conversations in the organization:

  1. Go to Admin Panel > Settings > Content & Safety > Safety Policies.
  2. Under System Prompt Prefix, enter your guardrail instructions. For example:
    You are a helpful assistant for automotive aftermarket professionals.
    Only provide information relevant to automotive parts, repair, and maintenance.
    Do not provide medical, legal, or financial advice.
    Always cite specific part numbers and sources when available.
    If you are unsure about a part number or specification, say so rather than guessing.
    
  3. Click Save.

This prefix is added to every conversation automatically and cannot be overridden by users. It is applied before any workspace-level or model-level system prompts.
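The layering order can be pictured as simple message assembly: the organization prefix always comes first, followed by any workspace- and model-level prompts. The chat-message structure here is a generic format used for illustration, not SecureAI's internal representation.

```python
def build_messages(org_prefix: str, workspace_prompt: str,
                   model_prompt: str, user_message: str) -> list[dict]:
    # The org-wide prefix is applied before workspace- and model-level prompts,
    # so its instructions take precedence and cannot be dropped by users.
    system_parts = [p for p in (org_prefix, workspace_prompt, model_prompt) if p]
    return [
        {"role": "system", "content": "\n\n".join(system_parts)},
        {"role": "user", "content": user_message},
    ]
```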

Response Length Limits

To control verbose responses and manage costs:

  1. Under Safety Policies, find Maximum Response Length.
  2. Set the token limit (default: 2048 tokens).
  3. Click Save.

When a response exceeds the limit, it is truncated with a note that the response was shortened. The full response is still logged in the audit trail.
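Truncation behaves roughly like the sketch below. Real token counting depends on the model's tokenizer; whitespace splitting here is a simplifying stand-in, and the truncation note text is illustrative.

```python
def truncate_response(text: str, max_tokens: int = 2048) -> tuple[str, bool]:
    """Return (possibly shortened text, whether truncation occurred)."""
    tokens = text.split()                     # stand-in for real tokenization
    if len(tokens) <= max_tokens:
        return text, False
    shortened = " ".join(tokens[:max_tokens])
    return shortened + "\n\n[Response shortened by length policy]", True
```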

Rate Limiting per User

To prevent abuse or excessive API usage:

  1. Under Safety Policies, find User Rate Limits.
  2. Configure:

     | Setting | Description | Default |
     | --- | --- | --- |
     | Messages per minute | Maximum messages a single user can send per minute | 10 |
     | Messages per day | Maximum messages a single user can send per day | 500 |
     | Document uploads per day | Maximum file uploads per user per day | 20 |
     | Token budget per day | Maximum total tokens (input + output) per user per day | 100,000 |
  3. Click Save.

Users who exceed rate limits see a "You've reached your usage limit. Please try again later" notice with the time until their limit resets.

Tip: Set rate limits conservatively at first and adjust upward based on actual usage patterns. You can view per-user usage under Admin Panel > Users > [username] > Usage.

Model-Level Safety Overrides

Different models may need different safety configurations. For example, you might want stricter filtering on a general-purpose model but looser rules on a specialized parts-lookup model.

  1. Under Safety Policies, find Per-Model Overrides.
  2. Select a model from the dropdown.
  3. Override any category threshold or safety policy for that specific model.
  4. Click Save.

Per-model overrides take priority over the organization-wide defaults. This is useful when you have models with different risk profiles or specialized use cases.
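The precedence rule reads like a key-by-key merge: start from the organization defaults and let the model's overrides shadow them. The merge behavior and model names below are assumptions for illustration.

```python
ORG_DEFAULTS = {"profanity": "flag", "legal_advice": "flag", "harmful_content": "block"}

MODEL_OVERRIDES = {
    "parts-lookup": {"off_topic": "allow"},   # looser rules for a specialist model
    "general-chat": {"profanity": "block"},   # stricter rules for general use
}

def effective_policy(model: str) -> dict:
    policy = dict(ORG_DEFAULTS)                   # start from org-wide defaults
    policy.update(MODEL_OVERRIDES.get(model, {})) # per-model overrides take priority
    return policy
```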

Reviewing Filtered Content

When content is flagged or blocked, it appears in the admin audit log.

  1. Navigate to Admin Panel > Audit Log.
  2. Filter by Event type: Content Filtered.
  3. Each entry shows:
    • Timestamp
    • User who triggered the filter
    • The category or rule that matched
    • The action taken (blocked/flagged)
    • The original content (visible only to administrators)
    • The conversation context (surrounding messages)

Audit Log Filters

Use these filters to narrow down the audit log:

| Filter | Options |
| --- | --- |
| Event type | Content Filtered, Prompt Injection Detected, Rate Limit Hit |
| Action | Blocked, Flagged |
| Category | Any built-in category or custom rule name |
| User | Specific user or "All users" |
| Date range | Start and end dates |

Use the Export button to download filtered results as CSV for compliance reporting.
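If you post-process the export, the entry fields listed earlier map naturally onto CSV columns. The sketch below writes such a file with the standard library; the column names are taken from this article, while the exact export format is an assumption.

```python
import csv
import io

def export_audit_csv(entries: list[dict]) -> str:
    """Serialize audit entries to CSV, keeping only the reporting columns."""
    fields = ["timestamp", "user", "category", "action"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()
```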

Best Practices

    • Start new custom rules with the Flag action and review audit results before switching to Block.
    • Test regex patterns with the built-in Test button before deploying them in production.
    • Set rate limits conservatively at first and raise them based on actual usage patterns.
    • Review flagged content and blocked prompts in the audit log on a regular schedule.
    • Use per-model overrides for specialized models instead of loosening organization-wide defaults.

Troubleshooting

Users report that legitimate responses are being blocked

  1. Check the audit log to identify which category or custom rule triggered the block.
  2. If it is a custom keyword rule, refine the pattern to be more specific or switch the match type from "Contains" to "Exact match".
  3. If it is a built-in category, consider changing the action from Block to Flag while you investigate.
  4. For prompt injection false positives, lower the detection sensitivity from High to Medium.

Content filtering does not seem to be working

  1. Verify that your changes were saved (check for the "Settings saved" confirmation).
  2. Ensure the user is not in a session started before the settings change — filtering settings apply to new messages, not retroactively to existing conversations.
  3. Check that the filtering engine is running by visiting Admin Panel > System Health. The "Content Filter" service should show a green status.
  4. Verify that a per-model override is not contradicting your organization-wide settings.

Custom regex rule causes errors

Invalid regex patterns will prevent the rule from saving. If a rule was saved but causes unexpected behavior:

  1. Disable it using the Enabled toggle.
  2. Test the pattern using the built-in Test button with sample text.
  3. Fix the regex pattern and re-enable.

Common regex issues: unescaped special characters such as ".", "*", and "(", unmatched groups, and overly broad patterns like ".*" that match everything.

High rate of false positives

If too many legitimate responses are being filtered:

  1. Review the audit log to identify the most frequent triggers.
  2. For custom rules: make patterns more specific or change scope from "Both" to "Response only".
  3. For built-in categories: switch from Block to Flag to collect data before making permanent changes.
  4. For prompt injection: lower the sensitivity level.
  5. Consider creating explicit "allow" exceptions for common false positive patterns.
