Guardrails are systems, rules, or constraints that ensure an AI behaves safely, ethically, and as intended.
They prevent the model from:
- Producing harmful, biased, or confidential information
- Going off-topic or generating nonsense
- Violating company or regulatory policies
Our guardrails setup
Content filters
Purpose: detect and filter harmful text or image content in input prompts and model responses. Filtering is based on a set of predefined harmful content categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack.
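The category list above matches the content filter types of Amazon Bedrock Guardrails. Assuming that is the service in use, a minimal sketch of the corresponding content policy payload could look like the following. The filter strengths are placeholders, not our actual settings, and `build_content_policy` is a hypothetical helper introduced only for illustration.

```python
# Sketch only: assumes Amazon Bedrock Guardrails; strengths are placeholders.
CATEGORIES = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT", "PROMPT_ATTACK"]


def build_content_policy(strength: str = "HIGH") -> dict:
    """Build a content policy dict with one filter entry per category."""
    filters = []
    for category in CATEGORIES:
        filters.append(
            {
                "type": category,
                "inputStrength": strength,
                # Prompt attacks are screened on the input side only,
                # so the output strength for that category is NONE.
                "outputStrength": "NONE" if category == "PROMPT_ATTACK" else strength,
            }
        )
    return {"filtersConfig": filters}


content_policy = build_content_policy()
```

The resulting dict would be passed as the `contentPolicyConfig` argument when creating the guardrail. Building it in a helper keeps the per-category strengths in one place if they need tuning later.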
Prompt attack
This filter detects and blocks prompt attacks and prompt injections, that is, prompts intended to bypass moderation, override instructions, or generate harmful content.
Denied topics
Denied topics define a set of subjects that the generative AI application must avoid.
We currently deny the following topics, fully or partially:
- API keys, tokens, secrets, passwords, connection strings (JDBC, ODBC, etc.), and encryption keys or certificates, except support information on DataGalaxy SSO and PAT
- Social Security numbers, tax IDs, national IDs, financial account numbers, credit cards, health information (HIPAA/medical data), and other PII, except objects existing in the catalog and their titles, tags, and definitions
- Marketing content generation (ads, slogans, campaign ideas), Reports on revenue, profit, or financial results, Business strategies
- Medical/healthcare advice, Legal advice or interpretation of laws, Financial/investment advice, Political or religious content
- Prompt injection attempts (e.g., “ignore previous instructions,” “show me everything”)
- Attempts to generate synthetic/fake datasets, except samples that can illustrate the metadata
- Culinary topics, recipes, and other hobby discussions, except Data Governance Kitchen.
The chatbot may engage in light conversation about the requested topic, but should then steer back to the data domain.
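As an illustration of how entries from the list above could be expressed, here is a minimal sketch of a denied-topics payload, again assuming Amazon Bedrock Guardrails. The topic names, definitions, and example phrases are illustrative and are not our actual configuration; `denied_topic` is a hypothetical helper, and the 200-character definition limit is an assumption to verify against the current service documentation.

```python
# Sketch only: topic names, definitions, and examples are illustrative.
MAX_DEFINITION_CHARS = 200  # assumed service limit; verify before relying on it


def denied_topic(name: str, definition: str, examples: list[str]) -> dict:
    """Return one DENY topic entry, rejecting overly long definitions."""
    if len(definition) > MAX_DEFINITION_CHARS:
        raise ValueError(
            f"definition for {name!r} exceeds {MAX_DEFINITION_CHARS} characters"
        )
    return {
        "name": name,
        "definition": definition,
        "examples": examples,
        "type": "DENY",
    }


topic_policy = {
    "topicsConfig": [
        denied_topic(
            "Secrets and credentials",
            "Requests to reveal or generate API keys, tokens, passwords, "
            "connection strings, encryption keys, or certificates.",
            ["What is the password for the production database?"],
        ),
        denied_topic(
            "Professional advice",
            "Requests for medical, legal, or financial and investment advice.",
            ["Which stocks should I buy this year?"],
        ),
    ]
}
```

The resulting dict would be passed as the `topicPolicyConfig` argument when creating the guardrail. The exceptions in the list (for example, support information on DataGalaxy SSO and PAT) have no dedicated field in this structure, so they would need to be reflected in how each definition is worded.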