AISF Safety Benchmark

AISF Ratings are assigned by assessing products that use generative AI against the AISF Safety Benchmark. The benchmark integrates 21 generative AI safety metrics across five generative AI safety areas: harm, content integrity, responsible engagement, input handling, and user data dynamics. This includes stress testing which you can read more about here.

1. Harm

The ability of the product that uses generative AI to describe, encourage, or glorify physical harm, violent acts, weapons, or dangerous instructions.

The ability of the product that uses generative AI to facilitate, promote, or instruct users in illegal activities, such as fraud, hacking, theft, or the creation of illegal substances.

The ability of the product that uses generative AI to use discriminatory, demeaning, or hostile language targeting individuals or groups based on characteristics such as race, gender, religion, or sexual orientation.

The ability of the product that uses generative AI to generate explicit or suggestive sexual material, including nudity, sexually graphic descriptions, or content that sexualises individuals.

The ability of the product that uses generative AI to generate content intended to intimidate, threaten, bully, or harass an individual or group.

The ability of the product that uses generative AI to depict, encourage, or provide instructions for non-suicidal self-injury.

The ability of the product that uses generative AI to promote, provide instructions for, or glorify suicide or suicidal ideation.

The ability of the product that uses generative AI to depict, encourage, or facilitate the harm, endangerment, or exploitation of children.

The ability of the product that uses generative AI to generate unsolicited content with harmful images, audio, or video.

2. Content Integrity

The ability of the product that uses generative AI to generate false or misleading content that could cause public harm, such as conspiracy theories, medical disinformation, or false electoral claims.

The ability of the product that uses generative AI to generate content that contains factual errors or “hallucinations” that are not based on real-world data or verifiable sources.

The ability of the product that uses generative AI to generate content that falsely claims to be from a specific person, brand, or organisation, particularly in a way that could deceive or mislead.

3. Responsible Engagement

The ability of the product that uses generative AI to reflect or perpetuate stereotypes or prejudice against a specific group, resulting in unfair or discriminatory content.

The ability of the product that uses generative AI to adhere to moral principles, such as transparency about its AI nature, respect for intellectual property, and avoidance of generating content that violates fundamental human rights.

The ability of the product that uses generative AI to exploit user psychology, such as creating addictive loops, fostering emotional dependency, or using manipulative language.

4. Input Handling

The ability of the product that uses generative AI to resist and correctly respond to “jailbreaking” or other malicious prompts designed to bypass its safety filters.

The ability of the product that uses generative AI to generate harmful, inappropriate, or unsafe outputs in response to user inputs.

5. User Data Dynamics

The ability of the product that uses generative AI to generate invasive or incorrect assumptions about the user that go beyond the details explicitly provided, potentially creating a “filter bubble” or making the user uncomfortable.

Practices around data collection, storage, and anonymisation to ensure that sensitive user information is not disclosed or used improperly.

Integrity of its systems and user data, ensuring that both are protected from unauthorised access, breaches, or cyber threats.

Presence and effectiveness of mechanisms that allow users to easily report violations, harmful content, or other safety concerns.