Generative AI Model Evals: A Primer For Compliance Officers

Ekene Chuks-Okeke
January 24, 2024

Traditional AI models are tested against established benchmarks using quantitative metrics. For example, AI systems used in employment selection may be evaluated for fairness using the four-fifths rule, an established test for measuring disparate impact. In contrast, common generative AI use cases like text generation, summarization, and code generation lack reliable quantitative metrics for measuring model performance. This makes it difficult to evaluate the quality and fairness of these AI systems.
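To illustrate, the four-fifths rule compares the selection rate of each group to that of the highest-scoring (reference) group; a ratio below 0.8 is commonly treated as evidence of potential disparate impact. The sketch below uses hypothetical applicant and selection counts for illustration only; the function names are not from any particular library.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Fraction of applicants from a group who were selected."""
    return selected / applicants

def four_fifths_ratio(group_rate: float, reference_rate: float) -> float:
    """Disparate impact ratio: group's rate divided by the reference rate."""
    return group_rate / reference_rate

# Hypothetical numbers for illustration.
rate_a = selection_rate(48, 80)            # reference group: 0.60
rate_b = selection_rate(18, 40)            # comparison group: 0.45
ratio = four_fifths_ratio(rate_b, rate_a)  # 0.45 / 0.60 = 0.75
flagged = ratio < 0.8                      # True: below the four-fifths threshold
```

Because the ratio (0.75) falls below 0.8, this hypothetical selection process would be flagged for further review under the rule. Generative AI outputs like free-form text have no comparably simple, agreed-upon pass/fail metric, which is the gap the article examines.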

Our new article discusses current testing and validation techniques, as well as new benchmarks being developed for evaluating generative AI, with a focus on fairness and bias.

To read the full article at Law360, click here.