To stand out in today's crowded AI tool and model marketplace, AI suppliers may make capability claims that bend the truth, are not based on verifiable evidence, or dangle a red herring. Deceptive advertising claims are unlawful. But even when claims meet legal requirements, how does an organization compare the quality of competing AI tools or models in a meaningful manner?
Organizations may consider looking deeper at AI validation frameworks. We have seen nascent attempts to think through such frameworks, which are intended to help ensure that AI capability claims are based on a reasonable interpretation of AI performance against relevant and verifiable benchmarks. Only then can organizations rely on these claims in a meaningful manner.
For example, Stanford University's Institute for Human-Centered Artificial Intelligence ("HAI") recently published a proposed validation framework. HAI points out that developers often test an AI tool or model on a narrow set of objectives but then extrapolate the results to make broad claims about the tool or model's capabilities. In highly simplified terms, HAI proposes using the scientific method to gather evidence for capability and performance claims that actually match the use case, rather than making broad claims based on a generic “intelligence” rating on an LLM leaderboard or a score on an irrelevant standardized test.
While HAI's framework is just one example, accurate, relevant, and verifiable claims are necessary for the AI industry to build trust and make it easy for customers to identify fake AI or AI that is not suitable for the proposed use case. Organizations can take more direct action by requesting relevant benchmark information during vendor diligence (particularly when they have the time to run an RFP process) and should consider piloting multiple AI suppliers against real-world validation data to help select the right AI tool or model for the job, as sketched below. Organizations that do not validate that an AI tool or model is fit for purpose will likely experience poor performance and could create legal risk. For that reason, we expect the use of such AI validation frameworks to quickly become a best practice in AI governance for both suppliers and deployers of AI technology.
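
To make the pilot idea concrete, here is a minimal, hypothetical sketch in Python of what a side-by-side pilot comparison might look like. Everything in it is an illustrative placeholder rather than a real API: the supplier names, the `generate` callables (stand-ins for whatever SDK each vendor actually provides), and the toy validation examples, which in practice would be the organization's own labeled, real-world data for the intended use case.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    """A candidate AI supplier under evaluation (names are illustrative)."""
    name: str
    generate: Callable[[str], str]  # stand-in for a real vendor SDK call

# Hypothetical validation set: in a real pilot, this would be the
# organization's own labeled, real-world examples for the use case.
VALIDATION_SET = [
    ("Classify this support ticket: 'My invoice total is wrong.'", "billing"),
    ("Classify this support ticket: 'The app crashes on login.'", "technical"),
    ("Classify this support ticket: 'How do I change my plan?'", "account"),
]

def evaluate(candidate: Candidate, dataset) -> float:
    """Return the fraction of validation examples the candidate answers correctly."""
    correct = sum(
        1 for prompt, expected in dataset
        if candidate.generate(prompt).strip().lower() == expected
    )
    return correct / len(dataset)

# Stub suppliers for illustration; a real pilot would call each vendor's API here.
candidates = [
    Candidate("Supplier A", lambda p: "billing" if "invoice" in p else "technical"),
    Candidate("Supplier B", lambda p: "account"),
]

if __name__ == "__main__":
    for c in candidates:
        score = evaluate(c, VALIDATION_SET)
        print(f"{c.name}: {score:.0%} accuracy on the pilot validation set")
```

The value is not in the toy logic but in the discipline it illustrates: the same task-relevant data and the same scoring rule, applied evenly to every supplier before a selection is made.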
