Anthropic is Launching a New Program to Fund AI Benchmarks

Anthropic, a leading generative AI startup, has announced a new program to change how AI systems are assessed, a move some consider a milestone for the industry. The program will fund the development of better and more usable benchmarks for measuring AI performance and impact, at a time when critics have voiced concerns about the adequacy of existing tools.

Improvement of AI Benchmarks

Existing evaluations of AI systems, including the MLPerf benchmarks, have been criticized as too narrow to capture the full range of modern AI use cases. These benchmarks often focus on single tasks, which poorly represents the complex abilities of advanced AI systems such as Anthropic’s Claude and OpenAI’s ChatGPT. There are also large gaps: few benchmarks examine unsafe behavior that may be latent in AI, or measure a model’s capacity to mislead or deceive, carry out cyberattacks, or perpetuate bias.

Anthropic's New Initiative

Anthropic’s new program is intended to address these problems by encouraging researchers to develop more innovative benchmarks that assess both the advanced capabilities and the safety of AI models. The company has said it will award grants to independent organizations that propose ways of measuring these facets. The initiative calls for an overhaul of existing methodologies, with an emphasis on creating standards that reflect the potential negative impact and security threats that AI poses to society.

The company is particularly interested in benchmarks that can:

  • Evaluate an AI model’s potential to take part in risky actions, such as cyberterrorism or disinformation.

  • Evaluate a model’s usefulness in scientific research, its ability to avoid bias, and its capability to censor toxic content.

  • Assess how well AI performs in interactions across multiple languages and across various fields and domains.

Implementation and Future Prospects

To run the program, Anthropic has hired a full-time program coordinator and plans to offer a range of funding options suited to the needs and development stages of individual projects. Besides funding, researchers will have the chance to work with Anthropic’s domain specialists, including its red team, fine-tuning, and trust and safety teams. The aim of this collaboration is to ensure that the benchmarks being designed stay aligned with the company’s safety measures for the use of artificial intelligence.

Anthropic is also willing to directly fund or acquire the most promising projects that emerge from the program, a clear indication of its sustained commitment to improving AI benchmarks.

Industry Impact and Skepticism

Overall, Anthropic’s initiative can be regarded as a good start toward improving AI evaluation more generally, but it has drawn some skepticism. Some critics point to an Associated Press article saying that benchmark creation can be self-serving, which raises the possibility that future benchmarks will be designed specifically to yield favorable results for Anthropic’s models. To counter these concerns, the company has stressed that third-party researchers will be involved and has pointed to the use of general AI safety classifications.

Anthropic is not the only organization pursuing such an initiative. Other AI startups, such as Sierra Technologies, are calling for better evaluation metrics for AI agents working on complex tasks. This collective effort highlights the industry’s desire for more accurate and comprehensive AI assessment techniques.


Anthropic’s new program to fund benchmark creation rests on the view that current practice in evaluating AI models is critically weak, and the program may improve that situation considerably.

By pointedly calling out the inadequacies of today’s popular benchmarks and encouraging researchers to create new, substantially broader methods, Anthropic can contribute to better assessment of AI safety and performance. The initiative may also deepen understanding of AI capabilities, foster a more positive reception of such technologies, and give other companies an example of a responsible approach to AI development.

Meet the Author


Sanjeev Verma

Sanjeev Verma, the CEO of Biz4Group LLC, is a visionary leader passionate about leveraging technology for societal betterment. With a human-centric approach, he pioneers innovative solutions, transforming businesses through AI Development, IoT Development, eCommerce Development, and digital transformation. Sanjeev fosters a culture of growth, driving Biz4Group's mission toward technological excellence. He’s been a featured author on IBM and TechTarget.
