HELM LLM Leaderboards

Website: https://crfm.stanford.edu/helm/lite/latest/#/leaderboard

More info: https://github.com/stanford-crfm/helm

HELM (Holistic Evaluation of Language Models) is an open, transparent framework for evaluating LLMs, created by the Center for Research on Foundation Models (CRFM) at Stanford University.

HELM hosts multiple leaderboards for different types of models. The “Lite” leaderboard currently evaluates 80+ models across 10 benchmark scenarios, providing a broad evaluation of general LLM capabilities via in-context learning.
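
For a sense of how the framework is used in practice, here is a minimal sketch of driving a small local HELM run from Python by calling the `helm-run` and `helm-summarize` commands that ship with the `crfm-helm` package. The run entry (`mmlu:subject=philosophy`), model name, suite label, and flags follow the project's quickstart documentation and are assumptions that may differ across HELM versions; treat this as an illustrative sketch, not a definitive recipe.

```python
# Minimal sketch of a local HELM evaluation, assuming `pip install crfm-helm`.
# The run entry, model name, and flags mirror the HELM quickstart docs and
# may vary between versions of the package.
import subprocess

SUITE = "my-suite"  # arbitrary label grouping this batch of runs

# Evaluate one scenario (MMLU, philosophy subject) against one model,
# capped at a few instances to keep the run cheap.
subprocess.run(
    [
        "helm-run",
        "--run-entries", "mmlu:subject=philosophy,model=openai/gpt2",
        "--suite", SUITE,
        "--max-eval-instances", "10",
    ],
    check=True,
)

# Aggregate the raw results into summary tables for browsing.
subprocess.run(["helm-summarize", "--suite", SUITE], check=True)
```

After summarizing, the `helm-server` command can serve the results locally in a leaderboard-style web UI similar to the one linked above.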