LiveBench.ai

Website: https://livebench.ai

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

  • LiveBench is designed to limit potential contamination by releasing new questions monthly and by basing questions on recently released datasets, arXiv papers, news articles, and IMDb movie synopses.
  • Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored accurately and automatically, without the use of an LLM judge.
  • LiveBench currently contains a set of 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.
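
To illustrate what scoring against objective ground truth without an LLM judge can look like, here is a minimal sketch. It is not LiveBench's actual scoring code: the function names (normalize, score_response, mean_score) and the normalized exact-match rule are assumptions chosen for illustration, and real tasks may need task-specific comparison logic.

```python
import re


def normalize(answer: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", answer.strip().lower())


def score_response(response: str, ground_truth: str) -> float:
    """Hypothetical scorer: 1.0 if the normalized response equals the ground truth, else 0.0."""
    return 1.0 if normalize(response) == normalize(ground_truth) else 0.0


def mean_score(pairs) -> float:
    """Average score over (response, ground_truth) pairs; returns 0.0 for an empty input."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(score_response(r, g) for r, g in pairs) / len(pairs)
```

Because the comparison is deterministic, the same response always receives the same score, which is the property that makes automatic evaluation possible here.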