Resource: Understanding the new benchmarks

#796 opened by rombodawg

Since we have new benchmarks, I've made a nice summarized list of what each one represents. Please read it so you can better understand the leaderboard.

(Screenshot: summarized list of the new benchmarks and what each one represents.)

Open LLM Leaderboard org

Thanks a lot! This information is also present on our About page and in the blog post - we'll gladly update them with whichever definitions feel clearest to the community!

clefourrier changed discussion title from New leaderboard, who dis? to Resource: Understanding the new benchmarks
clefourrier pinned discussion

@clefourrier Thank you for your efforts in improving the evaluation suite.

I reviewed the blog post and now understand that normalization is applied to all multiple-choice tasks (e.g., MMLU-Pro). To accurately reproduce the results, could you please point me to the code where this normalization logic is implemented? The documentation covers the setup and usage of the lm-eval-harness, but it doesn't detail how the scores are normalized.
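
In case it helps other readers while waiting for a pointer to the exact code: the blog post describes the normalization as rescaling each raw score between the random-guess baseline and a perfect score. Below is a minimal sketch of that rescaling; the function name `normalize_within_range`, the clamping behavior, and the example values are assumptions for illustration, not the leaderboard's confirmed implementation.

```python
# Hypothetical sketch of the baseline normalization described in the blog post.
# Assumption: raw accuracy is rescaled so the random-guess baseline maps to 0
# and a perfect score maps to 100, clamping anything below the baseline.

def normalize_within_range(raw_score: float, lower_bound: float, higher_bound: float = 1.0) -> float:
    """Rescale raw_score from [lower_bound, higher_bound] to [0, 100]."""
    if raw_score < lower_bound:
        # Below random chance: report 0 rather than a negative score.
        return 0.0
    return (raw_score - lower_bound) / (higher_bound - lower_bound) * 100

# Example: MMLU-Pro has 10 answer choices, so random guessing scores 0.1.
# A raw accuracy of 0.46 then normalizes to (0.46 - 0.1) / 0.9 * 100 = 40.0.
print(normalize_within_range(0.46, lower_bound=0.1))  # -> 40.0
```

The clamp at the baseline is what keeps near-random models from receiving negative normalized scores when several benchmarks are averaged.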
