leaderboard should be more curated

#908
by ehartford - opened

why is there no meta-llama/Meta-Llama-3.1-405B

why is there no mistralai/Mistral-Large-Instruct-2407

these are incredibly important open weights models that ought to be in the leaderboard on day 1.

alozowski changed discussion title from leaderboard should be more curated to the leaderboard is missing large models
Open LLM Leaderboard org

Hi @ehartford ,

First, let me correct the title of this discussion to one more relevant to your question.

We are aware that we don't currently have meta-llama/Meta-Llama-3.1-405B and mistralai/Mistral-Large-Instruct-2407, as well as other large models that require evaluation in multi-node mode.

We are working on adding Meta-Llama-3.1-405B to the Leaderboard. In the meantime, if you want us to evaluate any other large model, please open a discussion about it and wait for the community's reaction. If there's enough interest from the community, we're open to manually evaluating models that require more than one node! For example, we evaluated alpindale/WizardLM-2-8x22B thanks to this discussion.

alozowski changed discussion status to closed
ehartford changed discussion title from the leaderboard is missing large models to leaderboard should be more curated

I meant the title not your interpretation.

The examples I gave happened to be large.

But the leaderboard should be curated.

405b should have been on the leaderboard on the day it was released.

The same goes for any model released by Mistral, Facebook, Google, etc., big or small.

"we hear your suggestion and disregard it" is a fine response

Open LLM Leaderboard org

Thank you for your feedback. We understand your desire for a more curated approach focusing on models from key organisations like Meta, Mistral, and Google. We strive to add significant models as quickly as possible, but immediate addition on release day isn't always possible, especially for large models requiring extensive computational resources.

We balance various factors including community interest, resource availability, and technical complexity when prioritising model evaluations.

If there are specific models you believe should be prioritised, we encourage you to open discussions about them. Community feedback will help us to measure interest and allocate our resources effectively.

We appreciate your engagement with the Leaderboard and are committed to making it as useful and comprehensive as possible within our constraints.

Ok, well, this is a business vision and prioritization issue, not a resources issue. You certainly have the resources to run evals on the 1-2 top open weights models that come out each month (especially when they legitimately compete with OpenAI and Claude). If that's not important enough to warrant prioritized evaluation, then the community frankly needs a new leaderboard.

You don't need to write me back; I'm just venting at this point.

I will go ahead and create a new leaderboard that implements exactly yours, but actually evals all the hot new models as they come out rather than months later. @clem

@ehartford The Leaderboard is curated:

Screenshot_20240830-214941_Mull.png

You're just not the curator,

We are,

Regards,

  • Everybody voting on the Leaderboard, also known as: The Community

(including me)

So save yourself some compute for another leaderboard, the community can decide on their own.

great comeback, that'll teach me. Cheers.
