open-llm-leaderboard/open_llm_leaderboard

5 days ago

Many models failed in the past day or two. Is this normal?

You can see it here: https://huggingface.co/datasets/open-llm-leaderboard/requests/commits/main.
Lots of models with eval_request_false

Edit: After a second look, I believe eval_request_false does not indicate a failed request. The proper indicator is "status": "FAILED" in the file, while "status": "FINISHED" means the model was evaluated successfully. Sorry for the noise.
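For anyone who wants to check this programmatically, here is a minimal sketch of reading the "status" field from a request file's JSON. The function name `classify_request` and the example payload are hypothetical; only the "status" key and the "FAILED" and "FINISHED" values come from the request files discussed here, and any other value is simply reported as "other".

```python
import json


def classify_request(request_json: str) -> str:
    """Map the "status" field of a request file to 'failed', 'finished', or 'other'."""
    status = json.loads(request_json).get("status", "")
    if status == "FAILED":
        return "failed"
    if status == "FINISHED":
        return "finished"
    # Any other value (or a missing field) is neither a failure nor a success.
    return "other"


# Hypothetical, trimmed-down request file contents for illustration only.
example = '{"model": "some-org/some-model", "status": "FAILED"}'
print(classify_request(example))
```

Note that the "eval_request_False" part of the file name plays no role in this check.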

It was my first time submitting models to the leaderboard for evaluation. I was interested in small, fast models that could compare to allenai/OLMoE-1B-7B-0924-Instruct, and a few of them immediately failed for unknown reasons.

I think the two models down below might be chat models, to be honest. I believe I classified them wrongly, but I do not think that this should lead to a failure.
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/YoungPanda/qwenqwen_eval_request_False_bfloat16_Original.json

Here is another model that I was curious about: https://huggingface.co/datasets/open-llm-leaderboard/requests/commit/3ad9576f5167a058e17b4a9b046263289d5b1b5e#d2h-163997. Finding the request files is very tedious, so I used the commit page linked above, which was easier. It takes so long because the dataset viewer for requests does not contain my failed request files.

alozowski

Open LLM Leaderboard org 2 days ago

Hi @ThiloteE ,

Thank you for reporting!
It's true, the best way to check a model's evaluation progress is to look at its status in the request file.

Regarding these three models: yes, it's possible to evaluate them with the chat template. Unfortunately, they failed because of the network issue we experienced last week. I've resubmitted all of them now.

The best way to point us to a request file is to go to "Files and versions" in the Requests dataset and locate the request file in the folders there. I understand it takes some time, but this way we can check the status of a model faster.
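If browsing the folders by hand is too slow, the lookup can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `find_request_files` and `list_request_files` are hypothetical, the file-name pattern `<org>/<model>_eval_request_...json` is inferred from the two request files linked earlier in this thread, and the listing step assumes the `huggingface_hub` package is installed.

```python
def find_request_files(all_files, model_id):
    """Return the paths in all_files that look like request files for model_id.

    Assumes the naming pattern seen in this thread:
    <org>/<model>_eval_request_<...>.json
    """
    prefix = f"{model_id}_eval_request_"
    return [p for p in all_files if p.startswith(prefix) and p.endswith(".json")]


def list_request_files(repo_id="open-llm-leaderboard/requests"):
    """List every file in the requests dataset (needs network access)."""
    # Imported here so the filtering helper above works without the package.
    from huggingface_hub import HfApi

    return HfApi().list_repo_files(repo_id, repo_type="dataset")


# Offline example using the two paths linked in this discussion.
known_files = [
    "MaziyarPanahi/Qwen1.5-MoE-A2.7B-Wikihow_eval_request_False_bfloat16_Original.json",
    "YoungPanda/qwenqwen_eval_request_False_bfloat16_Original.json",
]
print(find_request_files(known_files, "YoungPanda/qwenqwen"))
```

With network access, `find_request_files(list_request_files(), "YoungPanda/qwenqwen")` would run the same filter over the live dataset listing.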

Let me close this discussion now. Please feel free to open a new one if you need any help, or ping me here if any of these three models fails (I hope not).

alozowski changed discussion status to closed 2 days ago

Failed models