leaderboard-pr-bot committed on
Commit 45a22de (1 parent: 1e5c6fa)

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +111 -4
README.md CHANGED
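The two hunk headers in this diff (`@@ -1,16 +1,110 @@` and `@@ -204,4 +298,17 @@`) can be cross-checked against the `+111 -4` summary. The sketch below is a minimal illustration, not part of the PR: the helper name `parse_hunk_header` is hypothetical, and it assumes the standard unified-diff header layout `@@ -old_start,old_len +new_start,new_len @@`.

```python
import re

# Standard unified-diff hunk header: @@ -old_start,old_len +new_start,new_len @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = match.groups()
    # A missing length means a one-line range in unified diff format.
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


headers = [
    "@@ -1,16 +1,110 @@",
    "@@ -204,4 +298,17 @@ Please read this disclaimer carefully",
]
hunks = [parse_hunk_header(h) for h in headers]

# Net growth of the file is the sum of (new_len - old_len) over all hunks.
net_change = sum(new_len - old_len for _, old_len, _, new_len in hunks)

# The second hunk starts later in the new file by exactly the growth of the first.
offset_second_hunk = hunks[1][2] - hunks[1][0]

print(net_change, offset_second_hunk)
```

The net change of 107 lines agrees with the summary (111 added minus 4 removed), and the 94-line offset of the second hunk equals the growth of the first hunk (110 minus 16).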
@@ -1,16 +1,110 @@
  ---
  language:
  - en
- library_name: transformers
  license: apache-2.0
  tags:
  - gpt
  - llm
  - large language model
  - h2o-llmstudio
- thumbnail: >-
-   https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
  pipeline_tag: text-generation
  ---

@@ -204,4 +298,17 @@ Please read this disclaimer carefully before using the large language model prov
  - Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
  - Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.

- By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.

  ---
  language:
  - en
  license: apache-2.0
+ library_name: transformers
  tags:
  - gpt
  - llm
  - large language model
  - h2o-llmstudio
+ thumbnail: https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
  pipeline_tag: text-generation
+ model-index:
+ - name: h2o-danube3-4b-chat
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 36.29
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=h2oai/h2o-danube3-4b-chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 8.84
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=h2oai/h2o-danube3-4b-chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 2.79
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=h2oai/h2o-danube3-4b-chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 1.34
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=h2oai/h2o-danube3-4b-chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 5.23
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=h2oai/h2o-danube3-4b-chat
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 13.65
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=h2oai/h2o-danube3-4b-chat
+       name: Open LLM Leaderboard
  ---
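
Each entry under `results` in the `model-index` block repeats the same shape: a `task`, a `dataset` with its few-shot setting, one metric, and a `source` link whose query string is the model id. The sketch below rebuilds two of those entries as plain Python data to make the shape explicit. It is an illustration only, not the bot's actual code: the helper `make_result` and the constant `LEADERBOARD_URL` are hypothetical names, and the output is printed as JSON (valid YAML flow syntax) to stay within the standard library.

```python
import json

# Base URL taken from the `source.url` fields in the diff.
LEADERBOARD_URL = "https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard"


def make_result(model_id, dataset_name, dataset_type, num_few_shot,
                metric_type, metric_name, value):
    """Build one `results` entry in the shape used by the model-index block."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "args": {"num_few_shot": num_few_shot},
        },
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": {
            "url": f"{LEADERBOARD_URL}?query={model_id}",
            "name": "Open LLM Leaderboard",
        },
    }


model_id = "h2oai/h2o-danube3-4b-chat"
results = [
    make_result(model_id, "IFEval (0-Shot)", "HuggingFaceH4/ifeval", 0,
                "inst_level_strict_acc and prompt_level_strict_acc",
                "strict accuracy", 36.29),
    make_result(model_id, "BBH (3-Shot)", "BBH", 3,
                "acc_norm", "normalized accuracy", 8.84),
]

# The model-index name is the repository name without the organization prefix.
model_index = [{"name": model_id.split("/")[-1], "results": results}]
print(json.dumps(model_index, indent=2))
```

The MMLU-PRO entry in the diff additionally carries `config` and `split` keys under `dataset`, which this helper does not model.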

  - Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
  - Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.

+ By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_h2oai__h2o-danube3-4b-chat)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |11.36|
+ |IFEval (0-Shot)    |36.29|
+ |BBH (3-Shot)       | 8.84|
+ |MATH Lvl 5 (4-Shot)| 2.79|
+ |GPQA (0-shot)      | 1.34|
+ |MuSR (0-shot)      | 5.23|
+ |MMLU-PRO (5-shot)  |13.65|
+
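
The `Avg.` row in the results table is consistent with an unweighted mean of the six benchmark scores. The sketch below checks that arithmetic and re-renders the table in the same Markdown layout. The equal weighting is an assumption inferred from the numbers, and `render_table` is a hypothetical helper, not the leaderboard's own code.

```python
# Scores copied from the results table in the diff.
scores = {
    "IFEval (0-Shot)": 36.29,
    "BBH (3-Shot)": 8.84,
    "MATH Lvl 5 (4-Shot)": 2.79,
    "GPQA (0-shot)": 1.34,
    "MuSR (0-shot)": 5.23,
    "MMLU-PRO (5-shot)": 13.65,
}

# Assumption: "Avg." is the plain mean of the six values, rounded to 2 decimals.
average = round(sum(scores.values()) / len(scores), 2)


def render_table(scores, average):
    """Render the scores as a Markdown table with a right-aligned value column."""
    rows = [("Avg.", average)] + list(scores.items())
    width = max(len("Metric"), *(len(name) for name, _ in rows))
    lines = [
        f"|{'Metric':^{width}}|Value|",
        f"|{'-' * width}|----:|",
    ]
    lines += [f"|{name:<{width}}|{value:5.2f}|" for name, value in rows]
    return "\n".join(lines)


print(render_table(scores, average))
```

The six scores sum to 68.14, and 68.14 / 6 rounds to 11.36, matching the `Avg.` row.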