tpierrot committed
Commit 1e9588a
Parent(s): b119a49

updated text

Files changed (1):
  1. app.py +17 -6
app.py CHANGED
@@ -47,13 +47,24 @@ _LAST_UPDATED = "Sept 15, 2023"
  banner_url = "./assets/logo.png"
  _BANNER = f'<div style="display: flex; justify-content: space-around;"><img src="{banner_url}" alt="Banner" style="width: 40vw; min-width: 300px; max-width: 600px;"> </div>' # noqa
 
- _INTRODUCTION_TEXT = """The 🤗 Nucleotide Transformer Leaderboard aims to track,
- rank and evaluate DNA foundational models on a set of curated downstream tasks
- introduced in the huggingface dataset
- [nucleotide_transformer_downstream_tasks](https://huggingface.co/datasets/InstaDeepAI/nucleotide_transformer_downstream_tasks) ,
- with a standardized evaluation protocole presented in the "ℹ️ Methods" tab.""" # noqa
+ _INTRODUCTION_TEXT = """The 🤗 Nucleotide Transformer Leaderboard aims to track, rank and evaluate DNA foundational models on a set of curated downstream tasks introduced in the Hugging Face dataset [nucleotide_transformer_downstream_tasks](https://huggingface.co/datasets/InstaDeepAI/nucleotide_transformer_downstream_tasks), with a standardized evaluation protocol presented in the "ℹ️ Methods" tab.\n\n
 
- _METHODS_TEXT = """We have compared the fine-tuned performance of Nucleotide Transformer models on the 18 downstream tasks with four different pre-trained models: [DNABERT-1](https://academic.oup.com/bioinformatics/article/37/15/2112/6128680), [DNABERT-2](https://arxiv.org/abs/2306.15006), [HyenaDNA](https://arxiv.org/abs/2306.15794) (1kb and 32kb context length) and the [Enformer](https://www.nature.com/articles/s41592-021-01252-x) (which was trained as a supervised model on several genomics tasks). We ported the architecture and trained weights of each model to our code framework and performed parameter-efficient fine-tuning for every model as described above, using the same cross-validation scheme for a fair comparison. All results can be visulaized in an interactive leader-board 2. Only for HyenaDNA we performed full fine-tuning due to the incompatibility of our parameter-efficient fine-tuning approach with the model architecture.""" # noqa
+ This leaderboard has been designed to provide, to the best of our ability, fair and robust comparisons between models. If you have any question or concern about our methodology, or would like another model to appear in the leaderboard, please reach out to [email protected] and [email protected]. While we may not be able to accommodate every request, the team will always do its best to keep the benchmark as fair, relevant and up-to-date as possible.\n\n
+ """ # noqa
+
+ _METHODS_TEXT = """
+ This leaderboard uses the downstream-tasks benchmark and evaluation methodology described in the Nucleotide Transformer paper. We fine-tune each model on each task using a ten-fold validation strategy. For each model and each task, we report the aggregation over the ten folds of several metrics: the Matthews correlation coefficient (MCC), the macro F1-score (F1) and the accuracy (ACC). The Nucleotide Transformer, DNABERT and Enformer models have been fine-tuned with the same parameter-efficient fine-tuning technique (IA3) and the same set of hyper-parameters. Due to the different nature of their architecture, the HyenaDNA models have been fully fine-tuned using the original code provided by the authors.
+ \n\n
+
+ Please keep in mind that the Enformer was originally trained in a supervised fashion to solve gene expression tasks. For the sake of benchmarking, we re-used the provided model torso as a pre-trained model, which is not the use intended and recommended by the original paper. We nevertheless think this comparison is interesting to highlight the differences between self-supervised and supervised pre-training, and we observe that the Enformer is a very competitive baseline even on tasks that differ from gene expression.
+ \n\n
+
+ For the sake of clarity, the tasks shown by default in this leaderboard are the human-related tasks, while the original Nucleotide Transformer paper reports performance over both yeast- and human-related tasks. To obtain the same results as those shown in the paper, please check all the task boxes above.
+ \n\n
+
+ Note also that the performance shown for some methods in this table may differ slightly from the numbers reported in the HyenaDNA and DNABERT papers. This might come from the usage of different train and test splits, as well as from our systematic ten-fold evaluation.
+ \n\n
+ """ # noqa
 
 
  def retrieve_array_from_text(text):
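
The ten-fold evaluation described in `_METHODS_TEXT` can be sketched in a few lines. The snippet below is only an illustration, not the leaderboard's actual evaluation code: `aggregate_over_folds` and `fold_predictions` are hypothetical names, and the mean is an assumed aggregation (the text does not specify one); only the metric choices (MCC, macro F1, accuracy) and the ten folds come from the text above.

```python
# Hypothetical sketch of the ten-fold metric aggregation described in
# _METHODS_TEXT. The fine-tuning itself is not part of this repository;
# only the metrics and the per-fold averaging follow the description.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef


def aggregate_over_folds(fold_predictions):
    """Average MCC, macro F1 and accuracy over the folds.

    `fold_predictions` is a list of (y_true, y_pred) pairs, one per fold.
    """
    mcc, f1, acc = [], [], []
    for y_true, y_pred in fold_predictions:
        mcc.append(matthews_corrcoef(y_true, y_pred))
        f1.append(f1_score(y_true, y_pred, average="macro"))
        acc.append(accuracy_score(y_true, y_pred))
    # Mean over folds is assumed here; other aggregations (e.g. median)
    # would also be consistent with the text.
    return {
        "MCC": float(np.mean(mcc)),
        "F1": float(np.mean(f1)),
        "ACC": float(np.mean(acc)),
    }
```

Averaging per-fold scores (rather than pooling all predictions) keeps each fold's test split independent, which is a common way to report cross-validated results.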
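
For the IA3 parameter-efficient fine-tuning mentioned in `_METHODS_TEXT`, a rough illustration with the Hugging Face `peft` library follows. The leaderboard used its own code framework, so treat this purely as a hedged sketch: the checkpoint name, target module names and task type are assumptions, not the leaderboard's actual configuration.

```python
# Hypothetical IA3 setup via the `peft` library; the leaderboard's own
# framework was used in practice, so this is only an illustration.
from peft import IA3Config, get_peft_model
from transformers import AutoModelForSequenceClassification

# Assumed checkpoint and module names (ESM-style attention and
# feed-forward layers); the real configuration is not given in the text.
model = AutoModelForSequenceClassification.from_pretrained(
    "InstaDeepAI/nucleotide-transformer-500m-human-ref", num_labels=2
)
ia3_config = IA3Config(
    task_type="SEQ_CLS",
    target_modules=["key", "value", "dense"],  # assumed layer names
    feedforward_modules=["dense"],  # must be a subset of target_modules
)
model = get_peft_model(model, ia3_config)
model.print_trainable_parameters()  # only the small IA3 vectors train
```

IA3 learns small elementwise rescaling vectors on the targeted activations while the backbone stays frozen, which keeps the number of trainable parameters tiny and makes it practical to reuse one set of hyper-parameters across models of different sizes.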