Promoters Distillation

by logvinata

Hello!
I was just going through your thesis report to find some details about the model and saw that you did some additional distillation for promoter identification. So, I got curious about the version of the model you've got on Hugging Face - is it the one with the extra distillation step or without it?
Thanks for your help!
P.S. An immediate follow-up question: which tokenizer should I use with the model? tokenizer = AutoTokenizer.from_pretrained('Peltarion/dnabert-distilbert') returns an error: stat: path should be string, bytes, os.PathLike or integer, not NoneType. From the error stack, I gather that there is no tokenizer under that name on the Hub.

Hello!

It's the one without the additional distillation step; that extra step was only used when fine-tuning the model for promoter identification.
For the tokenizer, you need to use a specific one built to handle DNA data. You can find how to use the models in my GitHub repo: https://github.com/joanaapa/Distillation-DNABERT-Promoter. The DNA tokenizer is in /src/transformers/tokenization_DNA.py.
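Roughly, loading the model and tokenizer could look like the sketch below. The class name DNATokenizer, the "dna6" vocab name, and the spaced 6-mer input format are assumptions carried over from the original DNABERT code, not confirmed details of this repo; check tokenization_DNA.py for the exact API.

```python
# Rough sketch, assuming the repo has been cloned into the working directory.
# DNATokenizer, the "dna6" vocab name, and the 6-mer input format are
# assumptions based on the original DNABERT code; check
# src/transformers/tokenization_DNA.py in the repo for the actual API.
import sys
import torch

# Use the transformers fork bundled with the repo, which contains the DNA tokenizer.
sys.path.insert(0, "Distillation-DNABERT-Promoter/src")

from transformers import DistilBertModel
from transformers.tokenization_DNA import DNATokenizer  # assumed class name

model = DistilBertModel.from_pretrained("Peltarion/dnabert-distilbert")
tokenizer = DNATokenizer.from_pretrained("dna6")  # assumed vocab name

# DNABERT-style input: overlapping k-mers separated by spaces, not raw sequence.
sequence = "ATCGTA TCGTAG CGTAGC GTAGCT"
input_ids = tokenizer.encode(sequence, add_special_tokens=True)
outputs = model(torch.tensor([input_ids]))
```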

Let me know if you have any more questions!
