RaphaelMourad committed
Commit: c1f959a
1 Parent(s): 9456a46

Update README.md

Files changed (1):
  1. README.md +7 -7
README.md CHANGED
@@ -6,11 +6,11 @@ tags:
 - protein
 ---
 
-# Model Card for Mistral-Prot-v1-15M (Mistral for protein)
+# Model Card for Mistral-DNA-v1-17M-hg38 (Mistral for DNA)
 
-The Mistral-Prot-v1-15M Large Language Model (LLM) is a pretrained generative protein molecule model with 15.2M parameters.
+The Mistral-DNA-v1-17M-hg38 Large Language Model (LLM) is a pretrained generative DNA sequence model with 16.8M parameters.
 It is derived from the Mixtral-8x7B-v0.1 model, which was simplified for protein: the number of layers and the hidden size were reduced.
-The model was pretrained using 1M protein strings from the uniprot 50 database.
+The model was pretrained using 10kb DNA sequences from the hg38 human genome assembly.
 
 ## Model Architecture
 
@@ -26,14 +26,14 @@ Like Mixtral-8x7B-v0.1, it is a transformer model, with the following architecture choices:
 import torch
 from transformers import AutoTokenizer, AutoModel
 
-tokenizer = AutoTokenizer.from_pretrained("RaphaelMourad/Mistral-Prot-v1-15M", trust_remote_code=True)
-model = AutoModel.from_pretrained("RaphaelMourad/Mistral-Prot-v1-15M", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("RaphaelMourad/Mistral-DNA-v1-17M-hg38", trust_remote_code=True)
+model = AutoModel.from_pretrained("RaphaelMourad/Mistral-DNA-v1-17M-hg38", trust_remote_code=True)
 ```
 
 ## Calculate the embedding of a protein sequence
 
 ```
-insulin = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
+insulin = "TGATGATTGGCGCGGCTAGGATCGGCT"
 inputs = tokenizer(insulin, return_tensors = 'pt')["input_ids"]
 hidden_states = model(inputs)[0] # [1, sequence_length, 256]
 
@@ -48,7 +48,7 @@ Ensure you are utilizing a stable version of Transformers, 4.34.0 or newer.
 
 ## Notice
 
-Mistral-Prot-v1-15M is a pretrained base model for protein.
+Mistral-DNA-v1-17M-hg38 is a pretrained base model for DNA.
 
 ## Contact
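
For reference, the two snippets in the updated README can be combined into one runnable example. This is a minimal sketch, assuming the renamed `RaphaelMourad/Mistral-DNA-v1-17M-hg38` repository is available on the Hugging Face Hub; the final max-pooling step is an illustrative assumption, since the README stops at the raw hidden states:

```
import torch
from transformers import AutoTokenizer, AutoModel

# Repository name as renamed in this commit.
model_name = "RaphaelMourad/Mistral-DNA-v1-17M-hg38"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# Tokenize the DNA sequence used in the updated README.
dna = "TGATGATTGGCGCGGCTAGGATCGGCT"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"]

# Forward pass; hidden states have shape [1, sequence_length, 256].
with torch.no_grad():
    hidden_states = model(inputs)[0]

# Pool over the sequence dimension to get a fixed-size embedding
# (max-pooling is an assumption here, not prescribed by the README).
embedding = torch.max(hidden_states[0], dim=0)[0]  # shape: [256]
print(embedding.shape)
```

Max-pooling over positions is one common way to collapse a variable-length sequence into a single vector; mean-pooling would work the same way via `hidden_states[0].mean(dim=0)`.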