RaphaelMourad committed
Commit: c1f959a
1 Parent(s): 9456a46

Update README.md

Files changed (1):
  1. README.md +7 -7
README.md CHANGED
@@ -6,11 +6,11 @@ tags:
 - protein
 ---
 
-# Model Card for Mistral-Prot-v1-15M (Mistral for protein)
+# Model Card for Mistral-DNA-v1-17M-hg38 (Mistral for DNA)
 
-The Mistral-Prot-v1-15M Large Language Model (LLM) is a pretrained generative protein molecule model with 15.2M parameters.
+The Mistral-DNA-v1-17M-hg38 Large Language Model (LLM) is a pretrained generative DNA sequence model with 16.8M parameters.
 It is derived from the Mixtral-8x7B-v0.1 model, which was simplified for protein: the number of layers and the hidden size were reduced.
-The model was pretrained using 1M protein strings from the uniprot 50 database.
+The model was pretrained using 10kb DNA sequences from the hg38 human genome assembly.
 
 ## Model Architecture
 
@@ -26,14 +26,14 @@ Like Mixtral-8x7B-v0.1, it is a transformer model, with the following architecture choices:
 import torch
 from transformers import AutoTokenizer, AutoModel
 
-tokenizer = AutoTokenizer.from_pretrained("RaphaelMourad/Mistral-Prot-v1-15M", trust_remote_code=True)
-model = AutoModel.from_pretrained("RaphaelMourad/Mistral-Prot-v1-15M", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("RaphaelMourad/Mistral-DNA-v1-17M-hg38", trust_remote_code=True)
+model = AutoModel.from_pretrained("RaphaelMourad/Mistral-DNA-v1-17M-hg38", trust_remote_code=True)
 ```
 
 ## Calculate the embedding of a protein sequence
 
 ```
-insulin = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
+insulin = "TGATGATTGGCGCGGCTAGGATCGGCT"
 inputs = tokenizer(insulin, return_tensors = 'pt')["input_ids"]
 hidden_states = model(inputs)[0] # [1, sequence_length, 256]
 
@@ -48,7 +48,7 @@ Ensure you are utilizing a stable version of Transformers, 4.34.0 or newer.
 
 ## Notice
 
-Mistral-Prot-v1-15M is a pretrained base model for protein.
+Mistral-DNA-v1-17M-hg38 is a pretrained base model for DNA.
 
 ## Contact
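
For reference, the two snippets in the updated README can be combined into one runnable example. This is a minimal sketch, assuming the renamed `RaphaelMourad/Mistral-DNA-v1-17M-hg38` repository is available on the Hugging Face Hub; the final max-pooling step is an illustrative assumption, since the README stops at the raw hidden states:

```
import torch
from transformers import AutoTokenizer, AutoModel

# Repository name as renamed in this commit.
model_name = "RaphaelMourad/Mistral-DNA-v1-17M-hg38"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# Tokenize the DNA sequence used in the updated README.
dna = "TGATGATTGGCGCGGCTAGGATCGGCT"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"]

# Forward pass; hidden states have shape [1, sequence_length, 256].
with torch.no_grad():
    hidden_states = model(inputs)[0]

# Pool over the sequence dimension to get a fixed-size embedding
# (max-pooling is an assumption here, not prescribed by the README).
embedding = torch.max(hidden_states[0], dim=0)[0]  # shape: [256]
print(embedding.shape)
```

Max-pooling over positions is one common way to collapse a variable-length sequence into a single vector; mean-pooling would work the same way via `hidden_states[0].mean(dim=0)`.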