hdallatorre committed on
Commit
0126b6f
1 Parent(s): 66427d7

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -37,12 +37,13 @@ pip install --upgrade git+https://github.com/huggingface/transformers.git
 ```
 
 A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
-```
+
+
 ⚠️ The maximum sequence length is set by default at the training length of 30,000 nucleotides, or 5001 tokens (accounting for the CLS token). However, Segment-NT has
 been shown to generalize up to sequences of 50,000 bp. In case you need to infer on sequences between 30kbp and 50kbp, make sure to change the `rescaling_factor`
 argument in the config to `num_dna_tokens_inference / max_num_tokens_nt` where `num_dna_tokens_inference` is the number of tokens at inference
 (i.e 6669 for a sequence of 40008 base pairs) and `max_num_tokens_nt` is the max number of tokens on which the backbone nucleotide-transformer was trained on, i.e `2048`.
-```
+
 
 ```python
 # Load model and tokenizer
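
The `rescaling_factor` arithmetic in the paragraph above can be sketched as follows. This is a minimal illustration, not code from the repo: the 6-nucleotide-per-token granularity and the single CLS token are assumptions inferred from the README's own numbers (6669 tokens for 40,008 bp, 5001 tokens for 30,000 bp).

```python
# Sketch of the rescaling-factor computation described in the README.
# Assumption: one token per 6 bp, plus one CLS token, matching the
# README's examples (40008 bp -> 6669 tokens; 30000 bp -> 5001 tokens).

max_num_tokens_nt = 2048      # tokens the nucleotide-transformer backbone was trained on
sequence_length_bp = 40008    # example inference length between 30 kbp and 50 kbp

# Each token covers 6 bp; add 1 for the CLS token.
num_dna_tokens_inference = sequence_length_bp // 6 + 1   # 6669

rescaling_factor = num_dna_tokens_inference / max_num_tokens_nt
print(num_dna_tokens_inference, rescaling_factor)
```

The resulting value (6669 / 2048 ≈ 3.256) is what the README says to set as the `rescaling_factor` argument in the model config before running inference on such a sequence.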