lgq12697 committed
Commit 3b7feee
Parent: 308629b

Update README.md

Files changed (1): README.md (+11 -22)
README.md CHANGED
@@ -18,43 +18,32 @@ All the models have a comparable model size between 90 MB and 150 MB, BPE tokeni
  ### Model Sources

  - **Repository:** [Plant DNA LLMs](https://github.com/zhangtaolab/plant_DNA_LLMs)
- - **Manuscript:** [Versatile applications of foundation DNA language models in plant genomes]()

  ### Architecture

- The model is trained based on the OpenAI GPT-2 model with modified tokenizer specific for DNA sequence.
 
 
  ### How to use

  Install the runtime library first:
  ```bash
  pip install transformers
  ```

- Here is a simple code for inference:
- ```python
- from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
-
- model_name = 'plant-dnagpt-H3K27ac'
- # load model and tokenizer
- model = AutoModelForSequenceClassification.from_pretrained(f'zhangtaolab/{model_name}', trust_remote_code=True)
- tokenizer = AutoTokenizer.from_pretrained(f'zhangtaolab/{model_name}', trust_remote_code=True)
-
- # inference
- sequences = ['GCTTTGGTTTATACCTTACACAACATAAATCACATAGTTAATCCCTAATCGTCTTTGATTCTCAATGTTTTGTTCATTTTTACCATGAACATCATCTGATTGATAAGTGCATAGAGAATTAACGGCTTACACTTTACACTTGCATAGATGATTCCTAAGTATGTCCT',
-              'TAGCCCCCTCCTCTCTTTATATAGTGCAATCTAATATATGAAAGGTTCGGTGATGGGGCCAATAAGTGTATTTAGGCTAGGCCTTCATGGGCCAAGCCCAAAAGTTTCTCAACACTCCCCCTTGAGCACTCACCGCGTAATGTCCATGCCTCGTCAAAACTCCATAAAAACCCAGTG']
- pipe = pipeline('text-classification', model=model, tokenizer=tokenizer,
-                 trust_remote_code=True, top_k=None)
- results = pipe(sequences)
- print(results)
-
- ```

  ### Training data
- We use GPT2ForSequenceClassification to fine-tune the model.
  Detailed training procedure can be found in our manuscript.

  #### Hardware
- Model was trained on a NVIDIA GTX1080Ti GPU (11 GB).
 
  ### Model Sources

  - **Repository:** [Plant DNA LLMs](https://github.com/zhangtaolab/plant_DNA_LLMs)
+ - **Manuscript:** [Versatile applications of foundation DNA large language models in plant genomes]()

  ### Architecture

+ The model is trained based on the Mamba-130m state-space model with a modified tokenizer specific to DNA sequences.
+
+ This model is fine-tuned for predicting open chromatin regions.
 
  ### How to use

  Install the runtime library first:
  ```bash
  pip install transformers
+ pip install "causal-conv1d<=1.2.0"
+ pip install "mamba-ssm<2.0.0"
  ```
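+
+ The Mamba CUDA kernels only run on an NVIDIA GPU, so it is worth verifying the environment before loading a model. A minimal sanity check (an illustrative addition, not from the original README):
+ ```python
+ # Confirm that a CUDA device and the compiled Mamba kernels are available
+ import torch
+ import mamba_ssm  # provided by the mamba-ssm package installed above
+
+ assert torch.cuda.is_available(), 'Plant DNAMamba requires an NVIDIA GPU'
+ print(torch.__version__, mamba_ssm.__version__)
+ ```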

+ Since the `transformers` library (version < 4.43.0) does not provide a MambaForSequenceClassification class, we wrote a custom script to train the Mamba model for sequence classification.
+ Example inference code can be found in our [GitHub](https://github.com/zhangtaolab/plant_DNA_LLMs).
+ Note that the Plant DNAMamba model requires an NVIDIA GPU to run.
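+
+ As a quick illustration, inference can follow the same pipeline pattern as the GPT-2 snippet removed above, assuming the Hugging Face repository ships the custom classification code via `trust_remote_code` (the model name below is a placeholder, not a confirmed identifier):
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+ # placeholder model id; substitute the actual repository name
+ model_name = 'plant-dnamamba-open-chromatin'
+ # trust_remote_code loads the repo's custom MambaForSequenceClassification class
+ model = AutoModelForSequenceClassification.from_pretrained(f'zhangtaolab/{model_name}', trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(f'zhangtaolab/{model_name}', trust_remote_code=True)
+
+ # device=0 places the pipeline on the first NVIDIA GPU
+ pipe = pipeline('text-classification', model=model, tokenizer=tokenizer,
+                 trust_remote_code=True, top_k=None, device=0)
+ print(pipe(['GATTACA' * 20]))  # toy sequence; use real genomic regions in practice
+ ```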
  ### Training data
+ We use a custom MambaForSequenceClassification script to fine-tune the model.
  Detailed training procedure can be found in our manuscript.
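+
+ The custom script itself lives in the GitHub repository; as a rough sketch of what such a wrapper can look like (the class layout, mean pooling, and the `state-spaces/mamba-130m-hf` backbone are our illustrative assumptions, not the repository's actual code):
+ ```python
+ import torch.nn as nn
+ from transformers import MambaModel
+
+ class MambaForSequenceClassification(nn.Module):
+     """Illustrative sketch: Mamba backbone plus a mean-pooled linear head."""
+     def __init__(self, base='state-spaces/mamba-130m-hf', num_labels=2):
+         super().__init__()
+         self.backbone = MambaModel.from_pretrained(base)
+         self.classifier = nn.Linear(self.backbone.config.hidden_size, num_labels)
+
+     def forward(self, input_ids, labels=None):
+         hidden = self.backbone(input_ids=input_ids).last_hidden_state
+         logits = self.classifier(hidden.mean(dim=1))  # pool over sequence positions
+         loss = nn.functional.cross_entropy(logits, labels) if labels is not None else None
+         return {'loss': loss, 'logits': logits}
+ ```
+ Returning a dict with `loss` and `logits` keeps a wrapper like this compatible with the standard Hugging Face `Trainer`.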

  #### Hardware
+ The model was trained on an NVIDIA RTX 4090 GPU (24 GB).