Base model: westlake-repl/SaProt_35M_AF2

Model Card for DMS_GAL4_YEAST

This model is trained on a single-site deep mutational scanning (DMS) dataset and can be used to predict the fitness score of mutant amino acid sequences of the protein GAL4_YEAST (regulatory protein).

Protein Function

This protein is a positive regulator of the expression of galactose-induced genes such as GAL1, GAL2, GAL7, GAL10, and MEL1, which encode the enzymes used to convert galactose to glucose. It recognizes a 17-base-pair sequence (5'-CGGRNNRCYNYNCNCCG-3') in the upstream activating sequence (UAS-G) of these genes.
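
The recognition sequence is written with IUPAC degenerate codes (R = A or G, Y = C or T, N = any base). As a minimal sketch of what that notation means, the snippet below expands the motif into a regular expression and scans the forward strand of a DNA string. The helper names and the example sequence are hypothetical and made up for illustration; this is not how the model itself works.

```python
import re

# IUPAC nucleotide codes that appear in the motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

MOTIF = "CGGRNNRCYNYNCNCCG"  # 17 bp recognition sequence quoted in the text

def motif_to_regex(motif: str) -> str:
    """Translate a degenerate IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif)

def find_sites(dna: str, motif: str = MOTIF) -> list:
    """Return 0-based start positions of all matches on the forward strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(dna.upper())]

# Hypothetical sequence invented for illustration; not a real promoter.
example = "TTT" + "CGGAGGACTGTCCTCCG" + "AAA"
print(find_sites(example))
```

The scan covers only the strand as given; checking the reverse complement would require a second pass.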

Task type

Protein-level regression

Dataset description

The dataset comes from the paper "Deep generative models of genetic variation capture the effects of mutations" and is also available among the SaprotHub datasets.

Each label is the fitness score of a mutant amino acid sequence. Scores are unbounded real values, ranging from negative infinity to positive infinity.

Model input type

Amino acid sequence
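
Because the model scores single-site mutants, the input is typically the wild-type sequence with one residue substituted. The sketch below builds such a sequence from a mutation string. It assumes the common "wild-type residue, 1-based position, mutant residue" notation (for example K2A), which this card does not itself specify; the function name and the toy sequence are hypothetical.

```python
import re

def apply_mutation(wild_type: str, mutation: str) -> str:
    """Apply one substitution such as 'K2A' to a wild-type sequence.

    Assumed notation: wild-type residue, 1-based position, mutant residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation string: {mutation!r}")
    wt_res, pos, mut_res = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"Position {pos} is outside the sequence")
    if wild_type[pos - 1] != wt_res:
        raise ValueError(
            f"Expected {wt_res} at position {pos}, found {wild_type[pos - 1]}"
        )
    # Replace the residue at the 1-based position with the mutant residue.
    return wild_type[:pos - 1] + mut_res + wild_type[pos:]

# Toy placeholder sequence, not the real GAL4_YEAST sequence.
toy_wild_type = "MKTAYIAK"
print(apply_mutation(toy_wild_type, "K2A"))
```

Checking the wild-type residue before substituting catches off-by-one errors in position numbering.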

Performance

0.72 Spearman's ρ
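
Spearman's ρ is the Pearson correlation between the ranks of the predicted and measured scores, so it rewards getting the ordering of mutants right rather than their absolute values. The sketch below is a dependency-free illustration with made-up numbers, not the evaluation code used for this model; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)

def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))

# Made-up fitness scores for four hypothetical mutants.
measured = [0.1, -1.2, 0.8, -0.3]
predicted = [0.0, -0.9, 0.5, 0.2]
print(round(spearman_rho(predicted, measured), 2))
```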

LoRA config

lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
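
As a hedged sketch, the settings above could be passed to the Hugging Face `peft` library as shown below. This assumes the adapter was built with `peft.LoraConfig`, which this card does not state. The LoRA rank `r` is not listed here, so the sketch leaves it at the library default, which may differ from the value used in training.

```python
# Values copied from the LoRA config listed in this card.
lora_settings = {
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}

def build_lora_config(settings=lora_settings):
    """Create a peft.LoraConfig from the listed settings.

    Assumption: the rank `r` is not given in this card, so peft's default
    is used; replace it if the actual training value is known.
    """
    from peft import LoraConfig  # imported lazily so the dict works without peft
    return LoraConfig(**settings)
```

Keeping the values in a plain dictionary makes it easy to compare them against the adapter configuration shipped with the model.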

Training config

class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epochs: 50

batch size: 2

precision: 16-mixed
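
The optimizer settings above map directly onto `torch.optim.AdamW`. The sketch below shows that mapping under the assumption that training used PyTorch; the `16-mixed` precision string matches the format accepted by PyTorch Lightning's `Trainer`, but the actual training script is not part of this card, and the helper name is hypothetical.

```python
# Values copied from the training config listed in this card.
training_settings = {
    "lr": 1e-4,
    "betas": (0.9, 0.98),
    "weight_decay": 0.01,
    "max_epochs": 50,
    "batch_size": 2,
    "precision": "16-mixed",
}

def build_optimizer(parameters, settings=training_settings):
    """Create an AdamW optimizer with the listed hyperparameters.

    `parameters` is an iterable of trainable tensors, e.g. model.parameters().
    """
    import torch  # imported lazily so the dict is usable without torch

    return torch.optim.AdamW(
        parameters,
        lr=settings["lr"],
        betas=settings["betas"],
        weight_decay=settings["weight_decay"],
    )
```

The epoch count, batch size, and precision are not optimizer arguments; they would be supplied to the data loader and training loop instead.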
