Neural Bioinformatics Research Group

Neural Bioinformatics Research Group - ProkBERT Models

Welcome to the official Hugging Face organization for the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.

Models

We provide access to a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context Aware (LCA) tokenization, a scheme tailored to DNA sequences that balances context size against performance.
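To make the tokenizer settings used further down (k-mer size and shift) more concrete, here is a minimal sketch of overlapping k-mer windowing. It is an illustration only, not the ProkBERT tokenizer itself: the function name `kmer_tokenize` and the example sequence are hypothetical, and the sketch assumes that "6-mer, shift=1" means a window of 6 bases advanced by 1 base per token.

```python
def kmer_tokenize(sequence: str, k: int = 6, shift: int = 1) -> list[str]:
    """Split a DNA sequence into k-mers, moving the window by `shift` bases.

    Hypothetical helper for illustration; it does not map tokens to
    vocabulary ids or add special tokens.
    """
    if k < 1 or shift < 1:
        raise ValueError("k and shift must be positive integers")
    sequence = sequence.upper()
    # Stop at the last start position where a full k-mer still fits.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, shift)]


seq = "ATGCGTACGT"  # made-up 10 nt example sequence

print(kmer_tokenize(seq, k=6, shift=1))
# ['ATGCGT', 'TGCGTA', 'GCGTAC', 'CGTACG', 'GTACGT']

print(kmer_tokenize(seq, k=6, shift=2))
# ['ATGCGT', 'GCGTAC', 'GTACGT']
```

With a larger shift, the same sequence produces fewer tokens, which is one way a tokenizer can trade resolution for a longer nucleotide context.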

ProkBERT models are designed for microbiome-related tasks such as prokaryotic promoter identification and phage detection. The models are compact, with 20.6M to 26.6M parameters each, and are intended to be efficient to run while remaining effective on these tasks.

Model Overview

| Model | Parameters | Tokenizer | Layers | Attention Heads | Max. Context Size | Training Data |
| --- | --- | --- | --- | --- | --- | --- |
| mini | 20.6M | 6-mer, shift=1 | 6 | 6 | 1027 nt | 206.65 billion |
| mini-c | 24.9M | 1-mer | 6 | 6 | 1022 nt | 206.65 billion |
| mini-long | 26.6M | 6-mer, shift=2 | 6 | 6 | 4096 nt | 206.65 billion |

A comprehensive overview of model parameters across varied configurations.
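The maximum context sizes above are given in nucleotides, while the models operate on tokens. As a rough sketch, assuming a sliding window of size k advanced by `shift` bases, a sequence of L nucleotides yields floor((L - k) / shift) + 1 tokens. The helper `token_count` and the `MODELS` dictionary below are hypothetical names used only for this illustration; the calculation ignores any special tokens the actual tokenizer may add.

```python
def token_count(length_nt: int, k: int, shift: int) -> int:
    """Number of full k-mer windows in a sequence of `length_nt` nucleotides."""
    if length_nt < k:
        return 0
    return (length_nt - k) // shift + 1


# (k, shift, max context in nt), copied from the overview table.
MODELS = {
    "mini": (6, 1, 1027),
    "mini-c": (1, 1, 1022),
    "mini-long": (6, 2, 4096),
}

for name, (k, shift, context_nt) in MODELS.items():
    print(f"{name}: {context_nt} nt -> {token_count(context_nt, k, shift)} tokens")
# mini: 1027 nt -> 1022 tokens
# mini-c: 1027-style window not needed; 1022 nt -> 1022 tokens
# mini-long: 4096 nt -> 2046 tokens
```

Under this assumption, mini and mini-c cover a similar number of tokens at full context, while mini-long covers about four times as many nucleotides as mini with about twice as many tokens.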

Resources


For more information or questions, please visit our GitHub repository or contact us at email.