OnlySportsLM

Model Overview

OnlySportsLM is a 196M language model specifically designed and trained for sports-related natural language processing tasks. It is part of the larger OnlySports collection, which aims to advance domain-specific language modeling in sports.

Model Architecture

  • Base architecture: RWKV-v6
  • Parameters: 196 million
  • Structure: 20 layers, 640 dimensions
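As a sanity check, the stated shape (20 layers, 640 dimensions) is consistent with roughly 196M parameters. The sketch below is an approximation: the vocabulary size (65,536, the RWKV "world" tokenizer) and the per-layer matrix breakdown are assumptions, not from this card, and small components (layer norms, token-shift and decay vectors) are ignored.

```python
# Rough parameter-count estimate for a RWKV-v6-style model with the
# dimensions stated above. Vocab size and per-layer breakdown are assumed.
D, LAYERS, VOCAB = 640, 20, 65536

emb = VOCAB * D    # input embedding
head = D * VOCAB   # output projection
# Time-mix block: receptance, key, value, gate, output projections (~5 * D^2)
time_mix = 5 * D * D
# Channel-mix block: key (D x 3.5D), value (3.5D x D), receptance (D x D)
channel_mix = int(3.5 * D * D) * 2 + D * D

total = emb + head + LAYERS * (time_mix + channel_mix)
print(f"~{total / 1e6:.0f}M parameters")  # prints "~190M parameters"
```

The estimate lands within a few percent of the reported 196M; the gap is plausibly the small per-layer vectors omitted above.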

Training

  • Dataset: OnlySports Dataset (trained on a 315B-token subset of the full 600B tokens)
  • Training setup: 8 H100 GPUs
  • Optimizer: AdamW
  • Learning rate: Initially 6e-4, adjusted to 1e-4 due to observed loss spikes
  • Context length: 1024 tokens
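The learning-rate policy above can be sketched as a simple step schedule. Note the step at which the rate was lowered is not stated on this card; the threshold below is a hypothetical placeholder.

```python
# Sketch of the stated learning-rate policy: start at 6e-4, drop to 1e-4
# after loss spikes were observed. SPIKE_STEP is hypothetical; the card
# does not say when during training the adjustment happened.
LR_INITIAL, LR_REDUCED = 6e-4, 1e-4
SPIKE_STEP = 50_000  # hypothetical placeholder

def learning_rate(step: int) -> float:
    return LR_INITIAL if step < SPIKE_STEP else LR_REDUCED
```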

Performance

OnlySportsLM shows strong performance on sports-related tasks:

  • Outperforms the previous SOTA 135M/360M models by 37.62%/34.08% respectively on the OnlySports Benchmark
  • Competitive in the sports domain with larger models such as SmolLM 1.7B and Qwen 1.5B

Usage

You can use this model for sports-related content generation.

Download all files in this repository and run RWKV_v6_demo.py for inference.
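Demo scripts like this typically decode with temperature and top-p (nucleus) sampling. Below is a minimal, self-contained sketch of that sampling step; it is an illustration of the usual RWKV-demo approach, not the exact code in RWKV_v6_demo.py.

```python
import numpy as np

def sample_logits(logits, temperature=1.0, top_p=0.85):
    """Pick the next token id from raw logits with temperature + top-p."""
    # softmax (shifted for numerical stability)
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # top-p filtering: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, zero out the rest
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = probs[order][np.argmax(cumulative >= top_p)]
    probs[probs < cutoff] = 0.0
    if temperature != 1.0:
        probs = probs ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# With a strongly peaked distribution and a small top_p, sampling is
# effectively greedy:
token = sample_logits(np.array([10.0, 0.0, 0.0]), top_p=0.5)  # returns 0
```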

Limitations

  • The model is specifically trained on sports-related content and may not perform as well on general topics
  • Training was stopped at 315B tokens due to resource constraints, potentially limiting its full capabilities

Related Resources

Citation

If you use OnlySportsLM in your research, please cite our paper.

Contact

For more information or inquiries about OnlySportsLM, please visit our GitHub repository.
