OnlySportsLM

Model Overview

OnlySportsLM is a 196M language model specifically designed and trained for sports-related natural language processing tasks. It is part of the larger OnlySports collection, which aims to advance domain-specific language modeling in sports.

Model Architecture

  • Base architecture: RWKV-v6
  • Parameters: 196 million
  • Structure: 20 layers, 640 dimensions
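As a sanity check, the stated shape (20 layers, 640 dimensions) is consistent with roughly 196M parameters. The sketch below is an approximation: the vocabulary size (65,536, the RWKV "world" tokenizer) and the per-layer matrix breakdown are assumptions, not from this card, and small components (layer norms, token-shift and decay vectors) are ignored.

```python
# Rough parameter-count estimate for a RWKV-v6-style model with the
# dimensions stated above. Vocab size and per-layer breakdown are assumed.
D, LAYERS, VOCAB = 640, 20, 65536

emb = VOCAB * D    # input embedding
head = D * VOCAB   # output projection
# Time-mix block: receptance, key, value, gate, output projections (~5 * D^2)
time_mix = 5 * D * D
# Channel-mix block: key (D x 3.5D), value (3.5D x D), receptance (D x D)
channel_mix = int(3.5 * D * D) * 2 + D * D

total = emb + head + LAYERS * (time_mix + channel_mix)
print(f"~{total / 1e6:.0f}M parameters")  # prints "~190M parameters"
```

The estimate lands within a few percent of the reported 196M; the gap is plausibly the small per-layer vectors omitted above.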

Training

  • Dataset: OnlySports Dataset (trained on a 315B-token subset of the full 600B tokens)
  • Training setup: 8 H100 GPUs
  • Optimizer: AdamW
  • Learning rate: Initially 6e-4, adjusted to 1e-4 due to observed loss spikes
  • Context length: 1024 tokens
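The learning-rate policy above can be sketched as a simple step schedule. Note the step at which the rate was lowered is not stated on this card; the threshold below is a hypothetical placeholder.

```python
# Sketch of the stated learning-rate policy: start at 6e-4, drop to 1e-4
# after loss spikes were observed. SPIKE_STEP is hypothetical; the card
# does not say when during training the adjustment happened.
LR_INITIAL, LR_REDUCED = 6e-4, 1e-4
SPIKE_STEP = 50_000  # hypothetical placeholder

def learning_rate(step: int) -> float:
    return LR_INITIAL if step < SPIKE_STEP else LR_REDUCED
```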

Performance

OnlySportsLM shows strong performance on sports-related tasks:

  • Outperforms the previous SOTA 135M/360M models by 37.62%/34.08% respectively on the OnlySports Benchmark
  • Competitive in the sports domain with larger models such as SmolLM 1.7B and Qwen 1.5B

Usage

You can use this model for sports-related content generation.

Download all files in this repository and run RWKV_v6_demo.py for inference.
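Demo scripts like this typically decode with temperature and top-p (nucleus) sampling. Below is a minimal, self-contained sketch of that sampling step; it is an illustration of the usual RWKV-demo approach, not the exact code in RWKV_v6_demo.py.

```python
import numpy as np

def sample_logits(logits, temperature=1.0, top_p=0.85):
    """Pick the next token id from raw logits with temperature + top-p."""
    # softmax (shifted for numerical stability)
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # top-p filtering: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, zero out the rest
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = probs[order][np.argmax(cumulative >= top_p)]
    probs[probs < cutoff] = 0.0
    if temperature != 1.0:
        probs = probs ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# With a strongly peaked distribution and a small top_p, sampling is
# effectively greedy:
token = sample_logits(np.array([10.0, 0.0, 0.0]), top_p=0.5)  # returns 0
```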

Limitations

  • The model is specifically trained on sports-related content and may not perform as well on general topics
  • Training was stopped at 315B tokens due to resource constraints, potentially limiting its full capabilities

Related Resources

Citation

If you use OnlySportsLM in your research, please cite our paper.

Contact

For more information or inquiries about OnlySportsLM, please visit our GitHub repository.
