---
license: cc-by-sa-4.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- sports
datasets:
- Chrisneverdie/OnlySports_Dataset
base_model: Snowflake/snowflake-arctic-embed-xs
---


# Sports Text Classifier

## Overview

This Sports Text Classifier is a core component of the OnlySports Dataset creation pipeline. It identifies sports-related documents in a large corpus of web content so that they can be extracted into the dataset.

## Model Architecture

- Base model: [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs)
- Classification head: a binary (sports vs. non-sports) classification layer added on top of the base model
- Training: 10 epochs with a learning rate of 3e-4
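As a rough illustration of what a binary classification layer computes, the sketch below applies a linear layer with two outputs, followed by a softmax, to a single pooled embedding. It is framework-free and assumes a plain linear head. The class name `BinaryHead`, the label order (index 1 = sports), the toy weights, and the 4-dimensional input are all hypothetical. This card does not specify how the embedding is pooled or how the real head is parameterized.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BinaryHead:
    """Hypothetical linear head: two logits = W @ embedding + b.

    Assumed label order: index 0 = non-sports, index 1 = sports.
    """
    weights: List[List[float]]  # shape [2][dim]
    bias: List[float]           # shape [2]

    def logits(self, embedding: Sequence[float]) -> List[float]:
        # One dot product per output class, plus that class's bias.
        return [
            sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]

    def probabilities(self, embedding: Sequence[float]) -> List[float]:
        # Numerically stable softmax: subtract the max logit before exponentiating.
        z = self.logits(embedding)
        m = max(z)
        exps = [math.exp(v - m) for v in z]
        total = sum(exps)
        return [e / total for e in exps]


# Toy 4-dimensional embedding; the real base model outputs a much larger vector.
head = BinaryHead(
    weights=[[0.5, -0.2, 0.0, 0.1], [-0.5, 0.2, 0.0, 0.3]],
    bias=[0.0, 0.1],
)
p_non_sports, p_sports = head.probabilities([1.0, 0.0, 2.0, 1.0])
print(f"P(sports) = {p_sports:.3f}")
```

During fine-tuning, the head's weights (and typically the base model's) would be updated for the 10 epochs at the learning rate listed above. The sketch only shows the forward pass.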

## Performance

The classifier distinguishes sports from non-sports documents with high accuracy. Detailed evaluation results are shown below:

![OnlySports Classifier evaluation results](https://cdn-uploads.huggingface.co/production/uploads/656590bd40440ddcc051ade7/hK_a183i2_H5AfUF6ZXd6.png)
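Accuracy is the fraction of documents whose predicted label matches the true label. For a classifier used as a filter, precision (how much of the kept text is actually about sports) and recall (how much of the sports text is kept) are also worth checking. The minimal sketch below computes all three from confusion-matrix counts. `ConfusionCounts` is a hypothetical helper, and the example counts are made up rather than taken from the results above.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary sports / non-sports classifier (sports = positive class)."""
    tp: int  # sports predicted as sports
    fp: int  # non-sports predicted as sports
    fn: int  # sports predicted as non-sports
    tn: int  # non-sports predicted as non-sports

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        # Of everything kept as "sports", the fraction that really is sports.
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        # Of all true sports documents, the fraction that was kept.
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0


# Made-up counts for illustration only.
counts = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)
print(counts.accuracy(), counts.precision(), counts.recall())
```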

## Training Data

The classifier was trained on roughly 100k labeled documents of sports and non-sports content, split about 64% sports and 36% non-sports:

- 64k samples from seven major sports websites
- 36k non-sports documents, labeled as non-sports using GPT-3.5
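The class mix above can be made explicit with a small sketch that merges the two sources into one labeled, shuffled list and reports the share of sports examples. The helper names, the label convention (1 = sports, 0 = non-sports), and the placeholder texts are assumptions for illustration. This card does not describe the actual preprocessing.

```python
import random
from typing import List, Tuple

# (text, label); label 1 = sports, 0 = non-sports (assumed convention).
Example = Tuple[str, int]


def build_labeled_set(sports_docs: List[str], non_sports_docs: List[str], seed: int = 0) -> List[Example]:
    """Merge both sources into one labeled list, shuffled with a fixed seed (hypothetical helper)."""
    examples = [(t, 1) for t in sports_docs] + [(t, 0) for t in non_sports_docs]
    random.Random(seed).shuffle(examples)
    return examples


def sports_share(examples: List[Example]) -> float:
    """Fraction of examples labeled as sports."""
    return sum(label for _, label in examples) / len(examples)


# Placeholder texts standing in for the ~64k sports and ~36k non-sports documents.
sports = [f"sports article {i}" for i in range(64_000)]
non_sports = [f"other document {i}" for i in range(36_000)]
data = build_labeled_set(sports, non_sports)
print(len(data), sports_share(data))  # 100000 0.64
```

With a 64/36 split, a trivial model that always predicts "sports" would already reach 64% accuracy on data with the same mix. That makes it a useful baseline when reading the performance results.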

## Usage

This classifier was built primarily to create the OnlySports Dataset, presented in this [paper](https://arxiv.org/abs/2409.00286). It can also be used to filter other large text corpora for sports-related content.
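Filtering a corpus comes down to scoring each document and keeping those at or above a threshold. The sketch below assumes a hypothetical scorer that returns an estimated probability that a text is about sports. `filter_sports`, `keyword_stub`, and the 0.5 threshold are illustrative assumptions, and the stub scorer is only a stand-in for the real model.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical interface: any callable mapping text -> estimated P(sports) in [0, 1].
SportsScorer = Callable[[str], float]


def filter_sports(docs: Iterable[str], score: SportsScorer, threshold: float = 0.5) -> Iterator[str]:
    """Lazily yield only the documents the scorer considers sports-related."""
    for doc in docs:
        if score(doc) >= threshold:
            yield doc


def keyword_stub(text: str) -> float:
    """Stand-in scorer for demonstration only; replace with the real classifier."""
    return 0.9 if "match" in text.lower() else 0.1


corpus = ["The match ended 2-1 after extra time.", "How to bake sourdough bread."]
print(list(filter_sports(corpus, keyword_stub)))
```

To use this model as the scorer, one option is to wrap a Hugging Face `transformers` `pipeline("text-classification", model="Chrisneverdie/OnlySports_Classifier")`, which returns a predicted label and score for each input. This card does not state which label means "sports", so check the model's `id2label` configuration before converting outputs to P(sports). Raising the threshold generally trades recall for precision.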

## Integration

The classifier is integrated into a MapReduce architecture for efficient processing of large-scale datasets. It is used together with URL keyword filtering to build a comprehensive sports text dataset.
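The MapReduce structure can be sketched as a map step that filters each shard independently and a reduce step that merges the per-shard results. This sketch assumes a document is kept when its URL contains a sports keyword or the classifier accepts its text. That combination rule, the keyword list, the record format, and all function names are hypothetical, so see the paper for how the two filters are actually combined.

```python
from functools import reduce
from typing import Callable, Dict, List

Doc = Dict[str, str]  # hypothetical record with "url" and "text" fields

# Hypothetical keyword list; plain substring matching is deliberately naive here.
SPORTS_URL_KEYWORDS = ("sports", "football", "basketball", "soccer")


def url_has_keyword(url: str) -> bool:
    """Return True if the lowercased URL contains any sports keyword."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in SPORTS_URL_KEYWORDS)


def map_shard(shard: List[Doc], is_sports: Callable[[str], bool]) -> List[Doc]:
    """Map step: keep a document if its URL matches OR the classifier accepts its text (assumed rule)."""
    return [d for d in shard if url_has_keyword(d["url"]) or is_sports(d["text"])]


def merge(acc: List[Doc], kept: List[Doc]) -> List[Doc]:
    """Reduce step: concatenate per-shard results."""
    return acc + kept


def filter_corpus(shards: List[List[Doc]], is_sports: Callable[[str], bool]) -> List[Doc]:
    mapped = [map_shard(shard, is_sports) for shard in shards]
    return reduce(merge, mapped, [])


def stub(text: str) -> bool:
    """Stand-in classifier for the toy example."""
    return "scored" in text


shards = [
    [
        {"url": "https://example.com/basketball/recap", "text": "Final score: 101-99."},
        {"url": "https://example.com/recipes", "text": "Knead the dough for ten minutes."},
    ],
    [{"url": "https://example.org/news", "text": "The striker scored twice."}],
]
kept = filter_corpus(shards, stub)
print([d["url"] for d in kept])
```

Because each shard is processed independently, the list comprehension over shards could be replaced by a distributed map (for example, one worker per shard) without changing the reduce step.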

## Related Projects

This classifier is part of the larger OnlySports collection, which includes:

- [OnlySports Dataset](https://huggingface.co/collections/Chrisneverdie/onlysports-66b3e5cf595eb81220cc27a6)
- [OnlySportsLM](https://huggingface.co/Chrisneverdie/OnlySportsLM_196M)

For more information, see our [paper](https://arxiv.org/abs/2409.00286).