Rate Limits

The Inference API has rate limits based on the number of requests. These rate limits are subject to change in the future to be compute-based or token-based.

The Serverless API is not meant for heavy production workloads. If you need higher rate limits, consider Inference Endpoints, which give you dedicated resources.

User Tier	Rate Limit
Unregistered Users	1 request per hour
Signed-up Users	300 requests per hour
PRO and Enterprise Users	1000 requests per hour
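
To stay under these limits, a client can count its own requests before sending them. The following is a minimal sketch, not part of any Hugging Face library: `HourlyRateLimiter` and `HOURLY_LIMITS` are hypothetical names, the limits are copied from the table above and may change, and the sliding one-hour window is an assumption, since this page does not say how the server measures the hour.

```python
import time
from collections import deque

# Hourly limits copied from the table above; they are subject to change.
HOURLY_LIMITS = {
    "unregistered": 1,
    "signed_up": 300,
    "pro_enterprise": 1000,
}


class HourlyRateLimiter:
    """Client-side sliding-window limiter (hypothetical helper).

    Assumes a sliding window; the server's actual accounting is not
    documented here and may differ.
    """

    def __init__(self, max_requests, window_seconds=3600.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._timestamps = deque()  # send times of requests still inside the window

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_acquire(self):
        """Record a request and return True if the budget allows it, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False

    def seconds_until_available(self):
        """Return how long to wait before the next request fits in the window."""
        now = self.clock()
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._timestamps[0])
```

A caller would create `HourlyRateLimiter(HOURLY_LIMITS["signed_up"])`, call `try_acquire()` before each request, and sleep for `seconds_until_available()` whenever it returns `False`. This only reduces the chance of hitting the limit; the server remains the authority on whether a request is accepted.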
