api-inference documentation

Rate Limits

The Inference API enforces rate limits based on the number of requests. These limits may change in the future to be compute-based or token-based.

The Serverless API is not intended for heavy production applications. If you need higher rate limits, consider Inference Endpoints, which provide dedicated resources.

| User Tier                | Rate Limit             |
|--------------------------|------------------------|
| Unregistered Users       | 1 request per hour     |
| Signed-up Users          | 300 requests per hour  |
| PRO and Enterprise Users | 1000 requests per hour |
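
To stay within the limits listed above, a client can track its own request budget before calling the API. The following is a minimal sketch, not part of the Inference API or any Hugging Face library: the `HourlyRateLimiter` class, the `TIER_LIMITS` keys, and the sliding one-hour window are all assumptions made for illustration. The server may count requests differently (for example, in fixed hourly windows), so treat this as a conservative local guard rather than a mirror of the server's accounting.

```python
import time
from collections import deque

# Hourly limits copied from the tier list above; the keys are hypothetical labels.
TIER_LIMITS = {"unregistered": 1, "signed_up": 300, "pro_enterprise": 1000}


class HourlyRateLimiter:
    """Client-side sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if budget remains; otherwise return False."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False

    def seconds_until_available(self):
        """Return how long to wait before the next request would be allowed."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])
```

A caller would check `limiter.try_acquire()` before each request and, when it returns `False`, sleep for `limiter.seconds_until_available()` seconds or defer the work. Passing a custom `clock` keeps the class deterministic under test.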