Every Gradio app comes with a built-in queuing system that can scale to thousands of concurrent users. You can configure the queue by using the queue() method, which is supported by the gr.Interface, gr.Blocks, and gr.ChatInterface classes.
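As a rough illustration, the sketch below (using a toy echo function and component names chosen just for this example) attaches queue() to a gr.Blocks app before launching it:
import gradio as gr

# Toy function standing in for whatever your event listener actually runs.
def echo(message):
    return message

with gr.Blocks() as demo:
    inp = gr.Textbox(label="Input")
    out = gr.Textbox(label="Output")
    inp.submit(echo, inp, out)

demo.queue()   # configure the queue with its default settings
demo.launch()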
For example, you can control the number of requests processed at a single time by setting the default_concurrency_limit parameter of queue():
demo = gr.Interface(...).queue(default_concurrency_limit=5)
demo.launch()
This limits the number of requests processed for this event listener to 5 at a time. By default, default_concurrency_limit is set to 1, which means that when many users are using your app, only a single user's request is processed at a time. This default exists because many machine learning functions consume a significant amount of memory, so it is often only practical to serve one user at a time. However, you can easily change this parameter in your demo.
See the docs on queuing for more details on configuring the queuing parameters.