RequestQueue could have a limit on max enqueued requests #321

Open
mnmkng opened this issue Feb 21, 2019 · 1 comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team.

Comments

mnmkng commented Feb 21, 2019

Currently, when crawling pages with Pseudo URLs, the crawler often spends most of its time enqueueing thousands of pages into the request queue, and the user has no way to limit this. They can set the maxRequestsPerCrawl option, but that only limits the number of pages actually crawled, not the number of requests enqueued. The user may therefore end up with 100 pages crawled and thousands still waiting in the queue.

This will be especially important when switching to the per-request priced persistent queue.

We could add an enqueuedRequests property to RequestQueue that would be initialized automatically to the current value from storage and then incremented in memory with each added request.

We would also add an options.requestLimit configuration property to RequestQueue. Once this limit is reached, .addRequest() would return null (or something similar) and refuse to enqueue any more requests.
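To make the proposal concrete, here is a minimal sketch of how the counter and the limit could work together. It is only an illustration: LimitedQueue is a hypothetical wrapper, not part of the SDK. The initialCount option stands in for the value that would be read from storage, and the wrapped queue is assumed to expose an async addRequest(request, options) method.

```javascript
// Hypothetical sketch: LimitedQueue is NOT an existing SDK class.
class LimitedQueue {
    // `queue` is assumed to be any object with an async addRequest(request, options).
    // `initialCount` is an assumed stand-in for the count loaded from storage.
    constructor(queue, { requestLimit = Infinity, initialCount = 0 } = {}) {
        this.queue = queue;
        this.requestLimit = requestLimit;
        this.enqueuedRequests = initialCount;
    }

    async addRequest(request, options = {}) {
        // Limit reached: signal it with null and do not touch the underlying queue.
        if (this.enqueuedRequests >= this.requestLimit) return null;

        // Reserve the slot before awaiting, so concurrent calls cannot overshoot the limit.
        this.enqueuedRequests++;
        try {
            return await this.queue.addRequest(request, options);
        } catch (err) {
            // The request was not stored, so release the reserved slot.
            this.enqueuedRequests--;
            throw err;
        }
    }
}
```

This sketch counts every accepted call, including requests the queue would deduplicate. A real implementation would probably want to count only requests that were actually new.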

@mtrunkat @jancurn

jancurn commented Feb 22, 2019

I think this is a good idea. Maybe I'd name the option differently, e.g. maxRequestCount, to make it clearer.

One note: if the new request has forefront: true, should we enqueue it or not once the limit is reached? To be perfectly logically correct, we should, since it means the request has some kind of priority.
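The forefront rule could be expressed as a small decision function. This is a sketch of one possible behavior, not a settled design: shouldEnqueue is a hypothetical helper, and letting forefront requests bypass the limit is just the option suggested above.

```javascript
// Hypothetical helper, not part of the SDK.
// Decides whether a new request may be enqueued given the current count.
function shouldEnqueue(enqueuedRequests, maxRequestCount, options = {}) {
    // Assumed policy: forefront requests carry priority, so they bypass the limit.
    if (options.forefront) return true;
    // Ordinary requests are accepted only while the count is below the limit.
    return enqueuedRequests < maxRequestCount;
}
```

One consequence of this policy is that the queue can grow past maxRequestCount if many forefront requests arrive, so the option would act as a soft limit for prioritized requests.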

@mtrunkat mtrunkat added the t-tooling Issues with this label are in the ownership of the tooling team. label Sep 12, 2023
3 participants