RequestQueue.getRequest() should use local cache #297

Open
jancurn opened this issue Feb 1, 2019 · 3 comments
Labels
bug: Something isn't working.
feature: Issues that represent new features or improvements to existing features.
t-tooling: Issues with this label are in the ownership of the tooling team.

Comments

jancurn commented Feb 1, 2019

This shouldn't cause any problem and can greatly improve performance. See TODO at https://github.com/apifytech/apify-js/blob/master/src/request_queue.js#L276

jancurn added the feature label Feb 1, 2019

jancurn commented Feb 1, 2019

Actually, since the underlying storage is not read-after-write consistent, calling addRequest() and then getRequest() immediately afterwards might return null, which can cause weird bugs. I'm flagging this as a bug then.
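
A minimal sketch of that race and of how a local cache would hide it. Everything here is hypothetical and is not the SDK's code: `LaggyStorage` is an in-memory stand-in for a storage where a write only becomes readable later, and `CachedQueue` is an illustrative wrapper that remembers the requests it wrote itself.

```javascript
// Hypothetical stand-in for a storage that is not read-after-write consistent:
// a written record only becomes readable once flush() has been called.
class LaggyStorage {
    constructor() {
        this.visible = new Map();
        this.pending = new Map();
    }

    async put(id, request) {
        this.pending.set(id, request);
    }

    async get(id) {
        return this.visible.has(id) ? this.visible.get(id) : null;
    }

    // Simulates the moment the pending writes become visible to readers.
    flush() {
        for (const [id, request] of this.pending) this.visible.set(id, request);
        this.pending.clear();
    }
}

// Hypothetical queue wrapper that keeps a local copy of everything it adds.
class CachedQueue {
    constructor(storage) {
        this.storage = storage;
        this.cache = new Map();
    }

    async addRequest(request) {
        await this.storage.put(request.id, request);
        this.cache.set(request.id, request);
    }

    async getRequest(id) {
        // Serve locally written requests from the cache, so a read that
        // immediately follows a write does not depend on the storage.
        if (this.cache.has(id)) return this.cache.get(id);
        return this.storage.get(id);
    }
}
```

In this sketch, reading straight from `LaggyStorage` right after `addRequest()` resolves to null, while `CachedQueue.getRequest()` returns the request that was just added.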

jancurn added the bug label Feb 1, 2019

jancurn commented Feb 14, 2019

This might also be the cause of this problem:

2019-02-14T11:59:46.283Z ERROR: BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://www.example.com/","retryCount":1} (error details: type=record-not-found, statusCode=404)
2019-02-14T11:59:46.286Z   ApifyClientError: Record was not found
2019-02-14T11:59:46.288Z     at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.291Z     at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.294Z     at <anonymous>
2019-02-14T11:59:46.296Z     at process._tickCallback (internal/process/next_tick.js:189:7)
2019-02-14T11:59:46.298Z ERROR: BasicCrawler: runTaskFunction error handler threw an exception. This places the RequestQueue into an unknown state and crawling will be terminated. This most likely happened due to RequestQueue being overloaded and unable to handle Request updates even after exponential backoff. Try limiting the concurrency of the run by using the maxConcurrency option. (error details: type=record-not-found, statusCode=404)
2019-02-14T11:59:46.300Z   ApifyClientError: Record was not found
2019-02-14T11:59:46.302Z     at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.303Z     at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.305Z     at <anonymous>
2019-02-14T11:59:46.307Z     at process._tickCallback (internal/process/next_tick.js:189:7)
2019-02-14T11:59:46.309Z ERROR: AutoscaledPool: runTaskFunction failed. (error details: type=record-not-found, statusCode=404)
2019-02-14T11:59:46.311Z   ApifyClientError: Record was not found
2019-02-14T11:59:46.313Z     at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.315Z     at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.317Z     at <anonymous>
2019-02-14T11:59:46.319Z     at process._tickCallback (internal/process/next_tick.js:189:7)
2019-02-14T11:59:46.382Z User function threw an exception:
2019-02-14T11:59:46.388Z ApifyClientError: Record was not found
2019-02-14T11:59:46.390Z     at exports.newApifyClientErrorFromResponse (/home/myuser/node_modules/apify-client/build/utils.js:87:12)
2019-02-14T11:59:46.392Z     at exports.requestPromise (/home/myuser/node_modules/apify-client/build/utils.js:158:19)
2019-02-14T11:59:46.394Z     at <anonymous>
2019-02-14T11:59:46.396Z     at process._tickCallback (internal/process/next_tick.js:189:7)

jancurn commented Feb 15, 2019

Just a note that the RequestQueue should support the use case where one actor writes to the queue and another one reads from it. Perhaps the cache should be used only if the cached entry is less than N seconds old, and afterwards we can just use the underlying storage.
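
One way the "less than N seconds old" rule could look, as a rough sketch only. `TtlCache`, `getRequestWithCache`, `maxAgeMillis` and the `storage.get(id)` call are all made-up names for illustration, not the SDK's API. The clock is injectable so that the expiry can be exercised without waiting.

```javascript
// Hypothetical TTL cache. `maxAgeMillis` plays the role of "N seconds";
// `now` is injectable so expiry can be checked without real waiting.
class TtlCache {
    constructor(maxAgeMillis, now = () => Date.now()) {
        this.maxAgeMillis = maxAgeMillis;
        this.now = now;
        this.entries = new Map();
    }

    set(id, value) {
        this.entries.set(id, { value, storedAt: this.now() });
    }

    // Returns the cached value, or undefined if it is missing or too old.
    get(id) {
        const entry = this.entries.get(id);
        if (!entry) return undefined;
        if (this.now() - entry.storedAt >= this.maxAgeMillis) {
            this.entries.delete(id);
            return undefined;
        }
        return entry.value;
    }
}

// Uses the cache for fresh entries and falls back to the storage otherwise.
// `storage.get(id)` is an assumed async call resolving to a request or null.
async function getRequestWithCache(cache, storage, id) {
    const cached = cache.get(id);
    if (cached !== undefined) return cached;
    return storage.get(id);
}
```

With a short enough N, an actor that only reads would at worst see data that is N seconds stale, while the writing actor still gets its own recent writes back immediately.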

mtrunkat added the t-tooling label Sep 12, 2023