Explore parallelization for CrawlerCheerio #282

Open
jancurn opened this issue Jan 16, 2019 · 1 comment
Labels
discussion feature Issues that represent new features or improvements to existing features. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@jancurn
Member

jancurn commented Jan 16, 2019

Cheerio is quite CPU intensive, so at higher crawler concurrency the CPU chokes. We should explore whether the Cheerio download and parsing can run in a separate worker thread or process. In theory, we could transform and serialize the Cheerio object so that it can be sent between processes. The open question is how much overhead the serialization would add. We need to test it and only enable this feature if the CPU is choking.
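A minimal sketch of the idea, assuming Node's built-in `worker_threads` module. The `extractTitle` function is a hypothetical stand-in for the CPU-heavy Cheerio parsing step, and `parseInWorker` is an illustrative name, not part of any existing API. The worker sends back only plain, structured-cloneable data instead of a live Cheerio object, which sidesteps the question of serializing the object itself.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source kept inline so the sketch fits in a single file.
const workerSource = `
const { parentPort } = require('node:worker_threads');

// Hypothetical stand-in for Cheerio parsing; real code would call cheerio.load(html) here.
function extractTitle(html) {
  const start = html.indexOf('<title>');
  const end = html.indexOf('</title>');
  if (start === -1 || end === -1) return null;
  return html.slice(start + '<title>'.length, end).trim();
}

parentPort.on('message', ({ html }) => {
  // Only plain data crosses the thread boundary.
  parentPort.postMessage({ title: extractTitle(html) });
});
`;

// Parses one HTML string off the main thread and resolves with the extracted data.
function parseInWorker(html) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    worker.once('message', (result) => {
      resolve(result);
      worker.terminate();
    });
    worker.postMessage({ html });
  });
}

parseInWorker('<html><head><title> Example </title></head></html>')
  .then((result) => console.log(result));
```

Spawning one worker per page, as done here for brevity, would likely cost more than it saves. A real experiment would reuse a small pool of workers and compare throughput against the single-threaded crawler.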

@jancurn jancurn added feature Issues that represent new features or improvements to existing features. discussion labels Jan 16, 2019
@pocesar
Contributor

pocesar commented Mar 28, 2020

Node 12 lets us use worker_threads natively, with shared memory between threads, efficient communication and serialization, and atomic operations, all out of the box. We could also learn from Rust's Tokio and its work-stealing approach to the thread pool.
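To illustrate the shared-memory and atomics part, here is a small sketch, assuming only Node's built-in `worker_threads`, `SharedArrayBuffer`, and `Atomics`. The name `countWithWorkers` is hypothetical, and the counter stands in for something like a shared "pages processed" tally. It does not implement work stealing.

```javascript
const { Worker } = require('node:worker_threads');

// Each worker bumps a shared counter; Atomics.add keeps concurrent increments from being lost.
const workerSource = `
const { workerData } = require('node:worker_threads');
const counter = new Int32Array(workerData.shared);
for (let i = 0; i < workerData.increments; i++) {
  Atomics.add(counter, 0, 1);
}
`;

// Starts workerCount workers and resolves with the final counter value once all have exited.
function countWithWorkers(workerCount, increments) {
  // A SharedArrayBuffer is shared between threads, not copied, when passed to a worker.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const exits = [];
  for (let i = 0; i < workerCount; i++) {
    exits.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  return Promise.all(exits).then(() => Atomics.load(counter, 0));
}

countWithWorkers(4, 1000).then((total) => console.log(total));
```

Because every worker writes to the same memory, no result messages need to be serialized at all; only data that fits a typed array can be shared this way, so parsed page content would still have to go through `postMessage`.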

@mtrunkat mtrunkat added the t-tooling Issues with this label are in the ownership of the tooling team. label Jul 18, 2023
3 participants