Explore parallelization for CrawlerCheerio #282
Labels: discussion, feature, t-tooling
Cheerio is quite CPU-intensive, so the CPU becomes the bottleneck when the crawler runs at higher concurrency. We should explore whether Cheerio download and parsing can run in a separate worker thread or process. In theory, we could transform and serialize the Cheerio object so that it can be sent between processes. The open question is how much overhead the serialization will add. We need to measure it and enable this feature only when the CPU is actually choking.
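To make the idea concrete, here is a minimal sketch of offloading parsing to a Node.js worker thread and timing the round trip. It is not the crawler's implementation. `workerMain`, `extractData`, and `parseInWorker` are hypothetical names, and the regex-based `extractData` is only a stand-in for the real `cheerio.load(html)` plus extraction step so that the sketch runs without extra dependencies. It also assumes that the worker returns plain data rather than the Cheerio object itself, since anything posted between threads must be serializable by the structured clone algorithm.

```javascript
const { Worker } = require("node:worker_threads");
const { performance } = require("node:perf_hooks");

// Body of the worker thread. It is stringified and evaluated in the worker,
// so it must not reference variables from the enclosing module scope.
function workerMain() {
  const { parentPort, workerData } = require("node:worker_threads");

  // Hypothetical stand-in for cheerio.load(html) followed by data extraction.
  function extractData(html) {
    const match = /<title>([^<]*)<\/title>/i.exec(html);
    const links = [...html.matchAll(/href="([^"]*)"/g)].map((m) => m[1]);
    return { title: match ? match[1] : null, links };
  }

  // Only plain, structured-cloneable data is sent back to the main thread.
  parentPort.postMessage(extractData(workerData.html));
}

// Spawns a worker for one document and resolves with the extracted data.
function parseInWorker(html) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`(${workerMain.toString()})()`, {
      eval: true,
      workerData: { html },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

async function main() {
  const html =
    '<html><head><title>Example</title></head>' +
    '<body><a href="/a">A</a><a href="/b">B</a></body></html>';

  // Time the full round trip: worker startup, parsing, and serialization.
  const start = performance.now();
  const result = await parseInWorker(html);
  const elapsedMs = performance.now() - start;

  console.log(result);
  console.log(`Round trip took ${elapsedMs.toFixed(1)} ms`);
}

main().catch(console.error);
```

Spawning a worker per page, as above, includes thread startup in the measured time. A realistic experiment would probably reuse a small pool of long-lived workers and compare the round-trip time against parsing the same pages on the main thread.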