Integrate adblocker functionality #456

Open
jakubbalada opened this issue Sep 18, 2019 · 9 comments
Labels
feature: Issues that represent new features or improvements to existing features.
t-tooling: Issues with this label are in the ownership of the tooling team.

Comments

@jakubbalada
Member

Interesting tip from HN (for Dashblock):
> Maybe you already do it, but I think integrating adblocker functionality when loading JS sites would be desirable to reduce load time. And if ads are what the API user is interested in, perhaps add a flag for whether or not one wants ads to load.
>
> Recommendation: https://github.com/cliqz-oss/adblocker
> Should be the fastest adblocker library (used by Ghostery, Cliqz and Brave)

@jakubbalada added the feature label Sep 18, 2019
@mtrunkat
Member

This could be integrated into the Apify.launchPuppeteer() function as a useAdBlock: true option.

https://sdk.apify.com/docs/api/apify#module_Apify.launchPuppeteer

@Darking360

Greetings. So the idea would be to implement an ad blocker to increase the speed of the scrape/crawl? I could work on this 🙏

@mtrunkat
Member

mtrunkat commented Oct 4, 2019

Yes, exactly. It could boost the speed, especially for websites that are heavy on ads (such as news sites). But it would be great to test this assumption first. Would you also be interested in trying this out, using the Apify SDK to run a scraper with and without an ad blocker against some websites?
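
A minimal sketch of how such a with/without comparison could be timed. The helper names (`timeRun`, `median`, `compareRuns`) are hypothetical and not part of the Apify SDK; each task is assumed to be an async function that loads the target pages once, with or without blocking.

```javascript
// Hypothetical benchmark helpers; none of these names come from the Apify SDK.

// Time a single async task, in milliseconds.
async function timeRun(task) {
    const start = Date.now();
    await task();
    return Date.now() - start;
}

// The median is less sensitive to a single slow outlier than the mean.
function median(values) {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run both variants `repeats` times, alternating them so that
// network drift affects both sides roughly equally.
async function compareRuns(withoutBlocking, withBlocking, repeats = 5) {
    const baseline = [];
    const blocked = [];
    for (let i = 0; i < repeats; i++) {
        baseline.push(await timeRun(withoutBlocking));
        blocked.push(await timeRun(withBlocking));
    }
    return {
        baselineMs: median(baseline),
        blockedMs: median(blocked),
    };
}
```

In a real test, each task would launch a browser and visit the same list of ad-heavy URLs, so the two medians can be compared per site rather than as one global number.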

@Darking360

Sure! I can set up a test with some timing debug output to check this first. I'll create it, run it, and attach the results here for you to see. Thank you 🚀

@pocesar
Contributor

pocesar commented Oct 8, 2019

Interesting. I manually block all the common ad networks using blockRequests; this would offload the task to the extension.
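
For illustration, manual blocking of this kind boils down to matching each request URL against a hand-maintained pattern list. The sketch below is an assumption about the general approach, not the SDK's actual matching logic; `AD_URL_PATTERNS` and `shouldBlock` are hypothetical names, and the listed domains are examples rather than a vetted blocklist.

```javascript
// Illustrative only: a hand-maintained list of ad-network URL fragments.
// These entries are examples, not a complete or vetted blocklist.
const AD_URL_PATTERNS = [
    'doubleclick.net',
    'googlesyndication.com',
    'adservice.google.',
];

// Return true when the URL contains any of the patterns.
// Matching is a case-insensitive substring test.
function shouldBlock(url, patterns = AD_URL_PATTERNS) {
    const lower = url.toLowerCase();
    return patterns.some((pattern) => lower.includes(pattern.toLowerCase()));
}
```

The cost of this approach is that the list has to be kept up to date by hand, which is the task a filter-list based ad blocker would take over.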

@deleted-user-1

Makes sense for a lot of users I guess but fyi it's an explicit anti-feature with usecase-killing effect for me. I'd need this off with zero side effects on current behavior.

@remusao

remusao commented Jul 23, 2020

> Makes sense for a lot of users I guess but fyi it's an explicit anti-feature with usecase-killing effect for me. I'd need this off with zero side effects on current behavior.

In the small POC I proposed a while ago (#600), the feature is completely disabled by default and only does any work when blocking is enabled by the user.
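
A minimal sketch of what opt-in gating of this kind can look like. This is not the code from #600; the `blockAds` option, the `isAd` callback and the `maybeEnableBlocking` function are hypothetical names chosen for illustration. It relies only on Puppeteer's `page.setRequestInterception()` and the `request` event.

```javascript
// Hypothetical helper: `blockAds` and `isAd` are illustrative option names.
// When `blockAds` is not set, the page is not touched at all, so the
// default behavior and its performance stay exactly as before.
async function maybeEnableBlocking(page, options = {}) {
    const { blockAds = false, isAd = () => false } = options;
    if (!blockAds) return false; // disabled: no interception, no listeners

    await page.setRequestInterception(true);
    page.on('request', (request) => {
        if (isAd(request.url())) request.abort();
        else request.continue();
    });
    return true;
}
```

Returning early before any call on `page` is what keeps the disabled path free of overhead: request interception is only switched on for users who ask for it.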

@mnmkng
Member

mnmkng commented Jul 23, 2020

Yeah, sorry @remusao. We still have not figured out whether the performance will improve or not. I apologize.

@remusao

remusao commented Jul 23, 2020

> Yeah, sorry @remusao. We still have not figured out whether the performance will improve or not. I apologize.

Of course, no worries at all, I just wanted to make clear to @matjaeck that there should be a way to integrate such a feature without any overhead when it's disabled.

@mtrunkat added the t-tooling label Jul 18, 2023
7 participants