Since SE decided to respond to everyone except me on the internal announcement, this answer is a direct and mostly unmodified copy of the questions I posted on the internal announcement [mod-only link] on 2024-08-22. A couple of notes have been added after the fact to account for answers provided to other people that further emphasise the various points I've outlined. Additionally, a couple of extra notes have been added, because a form of bot detection rolled out some time around 2024-08-30 caused problems a week later (2024-09-06) that killed Boson. That rollout was apparently unrelated to this change, but it took several days, and publicly calling out SE [6], to actually figure that out.
TL;DR:
- What is a request, and how do you count it? The post demonstrates how one single page load can amount to nearly 60 requests entirely on its own, and that most page loads involve more than one request, which drastically reduces the number of effective page loads before blocks appear.
- Have you accounted for power users with a significant amount of traffic from, among other things, mass-opening search pages and running self-hosted stackapps that don't take the form of userscripts?
- Even though individual applications may not exceed 60 or even just 10 requests per minute, combined traffic from power users can trivially exceed this in active periods, especially when operating at scale.
- While not formatted as a question, how this is implemented actually matters. See both questions 1 and 2, and the rest of this post. This point isn't possible to summarise.
- Failure to correctly detect traffic can and likely will result in applications like the comment archive being taken offline, because its traffic is combined with my own traffic and everything else I host and use. A moderator exemption will not help the other things I host that make requests which aren't authed under my account. [5]
- How exactly is the block itself implemented? Is the offending user slapped with an IP ban? This affects whether or not this change screws over shared networks, including (but not limited to) workplace networks and VPNs.
- The API is mentioned as a migration target, but it isn't exhaustive enough, nor are extensions to it planned for some of the core functionality that risks being affected by this change. Also not formatted as a question, but the API is not a fully viable replacement for plain requests to the main site.
See the rest of the post for context and details.
I cannot add anything to the list, as it either can't be reviewed or doesn't meet the threshold to be added to the list, but as a self-hoster, I am heavily impacted by this.
> We do not plan on implementing stricter general use rate-limits. In other words, we will only limit traffic if it comes from non-human sources. [...] (Unfortunately I’m unable to share details as to how this is implemented.)
See, this worries me. If you [SE] can't tell us how you tell bots from non-bots, that means there's a good chance of incorrect classifications in both directions. I have no idea how you're implementing this, but if you [SE], for example, make the mistake of relying on stuff "humans normally do", you [SE] can and will run into problems with adblock users.
I'm taking an educated guess here, because this is the only trivial way to axe stuff like Selenium without affecting identical-looking real users. Going by user agent is another obvious choice, but that doesn't block Selenium and similar tooling.
There's also the third option of using dark magic in CloudFlare, in which case, I'm completely screwed for reasons I'll describe in the bullet point on Boson momentarily.
There are two problems I see with this if the incorrect classifications are bad enough:
- Any IP with more than a couple of people actively using the site can and will be slapped with a block, even if they're normal users.
  - Not going after IPs, on the other hand, makes it trivial to tell how the bot detections are made, which means they're trivial to bypass without using VPNs.
  - VPNs may also be disproportionately affected here; there are many moderators who regularly use VPNs for all or most of their internet use who will get slapped with blocks.
- Any IP with a sufficient number of stackapps running (of any type) can and will run into problems.
I fall solidly into category #2, and occasionally #1 (via both work and varying VPN use).
On my network, I host the following stackapps:
Boson, the bot running the comment archive (2 API requests per minute + 2 * n chat messages, n >= 0, + multiple requests to log in)
Very occasionally, CloudFlare decides to slap Boson's use of the API, which can send it into a reboot-relog spiral. I'm pretty sure I've mitigated this, but there are still plenty of failure modes I haven't accounted for. When this happens, I often get slapped with a reCAPTCHA that I have to solve manually to get Boson back up. Interestingly, CloudFlare only ever slaps API use, and not chat use, even though the number of chat messages posted (in a minute where there are new comments to post to chat) can exceed the number of API requests made by a significant margin.
Also, generally speaking, chatbots operating on multiple chat domains have to do three logins in a short period of time (one per chat domain). Based on observational experience, somewhere between 4 and 8 logins is a CloudFlare trigger - so a single restart already sits close to the threshold, and two overlapping restarts comfortably exceed it.
The number of CF-related problems has gone up drastically in the last few months as a result of CF being, well, CF. It detects perfectly normal traffic as suspicious on a semi-regular basis, and this kills my tools and is annoying to recover from, because a few of the blocks are hard to get around. Finishing up the note from earlier in this answer: if dark CloudFlare magic is used to implement this system, I will have problems within the first few hours of this being released, because that's just how CloudFlare works.
Don't you just love CloudFlare? /s
Editor's note: On 2024-09-06, an apparently unrelated bot detection change happened that independently killed Boson. It initially looked like part of the rate limiting change, but SE has denied this. They also denied making any recent changes to bot detection, and instead suspected CloudFlare might've done something that broke it separately. Either way, this highlights my point: CloudFlare-based bot detection will break stuff, intentionally or otherwise. If SE doesn't even have to enable anything for community bots and tooling to break, CF itself is a problem for as long as it's operational.
For context, based on the Internet Archive and observations from devtools, the specific bot detection system used is JavaScript detections, which requires a full browser environment to run. This is not an option for Boson, because it's fully headless and written in C++, and I'm not masochistic enough to set up WebDriver support there.
IA observations suggest this particular form of bot detection was enabled on 2024-08-30. It's still unclear why it took a week to run into problems - it might be a coincidence, or it might've only become a problem on 2024-09-06 for complicated CF config reasons I'm not going to pretend to understand.
If you too would like to get blocked by the bot detection, you can curl https://stackoverflow.com/login (intentionally a 404 - 404 pages result in far more aggressive blocks than any other pages) four times. Also, as a preemptive note, SE was notified about the details of this several times (after the fact, however [7]) when they took an interest after being called out in public.
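For reference, this is roughly what that test looks like as a script. The four-request threshold is purely observational from my own testing; neither the count nor the behaviour is documented anywhere by SE or CloudFlare, so treat this as a sketch of the reproduction rather than a guaranteed trigger:

```bash
#!/usr/bin/env bash
# Observational reproduction of the block trigger described above.
# The URL and the count of four come from testing, not documentation.
for i in 1 2 3 4; do
    curl -s -o /dev/null -w "attempt $i: HTTP %{http_code}\n" \
        "https://stackoverflow.com/login"
done
```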
An unnamed, partly unregistered, closed-source bulk-action comment moderation tool. It does 1 API request per minute (and has for the last 1-2 years). When actively used [1], it does up to 15-20 per minute, with a combination of API requests (the main comment deletion method) and requests via the undocumented bulk deletion endpoint [2][3], because this thing is designed for moderation at scale.
- In addition to the requests already listed, in certain configurations, it too goes through an automated login process.
- The majority of the requests are API requests, but the login required to make bulk deletion work cannot be done through the API
- Just like Boson, the comment collection process semi-regularly gets killed by CloudFlare during API calls.
Every quarter, I'll be downloading the data dump - and will likely have to repeat several requests, because the download system appears to be flaky under load. This is done automatically through the Stack Exchange data dump downloader and transformer, which makes somewhere between 10 and 20 page requests per second, including redirects, with peaks during the login process for reasons that will be shown later.
Editor's note: The data dump downloader shouldn't be affected. Based on science done on the JS detection bot killer from 2024-09-06 to 2024-09-07, Selenium is unlikely to be affected. If the rate limit becomes a problem, I'll artificially lower the request volume until it cooperates (the sketch below shows the general idea). Please open an issue on GitHub if it breaks anyway - ensuring we have continued access to the data dump is the only priority I have left atm.
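For the "artificially lower the request volume" part, the general idea is just spacing downloads out so the per-minute count stays under the limit. The real downloader is Selenium-based, so this curl version is illustrative only, and the spacing is an assumption derived from the proposed 60/min figure:

```bash
# Illustrative pacing only: stay well under 60 requests/minute by sleeping
# between downloads. urls.txt is a hypothetical list of download URLs.
while read -r url; do
    curl -sS -O -L "$url"   # -L because the download system redirects
    sleep 1.5               # roughly 40 requests/minute at most
done < urls.txt
```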
I have an RSS feed set up to read a meta feed, running every 15 minutes
I had plans to host a few more stackapps as well, but those plans were delayed by the strike and strike fallout. If those plans continue at some point, I'll be making ✨ even more requests ✨, as these were also planned in the form of chatbots, and chatbots cannot be moved over to the API.
I very occasionally run various informal tooling to monitor Stuff. These are in the form of bash scripts that use curl and ntfy to tell me if whatever it is I'm looking for has happened. These are all applications where the API is an infeasible strategy for something this tiny. The last such script I made ran every 6 hours.
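For the curious, these scripts are on the order of a few lines each. A representative sketch - the URL, the search string, and the ntfy topic are made up for illustration; only the general curl-and-ntfy shape matches what actually runs:

```bash
#!/usr/bin/env bash
# Check a page for a marker string and push a notification via ntfy if it
# shows up. Every specific value here is a stand-in.
PAGE="https://stackoverflow.com/help/whatever"   # hypothetical page to watch
if curl -sL "$PAGE" | grep -q "thing I'm waiting for"; then
    # ntfy.sh accepts a plain POST with the message as the request body
    curl -s -d "It happened: $PAGE" "https://ntfy.sh/my-example-topic"
fi
```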
Whenever I do bot/stackapp development, the number of requests goes up significantly for a short period of time (read: up to several hours) due to higher-than-normal request rates for debugging purposes
Editor's note: Most of these, with the exception of Boson (and the data dump downloader), were axed a few days ahead of the initial announcement deadline, as an attempt to load-shed after getting no response from SE for nearly two weeks with the initial deadline approaching rapidly. Boson, as previously mentioned, was later killed when an unrelated bot detection change appeared in place of the rate limit change. All the applications listed here (again, except the data dump downloader) are now disabled and/or axed, and all my future plans for stackapps are scrapped due to the lack of response from SE.
In addition, I run a crapton of userscripts (including a few very request-heavy userscripts), and actively use SO and chat. Under my normal use, I can load several pages per minute, and depending on how you count requests (a problem I'm commenting on later), this totals potentially hundreds of web requests.
During burninations, I also load a full pagesize-50 search page worth of questions to delete. This means that over the course of around 20 seconds, my normal use can burn through around 55 page loads, not including the requests to actually delete the questions. Mass-opening search pages is a semi-common use case as well, and will result in problems with the currently proposed limits - 60/min is extremely restrictive for power users.
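To put rough numbers on that, using the per-page counts measured in the tables later in this post: a pagesize-50 search page plus 50 question tabs is already 51 page loads, and at 3 requests to stackoverflow.com per question page (7 with the later changes noted below) that works out to roughly 150-350 requests to stackoverflow.com alone in those ~20 seconds - several times the proposed 60/min limit before a single deletion request, socket, or CDN fetch is counted.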
Though the vast majority of these individual things do not exceed 60 requests per minute, when you combine the activity, I have a problem. If any of my activity is incorrectly (or, in the case of the bots I have running, correctly) identified as automated, I get yeeted and my tools get killed. If said killing is done by CloudFlare, recovery is going to be a pain in the ass.
With limits as strict as the proposed limits, it'll become significantly harder to self-host multiple stackapps without running into even more rate limiting problems. The vast majority of current anti-whatever tooling is IP-based, so if this system is too, the activity will be totalled, and I will eventually and inevitably get IP-blocked.
Even though some applications run very infrequently, if Something Happens™ or the run times happen to overlap, I can trivially exceed the request limit. This is especially true of logins following an internet or site outage: an outage that kills my scripts forces relogs, and again, logging in is expensive when done at scale. It's already annoying enough with CloudFlare getting in the way.
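Purely as an illustration of the kind of workaround this pushes self-hosters towards (not something any of my tools currently do): staggering relogs with a random delay, so several bots recovering from the same outage don't all log in within the same CloudFlare window.

```bash
# Hypothetical relog wrapper: wait a random 0-299 seconds before logging
# back in, so simultaneous recoveries don't stack their logins.
sleep "$(( RANDOM % 300 ))"
./relog.sh   # stand-in for whatever actually performs the login
```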
While a moderator exemption will reduce some of the problems, it won't help if the result is an IP block: I have a bot account [4] that does a chunk of these requests. Other requests again are fully unauthenticated, but those make up an extreme minority of the total number of requests made.
Editor's note: as demonstrated by the bot detection rollout earlier in September, while there may be exemptions here, that apparently does not extend to anything else that could break bots and tooling. We may be safe from the rate limit, but not necessarily the next bot detection system they enable quietly, or have enabled for them by CloudFlare.
Misc. other problems
> To this end, moderators will be granted a unilateral exception to the new rate-limit on any site they moderate
Moderators are far from the only people with a use pattern like mine - there are lots of active users doing a lot across the network, or on their favourite site.
> Note that we strongly recommend userscript developers switch over to API usage as soon as possible
There are multiple things there aren't endpoints for: there's no way to log in, no way to download the data dumps, no way to log into chat, no way to post to chat, etc.
Advanced Flagging, for example, posts feedback to certain bots by sending messages in chat. It does so with manual non-API calls, because there's simply no way to do this via the API.
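To illustrate why this can't just move to the API: posting to chat means scraping an fkey out of an authenticated chat page and POSTing it back alongside the message text. A rough sketch with curl, based on how existing chatbots do it rather than on anything documented - the room ID is made up, and cookies.txt is assumed to hold an already-logged-in session:

```bash
ROOM=123   # illustrative room ID
# Pull the fkey out of the room page (needs an authenticated cookie jar)
FKEY=$(curl -s -b cookies.txt "https://chat.stackoverflow.com/rooms/$ROOM" \
    | grep -oP 'name="fkey"[^>]*value="\K[^"]*' | head -n 1)
# Post the message - a plain web request the proposed limit would count
curl -s -b cookies.txt \
    --data-urlencode "fkey=$FKEY" \
    --data-urlencode "text=example feedback message" \
    "https://chat.stackoverflow.com/chats/$ROOM/messages/new"
```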
The exclusion of endpoints for some of these things is either implied to be, or has explicitly been stated to be, by design. As convenient as it would be, the API as it currently stands is not a viable substitute for everything userscripts or bots need to do. This is especially true for certain actions, given the heavy rate limiting, the tiny quota, and the small to non-existent capacity for bulk API requests.
Editor's note: SE answered someone else internally who asked about an API for chat around 3 hours after I posted my questions, where they confirmed they would not be adding chat support to the API in the foreseeable future. This further underlines my point; not everything can just be switched over to the API and be expected to work. This will result in stuff breaking.
What is a request?
Here's a few samples of requests made from various pages. Note that non-SO domains are omitted, including gravatar, metasmoke, ads, cookielaw, and third-party JS CDNs. Also note that in this entire section, I'll be referring specifically to stackoverflow.com, but that's just because I don't feel like writing a placeholder for any domain in the network. Whenever I refer to stackoverflow.com, it can be substituted with any network URL.
Requests from a random question page with no answers:
| Requests made (incl. userscripts, excl. blocked) | Requests blocked (uBlock) | Userscript requests | Domain |
|---|---|---|---|
| 13 | 2 | | cdn.sstatic.net |
| 4 [a] | | 2 | i.sstatic.net |
| 3 | | | stackoverflow.com |
| 2 | | | qa.sockets.stackexchange.com |
| 1 | | 1 | api.stackexchange.com |
Editor's note: After the release of the previously mentioned unrelated bot detection and the (also unrelated) recent tags experiment, requests to stackoverflow.com went from 3 to 7, meaning 9 question loads per minute is enough to get rate limited (8 × 7 = 56 stays under 60, while 9 × 7 = 63 does not). The bot detection alone adds 3 requests, so all the counts in these tables are now out of date - and they're on the lower end of things.
Requests from a random question page with 4 answers:
| Requests made (incl. userscripts, excl. blocked) | Requests blocked (uBlock) | Userscript requests | Domain |
|---|---|---|---|
| 13 | 2 | | cdn.sstatic.net |
| 12 [a] | | 3 | i.sstatic.net |
| 3 | | | stackoverflow.com |
| 2 | | | qa.sockets.stackexchange.com |
Requests from the flag dashboard:
| Requests made (incl. userscripts, excl. blocked) | Requests blocked (uBlock) | Userscript requests | Domain |
|---|---|---|---|
| 31 [a] | | | i.sstatic.net |
| 8 | 2 | | cdn.sstatic.net |
| 1 | | | stackoverflow.com |
| 1 | | | qa.sockets.stackexchange.com |
Requests made when expanding any post with flags:
| Requests made (incl. userscripts, excl. blocked) | Requests blocked (uBlock) | Userscript requests | Domain |
|---|---|---|---|
| 3 | | | stackoverflow.com |
| 1 | | | i.sstatic.net |
Requests from https://stackoverflow.com/admin/show-suspicious-votes:
| Requests made (incl. userscripts, excl. blocked) | Requests blocked (uBlock) | Userscript requests | Domain |
|---|---|---|---|
| 18 | 2 | | cdn.sstatic.net |
| 33 [a] | | | i.sstatic.net |
| 1 | | | stackoverflow.com |
| 1 | | | qa.sockets.stackexchange.com |
Requests from the login page through a completed login:
| Requests made (incl. userscripts, excl. blocked and failed) | Requests blocked (uBlock) | Failed requests | Userscript requests | Domain |
|---|---|---|---|---|
| 1 | | 1 | | askubuntu.com |
| 30 | 6 | | | cdn.sstatic.net |
| 18 | | | | i.sstatic.net |
| 1 | | 1 | | mathoverflow.net |
| 1 | | 1 | | serverfault.net |
| 1 | | 1 | | stackapps.com |
| 1 | | 1 | | stackexchange.com |
| 6 | | | | stackoverflow.com |
| 1 | | 1 | | superuser.com |
[a]: Changes based on the number of users on the page. On large Q&A threads or any form of listing page, this can get big.
While mods are allegedly exempt, this still shows a pretty big problem: how do you count one request? Even if it's just requests to stackoverflow.com, question pages amplify one page load into 3 requests, and that's just with the current counts. Depending on how you count the number of requests, loading one single strategic page (probably a search page with pagesize=50) can be enough to cap out the entire request limit. The vast majority of pages in the network do some amount of request amplification.
The login page in particular is, by far, the worst. One single login makes requests to every other site in the network; the succeeded requests to Q&A domains alone (6 to stackoverflow.com, plus 1 to each of the six other network sites in the table) total 12 requests for that one action.
This leads to the question posed in the section header: what is a request, and how is it counted? How do you [SE] plan to ensure that the already restrictive 60 requests/min limit doesn't affect normal users?
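As a side note on methodology: the counts above come from watching devtools, and anyone who wants to reproduce them (or check how the numbers drift over time) can export a HAR file from the network tab and tally it. A quick sketch, assuming jq is installed and the export is saved as page-load.har (a hypothetical filename):

```bash
# Count requests per domain in a devtools HAR export.
jq -r '.log.entries[].request.url' page-load.har \
    | awk -F/ '{ print $3 }' \
    | sort | uniq -c | sort -rn
```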
Footnotes
1. Admittedly, it hasn't for over a year, because of the strike and the following lack of motivation to keep going.
2. Part of the manual tooling deals with bulk operations on identical comments. I could queue them up onto the mountain of a backlog I generate when reviewing comments, or I could tank a large number of problematic comments in one request, drastically freeing up quota for other parts of the application.
3. This doesn't run all the time either, and it's in a separate module from the bulk of the API requests.
4. Technically two, but one of them is not in use yet because of the strike.
5. I also just realised that I could bypass this by running all the stackapps under my account, but that means running live bot accounts with moderator access and insecurely storing credentials on a server exposed to the internet. For obvious reasons, this is a bad idea, but if self-hosting isn't accounted for, it's the only choice I have to maintain the comment archive.
6. I still stand by my comments saying Stack Exchange, Inc. killed the comment archive - whether or not it's related to the change announced here, someone at Stack Exchange, Inc. rolled out bot detection, either intentionally, or accidentally by willingly continuing to use CloudFlare in spite of it already having been disruptive to bots and other community tools. Granted, the challenge platform detection in particular is far more disruptive than other parts of CloudFlare, but CloudFlare has killed API access for me several times (and I'm not counting the bug on 2024-08-16 - I'm exclusively talking about rate limiting blocks by CF under normal operation, while complying with the backoff signals in API responses).
7. SE was not notified ahead of time because, with the initial rollout timeline for the rate limit coinciding with the bot blocking kicking in, and having been ignored internally for three weeks straight at the time, I assumed this was intentional, and opted to ensure the comment archive could remain functional somewhere rather than writing a bug report. There are a lot more private details as to why the bug report came after the fact (read: after SE suddenly decided to respond to me), which SE has been told in detail multiple times.