Rendered at 16:02:53 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
denysvitali 1 hour ago
Cloudflare is known to use fingerprinting to detect scrapers. For example, they use JA3 fingerprints and match them against the UA to block stuff like cURL while allowing OkHttp (Android clients), but this can easily be spoofed with packages such as CycleTLS [1].
I don't want to defend them, because they gate away a good chunk of the internet with their "bot protection", but unless you do PoW (which is also ecologically a nightmare), probably fingerprinting is the way to go - completely destroying the privacy of everyone involved.
Cromite, a privacy-conscious fork of Chromium for Android, constantly has issues with CloudFlare Turnstile [2] because they (Cloudflare) try to fingerprint it in multiple ways and only let it pass the challenge if that succeeds. The only way to get it to work would be to join the CloudFlare Browser Developer program, which requires signing an NDA. Rightfully so, the project maintainer didn't want to do it.
If you want to see the extent of what CloudFlare does to fingerprint browsers, just have a look at the issue [2] and see which flags need to be disabled for the browser to pass the challenge.
I understand both sides, but at least CloudFlare could be flexible enough to fall back to PoW instead of just blocking people from sending forms or accessing websites...
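For readers unfamiliar with JA3: as publicly documented by its authors, a JA3 fingerprint is the MD5 hash of five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), written as decimal values joined by "-" within a field and "," between fields, with GREASE values skipped. The sketch below assumes the fields have already been parsed out of a ClientHello; the function names and sample values are illustrative, not taken from Cloudflare or CycleTLS.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa and are ignored by JA3."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Decimal values separated by "-", skipping GREASE entries
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the comma-separated JA3 input string from already-parsed ClientHello fields."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """MD5 hex digest of the JA3 string: the value a server could compare against the UA."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Illustrative values only; 771 = 0x0303, and 0x0a0a is a GREASE cipher that gets dropped
    print(ja3_string(771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0]))
    print(ja3_hash(771, [4865, 4866], [0, 10, 11], [29, 23], [0]))
```

A server that sees the JA3 hash of cURL's TLS stack next to an OkHttp User-Agent can flag the mismatch; a tool like CycleTLS [1] gets around that by sending a ClientHello whose fields hash to the expected value.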
It's all for nothing, because Cloudflare's scraping protection works about as well as a $5 padlock: good enough to dissuade bored teens, not good enough to dissuade even an amateur burglar. If someone wants to scrape your publicly visible data, they will. There's nothing you can do.
ACCount37 14 minutes ago
At the same time: it sure works well enough to annoy anyone with a "bad ASN" IP with 80 captchas a day.
PearlRiver 42 minutes ago
This is why I have two separate browsers. If you want to do official stuff like paying for things you need to get through cloudflare.
helterskelter 23 minutes ago
Firefox added profile switching recently. Works good.
(That said, I still keep separate machines. One for doing "official" things, the other for everything else)
ajb 6 minutes ago
Odd - they've had that for years, but only on the command line. Wonder if it's different under the hood? They also have Firefox containers, which also never quite became a first-class feature (you have to install a plugin).
b65e8bee43c2ed0 10 minutes ago
>Works good.
does it? same binary, same machine, same display, same 781 other heuristics.
JoshTriplett 33 minutes ago
"This makes your browser appear suspicious because it looks like you're trying to hide your identity."
Yeah, this needs to be burned to the ground.
gruez 16 minutes ago
Bad optics aside, it doesn't actually reflect reality. See my other comment. You can enable basically all the privacy settings and still pass turnstile. Tor browser in a VM passes it, of all things.
I tested this extension that I've been using for a long time on the turnstile page and it got through, fwiw. I think it's a bit more subtle than how resistfingerprinting works but not sure what the privacy tradeoff is.
Thanks, I did not know about `privacy.resistFingerprinting`.
I'll make sure to fail all cloudflare turnshit in the future.
gruez 58 minutes ago
I have it enabled and turnstile works fine.
adamtaylor_13 56 minutes ago
So if you need to prevent bot abuse, but also don't want an ugly captcha every time someone goes to sign up, is there a better option?
ribtoks 36 minutes ago
Use proof-of-work captchas, many are private by default. Look into Private Captcha or Cap captcha.
phoronixrly 19 minutes ago
How does proof of work stop bots?
stephantul 16 minutes ago
Because it destroys the economics of scraping. It’s too expensive with proof of work, or at least not as economically viable.
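The economics argument rests on an asymmetry: the client has to search for a nonce, while the server checks it with a single hash. A minimal hashcash-style sketch, assuming SHA-256 and a difficulty measured in leading zero bits; this is a generic illustration, not the actual protocol of Private Captcha, Cap, or Anubis. Each extra bit of difficulty doubles the expected work (about 2^d hashes), while verification stays at one hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + str(nonce).encode("ascii")).digest()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**difficulty_bits hashes."""
    nonce = 0
    while leading_zero_bits(_digest(challenge, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, no matter how hard the challenge was."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits


if __name__ == "__main__":
    challenge = os.urandom(16)     # fresh random challenge per visitor
    nonce = solve(challenge, 16)   # roughly 65,536 hashes on average
    print(nonce, verify(challenge, nonce, 16))
```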
gruez 10 minutes ago
Depends on what type of scraping you're trying to stop. For the dumb scrapers that would try to scrape every page on a git forge (for which there are a bazillion pages for a modest project, because of how the site works), yeah it might deter them enough to stop. For anything high value (eg. reddit comments or retail prices), 10s of cpu time isn't going to stop them.
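This distinction can be made concrete with back-of-the-envelope arithmetic. A minimal sketch, assuming one challenge per page fetched (no reusable pass) and a hypothetical compute price; the $0.02 per core-hour below is a placeholder, not a quoted cloud rate.

```python
HYPOTHETICAL_USD_PER_CORE_HOUR = 0.02  # placeholder price, not a real quote


def pow_cost_usd(pages: int, cpu_seconds_per_page: float, usd_per_core_hour: float) -> float:
    """Total compute cost of solving one proof-of-work challenge per page."""
    core_hours = pages * cpu_seconds_per_page / 3600
    return core_hours * usd_per_core_hour


if __name__ == "__main__":
    # A crawl of ten million generated forge pages vs. a hundred high-value pages, 10 s each
    print(f"{pow_cost_usd(10_000_000, 10, HYPOTHETICAL_USD_PER_CORE_HOUR):,.2f}")
    print(f"{pow_cost_usd(100, 10, HYPOTHETICAL_USD_PER_CORE_HOUR):,.4f}")
```

Under these assumptions the crawl-everything scraper pays hundreds of dollars while the targeted scraper pays a fraction of a cent. If a solved challenge is reused across many pages, the per-page cost drops further.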
pmontra 4 minutes ago
It will not scare away bots but 10 seconds of wait (CPU or only a sleep) will turn away many real users. "This site is so slow, I'll use something else." A kind of reverse captcha.
ray_v 13 minutes ago
If it gets too expensive/time-consuming to scrape then it won't happen at scale (as much)?
ImPostingOnHN 35 minutes ago
The tool "Anubis" uses proof of work instead.
BetterThanSober 12 minutes ago
With a tuned cooldown period this isn't a problem, especially if you frequent the sites. OpenWrt uses Anubis, and usually when I need to peruse their site I'm on a very low-end device. I'd much rather wait than find Waldos.
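One way to implement such a cooldown: after a client solves the challenge once, the server hands out a signed pass that skips the challenge until it expires. A minimal sketch using HMAC-SHA256 from the Python standard library; the secret, the one-week window, and keying the pass to a client identifier such as an IP address are all hypothetical choices, and this is not a description of Anubis's own implementation.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-random-server-secret"  # hypothetical; load from config in practice
COOLDOWN_SECONDS = 7 * 24 * 3600                  # hypothetical one-week pass


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_pass(client_id: str, now: Optional[float] = None) -> str:
    """Called once the client has solved a proof-of-work challenge."""
    issued = time.time() if now is None else now
    payload = f"{client_id}|{int(issued + COOLDOWN_SECONDS)}"
    return f"{payload}|{_sign(payload)}"


def pass_is_valid(token: str, client_id: str, now: Optional[float] = None) -> bool:
    """True if the pass is authentic, belongs to this client, and has not expired."""
    try:
        token_client, expiry_str, signature = token.rsplit("|", 2)
        expiry = int(expiry_str)
    except ValueError:
        return False
    if not hmac.compare_digest(signature, _sign(f"{token_client}|{expiry_str}")):
        return False
    current = time.time() if now is None else now
    return token_client == client_id and current < expiry
```

A longer window means fewer challenges for regular visitors on slow devices, but also lets a scraper amortize one solved challenge over more requests.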
But in principle I agree that there's no good answer to this. Scraping _is_ useful, and I bet most of us here have scraped something; it is AI companies and their use of humans' material for training, without consent or anything in return, that led us to this. (I know botting has existed on forums for as long as forums have, but it is easily handled by human moderators and keyword filters.)
timpera 32 minutes ago
Anubis often takes more than 60 seconds to complete on low-end devices (especially old smartphones). It seems like there's no good solution.
dangus 22 minutes ago
That must be really low end then. I’ve never seen it complete in a timeframe that was slower than “I can’t even read the page before it redirects”
ImPostingOnHN 27 minutes ago
There's not an easy, perfect solution, for sure. Newer phones get faster, but spammer compute gets cheaper.
Some sort of decentralized trust web seems like another option, though less viable.
WesolyKubeczek 9 minutes ago
One of the unexpected outcomes of the AI-induced hardware shortage may be that compute won’t be getting cheaper after all, and may even get more expensive…
phoronixrly 18 minutes ago
How does Anubis stop bots?
xena 8 minutes ago
Bots don't execute JavaScript or follow complicated redirects.
avallach 50 minutes ago
Doesn't this mean we just need to make the WebGL fingerprint resistance implementation smarter? Instead of explicitly rejecting WebGL access or responding with dummy data, respond with data that is randomly chosen from a space of N common, reproducible patterns, e.g. emulate the WebGL implementation of some low-spec but actually popular devices.
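A minimal sketch of that idea, assuming a table of common device profiles and a per-install random secret: each site gets a profile chosen by an HMAC of its hostname, so the same site sees the same values on every visit while different sites get independently chosen ones. The profile entries below are placeholders, not real GPU strings; a real table would have to be sampled from popular devices, and the reported values would need to stay consistent with what the renderer actually outputs, or the mismatch itself becomes a fingerprint.

```python
import hashlib
import hmac
import secrets

# Placeholder profiles. The keys stand in for values a page can read via the
# WEBGL_debug_renderer_info extension (UNMASKED_VENDOR_WEBGL / UNMASKED_RENDERER_WEBGL)
# and gl.getParameter(gl.MAX_TEXTURE_SIZE).
COMMON_PROFILES = [
    {"vendor": "Vendor A", "renderer": "Renderer A", "max_texture_size": 16384},
    {"vendor": "Vendor B", "renderer": "Renderer B", "max_texture_size": 8192},
    {"vendor": "Vendor C", "renderer": "Renderer C", "max_texture_size": 4096},
]


def profile_for_site(install_secret: bytes, site: str) -> dict:
    """Stable per-site choice: same site -> same profile across visits."""
    digest = hmac.new(install_secret, site.lower().encode(), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(COMMON_PROFILES)
    return COMMON_PROFILES[index]


if __name__ == "__main__":
    secret = secrets.token_bytes(32)  # generated once per browser install
    for site in ["example.com", "example.org", "example.com"]:
        print(site, profile_for_site(secret, site)["renderer"])
```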
bflesch 8 minutes ago
All of those advanced features should be enabled on a per-website basis but unfortunately even browsers whose marketing focuses on privacy don't allow you to do that. Same with TLS root CA certificates, there is no way to configure that a certain CA can only create certificates for certain domains.
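X.509 already has a Name Constraints extension (RFC 5280), but it is set by the CA hierarchy itself, not by the user. What is being asked for here is a user-side rule. A minimal sketch of such a check, assuming the browser hands it the issuing CA's name and the hostname being visited; the CA names and the policy table are hypothetical.

```python
# Hypothetical user policy: issuing CA name -> DNS suffixes it may issue for.
# CAs not listed are left unrestricted.
CA_POLICY = {
    "Example Corp Internal CA": ["corp.example"],
    "Example National CA": ["gov.example"],
}


def host_matches(host: str, suffix: str) -> bool:
    """Exact match or a subdomain on a label boundary ("evilcorp.example" does not match "corp.example")."""
    host = host.lower().rstrip(".")
    suffix = suffix.lower().rstrip(".")
    return host == suffix or host.endswith("." + suffix)


def certificate_allowed(issuer_name: str, host: str) -> bool:
    allowed_suffixes = CA_POLICY.get(issuer_name)
    if allowed_suffixes is None:
        return True  # CA not restricted by this policy
    return any(host_matches(host, s) for s in allowed_suffixes)


if __name__ == "__main__":
    print(certificate_allowed("Example Corp Internal CA", "wiki.corp.example"))  # in scope
    print(certificate_allowed("Example Corp Internal CA", "bank.example"))       # out of scope
```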
bflesch 11 minutes ago
Firefox has so much built-in tracking it seems they want to push me to build my own browser. For example every time you open the settings there are several ways they are sending out pings to certain extensions.
Also by default addons.mozilla.org is a privileged site so of course they include google tracking in it and they get the proper fingerprint no matter what you have configured.
gruez 47 minutes ago
This blog post is filled with false assumptions.
>Turns out it's because Cloudflare wants to have a fingerprint of your device via WebGL, the only reason for doing this would be tracking.
> So Cloudflare just banned all WebKitGTK browsers as I guess they put an exception for Safari.
This is false. I ran firefox with:
* hardware acceleration disabled (so software renderer, nothing to fingerprint)
* resistfingerprinting enabled, including letterboxing with default window size
* webgl disabled
* VPN enabled
* In a Windows VM
By all accounts this should be the most suspicious fingerprint ever, but turnstile happily lets me through. If they want to track people, they're doing a pretty bad job. My guess is that OP's browser is getting banned because his WebKitGTK has a weird fingerprint, not because of webgl or whatever.
> Such things are blocked in WebKit, and have been for years. Meaning it's tracking so awful that even Apple would block it, and as far as I can tell it's not the kind of privacy protection you can easily disable in it.
This is also false. Webgl fingerprinting works just fine on Safari. They might try to mitigate it by adding some noise, but that's not so different than what firefox does, and is certainly not "blocked".
shiomiru 30 minutes ago
> My guess is that OP's browser is getting banned because his WebKitGTK has a weird fingerprint, not because of webgl or whatever.
So why is Cloudflare saying the author got blocked because of WebGL?
> > Such things are blocked in WebKit, and have been for years. Meaning it's tracking so awful that even Apple would block it, and as far as I can tell it's not the kind of privacy protection you can easily disable in it.
> This is also false. Webgl fingerprinting works just fine on Safari. They might try to mitigate it by adding some noise, but that's not so different than what firefox does, and is certainly not "blocked".
Yep. Cloudflare and cloudflare's customers don't care about blocking people that use non-standard browsers (or accessible browsers, or feed readers, or whatever). Using cloudflare defaults is basically saying, "Only major corporate browsers released in the last year or two can access this site."
Wowfunhappy 1 hour ago
...in the age of AI, does anyone have an actual solution for keeping out bots while preserving the privacy of humans?
Obviously this is terrible, but I think there's a possibility it's the least terrible option? Another option is IP reputation, which I think is worse. Or scanning a code with a non-rooted phone, which I think is even worse than that!
fidotron 1 hour ago
> ...in the age of AI, does anyone have an actual solution for keeping out bots while preserving the privacy of humans?
There isn't one, and pretending otherwise is nonsense because humans will always provide their credentials to something to act on their behalf.
In the limit you end up with Chinese phone farms.
tardedmeme 57 minutes ago
Right. Botnet operators love cloudflare because they make so much money renting out compromised machines to pass their tests.
thisislife2 41 minutes ago
The only solution is regulation. If all content created by anyone has a copyright, how does an implicit opt-in (which is what happens if you don't create a robots.txt file for your website) for scraping make any sense? Moreover, even if you have a robots.txt, AI (or whatever) bots often don't respect it (or use workarounds - they outsource scraping of such "restricted" sites to unethical third-parties to get the data; Meta has even resorted to piracy, openly!). So clearly, the logic and the "honour system" has failed.
Cloudflare, Google reCAPTCHA, hCaptcha etc. are all shitty technical solutions because, as we are all discovering, they come at the cost of our privacy (i.e. our personal data may monetise these services) and/or our computing resources and time. If current copyright laws aren't sufficient to prevent this, we have to acknowledge the system is broken. The answer could be enhancing it with some kind of Digital Millennium Copyright Act (DMCA)-like laws, but in favour of the creators against BigTech or rogue actors.
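To illustrate the "honour system" point: robots.txt compliance happens entirely inside the crawler's own code. Python's standard-library `urllib.robotparser` shows what a well-behaved crawler checks before fetching; a crawler that simply skips this check faces no technical barrier at all. The robots.txt content and user-agent names below are examples.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block one AI crawler entirely, keep /private/ off-limits for everyone
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """What a cooperative crawler checks before fetching; nothing enforces it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("GPTBot", "https://example.com/blog/post"))
    print(allowed("SomeCrawler", "https://example.com/blog/post"))
```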
Or you could let information be free, at least the stuff that’s on the public net.
As for issues like bots overloading websites or using too many resources, scaling laws will take care of it quickly; it’s not like you can’t serve thousands of RPS from a Raspberry Pi these days.
ImPostingOnHN 33 minutes ago
I don't think regulation will stop web scraping, not least because it can be done from locations outside the jurisdiction of the regulations.
> we have to acknowledge the system is broken
The system is broken. It probably takes, what, 10 seconds or less to use a residential or foreign proxy, 6+ months to internationally track and prosecute a single offender? So like a million times more effort going the regulatory route.
thisislife2 27 minutes ago
Just as criminal laws don't end all crimes, copyright laws and anti-scraping regulation won't end all scraping. But they will greatly reduce it and limit it to rogue actors. Two examples I can cite here are the laws against email spam and the laws against unsolicited marketing calls: they had a definite impact in reducing both (even in India, where I'm from and where laws are often laxly enforced).
JoshTriplett 22 minutes ago
Exactly. Bot activity is a problem of volume, not all-or-nothing. Solving 95% of it would be a win.
Gander5739 43 minutes ago
You don't need a non-rooted phone to pass captcha checks, I have a rooted phone and can pass the captchas that ask you to scan a qr code. But I doubt phones without google services would manage.
cr125rider 57 minutes ago
And identifying a bot that is acting on my behalf: "Claude, go search this topic" is basically the same as Googling something and clicking on the results. Human-driven AI searching needs to be in a different box than AI scraping for training data.
Which sounds extremely difficult to differentiate.
JoshTriplett 31 minutes ago
Hopefully it stays that way; "a bot acting on my behalf" is still a bot. At least it's often a well-behaved bot and uses a user-agent that can be detected and blocked.
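Blocking self-identified bots comes down to a substring check on the User-Agent header. A minimal sketch; GPTBot, ClaudeBot and CCBot are tokens their operators publish for their crawlers, but the list is illustrative and should be checked against each operator's current documentation. This only catches bots that announce themselves, since the header is trivially spoofed.

```python
# Illustrative tokens; verify against each operator's current crawler documentation
DECLARED_AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match; only catches bots that identify themselves."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in DECLARED_AI_CRAWLER_TOKENS)


if __name__ == "__main__":
    for ua in ["Mozilla/5.0 (compatible; GPTBot/1.0)",
               "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"]:
        print(is_declared_ai_crawler(ua), ua)
```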
spacedoutman 49 minutes ago
Private, invite-only internets.
csomar 54 minutes ago
They are not a problem unless you "believe" it is a problem. I estimate around 20-25K hits to my website from bots per day and I have all cloudflare protections disabled. Any decently optimized server should be able to easily handle that. (it's roughly 1 request every 3 seconds).
specialp 41 minutes ago
Yes, and that is just the bot background radiation of the internet. I run a site that is a primary source of information, and these botnets are aggressive to the point of a DDoS, all to do some sort of scraping. Their tactics are sophisticated enough that they could DDoS us outright if they wanted to. However, I am not sure what their objective is, as they have wasted enough of our resources to have scraped all our content 1000s of times over. That 25k of traffic is a couple of minutes for us, and that adds up: 80-90% of our traffic is this.
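For scale, converting the figures in this subthread to per-second rates (taking "a couple of minutes" as two minutes, which is an assumption):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rate_per_second(requests: float, window_seconds: float) -> float:
    """Average request rate over a time window."""
    return requests / window_seconds


if __name__ == "__main__":
    for label, reqs, window in [
        ("20K/day", 20_000, SECONDS_PER_DAY),
        ("25K/day", 25_000, SECONDS_PER_DAY),
        ("25K in ~2 min", 25_000, 2 * 60),
    ]:
        r = rate_per_second(reqs, window)
        print(f"{label}: {r:.2f} req/s, one request every {1 / r:.3f} s")
```

So 20-25K/day is roughly one request every 3.5 to 4.3 seconds, while the same 25K compressed into two minutes is about 208 requests per second. Whether that is a problem depends on how expensive each request is to serve.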
thisislife2 38 minutes ago
True. But it still wastes your server resources, right? And it's sad that you have to accept that as part of the "cost" of hosting a site ...
ndriscoll 17 minutes ago
What resources are you concerned about? An N100 mini PC should be capable of serving something like a blog at 20k+ requests/second (or saturating its network).
doctorpangloss 55 minutes ago
Web Environment Integrity
malka1986 1 hour ago
> keeping out bot
You can forget about it. It is not possible. Simple as that.
Wowfunhappy 1 hour ago
Let's say I'm selling concert tickets. How do I prevent bots from buying up all the tickets and scalping them?
ndriscoll 11 minutes ago
Sell them via a Dutch auction. Eliminate the arbitrage opportunity for scalpers and make more money in the process.
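A minimal sketch of a multi-unit descending-price (Dutch) auction, assuming each bidder buys one ticket as soon as the price falls to their limit and ties go to whoever queued first; prices are integer cents to avoid float drift, and the bidder limits are made up. The idea is that each ticket sells close to what its buyer was willing to pay, leaving little margin for resale.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bidder:
    name: str
    limit_cents: int  # highest price this bidder will accept (hypothetical data)


def dutch_auction(bidders: List[Bidder], units: int, start_cents: int,
                  floor_cents: int, step_cents: int) -> List[Tuple[str, int]]:
    """Lower the price step by step; each bidder buys one unit once price <= their limit."""
    sales: List[Tuple[str, int]] = []
    waiting = list(bidders)  # list order doubles as queue order for tie-breaking
    price = start_cents
    while units > 0 and waiting and price >= floor_cents:
        eligible = [b for b in waiting if b.limit_cents >= price]
        winners = eligible[:units]
        sales.extend((b.name, price) for b in winners)
        units -= len(winners)
        winner_ids = {id(b) for b in winners}
        waiting = [b for b in waiting if id(b) not in winner_ids]
        price -= step_cents
    return sales


if __name__ == "__main__":
    queue = [Bidder("A", 15000), Bidder("B", 12000), Bidder("C", 9000), Bidder("D", 6000)]
    print(dutch_auction(queue, units=2, start_cents=20000, floor_cents=5000, step_cents=1000))
```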
MyMemoryfails 51 minutes ago
I'd simply check form-filling speed: even with the browser's autocomplete, humans are slow because they need to click submit.
Then, while orders are "processing", handle them in bulk and prioritize slower users. There's a huge opportunity to do bot checks after checkout without affecting the user experience.
Also, for product launches you could add a unique field that requires user input; that way bots can't prepare for launches.
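A minimal sketch of the timing-plus-ordering part, assuming the server records when it served the form (for example in the session or a signed hidden field) and when the submission arrived, since client-side timestamps can be forged. The 3-second threshold and all names are hypothetical. Password managers that auto-submit would land in the "fast" bucket, so this is better used to reorder a queue than to reject outright.

```python
from dataclasses import dataclass
from typing import List

MIN_HUMAN_FILL_SECONDS = 3.0  # hypothetical threshold; tune against real traffic


@dataclass
class Submission:
    order_id: str
    served_at: float     # server clock when the form was rendered
    submitted_at: float  # server clock when the POST arrived

    @property
    def fill_seconds(self) -> float:
        return self.submitted_at - self.served_at


def looks_automated(sub: Submission) -> bool:
    return sub.fill_seconds < MIN_HUMAN_FILL_SECONDS


def processing_order(batch: List[Submission]) -> List[Submission]:
    """Human-paced submissions first (slowest first); suspiciously fast ones last."""
    return sorted(batch, key=lambda s: (looks_automated(s), -s.fill_seconds))


if __name__ == "__main__":
    batch = [Submission("a", 0.0, 1.0), Submission("b", 0.0, 10.0), Submission("c", 0.0, 5.0)]
    print([s.order_id for s in processing_order(batch)])
```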
fragmede 23 minutes ago
huh. no wonder my password manager's auto submit triggers bot detection (it's a fairly popular one).
luckylion 59 minutes ago
Tie them to the buyer's identity, offer at-value buy-backs until X weeks before event, disallow resale.
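A minimal sketch of those three rules, assuming tickets are stored against a verified buyer ID and the door check compares IDs; "X weeks" is set to a hypothetical two weeks, and the data types are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

BUYBACK_CUTOFF = timedelta(weeks=2)  # hypothetical value for "X weeks"


@dataclass
class Ticket:
    holder_id: str          # verified buyer identity the ticket is tied to
    face_value_cents: int
    event_start: datetime


def buyback_offer(ticket: Ticket, now: datetime) -> Optional[int]:
    """Refund at face value until the cutoff; after that, no buy-back."""
    if now < ticket.event_start - BUYBACK_CUTOFF:
        return ticket.face_value_cents
    return None


def admit(ticket: Ticket, presented_id: str) -> bool:
    """No resale: only the original buyer gets in, so a resold ticket is worthless."""
    return presented_id == ticket.holder_id
```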
nulledy 1 hour ago
We use Turnstile on several of our sites, and I think we need to revisit that decision.
sammy2255 1 hour ago
Out of curiosity, why did you have it on in the first place?
Fokamul 1 hour ago
Please, anyone from EU (US is doomed rofl) create a petition to ban browser-fingerprinting in EU, across all existing browsers.
I'm not good at creating petitions but can happily sign it.
Same goes for Stop Killing Games and anti-Chat Control.
I can imagine this getting traction if it's explained to "normal" people in a YouTube video.
fidotron 1 hour ago
A better solution would be to make webgl, webgpu and (especially) webrtc have some sort of prompt before they can be in any way used in that fashion, but this will absolutely destroy web ux Windows Vista style.
JoshTriplett 24 minutes ago
And then the gatekeepers like Cloudflare will say "please hit accept in order to verify your browser and access this site".
richwater 54 minutes ago
You mean the "Accept Cookies" banner that has become a complete joke? Pass
bflesch 5 minutes ago
It's about explicitly deciding to allow certain capabilities on a per-website basis. No major browser allows defense-in-depth via fine-grained website permissions.
Even simply changing the user agent was sabotaged in Firefox, and choosing one user agent per domain is wishful thinking.
MyMemoryfails 39 minutes ago
I think he means browser permissions: for example, when a site wants to send notifications or record your mic, there's a permission check. Something similar for WebGL.
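A minimal sketch of per-site permissions for these APIs, in the spirit of the existing notification and microphone prompts: each origin gets an explicit decision per capability, falling back to "ask". The capability names and the store are hypothetical, not any browser's actual settings model. It also makes the prompt-fatigue concern visible: with "ask" as the default, every site using WebGL would trigger a prompt.

```python
from enum import Enum
from typing import Dict, Tuple


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ASK = "ask"


# Hypothetical defaults: fingerprint-prone APIs prompt unless the user decided otherwise
DEFAULTS: Dict[str, Decision] = {
    "webgl": Decision.ASK,
    "webgpu": Decision.ASK,
    "webrtc": Decision.ASK,
}


class PermissionStore:
    """Per-origin, per-capability decisions, like existing notification/mic permissions."""

    def __init__(self) -> None:
        self._decisions: Dict[Tuple[str, str], Decision] = {}

    def set(self, origin: str, capability: str, decision: Decision) -> None:
        self._decisions[(origin, capability)] = decision

    def check(self, origin: str, capability: str) -> Decision:
        # Explicit per-site choice wins; unknown capabilities are blocked by default
        return self._decisions.get((origin, capability), DEFAULTS.get(capability, Decision.BLOCK))
```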
J-Kuhn 20 minutes ago
Fun Fact: When Cookies were introduced into Netscape, you got a browser permission prompt. Then browser vendors set it to allow by default.
And then legislation required those consent boxes back, so everyone built their own, instead of demanding that the default should be changed back.
koolala 1 hour ago
a. Accept All
b. Accept Only Necessary Fingerprinting
shevy-java 20 minutes ago
I wondered about that too. So they allege that bots require everyone to now ID themselves to the big service providers. Very dystopian situation. Skynet is currently winning the war.
kykat 1 hour ago
What? Big tech company is evil? No way! I thought cloudflare were good guys...
aleksandrm 1 hour ago
What gave you the impression that Cloudflare were the good guys?
tardedmeme 56 minutes ago
Probably everyone on HN singing their praises for the past 10 years.
kykat 38 minutes ago
And my og comment getting downvoted on this very intellectual forum that definitely isn't an echo chamber
Petersipoi 16 minutes ago
Your very sarcastic, uninteresting comment getting downvoted is not an indication that the forum isn't intellectual. It's an indication that you aren't behaving intellectually.
bflesch 4 minutes ago
Cognitive dissonance in tech millionaires is quite strong, still worth it to trigger them from time to time on a factual basis.
aboardRat4 1 hour ago
Big tech companies are always visited first by the G-men who need something done.
[1]: https://github.com/Danny-Dasilva/CycleTLS
[2]: https://github.com/uazo/cromite/issues/2365
https://litter.catbox.moe/gaizpk692bhhs6b7.png
https://github.com/kkapsner/CanvasBlocker
While I don't have an iDevice to try, the assumption that they are special cased is fair... because they are: https://blog.cloudflare.com/eliminating-captchas-on-iphones-...
(Yes, this is basically WEI in a shinier package.)
No idea. I can't even reproduce the error OP got with webgl disabled.
https://litter.catbox.moe/y42l22k97tgv96nx.png
- Web-scraping and copyright law - https://www.neudata.co/blog/web-scraping-and-copyright-law
- Why DMCA Claims Against Web Scrapers Face Long Odds - https://capstonedc.com/insights/why-dmca-claims-against-web-...