Right, I must have just blanket-banned the whole /8 to be sure Alibaba Cloud is included. I did it some time ago, so I forgot.
Joined 3 years ago
Cake day: November 2nd, 2022
blob42@lemmy.ml to Selfhosted@lemmy.world • Anubis is awesome! Stopping (AI) crawlbots · 2 days ago

I am planning to try it out, but for Caddy users: after weeks of being bombarded by AI crawlers, I came up with a solution that works.
It is a custom Caddy CEL expression matcher combined with the caddy-ratelimit and caddy-defender plugins.
Now here’s the fun part: the defender plugin can respond with garbage, so when a request comes from a matching AI crawler, the response poisons its training dataset.
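The general idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not caddy-defender's implementation: the CIDR ranges are hypothetical placeholders taken from the RFC 5737 documentation blocks (the plugin ships its own lists behind names such as `aws` or `openai`), and `is_blocked` and `garbage_response` are hypothetical helpers.

```python
import ipaddress
import random

# Hypothetical placeholder ranges (RFC 5737 documentation blocks), standing in
# for the real provider lists that caddy-defender bundles.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Vocabulary for the filler text; any word list would do.
WORDS = ["lorem", "quantum", "banana", "kernel", "teapot", "nebula", "cobalt"]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip lies inside any blocked range (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(client_ip)
    # `in` is simply False when the address and network versions differ.
    return any(addr in net for net in BLOCKED_RANGES)


def garbage_response(client_ip: str, n_words: int = 50, seed=None):
    """Return random filler text for blocked clients, or None to serve normally."""
    if not is_blocked(client_ip):
        return None
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


if __name__ == "__main__":
    print(garbage_response("192.0.2.10", n_words=8, seed=1))  # blocked: filler text
    print(garbage_response("203.0.113.5"))  # not blocked: None
```

Returning `None` here stands for "let the request continue to the normal handler".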
Originally I relied only on the rate limiter and noticed that the AI bots kept retrying whenever the limit reset. Once I introduced data poisoning, they all stopped :)
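To illustrate why a rate limit alone only delays a persistent crawler, here is a simplified per-key sliding-window limiter in Python. It reuses the `events 1500` / `window 30s` values from the `rate_limit` zone in the config, but it is an illustrative sketch, not caddy-ratelimit's code; `SlidingWindowLimiter` is a hypothetical name, and time is passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `events` requests per `window` seconds for each key."""

    def __init__(self, events: int, window: float):
        self.events = events
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.events:
            return False  # the server would answer with HTTP 429
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(events=1500, window=30)
    # A crawler fires 2000 requests within its first second.
    allowed = sum(limiter.allow("198.51.100.7", now=i / 2000) for i in range(2000))
    print(allowed)  # 1500
    # Once its old requests age out of the window, it is let through again.
    print(limiter.allow("198.51.100.7", now=31.0))  # True
```

Rejected requests cost the crawler nothing, so it can simply wait out the window and resume, which is consistent with the behavior described above.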
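```caddyfile
{
	# Plugin directives have no default position in Caddy's handler chain,
	# so they need an explicit order (skip if already set in your global options)
	order defender after header
	order rate_limit before basic_auth
}

git.blob42.xyz {
	@bot <<CEL
		header({'Accept-Language': 'zh-CN'}) || header_regexp('User-Agent', '(?i:(.*bot.*|.*crawler.*|.*meta.*|.*google.*|.*microsoft.*|.*spider.*))')
		CEL
	abort @bot

	defender garbage {
		ranges aws azurepubliccloud deepseek gcloud githubcopilot openai 47.0.0.0/8
	}

	rate_limit {
		zone dynamic_botstop {
			match {
				method GET
				# to use with defender
				#header X-RateLimit-Apply true
				#not header LetMeThrough 1
			}
			# {remote_host} is the Caddyfile shorthand for the client IP address
			key {remote_host}
			events 1500
			window 30s
			#events 10
			#window 1m
		}
	}

	reverse_proxy upstream.server:4242

	handle_errors 429 {
		respond "429: Rate limit exceeded."
	}
}
```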
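To check which requests the `@bot` matcher would abort, its two CEL conditions can be approximated in Python. This is a sketch under assumptions: it treats the `Accept-Language` condition as an exact comparison with `zh-CN` (so a typical browser value like `zh-CN,zh;q=0.9` would not match) and applies the User-Agent pattern as an unanchored, case-insensitive search; `is_bot` is a hypothetical helper. It also shows how broad the pattern is: any User-Agent that merely contains `meta`, `google`, or `microsoft` matches.

```python
import re

# The User-Agent pattern from the Caddyfile; Python's `re` supports the same
# scoped (?i:...) case-insensitive group as Go's RE2.
UA_PATTERN = re.compile(
    r"(?i:(.*bot.*|.*crawler.*|.*meta.*|.*google.*|.*microsoft.*|.*spider.*))"
)


def is_bot(headers: dict) -> bool:
    """Approximate @bot: exact Accept-Language match OR User-Agent regex hit."""
    if headers.get("Accept-Language") == "zh-CN":
        return True
    return UA_PATTERN.search(headers.get("User-Agent", "")) is not None


if __name__ == "__main__":
    samples = [
        {"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"},
        {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"},
        {"User-Agent": "curl/8.5.0", "Accept-Language": "zh-CN"},
        {"User-Agent": "Mozilla/5.0", "Accept-Language": "zh-CN,zh;q=0.9"},
    ]
    for h in samples:
        print(is_bot(h), h)  # True, False, True, False
```

Header names are compared case-sensitively in this sketch for simplicity; HTTP itself treats them case-insensitively.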
If I am not mistaken, the 47.0.0.0/8 IP block belongs to Alibaba Cloud.
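For scale: a /8 prefix fixes only the first octet, so blocking 47.0.0.0/8 covers 2^24 = 16,777,216 addresses, everything from 47.0.0.0 to 47.255.255.255, regardless of who each sub-block is actually allocated to. A quick check with Python's standard `ipaddress` module (the two sample addresses are arbitrary):

```python
import ipaddress

block = ipaddress.ip_network("47.0.0.0/8")

print(block.num_addresses)  # 16777216
print(block[0], block[-1])  # 47.0.0.0 47.255.255.255
print(ipaddress.ip_address("47.1.2.3") in block)  # True
print(ipaddress.ip_address("48.0.0.1") in block)  # False
```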
We need a decentralized, community-owned Cloudflare alternative. Anubis looks to be on a good track.