A tiny mouse, a hacker.

See here for an introduction, and my link tree for socials.

  • 0 Posts
  • 11 Comments
Joined 1 year ago
Cake day: December 24th, 2023



  • If any of those end up interacting with me, or I otherwise see them on my timeline, they’ll get treated appropriately: reported, blocked, or in extreme cases, served garbage interactions. Serving garbage to 500+ bots is laughably easy (a rough sketch of the idea is below). Every day I get over 5 million requests from various AI scrapers, from thousands of unique IP addresses, and I serve them garbage. It doesn’t make a blip on my tiny VPS: in just the past 24 hours, I served 5.2M requests from AI scrapers, from ~2100 unique IP addresses, using 60MB of memory and a mere 2.5 hours of CPU time. I can do that on a potato.

    But first: they have to interact with me. As I am on a single-user instance, chances are that by the time any bot gets around to trying to spam me, a bigger server has already reported and blocked it (and I periodically review blocks from larger instances I trust, so there’s a good chance I’d block most bots before they get a chance to interact with me).

    This is not a fight bots can win.
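
    For illustration, a minimal sketch of the idea in Python: check the User-Agent against a few well-known AI-crawler names and hand those requests cheap, generated filler instead of real content. The user-agent markers and the filler generator here are assumptions for the example, not the actual setup behind the numbers above.

    ```python
    # Sketch: serve generated garbage to requests that look like AI scrapers.
    # The SCRAPER_MARKERS list and the garbage generator are illustrative only.
    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    SCRAPER_MARKERS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider", "Amazonbot")
    WORDS = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()

    def garbage(paragraphs=20, words_per_paragraph=80):
        """Cheap throwaway filler text; costs almost nothing to produce."""
        return "\n\n".join(
            " ".join(random.choice(WORDS) for _ in range(words_per_paragraph))
            for _ in range(paragraphs)
        )

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            ua = self.headers.get("User-Agent", "")
            if any(marker in ua for marker in SCRAPER_MARKERS):
                body = garbage().encode()   # scrapers get junk
            else:
                body = b"hello, human\n"    # everyone else gets the real page
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
    ```

    In practice something like this would sit inside or behind the web server fronting the instance, but the point stands: generating junk is far cheaper than crawling it.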


  • Personally, I do not have any automation to detect LLMs larping as people. But I do review accounts that follow or interact with mine, and if I find any that are bots, I’ll enact countermeasures. That may involve reporting them to their server admin (most instances don’t take kindly to such bots), blocking their entire instance, or in extreme cases, serving them garbage interactions.





  • It’s about 5 times longer than previous releases were maintained for, and is an experiment. If there’s a need for a longer-term support branch, there will be one. It’s pointless to start maintaining a 5+ year branch with 0 users and a handful of volunteers, none of whom are paid to do the maintenance.

    So yes, in that context, 15 months is long.




  • I found that no general-purpose search engine will ever serve my needs. Their goal is to index the entire internet (or a very large subset of it), and sadly, a very large part of the internet is garbage I have no desire to see. So I simply stopped using search engines. I have a carefully curated, topical list of links where I can look up information, plus RSS feeds, and those cover pretty much everything I used search engines for.

    Lately, I have been experimenting with YaCy, and fed it my list of links to index (a sketch of that feeding step is below). Effectively, I now have a personal search engine. If I come across anything interesting via my RSS feeds, or via the Fediverse, I plug it into YaCy, and now it’s part of my search library. There’s no junk, no ads, no AI, no spam, and the search result quality is stellar. The downside is, of course, that I have to self-host YaCy and maintain a good-quality index. It takes a lot of effort to get started, but once there’s a good index, it works great. So far, I have found the effort/benefit ratio to be very much worth it.

    I still have a SearxNG instance (which also searches my YaCy instance, with a higher weight than other sources) to fall back on if I need to, but I haven’t needed it in the past two months, and only twice in the past six.
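
    For the curious, a rough sketch of how feeding a curated link list into a self-hosted YaCy instance might look, using its HTTP interface. The endpoint and parameter names (Crawler_p.html, crawlingURL, crawlingDepth, crawlingMode) and the credentials are assumptions about YaCy’s crawl-start servlet; check your own instance’s API before relying on them.

    ```python
    # Sketch: queue a curated list of links for indexing on a local YaCy
    # instance. Endpoint/parameter names and credentials are assumptions.
    import requests
    from requests.auth import HTTPDigestAuth

    YACY = "http://localhost:8090"            # assumed default YaCy port
    AUTH = HTTPDigestAuth("admin", "secret")  # placeholder admin credentials

    CURATED_LINKS = [
        "https://example.org/interesting-article",
        "https://example.net/good-documentation/",
    ]

    def queue_for_indexing(url, depth=0):
        """Ask YaCy to crawl (and therefore index) a single URL."""
        resp = requests.get(
            f"{YACY}/Crawler_p.html",
            params={"crawlingURL": url, "crawlingDepth": depth, "crawlingMode": "url"},
            auth=AUTH,
            timeout=30,
        )
        resp.raise_for_status()

    for link in CURATED_LINKS:
        queue_for_indexing(link)
        print("queued:", link)
    ```

    A depth of 0 would only index the page itself; a small positive depth would also pull in pages it links to, which is one way a curated seed list can grow into a usable personal index.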