cross-posted from: https://lemmy.ca/post/19946388
An anticapitalist tech blog. Embrace the technology that liberates us. Smash that which does not.
Reddit LLM:
This
This
This
just Google it
Wow thanks kind stranger
This
If you want to have real fun, replace all your comments with EICAR test strings.
That’s quite a good idea.
I’m gonna use Ipsum Lorem.
Nah, put in jailbreaks to dump its data. See if you can make its LLMs have a seizure
Do you have some handy, please?
Fuck Reddit.
Fuck /u/spez
That’s probably not going to be useful. Reddit keeps your original comment text.
I think you missed the part where it was strongly suggested that you “not” use copyrighted text.
The point is not to get rid of the original text. It’s to “poison” the training data.
If the AI trainers have the original text then “poisoning” the live site’s content isn’t going to do anything at all.
You can’t touch the original text. It’s already been archived.
If they scrape the updated comments again and ingest copyrighted text, you are poisoning the data.
That’s my point. They won’t.
And even if they did, it’s unclear that copyright has anything to say about AI training anyway.
Yeah, this is what I was thinking. We all heard about people being unable to delete comments, or Reddit keeping comments even after account deletions, back during the first migration. So what stops them holding onto comment history, and what stops them using that to teach LLMs to discern poisoned data from real data, as @pixxelkick said?
Yeah, in fact you’re giving the LLM additional data to train on what poisoned data looks like so it can avoid it better, as they can clearly see the before vs. after.
A few highlights that I’d like to make about this tool and its usage. Note: on a prescriptive level I’m focusing on moral matters, not legal ones.
This tool allows you to edit your content. You might have allowed other people and Reddit Inc. to use it, but it’s still yours. And you should be free to do whatever you want with your content, even if it inconveniences others. And people expecting you to give up your moral rights for the sake of their own benefit, frankly, are simply entitled.
Another user here compared this with vandalism; I don’t think that the comparison is good, given that vandalism targets someone else’s property.
I also think that people in general are focusing too much on the short-term consequences of the usage of this tool, and too little on the long-term. Here comes some bullet points hell:
- SEO “improvements” already caught up with the “add «reddit» to search queries!” trick. It’s becoming less effective over time.
- Reddit is accumulating huge amounts of noise, due to increased bot activity and decreased moderation. It’ll likely get worse over time.
- Reddit is walling itself off more and more over time. Eventually this info will become unavailable for anyone who ~~didn’t sell their soul to Greedy Pigboy~~ isn’t feeding that cesspool.
- Every piece of content that you leave in that site is yet another piece of content “inviting” other users to register and stay there, dumping their content into that increasingly walled garden, where it won’t be available publicly. And while they’re free to do so if they so desire (it’s their content), you’re also free to not invite them.
- There are alternatives to that enshittified platform, competing directly with it. (We’re in one, by the way.) We should encourage people to use those alternatives, not Reddit.
Are you all getting the picture? You might be tempted to leave your content in Reddit for the sake of other people; even then, the pros of doing so are rather small, and there are cons not often mentioned.
Regarding LLMs, frankly? I think that it’s mostly a neutral point. Sure, data hoarding bots will get your content from Reddit… but they’ll also do it if you post here in the Fediverse, in your blog, or elsewhere. The only alternative to feeding those bots is to not speak “in the open”.
do not choose something copyrighted.
Is that with a “nudge, nudge, wink, wink”? It would be such a shame if the whole project were jeopardised by such things.
Reddit will not license their data, they will license your data. Reddit doesn’t have any data of its own.
Pointless vandalism. The original comments are already archived, this will accomplish nothing except make Google results even worse for people.
Exactly my thoughts, and it’s why I haven’t stopped using the site. This doesn’t hurt Reddit at all; it only hurts people who want answers to obscure questions. What sucks is that the kind of person who knows what bug makes someone’s Dell Inspiron D630 beep every 23 seconds is exactly the kind of person who’s going to have all of their comments replaced with AI poison.
There used to be other ways to find out the RAM went bad. Like Dell’s site, for example.
So much for that. Now they just want everyone to opt in to spam, and buy new devices whenever a simple problem arises.
it only hurts people who want answers to obscure questions
Which makes those people less likely to trust Reddit for answers…
So they go to Reddit less often…
Which hurts Reddit.
I’m down for that as well. It’s their info, and they can do with it as they please. I have no right to it, unless they allow it. I totally understand the frustration of not finding the info you want, but I still support the practice.
It sucks that’s where we are, but WE didn’t steer the ship here. Now we just need to play ball within the confines given to us.
It’s their info, and they can do with it as they please.
And one of the things they did with that info was to license it to Reddit, who is now authorized to do what they please with it. No backsies.
Right on, no need for concern about missing data then. Problem averted.
Reddit has the data. AI trainers have the data. Ordinary people Googling for help with their obscure problems get the junk it was overwritten with.
As a protest measure it has a lot of value. Far more than blacking out a subreddit for 2 days.
Lots of stuff like this already exists and has been proven useless. A guy here on Lemmy was a big answer type on some tech support sub. He used one of the account scrubbers to nuke his account before he deleted it. He went to look again a few weeks later, and all his top comment answers had been restored.
They haven’t bothered with most people because they simply aren’t useful for making the place look attractive, but no matter what you do, your comments are stored and will be sold off to the AI companies.
I’m pretty sure that violates GDPR.
Shit I already deleted my account.
Sucks it only works with the desktop version of Firefox.
How fast is it, anyway? I was on Reddit for 11 years and commented with the same frequency I do here. I have so, so much to edit.
My comments are not your product. The whole thing, I don’t need or want it.