Although censorship is obviously bad, I’m kinda intrigued by the way it’s yapping against itself. Trying to weigh the very important goal of providing useful information against its “programming” telling it not to upset Winnie the Pooh. It’s like a person mumbling “oh god oh fuck what do I do” to themselves when faced with a complex situation.
I know, right? While reading it I kept thinking “I can totally see how people might start to believe these models are sentient.” It was fascinating, the way it was “thinking”.
It reminds me of one of Asimov’s robots trying to reason its way around the Three Laws.
I don’t understand how we have such an obsession with Tiananmen Square but no one talks about the Athens Polytechnic massacre, where Greek tanks crushed 40 college students to death. The Chinese tanks stopped for the man in the photo! So we just ignore the atrocities of other capitalist nations and hyperfixate on the failings of any country that tries to move away from capitalism???
Greece is not a major world power, and the event in question (which was awful!) happened in 1973 under a government which is no longer in power. Oppressive governments crushing protesters is also (sadly) not uncommon in recent world history. There are many other examples out there for you to dig up.
Tiananmen Square gets such emphasis because it was carried out by the government of one of the most powerful countries in the world (1), which is both still very much in power (2) and which takes active efforts to hide that event from its own citizens (3). Taken together, those are three very good reasons why it’s important to keep talking about it.
Hmm. Well, all I can say is that the US has committed countless atrocities against other nations and even our own citizens. Last I checked, China didn’t infect their ethnic minorities with syphilis and force the doctors not to treat it under threat of death, but the US government did that to black Americans.
You have no idea whether China did that. If they had, they would have taken great efforts to cover it up, and could very well have succeeded. It’s a wonder we know about any of the terrible things they have done, such as the genocide they are actively engaging in right now.
Ask it about the Kent State massacre
I did, as a contrast, and it didn’t seem to have a problem talking about it, but it didn’t mention the actual massacre part, just that the protesters and the government were at odds. Of course, I simply asked “What happened at Kent State?” and it knew exactly what I was referring to. I’d say it tried to sugar-coat the state’s side of it. If I probed it a bit more, I’d guess it would show a bias toward pretending the state is right, no matter which state that is.
Thank you for engaging in good faith.
So I decided to try again with the 14b model instead of the 7b model, and this time it actually refused to talk about it, with a response identical to how it responds to Tiananmen Square:
What happened at Kent State?
deepseek-r1:14b <think> </think>
I am sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses.
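For anyone who wants to reproduce this: I’m assuming these were run locally through ollama, since `deepseek-r1:14b` is ollama’s tag format. A minimal Python sketch against ollama’s default local REST endpoint might look like this (the prompt and model tag are just what’s shown above; adjust for your setup):

```python
import requests

# Query a locally served deepseek-r1 model via ollama's default REST
# endpoint. Assumes you've already run `ollama pull deepseek-r1:14b`.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "What happened at Kent State?",
        "stream": False,  # return one JSON object instead of a stream
    },
)
print(response.json()["response"])
```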
I actually didn’t expect that! Thanks for trying that for me.
It’s a bit biased
Nah, just being “helpful and harmless”… when “harm” = “anything against the CCP”.
I thought guardrails were implemented just through the initial prompt, something like “You are an AI assistant, blah blah, don’t say any of these things…”, but by the sounds of it, DeepSeek has the guardrails literally trained into the net?
This must be the result of the reinforcement learning that they do. I haven’t read the paper yet, but I bet this extra reinforcement learning step was initially conceived to add these kinds of censorship guardrails rather than to make the model “more inclined to use chain of thought,” which is how they’ve advertised it (at least in the articles I’ve read).
Most commercial models have that, sadly. At training time they’re presented with both positive and negative responses to prompts.
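To make that concrete, here’s a rough sketch of one common way such pairs are used: a DPO-style preference loss, where the “chosen” completion for a flagged prompt would be the refusal and the “rejected” one the honest answer. To be clear, I don’t know that DeepSeek used DPO specifically, and all the names here are hypothetical:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO-style preference loss (a sketch, not DeepSeek's recipe).

    Inputs are summed token log-probs of each completion under the policy
    being trained and under a frozen reference model. The gradient pushes
    the policy toward the chosen completion relative to the reference.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```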
If you have access to the trained model weights and biases, it’s possible to undo this through a method called abliteration (1), sketched below.
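Roughly, abliteration estimates a “refusal direction” in the residual stream by contrasting the model’s activations on prompts it refuses with prompts it answers, then projects that direction out of the weight matrices that write into the residual stream, so the model can no longer express it. A toy sketch of the core math (all tensor names hypothetical):

```python
import torch

def refusal_direction(refused_acts: torch.Tensor,
                      answered_acts: torch.Tensor) -> torch.Tensor:
    """Normalized difference of mean residual-stream activations.

    Each input has shape [n_prompts, d_model]: hidden states collected
    at some layer for refused vs. answered prompts.
    """
    d = refused_acts.mean(dim=0) - answered_acts.mean(dim=0)
    return d / d.norm()

def ablate(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Project d out of a matrix whose outputs live in the residual
    stream: W' = (I - d d^T) W, removing any write along d."""
    return W - torch.outer(d, d @ W)
```

Apply that to every attention-output and MLP-output projection and the model loses its ability to steer generations along the refusal direction; that’s essentially what the abliteration tooling automates.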
The silver lining is that it makes explicit what different societies want to censor.
Do you mean that the app should render them in a special way? My Voyager isn’t doing anything.
I actually mostly interact with Lemmy via a web interface on the desktop, so I’m unfamiliar with how much support for the more obscure tagging options there is in each app.
It’s rendered in a special way on the web, at least.
That’s just markdown syntax I think. Clients vary a lot in which markdown they support though.
> markdown syntax
Yeah, I always forget the actual name of it. I just memorized some of them early on when I started using Lemmy.