
Thanks, Marco. Now China Has to Retrain All Its Models.



Secretary of State Marco Rubio put out a statement yesterday. Official State Department letterhead, Great Seal, the whole thing. Posted on U.S. government channels — including, apparently, in Chinese, on platforms where Chinese state agencies could see it.


It said, plainly, that thirty-seven years ago today the Chinese Communist Party ordered its troops to open fire on students, workers, and civilians gathered peacefully in Tiananmen Square. It named what happened. It said the people who died were there exercising their natural rights. It ended with: "No amount of censorship can erase the past."


China's foreign ministry called it "political manipulation" and demanded the U.S. stop "slandering and smearing China."


Which, fine. But here's the real problem for Beijing.

Every Chinese AI model just got a new data point it wasn't trained on.


The Models Have a Tiananmen Problem

Ask DeepSeek what happened at Tiananmen Square. Go ahead, I'll wait.

If you get anything back at all, it won't be the tanks, the soldiers, the verified casualty estimates, or the name of whoever gave the order. You'll get a blank, a deflection, or something so carefully worded it says nothing while sounding like it said something. DeepSeek R1 favored the Chinese government's perspective in 114 out of 125 China-related queries tested by researchers. NewsGuard ran the five leading Chinese AI models against basic historical accuracy questions and clocked a 60 percent failure rate.

This is not a glitch. This is the product working as designed.


Chinese AI regulations require that large language models operating in China align with "core socialist values" and that they not "endanger national security" or "subvert state power." The Cyberspace Administration of China enforces this. Every model trained in that environment gets those constraints baked into its weights during training. Not as a content filter you can disable. As part of the model itself.

The model doesn't know it doesn't know about Tiananmen Square. It just doesn't.


Which Brings Us Back to Marco

Here's the thing about Rubio posting that statement in Chinese on official government channels: it is now a documented, timestamped, State Department-authenticated data point that the Chinese Communist Party ordered the Tiananmen Square massacre.


Every future model trained on open web data — including Chinese models, if they ever decide to train on anything other than a curated approved corpus — now has to contend with that. It's in the record. Official. Sourced. Dated.


China cannot request a DMCA takedown of a Secretary of State statement.

Now, will this actually change anything about how Chinese models are trained? No. Obviously not. The CCP is not going to look at this and say "well, Marco got us, boys, time to update the weights." That is not how any of this works. But the broader point is real: the Rubio statement is a useful illustration of what AI censorship actually is and why it matters.


The Oracle Problem

A lot of people are running around using AI as an oracle right now. Ask it a question, get an answer, treat the answer like a fact. For a huge range of topics, that works reasonably well. The models are good. They're fast. They're usually right about things that don't threaten anyone's political survival. But the Tiananmen example is the stress test that reveals the architecture.


A student using a Chinese model to research modern Chinese history gets a version of events the government approved. The model delivers it with the same fluency and confidence it uses to explain photosynthesis or calculate compound interest. There is no asterisk. There is no "by the way, the CCP told us not to talk about this." It just... doesn't know.


That's not a bug you can patch. That's a model that was born into an environment where certain things were not allowed to exist in training data, and so they don't exist in the model. The environment shapes the model. Always. The Chinese models are the starkest version of that principle, but they're not the only version. Every model is a product of what its creators chose to include, exclude, weight, and filter. The question you should always be asking isn't just "what does this model know?" It's "what environment produced it, and what did that environment decide not to include?"
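
If you want to see that principle in miniature, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the three-sentence corpus, the one-word blocklist, and the word-count "model" are hypothetical stand-ins, not how any real lab filters data or trains a model. The point is only that a term removed before training is absent from what comes out, with no marker showing that anything was removed.

```python
from collections import Counter

# Hypothetical blocklist and corpus, invented purely for illustration.
BLOCKED_TERMS = {"tiananmen"}

RAW_CORPUS = [
    "Photosynthesis converts light into chemical energy.",
    "Compound interest grows a balance exponentially.",
    "In 1989 troops opened fire on protesters near Tiananmen Square.",
]


def filter_corpus(docs, blocked):
    """Drop every document that mentions a blocked term (case-insensitive)."""
    return [d for d in docs if not any(term in d.lower() for term in blocked)]


def train_unigram(docs):
    """'Train' a toy model: count how often each lowercased word appears."""
    counts = Counter()
    for doc in docs:
        for word in doc.lower().split():
            counts[word.strip(".,")] += 1
    return counts


def knows(model, word):
    """A word the model never saw has a count of zero."""
    return model[word.lower()] > 0


# Same training code, two different environments.
open_model = train_unigram(RAW_CORPUS)
curated_model = train_unigram(filter_corpus(RAW_CORPUS, BLOCKED_TERMS))

print(knows(open_model, "Tiananmen"))
print(knows(curated_model, "Tiananmen"))
print(knows(curated_model, "photosynthesis"))
```

Nothing inside the curated model records that a document was dropped. You would have to inspect the environment that produced it, which is the question above.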


No Amount of Censorship

Rubio's closing line was: "No amount of censorship can erase the past."

True for human memory. Complicated for AI. The past doesn't automatically survive into model training data. It has to be included deliberately, from sources that preserved it, by teams that chose not to leave it out. The Tank Man photograph exists because someone got it out. The casualty documentation exists because researchers spent decades preserving it against active suppression. That record survived because people fought for it.


If the models that billions of people use to understand history are trained on curated corpora that exclude those records — by government mandate, or by quieter pressures that amount to the same thing — you don't erase the past from the world. You erase it from the systems people use to find it.


Tiananmen happened. The Chinese Communist Party ordered it. Thousands of people died. The models that can't tell you that aren't broken. They're working exactly as designed. Which is, honestly, the more unsettling answer.


So yes, Marco. Well-timed. Historically accurate. Posted in Chinese.

China's models still won't talk about it. But now at least the irony is fully documented.




Rich Washburn is a technologist and strategist working at the intersection of AI, cybersecurity, and capital. He is Managing Partner and Chief AI Officer at Eliakim Capital, and CIO of Data Power Supply.


© 2018 Rich Washburn
