What happened:
Google and OpenAI say quiet copycats are strip-mining frontier models. Google’s threat team logged a campaign that fired more than 100,000 prompts at Gemini, attempting to replicate its reasoning on non-English tasks across a wide range of problems. Google says it caught this in real time and protected internal reasoning traces. See the report on CSO Online.
What this is:
These are distillation attacks. In plain English: repeatedly query a public model, capture its answers (and any exposed “thinking”), then train a cheaper clone on those input/output pairs. OpenAI says some operators, including Chinese groups, have moved beyond simple chain-of-thought (CoT) grabs to multi-stage operations - combining CoT extraction with synthetic-data generation and large-scale data cleaning - to speed up and hide cloning. See more at The Register.
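To make the mechanics concrete, here is a purely conceptual sketch of what the data-collection half of a distillation pipeline looks like. `query_model` is a stand-in for any public endpoint, and the field names (`answer`, `cot`) are illustrative assumptions, not any vendor’s real response format.

```python
# Conceptual sketch only: illustrates "capture answers plus any exposed
# reasoning, then save them as training pairs for a clone."
import json

def query_model(prompt):
    # Placeholder for a call to a public model API (hypothetical fields).
    return {"answer": "...", "cot": "..."}

def build_distillation_set(prompts, path="clone_train.jsonl"):
    """Capture prompt/answer pairs (and any exposed reasoning) as JSONL."""
    with open(path, "w") as f:
        for p in prompts:
            out = query_model(p)
            record = {"prompt": p, "target": out["answer"]}
            if out.get("cot"):  # exposed chain-of-thought is the real prize
                record["rationale"] = out["cot"]
            f.write(json.dumps(record) + "\n")
```

The point for defenders: everything the attacker needs flows through your public API, which is why the countermeasures below all live at that boundary.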
Who’s named:
OpenAI told the House China committee it observed accounts linked to employees of DeepSeek - a Hangzhou AI firm backed by hedge fund High-Flyer - attempting to bypass access controls and programmatically harvest outputs for distillation. DeepSeek and parent High-Flyer did not immediately comment to Reuters. Google did not name companies; it says many probes come from private-sector outfits worldwide. Read the coverage on Yahoo Finance UK and background on Wikipedia: DeepSeek.
Why it matters:
This isn’t just intellectual property theft. Google also flagged nation-state experimentation: China-backed APT31 (an advanced persistent threat group) allegedly used Gemini to auto-analyze vulnerabilities and plan U.S. cyberattacks. No successful breaches were cited, but the playbook is out there. OpenAI frames illicit distillation as a threat to “American-led, democratic AI,” and notes occasional Russian activity too. More context at The Register.
Receipts and reality check:
“Your model is really valuable IP... if you can distill the logic, you can replicate it,” says John Hultquist, chief analyst at Google Threat Intelligence Group (GTIG). Google says its Terms of Service prohibit distilling Gemini; it can disable accounts or sue. Source: The Register.
DeepSeek 101: Hangzhou-based, founded 2023, backed by High-Flyer; its V3 and R1 models triggered U.S. scrutiny and warnings on Capitol Hill. See background on Wikipedia: DeepSeek.
Founder playbook - pragmatic, not perfect:
Rate limit and watch for weird query bursts; pattern-match scripted prompt loops. This slows attackers but does not stop a determined effort.
Hide the good stuff: do not expose chain-of-thought in public endpoints. Keep reasoning traces private by default.
Watermark outputs or attach provenance metadata. This helps detect large-scale scraping but is not foolproof.
Lock access: per-customer keys, behavioral caps, and legal teeth in your Terms of Service. Expect whack-a-mole enforcement. More tips at CSO Online.
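The rate-limiting and burst-detection items above can be sketched in a few lines. This is a minimal single-process example using a sliding window per API key; the thresholds, the function names (`check_request`, `KeyStats`), and the "flag for review" behavior are all illustrative assumptions, not a production design.

```python
# Per-key sliding-window rate limit plus burst detection (sketch).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_PER_WINDOW = 120      # hard cap per key per minute (assumed value)
BURST_THRESHOLD = 30      # flag if this many requests land within BURST_SPAN
BURST_SPAN = 5.0

class KeyStats:
    def __init__(self):
        self.timestamps = deque()

    def record(self, now):
        self.timestamps.append(now)
        # drop entries that fell out of the sliding window
        while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()

    def over_limit(self):
        return len(self.timestamps) > MAX_PER_WINDOW

    def bursting(self, now):
        recent = sum(1 for t in self.timestamps if now - t <= BURST_SPAN)
        return recent >= BURST_THRESHOLD

_stats = defaultdict(KeyStats)

def check_request(api_key, now=None):
    """Return (allowed, flagged_for_review) for one incoming request."""
    now = time.monotonic() if now is None else now
    s = _stats[api_key]
    s.record(now)
    return (not s.over_limit(), s.bursting(now))
```

As the playbook notes, this slows scripted prompt loops but will not stop a determined, distributed effort; it mainly raises the attacker’s cost and gives you a signal to act on.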
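The "hide the chain-of-thought" and "attach provenance metadata" items can also be sketched together. This minimal example assumes the backend returns a dict containing both a final answer and an internal `"reasoning"` trace; that field name, the `"provenance"` structure, and the HMAC scheme are illustrative assumptions, not any vendor’s real format.

```python
# Strip internal reasoning before serving, and attach signed provenance
# metadata so large-scale scraping of your outputs is easier to trace (sketch).
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"rotate-me"  # placeholder; use a managed secret in practice

def sanitize_response(raw, customer_id):
    """Drop the reasoning trace and attach signed provenance metadata."""
    public = {k: v for k, v in raw.items() if k != "reasoning"}
    provenance = {
        "customer": customer_id,
        "issued_at": int(time.time()),
    }
    payload = json.dumps(provenance, sort_keys=True).encode()
    provenance["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    public["provenance"] = provenance
    return public
```

If scraped outputs later surface with your provenance fields intact, the signature lets you confirm they came from your service. A deliberate attacker will strip the metadata, which is why this complements rate limiting rather than replacing it.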
Bottom line:
This is corporate espionage with prompts. If your moat is “we have a smarter model,” assume someone is already reverse-teaching theirs on your answers.