Off the radar  /  №03  /  07.02.26

The paradox runs local.

1865: Britain fears running out of coal, and the obvious fix is efficiency. Jevons looks at the logic and calls the opposite. This week NVIDIA split a model in two and the same loop started spinning in silicon.

1865. Britain is afraid of running out of coal. The obvious fix: better engines; burn less; stretch the reserve. William Stanley Jevons looks at the logic and calls the opposite. Efficiency makes energy cheaper; cheaper energy invites use. Better engines won't save the coal; they'll eat it faster. He was right. History filed the receipt.

Same loop, spinning up again. This week, in silicon.

[Image: the Jevons Paradox as a loop: more efficient engines lead to cheaper energy, which leads to more consumption. A Victorian steam engine engraving under gold circles.]

They split the model

NVIDIA shipped Nemotron TwoTower yesterday. Two networks, one backbone. The first tower frozen, autoregressive; holds the context, everything already committed. The second a diffusion denoiser; refines whole blocks in parallel, commits multiple tokens per step. 2.42x throughput; 98.7% of baseline quality. And the denoiser trained on ~2.1T tokens; roughly 8% of the 25T that built the backbone. You don't start over; you bolt parallel generation onto what you already own. The weights are open, the paper's on arXiv, and NVIDIA's framing writes its own Jevons epigraph: it turns the economics from a cliff to a ramp.
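The release numbers reduce to two ratios worth having in hand. A minimal sketch in Python, using only the figures quoted in this issue; the function names are illustrative and come from neither NVIDIA nor the paper, and the time-saved figure assumes the 2.42x throughput holds across a whole job.

```python
# Figures as quoted in this issue; all names here are illustrative only.
BACKBONE_TOKENS = 25e12   # tokens that trained the autoregressive backbone
DENOISER_TOKENS = 2.1e12  # approximate tokens that trained the denoiser
SPEEDUP = 2.42            # reported throughput multiplier


def training_fraction(extra_tokens: float, base_tokens: float) -> float:
    """Share of the original training budget spent on the add-on tower."""
    return extra_tokens / base_tokens


def time_saved(speedup: float) -> float:
    """Fraction of wall-clock time saved on a fixed-size job.

    Assumes the speedup applies uniformly to the whole job.
    """
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    share = training_fraction(DENOISER_TOKENS, BACKBONE_TOKENS)
    print(f"denoiser budget: {share:.1%} of the backbone's")
    print(f"time saved at {SPEEDUP}x: {time_saved(SPEEDUP):.1%}")
```

Under those assumptions the add-on costs a little over 8% of the original token budget and cuts the wait on a fixed job by a bit under 59%.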

Three weeks earlier: DiffusionGemma. Google's model, NVIDIA's optimization. 256 tokens denoised per step; up to 4x faster; 1,000 tokens/sec on a single H100, 150 on a DGX Spark sitting on a desk. No cloud. No meter.
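To make the two rates concrete, here is a rough sketch of what each box could produce in an unmetered overnight run. It assumes the quoted figures are sustained rates, which the source does not claim; the eight-hour window and the helper names are hypothetical.

```python
# Rates quoted in this issue; treating them as sustained is an assumption.
RATES_TOK_PER_SEC = {"H100": 1_000, "DGX Spark": 150}


def tokens_generated(rate_tok_per_sec: float, hours: float) -> int:
    """Tokens produced in `hours` at a constant rate."""
    return int(rate_tok_per_sec * hours * 3600)


def seconds_for(tokens: int, rate_tok_per_sec: float) -> float:
    """Seconds needed to produce `tokens` at a constant rate."""
    return tokens / rate_tok_per_sec


if __name__ == "__main__":
    for name, rate in RATES_TOK_PER_SEC.items():
        overnight = tokens_generated(rate, hours=8)
        print(f"{name}: {overnight:,} tokens in 8 h; "
              f"{seconds_for(10_000, rate):.0f} s per 10k-token draft")
```

At these rates the desk machine is about 6.7 times slower than the H100, yet it still clears millions of tokens in a night.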

Jevons walks back in

The argument against local inference was always efficiency itself. Your GPU idles most of the day; the datacenter's never does; let the cloud amortize the silicon. Every one of these releases weakens that argument. A thousand tokens a second on a single H100, 150 on a desk: "good enough" is arriving locally, years ahead of schedule.

And cheap inference does what cheap energy did. We don't use less; we use absurdly more. Agent loops that run all night because nothing is metered. Drafts discarded a hundred times before anyone sees one. Ideas too expensive at cloud prices become weekend projects at local prices. The efficiency gain doesn't shrink the appetite; it feeds it. Jevons, verbatim.
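Whether the loop actually bites depends on how strongly demand responds to price. A minimal sketch of the standard rebound arithmetic, assuming a constant-elasticity demand curve; the model, the function name, and the elasticity values are illustrative assumptions, not measurements of inference demand.

```python
def resource_use_ratio(efficiency_gain: float, elasticity: float) -> float:
    """Resource use after an efficiency gain, relative to before.

    Assumed model: demand for the service is Q = k * p**(-elasticity),
    and an efficiency gain g divides the effective price p by g.
    Service demand then grows by g**elasticity, while each unit of
    service needs 1/g of the resource, so resource use scales by
    g**(elasticity - 1).
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return efficiency_gain ** (elasticity - 1.0)


if __name__ == "__main__":
    gain = 2.0  # one halving of cost per token
    for elasticity in (0.5, 1.0, 1.5):  # hypothetical values
        ratio = resource_use_ratio(gain, elasticity)
        print(f"elasticity {elasticity}: resource use x{ratio:.2f}")
```

In this model total consumption rises only when elasticity exceeds 1; below that, efficiency really does save the resource. The paradox is a claim that demand for cheap inference sits on the elastic side.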

The pattern, named

Optimize a resource and you expand the demand for it. But note what gets consumed. Coal's loop drained a finite reserve. This loop burns electricity and produces experiments. Every halving of cost-per-token widens the circle of who gets to tinker: labs, then startups, then one person with a gaming card and a stubborn idea. 1865 depleted something. 2026 might democratize something.

The engine got more efficient. Demand is about to do what demand does.

Jevons would recognize every beat; except, maybe, the ending.

Sources

  1. Tech Times, 07.02.26 — Nemotron TwoTower: 2.42x throughput without retraining
  2. arXiv 2606.26493 — Nemotron-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context
  3. Hugging Face — nvidia/Nemotron-TwoTower collection (open weights)
  4. NVIDIA RTX AI Garage, 06.10.26 — DiffusionGemma optimized for RTX / DGX Spark

Benchmark caveat: parallel block denoising costs a few points on strictly sequential tasks. HumanEval drops from 79.27 to 75.58 at the default operating point, and the paper reports this transparently.
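For scale, the HumanEval figures can be set against the headline quality number. A small sketch using only the values quoted in this issue; the variable and function names are illustrative.

```python
# Scores as quoted in this issue; names are illustrative only.
BASELINE_HUMANEVAL = 79.27
TWOTOWER_HUMANEVAL = 75.58
HEADLINE_RETAINED = 0.987  # "98.7% of baseline quality"


def points_lost(old: float, new: float) -> float:
    """Absolute drop in benchmark points."""
    return round(old - new, 2)


def retained(old: float, new: float) -> float:
    """Share of the baseline score that survives."""
    return new / old


if __name__ == "__main__":
    kept = retained(BASELINE_HUMANEVAL, TWOTOWER_HUMANEVAL)
    print(f"points lost: {points_lost(BASELINE_HUMANEVAL, TWOTOWER_HUMANEVAL)}")
    print(f"HumanEval retained: {kept:.1%} vs headline {HEADLINE_RETAINED:.1%}")
```

On these numbers HumanEval keeps a little over 95% of its baseline score, a few points below the headline figure, which is consistent with the caveat.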