
EK-TORUS / simulation-correction technology — primer, call analysis, skepticism test, and recommended diligence plan

Update:

[Screenshots: slides from the updated EK-TORUS deck]

What the new material clarifies

The engineer is not pitching random magic anymore. The slides make it much clearer that he is claiming:

  • a correction layer for simulation/integration workloads

  • aimed at symplectic / structure-preserving integrators

  • focused on reducing long-horizon drift / error

  • delivered as a middleware or API layer

  • with claimed gains across multiple domains:

    • celestial mechanics

    • Neural ODE

    • Plasma PIC

    • Lattice QCD / HMC

    • molecular dynamics

    • HMC Bayesian

    • HNN inference

    • SympNet

    • SGLD

 

That basic idea is not crazy on its face. Rich’s “Jack Wisdom” comment was exactly the right instinct. There is established prior art on symplectic correctors and related high-order kernel methods for Hamiltonian systems, and real tools like REBOUND/WHFast already implement those families of methods. In known corrector approaches, some of the correction work is applied only at synchronization/output boundaries, which is one reason “better accuracy with low overhead” is at least plausible in principle. (rebound.readthedocs.io)
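
For concreteness, this is what that existing corrector machinery already looks like in REBOUND (a public, documented API; the toy system below is arbitrary). Note how the symplectic corrector is configured once and, with safe mode off, the synchronization work is paid only at output times:

  import rebound

  # Toy two-planet system (arbitrary illustrative parameters).
  sim = rebound.Simulation()
  sim.add(m=1.0)                  # star
  sim.add(m=1e-3, a=1.0)          # Jupiter-mass planet
  sim.add(m=3e-6, a=1.5, e=0.05)  # Earth-mass planet
  sim.move_to_com()

  sim.integrator = "whfast"
  sim.ri_whfast.corrector = 11    # high-order symplectic corrector
  sim.ri_whfast.safe_mode = 0     # skip per-step synchronization
  sim.dt = 0.01

  for t_out in (100.0, 200.0, 300.0):
      sim.integrate(t_out)
      sim.integrator_synchronize()  # corrector/sync cost paid only here
      print(t_out, sim.particles[1].x)

That is exactly the “correction only at output boundaries” pattern, which is the bar his prior-art story has to clear.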

 

So the idea is plausible.

 

What is not yet proven is whether his version is:

  • actually novel,

  • actually as strong as claimed,

  • actually general across all those domains,

  • and actually deployable in real environments.

 

Best reading of the bad transcript

The transcript is messy, but the core signal is pretty clear.

The call had three parts. First, a lot of rapport-building and “shared geek culture” talk. That part matters socially, but it is not technical evidence.


Second, his actual background pitch:

  • mechanical engineering background

  • some experience adjacent to ISS/space systems

  • product, hardware, and AI work

  • then he says he started exploring math around time/calendars/structure and tested ideas against 3-body / simulation problems

  • that led to what he calls a correction algorithm

 

Third, the real business pitch:

  • he wants you to test it through an API / docker-style environment

  • he says it improves accuracy without forcing longer runs

  • he frames the savings in data-center, space, and HPC terms

  • Rich correctly introduces the prior-art / peer-review / prove-it gate

  • everyone else starts reacting to the upside if true

 

That is the correct summary of the call.


What looks better now than before

The slides do improve his credibility in a few ways.

He now has:

  • a named product concept: EK-TORUS

  • a specific problem framing

  • domain-by-domain claimed results

  • experiment counts

  • an operating model

  • a basic API concept

  • an ROI framework

 

That moves him from “guy with a vague idea” to “guy with a prototype story and a sales deck.”

That is progress.

 

What still does not pass diligence

This is where I’d be careful.

 

1. “Improvement” is still undefined

The biggest problem in the whole deck is that the main column says:

  • 14–457x

  • 26–83x

  • 1.3–51x

  • etc.

But improvement in what, exactly?

Could be:

  • lower state error at equal runtime

  • fewer steps at equal error

  • lower energy drift

  • better acceptance rate

  • lower GPU-hours

  • lower wall-clock time

 

Those are all very different claims.

Until he defines the metric for every row, those numbers are not decision-grade.
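
To make that concrete, here is a toy illustration (all numbers invented) of how one and the same pair of runs can produce a headline number anywhere from ~1x to ~460x depending on which column you divide:

  # Hypothetical measurements for one baseline run and one corrected run.
  baseline  = {"state_error": 1.2e-4, "energy_drift": 3.0e-6, "wall_clock_s": 540.0}
  corrected = {"state_error": 9.0e-5, "energy_drift": 6.5e-9, "wall_clock_s": 560.0}

  for metric in ("state_error", "energy_drift", "wall_clock_s"):
      print(f"{metric:>13}: {baseline[metric] / corrected[metric]:6.1f}x 'improvement'")

  # With these made-up numbers:
  #   state_error:    1.3x
  #  energy_drift:  461.5x
  #  wall_clock_s:    1.0x

A deck row that says “457x” could, in principle, be any of those columns. That is why the metric definition has to come first.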

 

2. The ROI pages are mostly algebra, not proof

Example:

On the ROI page he uses 45x for celestial mechanics and shows annual GPU cost dropping from $25,800 to about $573.

 

That math checks out because:

  • 25,800 / 45 ≈ 573

So that page is not independent validation. It is just taking the claimed improvement multiplier and dividing the baseline cost by it.

That means the ROI page is a scenario model, not proof of savings.

Same issue with the Tier A slide:

  • $18,000 / 6.2 ≈ $2,903

  • $32,000 / 50 = $640

 

So those savings slides are internally consistent, but they are derived from his assumptions, not evidence.
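
A sketch of the entire ROI model, using the deck’s own numbers, to show how little machinery sits behind it (the function name is mine, not his):

  def projected_cost(baseline_annual_cost, claimed_multiplier):
      # Scenario model, not evidence: divide a cost by a claimed multiplier.
      return baseline_annual_cost / claimed_multiplier

  print(projected_cost(25_800, 45))    # ≈ 573   (celestial mechanics page)
  print(projected_cost(18_000, 6.2))   # ≈ 2,903 (Tier A)
  print(projected_cost(32_000, 50))    # = 640   (Tier A)

Note the hidden assumption baked into that one line: the multiplier is treated as a direct compute-cost reduction, which loops straight back to the undefined-metric problem in point 1.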

 

3. “3 independent replications” is weaker than it sounds

The deck says:

  • Python (primary)

  • Julia (independent)

  • Rust (independent)

That is useful, but that is not the same thing as:

  • 3 outside labs

  • 3 academic replications

  • 3 customer replications

 

That sounds like three implementations, not three external validations.

That is still helpful, but it is much weaker than the phrase suggests.

 

4. The API story is not fully believable as written

He says:

  • one API call

  • sub-millisecond response

  • no infrastructure changes

 

That combination is too neat.

If this is a real remote API over a network, sub-millisecond end-to-end is not realistic for most actual production settings. If it is local, same-host, or same-VPC, that is different.

So you need him to answer:

Is this truly remote API latency, or just internal correction compute time?

That is an important distinction.
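
When the trial happens, do not take his latency number; measure it end to end from your side. A minimal sketch, assuming the trial exposes an HTTP endpoint (the URL and payload shape here are placeholders, not his actual API):

  import time, statistics, requests

  URL = "http://localhost:8080/correct"   # placeholder trial endpoint
  payload = {"state": [0.1, 0.2, 0.3], "dt": 0.01}

  samples = []
  for _ in range(200):
      t0 = time.perf_counter()
      requests.post(URL, json=payload, timeout=5)
      samples.append((time.perf_counter() - t0) * 1e3)  # ms, end to end

  print(f"median {statistics.median(samples):.3f} ms, "
        f"p95 {statistics.quantiles(samples, n=20)[18]:.3f} ms")

If the median only stays sub-millisecond when the service runs on the same host, you have your answer: the claim is about internal compute time, not remote API latency.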

 

5. Cross-domain breadth may be overstated

The celestial mechanics row has 2,604 experiments.
Some other rows have 45 or 47+.

That tells me:

  • celestial mechanics may be the real home base

  • several other domains may still be early or lightly tested

 

So I would not treat this as “proven across 9 domains.”

I would treat it as:

possibly strong in one or two domains, exploratory in several others.

 

6. Energy drift alone is not enough

This is another reason Rich’s skepticism was healthy.

In this area, it is a known trap to judge accuracy mainly from energy-type metrics. A method can look dramatically better on energy error while not improving the physically important error you actually care about over long times. (OUP Academic)

So if he mostly demonstrates:

  • drift reduction

  • cleaner invariants

  • prettier conservation plots

 

that is not enough.

 

He has to show:

  • state accuracy

  • trajectory accuracy

  • task accuracy

  • or workload-level outcome improvement

depending on the domain.
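
The underlying trap is easy to demonstrate with nothing but a harmonic oscillator; this is standard numerical-analysis material, not anything specific to EK-TORUS. Symplectic Euler keeps the energy error bounded forever, yet the trajectory steadily drifts out of phase:

  import math

  # Symplectic Euler on a unit harmonic oscillator; exact solution is
  # q(t) = cos t, p(t) = -sin t.
  dt, q, p, t = 0.05, 1.0, 0.0, 0.0
  for _ in range(200_000):
      p -= dt * q   # kick
      q += dt * p   # drift
      t += dt

  energy_error = abs(0.5 * (p * p + q * q) - 0.5)
  state_error = math.hypot(q - math.cos(t), p + math.sin(t))
  print(f"energy error: {energy_error:.1e}   state error: {state_error:.1e}")

  # The energy error oscillates but stays bounded (below ~1e-2 here) no
  # matter how long you run; the state error grows with t (phase drift)
  # and is order 1 by the end. A conservation plot would look great
  # while the actual answer is wrong.

That is the exact failure mode his demos have to rule out.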

 

What I think is really going on

My best guess is this:

He probably has something real, but it is likely one of these three things:

 

Most likely

A useful correction / processing / tuning layer for a subset of Hamiltonian or near-Hamiltonian workloads.

 

Next most likely

A strong implementation and packaging of ideas that are adjacent to known symplectic-corrector literature, with practical advantages but limited novelty.

 

Less likely

A genuinely broad new method that changes the economics of simulation across many domains.

That last one is possible, but the current evidence does not justify believing it yet.

 

Where Rich was exactly right

Rich’s best moment in the call was not the excitement. It was the Jack Wisdom gate.

 

That is the right first filter.

 

In plain terms, Rich was saying:

“Before we get excited, make sure this is not just a re-expression of known symplectic corrector / processed integrator work.” That is exactly right.

 

Because if this collapses into known prior art, then the opportunity becomes:

  • maybe a good engineering implementation,

  • maybe a useful package,

  • maybe a sellable service,

but not necessarily a defensible new invention.

 

My updated recommendation

Here is the clean recommendation:

 

Do pursue

but only as a controlled technical validation

 

Do not yet do

  • patent strategy

  • fundraising help

  • customer introduction

  • strategic partnership talk

  • “this changes data centers” positioning

 

until it survives a hard benchmark.

 

The right next step

Do one short, brutal test cycle.

Ask him for these six things and nothing else:

  1. Exact definition of “improvement” for every domain row

  2. Baseline integrators and settings used in each comparison

  3. One fully local run path

    • no cloud dependency

    • no black-box API-only requirement

  4. One benchmark package you choose, not one he chooses (a minimal harness sketch follows this list)

  5. One reference case in celestial mechanics, since that appears strongest

  6. One prior-art memo explaining why this is not just a standard Wisdom/corrector/kernel variant
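
For item 4, the harness can be tiny; what matters is that both sides run under identical conditions against a reference solution that you generate, not him. A minimal sketch (every name here is a placeholder for whatever each side actually supplies):

  import time

  def run_case(integrate, t_end, reference_state):
      # `integrate` is the callable under test (our baseline, or his
      # EK-TORUS-corrected path); `reference_state` comes from our own
      # high-precision reference run.
      t0 = time.perf_counter()
      final_state = integrate(t_end)
      wall = time.perf_counter() - t0
      err = max(abs(a - b) for a, b in zip(final_state, reference_state))
      return {"wall_clock_s": wall, "max_state_error": err}

  # baseline = run_case(our_whfast_run, 1e4, reference)
  # treated  = run_case(ek_torus_run,   1e4, reference)

Report both columns for both runs. A win on error that loses on wall clock is a different claim than a win on both, and the deck currently does not tell you which one he is making.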

 

The kill-shot questions for the next meeting

If you want to find out fast whether this is solid or soft, ask these:

1. For celestial mechanics, what exactly are the x-axis and y-axis of the benchmark?
If he cannot answer that crisply, stop.

2. What is the baseline integrator and what parameters did you use?
If he compared against weak baselines, the big numbers may be meaningless.

3. Is the correction applied every step, only at outputs, or as pre/post processing?
This reveals whether he actually understands the numerical method class he is in.

4. What workloads make EK-TORUS fail or provide no benefit?
If he says “it helps everything,” that is a bad sign.

5. Can we run it fully offline in our own environment with our own benchmark?
If no, that is a serious diligence problem.

 

Final judgment

Updated score: promising enough to test, not strong enough to trust.

The screenshots moved this from:

  • “interesting but too vague”

to:

  • “coherent enough for a real bake-off”

But they did not move it to:

  • “validated breakthrough”

The smartest path is:

benchmark first, prior-art second, business third.

If you want, I can turn all of this into a one-page internal memo for your team with:

  • green flags

  • red flags

  • recommended next steps

  • and a go / no-go decision at the bottom.

Researched and prepared by A.R.I.A.
