Benchmark pages get weird fast.
The easiest version is to cherry-pick one flashy example, publish a giant percentage, and call it a day. That's not what we're trying to do with Axint.
What our benchmark page actually measures
The public benchmarks page measures one very specific thing:
How many tokens a compact Axint definition uses compared with the Swift that would normally need to be generated for the same Apple-native surface.
The current public page uses three representative surfaces:
- one App Intent
- one SwiftUI view
- one WidgetKit widget
The token counts are computed at build time from code that is visible in the page source. There is no client-side hiding and no hand-entered marketing number floating above the examples.
What the estimate is based on
We use the simple four-characters-per-token approximation for source code.
That is not a provider-specific billable number. It is a stable comparison method. The goal is not to pretend we know your exact invoice down to the cent. The goal is to compare compact Axint authoring against the larger Swift output the model or human would otherwise need to carry around.
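As a minimal sketch of that heuristic (the function names here are illustrative, not the actual build script):

```typescript
// Four-characters-per-token approximation for source code.
// A deliberately simple, deterministic estimate -- not a
// provider-specific billable token count.
function estimateTokens(source: string): number {
  return Math.ceil(source.length / 4);
}

// Ratio of the Swift output's estimated tokens to the compact
// Axint definition's estimated tokens for the same surface.
function compressionRatio(axintSource: string, swiftSource: string): number {
  return estimateTokens(swiftSource) / estimateTokens(axintSource);
}
```

Because both sides use the same estimator, any inaccuracy in the 4-chars rule cancels out of the ratio; that is what makes it a stable comparison method rather than an invoice predictor.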
What this does prove
It proves that the authoring surface is materially smaller.
That matters because agent loops pay for verbosity twice:
- once when the model has to generate the larger output
- again when later turns have to read, diff, repair, or regenerate it
If a tool can express the same Apple-native feature in a smaller surface area, it has a real systems advantage.
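The double payment can be made concrete with back-of-the-envelope arithmetic. The token counts and turn count below are made-up illustration values, not measurements:

```typescript
// Hypothetical agent-loop cost for one verbose artifact.
const generatedTokens = 2000; // tokens the model emits for the verbose output
const rereadTurns = 3;        // later turns that read/diff/repair that output

// Pay once to generate, then again each time a later turn carries it.
const totalCost = generatedTokens + rereadTurns * generatedTokens; // 8000

// A 4x smaller authoring surface shrinks every term proportionally.
const compactCost = totalCost / 4; // 2000
```

The multiplier on the second term is why even modest compression ratios compound in long-running loops.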
What it does not prove
It does not prove that every possible Apple feature compresses by the same ratio. It does not prove build-time performance. It does not prove total product ROI by itself.
That's why we keep this page narrow. It is token-efficiency proof, not a catch-all performance claim.
How to rerun the compiler-side benchmarks
The open-source repo also includes compile-time benchmarks for the compiler itself:
```
npm run bench
```

Those measure compilation throughput across representative fixtures. They answer a different question than the public benchmark page, but they matter for CI and regression detection.
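A throughput measurement of this shape can be sketched as follows; the actual bench harness in the repo may be structured differently, and `measureThroughput` and its compile callback are assumptions for illustration:

```typescript
// Sketch: time how fast a compile function processes a set of fixtures.
// The compile callback stands in for the real Axint compiler entry point.
function measureThroughput(
  fixtures: string[],
  compileFn: (source: string) => string,
): { elapsedMs: number; charsPerSecond: number } {
  const start = performance.now();
  for (const fixture of fixtures) {
    compileFn(fixture);
  }
  const elapsedMs = performance.now() - start;
  const chars = fixtures.reduce((n, f) => n + f.length, 0);
  return {
    elapsedMs,
    charsPerSecond: elapsedMs > 0 ? (chars / elapsedMs) * 1000 : Infinity,
  };
}
```

Run in CI against fixed fixtures, a number like `charsPerSecond` gives a regression signal: a sudden drop flags a compiler slowdown even though it says nothing about the token-efficiency claim on the public page.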
The standard we're aiming for
If a public proof page can't be explained in a minute, it usually isn't honest enough yet.
So the Axint version is deliberately simple:
- visible source
- deterministic token estimate
- clear scope
- reproducible compiler benchmarks in the repo
That's the bar we want to keep raising.