Stability, Plasticity, and Editability in ACI:
Reported Benchmark Evidence and Product Interpretation
Vareon Research
Vareon Inc., Irvine, California, USA · Vareon Limited, London, UK · April 2026

Abstract
This report presents the benchmark evidence for ACI across supervised continual learning and robotics. On supervised suites, ACI variants nearly double baseline final accuracy, reduce forgetting by over 20x, and achieve near-exact editability (removal error on the order of 10⁻⁹).
On continuous-control robotics, ACI achieves 12x lower forgetting than established baselines while providing governed adaptation, rollback, and typed safety constraints. The edge runtime operates as a hybrid layer: ACI handles post-deployment change, editability, and safety while existing deep RL handles raw policy optimization.
This report covers benchmark methodology, complete results with baselines, and the product interpretation across ACI's three products plus the safety and policy add-on.
1. Scope of this report
This report draws on two benchmark sources: benchmark_report_frank7_fixed.pdf (supervised continual learning) and robotics_benchmark_report.pdf (continuous-control RL). Together, they establish three conclusions.
First, ACI achieves strong results on supervised continual learning: higher accuracy, lower forgetting, and near-exact editability. Second, on edge robotics, ACI provides dramatically lower forgetting and governed adaptation as a hybrid layer alongside existing RL policies. Third, these results map directly to ACI's three products plus the safety and policy add-on.
2. Capability contract
2.1 Stability
When the system incorporates new information, previously protected outputs remain within a declared non-regression budget. This makes ongoing change usable in production: new learning does not silently degrade existing behavior.
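A non-regression budget of this kind can be sketched as a simple acceptance gate over a protected evaluation set. This is an illustrative sketch only; the function and parameter names (`passes_stability_gate`, `protected_set`, `budget`) are hypothetical and do not reflect ACI's actual API or internal mechanism.

```python
def passes_stability_gate(model_before, model_after, protected_set, budget=0.01):
    """Accept an update only if accuracy on protected items drops by at most `budget`.

    `model_before` / `model_after` are callables mapping input -> prediction;
    `protected_set` is a list of (input, expected_output) pairs.
    """
    def accuracy(model, items):
        correct = sum(1 for x, y in items if model(x) == y)
        return correct / len(items)

    drop = accuracy(model_before, protected_set) - accuracy(model_after, protected_set)
    return drop <= budget
```

In this framing, an update that regresses protected behavior beyond the declared budget is rejected before it reaches production, rather than discovered afterward.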
2.2 Plasticity
New items are incorporated in bounded time and bounded memory. Tenant facts, device habits, or policy items become part of the live system without a full retraining cycle, while stability guarantees hold.
2.3 Editability
Individual learned contributions can be precisely removed and the resulting system state reconstructed. This enables deletion, rollback, and subject-level data removal as first-class operations rather than approximate cleanup.
3. Supervised benchmark evidence
3.1 Streaming-text results
On the wikitext streaming-text suite, ACI variants nearly double the final accuracy of the strongest non-ACI baselines while reducing forgetting by over 20x.
| Suite | Method | Final Accuracy | Forgetting | Note |
|---|---|---|---|---|
| wikitext streaming text | ACI (best) | 0.4750 ± 0.0248 | 0.0087 ± 0.0049 | 1.76x accuracy, 23x lower forgetting vs best baseline |
| wikitext streaming text | ACI (second) | 0.4663 ± 0.0228 | 0.0100 ± 0.0049 | 1.73x accuracy, 20x lower forgetting vs best baseline |
| wikitext streaming text | best non-ACI baseline | 0.2625-0.2700 | 0.2000+ | Baseline range |
These results demonstrate effective supervised binding: the system learns new streaming content while keeping forgetting within a narrow budget. The accuracy gap (0.47 vs 0.27) and forgetting gap (0.009 vs 0.20+) are both operationally significant.
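For context, the standard continual-learning definition of average forgetting is the mean, over earlier tasks, of the gap between the best accuracy ever achieved on a task and its accuracy after the final task. The sketch below implements that standard definition; the benchmark's exact metric may differ in detail.

```python
def average_forgetting(acc):
    """Average forgetting over a task sequence.

    `acc[t][j]` is accuracy on task j measured after training through task t
    (defined for j <= t). For each earlier task, forgetting is the best
    accuracy ever achieved on it minus its final accuracy.
    """
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best = max(acc[t][j] for t in range(j, T))
        drops.append(best - acc[-1][j])
    return sum(drops) / len(drops)
```

Under this definition, a forgetting value of 0.0087 means protected task accuracy degrades by under one point on average as new content streams in, versus over twenty points for the baselines.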
3.2 Editability evidence
Editability is measured as a benchmark property: after removing a specific learned contribution, how close is the resulting state to the counterfactual where that contribution was never learned?
| Suite | Method | Editability (removal error) | Note |
|---|---|---|---|
| supervised suite | ACI | ~1.7e-9 | Near-exact removal; effectively zero residual |
| supervised suite | replay baseline | ~0.8 | Significant residual after removal |
| DomainNet | ACI (exact) | 0 | Exact removal |
| DomainNet | ACI (near-exact) | 2.776e-18 | Near-exact removal |
ACI achieves a removal error on the order of 10⁻⁹ (effectively exact removal), compared to ~0.8 for replay-based approaches. On DomainNet, removal error reaches exact zero for some variants. This makes item-level deletion and rollback operationally viable.
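One common way to operationalize this counterfactual comparison is a normalized distance between the parameter state after removal and the state of a system that never learned the contribution. The sketch below is an assumed formulation for illustration, not the benchmark's actual scoring code.

```python
import numpy as np

def removal_error(theta_after_removal, theta_counterfactual):
    """Normalized distance between the state after removing a learned
    contribution and the counterfactual state where it was never learned.

    Returns 0.0 for exact removal; small values indicate near-exact removal.
    """
    after = np.asarray(theta_after_removal, dtype=float)
    counterfactual = np.asarray(theta_counterfactual, dtype=float)
    diff = np.linalg.norm(after - counterfactual)
    return float(diff / (np.linalg.norm(counterfactual) + 1e-12))
```

Against a metric of this shape, a score near 10⁻⁹ means the post-removal state is numerically indistinguishable from one that never contained the item, while a score near 0.8 means most of the removed item's influence remains.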
3.3 Domain-incremental evidence
On the DomainNet domain-incremental benchmark, ACI outperforms the strongest baseline (replay) by approximately 40%, with near-zero editability error across variants.
| Suite | Method | Final Accuracy | Note |
|---|---|---|---|
| DomainNet domain-incremental | ACI | 0.2626 ± 0.0019 | 40% above strongest baseline |
| DomainNet domain-incremental | replay (baseline) | 0.1881 ± 0.0073 | Best non-ACI baseline |
This result validates cloud-side knowledge binding: when the workload involves structured domain shift (new categories, new data distributions), ACI maintains higher accuracy while preserving the ability to remove or roll back individual contributions.
4. Robotics interpretation
4.1 Return comparison
On raw return, established baselines achieve higher scores on hard continuous-control tasks. ACI's edge value is not in replacing these policies but in governing what happens after they are deployed.
| Benchmark | ACI | Established Baseline | Note |
|---|---|---|---|
| 10-task continual control | ~154.17 | ~868.59 | ACI operates as hybrid governance layer, not policy replacement |
| 20-task continual control | ~128.39 | ~514.02 | ACI operates as hybrid governance layer, not policy replacement |
| Robotic manipulation A | ~1.50 | ~3.66 | ACI adds adaptation and safety on top of existing policies |
| Robotic manipulation B | ~1.16 | ~3.74 | ACI adds adaptation and safety on top of existing policies |
ACI at the edge is a hybrid runtime: existing deep RL handles raw policy optimization, while ACI handles post-deployment adaptation, rollback, editability, and typed safety constraints.
4.2 Forgetting comparison
On forgetting, ACI shows a substantial advantage: 12.2x lower on the 20-task suite and 7.5x lower on the 10-task suite compared to the established baseline. This is the core edge-runtime value proposition.
| Benchmark | ACI | Established Baseline | Note |
|---|---|---|---|
| 20-task continual control | 46.75 | 572.51 | 12.2x lower forgetting with ACI |
| 10-task continual control | 54.56 | 407.43 | 7.5x lower forgetting with ACI |
When a deployed system needs to adapt without losing previously learned behavior, ACI provides bounded adaptation with explicit rollback and typed safety constraints. Every change goes through governed operations (bind, constrain, audit) rather than unchecked gradient updates.
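The governed-operations model described above can be sketched as an update log in which every adaptation passes through a constraint check and leaves an audit trail. The class and method names below (`GovernedRuntime`, `bind`, `constrain`, `audit`, `rollback`) echo the terms used in this report but are hypothetical; they do not reflect ACI's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class GovernedRuntime:
    """Toy model of governed adaptation: bind / constrain / audit."""
    constraints: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def constrain(self, check):
        """Register a typed safety check that every future bind must satisfy."""
        self.constraints.append(check)

    def bind(self, item):
        """Incorporate `item` only if all registered constraints pass."""
        if not all(check(item) for check in self.constraints):
            self.log.append(("denied", item))
            return False
        self.log.append(("bound", item))
        return True

    def rollback(self, item):
        """Record an explicit rollback of a previously bound item."""
        self.log.append(("rolled_back", item))

    def audit(self):
        """Return the full ordered history of governed operations."""
        return list(self.log)
```

The design point is that denial and rollback are logged events with the same standing as successful binds, so the audit trail reconstructs exactly what the deployed system was and was not allowed to learn.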
5. Product interpretation
The benchmark results map to three primary products plus one cross-cutting add-on. ACI Inference provides multi-tenant cloud adaptation with per-tenant binding, isolation, and deletion. ACI Personal Agents covers desktop, laptop, and on-device personalization with user-level reset and erase. ACI Edge Runtime is the compiled edge runtime for bounded local adaptation with rollback. ACI Safety & Policy provides typed constraints, hard-denial boundaries, and auditable evidence when those controls are part of the deployment boundary.
The capability contract (stability, plasticity, editability) establishes what ACI can do. Audit sets, certificates, and signed evidence establish that it did so correctly. Capability first, governance second.
6. Scope and methodology notes
Results are from the benchmark artifacts cited in this report. Performance on other tasks or domains requires separate evaluation.
ACI at the edge is a hybrid governance layer for adaptation, rollback, and safety; it complements existing RL policies rather than replacing them.
Production deployments require calibration on specific evaluation sets, operating envelopes, and hardware budgets.
This report covers benchmark methodology and results. Internal optimization and implementation details are proprietary.
7. Conclusion
ACI demonstrates three capabilities that work together. On supervised continual learning, ACI nearly doubles baseline accuracy while reducing forgetting by over 20x and achieving near-exact editability (10⁻⁹ removal error). On edge robotics, ACI achieves up to 12x lower forgetting than established baselines while providing governed adaptation, rollback, and typed safety as a hybrid layer.
These results establish ACI as infrastructure for post-deployment change: supervised binding for cloud workloads, precise deletion and rollback for compliance, private personalization for on-device use, and governed hybrid adaptation for edge deployment. Each claim maps to a specific deployment surface with benchmark evidence behind it.
© 2026 Vareon Inc. and Vareon Limited. All Rights Reserved.