ACI · Full Benchmark Report

Stability, Plasticity, and Editability in ACI:
Reported Benchmark Evidence and Product Interpretation

Vareon Research

Vareon Inc., Irvine, California, USA · Vareon Limited, London, UK · April 2026


Abstract

This report presents the benchmark evidence for ACI across supervised continual learning and robotics. On supervised suites, ACI variants nearly double baseline final accuracy, reduce forgetting by over 20x, and achieve near-exact editability (removal error on the order of 10⁻⁹).

On continuous-control robotics, ACI achieves 12x lower forgetting than established baselines while providing governed adaptation, rollback, and typed safety constraints. The edge runtime operates as a hybrid layer: ACI handles post-deployment change, editability, and safety while existing deep RL handles raw policy optimization.

This report covers benchmark methodology, complete results with baselines, and the product interpretation across ACI's three products plus the safety and policy add-on.

1. Scope of this report

This report draws on two benchmark sources: benchmark_report_frank7_fixed.pdf (supervised continual learning) and robotics_benchmark_report.pdf (continuous-control RL). Together, they establish three conclusions.

First, ACI achieves strong results on supervised continual learning: higher accuracy, lower forgetting, and near-exact editability. Second, on edge robotics, ACI provides dramatically lower forgetting and governed adaptation as a hybrid layer alongside existing RL policies. Third, these results map directly to ACI's three products plus the safety and policy add-on.

2. Capability contract

2.1 Stability

When the system incorporates new information, previously protected outputs remain within a declared non-regression budget. This makes ongoing change usable in production: new learning does not silently degrade existing behavior.
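The budget check itself is simple. A minimal sketch in Python, assuming a hypothetical `within_budget` gate and accuracy numbers chosen purely for illustration:

```python
# Sketch of a non-regression gate (illustrative, not ACI's API).
# Protected outputs are re-evaluated after each update; the update is
# accepted only if accuracy on the protected set drops within budget.

def within_budget(acc_before: float, acc_after: float, budget: float) -> bool:
    """Return True if the accuracy drop on protected outputs stays in budget."""
    return (acc_before - acc_after) <= budget

# Example: a 0.5-point budget tolerates 0.874 -> 0.871 but not 0.874 -> 0.860.
assert within_budget(0.874, 0.871, 0.005)
assert not within_budget(0.874, 0.860, 0.005)
```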

2.2 Plasticity

New items are incorporated in bounded time and bounded memory. Tenant facts, device habits, or policy items become part of the live system without a full retraining cycle, while stability guarantees hold.

2.3 Editability

Individual learned contributions can be precisely removed and the resulting system state reconstructed. This enables deletion, rollback, and subject-level data removal as first-class operations rather than approximate cleanup.
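One way such exact removal can work (an illustrative sketch, not a description of ACI's internal mechanism) is to keep learned state as a sum of per-item contributions, so deleting an item is literal subtraction and the result matches the counterfactual that never saw the item:

```python
import numpy as np

# Illustrative: state is a sum of per-item contributions; removal is
# exact subtraction, so the residual vs the counterfactual is ~0.
rng = np.random.default_rng(0)
contributions = {f"item{i}": rng.normal(size=8) for i in range(5)}

state = sum(contributions.values())           # state with all items bound
removed = state - contributions["item3"]      # remove item3's contribution
counterfactual = sum(v for k, v in contributions.items() if k != "item3")

residual = float(np.linalg.norm(removed - counterfactual))
print(residual)  # ~0, up to floating-point error
```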

3. Supervised benchmark evidence

3.1 Streaming-text results

On the wikitext streaming-text suite, ACI variants nearly double the final accuracy of the strongest non-ACI baselines while reducing forgetting by over 20x.

| Suite | Method | Final Accuracy | Forgetting | Note |
| --- | --- | --- | --- | --- |
| wikitext streaming text | ACI (best) | 0.4750 ± 0.0248 | 0.0087 ± 0.0049 | 1.76x accuracy, 23x lower forgetting vs best baseline |
| wikitext streaming text | ACI (second) | 0.4663 ± 0.0228 | 0.0100 ± 0.0049 | 1.73x accuracy, 20x lower forgetting vs best baseline |
| wikitext streaming text | best non-ACI baseline | 0.2625–0.2700 | ≥ 0.2000 | Baseline range |

These results demonstrate effective supervised binding: the system learns new streaming content while keeping forgetting within a narrow budget. The accuracy gap (0.47 vs 0.27) and forgetting gap (0.009 vs 0.20+) are both operationally significant.
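For reference, forgetting numbers like these typically follow the standard continual-learning definition: the average drop from each task's best-ever accuracy to its final accuracy. A small sketch with made-up accuracies (the matrix below is illustrative, not benchmark data):

```python
# Sketch of the standard continual-learning forgetting metric, assuming
# acc[k][t] is accuracy on task t after training on task k.

def average_forgetting(acc: list[list[float]]) -> float:
    """Mean over past tasks of (best accuracy ever seen - final accuracy)."""
    final = acc[-1]
    drops = []
    for t in range(len(final) - 1):  # exclude the last task: nothing to forget yet
        best = max(acc[k][t] for k in range(t, len(acc)))
        drops.append(best - final[t])
    return sum(drops) / len(drops)

acc = [
    [0.50, 0.00, 0.00],
    [0.48, 0.52, 0.00],
    [0.47, 0.50, 0.55],
]
print(round(average_forgetting(acc), 3))  # (0.03 + 0.02) / 2 = 0.025
```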

3.2 Editability evidence

Editability is measured as a benchmark property: after removing a specific learned contribution, how close is the resulting state to the counterfactual where that contribution was never learned?

| Suite | Method | Editability | Note |
| --- | --- | --- | --- |
| supervised suite | ACI | ~1.7e-9 | Near-exact removal (effectively zero residual) |
| supervised suite | replay baseline | ~0.8 | Baseline: significant residual after removal |
| DomainNet | ACI (exact) | 0 | Exact removal |
| DomainNet | ACI (near-exact) | 2.776e-18 | Near-exact removal |

ACI achieves editability on the order of 10⁻⁹ (effectively exact removal), compared to ~0.8 for replay-based approaches. On DomainNet, editability reaches exact zero for some variants. This makes item-level deletion and rollback operationally viable.

3.3 Domain-incremental evidence

On the DomainNet domain-incremental benchmark, ACI outperforms the strongest baseline (replay) by approximately 40%, with near-zero editability error across variants.

| Suite | Method | Final Accuracy | Note |
| --- | --- | --- | --- |
| DomainNet domain-incremental | ACI | 0.2626 ± 0.0019 | 40% above strongest baseline |
| DomainNet domain-incremental | replay (baseline) | 0.1881 ± 0.0073 | Best non-ACI baseline |

This result validates cloud-side knowledge binding: when the workload involves structured domain shift (new categories, new data distributions), ACI maintains higher accuracy while preserving the ability to remove or roll back individual contributions.

4. Robotics interpretation

4.1 Return comparison

On raw return, established baselines achieve higher scores on hard continuous-control tasks. ACI's edge value is not in replacing these policies but in governing what happens after they are deployed.

| Benchmark | ACI | Established Baseline | Note |
| --- | --- | --- | --- |
| 10-task continual control | ~154.17 | ~868.59 | ACI operates as hybrid governance layer, not policy replacement |
| 20-task continual control | ~128.39 | ~514.02 | ACI operates as hybrid governance layer, not policy replacement |
| Robotic manipulation A | ~1.50 | ~3.66 | ACI adds adaptation and safety on top of existing policies |
| Robotic manipulation B | ~1.16 | ~3.74 | ACI adds adaptation and safety on top of existing policies |

ACI at the edge is a hybrid runtime: existing deep RL handles raw policy optimization, while ACI handles post-deployment adaptation, rollback, editability, and typed safety constraints.
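The division of labor can be sketched in a few lines (hypothetical names, not ACI's actual API): a frozen RL policy proposes actions, and a typed safety layer may veto them, keeping optimization and governance as separate concerns:

```python
# Minimal hybrid-runtime sketch (illustrative only): the policy proposes,
# the safety layer disposes; a vetoed action falls back to a safe default.

def safe_step(policy, shield, observation):
    """Run the policy, then let the safety layer veto or pass the action."""
    action = policy(observation)            # raw policy optimization
    if shield(observation, action):         # typed safety constraint
        return action
    return 0.0                              # safe default on veto

policy = lambda obs: 2.5 * obs              # stand-in for a deep RL policy
shield = lambda obs, act: abs(act) <= 1.0   # hard bound on actuation

print(safe_step(policy, shield, 0.2))  # 0.5 is within bound -> passed through
print(safe_step(policy, shield, 0.8))  # 2.0 exceeds bound -> safe default 0.0
```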

4.2 Forgetting comparison

On forgetting, ACI shows a substantial advantage: 12x lower on the 20-task suite and 7x lower on the 10-task suite compared to the established baseline. This is the core edge-runtime value proposition.

| Benchmark | ACI | Established Baseline | Note |
| --- | --- | --- | --- |
| 20-task continual control | 46.75 | 572.51 | 12.2x lower forgetting with ACI |
| 10-task continual control | 54.56 | 407.43 | 7.5x lower forgetting with ACI |

When a deployed system needs to adapt without losing previously learned behavior, ACI provides bounded adaptation with explicit rollback and typed safety constraints. Every change goes through governed operations (bind, constrain, audit) rather than unchecked gradient updates.
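A minimal sketch of such a governed operation, with hypothetical names (`bind`, constraint callables, an audit log) that are illustrative rather than ACI's actual API:

```python
import copy

# Illustrative governed update loop: each bind is checkpointed, checked
# against constraints, and rolled back on violation; every outcome is
# appended to an audit log.

class GovernedStore:
    def __init__(self, constraints):
        self.state = {}
        self.constraints = constraints  # callables: state -> bool
        self.audit = []                 # append-only evidence log

    def bind(self, key, value):
        checkpoint = copy.deepcopy(self.state)
        self.state[key] = value
        if all(check(self.state) for check in self.constraints):
            self.audit.append(("bind", key, "accepted"))
            return True
        self.state = checkpoint         # rollback on constraint violation
        self.audit.append(("bind", key, "rolled_back"))
        return False

# Example: a hard-denial boundary capping how many items may be bound.
store = GovernedStore(constraints=[lambda s: len(s) <= 2])
assert store.bind("a", 1) and store.bind("b", 2)
assert not store.bind("c", 3)   # third bind violates the cap, rolled back
assert "c" not in store.state
```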

5. Product interpretation

The benchmark results map to three primary products plus one cross-cutting add-on. ACI Inference provides multi-tenant cloud adaptation with per-tenant binding, isolation, and deletion. ACI Personal Agents covers desktop, laptop, and on-device personalization with user-level reset and erase. ACI Edge Runtime is the compiled edge runtime for bounded local adaptation with rollback. ACI Safety & Policy provides typed constraints, hard-denial boundaries, and auditable evidence when those controls are part of the deployment boundary.

The capability contract (stability, plasticity, editability) establishes what ACI can do. Audit sets, certificates, and signed evidence establish that it did so correctly. Capability first, governance second.

6. Scope and methodology notes

1. Results are from the benchmark artifacts cited in this report. Performance on other tasks or domains requires separate evaluation.

2. ACI at the edge is a hybrid governance layer for adaptation, rollback, and safety; it complements existing RL policies rather than replacing them.

3. Production deployments require calibration on specific evaluation sets, operating envelopes, and hardware budgets.

4. This report covers benchmark methodology and results. Internal optimization and implementation details are proprietary.

7. Conclusion

ACI demonstrates three capabilities that work together. On supervised continual learning, ACI nearly doubles baseline accuracy while reducing forgetting by over 20x and achieving near-exact editability (10⁻⁹ removal error). On edge robotics, ACI achieves up to 12x lower forgetting than established baselines while providing governed adaptation, rollback, and typed safety as a hybrid layer.

These results establish ACI as infrastructure for post-deployment change: supervised binding for cloud workloads, precise deletion and rollback for compliance, private personalization for on-device use, and governed hybrid adaptation for edge deployment. Each claim maps to a specific deployment surface with benchmark evidence behind it.

© 2026 Vareon Inc. and Vareon Limited. All Rights Reserved.