TPRT

Inline Token Pruning Accelerator

TPRT is a hardware accelerator that sits between AI inference producers and the network, selectively pruning low-value data to reduce bandwidth by up to 50%.

Unlike software approaches that run on the same compute as inference, TPRT operates as a standalone inline device: it intercepts, evaluates, and optimises AI traffic without modifying upstream or downstream architecture.

Inline Placement

Data flows through TPRT between the inference producer and the interconnect. The module intercepts AI traffic non-invasively, requiring no changes to existing accelerator or network architecture.

Confidence-Driven Path Direction

When the confidence margin indicates pruning is safe, low-value tokens are dropped and bandwidth is reduced. When confidence is low, data passes through the bypass path unmodified. Switching between the two paths takes under 10 nanoseconds.
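The path decision described above can be sketched as a small behavioural model in Python. This is an illustration only, not the TPRT hardware logic: the `Token` type, the per-token `score`, the `route` function, and both threshold values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    payload: bytes
    score: float  # hypothetical per-token importance score in [0, 1]


def route(
    tokens: List[Token],
    confidence_margin: float,
    margin_threshold: float = 0.2,  # assumed value, not a TPRT parameter
    score_threshold: float = 0.5,   # assumed value, not a TPRT parameter
) -> Tuple[str, List[Token]]:
    """Return the chosen path and the tokens forwarded downstream."""
    if confidence_margin < margin_threshold:
        # Low confidence: take the bypass path and forward everything unmodified.
        return "bypass", list(tokens)
    # Confidence is sufficient: drop tokens whose score is below the threshold.
    kept = [t for t in tokens if t.score >= score_threshold]
    return "prune", kept
```

With a high margin, `route` returns only the higher-scoring tokens. With a low margin, it returns the input unchanged, which mirrors the bypass behaviour.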

Receiver Feedback and Fail-Open Design

The receiver monitors reconstruction quality and feeds metrics back to TPRT, enabling real-time adaptation. If quality degrades or errors spike, the system fails open: data flows through unmodified and the mission continues.
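A minimal software sketch of such a feedback loop, assuming the receiver reports a quality score and an error rate. The `FeedbackController` class, its thresholds, and the step size are hypothetical; only the 50% ceiling is taken from the bandwidth figure quoted above.

```python
class FeedbackController:
    """Toy model: adapt pruning aggressiveness from receiver metrics."""

    def __init__(self, quality_floor=0.9, error_ceiling=0.05,
                 step=0.05, max_ratio=0.5):
        self.quality_floor = quality_floor  # assumed minimum acceptable quality
        self.error_ceiling = error_ceiling  # assumed maximum acceptable error rate
        self.step = step                    # assumed ramp-up increment
        self.max_ratio = max_ratio          # cap on the fraction of tokens pruned
        self.prune_ratio = 0.0
        self.fail_open = False

    def update(self, quality: float, error_rate: float) -> float:
        """Consume one receiver report and return the new pruning ratio."""
        if quality < self.quality_floor or error_rate > self.error_ceiling:
            # Fail open: stop pruning so data flows through unmodified.
            self.fail_open = True
            self.prune_ratio = 0.0
        else:
            # Metrics are healthy: ramp pruning up gradually, up to the cap.
            self.fail_open = False
            self.prune_ratio = min(self.max_ratio, self.prune_ratio + self.step)
        return self.prune_ratio
```

In this sketch a single bad report resets pruning to zero, and pruning then resumes gradually once metrics recover. That recovery policy is an assumption made for the example.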

KEY FEATURES

  • Standalone inline placement — between producer and interconnect, transparent to existing systems

  • Sub-10ns bypass switching — prune when safe, pass through when uncertain

  • Receiver feedback loop — adapts pruning aggressiveness based on real-time quality metrics

  • Fail-open default — hardware bypass ensures data flow even if optimisation fails

  • Token pruning, not quantization — actual data removal, not precision reduction
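To illustrate the last point, the toy Python example below contrasts the two ideas. Quantization keeps every token but stores each value more coarsely. Pruning keeps fewer tokens and leaves the survivors untouched. The token layout (an importance score plus a list of values), the function names, and the thresholds are hypothetical and chosen only for the illustration.

```python
from typing import List, Tuple

# Hypothetical token layout: (importance score, values).
Token = Tuple[float, List[float]]


def quantize(tokens: List[Token], step: float = 0.25) -> List[Token]:
    # Precision reduction: every token survives, each value is rounded
    # to the nearest multiple of `step`.
    return [(s, [round(v / step) * step for v in vals]) for s, vals in tokens]


def prune(tokens: List[Token], min_score: float = 0.5) -> List[Token]:
    # Data removal: low-score tokens are dropped entirely, the remaining
    # tokens keep their values at full precision.
    return [(s, vals) for s, vals in tokens if s >= min_score]


tokens = [(0.9, [0.12, -0.51]), (0.1, [0.3, 0.3]), (0.7, [1.0, 0.26])]
quantized = quantize(tokens)  # still 3 tokens, coarser values
pruned = prune(tokens)        # 2 tokens, original values
```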

APPLICATIONS

Domain                 Use Case

Datacenter             KV cache bandwidth reduction for LLM inference
Defense                Resilient AI for bandwidth-constrained tactical networks
Distributed Training   Gradient compression across interconnects
Edge AI                Local inference traffic optimisation


COMPETITIVE POSITION

We have validated TPRT's architecture against published hardware approaches including KAIST's Oaken system (ISCA 2025), CXL-based memory tiers, and commercial SmartNIC/DPU platforms.

To our knowledge, no existing hardware combines standalone inline placement, sub-10ns bypass, receiver feedback, fail-open behaviour, and token pruning.

Patents pending (UK)
