TPRT

Inline Token Pruning Accelerator

TPRT is a hardware accelerator that sits between AI inference producers and the network, selectively pruning low-value data to reduce bandwidth by up to 50%.

Unlike software approaches that run on the same compute as inference, TPRT operates as a standalone inline device: it intercepts, evaluates, and optimises AI traffic without modifying upstream or downstream architecture.

Inline Placement

Data flows through TPRT between inference producer and interconnect. The module intercepts AI traffic non-invasively, requiring no changes to existing accelerator or network architecture.

Confidence-Driven Path Direction

When the confidence margin indicates pruning is safe, low-value tokens are dropped and bandwidth is reduced. When confidence is low, data passes through the bypass path unmodified. Switching between the two paths takes under 10 nanoseconds.
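The path decision can be pictured with a small software model. This is only an illustrative sketch, not TPRT's hardware logic: the `Token` type, the per-token `score`, and both threshold values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    payload: bytes
    score: float  # hypothetical per-token importance score in [0, 1]


def route(
    tokens: List[Token],
    confidence_margin: float,
    margin_threshold: float = 0.2,  # assumed value, for illustration only
    score_threshold: float = 0.5,   # assumed value, for illustration only
) -> Tuple[str, List[Token]]:
    """Return the chosen path name and the tokens that are forwarded."""
    if confidence_margin < margin_threshold:
        # Low confidence: take the bypass path and forward everything unmodified.
        return "bypass", list(tokens)
    # Confidence is sufficient: drop tokens scored below the threshold.
    kept = [t for t in tokens if t.score >= score_threshold]
    return "prune", kept
```

With three tokens scored 0.9, 0.1 and 0.6, a confidence margin of 0.5 selects the prune path and forwards two tokens, while a margin of 0.1 selects the bypass path and forwards all three.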

Receiver Feedback and Fail-Open Design

The receiver monitors reconstruction quality and feeds metrics back to TPRT, enabling real-time adaptation. If quality degrades or errors spike, the system fails open: data flows through unmodified and the mission continues.
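The feedback loop and fail-open behaviour can be sketched as a simple controller. This is a minimal model under stated assumptions, not the device's actual control law: the class name, the quality and error thresholds, the 5-point step, and the 50% ceiling are all hypothetical, and recovery from fail-open is simplified to a single healthy report.

```python
class FeedbackController:
    """Toy model: adapt pruning aggressiveness from receiver reports."""

    def __init__(self, quality_floor=0.9, error_ceiling=0.05,
                 step_pct=5, max_pct=50):
        # All four parameters are assumed values for illustration.
        self.quality_floor = quality_floor
        self.error_ceiling = error_ceiling
        self.step_pct = step_pct
        self.max_pct = max_pct
        self.prune_pct = 0       # percentage of tokens eligible for pruning
        self.fail_open = False   # True means traffic passes through unmodified

    def update(self, quality: float, error_rate: float) -> int:
        """Apply one receiver report and return the new pruning percentage."""
        degraded = quality < self.quality_floor or error_rate > self.error_ceiling
        if degraded:
            # Fail open: stop pruning entirely so data keeps flowing.
            self.fail_open = True
            self.prune_pct = 0
        else:
            # Healthy report: leave fail-open and ramp up gradually to the cap.
            self.fail_open = False
            self.prune_pct = min(self.max_pct, self.prune_pct + self.step_pct)
        return self.prune_pct
```

Ramping up gradually while dropping to zero immediately is one possible design choice: it makes the model conservative, so a single bad report removes all pruning and several good reports are needed to restore it.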

KEY FEATURES

Standalone inline placement: between producer and interconnect, transparent to existing systems
Sub-10ns bypass switching: prune when safe, pass through when uncertain
Receiver feedback loop: adapts pruning aggressiveness based on real-time quality metrics
Fail-open default: hardware bypass ensures data flow even if optimisation fails
Token pruning, not quantization: actual data removal, not precision reduction

APPLICATIONS

Domain: Use Case
Datacenter: KV cache bandwidth reduction for LLM inference
Defense: Resilient AI for bandwidth-constrained tactical networks
Distributed Training: Gradient compression across interconnects
Edge AI: Local inference traffic optimisation

COMPETITIVE POSITION

We have validated TPRT's architecture against published hardware approaches including KAIST's Oaken system (ISCA 2025), CXL-based memory tiers, and commercial SmartNIC/DPU platforms.

No existing hardware combines standalone inline placement, sub-10ns bypass, receiver feedback, fail-open behaviour, and token pruning.

Patents pending (UK)

