BIP NYC


Researchers build an encrypted routing layer for private AI inference

Apr 21, 2026  Twila Rosenbaum

In an era where data privacy is paramount, organizations in sectors such as healthcare and finance want to leverage large AI models without exposing sensitive information to cloud servers. A cryptographic technique known as secure multi-party computation (MPC) makes this possible. MPC works by fragmenting data into encrypted shares and distributing them across multiple servers that never pool their information, allowing the servers to jointly compute an AI result without any of them ever seeing the raw input.
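The splitting step MPC relies on can be illustrated with additive secret sharing, a standard building block of such protocols. The following is a minimal toy sketch, not the protocol used in the paper: each server receives a value that looks uniformly random on its own, yet the shares sum back to the secret, and linear operations can be applied share-wise without reconstruction.

```python
import random

# Toy additive secret sharing over the integers mod a large prime.
# The modulus and two-server setup are illustrative assumptions.
P = 2**61 - 1

def share(x, n=2):
    """Split secret x into n additive shares that sum to x mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recombine shares; any subset smaller than n reveals nothing."""
    return sum(shares) % P

secret = 42
s1, s2 = share(secret)
# Each server holds only one of s1, s2 -- a uniformly random value.
# A public linear operation (here, multiplying by 3) can be applied
# to each share independently and still reconstructs correctly.
tripled = reconstruct([(3 * s1) % P, (3 * s2) % P])
assert reconstruct([s1, s2]) == secret
assert tripled == 3 * secret
```

Non-linear operations (such as the activations inside a neural network) require extra interaction between the servers, which is where the heavy encryption overhead discussed below comes from.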

However, a significant challenge remains: speed. While a standard mid-sized language model can deliver results in under a second, running the same model under MPC can extend that time to over 60 seconds due to the encryption overhead.

Limitations of Existing Solutions

Previous endeavors in private inference have primarily focused on redesigning AI models to minimize the costs associated with encryption. While these strategies offer some benefits, they are hampered by a fundamental limitation: every query, regardless of its complexity, incurs the same processing cost.

In standard AI applications, a common practice is to direct simpler queries to smaller, faster models while reserving larger, more costly models for complex queries. This routing is routine in plaintext systems but poses difficulties under encryption, as routing decisions usually require reading the input, which must remain encrypted throughout the process.
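The plaintext practice described above can be sketched in a few lines. The difficulty heuristic and model names here are illustrative assumptions, not details from the paper; the point SecureRouter addresses is that under MPC even this simple comparison must be carried out on encrypted data.

```python
# Toy plaintext router: a cheap difficulty estimate decides whether a
# query goes to a small, fast model or a large, expensive one.
def difficulty(query: str) -> float:
    # Illustrative proxy for difficulty: normalized query length.
    return min(len(query.split()) / 16.0, 1.0)

def route(query: str) -> str:
    # Easy queries stay on the small model; hard ones escalate.
    return "large-model" if difficulty(query) > 0.5 else "small-model"

print(route("What is 2 + 2?"))  # -> small-model
print(route("Explain the trade-offs between homomorphic encryption "
            "and secret sharing for private transformer inference "
            "at scale"))        # -> large-model
```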

Introducing SecureRouter

Researchers from the University of Central Florida have developed SecureRouter, a system designed to facilitate input-adaptive routing for encrypted AI inference. This innovative system maintains a diverse pool of models, ranging from a compact model with about 4.4 million parameters to a more extensive model with approximately 340 million parameters. A lightweight routing component evaluates each incoming encrypted query and determines which model should process it, all while keeping the routing decision completely encrypted.

The router is trained to balance accuracy against computational costs, with costs measured in terms of encrypted execution time rather than parameter counts typically used in traditional systems. Additionally, a load-balancing objective ensures that the router does not over-rely on any single model for all queries.
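The kind of training objective described above, rewarding accuracy while penalizing measured encrypted latency and penalizing over-reliance on one model, can be sketched as follows. The per-model latencies, weights, and the entropy-based load-balance term are illustrative assumptions; the paper's exact formulation is not reproduced here.

```python
import math

# Hypothetical measured encrypted execution time per model (seconds).
latency = {"tiny": 4.0, "medium": 12.0, "large": 60.0}

def routing_loss(router_probs, expected_accuracy, lam=0.01, mu=0.1):
    """Lower is better. router_probs: per-query probability over models.
    expected_accuracy: per-query expected accuracy under each model."""
    n = len(router_probs)
    # Average expected accuracy under the router's choices.
    acc = sum(expected_accuracy[i][m] * p
              for i, probs in enumerate(router_probs)
              for m, p in probs.items()) / n
    # Average cost in encrypted execution time, not parameter count.
    cost = sum(latency[m] * p
               for probs in router_probs
               for m, p in probs.items()) / n
    # Load-balance term: entropy of the average routing distribution;
    # higher entropy means queries are spread across the model pool.
    avg = {m: sum(probs.get(m, 0.0) for probs in router_probs) / n
           for m in latency}
    entropy = -sum(q * math.log(q) for q in avg.values() if q > 0)
    return -acc + lam * cost - mu * entropy
```

With equal expected accuracy across models, this loss prefers spreading queries over the pool rather than sending everything to the large model, which is the behavior the load-balancing objective is meant to encourage.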

Performance Improvements

In tests comparing SecureRouter to SecFormer, a private inference system that employs a fixed large model, SecureRouter achieved an average inference time reduction of 1.95 times across five language understanding tasks. The speedup varied from 1.83 times on the most challenging task to 2.19 times on the simplest one, showcasing the router’s adeptness at matching model size to query difficulty.

When compared to the practice of running a large model for every query, regardless of its complexity, SecureRouter demonstrated an average speedup of 1.53 times across eight benchmark tasks. In most cases, the accuracy remained comparable to the large-model baseline, with only one task involving grammatical analysis showing a noticeable drop in accuracy, indicating that certain specialized tasks may be sensitive to being handled by a smaller model.

Minimal Overhead

While adding a routing layer to an encrypted inference system could potentially create a bottleneck, the practical implementation of the routing component consumes approximately 39 MB of memory in a two-server setup. This is slightly more than the 38 MB required for the smallest model running individually, while the largest model necessitates around 3,100 MB. The introduction of the router adds roughly 4 seconds to the inference time and generates about 1.86 GB of network communication, figures that are comparable to running the smallest model alone.

Practical Implications

The SecureRouter system is designed to integrate seamlessly with existing infrastructure, requiring no major overhauls. It operates atop current MPC frameworks and utilizes standard language model architectures available through widely-used libraries. Simple queries are quickly resolved using smaller models, while more complex queries are escalated to larger models. Notably, clients submitting queries only receive the final results and are kept entirely unaware of which model processed their requests.


Source: Help Net Security News

