Perplexity Unveils Hybrid Local-Cloud AI Inference, Coming to PC in July
At Computex 2026 in Taipei, Perplexity CEO Aravind Srinivas, appearing alongside Intel CEO Lip-Bu Tan, announced 'hybrid agentic inference,' a system that automatically splits AI workloads between a user's local device and cloud-based frontier models. The feature is coming to Perplexity Computer in July. It was demoed on Intel Core Ultra Series 3 processors and is currently exclusive to the Windows PC app.

The orchestrator uses a compact local model as a 'traffic cop' that decides which tasks can run locally (for example, summarizing documents) and which need cloud resources (for example, complex reasoning), all without manual configuration.

Srinivas emphasized cost efficiency: Perplexity's revenue grew fivefold to $500 million while headcount rose only 34%, and offloading inference to user hardware helps maintain that ratio. The privacy benefit is real, since sensitive data such as financial records or health information stays on-device, but it also aligns conveniently with Perplexity's financial incentives.

The move places Perplexity among major players such as Apple, Microsoft, and Nvidia, all of which are pushing hybrid inference, with Perplexity's real-time orchestration layer as its differentiator. It is not a fully offline solution, however: the local model is deployed by Perplexity, and cloud queries still route through its servers. The July rollout will be the first real test of its reliability.
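To make the 'traffic cop' idea concrete, here is a minimal Python sketch of local-versus-cloud routing. It is an illustration only, not Perplexity's implementation: the `Task` and `Target` types, the `route` function, and the `LOCAL_CAPABLE` set are all hypothetical names, and a fixed rule table stands in for the compact local model that the announcement describes.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class Task:
    kind: str  # hypothetical label, e.g. "summarize" or "reason"


# Assumption: task kinds light enough to run on an on-device model.
LOCAL_CAPABLE = {"summarize"}


def route(task: Task) -> Target:
    """Decide where a task runs.

    A real orchestrator would ask a compact local model to make this call;
    the set lookup here is a stand-in for illustration.
    """
    if task.kind in LOCAL_CAPABLE:
        return Target.LOCAL
    return Target.CLOUD


for task in [Task("summarize"), Task("reason")]:
    print(task.kind, "->", route(task).value)
# summarize -> local
# reason -> cloud
```

The point of the sketch is that the routing decision itself happens on-device, so the user never picks a backend by hand. How the real system weighs data sensitivity against task complexity is not described in the announcement, so the sketch leaves that out.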
Key facts
- Perplexity announces hybrid local-cloud AI inference that splits tasks automatically, with no manual configuration.
- Feature coming to Perplexity Computer in July, demoed on Intel Core Ultra Series 3.
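- Compact local model acts as a 'traffic cop': lighter tasks such as document summarization run on-device, while complex reasoning goes to the cloud.
- Perplexity revenue grew fivefold to $500M while headcount rose only 34%; offloading inference to user hardware helps maintain that ratio.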
- Privacy benefit aligns with financial incentive; not a fully offline solution.
- Real-time orchestration differentiates Perplexity from the approaches of Apple, Microsoft, and Nvidia.
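The "ratio" behind the cost-efficiency claim can be worked out from the two reported figures. The short calculation below is a rough sketch that assumes the fivefold revenue growth and the 34% headcount increase cover the same period, which the announcement does not state explicitly. The variable names are illustrative.

```python
revenue_multiple = 5.0      # "grew fivefold" (reported)
headcount_multiple = 1.34   # "rose only 34%" (reported)
current_revenue_musd = 500  # $500 million (reported)

# Implied starting revenue, in millions of dollars.
prior_revenue_musd = current_revenue_musd / revenue_multiple

# Growth in revenue per employee over the same period.
revenue_per_employee_multiple = revenue_multiple / headcount_multiple

print(f"prior revenue: ${prior_revenue_musd:.0f}M")
print(f"revenue per employee: {revenue_per_employee_multiple:.2f}x")
# prior revenue: $100M
# revenue per employee: 3.73x
```

Under that assumption, revenue per employee rose roughly 3.7 times, which is the ratio that offloading inference to user hardware is meant to help preserve.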