Neysa and Pipeshift launch real-time inference for open-source AI models, fully deployed within India
India is emerging as one of the world’s largest inference-heavy AI markets, driven by the rise of voice agents, enterprise copilots, AI assistants, and reasoning workflows across sectors. Yet much of the infrastructure serving those workloads continues to sit outside the country.
Against this backdrop, Neysa, a purpose built AI Compute and Acceleration Cloud provider, and Pipeshift, a managed inference platform for open-source AI models, today announced a partnership to launch production-grade real-time inference infrastructure fully deployed within India.
As enterprises scale AI adoption across customer support, software development, analytics, operations, and enterprise workflows, inference is increasingly becoming a recurring dependency layer tied to foreign infrastructure, foreign pricing, and dollar-denominated APIs. While shared token-based APIs helped companies get started with AI adoption, many Indian enterprises are now encountering production-scale challenges around unpredictable latency, escalating token costs, shared infrastructure bottlenecks, and overseas data routing.
The partnership addresses this gap by extending Velocis, Neysa’s AI Acceleration Cloud System, with dedicated, low-latency real-time inference for enterprises deploying production AI applications. Pipeshift’s inference platform running on Neysa’s AI-native GPU infrastructure, enables enterprises to deploy single-tenant inference environments for open-source models including Gemma, Qwen, GPT-OSS, Llama, DeepSeek, and Mistral through OpenAI-compatible APIs, without managing underlying GPU infrastructure themselves.
Designed for latency-sensitive workloads including voice AI, enterprise search, copilots, workflow automation, and reasoning-based systems, the platform is tuned at the kernel and inference-engine level for production traffic, dynamically auto-scales during demand spikes, and keeps prompts, inference, and enterprise data fully within India. The platform also supports workloads including speech-to-text, text-to-speech, OCR, and enterprise automation systems within a unified infrastructure environment.
The platform also eliminates shared rate limits, cold-start delays, and cross-region routing overheads that often affect shared inference environments, while enabling enterprises to transition between newer GPU generations and open-source model releases without rearchitecting applications.
Commenting on the announcement, Karan Kirpalani, Chief Product Officer, Neysa, said, “Scaling open-source models introduces a dual bottleneck: volatile token economics and high Time-to-First-Token (TTFT) driven by shared rate limits and cross-region routing. By integrating Pipeshift’s inference-engine optimizations directly onto Neysa’s single-tenant, optimized bare metal, we eliminate this friction entirely. The upshot for enterprises is a seamless, OpenAI-compatible drop-in replacement that guarantees cold-starts, predictable and highly optimized token latency, and absolute sovereign data control at scale.”
“There is a clear line between AI that works in a demo and AI that works in production. Crossing that line takes more than a good model. It takes infrastructure that holds latency under load and keeps costs predictable at scale. That is the line our partnership with Neysa helps Indian companies cross,” said Arko Chattopadhyay, Co-Founder and CEO, Pipeshift.
Typical deployment timelines from evaluation to production are under two weeks, allowing enterprises to move production AI workloads without rebuilding applications or reconfiguring existing API integrations.
Early deployments on the platform already include production AI workloads across voice AI and enterprise automation. Nurix AI achieved a 3x reduction in Time to First Token (TTFT) for its voice AI deployments in India. “We needed sub-second LLM latency for voice agents in production, and real-time inference from Neysa and Pipeshift cut our TTFT 3x versus our prior setup in India,” said Pushkar Patel, Nurix AI.
Arrowhead AI is using the platform for multilingual inference workloads. “We fine-tuned a custom LLM for colloquial Indian languages and needed a deployment partner who could give us predictable tail latency in production. Neysa and Pipeshift had our fine-tuned model live as an inference endpoint within a day, and now also host the SLMs we use for predictive caching and the custom containers running ASR models,” said Vengadanathan Srinivasan, CTO, Arrowhead AI.
The platform is immediately available for enterprises evaluating production-scale open-source AI deployments across customer support, voice AI, enterprise copilots, workflow automation, and regulated AI workloads.