It is critical to secure data before building AI models: Roshmik Saha, Skyflow

In a world where enterprises struggle to strike the right balance between personalisation and privacy, Skyflow has emerged as a data privacy infrastructure company that treats sensitive data as a first-class category, much like identity or authentication. In this conversation with Express Computer, Roshmik Saha, Co-founder and CTO of Skyflow, explains how the company is redefining customer data protection with tokenisation, zero-trust architecture, and polymorphic encryption. He discusses how enterprises can secure sensitive data while still innovating in areas such as AI and customer personalisation, and why data sovereignty is becoming central to global expansion strategies.

Skyflow has been at the forefront of rethinking how sensitive data is stored and protected. As Co-founder and CTO, how do you define the problem Skyflow set out to solve, and what makes your approach different from traditional data security models?

The motivation behind Skyflow came from my experiences at Lyft and Microsoft. At Lyft, we realised that even something as simple as knowing a rider’s destination could reveal a lot of personal information. The only safeguard we had then was compliance training, telling employees not to look at their friends’ or celebrities’ data. At Microsoft, we faced similar challenges with automotive data ownership and usage. Despite good intentions, enforcement largely depended on trust and manual processes.

At Skyflow, we asked a fundamental question: how do we truly protect data without breaking business use cases? Unlike traditional “bolt-on” security solutions, we designed a new kind of infrastructure that treats PII and other sensitive data as its own category, just like identity or authentication. We built a data privacy platform, which we call a data vault, that keeps sensitive data secure while keeping every business use case intact. Everything is built on zero trust, fully verifiable, and enforced by the system rather than by processes.

Interestingly, we later realised companies like Netflix and Apple had built internal versions of similar solutions. Our customers, like Nomi Health, quickly saw the value, choosing us instead of building their own vaults. Today, we manage PII, PCI, and PHI data for enterprises across sectors like healthcare and financial services.

As customer data platforms (CDPs) become the backbone of customer engagement, what are the biggest risks enterprises face when handling sensitive customer data, and why do organisations often struggle to balance personalisation with privacy?

It’s a difficult balance. Take Apple as an example: they market privacy as a core product feature, which is why customers trust them with financial and health data. But most companies, especially in e-commerce, rely heavily on personalisation, be it in marketing, sales, or AI-driven recommendations. To enable this, customer data is spread across hundreds of integrations, from CRMs to marketing platforms.

The problem is that once you remove personal data, personalisation loses its power. On the other hand, leaving personal data unprotected creates privacy risks. Our approach is simple: tokenise sensitive data. Tokenisation replaces sensitive values with non-sensitive tokens, de-identifying the data without losing its analytical value. Unlike synthetic data, tokenised data keeps its utility for AI training, inference, or personalisation, while strict governance is enforced.

Think of it like securing a moving car: you can’t just lock it in a garage. Similarly, locking data away makes it safe but useless. Tokenisation keeps data usable, while only authorised users can de-tokenise it when necessary. This lowers operational costs, enables seamless integrations, and improves trust with end users.
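To make the idea concrete, here is a minimal sketch of tokenisation with gated de-tokenisation. It is not Skyflow’s API: `ToyVault`, `tokenise`, `detokenise` and the role names are hypothetical, and the “vault” is just an in-memory dictionary. It assumes deterministic tokens (the same value always yields the same token), which is what lets an analytics job still group and join on customers without ever seeing an email address.

```python
import secrets


class ToyVault:
    """Hypothetical in-memory vault: keeps raw values, hands out opaque tokens."""

    def __init__(self, authorised_roles):
        self._token_to_value = {}
        self._value_to_token = {}
        self._authorised = set(authorised_roles)

    def tokenise(self, value: str) -> str:
        # Same input -> same token within this vault, so downstream systems
        # can still count, join and group on the token.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenise(self, token: str, role: str) -> str:
        # Only explicitly authorised roles may recover the original value.
        if role not in self._authorised:
            raise PermissionError(f"role {role!r} may not detokenise")
        return self._token_to_value[token]


# Usage: a marketing job computes spend per customer on tokens alone.
vault = ToyVault(authorised_roles={"support_agent"})
orders = [("alice@example.com", 30), ("bob@example.com", 12), ("alice@example.com", 8)]
tokenised = [(vault.tokenise(email), amount) for email, amount in orders]

spend_per_customer = {}
for token, amount in tokenised:
    spend_per_customer[token] = spend_per_customer.get(token, 0) + amount

print(sorted(spend_per_customer.values()))  # [12, 38]
print(vault.detokenise(tokenised[0][0], role="support_agent"))  # alice@example.com
```

Deterministic tokens are a design choice with a trade-off: they preserve joins and counts but reveal when two records share a value. A real deployment would decide per field whether that leakage is acceptable.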

With AI models increasingly trained on sensitive customer data, how should CDPs evolve to enable innovation while still ensuring strong privacy protections?

The biggest problem enterprises face is PII sprawl: sensitive data scattered across countless systems. This makes security and compliance extremely difficult. The answer is centralisation. Every engineering leader agrees that you don’t want your most valuable data scattered everywhere.

By centralising PII into a data vault, you unlock the full power of a Customer Data Platform (CDP). Without it, security and data teams are constantly in conflict, limiting innovation. With Skyflow’s vault architecture, compliance, security, and technology teams can finally work together toward business outcomes instead of blocking each other.

This enables companies to innovate with AI while ensuring privacy. For example, Walmart, with millions of use cases, benefits from this approach by making data usable for ML and analytics, without compromising on governance or compliance.

With agentic AI workflows becoming more critical, how does Skyflow ensure privacy and security in enterprise AI use cases?

As agentic AI workflows become central to enterprises, the role of data, especially PII, has become even more critical. To build the best AI models, companies need to use data ethically, securely, and responsibly. That’s where Skyflow comes in: we act as a data privacy filter for AI use cases, during both model training and inference.

Today, many enterprises are experimenting with AI agents and bots, but very few pilots actually make it to production. The main reason is the lack of confidence in security and privacy. We help enterprises overcome this barrier by enabling privacy-aware AI use cases. For example, one of our customers is building digital nurses, and they pass their entire dataset through our platform to ensure sensitive data is protected.
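A privacy filter of this kind can be sketched as “redact before the model, restore after”. The example below is illustrative only: `PrivacyFilter`, `PII_PATTERNS` and the placeholder format are hypothetical, and the two regular expressions are a deliberately crude stand-in for real PII detection, which typically also needs named-entity recognition (names, for instance, are not caught here). The model call is simulated by a plain function.

```python
import re
import secrets

# Hypothetical, deliberately simple detectors; production systems need far more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d -]{8,}\d"),
}


class PrivacyFilter:
    """Swaps detected PII for placeholder tokens before text reaches a model."""

    def __init__(self):
        self._vault = {}  # token -> original value; stands in for a real vault

    def redact(self, text: str) -> str:
        for label, pattern in PII_PATTERNS.items():
            def swap(match, label=label):
                token = f"[{label}_{secrets.token_hex(4)}]"
                self._vault[token] = match.group(0)
                return token
            text = pattern.sub(swap, text)
        return text

    def restore(self, text: str) -> str:
        # Re-identify placeholders in model output, e.g. for an authorised clinician.
        for token, value in self._vault.items():
            text = text.replace(token, value)
        return text


def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call; it only ever sees the redacted prompt.
    return "Draft follow-up based on: " + prompt


note = "Follow up with patient at jane.doe@example.com or +1 555 010 2334 about chest pain."
pf = PrivacyFilter()
prompt = pf.redact(note)          # raw contact details replaced by placeholders
reply = pf.restore(fake_model(prompt))
print(reply)
```

The same redaction step could sit in front of a training pipeline, so that only placeholders, never raw identifiers, end up in the training data.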

A major challenge with AI is that models don’t have a “delete button.” Once sensitive data has been used to train a model, you can’t simply remove it, unlike records in a database, which can be deleted to meet GDPR requirements. That’s why it’s critical to secure data before building AI models.

Some enterprises think running AI models entirely within their own infrastructure solves the problem, but that’s similar to the old assumption that on-premises systems are always safer than the cloud; it doesn’t address the real challenge. The real solution lies in adding a dedicated privacy and security layer on top of AI workflows. That’s exactly how we are helping enterprises unlock secure, production-ready AI use cases.

Does Skyflow itself ever have access to customer data stored in its vaults?

No. Just like any major cloud provider, Skyflow has no access to customer data or keys. Our infrastructure is built with multiple layers of protection and a zero-trust architecture. Even in rare emergency debugging cases, customers must explicitly grant us access to specific data.

We’ve developed polymorphic encryption, which keeps data encrypted even at runtime. This prevents insiders or other bad actors from accessing data inappropriately. Insider misuse is a significant source of breaches globally, and our architecture directly addresses that risk.

With data sovereignty and residency laws evolving across regions, how should enterprises rethink the architecture of their CDPs, and what role do approaches like tokenisation or vault-based architectures play?

A growing number of countries now regulate where their citizens’ data can be stored and how it may leave their borders: DPDP in India, GDPR in Europe, and similar laws elsewhere. This poses a challenge for global companies that have already invested heavily in infrastructure.

Our solution is simple: tokenise PII and separate it from the rest of the data. Tokens, which carry no sensitive information, can be stored and processed globally, while the actual PII remains in local, in-country vaults. This dramatically reduces both cost and time to market; we’ve helped companies cut globalisation costs by 90% while complying with local data laws.
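The split described above can be sketched in a few lines. Everything here is hypothetical and simplified: `RegionalVault`, `split_record`, `resolve`, the field list and the region codes are illustrative names, each “vault” is an in-memory dictionary, and routing a token back to its vault by a region prefix is an assumed design, not a description of Skyflow’s implementation. The point is that the globally processed record contains only tokens and non-sensitive fields.

```python
import secrets

PII_FIELDS = {"name", "email", "national_id"}  # hypothetical schema


class RegionalVault:
    """Stand-in for an in-country vault; raw PII never leaves this object."""

    def __init__(self, region: str):
        self.region = region
        self._store = {}

    def put(self, value: str) -> str:
        # Prefix the token with its region so it can be routed back later.
        token = f"{self.region}:{secrets.token_hex(8)}"
        self._store[token] = value
        return token

    def get(self, token: str) -> str:
        return self._store[token]


VAULTS = {region: RegionalVault(region) for region in ("IN", "EU", "US")}


def split_record(record: dict, country: str) -> dict:
    """Store PII in the country's vault; return a token-only copy safe to process globally."""
    vault = VAULTS[country]
    return {
        key: vault.put(value) if key in PII_FIELDS else value
        for key, value in record.items()
    }


def resolve(token: str) -> str:
    # The prefix names the in-country vault that holds the real value.
    region, _ = token.split(":", 1)
    return VAULTS[region].get(token)


customer = {
    "name": "Asha Rao",
    "email": "asha@example.in",
    "national_id": "0000-1111-2222",
    "plan": "premium",
    "lifetime_value": 420,
}
global_row = split_record(customer, "IN")
print(global_row["plan"], global_row["lifetime_value"])  # premium 420
print(global_row["email"].startswith("IN:"))  # True
```

In a real system, `resolve` would itself be subject to access policy and would run inside the country concerned; the sketch only shows the data layout, not the enforcement.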

Beyond just meeting compliance requirements, what does a truly “privacy-first” CDP look like in practice, and can you share surprising or unconventional use cases where secure CDPs have delivered significant value?

A mature CDP delivers value in multiple ways. It reduces costs by eliminating redundant infrastructure and security tools. It also breaks down silos between marketing, sales, and analytics, ensuring seamless data flow.

That said, a CDP can also become a honeypot: if it’s breached, the impact is catastrophic. That’s why securing it with approaches like tokenisation and vault-based architecture is critical. Only then can enterprises fully unlock the business value of CDPs without compromising customer trust.
