Why enterprise AI fails without a strong data foundation

By Rohit Kumar, COO, SCIKIQ

Almost every large company wants AI now. The board has asked, the CEO read something on a flight, and there is usually money set aside before anyone has worked out what problem they are solving. So, a team picks a model, builds a pilot, demos it once to applause, and nine months later it has quietly dropped off the steering committee deck. Nobody calls it a failure. It just stops coming up.

After twenty years of building data platforms in banks and global enterprises, I find the pattern hard to miss: the model is almost never what broke. It was everything underneath it.

People treat AI like a fortune teller. It is not. It is a very fast reader that believes whatever you put in front of it. Give it a mess and it reads the mess back fluently, in good English, with a confident tone, and that is the danger.

What are the key reasons?

1. Data Quality: Take a common scenario. A team wants an assistant that can answer basic questions about a customer: spend, contract status, that sort of thing. Simple ask. Except the customer record lives in the CRM, in the billing system, and in two spreadsheets a regional finance team has quietly maintained since about 2014. Same customer, four slightly different names, three addresses, two IDs. Nobody ever agreed which was correct because nobody was ever forced to.
So, you ask a one-line question and the assistant pulls from whichever source it hits first. Sometimes right, sometimes not, no way to tell without checking by hand. In one such rollout it twice quoted a closed account as active in the first month, and once handed a relationship manager the wrong credit limit in front of the client. The instinct is to blame the AI. But the AI was fine. It read what was there. Garbage in, garbage out did not stop being true just because the garbage now comes back in a polite paragraph.
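To make the "whichever source it hits first" problem concrete, here is a minimal Python sketch. The four records, the system names, the SYSTEM_OF_RECORD mapping and the helper functions (first_hit, checked_answer, resolved_answer) are all hypothetical, invented for illustration. It contrasts a naive lookup with two safer options: refusing to answer while the sources disagree, and answering from a system of record the business has explicitly agreed on for each field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    source: str        # hypothetical system name
    customer_id: str   # the ID that system uses for the customer
    name: str
    status: str        # "active" or "closed"
    credit_limit: int


# Invented copies of the "same" customer, in the order a naive lookup reads them.
RECORDS = [
    CustomerRecord("regional_sheet_a", "C-1042", "Acme Ltd", "active", 500_000),
    CustomerRecord("crm", "C-1042", "ACME Limited", "closed", 250_000),
    CustomerRecord("billing", "B-7731", "Acme Ltd.", "closed", 250_000),
    CustomerRecord("regional_sheet_b", "B-7731", "ACME Ltd (North)", "active", 500_000),
]

# Hypothetical agreement on which system is authoritative for each field.
SYSTEM_OF_RECORD = {"status": "crm", "credit_limit": "billing"}


def first_hit(records, field):
    """Naive behaviour: answer from whichever source is read first."""
    return getattr(records[0], field)


def checked_answer(records, field):
    """Answer only when every source agrees; otherwise surface the conflict."""
    values = {r.source: getattr(r, field) for r in records}
    if len(set(values.values())) == 1:
        return next(iter(values.values()))
    raise ValueError(f"sources disagree on {field!r}: {values}")


def resolved_answer(records, field):
    """Answer from the agreed system of record for this field."""
    owner = SYSTEM_OF_RECORD[field]
    for r in records:
        if r.source == owner:
            return getattr(r, field)
    raise LookupError(f"no record from system of record {owner!r}")


if __name__ == "__main__":
    print(first_hit(RECORDS, "status"))        # active (stale spreadsheet wins)
    print(resolved_answer(RECORDS, "status"))  # closed
    try:
        checked_answer(RECORDS, "status")
    except ValueError as err:
        print(err)
```

The agreed SYSTEM_OF_RECORD mapping is the real point: the code is trivial once someone has decided which source wins, and impossible to write until they have.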

2. Data Silos – Mini Kingdoms: The second reason is silos, older than AI by about thirty years. Finance owns its numbers, sales owns its pipeline, operations owns the delivery data. Everyone is polite about it in meetings and nobody actually shares anything, partly out of habit and partly because their bonus depends on their own version of the truth looking good.

Then someone asks something reasonable: which of our accounts are profitable and also at risk of leaving? It sounds like one question. It is four datasets, in four systems, owned by three departments, joined by nothing. The result is predictable.

One churn model, built only on billing data because billing was the only source available without a turf war, scored well on the test set. Everyone was happy. Then it went live and caught barely a third of the accounts that actually left, because the customers who walked away over bad service were invisible to it. Their support tickets sat in a different system that was never wired in. Accurate in testing, it failed in production.

3. Business Semantics – My Definition vs Your Definition: This is the one that gets missed, and it has sunk more projects than dirty data and silos put together. The data can be spotless and all in one place, and the thing still gives wrong answers, because the humans never agreed on what the words mean.

Take revenue. One word, sounds unambiguous. Ask finance and it is booked when the invoice goes out. Ask sales and it is the day the contract is signed. Ask the regional head and they are quietly netting off returns first. None of them are wrong. They have just never had to reconcile it, because until now a human sat in the middle and did the translation without anyone noticing.

Now put an AI in the middle and ask what revenue was last quarter. It does not know there are three definitions. It picks one, states it with total confidence, and half the room decides the system is broken. The same landmine sits under active customer, under churn, under margin, under almost every word that matters.

This layer has a name. People call it business semantics, or a semantic layer, or just the metric definitions, depending on who is selling it. Strip the jargon and it is the agreed dictionary. This is what we mean by revenue. This is how we count an active customer. Net, not gross. Without that dictionary written somewhere the machine can read, an AI is translating a language nobody gave it the grammar for.
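A minimal Python sketch of what "written somewhere the machine can read" might look like, assuming three invented deals and a calendar second quarter. The function names and the METRICS structure are hypothetical, not the format of any particular semantic-layer product. The first three functions apply the sales, finance and regional readings of revenue described above to the same deals (the regional one is assumed to work on an invoice basis); the METRICS entry is the agreed dictionary, and the assistant is only allowed to answer through it.

```python
from datetime import date

# Invented deals: (signed, invoiced, amount, returns)
DEALS = [
    (date(2024, 3, 28), date(2024, 4, 2), 100_000, 0),
    (date(2024, 4, 10), date(2024, 4, 15), 80_000, 5_000),
    (date(2024, 6, 25), date(2024, 7, 3), 60_000, 0),
]
Q_START, Q_END = date(2024, 4, 1), date(2024, 6, 30)  # assumed calendar Q2


def in_quarter(day):
    return Q_START <= day <= Q_END


def revenue_on_signature(deals):
    """Sales' reading: counted on the day the contract is signed."""
    return sum(amount for signed, _, amount, _ in deals if in_quarter(signed))


def revenue_on_invoice(deals):
    """Finance's reading: counted when the invoice goes out."""
    return sum(amount for _, invoiced, amount, _ in deals if in_quarter(invoiced))


def revenue_on_invoice_net(deals):
    """Regional reading (assumed invoice basis): returns netted off first."""
    return sum(amount - returns for _, invoiced, amount, returns in deals
               if in_quarter(invoiced))


# The agreed dictionary: one entry per business term, readable by people and code.
METRICS = {
    "revenue": {
        "definition": "Invoiced amount in the period, net of returns.",
        "owner": "finance",
        "compute": revenue_on_invoice_net,
    },
}


def answer(term, deals):
    """Answer only through an agreed definition; refuse undefined terms."""
    entry = METRICS.get(term)
    if entry is None:
        raise KeyError(f"no agreed definition for {term!r}")
    return entry["compute"](deals), entry["definition"]


if __name__ == "__main__":
    print(revenue_on_signature(DEALS), revenue_on_invoice(DEALS),
          revenue_on_invoice_net(DEALS))  # 140000 180000 175000
    print(answer("revenue", DEALS))
```

Run against the sample deals, the three readings give 140,000, 180,000 and 175,000 for the same quarter. Which one goes into METRICS is a business decision, not a technical one; net invoiced revenue is used here only because it matches the "net, not gross" example. Asking for a term nobody has defined, such as "active customer", fails loudly instead of being guessed.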

It does not take much to see this go wrong. A reporting project can stall for weeks over two words: on-time delivery. The data is clean, the pipelines work. But one department means the date promised at order and another means the date promised after the last revision, and each has been reporting its own version up the chain for years. The AI does not cause that argument; it drags it into the open, in front of the people who sign off budgets.

4. Nobody owns governance until it goes wrong: Governance is the boring one. It gets nodded at in slides and ignored in practice, right until the moment it becomes the only thing anyone cares about.

All it means is knowing what data you hold, where it came from, who can see it, and whether you are even permitted to use it the way you just did. Skip that and a shiny assistant will ingest personal customer data it was never cleared to touch and serve it to whoever asks. With India's Digital Personal Data Protection (DPDP) Act now live and GDPR in force across Europe, that is not a theoretical risk for an appendix. That is a letter from a regulator and a very bad week.
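As a rough illustration of the four questions in that definition, here is a Python sketch of a check that runs before anything is fed to an assistant. The catalog entries, purposes, roles and function names are hypothetical, and a check like this is not, on its own, compliance with DPDP or GDPR. It only shows the shape of the idea: a dataset with no catalog entry has unknown origin, and data is used only for the purposes and by the roles it has been cleared for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    source_system: str            # where the data came from
    contains_personal_data: bool
    allowed_purposes: frozenset   # uses the data has been cleared for
    allowed_roles: frozenset      # who may see it


# Hypothetical catalog: what data we hold, keyed by dataset name.
CATALOG = {
    "crm_contacts": DatasetEntry(
        "crm", True, frozenset({"service"}), frozenset({"relationship_manager"})),
    "billing_summary": DatasetEntry(
        "billing", False, frozenset({"service", "ai_assistant"}),
        frozenset({"relationship_manager", "analyst"})),
}


def check_use(dataset, purpose, role):
    """Return (allowed, reason) for using a dataset for a purpose and role."""
    entry = CATALOG.get(dataset)
    if entry is None:
        return False, f"{dataset!r} is not in the catalog; origin unknown"
    if purpose not in entry.allowed_purposes:
        kind = "personal data" if entry.contains_personal_data else "data"
        return False, f"{dataset!r} holds {kind} not cleared for {purpose!r}"
    if role not in entry.allowed_roles:
        return False, f"role {role!r} may not see {dataset!r}"
    return True, f"{dataset!r} from {entry.source_system!r} cleared for {purpose!r}"


def ingest(datasets, purpose, role):
    """Split candidate datasets into approved and refused, keeping the reason."""
    approved, refused = [], []
    for name in datasets:
        ok, reason = check_use(name, purpose, role)
        (approved if ok else refused).append((name, reason))
    return approved, refused


if __name__ == "__main__":
    ok, no = ingest(["crm_contacts", "billing_summary", "old_export.csv"],
                    purpose="ai_assistant", role="analyst")
    for name, reason in ok + no:
        print(name, "->", reason)
```

With purpose "ai_assistant" and role "analyst", only billing_summary gets through. The CRM contacts are refused because they were never cleared for that purpose, and the unlisted export is refused because nobody can say where it came from. Keeping the reason alongside each decision is what lets someone answer the regulator's letter later.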

What to do?

None of the fixes are exciting, which is probably why so many teams skip them. The data has to be clean enough to trust. It has to be connected, so the locked rooms open. The agreed dictionary has to be written down, so the machine and the humans share definitions. And there has to be enough governance to always know where a number came from and whether it was allowed to be used.

The most common mistake is treating all of this as cleanup for later, something to sort out once the AI is impressive enough to justify it. It does not work in that order and never has. You do not lay the roof and then go back to dig the foundation.

The companies genuinely pulling ahead with AI right now are not the ones with the cleverest models. The models are a commodity; you can rent a good one by the hour. The leaders are the unglamorous companies that spent two boring years fixing their data and arguing their definitions into the ground while everyone else shipped demos.
