Express Computer

Why things go catastrophically wrong when AI agents query raw data without a semantic layer


By Pratik Jain, Senior Director of Technology, Kyvos Insights

AI systems querying raw enterprise data can produce inconsistent and misleading insights due to lack of business context. An interoperable and governed universal semantic layer that grounds AI in meaning is essential to ensure accurate, consistent and trustworthy AI-driven decisions.

Today’s enterprise users expect to interact with data just as easily as they ask a question on ChatGPT, demanding immediate answers. They want to query enterprise data in plain, natural language without navigating through dashboards, reports, writing SQL or understanding underlying data structures.

AI is expected to do all the heavy lifting: translating their intent, generating queries, joining datasets and applying the right logic. This transformative shift lowers the barrier to data insights and enables more people across the organisation to engage with data. 

However, this also introduces the need to ensure that AI is responsible and architected with guardrails, since users trust the insights to be accurate enough to drive business decisions. This complexity is often overlooked.

AI blindsided

Raw data is architected for storage and processing efficiency but is not designed to capture the business context required for accurate interpretation and insight generation. When AI systems are exposed to raw data, the results can be inconsistent or misleading, with different queries producing different answers to the same question.

For example, the same business concept is often represented in multiple ways across the data environment. “Revenue” may exist as bookings in one table, realised revenue in another and net revenue post tax adjustments in a third. Similarly, entities such as customers or accounts may appear under different field names such as “cust_name”, “customer” or “account”, with no explicit indication that they refer to the same entity. Relationships between datasets are ambiguous, with multiple possible join paths that all appear valid at a structural level.
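A minimal sketch makes the ambiguity concrete. The tables and figures below are entirely hypothetical, but they show how three structurally valid queries for "revenue" can return three different totals:

```python
import sqlite3

# Hypothetical illustration: three tables that all plausibly answer
# "what is revenue?", each encoding a different business definition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (order_id INTEGER, amount REAL);
    CREATE TABLE realised_revenue (order_id INTEGER, amount REAL);
    CREATE TABLE net_revenue (order_id INTEGER, amount REAL);
    INSERT INTO bookings VALUES (1, 100.0), (2, 250.0), (3, 75.0);
    INSERT INTO realised_revenue VALUES (1, 100.0), (2, 250.0);  -- order 3 not yet realised
    INSERT INTO net_revenue VALUES (1, 82.0), (2, 205.0);        -- tax adjustments applied
""")

# An AI agent asked "what was revenue?" could generate any of these;
# every query executes correctly, yet each yields a different answer.
for table in ("bookings", "realised_revenue", "net_revenue"):
    total = conn.execute(f"SELECT SUM(amount) FROM {table}").fetchone()[0]
    print(table, total)  # 425.0, 350.0 and 287.0 respectively
```

Nothing in the schema tells the agent which of the three is the definition the business actually relies on.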

Critical business logic is usually embedded within transformation pipelines. None of this context is visible to AI systems interacting directly with raw tables and schemas. Without a defining layer that overlays business meaning, relationships and rules, AI systems are left to deduce them from the data structure alone. While the results produced may all be technically correct, they could be inconsistent and contradictory. Depending on query logic used, they may not always reflect the definitions the business actually relies on.

The fault lines

Though significant, the failure modes are not always obvious. The outputs look credible, the queries execute correctly, and yet the insights can be misaligned with the business.

The first breakdown occurs in applying business definitions. As seen earlier, the same metric, such as revenue, can exist in multiple forms in the data environment, and without clear guidance, AI systems choose one arbitrarily.

The second issue is with data relationships. Schemas allow multiple ways to connect datasets, each technically correct. Without business context, AI systems may choose a join path that introduces duplication or misalignment between records. The resulting numbers may still look reasonable, making these errors very difficult to detect, but they distort the underlying truth.
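The join problem can also be sketched in a few lines. In this hypothetical schema, joining orders through a one-to-many shipments table fans out and silently inflates the total:

```python
import sqlite3

# Hypothetical schema where two join paths both "work" structurally,
# but one duplicates rows and distorts the aggregate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    CREATE TABLE shipments (shipment_id INTEGER, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 200.0);
    INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 2);  -- order 1 split across two shipments
""")

correct = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Joining through shipments counts order 1's amount twice. The result
# still looks reasonable, which is exactly what makes it hard to catch.
fanned_out = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN shipments s ON o.order_id = s.order_id
""").fetchone()[0]

print(correct, fanned_out)  # 300.0 vs 400.0
```

Both queries are valid SQL over a valid schema; only business context says which join path answers the question being asked.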

A more subtle but critical problem is semantic drift across datasets. Business logic embedded in transformation pipelines, such as exclusions, adjustments and evolving definitions, is not always present in the warehouse schema. One dataset might include ‘cancelled transactions’ while another one excludes them completely. Due to these differences, when AI queries raw tables that encode marginally different rules, it can produce conflicting answers to the same question.
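To illustrate semantic drift with a hypothetical example: two marts derived from the same transactions disagree because one pipeline drops cancelled rows before loading, and that rule is invisible in the final tables:

```python
# Hypothetical source data shared by two downstream pipelines.
transactions = [
    {"id": 1, "amount": 100.0, "status": "complete"},
    {"id": 2, "amount": 50.0, "status": "cancelled"},
    {"id": 3, "amount": 200.0, "status": "complete"},
]

# Finance mart: the pipeline excludes cancelled transactions.
finance_mart = [t for t in transactions if t["status"] != "cancelled"]

# Ops mart: the pipeline keeps everything.
ops_mart = list(transactions)

# The exclusion rule lives in the pipeline, not in the schema, so an AI
# querying both tables gets conflicting answers to the same question.
print(sum(t["amount"] for t in finance_mart))  # 300.0
print(sum(t["amount"] for t in ops_mart))      # 350.0
```

Neither table is wrong; they encode marginally different rules, and nothing at the schema level signals the difference.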

AI systems appear intelligent and reliable, but operate on fragmented and inferred meanings. The result is not an error in execution, but an error in interpretation and choice among multiple plausible paths. 

Moreover, there is a lack of traceability and governance. Without a unifying layer that defines how data should be interpreted, it is difficult to explain how an answer was derived. Different queries apply different assumptions, and there is no consistent way to validate whether the output aligns with business intent.

From raw data to meaning: The semantic shift

The root of the problem lies in the fact that raw enterprise data is a product of ingestion patterns and performance optimisations. It is optimised for storage and computation, and any transformation logic applied to it is not visible in the final tables.

This is where semantics becomes critical. Semantics defines the business meaning of data: how it should be understood and interpreted. It acts as the translation layer between raw data and the consumption layer.

Without this layer of meaning, the resultant data store cannot reliably convey the “context” that is key for AI agents and BI tools to function accurately.
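The translation-layer idea can be sketched very simply. The structure below is illustrative, not any vendor's format: each business term maps to exactly one governed definition, and a consumer (AI or BI) resolves terms through it rather than guessing from table names:

```python
# A minimal sketch of a semantic model. All names here are hypothetical.
SEMANTIC_MODEL = {
    "revenue": {
        "table": "realised_revenue",
        "expression": "SUM(amount)",
        "description": "Realised revenue, excluding unfulfilled bookings",
    },
    "customer": {
        "canonical_field": "customer_id",
        # Physical fields that all refer to the same business entity.
        "aliases": ["cust_name", "customer", "account"],
    },
}

def resolve_metric(term: str) -> str:
    """Translate a business term into its single governed SQL fragment."""
    metric = SEMANTIC_MODEL[term]
    return f"SELECT {metric['expression']} FROM {metric['table']}"

print(resolve_metric("revenue"))  # SELECT SUM(amount) FROM realised_revenue
```

The point is that "revenue" is defined once, in business terms, and every consumer that asks for it gets the same answer derived the same way.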

With AI becoming mainstream, a semantic layer is being seen by the industry as a foundational component of modern data architecture, providing a governed pathway to insights.

The semantic landscape

As the need for consistent data interpretation has grown, different approaches to semantic layers have emerged. 

BI tools like Looker, Power BI and Tableau built their own native semantic layers, allowing metrics and KPIs to be defined within dashboards without writing SQL. This made semantics accessible to business teams and reduced dependence on data engineering for every query. However, definitions remained tied to that tool and invisible to any others.

Data platforms followed suit, with Databricks and Snowflake embedding their own semantic capabilities closer to storage. This brought business logic nearer to the data itself, which worked well for organisations running on a single platform. For those operating across multiple platforms, it introduced yet another layer of definitions that could not easily speak to each other.

The cumulative result was an explosion of semantic layers in the enterprise analytics stack. These native semantic layers are tightly coupled to the host, making logic and definitions hard to reuse across the stack, and thus a challenge in a multi-platform environment. Cross-system governance suffers in this siloed approach because each tool and platform carries its own version of the truth, with no shared mechanism to align them.

Universal semantic layers emerged as a response. Vendors like Kyvos, Cube, dbt and AtScale built layers designed to sit independently between all data sources and all consuming systems, whether AI or BI. When semantics are decoupled from any single platform, business meaning can be defined once and referenced consistently across the environment. The layer remains constant as the data stack evolves and as new AI agents are introduced.

With each approach carrying its own tradeoffs, the more useful question for any data leader is not which architecture is right in theory but what a semantic layer must demonstrably deliver, regardless of how it is built.

Semantic layer must-haves

What capabilities must a semantic layer offer, irrespective of how it is architected? First, it should be independent of any constraints imposed by the underlying data platforms. The data stack may change or evolve, but the semantic layer must remain constant, allowing business logic to persist through technology refreshes.

It must also be interoperable and coordinated across systems, acting as a shared foundation and eliminating ambiguity at source. With that, AI systems and BI tools interpret data using the same, consistent business definitions. 

The semantic layer must also handle the scale and complexity of enterprise data. This includes supporting large, multidimensional models that reflect real-world business structures.

For AI agents, the semantic layer should be the go-to interpretation interface, with clear, governed definitions of entities. It should serve as the architectural control plane that ensures AI-driven insights are fast, consistent, explainable and thus reliable.
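One way to picture the control-plane role, in a deliberately simplified sketch with hypothetical names: the agent may only execute metrics the layer defines, so every answer traces back to a governed definition instead of an assumption the model made on its own:

```python
# Hypothetical registry of governed metrics exposed to AI agents.
GOVERNED_METRICS = {
    "revenue": "SELECT SUM(amount) FROM realised_revenue",
    "active_customers": "SELECT COUNT(DISTINCT customer_id) FROM customers",
}

def agent_query(term: str) -> str:
    """Resolve a requested metric through the governed layer, or refuse.

    Refusing ungoverned terms is what makes answers traceable: the agent
    never invents its own interpretation of a business concept.
    """
    if term not in GOVERNED_METRICS:
        raise ValueError(f"'{term}' has no governed definition; refusing to guess")
    return GOVERNED_METRICS[term]

print(agent_query("revenue"))  # SELECT SUM(amount) FROM realised_revenue
```

The refusal path is as important as the happy path: an ungoverned question surfaces as an explicit gap in the semantic model rather than a confident but arbitrary answer.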

Conclusion

AI is fast becoming the primary interface to analyse enterprise data. In the new paradigm, data is abundant and AI is powerful, but it must be anchored in business context. A semantic layer is what makes that possible, providing a single governed foundation for how data is interpreted across every tool, every agent and every query, so that the insights driving business decisions are never the product of an assumption AI made on its own.
