By Venkataraman Swaminathan, Vice President, Secure Power Division, Schneider Electric Greater India
The IT industry is in the middle of what’s arguably one of its most defining shifts to date, thanks to the explosive growth of generative AI. These powerful models and tools are pushing the limits of traditional data center infrastructure, but the upgrades that operators need to prioritize will largely depend on which type of workload they’re running: training or inference.
Training an AI model consumes enormous amounts of power (often more than 100kW per rack) and requires advanced cooling and electrical designs. AI inferencing workloads, on the other hand, which were once considered less demanding, are evolving rapidly and becoming more complex. Inference has spread to a variety of environments: the cloud, colocation facilities, on-premise data centers, and even the edge. It’s at the inference stage that the real business value and financial rewards of AI will materialize, which is what makes optimizing for these workloads an imperative.
Together, training and inferencing are redefining what data centers must be able to handle, not only in scale, but in density, flexibility, and resilience. To stay competitive, facility operators must understand the shifting landscape of AI workloads and proactively adapt their infrastructure to meet a new era of intelligent computing.
AI training vs. inferencing: The infrastructure divide
AI training and inferencing present vastly different demands on data center infrastructure. Training is the process of teaching AI models to understand patterns from vast amounts of data and requires clusters of high-performance servers packed with the latest GPUs. These accelerators are well-suited for training because they offer a large number of cores and high-bandwidth memory. They work in parallel to process trillions of parameters, often across multiple racks that are operating as a single, large virtual compute node. As a result, power densities regularly exceed 100kW per rack, and advanced thermal management, like direct-to-chip liquid cooling and rear door heat exchangers, becomes non-negotiable.
Inferencing, by contrast, is when a trained model makes predictions or decisions in real time based on new, unseen data rather than the data it was trained on. While typically requiring less power per server than training, inferencing workloads are increasingly varied and pervasive.
They now range from simple chatbot prompts to complex real-time analysis in healthcare, retail, and other industries, as well as in autonomous systems. Depending on the deployment and workload, inference environments can range from less than 40kW up to 80kW per rack for more advanced use cases.
By 2030, we expect the data center market to reflect this diversity:
~25% of new builds will be <40kW/rack, primarily inference-focused.
~50% will fall into the 40–80kW/rack range for mixed inference and training workloads.
~25% will exceed 100kW/rack, dedicated to large-scale training clusters.
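To make the implications of that mix concrete, the short sketch below estimates the blended rack density it implies. The representative density assumed for each band (30, 60, and 120kW) and the 200-rack facility are illustrative assumptions, not figures from the projection itself.

```python
# Back-of-the-envelope estimate of the blended rack density implied by the
# projected 2030 mix. The representative density chosen for each band is an
# illustrative assumption, not a source figure.

mix = [
    (0.25, 30),   # <40kW/rack builds, inference-focused (assume ~30kW typical)
    (0.50, 60),   # 40-80kW/rack builds, mixed workloads (assume ~60kW typical)
    (0.25, 120),  # >100kW/rack builds, training clusters (assume ~120kW typical)
]

blended_kw = sum(share * kw for share, kw in mix)
print(f"Blended average density: {blended_kw:.1f} kW/rack")  # ~67.5 kW/rack

# For a hypothetical 200-rack facility, that mix implies roughly:
racks = 200
print(f"Estimated IT load for {racks} racks: {racks * blended_kw / 1000:.1f} MW")
```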
Where is inferencing happening and why does it matter?
While AI training tends to be centralized in massive hyperscale facilities due to its intensive hardware requirements, inferencing is where AI goes to work, and it’s happening everywhere.
Public cloud
The public cloud remains the dominant environment for inferencing today due to flexibility, scale, and ecosystem maturity. As a result, many organizations begin their AI journey in the cloud for both development and inference.
To support both model development and high-volume inference, cloud data centers must be equipped with GPU-accelerated servers, high-throughput networking, and intelligent workload orchestration capable of scaling resources to meet demand. Additionally, advanced thermal management and energy-efficient architectures are gaining importance for maintaining performance and sustainability at scale, particularly as AI inference workloads grow in complexity and volume.
Colocation & on-premise
As use cases mature, organizations seek more control. To manage costs, latency, or data sovereignty, they are moving inference workloads into colocation and on-premise environments. This is especially important in regulated industries like healthcare, financial services, and manufacturing, where localized AI models can deliver real-time insights without data leaving the premises.
In these environments, future-proofing infrastructure is critical. Racks that support 20kW today may need to handle double that in the near future. Cooling, power distribution, and backup systems must be scalable and adaptable to changing workload demands.
Edge computing
In some of the most dynamic AI applications, including autonomous vehicles, smart retail, and telecom infrastructure, inferencing must happen at the edge. Here, real-time decision-making is essential, so eliminating the latency of sending data back to the cloud is paramount.
Edge locations often operate under strict space and energy constraints, requiring compact, high-efficiency designs. The ability to deliver performance in a smaller footprint while maintaining thermal and power stability is a critical success factor.
Designing infrastructure for AI workloads
As inference accelerates across these varied deployment environments, infrastructure design must become more flexible, modular, and optimized for density and efficiency. Until now, training has held the industry’s attention because those clusters require enormous amounts of power. Inference workloads, which run repeatedly on new data, are expected to use fewer IT resources but must be optimized for power management.
Businesses can expect to see process and automation improvements as industry-specific inference workloads are put into production.
Based on the expected rack densities of training and inference workloads, data center operators can follow a few specific digital infrastructure design best practices to support each.
Supporting training: Ultra-high-density strategies
Training workloads demand infrastructure that can support power densities above 100kW per rack. Key elements include:
Liquid cooling technologies, such as direct-to-chip and rear door heat exchangers, to manage thermal loads.
Modular electrical architectures with scalable capacity and redundancy.
High-performance interconnects to enable GPU clusters to function as unified “compute platforms.”
Supporting training workloads means planning for growth in thermal design power (TDP). As accelerators evolve, their power draw will only increase, making flexible, scalable cooling essential.
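As a rough illustration of what those thermal loads mean in practice, the sketch below works through the basic coolant-flow math for a 100kW rack using the standard heat-transfer relation Q = ṁ·c_p·ΔT. The 80% liquid heat-capture fraction and the 10K coolant temperature rise are assumptions for illustration; actual values come from the server and cooling vendors’ specifications.

```python
# A minimal sketch of the thermal math behind sizing a direct-to-chip liquid
# cooling loop. The liquid heat-capture fraction and coolant temperature rise
# are illustrative assumptions, not vendor figures.

RACK_POWER_W = 100_000      # >100kW-class training rack
LIQUID_CAPTURE = 0.80       # assumed fraction of heat removed by the liquid loop
CP_WATER = 4186             # specific heat of water, J/(kg*K)
DELTA_T = 10.0              # assumed coolant temperature rise, K
LITRES_PER_KG = 1.0         # approximate for water

heat_to_liquid_w = RACK_POWER_W * LIQUID_CAPTURE
mass_flow_kg_s = heat_to_liquid_w / (CP_WATER * DELTA_T)   # Q = m_dot * cp * dT
flow_l_min = mass_flow_kg_s * LITRES_PER_KG * 60

print(f"Heat to liquid loop: {heat_to_liquid_w / 1000:.0f} kW")
print(f"Required coolant flow: {flow_l_min:.0f} L/min per rack")  # ~115 L/min

# The remaining ~20 kW still lands on the room's air-cooling system, which is
# why rear door heat exchangers or containment stay in the design.
```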
Supporting inference: Medium-to-high-density design strategies
Inference workloads must be optimized for repeatability, efficiency, and low-latency responsiveness. Infrastructure should be designed to:
Support at least 40kW per rack, though actual rack densities will vary greatly depending upon workload.
Use compressed, optimized models that consume fewer resources while delivering fast results.
Rely on hot/cold aisle containment for <40kW deployments, with potential upgrades to liquid cooling as density increases.
Deploy intelligent power distribution units (PDUs) capable of dynamic scaling to accommodate changing loads.
Overall, rack power and cooling strategies should be matched to the deployment model, with an eye toward future-proofing infrastructure for anticipated growth in density and complexity.
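One way to operationalize this guidance is to encode the density bands discussed above into a simple planning check, as in the sketch below. The thresholds mirror the figures cited in this article; the suggested cooling approaches are a simplification, not a substitute for a proper engineering assessment.

```python
# A simple planning helper that maps projected rack density to the cooling
# approach discussed above. The bands follow the densities cited in this
# article; treat the mapping as a rough rule of thumb, not a design rule.

def cooling_strategy(rack_kw: float) -> str:
    """Suggest a cooling approach for a given per-rack power density (kW)."""
    if rack_kw < 40:
        return "Hot/cold aisle containment with conventional air cooling"
    if rack_kw <= 80:
        return "Enhanced air cooling (e.g. rear door heat exchangers); plan a path to liquid"
    return "Direct-to-chip liquid cooling with supplemental heat rejection"

# Example: densities spanning edge inference through training clusters
for density in (20, 40, 60, 110):
    print(f"{density:>4} kW/rack -> {cooling_strategy(density)}")
```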
The importance of software
For both inference and training, infrastructure must be designed with monitoring, automation, and lifecycle support embedded into the architecture.
Tools like data center infrastructure management (DCIM), electrical power monitoring systems (EPMS), building management systems (BMS), and digital electrical design software are now foundational. Managing clusters of high-power, liquid-cooled servers alongside traditional air-cooled IT requires precise coordination and software that enables real-time monitoring, capacity planning, and automated response. This becomes a frontline defense against risks, especially as operators navigate hybrid environments and mixed workloads.
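As a minimal illustration of the kind of automated check such software enables, the sketch below flags racks approaching their rated power or running warm coolant. The read_rack_telemetry function and the thresholds are hypothetical stand-ins for whatever interfaces and limits a site’s DCIM, EPMS, or BMS platform actually provides.

```python
# A minimal sketch of an automated capacity/thermal check. read_rack_telemetry
# is a hypothetical stand-in for the site's real DCIM/EPMS/BMS interface, and
# the thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class RackTelemetry:
    rack_id: str
    power_kw: float          # measured IT load
    rated_kw: float          # design capacity of the rack's power and cooling
    coolant_supply_c: float  # liquid-cooling supply temperature (0.0 if air-cooled only)

def read_rack_telemetry() -> list[RackTelemetry]:
    # Placeholder data: in practice this would query the monitoring platform.
    return [
        RackTelemetry("A-01", power_kw=92.0, rated_kw=100.0, coolant_supply_c=32.0),
        RackTelemetry("B-07", power_kw=31.0, rated_kw=40.0, coolant_supply_c=0.0),
    ]

def check_capacity(racks: list[RackTelemetry], utilisation_alarm: float = 0.9) -> None:
    """Flag racks running close to their rated power or with warm coolant."""
    for rack in racks:
        utilisation = rack.power_kw / rack.rated_kw
        if utilisation >= utilisation_alarm:
            print(f"ALERT {rack.rack_id}: {utilisation:.0%} of rated power in use")
        if rack.coolant_supply_c > 45.0:  # illustrative supply-temperature limit
            print(f"ALERT {rack.rack_id}: coolant supply at {rack.coolant_supply_c:.1f} degC")

check_capacity(read_rack_telemetry())
```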
What’s next: Preparing for evolving inference demands
As generative AI matures, inference workloads will become more complex and ubiquitous. Three key trends are shaping the future:
More Complex Models: Inference will involve larger models, multimodal capabilities (text + image + video), and increased context windows. These trends will push power densities higher, even outside of training environments.
Closer to the Data Source: Demand for low-latency, on-site decision-making will grow. Expect to see more high-density inference moving to edge and colocation environments.
AI-as-a-Service Proliferation: Managed AI services will decentralize inference even further, requiring modular infrastructure capable of supporting a wide range of customer workloads.
The result? Data centers will need to support a broad spectrum of power and cooling configurations, from lightweight edge servers to dense racks running inference and training side by side. Designs must account for the rising TDP of accelerators, increasingly mixed workloads, and the growing importance of energy efficiency and sustainability.
Whether you’re training massive models in hyperscale environments or deploying optimized inference at the edge, these workloads demand new thinking around power and cooling.
While this piece offers a foundation for designing around AI, there’s no universal blueprint. Every deployment is different. The type of AI workload, GPU power requirements, and cluster size will directly influence rack densities, and in turn, your facility’s infrastructure needs.
Operators looking to stay ahead should dive deeper into how AI is reshaping data center design. Understanding the unique demands of training and inference is the first step; building infrastructure that can evolve alongside them is the real challenge.
When all pieces of the design equation (from cooling to power to software management) are aligned, operators will be prepared not only for today’s AI workloads, but for the more complex, distributed, and demanding workloads yet to come.