
The rise of AI factories: Powering an era of pervasive intelligence


By Prith Banerjee, Senior Vice President of Innovation at Synopsys

For artificial intelligence to become the next great global economic engine, it will require an entirely new means of production.

Tech giants are betting hundreds of billions of dollars on a future powered by AI. McKinsey projects as much as $7 trillion of investment in AI-related data centers by 2030, one of the largest infrastructure buildouts in history, surpassing previous technology booms in both scale and pace.

In India alone, Google is building a gigawatt-scale AI hub in Visakhapatnam. Microsoft is expanding its cloud and AI footprint in Pune and Chennai and creating a new “India South Central” region in Hyderabad. In partnership with NVIDIA, Reliance Jio is developing a major AI data center in Jamnagar for nationwide GPU-as-a-service offerings. TCS is planning a 1-gigawatt AI data center, likely in Gujarat or Maharashtra, to support startups, hyperscalers, and government institutions. And as part of its Stargate project, OpenAI is actively scouting locations in India for what could become one of the largest AI data centers in all of Asia.

Beyond just big
But size isn’t everything. Even as data centers grow larger and more powerful, AI demands a distinct computing architecture — a shift that makes the transition from mainframe to cloud seem rather quaint.

The growth of AI represents a fundamental transformation in how the world builds and operates computing infrastructure. While traditional data centers are designed for general-purpose workloads, AI superclusters are purpose-built facilities that function as industrial-scale intelligence production systems. And their output is defined by new metrics — most notably tokens per watt and tokens per dollar — that quantify the efficiency and productivity of intelligence at scale.
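As a rough illustration of what these metrics capture, here is a minimal Python sketch. All of the throughput, power, and cost figures are hypothetical placeholders, not measurements from any real facility; the point is only how token output, energy, and cost combine into tokens per watt and tokens per dollar.

```python
# Back-of-the-envelope "intelligence productivity" metrics for an AI factory.
# All figures below are hypothetical placeholders, not vendor or facility data.

tokens_per_second = 2_000_000      # sustained token throughput of the cluster (assumed)
facility_power_w = 5_000_000       # total facility draw in watts, including cooling (assumed)
hourly_cost_usd = 9_000            # amortized capex + opex per hour of operation (assumed)

tokens_per_watt = tokens_per_second / facility_power_w       # tokens/s per watt
tokens_per_hour = tokens_per_second * 3600
tokens_per_dollar = tokens_per_hour / hourly_cost_usd

print(f"tokens per watt:   {tokens_per_watt:.2f} tokens/s per W")
print(f"tokens per dollar: {tokens_per_dollar:,.0f} tokens per USD")
```

In practice the energy metric is often quoted as tokens per joule or tokens per second per watt; either way, the output of the facility is measured in intelligence produced per unit of energy and money rather than in generic compute cycles.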

Building for AI production
To generate and process the massive volumes of data across the full spectrum of AI production — from data ingestion and model training to fine-tuning and large-scale inference — AI data centers need to overcome enormous engineering design challenges.

Addressing these complex issues requires a transformative approach that impacts every aspect of the system design and its individual components, right down to the silicon itself.

Specialized AI chips
To handle massive parallel computations and real-time inference, the semiconductor industry is innovating with specialized AI chips, including graphics processing units (GPUs), application-specific integrated circuits (ASICs), and custom accelerators like neural processing units (NPUs). Although these deliver extreme performance per watt, they also push up power density, thermal load, and interconnect complexity, which in turn requires re-engineering boards, racks, and facility infrastructure to maximize their performance.

Interconnect bottlenecks
AI training runs shuttle terabytes of data across thousands of compute nodes, and even the highest-speed links between chips and server components can become bottlenecks. As a result, AI data center servers need optimized GPU‑to‑GPU links, CPU‑to‑memory buses, PCIe, and rack‑scale fabrics to achieve ultra‑low‑latency, high‑bandwidth communication between chips and nodes.
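To get a feel for why link bandwidth matters so much, consider the time to synchronize gradients across a training cluster with a ring all-reduce. The sketch below uses assumed values (a 70B-parameter model, FP16 gradients, 1,024 GPUs) and deliberately ignores latency and overlap with compute; it is an illustration of scale, not a performance model.

```python
# Rough ring all-reduce time for synchronizing gradients across GPUs.
# Model size, GPU count, and link speeds are assumptions for illustration.

params = 70e9                # 70B-parameter model (assumed)
bytes_per_grad = 2           # FP16 gradients
grad_bytes = params * bytes_per_grad

def allreduce_seconds(link_gbps: float, num_gpus: int = 1024) -> float:
    """Ring all-reduce moves roughly 2*(N-1)/N of the gradient data over each GPU's link."""
    link_bytes_per_s = link_gbps * 1e9 / 8
    return 2 * (num_gpus - 1) / num_gpus * grad_bytes / link_bytes_per_s

for gbps in (100, 400, 800):
    print(f"{gbps} Gb/s per GPU -> ~{allreduce_seconds(gbps):.1f} s per full synchronization")
```

Even in this simplified picture, moving from 100 Gb/s to 800 Gb/s per GPU cuts synchronization time by roughly 8×, which is compute time the accelerators would otherwise spend idle.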

Memory constraints
A related challenge is memory. Training complex AI models requires substantially larger memory pools and high-bandwidth memory (HBM) to avoid hitting the “memory wall” — a situation where insufficient memory forces frequent data transfers to slower storage tiers, dramatically reducing performance. To feed specialized AI chips with as much data as they can process, system designers need to place optimized memory close to the processors running the AI workloads.
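A quick back-of-the-envelope calculation shows the memory wall in action: during low-batch inference, every generated token has to stream the model's weights out of memory at least once, so HBM bandwidth, not peak FLOPs, caps the token rate. The figures below are assumptions chosen only for illustration.

```python
# Why memory bandwidth, not raw FLOPs, often caps inference speed (the "memory wall").
# Assumed figures for illustration only.

params = 70e9                        # 70B parameters (assumed)
bytes_per_param = 2                  # FP16 weights
weight_bytes = params * bytes_per_param

hbm_bandwidth_bytes_per_s = 3.0e12   # ~3 TB/s of HBM bandwidth per accelerator (assumed)

# With no batching, each generated token must read the weights at least once,
# so bandwidth sets an upper bound on single-stream token rate.
max_tokens_per_s = hbm_bandwidth_bytes_per_s / weight_bytes
print(f"Bandwidth-bound ceiling: ~{max_tokens_per_s:.0f} tokens/s per accelerator")
```

The ceiling here (~21 tokens/s under these assumptions) is far below what the compute units could theoretically sustain, which is exactly why high-bandwidth memory placed close to the processors matters so much.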

Advanced packaging
To deliver the performance at scale that AI requires, silicon designers are increasingly turning to multi-die designs, including 3D integrated circuits (3DIC) and chiplet-based architectures. While these chip designs offer gains that traditional monolithic SoCs cannot achieve cost-effectively, they also introduce significant complexity to the design process.

Security
Complex, multi‑die SoCs, chiplets, and high‑bandwidth interconnects introduce an expanded attack surface, where a single weak link in the silicon or protocol stack can expose entire clusters to compromise. Protecting models and data now requires end‑to‑end cryptographic safeguards across memory, PCIe, and network fabrics, plus a silicon-level security stack, to ensure the integrity and confidentiality of workloads in motion and at rest.

Challenges at scale
The challenges of AI production do not stop at the servers. Training clusters span tens of thousands of GPUs, so facilities must support higher rack densities, gigawatt-class campuses, and rapid capacity growth to keep pace with expanding AI compute workloads.

Networking
Poorly designed data center networks leave GPUs sitting idle, and the bandwidth of traditional “leaf-and-spine” cabling models won’t cut it. AI servers demand four to five times more fiber connections, along with high-bandwidth, ultra-low-latency fabrics built from multiple specialized technologies.
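Much of that multiplier comes from rail-optimized cluster designs that dedicate a high-speed port to every accelerator. A simple count, under assumed and purely illustrative server configurations, shows where a four-to-five-times figure can come from.

```python
# Why AI servers need several times more fiber links than general-purpose servers.
# All per-server configurations below are assumptions for illustration.

traditional_links_per_server = 2    # e.g., a pair of NICs on a typical cloud server (assumed)
ai_gpus_per_server = 8              # accelerators per AI node (assumed)
ai_links_per_gpu = 1                # one dedicated high-speed port per GPU, rail-optimized (assumed)
ai_other_links = 2                  # front-end / storage / management links (assumed)

ai_links_per_server = ai_gpus_per_server * ai_links_per_gpu + ai_other_links
ratio = ai_links_per_server / traditional_links_per_server
print(f"AI node: {ai_links_per_server} links vs {traditional_links_per_server} "
      f"on a traditional server ({ratio:.0f}x more)")
```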

Power management
As has been widely covered in the media, some AI factories can consume as much energy as a small city. Racks drawing 30–100 kW or more, at the top end roughly the power draw of 75–100 homes, represent a 10× increase in power density compared to traditional data centers. And the power requirements of next-generation racks are set to rise further.

Achieving sustainable power usage requires multi-layered strategies, including at the silicon level.

Designing chips that hit high power densities while staying within tight energy and cooling budgets makes power delivery not just a circuit issue but a packaging- and system-level problem. Every milliohm in the power path turns into heat that must be removed without throttling performance. At the same time, minimizing data movement on-chip helps ensure each joule delivers more productive AI tokens, reducing stress on both the data center power system and its cooling infrastructure.
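The “every milliohm” point is plain Ohm’s-law arithmetic: at the currents a modern accelerator rack draws, even a single milliohm of busbar or connector resistance dissipates kilowatts. The numbers below are assumptions for illustration, not a model of any particular rack design.

```python
# I^2 * R losses in the power-delivery path, with assumed numbers.

rack_power_w = 100_000        # 100 kW rack (assumed)
bus_voltage_v = 48            # 48 V DC distribution within the rack (assumed)
path_resistance_ohm = 0.001   # 1 milliohm of total busbar/connector resistance (assumed)

current_a = rack_power_w / bus_voltage_v          # roughly 2,100 A
loss_w = current_a ** 2 * path_resistance_ohm     # P = I^2 * R

print(f"Current: {current_a:,.0f} A")
print(f"Heat from 1 milliohm in the path: {loss_w:,.0f} W")
```

Under these assumptions a single milliohm wastes several kilowatts as heat, which is one reason higher-voltage distribution and shorter, fatter power paths are such active areas of rack design.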

Thermal management
As far as heat goes, the math is simple: more racks of AI servers means more heat. Traditional data center air cooling is insufficient, and once again, what happens at the silicon level has an outsized impact. Thermal loads are highly concentrated and continuous under AI training workloads, creating hot spots in both the silicon and the stack that require more aggressive heat spreading, advanced materials, and liquid cooling to avoid reliability loss and clock throttling.
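For a sense of the liquid-cooling scale involved, the coolant flow needed to carry away a rack’s heat follows directly from Q = ṁ · c_p · ΔT. The sketch below uses assumed values for the rack’s heat load and the allowed coolant temperature rise.

```python
# Coolant flow required to remove a rack's heat load: Q = m_dot * c_p * delta_T.
# Rack heat load and temperature rise are assumptions for illustration.

rack_heat_w = 100_000        # 100 kW of heat to remove (assumed)
cp_water = 4186              # specific heat of water, J/(kg*K)
delta_t_k = 10               # coolant temperature rise across the rack, K (assumed)

mass_flow_kg_s = rack_heat_w / (cp_water * delta_t_k)
litres_per_minute = mass_flow_kg_s * 60   # ~1 kg of water per litre

print(f"Required flow: {mass_flow_kg_s:.1f} kg/s (~{litres_per_minute:.0f} L/min)")
```

Roughly 2.4 kg of water per second for a single 100 kW rack, under these assumptions, is the kind of plumbing problem air cooling simply cannot solve.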

Enabling a new economy
The rise of AI factories signals a fundamental shift, redefining computing infrastructure as industrial-scale intelligence production. It demands innovation at every level — from the foundational silicon to the overarching supercluster. Integrated tools and IP that enable design, simulation, and verification across chips, packages, boards, racks, and entire campuses are essential for overcoming the interconnected challenges of AI at scale.

As investments and complexity continue to grow, those who master the end-to-end design and operation of AI factories will shape the future of global technology. These purpose-built facilities are set to become the backbone of a new economy, powering an era of pervasive intelligence and unlocking transformative opportunities across industries.

– Prith Banerjee is Senior Vice President of Innovation at Synopsys. He works closely with technical leaders, partners, and customers to drive advancements in EDA, simulation and analysis, and IP, while shaping the company’s long-term technology strategy and vision for engineering innovation.
