By Atul Rai, Co-founder & CEO, Staqu Technologies
Artificial intelligence has entered what appears to be a race of scale. Every few weeks, the industry sees the launch of another large language model with more parameters, more compute and marginally better benchmark scores. While these announcements generate excitement, it is worth examining whether building increasingly larger models is the most meaningful direction for AI progress.
Consider a mid-sized modern language model with roughly 30 billion parameters. Running such a model efficiently in production typically requires high-end GPU infrastructure such as the NVIDIA H100 or similar-class hardware. At 16-bit precision (2 bytes per parameter), the model weights alone occupy around 60 GB of memory, and once runtime buffers, inference frameworks and key-value caches are included, the practical requirement often reaches 70–90 GB.
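The memory figures above come from simple arithmetic. The sketch below is illustrative only. It assumes 16-bit weights (2 bytes per parameter) and a hypothetical architecture of 48 layers with 8 key-value heads of dimension 128 serving 16 sequences of 4,096 tokens each. Those layer, head and sequence numbers are placeholders and do not describe any particular model. Framework overhead and activation buffers are not counted.

```python
# Rough GPU memory estimate for serving a dense transformer model.
# Architecture numbers used below are hypothetical placeholders, not a real model.

def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes); 2 bytes/param assumes FP16/BF16."""
    return n_params * bytes_per_param / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                total_tokens: int, bytes_per_value: float = 2.0) -> float:
    """KV cache size: one key and one value vector per layer, per KV head, per cached token."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * total_tokens / 1e9


weights = weight_memory_gb(30e9)  # 30B parameters at 2 bytes each
# Hypothetical config: 48 layers, 8 KV heads of dim 128, 16 sequences x 4,096 tokens
kv = kv_cache_gb(n_layers=48, n_kv_heads=8, head_dim=128, total_tokens=16 * 4096)
print(f"weights ~ {weights:.1f} GB, KV cache ~ {kv:.1f} GB, total ~ {weights + kv:.1f} GB")
```

Under these assumptions the script prints about 60.0 GB of weights plus 12.9 GB of KV cache, roughly 72.9 GB before any framework overhead. The KV-cache term grows linearly with the number of concurrent tokens, which is why memory needs rise with the number of users served.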
The economics of operating such systems are revealing. A single high-end GPU instance in the cloud typically costs around $3–$4 per hour. If operated continuously (roughly 730 hours in an average month), that translates to approximately $2,200–$2,900 per month for a single GPU.
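The monthly figure is the hourly rate multiplied by the hours in a month. Here is a minimal sketch, assuming continuous operation over an average month of 24 × 365 / 12 = 730 hours and the $3–$4 hourly range quoted above:

```python
# Monthly cost of one continuously running cloud GPU instance.
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours in an average month


def monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running one instance for `hours` at a flat hourly rate."""
    return hourly_rate_usd * hours


for rate in (3.0, 4.0):
    print(f"${rate:.2f}/h -> ${monthly_cost(rate):,.0f}/month")
```

This prints $2,190 and $2,920 per month, so the range is roughly $2,200–$2,900. Reserved or spot pricing would change the hourly rate but not the structure of the calculation.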
Yet the throughput of these systems is not unlimited. A 30B-parameter model might generate roughly 100–120 tokens per second under typical deployment conditions. If an average user request involves around 300–400 tokens, including prompt and response, the system completes only about one request every 2.5–4 seconds, so it can realistically serve only around 10–20 concurrent users while maintaining acceptable latency.
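The throughput figures can be converted into a rough user estimate. In this sketch the quoted tokens-per-second figure is treated as the GPU's total capacity, and request batching is not modelled. It also introduces a hypothetical `seconds_between_requests` parameter, meaning how often each active user sends a request. That parameter is an illustrative assumption and does not come from the figures above.

```python
# Back-of-envelope serving capacity from aggregate token throughput.
# Assumes tokens_per_sec is the GPU's total output capacity (no batching modelled).

def seconds_per_request(tokens_per_sec: float, tokens_per_request: float) -> float:
    """Time the GPU spends on one request if requests are processed back to back."""
    return tokens_per_request / tokens_per_sec


def users_supported(tokens_per_sec: float, tokens_per_request: float,
                    seconds_between_requests: float) -> float:
    """Active users sustainable if each sends one request every `seconds_between_requests`."""
    return tokens_per_sec * seconds_between_requests / tokens_per_request


# Pessimistic and optimistic ends of the ranges quoted in the text.
for tps, tokens in ((100, 400), (120, 300)):
    t = seconds_per_request(tps, tokens)
    users = users_supported(tps, tokens, seconds_between_requests=60)  # hypothetical: 1 req/min
    print(f"{tps} tok/s, {tokens} tokens/request -> {t:.1f} s/request, ~{users:.0f} users")
```

If each user sends about one request per minute, the result is roughly 15–24 users, which falls in the same range as the estimate above. Users who send requests more often, or who need faster responses, reduce that number.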
This means infrastructure costing a few thousand dollars per month may support only one or two dozen simultaneous users. Scaling to thousands or millions of users therefore requires hundreds, or even tens of thousands, of GPUs.

Training frontier models is even more capital-intensive. Large-scale models are often trained on clusters consisting of thousands of GPUs running for weeks or months. The financial investment required for such training runs can reach tens of millions of dollars, placing the frontier of model development largely in the hands of a small number of well-funded organisations such as OpenAI, Google DeepMind and Anthropic.
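Both scaling claims reduce to GPU-hour arithmetic. The sketch below uses several hypothetical inputs: 20 users per GPU (the upper end of the estimate above), a flat $3 per GPU-hour, and an illustrative training run of 4,000 GPUs for 90 days. These numbers are not taken from any real deployment or training run. Large training runs often use owned or reserved hardware rather than on-demand rental, so the training figure shows only the order of magnitude.

```python
import math

HOURS_PER_MONTH = 730.0  # average month of continuous operation


def gpus_needed(target_users: int, users_per_gpu: int) -> int:
    """Whole GPUs required to serve `target_users` at a fixed per-GPU capacity."""
    return math.ceil(target_users / users_per_gpu)


def fleet_monthly_cost(n_gpus: int, hourly_rate_usd: float,
                       hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of running `n_gpus` continuously at a flat hourly rate."""
    return n_gpus * hourly_rate_usd * hours


def training_cost(n_gpus: int, days: float, hourly_rate_usd: float) -> float:
    """Cost of a training run priced as GPU-hours times an hourly rate."""
    return n_gpus * days * 24 * hourly_rate_usd


# Serving: hypothetical 20 users per GPU at $3/GPU-hour.
for users in (10_000, 1_000_000):
    n = gpus_needed(users, users_per_gpu=20)
    print(f"{users:>9,} users -> {n:>6,} GPUs, ~${fleet_monthly_cost(n, 3.0):,.0f}/month")

# Training: hypothetical 4,000 GPUs for 90 days at $3/GPU-hour.
print(f"training run: ~${training_cost(4_000, 90, 3.0):,.0f}")
```

Under these assumptions, 10,000 users need 500 GPUs (about $1.1 million per month), and one million users need 50,000 GPUs. The hypothetical training run costs about $25.9 million.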
Another important observation is that many modern language models share a similar architectural foundation. Most are derived from transformer-based designs introduced in 2017 and are trained on massive internet-scale datasets. While there are meaningful improvements in training techniques, alignment and multimodal capabilities, the fundamental approach across many models remains broadly similar.
This raises a broader strategic question for the industry. Is the future of artificial intelligence primarily about building larger models, or about building more useful systems? Many real-world AI deployments do not require frontier-scale language models. Enterprises often need solutions that integrate with domain-specific data, operate reliably in production environments and deliver measurable operational benefits. In many cases, smaller specialised models, multimodal analytics systems or hybrid architectures can solve these problems more efficiently.
The next stage of AI progress may therefore be defined less by parameter counts and more by practical deployment. Efficient models, domain intelligence and system-level integration could prove more important than simply scaling compute. Large language models represent a remarkable technological breakthrough. However, the long-term impact of artificial intelligence will likely depend not only on how large these models become, but on how widely and economically they can be deployed.
Ultimately, the question facing the industry is not just how to build bigger models, but how to build AI systems that are sustainable, accessible and genuinely useful at scale.