All we hear and see is “DeepSeek,” but is it worth the hype and the market’s knee-jerk reaction?
By Naushath Raja Mohammed, Associate Vice President – Platform Engineering & AI Center Of Excellence, Sutherland
From attention mechanisms to DeepSeek R1
Artificial Intelligence (AI) has come a long way in the past decade, and its transformation continues to shape industries across the globe. As a passionate AI leader, I find it exhilarating to witness how the evolution of AI technologies, from the 2017 introduction of “Attention is All You Need” to the recent release of DeepSeek R1 in 2025, is unlocking new potential, addressing constraints, and driving accessibility in ways we couldn’t have imagined a few years ago.
The dawn of a new era: “Attention is All You Need” (2017)
In 2017, a ground-breaking paper titled “Attention is All You Need” introduced the Transformer architecture, fundamentally changing how we process and understand data. Before this, AI models relied heavily on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to process sequential data. However, these models struggled with long-range dependencies, meaning they had difficulty understanding context over longer stretches of text or data.
The Transformer model solved this problem with a mechanism called attention, which allows the model to focus on the relevant parts of the input, regardless of their position, and to process them in parallel. This change unlocked a new dimension in AI’s ability to handle contextual data, making it easier for models to understand meaning, relationships, and even nuance.
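To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation the paper introduced, in plain NumPy. This is an illustrative toy, not a production implementation; real Transformers add learned query/key/value projections, multiple heads, and masking.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from "Attention is All You Need".

    Each row of Q attends to every row of K in one parallel matrix
    multiply, regardless of position -- the key advantage over RNNs,
    which must step through the sequence one token at a time.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

# Toy self-attention over 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```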
This concept became the foundation of many future breakthroughs, including OpenAI’s GPT series, which would go on to change the way we interact with data.
ChatGPT and the changing landscape of data interaction (2022)
Fast forward to 2022, when OpenAI introduced ChatGPT, an AI model built on the Transformer architecture. ChatGPT didn’t just improve how we understood data; it redefined the way we interact with it. This new form of AI, capable of generating human-like responses to queries, brought the conversational capabilities of machines to the forefront.
The implications were profound: businesses, educators, and individuals began to leverage AI for everything from automating customer service to providing instant learning assistance. By transforming how we interact with information, ChatGPT exemplified the incredible potential of AI in our daily lives.
However, while the tech was revolutionary, there were still significant challenges that needed to be addressed.
Constraints in the AI landscape: cost and accessibility
The success of models like GPT-3 and, later, GPT-4 proved AI’s power, but it also brought a major constraint into focus: cost. Training and operating these models requires huge amounts of computational power, which in turn demands expensive infrastructure, especially high-performance GPUs. The cost of running these models has been a barrier for many, with OpenAI’s pricing for GPT-4 running at approximately $60 per million output tokens.
At the same time, the availability of high-end GPUs was limited. Particularly in markets like China, where access to GPUs like the NVIDIA H100 was restricted, AI development faced significant roadblocks. Moreover, the reliance on proprietary models from tech giants left little room for smaller players to innovate.
DeepSeek R1: democratising AI and driving affordability (2025)
Enter DeepSeek R1 in 2025—a game-changing release that directly addresses these challenges. DeepSeek R1 emerged as an open-source alternative, offering significantly cheaper access to high-performance AI models. For instance, where OpenAI charges $60 per million output tokens, DeepSeek R1 slashes this cost to just $2.19 per million tokens—a 96.4% reduction in cost.
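That percentage is simple arithmetic, and it is worth sanity-checking; here is a quick Python check using the prices quoted above:

```python
# Per-million-output-token prices as quoted in this article
openai_price = 60.00    # USD, GPT-4
deepseek_price = 2.19   # USD, DeepSeek R1

reduction = (openai_price - deepseek_price) / openai_price
print(f"Cost reduction: {reduction:.1%}")   # roughly 96.4%
```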
What makes DeepSeek R1 truly transformative is not just its affordability but also its performance.
The model has been tested on various benchmarks, including AIME 2024, Codeforces, GPQA Diamond, MMLU, and SWE-bench Verified, and has performed on par with top models such as OpenAI’s o1 and o1-mini. DeepSeek R1’s performance is a testament to how effective AI can be when constraints are addressed, making powerful tools more accessible to the broader community.
For the full benchmark comparison, refer to the charts in the research paper “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.”
The vision of Liang Wenfeng and the DeepSeek team
The story behind DeepSeek’s success is equally inspiring. Liang Wenfeng, co-founder of the quantitative hedge fund High-Flyer, and his team encountered significant challenges during the development of their large language model (LLM). The primary issue? Limited access to high-end GPUs, especially in China, where many cutting-edge chips were unavailable. But instead of seeing this as an obstacle, Liang and his team turned it into an opportunity to innovate.
By leveraging more affordable GPUs like the NVIDIA H800, the DeepSeek team was able to achieve breakthrough results. While companies like OpenAI invest heavily in expensive H100 GPUs, DeepSeek’s ingenuity in working with the resources available allowed it to create an open-source LLM that performs just as well, at a fraction of the cost.
A game changer for the AI landscape
DeepSeek’s open-source release is not just a technical marvel—it represents a democratisation of AI. By significantly reducing costs and offering accessibility to powerful tools, DeepSeek R1 has opened the door for smaller companies, startups, and research organisations to leverage state-of-the-art AI without the massive infrastructure investment typically required. This is the future of AI: more inclusive, more accessible, and ultimately more impactful.
The impact of DeepSeek’s release is already being felt across the industry. Within a single day of launching, DeepSeek’s affordable approach reportedly wiped close to $1 trillion in market value off some of the largest US tech stocks. This shake-up reflects the growing sentiment that affordability and open access to high-quality AI can lead to a fundamental shift in how AI is used and developed.
The road ahead
As we look toward the future, it’s clear that AI will continue to evolve, disrupt industries, and reshape the way we interact with technology. AI Agents, which have emerged as part of this wave, are poised to revolutionise business processes, from automating customer service to making complex decisions based on data analysis.
DeepSeek R1 represents a pivotal moment in this journey. It has reduced the financial barriers to entry for AI, enabling countless organisations to experiment, innovate, and scale. The release of this model is not just a step forward for AI technology but a leap toward making AI truly accessible to everyone.
Let me explain the critical difference between DeepSeek and ChatGPT using an analogy from cricket (my favourite game).
Imagine GPT models as a super-experienced cricket coach. This coach has watched thousands of matches and can give you great advice based on what they’ve seen. But if you ask the coach how to play a tricky shot like Dhoni’s helicopter shot or Rishabh’s sweep scoop, something unique to Dhoni and Rishabh that only they truly understand, the GPT coach might struggle.
That’s because the coach only knows what they’ve observed, played, and experienced during their past games and training, not something entirely new and creative.
Now, I picture DeepSeek as a young cricketer. This cricketer practises a lot, but more importantly, they think through challenges step by step. They don’t just rely on what others have taught them; they experiment, fail, and learn from those mistakes. For example, they’ll try to mimic Dhoni’s helicopter shot, fail, and then keep trying, learning with each attempt. This process of reinforcement learning continues until they master the shot. So, if you ask DeepSeek about a difficult shot like Dhoni’s helicopter or Rishabh’s sweep scoop, instead of giving an “I don’t know” or relying only on past experience, it will figure out the best approach by analysing the situation, even if it’s something completely new.
DeepSeek won’t just give you an old-school answer. It will break down the problem, figure out the best way to play the ball, and come up with its own strategy, thinking about the physics of the shot and how to handle the challenge, whether it’s something it has seen before or not.
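In machine-learning terms, that cricketer’s loop is reinforcement learning: try an action, observe a reward, and update your behaviour. Below is a deliberately tiny, hypothetical sketch of that loop in Python (a simple bandit-style learner; the shot names and payoffs are made up for illustration and are not DeepSeek’s actual training code):

```python
import random

# Toy reinforcement-learning loop in the spirit of the cricket analogy.
# All names and numbers here are illustrative, not DeepSeek's real training.

actions = ["defend", "drive", "helicopter_shot"]
value = {a: 0.0 for a in actions}   # running estimate of each shot's payoff
counts = {a: 0 for a in actions}

def reward(action):
    # Hypothetical environment: the tricky shot pays off most, but only
    # sometimes; the learner discovers this only by experimenting.
    if action == "helicopter_shot":
        return 6 if random.random() < 0.4 else 0   # expected payoff 2.4
    return 1 if random.random() < 0.8 else 0       # expected payoff 0.8

for step in range(10_000):
    # Epsilon-greedy: mostly play the best-known shot, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(actions)
    else:
        action = max(actions, key=value.get)
    r = reward(action)
    counts[action] += 1
    value[action] += (r - value[action]) / counts[action]  # incremental mean

print(value)  # the learner converges on the helicopter shot's higher payoff
```

Real reinforcement learning for LLMs, such as the approach described in the DeepSeek-R1 paper, is vastly more sophisticated, but the core idea of learning from trial, feedback, and correction is the same.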
This helps explain why DeepSeek requires far less brute-force training compute, reducing its dependency on GPU resources and on the high-powered hardware traditionally used in AI training. This shift in efficiency may be one of the reasons why giant chip and GPU manufacturers are facing pressure in the market.
The revolution is just beginning—and I, for one, can’t wait to see where it goes next.
What can we learn from this article?
When faced with challenges, turn them into opportunities.
Disruptions are inevitable. It took five years from the Transformer paper for ChatGPT to emerge, and DeepSeek arrived just over two years after that. There will always be another disruptor on the horizon.
Let’s seize this moment and build an AI-first culture in our organisation. Together, we can use AI to make human life smarter, better, and more empowered.
Until we “AI” again,
Naushath