I spent about two years working in AI training and use AI just about every day in my current line of work (although all of my Substack articles are human-written, with the occasional assist from AI in copyediting and grammar). I don’t profess to be an AI expert on the ML and science side of things, since most of my work was in writing, editing, and project management, but I have been staying on top of what’s happening with DeepSeek.
You may already be tired of hearing about it, or maybe you have no idea what’s going on and why it could be an important event. Either way, I will attempt to distill my high-level view of the situation, linking to the articles that shaped these thoughts in case you want to do your own digging and form your own conclusions.
With that being said, let’s dive in.
Cutting Through the Noise
The common belief is that DeepSeek’s “sudden rise” caused Nvidia, and the stock market as a whole, to tank earlier this week. Understanding AI posits that the drop was more likely tied to a tip that Trump would announce tariffs on chips made in Taiwan. That theory, which seems plausible given the timing, rests on the fact that the market did not drop further after the actual announcement, meaning the news was already “priced in.” Also, the most recent DeepSeek model was released about a week before the drop occurred.
AI Proem dispels the myth that DeepSeek was built for $6 million, one of the main claims fueling the current uproar and panic. Two quotes follow: one on the presumed actual costs, and another indicating that this is still an impressive feat:
“The technical paper published by DeepSeek says that the $6 million does not include “costs associated with prior research and ablation experiments on architectures, algorithms, and data.” The $6 million covers ONLY the final training run on Nvidia’s H800 chips. At the same time, total cost, including prior research, has been estimated to come to “hundreds of millions” based on GPU infrastructure cost.”
“Why it matters: Let’s say the total cost is ~$500 million. Even at that, DeepSeek’s spending is still a fraction of what OpenAI has spent. OpenAI has raised a total of $16.6 billion in funding to date and has burned through most of that on the training and inference of all its models.”
Then there’s the news today, which really isn’t news: OpenAI came out and claimed that DeepSeek was built by tapping into OpenAI’s API, meaning DeepSeek used the data OpenAI paid billions to curate and essentially piggybacked off that work to build its own model.
This isn’t surprising, because DeepSeek has been observed responding to questions by saying that it is ChatGPT.
It also helps explain how DeepSeek was able to make inroads among the top frontier models so quickly.
It’s also a bit ironic when you consider the controversy around OpenAI (and other AI labs) pilfering data to train their own models in the first place.
The implications seem to run deep in terms of original content, copyright infringement, and the political side of things discussed below.
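To make the “tapping into the API” claim a bit more concrete, here is a minimal, purely illustrative sketch of what distillation through an API can look like, assuming the standard openai Python client. The prompts, the teacher model name, and the file name are all hypothetical stand-ins, and this is emphatically not a description of DeepSeek’s actual pipeline:

```python
# Hypothetical sketch of "distillation via API": query a strong teacher model,
# save its answers, and use the prompt/response pairs as training data for a
# separate student model. Purely illustrative; not DeepSeek's actual method.
import json
from openai import OpenAI  # assumes the `openai` package and an API key are available

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain the difference between supervised and reinforcement learning.",
    "Write a Python function that checks whether a string is a palindrome.",
]

training_pairs = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice of "teacher" model
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content
    training_pairs.append({"prompt": prompt, "completion": answer})

# The collected pairs would then be used to fine-tune the student model.
with open("distillation_data.jsonl", "w") as f:
    for pair in training_pairs:
        f.write(json.dumps(pair) + "\n")
```

The point is simply that the expensive part, generating and curating high-quality answers, can be outsourced to someone else’s model, which is why the “milk for free” quote further down resonates.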
Political Implications
Trump has already said the DeepSeek revelation is a “wake up call for America.”
The recent whirlwind episode around TikTok centered on America’s concern over China’s access to American data, so it naturally sparked even more intense conversations about the same concerns with DeepSeek.
With all of the concerns about export controls and tariffs, this is a good point from Gavin Baker, CIO of Atreides Management, LP: “kinda funny to restrict access to leading edge GPUs and not do anything about China’s ability to distill leading edge American models - obviously defeats the purpose of the export restrictions. Why buy the cow when you can get the milk for free?”
With the recent announcement of Stargate, in which a reported $100 billion (potentially growing to $500 billion) will be invested in AI infrastructure, all of this points toward revisiting how these policies should be approached: how to ensure US national security, how to protect this monstrous investment, and what it could mean for an AI race that is really just beginning.
Noah Smith has written a few times about Cold War 2, and situations like this seem to indicate that we are on the cusp of that proxy conflict, or already in the midst of it.
The implication is that there is reason to believe the hype around DeepSeek may have been at least partially manufactured by misinformation from China, something it is no stranger to. How you interpret that will differ depending on who you’re talking to (or which government you’re a part of).
This is something we will likely see a lot more of in the coming years as AI becomes more and more an arbiter of information (for example, China censoring content such as Tiananmen Square on DeepSeek), and not just from China, but perhaps from all sides. We’ve obviously already been living in a world where misinformation runs rampant. Given the stakes involved, both in controlling the information that gets presented and in gathering the information needed to build AI, this could be the beginning of an information war.
I do think the political implications may end up being much deeper here than the actual model itself.
DeepSeek in Action
That’s because, by many accounts, DeepSeek actually is not as good as other frontier LLMs in most areas.
I haven’t played around with it personally yet, but a few people I have spoken with, and others I have read, indicate that it is very strong at coding and math tasks but quite weak at writing and reasoning tasks (your mileage, and your definition of those terms, may vary). This is not to discount the power of building a model that’s strong in those areas, but it does seem to undercut some of the hype around DeepSeek.
There has always been a concern that AI labs game the benchmark tests that determine which models are best in which categories; it’s easier to pass a test when you have the answers beforehand and know exactly what to study. There have been efforts to improve this, but with so many investment dollars on the line, there is pressure to keep the narrative aligned with success and progress, so always take these claims with a grain of salt.
But DeepSeek has also applied real ingenuity to get here, which I think will be duplicated by others, just as DeepSeek duplicated and built on others’ work itself. These techniques include self-learning, which may cut out the money spent on the human annotators who help label data for training AI, and methods that let models generate output using less energy and computing power. Given the concerns about the power consumption required to run AI models, these could be positive developments.
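For what it’s worth, here is a toy sketch of the self-learning idea, under the assumption that the problems have answers a program can check automatically. The generate_answer stub, the arithmetic problems, and the verifier are all hypothetical stand-ins for a real model and a real reward signal; the idea is simply that verified model outputs, rather than human labels, become the training data:

```python
import random

# Toy illustration of "self-learning" without human annotators: the model
# produces candidate answers to automatically checkable problems (here, simple
# arithmetic), and only verified answers are kept as new training data.

def generate_answer(question: str) -> str:
    """Stand-in for a model's answer; a real system would call an LLM here."""
    a, b = [int(x) for x in question.split(" + ")]
    return str(a + b + random.choice([0, 0, 0, 1]))  # sometimes wrong, on purpose

def check_answer(question: str, answer: str) -> bool:
    """Automatic verifier -- the piece that replaces human annotators."""
    a, b = [int(x) for x in question.split(" + ")]
    return answer == str(a + b)

problems = [f"{random.randint(1, 99)} + {random.randint(1, 99)}" for _ in range(20)]

self_labeled_data = []
for q in problems:
    ans = generate_answer(q)
    if check_answer(q, ans):  # keep only the answers that pass verification
        self_labeled_data.append({"question": q, "answer": ans})

print(f"Kept {len(self_labeled_data)} of {len(problems)} self-generated examples")
# In a real pipeline, these verified examples would feed back into further training.
```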
Bottom Line
We are early in the AI race, and what comes next, along with the overall implications, is still unclear. I do feel these are things we should all pay attention to, both to understand a potential new technological revolution unfolding before our eyes and to figure out how we can play a role in it ourselves rather than get left behind.
As people have said to me recently, “it’s not AI that will necessarily take your job, it’s people who know how to use AI effectively who will.” Take that with a grain of salt depending on your line of work, but don’t wait too long to heed the advice.
Key takeaway: Every AI model is different. A model like DeepSeek may be perfect for your particular workflow, or another one might suit you better; no model is superior to the others in every area. Test them out and find out which ones are most effective for you.
I don’t pretend to know everything, or maybe even that much about the situation, so if you have some other insights or thoughts, please share them. Thanks for reading!