2025-01-30 DeepSeek
Just a quick heads-up: as early as Monday afternoon, the DeepSeek-R1 reasoning model will be available to all of our agent technology licensees.
DeepSeek trained its model on a cluster of 2,048 H800 GPUs, leveraging Nvidia's Hopper architecture. These GPUs are designed to excel at tasks like matrix multiplications and sparse attention, particularly when using FP8 mixed precision. While the H800 has lower interconnect bandwidth compared to the H100 due to export regulations, its computational efficiency remains strong for the kinds of workloads needed in large-scale AI training.
Using their stated figures, let’s verify whether the throughput aligns with the claimed GPU-hour budget.
Throughput per GPU-hour:
\(\frac{14.8 \, \text{trillion tokens}}{2.664 \, \text{million GPU-hours}} = 5.56 \, \text{million tokens per GPU-hour}.\)
Cluster throughput:
\(2,048 \, \text{GPUs} \times 5.56 \, \text{million tokens per GPU-hour} = 11.38 \, \text{billion tokens per hour}.\)
Implied wall-clock time:
\(\frac{14.8 \, \text{trillion tokens}}{11.38 \, \text{billion tokens per hour}} \approx 1,300 \, \text{hours (about 54 days)}.\)
This aligns with their claim of completing pretraining in less than two months.
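The arithmetic above is easy to reproduce. A minimal sketch in Python, using only the figures stated in the report (the rounding of intermediate values is mine):

```python
# Sanity-check DeepSeek-V3's reported pretraining budget.
# Reported figures: 14.8T tokens, 2.664M H800 GPU-hours, 2,048 GPUs.
tokens = 14.8e12
gpu_hours = 2.664e6
gpus = 2048

tokens_per_gpu_hour = tokens / gpu_hours       # ~5.56 million
tokens_per_hour = gpus * tokens_per_gpu_hour   # ~11.38 billion
wall_clock_hours = tokens / tokens_per_hour    # ~1,300
wall_clock_days = wall_clock_hours / 24        # ~54

print(f"{tokens_per_gpu_hour / 1e6:.2f}M tokens per GPU-hour")
print(f"{wall_clock_days:.0f} days of wall-clock time")
```

Note that the wall-clock figure is just `gpu_hours / gpus`; routing the calculation through per-GPU throughput simply makes each intermediate claim checkable on its own.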
DeepSeek-V3’s claim of processing 14.8T tokens in 2.664M GPU-hours is plausible. The numbers are internally consistent, and the described techniques align with established principles of efficient large-scale training. While reproduction by other labs will provide final confirmation, the absence of red flags suggests that DeepSeek's reported achievements are feasible.
For more details on DeepSeek-V3 and its training methodology, refer to the technical report on arXiv.
Qualifications: Through my work on the LIT platform, I’ve developed tools that enable data scientists to efficiently design, train, and deploy deep learning models, including advanced workflows for LLMs. Before that, I spent 8 years providing professional services in deep learning, building custom AI solutions across diverse industries. In support of that work, I’ve read and analyzed hundreds of technical reports and academic papers. My expertise lies in building tooling, pipelines, and integrations for both predictive and generative AI, supported by a strong foundation in deep learning and software engineering.