DeepSeek unveils new AI reasoning method

Chinese artificial intelligence (AI) start-up DeepSeek has introduced a novel approach to improving the reasoning capabilities of large language models (LLMs), as the public awaits the release of the company’s next-generation model. In collaboration with researchers from Tsinghua University, DeepSeek developed a technique that combines two methods: generative reward modeling (GRM) and self-principled critique tuning. Reward modeling is the process of guiding an LLM towards human preferences; the dual approach aims to enable LLMs to deliver better and faster answers to general queries. The resulting DeepSeek-GRM models outperformed existing methods and “achieved competitive performance” with strong public reward models, the researchers wrote.
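
The paper describes the approach only at a high level, but the inference-time flow can be sketched roughly as follows. The Python snippet below is a minimal illustration of the idea, not DeepSeek’s implementation: the `llm` callable stands in for any text-in, text-out model API, and the prompts, scoring format and `score_responses` helper are invented for the example.

    import re
    from typing import Callable

    def score_responses(
        llm: Callable[[str], str],
        query: str,
        responses: list[str],
        samples: int = 4,
    ) -> list[float]:
        """Sketch of a generative reward model with self-generated principles.

        For each of `samples` rounds the model is asked to (1) write grading
        principles for the query, then (2) critique each candidate response
        against those principles and end with a numeric score. Averaging
        scores across rounds reflects the paper's broader idea that judging
        quality can be improved with extra inference-time compute.
        """
        totals = [0.0] * len(responses)
        for _ in range(samples):
            # Step 1: the model proposes its own judging principles
            # (the "self-principled" part of the technique).
            principles = llm(
                f"Write concise principles for judging answers to:\n{query}"
            )
            # Step 2: a generated critique, not a bare scalar, produces
            # the reward (the "generative" part of reward modeling).
            for i, resp in enumerate(responses):
                critique = llm(
                    f"Principles:\n{principles}\n\nQuery: {query}\n"
                    f"Answer: {resp}\n"
                    "Critique the answer against the principles, then end "
                    "with a line 'Score: <1-10>'."
                )
                match = re.search(r"Score:\s*(\d+)", critique)
                totals[i] += int(match.group(1)) if match else 0.0
        return [t / samples for t in totals]

In use, the highest-scoring candidate would be returned to the user, so the model effectively grades its own options against principles it wrote for that specific query.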

DeepSeek intends to make the GRM models open source, the researchers said, although they gave no timeline. The paper, published on the online scientific repository arXiv, comes amid speculation about the start-up’s next move following the global attention garnered by its V3 foundation model and R1 reasoning model. Reuters reported last month that DeepSeek-R2, the successor to R1, could be released as soon as this month, as the company rushes to capitalize on its rising profile. The release of R1 rocked the global tech community with cost-efficient performance that rivaled leading models. DeepSeek has remained tight-lipped about the rumored R2 release: it has not commented through official public channels, but a customer service account denied the report in a group chat with business clients, Chinese media outlets reported last month.

While Hangzhou-based DeepSeek, founded in 2023 by entrepreneur Liang Wenfeng, has been in the global spotlight in recent months, the company has largely avoided public communication, preferring to focus its energy on research and development (R&D). Last month it upgraded its V3 model with the release of DeepSeek-V3-0324, which it said offered “enhanced reasoning capabilities, optimized front-end web development and upgraded Chinese writing proficiency”. In February, it also open-sourced five of its code repositories, allowing developers to review and contribute to its software, with the start-up promising to make “sincere progress with full transparency”. That same month, Liang published a technical study on “native sparse attention”, a method for making LLMs more efficient at processing large amounts of data. Liang, 40, is also the founder of DeepSeek’s parent company, High-Flyer Quant, the hedge fund whose deep pockets have funded the start-up’s technical advances, the South China Morning Post reports.
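
The general idea behind sparse attention is simple to picture: instead of every token attending to every other token, each token attends only to a subset, cutting the quadratic cost of standard attention. The toy NumPy sketch below illustrates this with a causal sliding window; it is a generic textbook illustration, not the specific “native sparse attention” design from Liang’s paper, whose sparsity patterns are more sophisticated.

    import numpy as np

    def sliding_window_attention(q, k, v, window: int = 64):
        """Toy windowed (sparse) attention over float arrays of shape (n, d).

        Each query position attends only to the `window` preceding keys,
        so cost grows linearly with sequence length rather than
        quadratically, at the price of ignoring long-range pairs.
        """
        n, d = q.shape
        out = np.zeros_like(v)
        for i in range(n):
            lo = max(0, i - window)
            # Scaled dot-product scores over the causal local window only.
            scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[i] = weights @ v[lo:i + 1]
        return out
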