The Beijing government has unveiled a new draft policy supporting the city’s artificial intelligence (AI) industry, which includes the provision of state-funded computing power to relevant companies, as the country sees renewed interest in the technology amid intensified rivalry with the U.S. The policy, aimed at “grabbing and seizing” opportunities in large language models (LLMs) and the field of artificial general intelligence (AGI), tasks the city with fostering an innovative ecosystem to bolster its leading position in AI. Besides LLMs and AGI, the policy highlights three other areas: computing power, training data, and regulations.
Beijing’s move to coordinate computing resources, one of the most critical inputs for training AI models, comes as the U.S. implements export restrictions on high-end chips, including California-based Nvidia’s top-end A100 graphics processing unit, which is used for training and deploying large-scale AI models. The draft policy calls for top-tier public cloud service providers to collaborate and pool their computing power for use by Beijing-based tertiary institutions, research facilities, and small- and medium-sized enterprises (SMEs). Public cloud refers to any technology platform that lets customers access shared storage and computing power via the internet.
Under the policy, the development of computing power projects will be accelerated to support the training of LLMs with hundreds of billions of parameters. These include the Beijing AI Public Computing Platform in Haidian district and the Beijing Digital Economy Computing Center in Chaoyang district. Parameters are the variables adjusted as an AI model is trained; in general, the more parameters a model has, the more capable it becomes. GPT-3, the model originally underpinning Microsoft-backed OpenAI’s ChatGPT, has around 175 billion parameters, while Google’s competing service Bard was trained with 137 billion parameters.
However, the Beijing government, in its draft policy, also acknowledged the challenge posed by the lack of high-quality Chinese-language materials that can be used as training data. In light of these difficulties, Beijing will look to combine existing open-source pre-training data sets with higher-quality Chinese text found on the internet, correcting corrupted or inaccurate records before putting them to use, a process it calls “cleansing”. Cleaned data that complies with Chinese regulations, including text, images, audio and video, will be opened for public use through Beijing’s International Big Data Exchange. But China’s censorship, which limits the amount of data available for training AI models, could impede its technological ambitions, analysts have warned, as reported by the South China Morning Post.