Cheetah Mobile's Fu Sheng: Data is the real barrier in large-model competition
我放心你带套猛
Posted yesterday at 23:50
21st Century Business Herald reporter Bai Yang reports from Beijing
In the fierce competition over large AI models, computing power and algorithm optimization have long been the main focus of major companies. As the technology matures, however, the industry's attention is shifting from pure model training and compute investment to how to process and use massive volumes of high-quality data.
In fact, data has become the decisive factor in whether large models can be deployed successfully. On November 27th, Fu Sheng, Chairman and CEO of Cheetah Mobile, said plainly in an interview with 21st Century Business Herald that "algorithms and computing power are not the core competitiveness of large models; the real barrier is data."
Fu Sheng noted that most large-model companies do not differ significantly in their algorithms. Chips and algorithms remain crucial, but the gap between companies in those areas is not as deep as the gap in data. If the data falls short in quality or quantity, no advantage in algorithms or computing power can be fully realized.
The training of large models relies on large amounts of labeled data, which directly determines how the model actually performs. Fu Sheng likened a model to a growing child: only by receiving correct information can it learn correctly.
Data faces dual challenges of quality and quantity
When it comes to acquiring and using data, however, large-model development faces many challenges.
First, the real-world data available for training large models is running out. In a paper on the scaling problem, DeepMind concluded that to fully train a model, the number of training tokens needs to be about 20 times the number of model parameters.
Among closed-source models, GPT is currently known to have the largest number of training tokens, at approximately 20T; among open-source models, LLaMA3 has the largest, at about 15T. By this reckoning, a 500-billion-parameter dense model would need about 107T training tokens to achieve the same training effect, far more than the amount of data currently available in the industry.
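The article does not show the arithmetic behind the 107T figure, and the 20x rule on its own gives a much smaller number for a 500-billion-parameter model. The sketch below works through both readings. It assumes, and this is not stated in the source, that the 15T-token LLaMA3 figure refers to the 70-billion-parameter model; under that assumption, matching LLaMA3's tokens-per-parameter ratio reproduces roughly 107T.

```python
# Back-of-the-envelope check of the token figures quoted in the article.
# The 20 tokens-per-parameter rule and the 500B / 15T / 107T numbers come
# from the article; the 70B LLaMA3 size is an ASSUMPTION made here.

TOKENS_PER_PARAM_RULE = 20  # rule of thumb attributed to DeepMind in the article


def tokens_needed(params: float, tokens_per_param: float) -> float:
    """Training tokens implied by a given tokens-per-parameter ratio."""
    return params * tokens_per_param


dense_params = 500e9  # 500-billion-parameter dense model (from the article)

# Reading 1: apply the 20x rule directly.
rule_tokens = tokens_needed(dense_params, TOKENS_PER_PARAM_RULE)

# Reading 2: match the tokens-per-parameter ratio of LLaMA3.
llama3_tokens = 15e12  # about 15T tokens (from the article)
llama3_params = 70e9   # ASSUMED model size, not given in the article
llama3_ratio = llama3_tokens / llama3_params
matched_tokens = tokens_needed(dense_params, llama3_ratio)

print(f"20x rule:            {rule_tokens / 1e12:.0f}T tokens")
print(f"LLaMA3-matched ratio: {matched_tokens / 1e12:.0f}T tokens "
      f"({llama3_ratio:.0f} tokens per parameter)")
```

Under these assumptions the 20x rule gives about 10T tokens, while matching the LLaMA3 ratio gives about 107T, which suggests the article's "same training effect" refers to the latter.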
As a result, using synthetic data has become the consensus approach for large models. According to forecasts, large models will have used up all natural data by 2026, and by 2030 artificial intelligence will use more synthetic data than real data.
Fu Sheng, however, believes that training large models directly on synthetic data carries huge risks. Synthetic data contains inherent systematic biases; if it is used for training as-is, the model may treat those biases as normal, and in the long run its cognition may develop fatal flaws.
Synthetic data therefore needs further processing, such as manual tuning or augmentation with other data, to improve its quality.
For real data, the most significant problem is low utilization. Many companies have plenty of data, yet the large models they train still perform poorly because the quality of that data is not high enough.
Exploring business opportunities in data services
Cheetah Mobile sees a business opportunity here, and its controlled subsidiary OrionStar has launched a new data service product, AI Ready Data Service (AirDS).
AirDS covers data collection, cleaning, annotation, prompt engineering, and evaluation. Fu Sheng said that because Cheetah Mobile also trains large models itself, it understands them more deeply than traditional data annotation companies do and is better able to meet enterprises' data needs.
It should be noted that data services today still rely on manual labor. In the era of large models, tools can improve efficiency in steps such as data filtering and cleaning, but obtaining high-quality data still requires careful manual annotation.
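As a rough illustration of the kind of tool-assisted filtering and cleaning described here, the sketch below drops very short records and exact duplicates before handing the remainder to human annotators. It is a minimal example under stated assumptions, not a description of how AirDS works: the function names, the length threshold, and the sample records are all hypothetical.

```python
# Minimal sketch of automated cleaning ahead of manual annotation.
# All names, thresholds, and sample texts are hypothetical illustrations.


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return " ".join(text.split()).lower()


def clean_and_filter(records: list[str], min_chars: int = 20) -> list[str]:
    """Keep the first copy of each sufficiently long, non-duplicate record."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in records:
        norm = normalize(text)
        if len(norm) < min_chars:
            continue  # too short to be useful training text
        if norm in seen:
            continue  # exact duplicate after normalization
        seen.add(norm)
        kept.append(text)
    return kept


# Made-up sample records, for illustration only.
sample = [
    "The battery pack supports fast charging.",
    "the  battery pack supports fast charging.",
    "ok",
    "Users report dropped calls in the subway.",
]

annotation_queue = clean_and_filter(sample)
print(len(annotation_queue), "records left for manual annotation")
```

Automated steps like these only reduce the volume of obviously unusable data; judging whether the surviving records are correctly labeled remains a manual task, as the article notes.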
Fu Sheng said that in the era of large models, Cheetah Mobile's core business model is not to make money from model interfaces but to create value by helping customers put AI applications into practice.
At the heart of this business model is digging deeply into the application scenarios of large models. With AirDS, for example, Cheetah Mobile offers enterprise customers an end-to-end service that runs from data cleaning through labeling to application optimization. This greatly improves the effectiveness of enterprises' AI applications and also opens up substantial commercial opportunity for Cheetah Mobile.
So far, AirDS has delivered successful cases across many industries, including mobile communications, Internet entertainment, and new energy vehicles.
On the future of large models, Fu Sheng believes that although technological bottlenecks have slowed the pace of model iteration, the depth and breadth of application scenarios keep expanding. Particularly in vertical fields such as search and enterprise services, AI is expected to bring revolutionary change as data quality and application capabilities improve.
"Next year will be a year of great prosperity for applications," Fu Sheng predicted. "The capabilities of large models have become relatively stable, and the next stage of competition will depend more on how to apply large models in specific scenarios. As long as the scenarios are clear enough, their explosive power will be very strong."
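CandyLake.com is an information publishing platform and provides only information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.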