Robin Lee dispels the "benchmark score" illusion around large models: leaderboards aren't everything, and the gap between models may widen in the future
Posted by 丽颜美容院郧 on 2024-9-11 19:00:44
Whenever a new version of a large model is released, the industry eagerly cites third-party leaderboard data, "running scores" of its own model side by side with GPT-4 and claiming to have surpassed it on certain metrics in order to show off its strength in large-model technology.
But in a recent internal exchange with employees, Baidu Chairman Robin Lee punctured this thin veil covering the large-model industry. "Every time a new model is released, people compare it with GPT-4o and say their scores are already close to it, even exceeding it on some individual items, but that does not mean there is no gap with the most advanced model."
He further explained that the differences between models are multidimensional. One dimension is the gap in basic abilities such as comprehension, generation, logical reasoning, and memory; another is cost. Even if some models can achieve the same results, their high cost and slow inference mean they are in fact not as good as the advanced models.
"Another issue is overfitting to the test set. Every model that wants to prove its ability goes to the leaderboards, and to climb them it has to guess what others are testing and which techniques will get which questions right. So judging from the leaderboards or test sets, you may think the abilities are very close, but in practical applications there is still a significant gap," Robin Lee said.
A large-model practitioner told reporters that the overfitting Robin Lee mentioned mainly refers to a model learning its training data too closely during training, so that it performs very well on the training data but poorly on test data it has never seen. This usually means the model is too complex, to the point that it "memorizes" noise and details in the training data that do not generalize, so it cannot generalize well to new data.
The same practitioner believes that rankings and scores do have limitations. For example, because evaluation datasets are public, models can be trained specifically to improve their rankings, which leads to "leaderboard gaming." Even so, rankings are not completely meaningless. They still provide a quantitative evaluation standard, help people quickly understand how different large models perform, drive continuous improvement of large-model technology through competition, and carry a certain promotional and advertising value.
In Robin Lee's view, "The hype from some self-media outlets, combined with the urge to promote each new model at release, gives people the impression that the capability differences between models are already fairly small, but in fact that is not the case." Robin Lee said that in practice, Baidu does not allow its engineers to chase leaderboard rankings. The real measure of a large model's ability is whether, in specific application scenarios, it can meet user needs and generate real value.
As for the phrase often heard in the large-model industry, "leading by 12 months or trailing by 18 months," he doesn't think it matters much either, because every company operates in a fully competitive market and faces many competitors in whatever direction it pursues. "If you can always keep a 12-18 month lead over your competitors, you are invincible. Don't think 12-18 months is a short time. Even if you can keep a six-month lead over your competitors, you have won. Your market share may be 70%, while your competitors have only 20% or even 10%."
He expects that the gap between large models may keep widening in the future. Because the ceiling for large models is very high and the technology is still far from ideal, models must be iterated, updated, and upgraded rapidly and continuously, which requires sustained investment over several years or even decades to meet user needs, cut costs, and improve efficiency.
Beyond the question of whether the large-model race has barriers to entry, Robin Lee also said there are quite a few misconceptions about large models outside the industry, on topics including the efficiency of open-source versus closed-source models and AI agents.
Robin Lee is a firm supporter of closed-source models. "Before the era of large models, people were used to open source meaning free and low cost," he explained. For example, open-source Linux is free to use because people already have computers. But this no longer holds in the era of large models. Large-model inference is expensive, and open-source models do not come with computing power: you have to buy your own hardware, which makes efficient use of computing power impossible.
"Open-source models are not efficient," he said. "To be precise, closed-source models should be called commercial models: countless users share the R&D costs as well as the machine resources and GPUs used for inference, so GPU utilization is the highest. Baidu's Wenxin (ERNIE) large models 3.5 and 4.0 have GPU utilization rates above 90%."
Robin Lee argued that open-source models are valuable for teaching and scientific research, but in business, where the goals are efficiency, effectiveness, and the lowest possible cost, open-source models have no advantage.
He also shared his view on how large-model applications will evolve. Copilots came first, assisting humans; next come agents, which have a degree of autonomy and can use tools, reflect, and evolve on their own; if this level of automation keeps developing, agents will become AI workers capable of independently completing all kinds of tasks.
Agents are currently drawing more and more attention from large-model companies and customers. Robin Lee believes that although many people are optimistic about this direction, agents have not yet become an industry consensus.
"The barrier to entry for agents is indeed very low," he said. Many people don't know how to turn large models into applications, and building an agent on top of a model is a very direct, efficient, and simple way to do so, which makes it quite convenient.
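CandyLake.com is an information publishing platform and provides only information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.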
You may also like
- Hesai Technology has been awarded the exclusive spot for lidar on the next-generation model platform of Zero Run Automotive
- The delivery volume of Jike 7X model has exceeded 10000
- Meta releases new AI model: capable of self-checking and reviewing the work of other AI models
- Huaqiangbei merchants: all iPhone 16 models now selling below launch price
- Google launches new paid feature: using search results to help solve AI hallucination problems
- Xiaopeng Motors launches chip-upgrade crowdfunding for different car models: if it succeeds, development starts immediately; if it fails, payments are refunded
- The delivery volume of Jike 7X model exceeds 20000
- Robin Lee says large-model hallucination has basically been eliminated: does ERNIE Bot hold up in hands-on testing?
- AI Weekly | Yang Zhilin claims that Kimi has over 36 million monthly active users; Robin Lee: large-model hallucination has basically been eliminated
- The delivery volume of Jike 7X model exceeds 25000 units