Robin Lee punctures the benchmark-score illusion around large models: leaderboards are not everything, and the gap between models will widen in the future
丽颜美容院郧
Posted on 2024-9-11 19:00:44
Whenever a new version of a large model is released, the industry is always keen to cite third-party leaderboard data, running its own model against GPT-4 on benchmark scores and claiming to have surpassed it on certain metrics in order to demonstrate its large-model expertise.
But in a recent internal exchange with employees, Baidu Chairman Robin Lee punctured this open secret of the large model industry. "Every time a new model is released, it has to be compared with GPT-4o, saying that my score is already close to it and even exceeds it on some individual items, but this does not mean that there is no gap with the most advanced model," he said.
He further explained that the differences between models are multidimensional. One dimension is the gap in basic abilities such as comprehension, generation, logical reasoning, and memory. Another dimension is cost: some models can achieve the same results, but their high cost and slow inference speed mean they still fall short of the most advanced models.
"Another issue is overfitting to the test set. Every model that wants to prove its ability will go for the leaderboard, and to climb the leaderboard it has to guess what others are testing and which techniques will get which questions right. So from the leaderboard or the test set, you may think that the abilities are very close, but there is still a significant gap in practical applications," Robin Lee said.
A large model practitioner told the reporter that the overfitting Robin Lee mentioned mainly refers to a model learning its training data too closely during training, so that it performs very well on the training data but poorly on test data it has never seen before. This usually means the model is complex enough to "memorize" noise and details in the training data that do not generalize, so it cannot perform well on new data.
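The practitioner's description can be illustrated with a small sketch in Python. Everything in it is an assumption made for illustration rather than anything from Baidu or the speech: the toy task (y = 2x plus noise), the sample sizes, and the helper names make_data, fit_line, fit_memorizer and mse are all hypothetical. It compares a simple straight-line fit with a "memorizer" that looks up the nearest training example and therefore reproduces the training noise exactly.

```python
import random


def make_data(n, rng):
    """Sample n points from y = 2x + Gaussian noise (a made-up toy task)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a straight line; returns a predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: returns the label of the closest training x,
    so it reproduces every training label, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a predictor on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
train_x, train_y = make_data(30, rng)
test_x, test_y = make_data(200, rng)  # unseen points from the same task

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

# (training error, test error) for each model
results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "memorizer": (mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.3f}  test MSE = {test_err:.3f}")
```

The memorizer's training error is exactly zero because it stores every training label, yet its error on unseen points is not, which is the gap between training and test performance that the practitioner describes. On noisy data of this kind, the simpler straight line would typically be expected to do better on the unseen points.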
The practitioner believes that leaderboard scoring does have limitations. For example, because evaluation datasets are public, models can be trained specifically on them to improve their rankings, which leads to leaderboard gaming. But rankings are not completely meaningless. They still provide a quantitative evaluation standard, help people quickly understand the performance of different large models, push the technical level of large models forward through competition, and also have a certain promotional and advertising effect.
"The hype from some self-media, coupled with the motivation to promote each new model when it is released, gives people the impression that the differences in capabilities between models are already relatively small, but in fact that is not the case," Robin Lee said. He added that in practice Baidu does not allow its engineers to chase leaderboard rankings. The real measure of a large model's ability is whether, in specific application scenarios, it can meet user needs and generate value.
As for the large model industry's often-cited talk of "leading by 12 months or trailing by 18 months," he does not think it is that important either, because every company operates in a fully competitive market, with many competitors in whatever direction it pursues. "If you can always guarantee a lead of 12-18 months over your competitors, then you are invincible. Don't think that 12-18 months is a short time. Even if you can guarantee a lead of 6 months over your competitors, you have won. Your market share may be 70%, while your competitors may only have 20% or even 10% of the market share," he said.
He judged that the gap between large models may continue to widen in the future. Because the ceiling for large models is high and today's models are still far from the ideal, models need to be iterated, updated, and upgraded quickly and continuously. Meeting user needs, reducing costs, and increasing efficiency will take sustained investment over several years or even decades.
Besides discussing whether there are barriers to competition among large models, Robin Lee also said that the outside world holds quite a few misunderstandings about large models, on topics including open-source versus closed-source models, model efficiency, and AI agents.
Robin Lee is a firm supporter of the closed-source model. "Before the era of large models, people were accustomed to open source meaning free and low cost," he explained. Open-source Linux, for example, is free to use because you already have a computer. But this does not hold in the era of large models. Large model inference is expensive, and open-source models do not come with computing power. You have to buy your own hardware, which cannot achieve efficient utilization of computing power.
"Open-source models are not efficient," he said. "To be precise, the closed-source model should be called the commercial model, in which countless users share the R&D costs and the machine resources and GPUs used for inference. Its GPU usage efficiency is the highest, with Baidu's Wenxin large models 3.5 and 4.0 having GPU utilization rates of over 90%."
Robin Lee's analysis is that open-source models are valuable in teaching and scientific research, but in the business world, where the goals are efficiency, effectiveness, and the lowest cost, open-source models have no advantage.
He also gave his view on how large model applications will evolve. The first to appear is the Copilot, which assists humans. Next is the agent, which has a certain degree of autonomy and can use tools, reflect, and evolve on its own. If this level of automation continues to develop, it will become an AI worker capable of independently completing all kinds of tasks.
At present, agents are attracting more and more attention from large model companies and customers. Robin Lee believes that although many people are optimistic about this direction, agents have so far not become a consensus view.
"The threshold for agents is indeed very low," he said. Many people do not know how to turn large models into applications, and agents are a very direct, efficient, and simple way to do so, since building an agent on top of a model is quite convenient.
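CandyLake.com is an information publishing platform and only provides information storage space services.
Disclaimer: The views in this article are those of the author alone. This article does not represent the position of CandyLake.com and does not constitute advice. Please treat it with caution.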