Domestic developers on Google Gemini: despite the "faking" controversy, it has found a path beyond OpenAI
六月清晨搅
Published 2023-12-13 11:05:11
It has been almost a week since Google launched Gemini, its most powerful model to date, and many domestic AI companies are trying to gauge what this large model can do.
Unlike many large models previously launched in the industry, Google Gemini goes beyond text, relying directly on vision and sound to understand the world, even though its launch demo has been suspected of fakery and of exaggerating the model's capabilities.
Gemini's demonstration videos led many users to believe, mistakenly, that Gemini can read video in real time and answer users' questions by understanding it. In reality, Google employees elicited these responses through still-image and text prompts. Image source: Google
To understand the impact of Gemini's arrival on OpenAI and other AI companies, Interface News recently interviewed business leaders and developers at several top generative AI companies. In their view, Gemini's defining feature is that it is a "natively" multimodal large model.
"In theory, natively multimodal models work better than 'concatenated' multimodal models, because the latter tend to hit bottlenecks during training," Chen Yujun, AI manager at Recurrent Intelligence, told Interface News, adding that since Gemini has not yet been used in depth, its actual advantages remain to be verified.
Several developers at large-model startups said that even though Gemini Ultra, the largest model in the series, has not yet been officially released, Gemini has already demonstrated text capabilities on par with GPT-4.
In the benchmark results released by Google, Gemini Ultra outperforms GPT-4 on most text tests and GPT-4V on almost all multimodal tasks. Under GPT-4's own testing conditions, Gemini Ultra scores lower than GPT-4 on MMLU, but still outperforms other mainstream large models. Image source: Gemini technical report; CITIC Construction Investment research report
In Gemini's demonstration video, the model appears to observe human behavior in real time and give feedback: it can describe a drawing of a duck as it progresses from sketch to coloring; track the paper ball in a cup-shuffling game; help solve math and physics problems; recognize gestures; play hands-on classroom games; and rearrange sketches of the planets.
Developers generally believe that, however much of the demo was staged, Gemini has demonstrated strong abilities in understanding, reasoning, creation, and real-time interaction, comprehensively surpassing OpenAI's multimodal model GPT-4V. Google's response has also been broadly accepted by the industry: "All the user prompts and outputs are real, shortened only for brevity."
GPT-4V, which OpenAI quietly released three months ago, can perform multimodal tasks such as comprehension and image generation, but the results are middling, and its key reasoning tasks are completed in cooperation with other models. Abstract reasoning is itself the most critical capability of a large model.
Image source: CITIC Construction Investment
Yin Bohao explained to Interface News that GPT-4V and Gemini are built on two entirely different training logics. "GPT-4V is like a nearsighted person who cannot see clearly, so it does not perform well; it is a typical 'concatenated' scheme. Gemini trains multiple modalities together."
But in the view of an algorithm manager at a multimodal large-model company, Gemini should not be seen as having entirely surpassed GPT-4: "In the evaluations, GPT-4 and Gemini were not compared on fully equal terms in text generation."
Many netizens who tested the model also reported that Gemini Pro's accuracy in recognizing objects and images surpasses GPT-4's. Liu Yunfeng of Zhuiyi Technology attributes this to Google's search business, which naturally accumulates text data aligned with other modalities, an advantage for training natively multimodal large models.
Gemini can correctly recognize a student's handwritten answers and check the reasoning in a physics problem. Image source: Gemini technical report
Any major move by Google in artificial intelligence opens up new directions for the market to explore, but even before Gemini's release, the trend toward fully multimodal AI models had become increasingly clear.
As early as the release of GPT-4 in March, OpenAI said it would add multimodal capabilities in that iteration. Since September, high-profile companies such as Runway, Midjourney, Adobe, and Stability AI have successively launched multimodal products.
On the domestic side, Baidu's Wenxin Large Model 4.0 has made notable progress in cross-modal text-to-image generation, while Zhipu AI, the Chinese large-model startup with the largest disclosed funding, has given its generative AI assistant Zhipu Qingyan clear strengths in the visual domain.
Multiple developers told Interface News that multimodal large models are already recognized across the industry as a clear direction, not something awakened by Google's big move, though Gemini's arrival will spur domestic companies to accelerate R&D. The algorithm manager at the aforementioned multimodal large-model company also pointed out Gemini's limitations: "Its image-generation ability, and its reference value for video and image generation, are limited."
At present it is hard to conclude that Gemini has fully surpassed GPT-4, but it is undeniable that Google has become OpenAI's strongest rival. With Gemini, Google has also demonstrated a truth: a multimodal large model must build on the training process of a large language model to achieve truly multimodal AI.