Google's "counterattack" has sparked investor discussions on the commercialization of multimodal models
白云追月素
Posted on 2023-12-11 11:13:42
Shortly after the first anniversary of ChatGPT's launch, Google unveiled its multimodal model Gemini, a move the industry sees as "Google's strongest counterattack." Discussion of Gemini among investment institutions has surged. Industry insiders believe Gemini is significantly improved in visual recognition and reasoning, and that, in commercial terms, real-time interaction scenarios may become the focus for applications of multimodal AI models.
Gemini is "too stunning"
Google CEO Sundar Pichai recently announced the official launch of Gemini 1.0. Eli Collins, Vice President of Product at Google DeepMind, said it is Google's most capable and most general large model to date.
Unlike existing large models on the market, Gemini was reportedly built as a multimodal model from the start, meaning it can generalize across and seamlessly understand, operate on, and combine different types of information, including text, code, audio, images, and video. It is also flexible enough to run on everything from data centers to mobile devices.
After watching the Gemini demonstration videos, many investors called them "too shocking." "After watching Gemini's demonstration video, its ability to understand multiple modalities is astonishing. In addition, Gemini's reasoning ability currently seems to surpass ChatGPT," said Sun Haifeng, an associate professor at the School of Computer Science at Beijing University of Posts and Telecommunications.
Sun said that, on the one hand, Gemini far surpasses OpenAI's ChatGPT in multimodal information processing, since it supports both multimodal input and multimodal output. A typical feature of Gemini is that it accepts interleaved sequences of text, images, audio, and video as input, which is difficult for ChatGPT or traditional multimodal models to do. Generally speaking, ChatGPT supports only text output, and output in other modalities requires calls to third-party APIs. Interleaved input, he said, better fits the needs of the vast majority of scenarios. On the other hand, Gemini's technical report gives its accuracy on the MMLU benchmark as 90.04%, surpassing human experts and marking a milestone in the evolution of its reasoning ability.
The day after Gemini's launch, outside observers questioned whether the multimodal demonstration video had been edited and spliced together, and Google was suspected of overstating Gemini's capabilities in its promotion. Google responded that the video did involve post-production and editing, and that the interactions with Gemini were not captured in real time; they were produced from images and prompts supplied by staff. Gemini's ability to understand video therefore still needs further development.
Real-time interaction scenarios may become the commercial focus
Prompted by the news, domestic investors have been intensely debating multimodal technology and its applications.
A front-line investor in one technology sector said that, compared with GPT-4, Gemini shows marked improvement in image recognition and reasoning, as well as in its apparent response speed so far. In his view, Gemini and OpenAI's models each have their own strengths, and commercial deployment depends on finding suitable scenarios. "Fitting the right scenarios and identifying needs that add value is still crucial, but Gemini has indeed further expanded what people can imagine AI models doing."
"One can boldly imagine that when a multimodal model runs on a robot, it may achieve embodied intelligence. And when a multimodal model is combined with Google Glass, it may be upgraded into a super intelligent agent," said another investor.
A technician explained that humans have five senses, and that the world we build and the media we consume are presented accordingly. Being multimodal means Gemini can understand the world around it the way humans do, taking in and producing any type of content, whether text, code, audio, images, or video. The most crucial technical question is how to blend all these modalities: how to gather as much data as possible from any number of inputs and senses, and then give equally varied responses.
"Gemini is more like a human, closer to human visual recognition and to some human reasoning and judgment. OpenAI's ChatGPT is more like a large knowledge base that provides people with reference information. It is not a question of which surpasses the other; the two differ significantly in focus," said an investor.
Sun Haifeng said that Gemini's specific implementation architecture is not yet clear, but that the ability to take interleaved information from multiple modalities as input is badly needed in many scenarios, especially real-time interaction.
Another technology investor believes the release of Gemini shows that large companies have a clearer first-mover advantage in artificial intelligence. Google's Gemini, for example, has outstanding visual reasoning capabilities because the company can draw on a wide variety of material from its search engine as training data. More broadly, large companies have obvious advantages in data, traffic, capital, computing power, and application scenarios.
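CandyLake.com is an information publishing platform and provides information storage services only.
Disclaimer: The views in this article are those of the author alone. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.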