Google's "counterattack" sparks investor discussion on commercializing multimodal models
白云追月素
Published 2023-12-11 11:13:42
Shortly after the first anniversary of ChatGPT's launch, Google unexpectedly released its multimodal model Gemini, which the industry has called Google's "strongest counterattack". Discussion of Gemini among investment institutions has exploded. Industry insiders believe Gemini brings significant improvements in visual recognition and reasoning, and that, commercially, real-time interaction may become the main application focus for multimodal AI models.
Gemini is "too stunning"
Recently, Google CEO Sundar Pichai announced the official launch of Gemini 1.0. Eli Collins, Vice President of Product at Google DeepMind, said it is Google's most capable and general-purpose model to date.
Reportedly, unlike most large models on the market, Gemini was built as a multimodal model from the start, which means it can generalize across, and seamlessly understand, operate on, and combine, different types of information, including text, code, audio, images, and video. It is also flexible enough to run everywhere from data centers to mobile devices.
After watching Gemini's demonstration videos, many investors described it as "too shocking". "Gemini's ability to understand multiple modalities in the demo is astonishing, and its reasoning ability currently seems to surpass ChatGPT's," said Sun Haifeng, an associate professor at the School of Computer Science, Beijing University of Posts and Telecommunications. On the one hand, he said, Gemini far surpasses OpenAI's ChatGPT in multimodal information processing: it supports both multimodal input and multimodal output. A distinctive feature is its support for interleaved sequences of text, images, audio, and video as input, which is difficult to achieve with ChatGPT or traditional multimodal models. ChatGPT generally outputs only text, and producing other modalities requires calling third-party APIs. Gemini's interleaved-input approach better fits the needs of most scenarios. On the other hand, according to Gemini's technical report, it reached 90.04% accuracy on the MMLU benchmark, surpassing human experts and marking a milestone in the evolution of its reasoning ability.
The day after Gemini's launch, Google faced accusations that its multimodal demo video had been edited and spliced together, and that Gemini's capabilities had been exaggerated. Google responded that the video did involve post-production editing, and that Gemini had not perceived the interactions in real time; its responses were generated from still images and text prompts supplied by staff. Gemini therefore still needs further development in understanding video.
Real-time interaction may become the commercial focus
Prompted by the news, domestic investors have engaged in heated discussions about multimodal technology and its applications.
A primary-market investor focused on a particular technology sector said that, compared with GPT-4, Gemini shows major improvements in image recognition, reasoning, and apparent response speed. He believes Gemini and OpenAI each have their own distinctive products, and that commercialization depends on finding suitable scenarios. "Matching the right scenarios and identifying value-added needs remains crucial, but Gemini has indeed further expanded what people imagine AI models can do."
"One can boldly imagine that a multimodal model running on a robot might achieve embodied intelligence, and that combining it with Google Glass might turn it into a super-intelligent agent," said another investor.
A technician explained that humans have five senses, and that the world we build and the media we consume are presented through them. Being multimodal means Gemini can understand the world around it the way humans do, taking in and responding with any type of input or output, whether text, code, audio, images, or video. The key technical challenge is how to mix all these modalities: how to gather as much information as possible from any number of inputs and senses, and then provide equally diverse responses.
"Gemini is more humanlike, closer to human visual recognition and to some kinds of reasoning and judgment. OpenAI's ChatGPT is more like a large knowledge base that provides people with reference information. It is not a question of which surpasses the other; the two differ significantly in focus," said one investor.
Sun Haifeng said that Gemini's specific implementation architecture is not yet clear, but the ability to take interleaved information from multiple modalities as input is much needed in many scenarios, especially real-time interaction.
Another technology investor believes Gemini's release shows that large companies hold a clearer first-mover advantage in artificial intelligence. For example, Gemini has outstanding visual reasoning capabilities because Google has a wide range of search-engine-derived material to use as training data. More broadly, big tech companies have obvious advantages in data, traffic, capital, computing power, and application scenarios.
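CandyLake.com is an information publishing platform and provides only information storage space services.
Disclaimer: The views in this article are solely the author's and do not represent the position of CandyLake.com. Nothing here constitutes advice; please treat it with caution.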