Google's "counterattack" has sparked investor discussions on the commercialization of multimodal models
白云追月素
Posted on 2023-12-11 11:13:42
Shortly after the first anniversary of ChatGPT's launch, Google abruptly released its multimodal model Gemini, a move the industry has called "Google's strongest counterattack". Discussion of Gemini among investment institutions has exploded. Industry insiders believe Gemini is significantly improved in visual recognition and reasoning, and that in commercial terms, real-time interaction scenarios may become the main focus for applications of multimodal AI models.
Gemini is "too stunning"
Google CEO Sundar Pichai recently announced the official launch of Gemini 1.0. Eli Collins, Vice President of Product at Google DeepMind, said it is Google's most capable and most general large model to date.
Unlike existing large models on the market, Gemini was reportedly built as a multimodal model from the outset, which means it can generalize across, and seamlessly understand, operate on, and combine, different types of information, including text, code, audio, images, and video. In terms of flexibility, it can run on everything from data centers to mobile devices.
After watching the Gemini demonstration videos, many investors called them "too shocking". "After watching Gemini's demonstration video, I find its multimodal understanding astonishing. In addition, Gemini's reasoning ability currently appears to surpass ChatGPT's," said Sun Haifeng, an associate professor at the School of Computer Science at Beijing University of Posts and Telecommunications.
According to Sun, on the one hand Gemini far surpasses OpenAI's ChatGPT in processing multimodal information: it supports both multimodal input and multimodal output. A typical feature of Gemini is its support for interleaved sequences of text, images, audio, and video as input, which is difficult for ChatGPT or traditional multimodal models to achieve. Generally speaking, ChatGPT supports only text output, and output in other modalities requires calling third-party APIs. Gemini's interleaved-sequence input better fits the needs of the vast majority of scenarios. On the other hand, according to Gemini's technical report, its accuracy on the MMLU benchmark reached 90.04%, surpassing human experts and marking a milestone in the evolution of its reasoning ability.
The day after Gemini's launch, Google was challenged over its multimodal demonstration video, which critics said had been edited and spliced together, and Gemini was suspected of exaggerated advertising. Google offered an explanation: the video does include post-production and editing, and the interactions with Gemini were not captured in real time but were produced from still images and prompts supplied by staff. Gemini's ability to understand video therefore still needs further development.
Real-time interaction scenarios may become the commercial focus
Prompted by this news, domestic investors have been hotly debating multimodal technology and its applications.
A frontline investor in the technology sector said that, compared with ChatGPT-4, Gemini shows marked improvements in image recognition and reasoning, as well as in its apparent response speed so far. In his personal view, Google's and OpenAI's products each have their own distinctive strengths, and commercial deployment depends on finding suitable scenarios. "Fitting the model to the right scenario and identifying needs where it adds value remain crucial, but Gemini has indeed further expanded what people can imagine for AI models."
"One can boldly imagine that a multimodal model running on a robot might achieve embodied intelligence. And combined with Google Glass, a multimodal model might be upgraded into a super-intelligent agent," said another investor.
A technical specialist explained that humans have five senses, and the world we build and the media we consume are presented accordingly. Being multimodal means that Gemini can understand the world around it in the same way humans do, and can take in and produce any type of input and output, whether text, code, audio, images, or video. The most crucial technology here is how to mix all of these modalities: how to collect as much data as possible from any number of inputs and senses, and then provide equally diverse responses.
"Gemini is more like a human, closer to human visual recognition and to some forms of reasoning and judgment. OpenAI's ChatGPT is more like a large knowledge base that gives people information for reference. It is not a question of one surpassing the other; the two differ significantly in what they focus on," said an investor.
Sun Haifeng said that Gemini's specific implementation architecture is not yet clear, but the ability to interleave information from multiple modalities as input is badly needed in many scenarios, especially real-time interaction scenarios.
Another technology investor believes the release of Gemini shows that large companies have a clearer first-mover advantage in artificial intelligence. For example, Google's Gemini has outstanding visual reasoning capabilities because Google can draw on a wide variety of search-engine material as a large body of training data. In addition, large companies have obvious advantages in data, traffic, capital, computing power, and application scenarios.
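CandyLake.com is an information publishing platform and provides only information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.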