Google's "counterattack" has sparked investor discussions on the commercialization of multimodal models
白云追月素
Posted on 2023-12-11 11:13:42
Shortly after the first anniversary of ChatGPT's launch, Google unexpectedly released its multimodal model Gemini, which the industry has called "Google's strongest counterattack." Discussion of Gemini among investment institutions has surged. Industry insiders believe Gemini is significantly improved in visual recognition and reasoning, and that in commercial use, real-time interaction scenarios may become the focus of multimodal AI model applications.
Gemini is "too stunning"
Google CEO Sundar Pichai recently announced the official launch of Gemini 1.0. Eli Collins, Vice President of Product at Google DeepMind, said it is Google's most capable and most general large model to date.
Unlike existing large models on the market, Gemini was reportedly built as a multimodal model from the ground up, meaning it can generalize across and seamlessly understand, operate on, and combine different types of information, including text, code, audio, images, and video. It is also flexible in deployment, running on everything from data centers to mobile devices.
After watching the Gemini demonstration videos, many investors described them as "too stunning." "After watching Gemini's demonstration video, its multimodal understanding is astonishing. In addition, Gemini's reasoning ability currently appears to surpass ChatGPT's," said Sun Haifeng, an associate professor at the School of Computer Science at Beijing University of Posts and Telecommunications. On the one hand, he said, Gemini far surpasses OpenAI's ChatGPT in multimodal information processing, supporting both multimodal input and multimodal output. A typical feature of Gemini is that it accepts interleaved sequences of text, images, audio, and video as input, which is difficult for ChatGPT or traditional multimodal models to do. Generally speaking, ChatGPT supports only text output, and output in other modalities requires calls to third-party APIs. Interleaved sequence input also fits the needs of the vast majority of scenarios better. On the other hand, according to Gemini's technical report, its accuracy on the MMLU benchmark reached 90.04%, surpassing human experts and marking a milestone in the evolution of its reasoning ability.
The day after Gemini launched, however, Google was challenged over the multimodal demonstration video, which was found to have been edited and spliced together, and Gemini was suspected of being over-promoted. Google offered an explanation: the video does contain post-production and editing, and the interactions with Gemini did not take place in real time but were produced from still images and text prompts supplied by staff. Gemini's video understanding therefore still needs further development.
Real-time interaction scenarios may be the commercial focus
Prompted by this news, domestic investors have been hotly debating multimodal technology and its applications.
A primary-market investor focused on the technology sector said that, compared with ChatGPT-4, Gemini shows marked improvement in image recognition and reasoning, as well as in its apparent response speed so far. In his view, the Gemini and OpenAI products each have their own strengths, and commercial deployment depends on finding suitable scenarios. "Fitting the right scenario and identifying needs where value can be added is still crucial, but Gemini has indeed further expanded the room for imagination around AI models."
"One can boldly imagine that a multimodal model running on a robot might achieve embodied intelligence. Likewise, combined with Google Glass, a multimodal model might evolve into a super intelligent agent," said another investor.
A technical specialist explained that humans have five senses, and that the world we build and the media we consume are presented accordingly. Being multimodal means Gemini can understand the world around it much as humans do, taking in and producing any type of input and output, whether text, code, audio, images, or video. The most crucial technical question is how to blend all these modalities: how to collect as much data as possible from any number of inputs and senses, and then provide equally diverse responses.
"Gemini is more like a human, closer to human visual recognition and to some human reasoning and judgment. OpenAI's ChatGPT is more like a large knowledge base that people can consult for information. It is not a question of one surpassing the other; the two differ significantly in focus," said an investor.
Sun Haifeng said that Gemini's specific architecture is not yet clear, but that its ability to take interleaved multimodal information as input is badly needed in many scenarios, especially real-time interaction.
Another technology investor believes the release of Gemini shows that large companies have a more definite first-mover advantage in artificial intelligence. Google's Gemini, for example, has outstanding visual reasoning capabilities because the company can draw on a wide range of search-engine material as a large body of training data. More broadly, large companies have clear advantages in data, traffic, capital, computing power, and application scenarios.