
In the early morning of December 7 Beijing time, Google unexpectedly released Gemini, which it calls its "most powerful and most general model to date." Gemini is the first natively multimodal model from Google, and reportedly the first worldwide; it supports both cloud and on-device deployment.
According to the published test data, Gemini Ultra surpasses human experts on MMLU (Massive Multitask Language Understanding) and outperforms GPT-4 on multiple tasks in head-to-head comparisons. It is therefore regarded as GPT-4's strongest competitor.
From an industry perspective, the arrival of Gemini will further broaden the application scenarios for large models. It will also sustain growth in demand for computing power and act as a catalyst for subsequent large-model launches, including GPT-5.
Three versions launched

It is understood that Gemini is the GPT-4 competitor that Google has spent a year preparing, and it is currently the most powerful and most adaptable large model Google can offer. Gemini is a multimodal model built on a Transformer decoder that can process content in different forms, including video, audio, and text.
Compared with previous technologies, the latest Gemini models can perform more complex reasoning and understand more fine-grained information. The first release, Gemini 1.0, comes in three sizes: Ultra, Pro, and Nano.
1) Ultra is the most capable version and runs most efficiently on Google's TPU infrastructure. In multiple benchmarks, Ultra outperforms GPT-4V;
2) Pro is a cost-optimized version with strong reasoning, multimodal, and other capabilities. It scales well and can complete pre-training within a few weeks. In multiple benchmarks it is second only to GPT-4V and stronger than mainstream large models such as PaLM 2, Claude 2, LLaMA 2, and GPT-3.5;
3) Nano is a 4-bit model distilled from larger models, offered in two sizes, 1.8B and 3.25B parameters, targeting low-memory and high-memory devices respectively and supporting on-device deployment; a rough memory estimate follows below.
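As a rough, back-of-the-envelope illustration (an assumption-based estimate, not a figure from Google): at 4-bit precision each weight occupies about half a byte, so the 1.8B-parameter Nano would need roughly 1.8 billion × 0.5 bytes ≈ 0.9 GB for its weights alone, and the 3.25B-parameter version roughly 1.6 GB, which lines up with the two variants targeting low-memory and high-memory devices respectively.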
At present, Gemini 1.0 has been rolled out across various Google products and platforms, including the chatbot Bard and the Pixel 8 Pro smartphone. In the coming months, Gemini will be applied to more Google products and services, such as Search, Ads, Chrome, and Duet AI.
Does its performance crush GPT-4?

As noted above, test data show that Gemini Ultra surpasses human experts on MMLU (Massive Multitask Language Understanding) and outperforms GPT-4 on multiple tasks in head-to-head comparisons.
Reportedly, of the 32 academic benchmarks widely used in large language model research and development, covering everything from natural image, audio, and video understanding to mathematical reasoning, Gemini Ultra surpasses the current state of the art on 30.
On the MMLU (Massive Multitask Language Understanding) test, Gemini Ultra scored 90.0%, making it the first model to surpass human experts, while GPT-4 scored 86.4%. In image understanding, Gemini Ultra also performed better on the new MMMU benchmark, scoring 59.4% versus 56.8% for GPT-4V.
In addition, Gemini 1.0 has complex reasoning abilities and can extract insights from hundreds of thousands of documents by reading, filtering, and understanding their contents. Google engineers demonstrated an example of Gemini extracting key information from 200,000 scientific papers.
Since 2021, the research field in question has added more than 200,000 papers that need to be folded into existing work. In the past, researchers could only process them manually; now Gemini can automatically identify and filter the literature relevant to the field. Over the course of a single lunch break, Gemini helped the scientists read the 200,000 papers and generate updated charts from the extracted data.
Gemini Ultra is currently undergoing large-scale trust and safety checks. While refining the model, Google will provide Gemini Ultra to select customers, developers, partners, and safety and responsibility experts for early testing and feedback, and the model will be made available to developers and enterprise customers early next year.
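For developers who get access, the workflow will likely resemble other Google model APIs. Below is a minimal, illustrative sketch using the google-generativeai Python SDK; the article itself does not name a specific SDK, so the package, the "gemini-pro" model identifier, and the API-key handling are assumptions for illustration.

```python
# Minimal, illustrative sketch: calling Gemini Pro from Python via the
# google-generativeai SDK. Package, model identifier, and key handling are
# assumptions for illustration, not details from the article.
import os

import google.generativeai as genai

# Assumes an API key (e.g. from Google AI Studio) stored in an env variable.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize what a natively multimodal model is in two sentences."
)
print(response.text)
```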
Innovation in hardware, algorithms, and datasets

A research report from Minsheng Securities notes that, based on evaluations of the Gemini model series on more than 50 benchmarks, the models keep improving in reasoning, mathematics and science, and long-text tasks as model size increases, and Gemini Ultra is the best model across all six capability categories.
As the second-largest model in the Gemini family, Gemini Pro is also highly competitive in performance and more efficient to serve.
The firm notes that Gemini's training also brings innovations in infrastructure, algorithms, and datasets. Notably, Google has also released what it calls its "most powerful" TPU system, Cloud TPU v5p, designed to support the training of cutting-edge AI models; Google claims its cost-effectiveness is 2.3 times that of the previous-generation TPU v4.
The new generation of TPUs will accelerate Gemini's development, helping developers and enterprise customers train large-scale generative AI models faster and thus launch new products and features sooner.
On the algorithmic side, Google uses techniques such as a single-controller programming model and the XLA compiler to optimize training, and keeps training stable by guarding against silent data corruption (SDC) and similar issues. On the data side, Google speeds up Gemini's training and inference through its tokenization scheme and ensures high training-data quality through a series of filtering methods; an illustrative sketch of such filtering follows.
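Google's actual pipeline is not public and the article gives no specifics, so the following is only a minimal sketch of the kind of rule-based quality filters the paragraph alludes to; the thresholds and function names are assumptions for illustration.

```python
# Illustrative sketch only: simple heuristic quality filters of the kind the
# article alludes to ("a series of filtering methods"). Thresholds and rules
# are assumptions, not Google's actual pipeline.
from typing import Iterable, Iterator


def keep_document(text: str) -> bool:
    """Return True if a raw training document passes basic quality checks."""
    words = text.split()
    if len(words) < 50:                      # too short to be useful
        return False
    if len(set(words)) / len(words) < 0.3:   # highly repetitive text
        return False
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    if alpha_ratio < 0.6:                    # mostly markup, numbers, or noise
        return False
    return True


def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only the documents that pass the heuristic filters."""
    for doc in docs:
        if keep_document(doc):
            yield doc
```

In practice, rule-based filters like these are typically combined with deduplication and model-based quality scoring before data is used for pre-training.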
A catalyst for models such as GPT-5

The release of Google's Gemini will inevitably act as a new catalyst for the iterative upgrading of other AI models.
Analysis from CITIC Securities suggests that Gemini can reduce latency by roughly 40% in current search scenarios. For the industry as a whole, Google's push on productization and commercialization will also drive broad change. The firm expects more and more AI scenarios and products to emerge; combined with the cost optimization brought by hardware upgrades and algorithm improvements, the progress of consumer-facing (To C) products is worth looking forward to.
The firm also believes the release of Gemini will raise expectations for multimodal models. For the industry, multimodal data will drive increased demand for computing power and act as a further catalyst for the subsequent release of models such as GPT-5.
What cannot be ignored is that OpenAI's monthly traffic began to decline in May and rebounded to 1.7 billion in October, compared with 260 million for Google's chatbot Bard. So, will OpenAI users turn to Google?
On this question, a leading domestic large-model vendor believes that OpenAI still has the advantage in the short term, but that in the long run Google's massive user base and product ecosystem will become a powerful force.
Compared with OpenAI, Google has accumulated a huge base of PC and mobile users and possesses massive real-time data (whereas OpenAI's data relies on the open internet, including Google). By embedding large models in users' phones to provide services such as subway navigation, Google also holds a wealth of user information. "This puts pressure on OpenAI, which will need to build out its product ecosystem," the person analyzed.