
Before this week, if you asked "What is the strongest AI model on Earth?", most people would instinctively point to OpenAI's GPT-4 Turbo, released last November. In just a few months, however, the answer to that question has quietly changed.
On Thursday local time, Google unexpectedly released its next-generation AI model, Gemini 1.5. Beyond the rapid pace of iteration (Google released Gemini 1.0 only last December), the leap in capability demonstrated by the new model has drawn strong interest across the industry.
(Source: Google)
Headline feature: processing one million tokens at once
The biggest highlight of the release is Gemini 1.5 Pro, the first model in the Gemini 1.5 family. A multimodal general-purpose model, it raises the stable context window to 1 million tokens. Although the term is "context," AI models have entered the multimodal era: today's cutting-edge large models generally handle not only text and code but also rich media such as images, audio, and video.
For comparison, the Gemini 1.0 Pro released two months ago has a context limit of 32,000 tokens, while its old rival, OpenAI's GPT-4 Turbo, supports 128,000 tokens.
(Source: Google)
The concept of a token can be abstract. In concrete terms, 1 million tokens means you can hand Gemini 1.5 Pro over 700,000 words of text, 30,000 lines of code, 11 hours of audio, or 1 hour of video in a single prompt and then set it to work. For large models, the context window is the main constraint on application scenarios: if a 500,000-word file cannot be "read" in full, it cannot be processed.
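As a rough illustration of why the window size matters, here is a minimal pre-flight check. The 4-characters-per-token ratio is a common heuristic for English text and an assumption here, not Gemini's actual tokenizer:

```python
# Rough pre-flight check: does a document fit in a model's context window?
# CHARS_PER_TOKEN = 4 is a crude English-text heuristic, NOT a real tokenizer.

CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, window: int) -> bool:
    """True if the estimated token count fits within the given window."""
    return estimate_tokens(text) <= window

doc = "word " * 500_000                   # roughly a 500,000-word document
print(fits_in_context(doc, 128_000))      # False: too big for a 128K window
print(fits_in_context(doc, 1_000_000))    # True: fits in a 1M-token window
```

Under this heuristic, a 500,000-word document overflows a 128K-token window but fits comfortably in a 1-million-token one, which is exactly the gap the article describes.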
The reason Google's model can process so much data at once is that it adopts a "Mixture of Experts" (MoE) architecture: when responding to a request, only a subset of the overall model is activated, so responses are faster and computation is cheaper.
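The MoE idea can be sketched in a few lines. The experts and gate scores below are toy values for illustration, not Gemini's actual architecture; the point is that only the top-k experts run for a given input:

```python
# Minimal sketch of Mixture-of-Experts routing: a gate scores each expert,
# and only the top-k experts are actually evaluated. Toy values throughout.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts compute; the rest stay idle, saving work.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Four toy "experts", each a simple function of the input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x / 2]
gate_scores = [0.1, 2.0, 0.5, 1.5]  # the gate favors experts 1 and 3

print(moe_forward(4.0, experts, gate_scores, k=2))
```

With k=1, only the single highest-scoring expert runs; raising k trades more compute for a blended answer. In a real MoE transformer the "experts" are feed-forward sub-networks and the gate is learned, but the routing principle is the same.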
Google CEO Pichai also revealed that Google researchers have successfully tested a 10-million-token context window. That means future AI models could ingest the entire "A Song of Ice and Fire" series (the basis of "Game of Thrones") at once; the five published books total about 1.73 million words.
It is worth noting that in the test version currently open to developers, Gemini 1.5 Pro is limited to 128,000 tokens. However, Demis Hassabis, CEO of Google DeepMind, said plainly that a new pricing tier will be introduced in the future to unlock the 1-million-token version. For comparison, the current Gemini subscription service charges $19.99 per month.
(Gemini 1.5 Pro has significantly stronger capabilities than 1.0 Pro, and can even be compared to 1.0 Ultra)
What does this look like in practice?
In the demonstration video released on Thursday, Google researchers uploaded the 402-page transcript of the Apollo 11 moon landing's air-to-ground communications, drew a sketch of a boot touching the ground, and asked the AI to find the corresponding moment in the file. The model accurately identified the astronauts' exchange at the moment of landing and pinpointed its location in the document.
The researchers also uploaded a 44-minute film and asked the AI to find the moment a character pulls a piece of paper out of a pocket, and to describe in detail what is written on it. Sure enough, the AI returned exactly the content the questioner was after.
Likewise, when the researchers uploaded a hand-drawn sketch of a person being drenched with water, the AI successfully located the matching scene in the film.
Unlocking more professional scenarios
In the technical report, Google also presents an interesting use case: feed the model a grammar book for Kalamang (a language spoken fluently by only a few hundred people worldwide), then test several models on English-to-Kalamang and Kalamang-to-English translation. Translations are scored on a 0-6 scale, with 6 being a perfect translation.
The results show that Gemini 1.5 Pro is currently the best-performing model on the Kalamang-to-English test, and on English-to-Kalamang it scored 5.52, just shy of the 5.6 achieved by a human language learner. And remember: the AI spent only a few minutes reading the grammar book.
Meanwhile, because of their context window limits, GPT-4 Turbo and Claude 2.1 could ingest only about half of the grammar book, and their output generally fell in the unusable range.
Google CEO Pichai said that for enterprise applications, a larger context window is highly valuable: listed companies can load large amounts of financial data at once, and film producers can upload an entire film and ask what critics might say about it.
For the wider audience, Google's move undoubtedly sounds an alarm for OpenAI: GPT-4 Turbo has now been out for four months. When will the next generation of large models arrive?