Is Google Back on Top? How Strong Is Its Latest Large Model, and Can It Challenge GPT-4?

Tech giant Google has launched its long-awaited new model, which can run on mobile phones and significantly reduce computing costs.
On December 6 local time, Google announced Gemini, its "largest, most capable, and most versatile" new large language model. Gemini will be the first large model of its kind to run directly on a mobile phone, and it will power Google's Pixel 8 Pro smartphone and its chatbot Bard. Google plans to license Gemini to customers through Google Cloud and to integrate it into other Google products and services in the coming months.
Google invented many of the computer science concepts that make generative AI applications possible, but it was put on the back foot by OpenAI's chatbot ChatGPT, released last year. Facing the threat from the partnership between OpenAI and Microsoft, one of Google's biggest competitors, Google launched its own chatbot, Bard, in February this year. Shortly afterwards, OpenAI released the more powerful GPT-4, which became a major benchmark in the AI field. Now, in response to GPT-4, Google has launched Gemini.
"Google has found its rightful place in the AI competition"
Demis Hassabis, CEO of Google DeepMind, speaking for the Gemini team at a press conference, said Google had run 32 comprehensive multimodal benchmarks comparing Gemini with OpenAI's GPT-4, and that Gemini was "significantly ahead on 30 of the 32 benchmarks."
According to Google, Gemini performed strongly across a wide range of tasks in the later stages of training. For example, MMLU (Massive Multitask Language Understanding) is one of the most popular tests of an AI model's knowledge and problem-solving ability. Gemini Ultra scored 90.0% on it, becoming the first model to outperform human experts on MMLU.
Gemini's MMLU score surpassed that of human experts for the first time. Source: Official Video
Gemini comes in three sizes: Gemini Ultra, the largest and most capable, is positioned as a competitor to GPT-4; Gemini Pro, a mid-range model, outperforms GPT-3.5 and scales across a wide range of tasks; and Gemini Nano is designed for specific tasks and for mobile devices.
Gemini Nano will ship on the Pixel 8 Pro, the latest phone in Google's Pixel line, powering new features such as Summarize in the Recorder app and Smart Reply in Gboard, Google's keyboard app. According to foreign media reports, Google said Gemini Nano runs "locally" on the device and is specially optimized for mobile hardware, so Android developers can easily build AI apps and features that work offline or use personal information that stays on the device.
Analysts say this progress could help solve a major economic problem for the tech industry. Running generative AI on the phone's own computing power, rather than on cloud servers operated by big tech companies, would greatly reduce the cost of operating such systems. It also adds a layer of security for people who want to keep their personal data on their devices. Earlier, in November, Samsung Electronics unveiled its first generative AI model, Gauss, but it is currently limited to internal employees and is expected to come to the Galaxy S24 series in the first half of next year.
"I believe the AI transformation we are witnessing will be the most profound in our lifetimes, far bigger than the earlier shifts to mobile or to the internet. This new era of models represents one of the biggest science and engineering efforts we have ever undertaken as a company," wrote Sundar Pichai, CEO of Google's parent company Alphabet, in a blog post.
On the eve of Gemini's release, Pichai said in an interview that one of the main reasons Gemini has attracted attention is that it is multimodal from the ground up. He added that the transition to AI is profound but still at an early stage, with enormous opportunities ahead: "When we developed Gemini, we drew on a lot of previous experience. We spent more time on Gemini Ultra, partly to carry out rigorous safety testing, and we are also fine-tuning it to unlock its full potential."
On X (formerly Twitter), Elon Musk commented "Impressive" under Pichai's post introducing Gemini. Musk also replied to a post by Hassabis to congratulate him, and agreed with a comment on Gemini by SpaceX founding employee Tom Mueller, which read: "I know it's difficult to define what AGI (artificial general intelligence) is, but whatever it is, it's closer than you imagine."
According to Google, Gemini, built as a collaboration across Google teams including Google Research, can extract insights from hundreds of thousands of documents by reading, filtering, and understanding information, and it also handles numerical data well. For example, given a chart and a set of new data, Gemini can produce the code behind the chart and generate an updated chart that incorporates the new data.
Gemini generates the chart on the right from the chart on the left plus new data. Source: Official Video
Beyond text, Gemini can understand and work with many other forms of input and output, including code, audio, images, and video. It can grasp nuanced information and answer questions on complex topics, which makes it particularly good at explaining reasoning in subjects such as mathematics and physics.
Gemini is able to answer questions step by step based on photos. Source: Official Video
Google also released a six-minute video showing a series of interactions between testers and Gemini, including asking Gemini to recognize images and describe them in multiple languages, designing a quiz game based on a map, and playing a cup-shuffling game and reasoning games with it.
Throughout the video, Gemini responded very quickly, and it also generated audio and images to support its answers, using colloquial and even humorous expressions; the effect was eye-opening. In the comments, netizens called the video "shocking" and celebrated Google's return to its rightful place in the AI race.
Gemini suggests animals that could be made from two balls of yarn. Source: Official Video
When asked which way the duck should go, Gemini said it should go left, toward its companions. Source: Official Video
When it comes to coding, Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, including Python, Java, C++, and Go. It can work across languages, reason about complex information, and serve as the engine for more advanced coding systems.
Starting December 13, developers and enterprise customers will be able to access Gemini Pro through the Gemini API (application programming interface) in Google AI Studio or Google Cloud Vertex AI, and Android developers will be able to build with Gemini Nano.
Gemini also brings Bard its biggest upgrade since launch. Google announced that, starting from the day of the event, Bard will use Gemini Pro for more advanced reasoning, planning, and understanding, with English-language service in more than 170 countries and regions. In the coming months, Google plans to extend Bard to other modalities and add new languages and regions. Early next year, Google will launch Bard Advanced, powered by Gemini Ultra.
However, for regulatory reasons, the Gemini-powered Bard will not be available in EU countries or the UK. "We will definitely work hard to solve this problem, and we are working with local regulators to make sure we have communicated sufficiently with the relevant parties before launching the service in any specific region," said Sissie Hsiao, Google vice president and head of Bard.
Exaggerated promotional videos?
Shortly after Gemini's release, however, some netizens pointed out questionable aspects of the promotional materials.
According to the 60-page technical report Google released, Gemini's MMLU result carries a small annotation, "CoT@32", indicating that it was obtained with chain-of-thought prompting by sampling 32 answers and aggregating them into a final result. GPT-4's reported score, by comparison, uses 5-shot prompting, in which the model is given five worked examples. Under that same 5-shot standard, Gemini Ultra actually scores 83.7%, lower than GPT-4's 86.4%.
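The two settings are not directly comparable. Under 5-shot prompting, the model sees five solved examples and a single answer is scored. For CoT@32, Google's technical report describes an "uncertainty-routed" scheme: the model produces 32 chain-of-thought samples, the majority answer is used if agreement clears a threshold, and otherwise the system falls back to a single greedy answer. The Python sketch below illustrates only that aggregation step, assuming the final answers have already been extracted from each sample; the function name and the threshold values are hypothetical, since the report tunes the threshold per model on validation data.

```python
from collections import Counter
from typing import Sequence


def uncertainty_routed_answer(
    sampled_answers: Sequence[str],
    greedy_answer: str,
    threshold: float,
) -> str:
    """Hypothetical sketch of CoT@k aggregation (k = 32 for CoT@32).

    sampled_answers: final answers extracted from k chain-of-thought samples.
    greedy_answer:   the answer from a single greedy decode.
    threshold:       minimum share of samples that must agree (assumed value).
    """
    if not sampled_answers:
        return greedy_answer
    # Most frequent answer among the samples and how many samples gave it.
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    consensus = votes / len(sampled_answers)
    if consensus >= threshold:
        # Samples agree strongly enough: trust the majority vote.
        return answer
    # Too much disagreement: defer to the greedy answer.
    return greedy_answer


# 32 hypothetical samples: 20 say "B", 12 say "C" (consensus 20/32 = 0.625).
samples = ["B"] * 20 + ["C"] * 12
print(uncertainty_routed_answer(samples, greedy_answer="C", threshold=0.6))  # B
print(uncertainty_routed_answer(samples, greedy_answer="C", threshold=0.7))  # C
```

Because the routing can draw on up to 32 samples per question, a CoT@32 score reflects far more compute per answer than a single 5-shot response, which is why critics argued the two numbers should not be placed side by side.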
Moreover, in the chart comparing MMLU scores, Gemini's 90.0% is only slightly higher than the 89.8% scored by human experts, yet the two bars are drawn far apart.
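The "far apart" impression depends on where the chart's vertical axis starts. As a rough illustration, the Python sketch below compares how tall two bars are drawn relative to each other when the axis starts at zero versus just below the scores; the baseline of 89 is a hypothetical value, not one read off Google's chart.

```python
def apparent_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of the drawn heights of two bars when the y-axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


gemini, human = 90.0, 89.8
print(round(gemini - human, 1))                       # 0.2 percentage points
print(round(apparent_ratio(gemini, human, 0.0), 3))   # axis from 0: 1.002
print(round(apparent_ratio(gemini, human, 89.0), 2))  # hypothetical axis from 89: 1.25
```

A 0.2-point gap is invisible on a zero-based axis but makes one bar look 25% taller once the axis is truncated just below the scores.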
Philipp Schmid, technical lead at Hugging Face, redrew the chart using figures disclosed in the technical report. The two bars show the scores of GPT-4 (left) and Gemini Ultra (right) under 5-shot prompting. Source: X
Jeff Dean, chief scientist of Google DeepMind, later responded to the criticism in a discussion on X, writing: "We reported results for both methods. We believe it would be interesting for the community to see our newly developed CoT method and understand how it differs from other methods."
As for the eye-catching interactive demo video, some viewers spotted issues in the disclaimer at its start. Machine learning instructor Santiago Valdarrama argued that the wording implies the video was carefully curated and edited rather than recorded in real time. In the disclaimer, Google wrote: "We have been capturing footage to test it on a wide range of challenges, showing it a series of images, and asking it to reason about what it sees."
Disclaimer at the beginning of the demonstration video. Source: Official Video
Google later explained the multimodal interaction process in a blog post, indirectly acknowledging that the effects shown in the demo were achieved by piecing together still images and multiple prompts. For example, in the video a tester shows a fist, scissors, and an open palm in turn, and Gemini immediately concludes that they are playing rock, paper, scissors. In the blog post, Google acknowledges that Gemini reached that conclusion only when all three gestures were shown to it at once, together with a hint that it was a game.
Still, even allowing for some exaggeration in the marketing, Gemini's performance should not be underestimated.
Who will win the tech giants' race?
Since the start of this year, the major tech giants have been making one move after another in AI, each with its own strategy.
Microsoft, one of Google's biggest competitors, has been especially prominent. In February, Microsoft built an AI chatbot into its Bing search engine. A month later, it launched Microsoft 365 Copilot, bringing the capabilities of the large language model GPT-4 to its Office software. To help maintain its lead in bringing AI to office tools, Microsoft made the enterprise edition of Microsoft 365 Copilot officially available on November 1, at a subscription fee of $30 per month. More than a month ago, Microsoft also announced that its AI assistant Copilot would be officially integrated into Windows 11.
At its first developer conference in November, OpenAI launched GPT-4 Turbo, a new model with a context window of up to 128,000 tokens, along with a series of upgrades to ChatGPT, including custom GPTs. GPT-4 Turbo also accepts visual input, and it joins the text-to-image model DALL·E 3 and a new text-to-speech (TTS) model in OpenAI's multimodal API.
Meta, Facebook's parent company, has also long been an active player in AI. In July, Meta open-sourced its large model Llama 2, a competitor to GPT-4, allowing anyone to download, modify, and build it into their products for free. The move won praise from tech startups worried that Google, Microsoft, and OpenAI would try to monopolize the AI market and shut out competitors. But Meta has also been criticized for making it easier to use AI for malicious purposes, such as designing computer viruses or generating fake audio and images to commit fraud.
Amazon, the e-commerce giant long seen as lagging behind in the AI race, is also picking up the pace. At its re:Invent 2023 conference last week, Amazon Web Services (AWS) launched a generative AI assistant called "Amazon Q", which can "easily chat, generate content, and take action." Amazon Q focuses on the workplace rather than on consumers. Amazon will charge enterprise users a subscription fee of $20 per month, while the version for developers and IT staff will cost $25 per month.
