Google releases its most powerful model to attack OpenAI, shifting focus to AI agents

After releasing its strongest quantum chip, Google has made another important move in AI.
On the early morning of December 12th Beijing time, Google released a new model Gemini 2.0 before OpenAI announced the official launch of ChatGPT on the iPhone.
Google CEO Sundar Pichai said that this is Google's most powerful model to date. With improvements in multimodal aspects such as native image and native audio output, Gemini 2.0 is able to build new AI agents, taking Google one step closer to its vision of building a universal assistant.
It should be pointed out that Gemini 2.0 is mainly open to developers and trusted testers. At present, the Gemini 2.0 Flash Experience model is open to all Gemini users.
Gemini 2.0 Flash is a model built on the foundation of 1.5 Flash, which was previously Google's most popular version among developers. Compared to 1.5 Flash, Gemini 2.0 Flash further enhances performance with the same fast response time. Google claims that 2.0 Flash even surpassed 1.5 Pro in key benchmark tests, with its speed being twice that of 1.5 Pro.
At the same time, 2.0 Flash also has new features. In addition to supporting multimodal inputs such as images, videos, and audio, it can also support multimodal outputs, such as directly generating content that mixes images and text, as well as native generation of controllable multilingual text to speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
Global Gemini users can now experience chat conversations optimized for 2.0 Flash on both computers and mobile devices, and this version will soon be released in the Gemini mobile application. Based on this new model, users can also experience the Gemini assistant. At the beginning of next year, Google will also expand Gemini 2.0 to more products.
The biggest change in Gemini 2.0 is the shift in focus towards AI agents, aiming to become the foundation model for all AI agents. Based on this, Gemini 2.0 has developed a series of prototypes that can help users complete corresponding tasks.
Among them, the upgraded version of Project Astra is used to explore the research prototype of future general AI assistant capabilities. Since the launch of Project Astra at Google I/O, Google has been collecting feedback from trusted testers who use it on Android phones. The upgraded version launched this time can enable conversations between multiple languages and mixed languages, and can also use new tools such as Google Search, Google Lens, and Google Maps. It can remember conversation content for up to 10 minutes and understand language with a latency close to that of human conversations.
The all-new Project Mariner explores the future development of interaction between humans and intelligent agents from a browser perspective. Project Mariner utilized early research prototypes built with Gemini 2.0, capable of understanding and inferring information in browser pages, including pixels and text, code, images, and forms, among other web elements, and then assisting users with corresponding tasks through experimental Chrome extensions. In this upgrade, Project Mariner has improved the previously slow speed issue.
In short, users can use this feature to let the browser help them complete specific tasks, such as batch searching for email addresses on certain websites, thereby achieving a certain degree of "automatic operation" of the browser.
Jules is a coding agent designed for developers, which can be directly integrated into GitHub workflows to assist developers in completing development tasks.
In Google's demonstration video, the presenter inputs a long string of prompts containing detailed programming questions. Jules will then analyze these requirements and provide a three-step programming solution. After clicking 'agree', the model will start automatic programming and generate code. This undoubtedly helps developers further improve their work efficiency.
At the end of last year, Google released the Gemini 1.0 model, whose main capability is to integrate and understand information. And Gemini 2.0 can make information more useful. Sundar Pichai stated that the progress of Gemini 2.0 is due to Google's 10-year investment in full stack AI innovation research, built on Google's customized hardware sixth generation TPU Trillium.
Just as Google was attracting attention with its most powerful model, OpenAI's 12 day product launch event was still ongoing. On the same day, OpenAI showcased the integration of ChatGPT and Apple Intelligence to the public, but the content was somewhat plain. The sudden release of Google Gemini 2.0 clearly stole a lot of attention from OpenAI.
With the support of Gemini 2.0, Google has launched three intelligent agent products in one go, which also means that it has taken another important step in the competition with Microsoft's OpenAI, Amazon, and Anthropic.
Intelligent agents have become the core direction of competition in the field of large models. The so-called intelligent agent refers to a system that can perceive the environment, make decisions, and take actions to achieve specific goals, and is regarded as a key carrier for the implementation and application of Large Language Models (LLMs).
Nearly two months ago, Microsoft released 10 AI agents for sales, operations, and other scenarios, and later announced that the Copilot Studio platform now supports users in building autonomous agents, while also releasing 5 pre built agents. At the just concluded 2024 re: Invent, Amazon released six large models in one go, among which Amazon Nova Premier is also a multimodal large model designed for complex reasoning tasks.
Whether in consumer or enterprise scenarios, AI agents have a lot of room for imagination and a clear commercial prospect. Several industry insiders predict that 2025 will be the year of the commercial explosion of AI intelligent agents. At that time, the competition among technology giants such as Google and OpenAI around intelligent agents will inevitably become increasingly fierce.

浏览过的版块