Google releases its most powerful model to take on OpenAI, shifting focus to AI agents
老好人啊
Posted 10 hours ago
After releasing its strongest quantum chip, Google has made another important move in AI.
In the early morning of December 12, Beijing time, Google released its new model, Gemini 2.0, before OpenAI announced the official launch of ChatGPT on the iPhone.
Google CEO Sundar Pichai said that this is Google's most powerful model to date. With multimodal advances such as native image and native audio output, Gemini 2.0 makes it possible to build new AI agents, bringing Google one step closer to its vision of a universal assistant.
Note that Gemini 2.0 is currently open mainly to developers and trusted testers. For now, the Gemini 2.0 Flash experimental model is available to all Gemini users.
Gemini 2.0 Flash builds on 1.5 Flash, previously Google's most popular model among developers. Compared with 1.5 Flash, Gemini 2.0 Flash improves performance while keeping the same fast response times. Google claims that 2.0 Flash even outperforms 1.5 Pro on key benchmarks while running at twice its speed.
2.0 Flash also adds new capabilities. In addition to supporting multimodal inputs such as images, video, and audio, it supports multimodal outputs, such as natively generated images mixed with text and controllable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
Gemini users worldwide can now try a chat-optimized version of 2.0 Flash on desktop and mobile web, and it will soon be available in the Gemini mobile app. Users can also try the Gemini assistant running on this new model. Early next year, Google will expand Gemini 2.0 to more products.
The biggest change in Gemini 2.0 is its shift in focus toward AI agents: it aims to become the foundation model for all AI agents. To that end, Google has built a series of prototypes on Gemini 2.0 that can help users complete tasks.
Among them, the upgraded Project Astra is a research prototype that explores the future capabilities of a universal AI assistant. Since introducing Project Astra at Google I/O, Google has been collecting feedback from trusted testers who use it on Android phones. The upgraded version can converse in multiple languages and in mixed languages, and can use new tools such as Google Search, Google Lens, and Google Maps. It can remember up to 10 minutes of conversation and understands speech with a latency close to that of human conversation.
The all-new Project Mariner explores the future of interaction between humans and agents, starting with the browser. Project Mariner is an early research prototype built with Gemini 2.0. It can understand and reason over the information on a browser page, including pixels and web elements such as text, code, images, and forms, and then help users complete tasks through an experimental Chrome extension. In this release, Project Mariner has improved on its previously slow speed.
In short, users can use this feature to let the browser help them complete specific tasks, such as batch searching for email addresses on certain websites, thereby achieving a certain degree of "automatic operation" of the browser.
Jules is a coding agent designed for developers, which can be directly integrated into GitHub workflows to assist developers in completing development tasks.
In Google's demonstration video, the presenter enters a long prompt containing a detailed programming request. Jules then analyzes the requirements and proposes a three-step plan. After the presenter clicks 'agree', the model begins programming automatically and generates the code. This should help developers work more efficiently.
At the end of last year, Google released Gemini 1.0, whose main strength was organizing and understanding information. Gemini 2.0, by contrast, aims to make that information more useful. Sundar Pichai said that Gemini 2.0's progress rests on Google's decade-long investment in full-stack AI innovation, and that the model is built on Trillium, Google's custom sixth-generation TPU.
Just as Google was drawing attention with its most powerful model, OpenAI's 12-day product launch event was still under way. On the same day, OpenAI showed the public ChatGPT's integration with Apple Intelligence, but the announcement was relatively unremarkable. The sudden release of Gemini 2.0 clearly drew much of the attention away from OpenAI.
Building on Gemini 2.0, Google has launched three agent products at once, taking another important step in its competition with Microsoft-backed OpenAI, Amazon, and Anthropic.
Intelligent agents have become the core battleground in the competition over large models. An intelligent agent is a system that can perceive its environment, make decisions, and take actions to achieve specific goals, and it is regarded as a key vehicle for putting Large Language Models (LLMs) into practical use.
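As a rough illustration of the perceive, decide, act cycle in that definition, here is a minimal Python sketch. The `ThermostatAgent` class, the `room` dictionary, and the `run` helper are hypothetical names invented for this example. They are not part of Gemini 2.0 or any product mentioned in this article, and an LLM-based agent would presumably replace the hand-written `decide` rule with a model call.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Toy agent (hypothetical): its goal is to bring a room to a target temperature."""
    target: float
    log: list = field(default_factory=list)

    def perceive(self, environment: dict) -> float:
        # Read the part of the environment the agent can observe.
        return environment["temperature"]

    def decide(self, temperature: float) -> str:
        # Hand-written decision rule; a real agent might call a model here.
        if temperature < self.target:
            return "heat"
        if temperature > self.target:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # Apply the chosen action to the environment and record it.
        delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += delta
        self.log.append(action)


def run(agent: ThermostatAgent, environment: dict, max_steps: int = 10) -> dict:
    """Repeat perceive -> decide -> act until the goal is met or steps run out."""
    for _ in range(max_steps):
        action = agent.decide(agent.perceive(environment))
        if action == "idle":  # goal reached, nothing left to do
            break
        agent.act(action, environment)
    return environment


room = {"temperature": 18.0}
agent = ThermostatAgent(target=21.0)
run(agent, room)
print(room["temperature"], agent.log)
```

The loop stops either when the agent decides there is nothing left to do or when the step budget is exhausted, which is one simple way to keep an autonomous loop bounded.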
Nearly two months ago, Microsoft released 10 AI agents for sales, operations, and other scenarios, and later announced that its Copilot Studio platform supports users in building autonomous agents, while also releasing 5 pre-built agents. At the just-concluded re:Invent 2024, Amazon released six large models at once, among them Amazon Nova Premier, a multimodal large model designed for complex reasoning tasks.
In both consumer and enterprise scenarios, AI agents offer wide room for imagination and clear commercial prospects. Several industry insiders predict that 2025 will be the year AI agents take off commercially. By then, competition over agents among technology giants such as Google and OpenAI will inevitably grow fiercer.
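CandyLake.com is an information publishing platform and provides only information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.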