Google Snipes OpenAI, Concentrates Fire on Attacking AI Agents

六月清晨搅发表于 7 小时前

114 0 0

On December 12th, as OpenAI announced the full integration of ChatGPT with Apple, Google released a new generation of big model Gemini 2.0. It is worth noting that Gemini 2.0 is specifically designed for AI agents.
Google CEO Sundar Pichai stated in an open letter, "Over the past year, we have been investing in developing more 'proxy' models that can better understand the world around you, think multiple steps ahead, and perform tasks under your supervision. Today, we are pleased to welcome a new generation of models - Gemini 2.0, which is our most powerful model to date. Through new advances in multimodality, such as native image and audio output, as well as the use of native tools, we are able to build new AI agents that bring us closer to the vision of universal AI assistants
Demis Hassabis, CEO of Google DeepMind, also stated that 2025 will be the era of AI agents, and Gemini 2.0 will be the latest generation model to support our work based on agents.
At present, Gemini 2.0 version has not been officially launched, and Google has stated that it has been provided to some developers for internal testing. The Gemini 2.0 Flash experimental version, which is stronger than Gemini 1.5 Pro, was launched immediately. The experimental version has been opened on the web, and Gemini users can access Gemini 2.0 Flash through the PC end. The mobile end is about to be launched.
According to benchmark test results released by Google, in terms of multimodal image and video capabilities, as well as encoding and mathematical abilities, the Flash experimental version of Gemini 2.0 almost outperforms Gemini 1.5 Pro in all aspects, and its response speed has been doubled.
Google focuses its firepower on fiercely attacking AI intelligent agents
Through Google's latest update, we can now glimpse a corner of the glacier in its AI layout - everything for intelligent agents.
1. More powerful multimodal capabilities:
Gemini 2.0 Flash Experimental Edition not only supports multimodal inputs such as images, videos, and audio, but also multimodal outputs such as native generated images combined with text, as well as controllable multilingual text to speech (TTS) audio.
2. More professional AI search:
Google has launched a new intelligent agent feature called Deep Research in Gemini Advanced. This feature combines Google's search expertise with Gemini's advanced reasoning abilities to generate research reports around a complex topic, serving as a personal research assistant.
3. Multiple intelligent agents have been updated and launched:
Updated the intelligent agent Project Astra based on Gemini 2.0: Astra's new features include support for multilingual mixed dialogue; Ability to directly call Google Lens and map functions in Gemini applications; Improved memory ability, with up to 10 minutes of intra session memory, resulting in more coherent conversations; With the help of new streaming processing technology and native audio understanding capabilities, this intelligent agent is able to understand language with a latency close to human dialogue. It is worth noting that Astra is a forward-looking project developed by Google for the glasses project. Google mentioned that it is porting Project Astra to more mobile devices such as glasses.
Release Project Mariner, an intelligent agent for browsers: This agent is capable of understanding and inferring information on the browser screen, including pixels and web elements such as text, code, and images, and then using this information through Chrome extensions to help you complete tasks.
Release AI programming agent Jules specially designed for developers: Jules supports direct integration into GitHub workflows, allowing users to describe problems in natural language and generate code that can be merged into GitHub projects;
Release game intelligent agent: capable of real-time interpretation of screen images, providing next operation suggestions through user actions on the game screen, or directly communicating with you through voice communication while you are playing games.
Google has stated that it will expand Gemini 2.0 to more of its products early next year. The previously launched AI Overviews will integrate Gemini 2.0 to enhance complex problem-solving capabilities, including advanced mathematical formulas, multimodal queries, and programming. Limited testing has been conducted this week, and it is expected to be promoted next year and expanded to more countries and languages.

Google Snipes OpenAI, Concentrates Fire on Attacking AI Agents

馬雲は1ヶ月に2回現れてどんな信号を伝えますか。

国際金価格、過去最高を更新来年3000ドル突破か？

「トランプ2.0」が米株を狂わせる中、一部のアナリストはなぜ水を差すのか。一文で読む

グーグルが最強の量子コンピューティングチップを発表、時価総額は一夜にして8000億元を超えた！科学技術の大物マースク、ウルトラマンが「いいね」