A few fresh moves, or everything at once? Google's developer conference fires off 22 announcements in a row to counter OpenAI

Faced with a precisely timed "sniping" release from OpenAI, Google unveiled 22 new features and products in a row at its annual I/O developer conference in the early hours of May 15, Beijing time, hoping a "blossom everywhere" strategy would win back the user attention that OpenAI had captured.
By comparison, on May 14 OpenAI showcased the striking interactive capabilities of GPT-4o in a 26-minute online livestream. The keynote at Google's developer conference ran 1 hour and 52 minutes, with product line leaders taking turns to showcase Google's capabilities in intelligent assistants, video generation, image generation, music creation, AI search, and more. In all, the event featured as many as 22 new features and upgrades.
A reporter from The Beijing News Shell Finance watched the entire keynote and found that Google launched many impressive new features and concepts: Project Astra, an intelligent assistant that answers its user's questions through a phone camera or AR glasses; Veo, a video model positioned against Sora; new AI search methods such as the Ask Photos feature; and the direct integration of Gemini into Android's underlying architecture.
However, as a veteran search company and former AI front-runner, Google has not forgotten its "original mission" of search. Liz Reid, head of Google's search business, demonstrated a series of new features combining search and AI on stage and left the audience with one phrase: "just ask." "Google can help you search, investigate, plan, brainstorm... all you need to do is ask."
AI assistant Astra can solve problems and find things through the camera, but it was shown only in a recorded video
At the keynote, Demis Hassabis, co-founder and CEO of DeepMind, presented a video. In it, a tester holding a mobile phone or wearing AR glasses "looks" at the surrounding scene while asking Google's AI assistant questions, such as "Tell me when you see something that can make a sound." The assistant, Project Astra, powered by the Gemini large model, answers fluently: "This is a speaker." The tester then drew a red arrow on the screen pointing at a black component of the speaker: "What is this part called?" "The tweeter (high-frequency speaker)."
In the video, Google's AI assistant performs on par with a human expert. When the user looks out of the window, the assistant immediately identifies the user's location: "This is clearly the King's Cross area of London." It can also understand drawings and images, for example by giving advice on a system diagram drawn on a whiteboard: "Adding a cache between the server and the database can improve speed."
Hassabis said that Project Astra is a prototype of the AI assistant he has been looking forward to for decades, and that it represents the future of general-purpose AI. "AI personal assistants can process information faster by continuously encoding video frames, combining video and voice inputs into event timelines, and caching this information for effective recall."
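The idea in that quote, merging video and voice inputs into one time-ordered event record and caching it for later recall, can be illustrated with a toy sketch. This is not Google's implementation: the `Event` and `EventTimeline` names, the keyword-matching recall, and the sample descriptions are all hypothetical, and plain strings stand in for real encoded frames and transcribed speech.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    timestamp: float   # seconds since the session started
    source: str        # "video" or "voice"
    description: str   # stand-in for an encoded frame or transcribed speech


class EventTimeline:
    """Bounded, time-ordered cache of recent events (hypothetical design)."""

    def __init__(self, max_events: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest event when full.
        self._events = deque(maxlen=max_events)

    def add(self, event: Event) -> None:
        self._events.append(event)

    def recall(self, keyword: str) -> Optional[Event]:
        """Return the most recent event whose description mentions keyword."""
        for event in reversed(self._events):
            if keyword.lower() in event.description.lower():
                return event
        return None


# Made-up sample data for illustration only.
timeline = EventTimeline(max_events=3)
timeline.add(Event(1.0, "video", "a speaker on the desk"))
timeline.add(Event(2.5, "voice", "user asks what the speaker part is called"))
timeline.add(Event(4.0, "video", "a whiteboard with a system diagram"))

hit = timeline.recall("speaker")
print(hit.timestamp if hit else "not found")
```

A real assistant would presumably search learned embeddings rather than keywords; the bounded cache here only shows why older observations eventually fall out of recall.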
Google CEO Sundar Pichai said that Google plans to add Astra features to its Gemini app and other products starting this year. However, he also emphasized that although the ultimate goal is to "achieve seamless connectivity in Astra's software," the product will be rolled out cautiously and "the commercialization path will be driven by quality."
However, Astra did not appear to show the ability to understand user emotions that GPT-4o demonstrated the previous day. And whereas OpenAI's demonstration was conducted live, Astra's capabilities were shown only in a recorded video. Hassabis did state firmly that the demonstration video had not been faked or tampered with.
Pichai said that Project Astra's multimodal chat features will come to the Gemini chatbot later this year.
Gemini 1.5 Pro update doubles the context window from 1 million to 2 million tokens
Gemini, the large model behind Google's assistant, has also been upgraded. At the developer conference, Pichai announced a major update to Gemini 1.5 Pro: Google has increased its context length from the original 1 million tokens (the units of text a model processes) to 2 million tokens. The upgrade should substantially expand how much data the model can handle in a single request, making it better suited to larger and more complex inputs.
The upgraded Gemini 1.5 Pro has achieved significant improvements on multiple public benchmarks, particularly in image and video understanding. The model can not only understand text but also accurately interpret the information in images and videos.
It is understood that Gemini 1.5 Pro can reason over both the images and the audio of videos uploaded to Google AI Studio. In addition, Google has integrated 1.5 Pro into its own products, such as Gemini Advanced and the Workspace applications. In terms of pricing, Gemini 1.5 Pro costs $3.50 per 1 million tokens.
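As a rough illustration of what that rate means in practice, the sketch below estimates the cost of a request from its token count. It assumes the quoted $3.50 applies as a flat rate to every token, which the article does not specify; actual pricing may differ by token type or prompt length. The function name `estimate_cost_usd` is made up for this example.

```python
# Rate quoted in the article; assumed here to be a flat per-token rate.
PRICE_PER_MILLION_TOKENS_USD = 3.50


def estimate_cost_usd(
    num_tokens: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """Estimate the cost in US dollars of processing num_tokens tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1_000_000 * price_per_million


# Filling the full 2-million-token context window once:
print(f"${estimate_cost_usd(2_000_000):.2f}")  # prints $7.00
```

Under this assumption, a single request that fills the entire 2-million-token window would cost about $7.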
Google has also launched Gemini 1.5 Flash, which is optimized for speed and efficiency. It is the fastest model in the Gemini series available through the API (application programming interface). It is designed for large-scale, high-frequency tasks, offers more cost-effective service, and has a context window of 1 million tokens.
Google announced that Gemini 1.5 Pro will be open to developers worldwide. This means that both professional developers and hobbyists can explore and use the model in depth.
Text-to-everything? Flexing muscle across video, images, and music
In addition to answering the new intelligent assistant features OpenAI launched the day before, Google also showcased a series of generative AI models, including Veo, a text-to-video model positioned against Sora; Music AI Sandbox, an AI music creation tool positioned against Suno; and Imagen 3, Google's highest-quality text-to-image model.
Among them, the most anticipated was Google's text-to-video model. When Hassabis revealed the Veo logo, the audience erupted in the most enthusiastic applause of the event.
Hassabis said that Veo is the culmination of Google's work in video generation, building on technologies the company has developed over the years, such as the Generative Query Network (GQN). From a single text, image, or video prompt, Veo can generate and edit high-quality 1080p videos in different visual styles that run for over 70 seconds, and the video length can be extended arbitrarily.
The Veo-generated video that Google showed at the keynote is a sequence of shots of a car driving from a cyberpunk-style night into a modern, realistic daytime scene. The video is relatively blurry in the dark portion and clear, with high quality, in the daytime portion. However, the Shell Finance reporter noticed that for most of the video the camera simply followed the rear of the car, and the result looked less polished than Sora's output, which includes more shots from different angles.
According to the promotional video, a film director has also used Veo. "Veo helps us turn inspiration into reality," the director said. "Artificial intelligence can help us quickly identify errors in our ideas and correct them, improving efficiency." Google stated that, with a deep understanding of natural language and visual semantics, the Veo model has made breakthroughs in understanding video content, rendering high-definition images, and simulating physical principles, among other areas, so that the videos it generates can express the user's creative intent accurately and in detail.
Starting May 15, Google will offer a preview version of Veo to some creators in VideoFX, and creators can join Google's waiting list. Google will also bring some of Veo's features to products such as YouTube Shorts.
It is worth noting that, in response to earlier reports that OpenAI relied on YouTube video content to train the Sora model (Google is the parent company of YouTube), Pichai said that if Google confirms the reports are true, it will "solve this problem".
"All you need to do is ask"
Pichai said in his speech that one of the most exciting changes brought about by Gemini is in Google Search. "One of our biggest investment and innovation areas is our founding product: search." Pichai recalled that Google created its search engine 25 years ago, and said that in the Gemini era search has reached a new level.
Pichai demonstrated a new feature called "Ask Photos" on stage. In the past, a user paying at a parking lot who had forgotten their license plate number would have had to search their phone's photos by keyword and scroll through a large number of old pictures to find the plate. Now, Google Photos is smart enough to work out which car is the user's based on where it was photographed, how often it has appeared in photos over the years, and other data, and then return the actual license plate number in a text reply along with a photo that confirms it.
Another new feature is AI Overviews, which presents users with a complete answer including viewpoints, insights, and links, rather than a traditional list of search results. Users can type a question into the search box to get an AI-generated summary answer, and the feature can handle very long questions.
For example, a user looking for a suitable yoga or Pilates studio has to weigh factors such as schedule, price, and distance at the same time. AI search can extract and integrate this information and present it in the AI overview, ultimately displaying discount details for the best yoga studios in Boston and the walking time from home, saving the user several hours. The feature also applies to planning trips, gatherings, and meals.
Pichai said that Google's AI Overviews have three unique advantages: real-time information, Google's ranking and quality systems, and the capabilities of the Gemini model. The feature will be rolled out gradually, first to users in the United States and then to other countries.
In addition, Google will soon launch a video search feature. Rose Yao, Vice President of Search Products, demonstrated on stage how to film a broken record player with a phone camera and then ask Google a question about it. She received answers on which part was broken and how to repair it.
It is worth noting that, as the developer of Android, Google has stated its intention to build "system-level AI", which means embedding Gemini in the foundation of the Android system. When Gemini runs at the system level, users will not need to install any AI app and can use the related functions directly in the mobile operating system.
For example, when a user is watching a video, their phone can pop up a prompt asking if they want to know more about the video. When the user asks about the details in the video, Gemini can directly find the answer from the video.
Google specifically emphasized that these experiences are available only on Android phones, seemingly a pointed contrast to OpenAI's use of Apple phones and computers in its demonstrations. The "clash of titans" between Google and OpenAI will continue on the operating system front.
However, Pichai also said in a post-event interview that Google does not rule out maintaining a partnership with Apple. "We have always been committed to providing an excellent experience for the Apple ecosystem, and I believe we have many ways to ensure that our products are accessible. Today, we see that AI Overviews has become a popular feature on iOS, so we will continue to work hard."
