In the early morning of August 14, Beijing time, Google officially released its voice assistant Gemini Live at the "Made by Google" conference. The feature directly challenges OpenAI's GPT-4o voice mode and marks another step toward more natural, universal, and user-friendly AI interaction.
According to Google, users can hold free-flowing conversations with Gemini Live instead of relying on the traditional pattern of entering a prompt and waiting for the output.
During a conversation, users can interrupt to ask for more detail, or pause and resume later.
To make conversations feel more natural, Google also offers ten voices for users to choose from. Google said, "It's like having a companion in your pocket that you can talk to about new ideas or practice important conversations with."
The GPT-4o advanced voice mode previously released by OpenAI also lets users interrupt during conversations, and it can perceive and respond to emotional fluctuations. As for voice options, OpenAI offers four voices, all produced in collaboration with professional voice actors.
Google will also connect Gemini Live with other applications and tools. It has announced that extensions for Keep, Tasks, Utilities, Calendar, and YouTube Music, among others, will launch in the coming weeks.
Google described specific scenarios for these features. For example, if a user is hosting a dinner party, Gemini Live can find recipes, add the ingredients to a Keep shopping list, and put together a playlist that "reminds people of the late 1990s". In another example, after the user takes a photo of a concert poster, Gemini Live can check whether the user is free that day and set a reminder to buy tickets.
However, the live demonstration of Gemini Live at the "Made by Google" conference hit a small snag. Google executive Dave Citron asked Gemini Live whether there were any events on his schedule, but it failed to respond twice in a row. The demonstration succeeded only on the third attempt, after he switched to another device.
Currently, Google offers Gemini Live in English to Gemini premium subscribers on Android phones. It will expand to iOS and add more languages in the coming weeks. Google's newly released Pixel 9 series phones also include Gemini Live.
Industry insiders believe that the release of Gemini Live is an important milestone in the development of AI interaction. By introducing voice interruption and voice selection, Google is not only competing with OpenAI but also advancing human-computer interaction. This could change the competitive landscape of the AI chatbot market and push other companies to build more natural, practical, and attractive AI assistants.
At the same time, innovation in human-computer interaction brings new problems and challenges. For example, how can an AI handle rapid topic changes while keeping the conversation coherent and relevant? How can it filter out distracting information without losing important clues? More importantly, as AI continues to develop, where does the boundary between it and real life lie?
Meanwhile, GPT-4o, which OpenAI publicly introduced three months ago, has not yet been fully rolled out. On August 9th, OpenAI released a blog post about safety, detailing the company's safety work in developing GPT-4o and exploring the potential risks these technologies may pose to society.
In the report, OpenAI pointed out the risks that a human-like mode of social interaction with AI may pose. OpenAI believes that users may form social relationships with the AI and feel less need for human interaction. This may benefit lonely individuals, but it could also affect healthy interpersonal relationships.
OpenAI revealed that during early testing of GPT-4o, it observed subtle changes in the language users used when interacting with the model, such as "This is our last day together." Such seemingly harmless expressions may point to bigger problems.
OpenAI also mentioned that GPT-4o sometimes unintentionally generates output that mimics the user's voice, which means that AI speech engines could be used for fraud.
These safety issues are one of the reasons why OpenAI is controlling the pace of the GPT-4o rollout. Whether Google's Gemini Live has addressed similar safety risks has not been disclosed.
All safety risks, both those we are already aware of and those that may yet emerge from this Pandora's box, need to be further addressed by the AI field to ensure that technological progress serves humanity.