Google's Big Model Finally Takes a Big Step: Gemini vs GPT-4
阿豆学长长ov
Published on 2023-12-08 10:00:39
On December 6th, US time, Google officially released the Gemini model. Google CEO Sundar Pichai stated that this is Google's most powerful and versatile model to date.
It has been one year and one week since ChatGPT was released. With ChatGPT, OpenAI became the most dazzling company in artificial intelligence, especially in large models, and the catch-up target for every other technology company, including Google.
For the past eight years, Google has made "AI first" its corporate strategy, and AlphaGo, which defeated the human Go champion in 2016, came out of Google's DeepMind. It is no exaggeration to say that Google sparked the AI wave that reshaped the entire industry, yet it now urgently needs to prove itself in the field of large models.
According to Google, Gemini 1.0 comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Nano is designed to run on-device, and the Pixel 8 Pro will be the first smartphone to ship with it; Gemini Pro is built to scale across a wide range of tasks, and Google plans to use it to upgrade its chatbot Bard as well as more Google products, including Search, Ads, Chrome, and others.
As for the most powerful tier, Gemini Ultra, Google said it is still undergoing trust and safety checks and is being further refined through fine-tuning and reinforcement learning from human feedback (RLHF). It is expected to become available to developers and enterprise customers early next year.
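For developers, access to Gemini Pro is expected to come through Google's AI developer tooling. The snippet below is a minimal sketch of what a text-only call might look like, assuming the interface of the google-generativeai Python client; the package, model name, and availability are assumptions rather than details confirmed in this announcement.

# Minimal sketch of a text call to Gemini Pro via the assumed
# google-generativeai Python client; model name and availability are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")        # placeholder API key
model = genai.GenerativeModel("gemini-pro")    # the Pro tier described above
response = model.generate_content("Summarize the Gemini 1.0 model lineup.")
print(response.text)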
Sundar Pichai said that the release of Gemini is an important milestone in the development of artificial intelligence and the start of a new era for Google.
Beyond GPT-4?
According to Demis Hassabis, CEO of Google DeepMind, Gemini is a multimodal model built by the Google team from the ground up, which means it can seamlessly understand, process, and summarize different types of information, including text, code, audio, images, and video.
In performance testing, Gemini Ultra exceeded the current state of the art on 30 of 32 widely used benchmarks for large language models. On MMLU (Massive Multitask Language Understanding), Gemini Ultra scored 90%, becoming the first large model to outperform human experts on that benchmark.
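For context, MMLU is a multiple-choice exam spanning 57 subjects, and the reported score is plain accuracy. The sketch below illustrates how such a score is computed; the data layout and the model_answer callable are hypothetical placeholders, not part of the benchmark's official harness.

# Illustrative scoring loop for an MMLU-style multiple-choice benchmark.
def mmlu_accuracy(questions, model_answer):
    """questions: dicts with 'prompt', 'choices', and gold 'answer' ('A'-'D');
    model_answer: a callable returning the model's chosen letter."""
    correct = sum(
        1 for q in questions
        if model_answer(q["prompt"], q["choices"]) == q["answer"]
    )
    return correct / len(questions)

# A 90% MMLU score means roughly nine out of every ten of the benchmark's
# multiple-choice questions were answered with the correct letter.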
Demis Hassabis stated that on image benchmarks, Gemini Ultra surpassed previous state-of-the-art models without the help of optical character recognition (OCR) systems. These benchmarks highlight Gemini's multimodal ability and also show early signs of more complex reasoning.
Until now, the standard way to build multimodal models has been to train separate components for different modalities and then stitch them together. Models built this way can perform well on certain tasks, such as describing images, but often struggle with more complex reasoning.
"We designed Gemini as a native multimodal model, which was pre trained for different modalities from the beginning, and then we fine tuned it with additional multimodal data to further improve its performance." Demis Hassabis explained, "This helps Gemini seamlessly understand and reason various inputs from the beginning, far superior to existing multimodal models, and its capabilities have reached the most advanced level in almost all fields."
For example, in terms of reasoning, Gemini 1.0 can understand complex written and visual information. By reading, filtering, and understanding information, it can extract insights from hundreds of thousands of documents.
In addition, Gemini 1.0 has been trained to recognize and understand text, images, and audio simultaneously, so it can pick up on nuanced information and answer questions on complex topics, including reasoning in disciplines such as mathematics and physics.
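As a concrete illustration of that kind of mixed input, the sketch below sends an image plus a text question in a single prompt, again assuming the google-generativeai client used earlier and a vision-capable model name; the file name is a hypothetical placeholder.

# Sketch of a multimodal (text + image) prompt via the assumed SDK.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")              # placeholder API key
model = genai.GenerativeModel("gemini-pro-vision")   # assumed vision-capable tier
diagram = Image.open("physics_diagram.png")          # hypothetical local file
response = model.generate_content(
    ["Explain the physical setup in this diagram and describe the forces involved.",
     diagram]
)
print(response.text)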
In coding, Gemini 1.0 can understand, explain, and generate high-quality code in the world's most popular programming languages, such as Python, Java, C++, and Go. Two years ago, Google's DeepMind launched the AI code-generation system AlphaCode. Now, powered by Gemini, it has been iterated into AlphaCode 2, whose performance has improved markedly: it solves nearly twice as many problems as its predecessor.
Still optimizing for safety
Sundar Pichai said that millions of people are now using generative AI in Google products to do things they could not do a year ago, from getting answers to more complex questions to collaborating and creating with new tools. Meanwhile, developers are using Google's models and infrastructure to build new generative AI applications, and startups and businesses around the world continue to grow using Google's AI tools.
In his view, this momentum is already remarkable, and yet it is only the beginning.
"We are boldly and responsibly carrying out this work. This means that our research needs to be ambitious, pursuing the ability to bring enormous benefits to humanity and society, while also establishing safeguards and collaborating with governments and experts to address the risks that arise as AI becomes stronger," said Sandal Pichai.
During Gemini's development, Google therefore strengthened its safety reviews. Demis Hassabis said that, guided by Google's AI principles and product safety policies, the team is adding new protections to account for Gemini's multimodal capabilities.
He also emphasized that at every stage of development Google considers potential risks and works to test and mitigate them.
Gemini has reportedly undergone the most comprehensive safety evaluations of any Google AI model to date, including evaluations for bias and harmful content. To identify blind spots in its internal evaluation methods, Google is also working with external experts and teams to stress-test Gemini across a range of issues.
Another noteworthy point is that Gemini was trained on Google's own Tensor Processing Units (TPUs), v4 and v5e, on which it runs faster and at lower cost than Google's earlier models. Alongside the new model, Google also announced a new TPU system, Cloud TPU v5p, designed specifically for training cutting-edge AI models; it will be used for Gemini development as well.
Industry insiders told reporters that although Gemini surpasses GPT-4 on many performance measures, there is still a time gap between Google and OpenAI: GPT-4 has been out for more than half a year, and its successor is presumably already in development.
"So for Google, comparing various benchmark tests with GPT-4 is only one aspect of demonstrating its current capabilities, and whether it can rely on its own accumulation and powerful resources to shorten the time gap with OpenAI is the key," the person pointed out. In addition, as a new infrastructure built by Google in the era of big models, whether Gemini can meet the needs of daily users and enterprise customers is the true standard for testing Gemini's capabilities, rather than testing data.
Demis Hassabis said that Google has begun experimenting with Gemini in Search, where it has made the Search Generative Experience (SGE) faster, cutting latency by 40% for English searches in the United States while also improving quality.
And while accelerating the rollout of Gemini 1.0, Google is also expanding the capabilities of future versions, including larger context windows so the model can process more information and give better responses.