Google's Big Model Finally Takes a Big Step: Gemini vs GPT-4
阿豆学长长ov
Published on 2023-12-08 10:00:39
On December 6th, US time, Google officially released the Gemini model. Google CEO Sundar Pichai stated that this is Google's most powerful and versatile model to date.
It has been one year and one week since ChatGPT was released. With ChatGPT, OpenAI became the most dazzling company in artificial intelligence, especially in large models, and the catch-up target for every other technology company, including Google.
For the past eight years, Google has made "AI first" its corporate strategy, and AlphaGo, which defeated the human Go champion in 2016, came out of Google's DeepMind. It is no exaggeration to say that Google sparked the AI wave that reshaped the entire industry, yet it now urgently needs to prove itself in the field of large models.
According to Google, Gemini 1.0 comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Nano is designed to run on-device, and the Pixel 8 Pro will be the first smartphone to ship with it; Gemini Pro is built to scale across a wide range of tasks, and Google plans to use it to upgrade its chatbot Bard as well as more Google products, including Search, Ads, Chrome, and others.
As for the most powerful tier, Gemini Ultra, Google said it is still undergoing trust and safety checks and is being further refined through fine-tuning and reinforcement learning from human feedback (RLHF). It is expected to become available to developers and enterprise customers early next year.
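For developers, access to Gemini Pro is expected to come through Google's AI developer tooling. The snippet below is a minimal sketch of what a text-only call might look like, assuming the interface of the google-generativeai Python client; the package, model name, and availability are assumptions rather than details confirmed in this announcement.

# Minimal sketch of a text call to Gemini Pro via the assumed
# google-generativeai Python client; model name and availability are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")        # placeholder API key
model = genai.GenerativeModel("gemini-pro")    # the Pro tier described above
response = model.generate_content("Summarize the Gemini 1.0 model lineup.")
print(response.text)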
Sundar Pichai said that the release of Gemini is an important milestone in the development of artificial intelligence and the start of a new era for Google.
Beyond GPT-4?
According to Demis Hassabis, CEO of Google DeepMind, Gemini is a multimodal model built by the Google team from the ground up, which means it can seamlessly understand, process, and summarize different types of information, including text, code, audio, images, and video.
In performance testing, Gemini Ultra exceeded the current state of the art on 30 of 32 widely used benchmarks for large language models. On MMLU (Massive Multitask Language Understanding), Gemini Ultra scored 90%, becoming the first large model to outperform human experts on that benchmark.
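For context, MMLU is a multiple-choice exam spanning 57 subjects, and the reported score is plain accuracy. The sketch below illustrates how such a score is computed; the data layout and the model_answer callable are hypothetical placeholders, not part of the benchmark's official harness.

# Illustrative scoring loop for an MMLU-style multiple-choice benchmark.
def mmlu_accuracy(questions, model_answer):
    """questions: dicts with 'prompt', 'choices', and gold 'answer' ('A'-'D');
    model_answer: a callable returning the model's chosen letter."""
    correct = sum(
        1 for q in questions
        if model_answer(q["prompt"], q["choices"]) == q["answer"]
    )
    return correct / len(questions)

# A 90% MMLU score means roughly nine out of every ten of the benchmark's
# multiple-choice questions were answered with the correct letter.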
Demis Hassabis stated that on image benchmarks, Gemini Ultra surpassed previous state-of-the-art models without the help of optical character recognition (OCR) systems. These benchmarks highlight Gemini's multimodal ability and also show early signs of more complex reasoning.
Until now, the standard way to build multimodal models has been to train separate components for different modalities and then stitch them together. Models built this way can perform well on certain tasks, such as describing images, but often struggle with more complex reasoning.
"We designed Gemini as a native multimodal model, which was pre trained for different modalities from the beginning, and then we fine tuned it with additional multimodal data to further improve its performance." Demis Hassabis explained, "This helps Gemini seamlessly understand and reason various inputs from the beginning, far superior to existing multimodal models, and its capabilities have reached the most advanced level in almost all fields."
For example, in terms of reasoning, Gemini 1.0 can understand complex written and visual information. By reading, filtering, and understanding information, it can extract insights from hundreds of thousands of documents.
In addition, Gemini 1.0 has been trained to recognize and understand text, images, and audio simultaneously, so it can pick up on nuanced information and answer questions on complex topics, including reasoning in disciplines such as mathematics and physics.
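As a concrete illustration of that kind of mixed input, the sketch below sends an image plus a text question in a single prompt, again assuming the google-generativeai client used earlier and a vision-capable model name; the file name is a hypothetical placeholder.

# Sketch of a multimodal (text + image) prompt via the assumed SDK.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")              # placeholder API key
model = genai.GenerativeModel("gemini-pro-vision")   # assumed vision-capable tier
diagram = Image.open("physics_diagram.png")          # hypothetical local file
response = model.generate_content(
    ["Explain the physical setup in this diagram and describe the forces involved.",
     diagram]
)
print(response.text)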
In coding, Gemini 1.0 can understand, explain, and generate high-quality code in the world's most popular programming languages, such as Python, Java, C++, and Go. Two years ago, Google's DeepMind launched the AI code-generation system AlphaCode. Now, powered by Gemini, it has been iterated into AlphaCode 2, whose performance has improved markedly: it solves nearly twice as many problems as its predecessor.
Still optimizing for safety
Sundar Pichai said that millions of people are now using generative AI in Google products to do things they could not do a year ago, from getting answers to more complex questions to collaborating and creating with new tools. Meanwhile, developers are using Google's models and infrastructure to build new generative AI applications, and startups and businesses around the world continue to grow using Google's AI tools.
In his view, this momentum is already remarkable, and yet it is only the beginning.
"We are boldly and responsibly carrying out this work. This means that our research needs to be ambitious, pursuing the ability to bring enormous benefits to humanity and society, while also establishing safeguards and collaborating with governments and experts to address the risks that arise as AI becomes stronger," said Sandal Pichai.
During Gemini's development, Google therefore strengthened its safety reviews. Demis Hassabis said that, guided by Google's AI principles and product safety policies, the team is adding new protections to account for Gemini's multimodal capabilities.
He also emphasized that at every stage of development Google considers potential risks and works to test and mitigate them.
Gemini has reportedly undergone the most comprehensive safety evaluations of any Google AI model to date, including evaluations for bias and harmful content. To identify blind spots in its internal evaluation methods, Google is also working with external experts and teams to stress-test Gemini across a range of issues.
Another noteworthy point is that Gemini was trained on Google's own Tensor Processing Units (TPUs), v4 and v5e, on which it runs faster and at lower cost than Google's earlier models. Alongside the new model, Google also announced a new TPU system, Cloud TPU v5p, designed specifically for training cutting-edge AI models; it will be used for Gemini development as well.
Industry insiders told reporters that although Gemini surpasses GPT-4 on many performance measures, there is still a time gap between Google and OpenAI: GPT-4 has been out for more than half a year, and its successor is presumably already in development.
"So for Google, comparing various benchmark tests with GPT-4 is only one aspect of demonstrating its current capabilities, and whether it can rely on its own accumulation and powerful resources to shorten the time gap with OpenAI is the key," the person pointed out. In addition, as a new infrastructure built by Google in the era of big models, whether Gemini can meet the needs of daily users and enterprise customers is the true standard for testing Gemini's capabilities, rather than testing data.
Demis Hassabis said that Google has begun experimenting with Gemini in Search, where it has made the Search Generative Experience (SGE) faster, cutting latency by 40% for English searches in the United States while also improving quality.
And while accelerating the rollout of Gemini 1.0, Google is also expanding the capabilities of future versions, including larger context windows so the model can process more information and give better responses.