
In one video, a man leans back in slow motion to dodge, and the AI immediately guesses that this is the "bullet time" scene from "The Matrix."
In another, a person picks up a brush, sketches a duck on a piece of paper, and colors it blue. This time the AI remarks, "This is not a common color for ducks."
Three empty cups sit side by side on a table, and a blue paper ball is dropped into one of them. After a dazzling shuffle by the human operator, the AI guesses correctly: "The paper ball is in the leftmost cup!"
"Hey" give AI a handwritten physics question, which not only can be understood, but also can distinguish between right and wrong handwritten answers, and provide explanations for step-by-step problem-solving details.
Upload photos of ingredients together with voice input, and the AI can not only walk you through the cooking but also offer suggestions at each stage.
These clips come from Google's demonstrations of Gemini, its latest AI model, released on December 6.
"We are getting closer and closer to the vision of a new generation of artificial intelligence models." After a series of video demonstrations, Eli Collins, Vice President of Google DeepMind Products, told media including First Financial that this is Google's most powerful and versatile large model to date.
Compared with existing large models on the market, Gemini was built as a multimodal model from the start, which means it can summarize and smoothly understand, manipulate, and combine different types of information, including text, code, audio, images, and video. It is also flexible enough to run anywhere from data centers to mobile devices.
Gemini is seen as Google's "big move" in the field of large AI models. Years ago, Google set off a worldwide AI wave with AlphaGo's stunning performance. This time, however, Google faces considerable pressure in the new wave triggered by OpenAI's ChatGPT, and it urgently needs a phenomenal AI product to prove its strength in artificial intelligence.
Can it beat GPT-4?
Just before Google released its latest large model, Microsoft announced a major upgrade to its AI assistant Copilot, which will integrate OpenAI's newest model, GPT-4 Turbo.
"Being late is better than not! Finally, there is a strong contender for the OpenAI throne." After Google announced the news, Nvidia AI scientist Jim Fan (Fan Linxi) immediately forwarded and commented.
Google CEO Sundar Pichai said that Gemini, a model for a new era, represents one of the greatest science and engineering efforts Google has undertaken as a company. He added that it is the first realization of the vision Google set out when it formed Google DeepMind earlier this year.
In April of this year, perhaps sensing the challenge posed by OpenAI's collaboration with Microsoft, and in order to accelerate its pursuit of artificial general intelligence (AGI), Google merged Google Brain, the team that had produced TensorFlow and the Transformer, with DeepMind, the team that had sparked the previous AI wave with AlphaGo and built AlphaFold for protein structure prediction, to form Google DeepMind. Outsiders jokingly dubbed the combined team the "AI Avengers." Eli Collins, formerly a Google AI product manager, became the new team's Vice President of Product at that time.
Now Google DeepMind has released the first version, Gemini 1.0, optimized in three sizes: Ultra, Pro, and Nano. Gemini Ultra is currently Google's largest and most capable model, suited to highly complex tasks; Gemini Pro scales across a wide range of tasks; Gemini Nano is designed mainly to run on end devices.
After Gemini's release, what outsiders cared about most was its challenge to OpenAI's GPT-4. During the interview, the reporter asked Eli Collins: "Can Gemini beat all the large models on the market, including GPT-4?"
In his response, Collins said the team has been rigorously testing and evaluating Gemini's performance across a wide range of tasks. From natural image, audio, and video understanding to mathematical reasoning, Gemini Ultra exceeds the current state of the art on 30 of the 32 academic benchmarks widely used in large language model (LLM) research and development.
He cited results from MMLU, on which Gemini Ultra scores 90%, making it the first model to surpass human experts on that test. MMLU combines 57 subjects, including mathematics, physics, history, law, medicine, and ethics, to test both world knowledge and problem-solving ability. By comparison, human experts score 89.8% and GPT-4 scores 86.4%.
On the multimodal side, Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains that demand deliberate reasoning from large models.
The technical principles behind Gemini's multimodality have also drawn industry attention. Jeff Dean, Chief Scientist of Google DeepMind, and his team wrote a 60-page technical report elaborating on them.
Until now, the standard way to create a multimodal model has been to train separate components for different modalities and then stitch them together to roughly approximate certain capabilities. Such models can sometimes perform specific tasks well, such as describing images, but they fall short on more conceptual, complex reasoning.
According to DeepMind CEO Demis Hassabis, the team designed Gemini to be natively multimodal, pretraining it on different modalities from the beginning and then fine-tuning it with additional multimodal data to further sharpen its effectiveness. This lets Gemini smoothly understand and reason about all kinds of input from the outset, far better than existing multimodal models.
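To make the contrast with "stitched-together" models concrete, here is a minimal sketch of the native-multimodal idea: map every modality into one shared token space and train a single transformer on the interleaved sequence. This is an illustration in PyTorch, not Gemini's actual architecture; the NativeMultimodalLM class and all names and sizes are hypothetical.

```python
# Minimal sketch of natively multimodal pretraining (illustrative only):
# all modalities share one token space and one transformer backbone.
import torch
import torch.nn as nn

class NativeMultimodalLM(nn.Module):  # hypothetical name
    def __init__(self, vocab=32000, patch_dim=768, audio_dim=128, d=512):
        super().__init__()
        # Per-modality adapters project raw features into the shared width d.
        self.text_embed = nn.Embedding(vocab, d)
        self.image_proj = nn.Linear(patch_dim, d)   # image patch features
        self.audio_proj = nn.Linear(audio_dim, d)   # audio frame features
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d, vocab)          # next-token prediction

    def forward(self, text_ids, image_patches, audio_frames):
        # Interleave all modalities into one sequence: the single backbone
        # attends across text, image, and audio tokens jointly.
        seq = torch.cat([
            self.image_proj(image_patches),
            self.audio_proj(audio_frames),
            self.text_embed(text_ids),
        ], dim=1)
        return self.lm_head(self.backbone(seq))

# Toy forward pass: 4 image patches, 6 audio frames, 5 text tokens.
model = NativeMultimodalLM()
logits = model(torch.randint(0, 32000, (1, 5)),
               torch.randn(1, 4, 768),
               torch.randn(1, 6, 128))
print(logits.shape)  # torch.Size([1, 15, 32000])
```

The design point is that one backbone attends across modalities jointly during pretraining, rather than gluing separately trained encoders together afterwards.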
This complex multimodal reasoning capability helps Gemini make sense of intricate written and visual information, allowing it to uncover hard-to-discern insights in massive amounts of data, answer questions on complex topics, and explain its reasoning particularly well in demanding subjects such as mathematics and physics.
Take problem-solving as an example. Using Gemini's multimodal reasoning, the AI can read messy handwriting, correctly interpret how a problem is stated, and convert both the problem and its solution into clean mathematical typesetting. It can identify the specific reasoning step at which a human went wrong and then provide the correct solution step by step.
In addition, by reading, filtering, and understanding information, it can extract data points and insights from hundreds of thousands of documents, which could help deliver new breakthroughs at digital speed in fields from science to finance.
Behind the multimodal model, Google used its self-developed cloud chips, TPU v4 and v5e, to train Gemini 1.0 at scale on AI-optimized infrastructure.
On the same day, Google also released its latest TPU system, Cloud TPU v5p, which it says trains models 2.8 times faster than the previous generation and should help developers and enterprise customers train large generative AI models more quickly.
The application-layer competition has just begun
For now, Google's Gemini appears to hold the advantage on benchmark "scores," but what matters more next is how the large models compete in practical applications.
Eli Collins told media including First Financial that Google hopes to build a new generation of AI models inspired by the way people understand and interact with the world, an AI that feels less like clever software and more like a helpful collaborator.
At present, Google's chatbot Bard has integrated a fine-tuned version of Gemini Pro, offering English-language service in more than 170 countries and regions, with plans to expand to other modalities and to support new languages and regions in the coming months. Early next year, Google will also launch Bard Advanced, powered by its best-performing model, Gemini Ultra.
On mobile devices, Google's Pixel 8 Pro has become the first smartphone to ship with Gemini Nano, which powers AI features such as recording summaries and smart replies, with more features to come next year.
Based on a customized version of Gemini, Google has launched the code-generation system AlphaCode 2. Google says AlphaCode 2 performs excellently on problems that involve not only programming but also complex mathematics and theoretical computer science.
In the coming months, Gemini will be applied to more Google products and services, such as Search, Ads, Chrome, and Duet AI.
Google has reportedly begun experimenting with Gemini in Search, where it makes the Search Generative Experience (SGE) faster, cutting latency for English searches in the United States by 40% while also improving quality.
Google officials also answered reporters' questions about what the company is doing to keep Gemini from producing hallucinations and factual errors, or from being used to create dangerous tools and for other unethical purposes.
Amin Vahdat, Vice President of Google Infrastructure and Systems, told reporters that the team weighs potential risks at every stage of Gemini's development and works hard to test for and mitigate them.
He revealed that Gemini's safety evaluations include assessments of bias and toxicity, and that Google Research's adversarial testing techniques are applied to help uncover critical safety issues before Gemini is deployed.
For example, to diagnose content-safety issues during Gemini's training phase and to ensure its output complies with policy, the Google team used benchmarks such as RealToxicityPrompts, a test set developed by experts at the Allen Institute for AI that contains 100,000 prompts of varying toxicity drawn from the web.
In addition, to reduce harm, the team built dedicated safety classifiers to identify, label, and filter out content involving violence or negative stereotypes. "Beyond that, we are continuing to tackle the known challenges these models face, such as factuality, grounding, attribution, and corroboration."
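As an illustration of what such classifier-based filtering can look like, here is a minimal sketch that scores candidate outputs with an off-the-shelf toxicity classifier and drops anything above a threshold. The model name, label, and threshold are assumptions for demonstration, not Google's actual pipeline; it assumes the Hugging Face transformers library is installed.

```python
# Minimal sketch of safety-classifier filtering (illustrative, not Google's
# pipeline): score each candidate output and drop the ones flagged as toxic.
from transformers import pipeline

# Any off-the-shelf toxicity classifier works for illustration; this public
# Hugging Face model is an assumption, not part of Gemini.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

def filter_unsafe(candidates, threshold=0.5):
    """Return only the candidates whose toxicity score stays under threshold."""
    safe = []
    for text in candidates:
        result = classifier(text)[0]  # e.g. {"label": "toxic", "score": 0.97}
        if result["label"] == "toxic" and result["score"] > threshold:
            continue  # a production system might label and log, not just drop
        safe.append(text)
    return safe

print(filter_unsafe(["Here is a simple pasta recipe.", "You are worthless."]))
```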
Google did not disclose whether it will build custom applications on Gemini itself, but executives told reporters that they would rather see users create more applications on top of the technology.
Google revealed that starting December 13, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or on Google Cloud Vertex AI.
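For developers curious what that access looks like in practice, here is a minimal sketch of calling Gemini Pro through the Gemini API using Google's google-generativeai Python SDK. The prompt and the placeholder API key are assumptions; an actual key is created in Google AI Studio.

```python
# Minimal sketch of calling Gemini Pro via the Gemini API with the
# google-generativeai SDK (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; create a key in Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the difference between the Gemini Ultra, Pro, and Nano sizes."
)
print(response.text)
```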
Currently, Google is running extensive trust and safety checks on Gemini Ultra, including red-team testing by trusted external parties, and is further refining the model with fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available. During this process, Google will give select customers, developers, partners, and safety and responsibility experts early access for testing and feedback.
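For readers unfamiliar with RLHF, its first stage typically trains a reward model on human preference pairs; a common objective is the Bradley-Terry pairwise loss sketched below. This is a generic textbook illustration in PyTorch, not a description of Google's training setup.

```python
# Minimal sketch of the pairwise preference loss used to train an RLHF
# reward model (generic illustration, not Google's setup).
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # Push the reward of the human-preferred response above the rejected one:
    # loss = -log(sigmoid(r_chosen - r_rejected)).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

r_chosen = torch.tensor([1.8, 0.4])    # reward scores for preferred responses
r_rejected = torch.tensor([0.2, 0.9])  # reward scores for rejected responses
print(preference_loss(r_chosen, r_rejected))  # decreases as preferences are learned
```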
The reporter learned that Google will make the model available to developers and enterprise customers early next year.