Musk Strikes Back: Open-Sourcing a Top Model to Pressure OpenAI
六月清晨搅
Posted on 2024-3-18 13:13:44
In a move that sets him squarely apart from Altman, Musk has underscored his commitment to open-source AI. On March 17, he announced the open-sourcing of Grok-1, which at 314 billion parameters is now the largest open-source large language model available, far exceeding the 175 billion of OpenAI's GPT-3.5.
Interestingly, the cover image for the Grok-1 open-source announcement was generated with Midjourney, making it a case of "AI helping AI".
Musk, who has long mocked OpenAI for not being open, naturally took the opportunity to needle it on his social platform: "We want to know more about the open part of OpenAI."
Grok-1's model weights and architecture are released under the Apache 2.0 license, which allows users to freely use, modify, and distribute the software for both personal and commercial purposes. This openness encourages broader research and application development. Since its release, the project has earned 6.5k stars on GitHub, and its popularity is still climbing.
The project description emphasizes that because Grok-1 is a large (314B-parameter) model, a machine with sufficient GPU memory is needed to run the example code. Netizens estimate this may require around 628 GB of GPU memory.
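The 628 GB figure is consistent with storing 314 billion weights in 16-bit precision. A back-of-the-envelope check (the precision assumption here is ours, not an official xAI spec):

```python
# Rough memory estimate for loading Grok-1's weights (illustrative only).
params = 314e9          # 314 billion parameters, per the article
bytes_per_param = 2     # assuming bf16/fp16 storage: 2 bytes per weight
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")  # 628 GB, matching the netizens' estimate
```

Note that this covers weights alone; activations and KV caches during inference would push the requirement higher.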
The repository also notes that its MoE-layer implementation is not efficient; this implementation was chosen deliberately to avoid the need for custom kernels, making it easier to verify the model's correctness.
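That trade-off can be sketched with a naive Mixture-of-Experts layer in NumPy (an illustrative toy, not the Grok-1 code): every expert runs on every token, which is wasteful but trivially correct and needs no custom kernel.

```python
import numpy as np

def naive_moe(x, gate_w, expert_ws, top_k=2):
    """Naive MoE layer: run all experts densely, keep the top-k gated
    outputs per token. Simple to verify, inefficient by design."""
    logits = x @ gate_w                                  # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                # softmax over experts
    # Zero out all but the top-k experts for each token.
    topk = np.argsort(probs, axis=-1)[:, -top_k:]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, topk, 1.0, axis=-1)
    gates = probs * mask
    gates /= gates.sum(-1, keepdims=True)                # renormalize top-k
    # The inefficiency: every expert processes every token.
    expert_out = np.stack([x @ w for w in expert_ws], axis=1)  # (t, e, d)
    return np.einsum("te,ted->td", gates, expert_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 tokens, hidden dim 8
gate_w = rng.normal(size=(8, 4))               # router over 4 experts
expert_ws = [rng.normal(size=(8, 8)) for _ in range(4)]
y = naive_moe(x, gate_w, expert_ws)
print(y.shape)  # (4, 8)
```

An efficient implementation would instead dispatch each token only to its selected experts, which requires gather/scatter logic or custom kernels — exactly what the repository chose to avoid.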
Popular open-source models currently include Meta's Llama 2 and France's Mistral. Generally speaking, open-sourcing a model lets the community test it at scale and provide feedback, which in turn accelerates the model's own iteration.
Grok-1 is a Mixture-of-Experts (MoE) large model developed over the past four months by xAI, Musk's AI startup. A brief review of its development:
After announcing the founding of xAI, the researchers first trained a 33-billion-parameter prototype language model (Grok-0). On standard language-model benchmarks it approached the capabilities of LLaMA 2 (70B) while using fewer training resources;
The researchers then made significant improvements to the model's reasoning and coding abilities, culminating in Grok-1, released in November 2023. This more powerful SOTA language model scored 63.2% on the HumanEval coding task and 73% on MMLU, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1.
What are the advantages of Grok-1 compared to other large models?
xAI emphasizes that Grok-1 is a large model it trained entirely from scratch: starting in October 2023, it was trained on a custom stack built on JAX and Rust, without fine-tuning for specific tasks such as conversation;
A unique and fundamental advantage of Grok-1 is its real-time knowledge of the world via the X platform, which lets it answer spicy questions that most other AI systems reject. The training data for the released version consists of Internet data up to the third quarter of 2023, plus data supplied by xAI's AI trainers;
As a Mixture-of-Experts model with 314 billion parameters, it activates 25% of its weights for each token, and its sheer parameter count gives it powerful language-comprehension and generation capabilities.
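A 25% active-weight ratio implies roughly 78.5 billion parameters participate in processing each token, a quick sanity check (our arithmetic, not an official xAI figure):

```python
# Active parameters per token for a 314B MoE model at 25% activation.
total_params = 314e9
active_ratio = 0.25
active_b = total_params * active_ratio / 1e9
print(f"~{active_b:.1f}B active parameters per token")
```

This is how MoE models keep per-token compute far below what their total parameter count suggests.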
xAI previously stated that Grok-1 will serve as the engine behind Grok for natural-language-processing tasks, including question answering, information retrieval, creative writing, and coding assistance. Going forward, long-context understanding and retrieval, as well as multimodal capability, are among the directions the model will explore.