Open-source community watershed: Meta releases Llama 3, with its largest version expected to reach 400 billion parameters
六月清晨搅
Posted on 2024-04-19 16:12:24
To maintain its position in the field of open-source AI (artificial intelligence) large models, social media giant Meta has launched its latest open-source model.
On April 18 local time, Meta announced on its official website the release of its latest large model, Llama 3. For now, Llama 3 is available in two smaller-parameter versions, 8 billion (8B) and 70 billion (70B), each with an 8K context window. Meta stated that, thanks to higher-quality training data and instruction fine-tuning, Llama 3 achieves a "significant improvement" over the previous generation, Llama 2.
In the future, Meta will launch a larger-parameter version of Llama 3 with over 400 billion parameters. Meta also plans to add new capabilities such as multimodality and longer context windows, and to publish Llama 3 research papers.
Meta wrote in the announcement: "With Llama 3, we set out to build open models that can compete with today's best proprietary models. We wanted to address developer feedback, increase the overall helpfulness of Llama 3, and continue to play a leading role in the responsible use and deployment of LLMs (large language models)."
On the 18th, Meta's stock (Nasdaq: META) closed at $501.80 per share, up 1.54%, for a total market value of $1.28 trillion.
"The best open source big model currently on the market"
According to Meta, Llama 3 demonstrates state-of-the-art performance on a range of industry benchmarks and offers new capabilities, including improved reasoning, making it the best open-source large model currently on the market.
At the architecture level, Llama 3 uses a standard decoder-only Transformer with a tokenizer whose vocabulary contains 128K tokens. Llama 3 was pre-trained on two 24K-GPU clusters built by Meta, using over 15 trillion tokens of publicly available data, of which 5% is non-English data covering more than 30 languages. The training data volume is seven times that of the previous generation Llama 2, and it includes four times as much code.
According to Meta's test results, the Llama 3 8B model outperforms Gemma 7B and Mistral 7B Instruct on multiple performance benchmarks such as MMLU, GPQA, and HumanEval, while the 70B model surpasses Sonnet, the mid-tier version of the well-known closed-source model Claude 3, and scores three wins and two losses against Google's Gemini Pro 1.5.
Llama 3 performs exceptionally well on multiple performance benchmarks. Source: Meta official website
In addition to standard datasets, Meta has also worked to optimize Llama 3's performance in real-world scenarios, developing a high-quality human evaluation set for this purpose. The set contains 1,800 prompts covering 12 key use cases, including asking for advice, closed-ended question answering, brainstorming, coding, and writing, and is kept confidential by the development team.
On this evaluation set, the results show that Llama 3 significantly outperforms Llama 2 and also surpasses well-known models such as Claude 3 Sonnet, Mistral Medium, and GPT-3.5.
Llama 3 achieved excellent results on the manual test set. Source: Meta official website
Although the 400B+ version of Llama 3 is still in training, Meta has already shown some of its test results, seemingly benchmarking it against Opus, the strongest version of Claude 3. However, Meta has not released comparisons between the larger-parameter Llama 3 model and GPT-4-class competitors.
The 400B+ version of Llama 3 is still in training. Source: Meta official website
The Llama 3 models will soon be available to developers on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM watsonx, Microsoft Azure, Nvidia NIM, and Snowflake, with hardware platform support from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm. To support responsible development with Llama 3, Meta is also providing new trust and safety tools, including Llama Guard 2, Code Shield, and CyberSec Eval 2.
Meanwhile, Meta has released an official web version of Meta AI built on Llama 3. The platform is still in its early stages, offering only two major functions: conversation and image generation. The conversation function requires no registration, while image generation requires users to register and log in to an account.
Injecting vitality into the open source community
Meta's AI path has always been closely tied to open source, and Llama 3 was warmly welcomed by the open-source community upon launch.
Although there have been some complaints about the size of Llama 3's 8K context window, Meta said it would soon expand it. Matt Shumer, CEO and co-founder of email startup Otherside AI, is optimistic as well, saying, "We are entering a new world where GPT-4-level models are open source and accessible for free."
According to Jim Fan, a senior research scientist at Nvidia, the upcoming larger-parameter Llama 3 model marks a "watershed" for the open-source community that could change the decision-making of much academic research and many startups, adding that "a surge in vitality throughout the entire ecosystem" can be expected.
It is worth noting, however, that Meta has not released Llama 3's training data, stating only that it comes entirely from publicly available sources. Strictly speaking, "open source" software should be fully open to the public throughout development and distribution, including the source code, training data, and other components. DBRX, the "strongest open-source model" previously released by data company Databricks, not only had hardware requirements far beyond ordinary computers but also shared this same issue.
The launch of Llama 3 comes close on the heels of progress in Meta's self-developed chips. Just last week, Meta announced the latest version of its in-house chip, MTIA, a custom chip series designed by Meta specifically for AI training and inference. Compared with MTIA v1, Meta's first-generation AI inference accelerator announced in May last year, the latest chip offers significantly improved performance and is designed specifically for the ranking and recommendation systems of Meta's social apps. Analysts suggest Meta's goal is to reduce its dependence on chipmakers such as Nvidia.
CandyLake.com is an information publishing platform that provides only information storage services.
Disclaimer: The views in this article are those of the author alone, do not represent the position of CandyLake.com, and do not constitute advice; please treat them with caution.