Open-source community watershed: Meta releases Llama 3, with its largest version expected to reach 400 billion parameters
六月清晨搅
Posted 2024-4-19 16:12:24
To maintain its position in the field of open-source AI (artificial intelligence) large models, social media giant Meta has launched its latest open-source model.
On April 18 local time, Meta announced on its official website the release of its latest large model, Llama 3. So far, Llama 3 is available in two smaller-parameter versions, 8 billion (8B) and 70 billion (70B), each with an 8K-token context window. Meta stated that thanks to higher-quality training data and instruction fine-tuning, Llama 3 achieves a "significant improvement" over the previous generation, Llama 2.
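Parameter counts like 8B and 70B translate directly into serving-memory requirements. As a rough back-of-the-envelope sketch (assuming 2 bytes per parameter for fp16/bf16 weights and ignoring activation and KV-cache overhead; these figures are illustrative, not published by Meta):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed just to hold the model weights.

    bytes_per_param: 2.0 for fp16/bf16, 1.0 for int8, 0.5 for 4-bit quantization.
    """
    return num_params * bytes_per_param / 1e9

# Llama 3 8B: ~16 GB in fp16, ~4 GB with 4-bit quantization
print(weight_memory_gb(8e9))        # 16.0
print(weight_memory_gb(8e9, 0.5))   # 4.0

# Llama 3 70B: ~140 GB in fp16 -- multi-GPU territory
print(weight_memory_gb(70e9))       # 140.0
```

This is why the 8B model is within reach of a single consumer GPU when quantized, while the 70B model generally requires several data-center GPUs.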
In the future, Meta will launch a larger version of Llama 3 with over 400 billion parameters, and also plans to add new capabilities such as multimodality and longer context windows, as well as to publish Llama 3 research papers.
Meta wrote in the announcement, "Through Llama 3, we are committed to building open-source models that can compete with today's best proprietary models. We want to address developer feedback, improve the overall usefulness of Llama 3, and continue to play a leading role in the responsible use and deployment of LLMs (large language models)."
On the 18th, Meta's stock price (Nasdaq: META) closed at $501.80 per share, up 1.54%, with a total market value of $1.28 trillion.
"The best open source big model currently on the market"
According to Meta, Llama 3 demonstrates state-of-the-art performance on various industry benchmarks, offers new capabilities including improved reasoning, and is currently the best open-source large model on the market.
At the architecture level, Llama 3 uses a standard decoder-only Transformer with a tokenizer whose vocabulary contains 128K tokens. Llama 3 was pre-trained on two 24K-GPU clusters built by Meta, using over 15T tokens of publicly available data, of which 5% is non-English data covering more than 30 languages. The training data volume is seven times that of the previous generation, Llama 2, and it includes four times as much code.
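The 15T-token pre-training figure can be put in perspective with the widely used "FLOPs ≈ 6 × parameters × tokens" rule of thumb for dense Transformer training (an approximation from the scaling-laws literature; Meta has not published compute figures for Llama 3):

```python
def train_flops(num_params: float, num_tokens: float) -> float:
    # Standard approximation: ~6 FLOPs per parameter per training token
    # (roughly 2 for the forward pass, 4 for the backward pass).
    return 6.0 * num_params * num_tokens

# Llama 3 8B on ~15T tokens
print(f"{train_flops(8e9, 15e12):.1e}")   # 7.2e+23 FLOPs
# Llama 2 7B on ~2T tokens (Meta's reported figure for Llama 2), for comparison
print(f"{train_flops(7e9, 2e12):.1e}")    # 8.4e+22 FLOPs
```

By this estimate, even the small Llama 3 model consumed nearly an order of magnitude more training compute than its similarly sized predecessor, consistent with the seven-fold increase in training data.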
According to Meta's test results, the Llama 3 8B model outperforms Gemma 7B and Mistral 7B Instruct on multiple benchmarks such as MMLU, GPQA, and HumanEval, while the 70B model surpasses Sonnet, the mid-tier version of the well-known closed-source model Claude 3, and goes three wins to two losses against Google's Gemini Pro 1.5.
Llama 3 performs exceptionally well on multiple performance benchmarks. Source: Meta official website
In addition to conventional benchmark datasets, Meta has also worked to optimize Llama 3's performance in real-world scenarios, developing a high-quality human-evaluation set for this purpose. The set contains 1,800 prompts covering 12 key use cases, such as asking for advice, closed-ended question answering, brainstorming, coding, and writing, and is kept confidential by the development team.
On this evaluation set, Llama 3 significantly outperforms Llama 2 and also surpasses well-known models such as Claude 3 Sonnet, Mistral Medium, and GPT-3.5.
Llama 3 achieved excellent results on the manual test set. Source: Meta official website
Although the 400B+ model of Llama 3 is still in training, Meta has shown some of its test results, seemingly benchmarking it against Opus, the strongest version of Claude 3. However, Meta has not released comparisons between the larger Llama 3 model and GPT-4-class competitors.
The 400B+ model of Llama 3 is still in training. Source: Meta official website
The Llama 3 models will soon be available to developers on Amazon AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM watsonx, Microsoft Azure, Nvidia NIM, and Snowflake, with hardware platform support from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm. To support responsible development with Llama 3, Meta is also providing new trust and safety tools, including Llama Guard 2, Code Shield, and CyberSec Eval 2.
Meanwhile, Meta has released an official web version of Meta AI built on Llama 3. The platform is still in its early stages, with only two major functions: conversation and image generation. The conversation function requires no registration, while image generation requires users to register and log in to an account.
Injecting vitality into the open source community
Meta's AI strategy has always been closely tied to open source, and Llama 3 was warmly welcomed by the open-source community upon launch.
Although there have been some complaints about the size of Llama 3's 8K context window, Meta said it would soon be expanded. Matt Shumer, CEO and co-founder of email startup Otherside AI, is optimistic about this, saying, "We are entering a new world where GPT-4-level models are open source and freely accessible."
According to Jim Fan, a senior research scientist at Nvidia, the upcoming larger-parameter Llama 3 model marks a "watershed" for the open-source community that could change the decision-making of much academic research and many startups, and the entire ecosystem "is expected to see a surge in vitality."
It is worth noting, however, that Meta has not released Llama 3's training data, stating only that it comes entirely from publicly available sources. Strictly speaking, "open-source" software should be fully open throughout development and distribution, including the product's source code, training data, and other materials. DBRX, the "strongest open-source model" previously released by data company Databricks, not only had hardware requirements far beyond ordinary computers but also shared this issue.
The launch of Llama 3 follows closely on progress with Meta's in-house chips. Just last week, Meta announced the latest version of its self-developed chip, MTIA, a custom chip series designed by Meta specifically for AI training and inference workloads. Compared with MTIA v1, Meta's first-generation AI inference accelerator announced in May last year, the latest chip delivers significantly improved performance and is tailored to the ranking and recommendation systems of Meta's social apps. Analysts say Meta's goal is to reduce its dependence on chipmakers such as Nvidia.
CandyLake.com is an information publishing platform and only provides information storage services.
Disclaimer: The views in this article are solely the author's own; they do not represent the position of CandyLake.com and do not constitute advice. Please treat them with caution.