
After OpenAI's surprise launch of the "small model" GPT-4o mini, Meta answered from the opposite end of the scale with a blockbuster model carrying an enormous parameter count.
On July 24th, Meta released the open-source large model series Llama 3.1 405B, as well as upgraded models in two sizes: 70B and 8B.
Llama 3.1 405B is widely regarded as the strongest open-source model currently available. According to Meta, the model supports a 128K-token context length and adds support for eight languages. It rivals flagship models such as GPT-4o and Claude 3.5 Sonnet in general knowledge, steerability, mathematics, tool use, and multilingual translation, and in human evaluation comparisons its overall performance even edges out those two models.
Meanwhile, the upgraded 8B and 70B models are also multilingual and have had their context length extended to 128K.
Llama 3.1 405B is Meta's largest model to date. Meta stated that it was trained on over 15 trillion tokens, and that to achieve the desired results in a reasonable time the team optimized the entire training stack and used more than 16,000 H100 GPUs, making this the first Llama model trained at such a scale of compute.
The team broke this difficult training objective down into several key steps. To maximize training stability, Meta chose not to use a mixture-of-experts (MoE) architecture, instead adopting a standard decoder-only Transformer architecture with minor adaptations.
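The defining feature of a decoder-only Transformer is causally masked self-attention: each token can attend only to itself and earlier tokens. A toy single-head sketch in NumPy (an illustration of the general technique, not Meta's implementation; all names and sizes here are made up):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over one sequence.

    x         : (seq_len, d_model) token representations
    Wq/Wk/Wv  : (d_model, d_head) projection matrices
    The causal mask blocks attention to future positions, which is
    what makes the architecture "decoder-only".
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_head = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_head)           # (seq_len, seq_len)
    mask = np.triu(np.ones_like(scores), k=1)      # 1s strictly above the diagonal
    scores = np.where(mask == 1, -np.inf, scores)  # forbid attending to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # (seq_len, d_head)

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the mask, the first output row depends only on the first token, so position 0's output is exactly its own value vector.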
According to Meta, the team also used an iterative post-training process: each round combined supervised fine-tuning with direct preference optimization, generating the highest-quality synthetic data per round to improve each capability. Compared with previous Llama versions, the team improved both the quantity and quality of the data used before and during post-training.
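Direct preference optimization (DPO) trains the policy directly on pairs of preferred and rejected responses, without a separate reward model. The per-pair loss can be sketched as follows (a simplified stdlib illustration assuming the log-probabilities are already computed; this is the published DPO formulation, not Meta's training code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    logp_*     : policy log-probability of the chosen/rejected response
    ref_logp_* : the same quantities under the frozen reference model
    beta       : strength of the implicit KL penalty
    The margin measures how much more the policy prefers the chosen
    response than the reference model does.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A positive margin (the policy has learned the preference) gives a
# loss below log(2); a zero margin gives exactly log(2).
loss = dpo_loss(-5.0, -9.0, -6.0, -7.0)
print(round(loss, 4))
```

Gradient descent on this loss pushes the policy's preference margin up while the reference model anchors it, which is why each DPO round needs the frozen model from the previous round.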
Alongside the Llama 3.1 405B release, Mark Zuckerberg published a letter titled "Open Source AI Is the Path Forward", once again emphasizing the significance and value of open-source large models and taking direct aim at companies such as OpenAI that have gone the closed-source route.
Zuckerberg recounted the story of open-source Linux versus closed-source Unix: the former supports more features and a broader ecosystem, and became the industry-standard foundation for cloud computing and the operating systems running most mobile devices. "I believe artificial intelligence will develop in a similar way."
He pointed out that while several technology companies are developing leading closed-source models, open-source models are rapidly closing the gap. The most direct evidence: Llama 2 was only comparable to an older generation of models, whereas Llama 3 is competitive with the most advanced models and already leads in certain areas.
He expects that, starting next year, Llama will become the most advanced model in the industry; even before then, Llama already leads in openness, modifiability, and cost efficiency.
Zuckerberg cited many reasons why the world needs open-source models. For developers, beyond a more transparent environment in which to train, fine-tune, and refine their own models, another key factor is the need for efficient, affordable models.
He explained that for user-facing and offline inference tasks, developers can run Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of closed-source models such as GPT-4o.
The debate between the open-source and closed-source paths has been discussed extensively before, but the prevailing tone was that each has its own value: open source benefits developers cost-effectively and aids the iteration of large language model technology itself, while closed source can concentrate resources to break through performance bottlenecks faster and deeper, and is seen as more likely to reach AGI (artificial general intelligence) first.
In other words, the industry has generally believed that open source would struggle to catch up with closed source on model performance. The arrival of Llama 3.1 405B may force a rethink of that conclusion, which could sway the many enterprises and developers currently inclined toward closed-source model services.
At present, Meta's ecosystem is already very large. With the launch of Llama 3.1, over 25 partners will provide related services, including Amazon AWS, Nvidia, Databricks, Groq, Dell, Microsoft Azure, and Google Cloud, among others.
However, Zuckerberg's timeline for Llama to take the lead is next year, leaving open the possibility that closed-source models overtake it in the interim. In the meantime, attention may turn to closed-source large models that cannot match the performance of Llama 3.1 405B, whose position is now indeed somewhat awkward.
He also specifically addressed the US-China competition in large models, arguing that it is unrealistic to expect the United States to stay years ahead of China in this area, but that even a small lead of a few months can compound over time into a clear American advantage.
"The advantage of the United States is decentralized and open innovation. Some argue that we must close our models to prevent China from obtaining them, but I think this will not work and will only put the United States and its allies at a disadvantage." In Zuckerberg's view, a world of only closed models would leave a handful of large companies and geopolitical rivals with access to leading models, while startups, universities, and small businesses miss out. Moreover, restricting American innovation to closed development raises the risk of not leading at all.
"On the contrary, I believe our best strategy is to build a robust open ecosystem and have our leading companies work closely with government and allies, ensuring they can best exploit the latest advances and secure a sustainable first-mover advantage over the long term," said Zuckerberg.