Pika raises funding and Kuaishou's Kling goes live: why is Apple's AI product "stoking a cold stove"?
123458039
Posted on 2024-6-11 19:29:04
Apple Inc. (AAPL.US) launched an AI product called Apple Intelligence at its WWDC developer conference, but its stock closed down 1.91% the same day. Interestingly, on June 11th, the Sora index (8841756.WI) in Wind data rose 1.55%.
Why is there such a difference?
Apple has chosen to steer clear of the currently hot video models and has launched AI updates that focus more on text. The rise of domestic concept stocks is closely tied to the recent popularity of text-to-video models. Overseas, the high-profile AI video generation company Pika has completed a new round of financing, a Series B totaling 80 million US dollars, and the company's valuation will exceed 470 million US dollars. In China, Kuaishou (1024.HK) officially launched its "Kling" video generation model, which adopts a technical route similar to Sora's.
In the eyes of many industry insiders, Apple's focus on integrating text AI rather than video is driven more by considerations such as cost and practicality.
Apple avoids Sora's "battle zone"
The built-in large language model launched by Apple allows iPhone, iPad, and Mac to understand and generate language and images. By connecting to ChatGPT, Siri gains semantic retrieval and can intelligently search photos, calendars, files, emails, and other content. Users can also access most of ChatGPT's functions without registering.
Ming-Chi Kuo, an analyst at TF International Securities, posted a brief review stating that Apple's newly released Apple Intelligence suite demonstrates the advantages of ecosystem integration and interface design. It is very practical for users but only icing on the cake for investors, who are looking forward to original, must-have features.
Han Xu, Chief Researcher at ModelBest, told reporters that when AI is built into an operating system, Apple mainly needs it to understand human intentions and call system-level interfaces. These requirements do not fully match Sora's starting point and are better served by large models that take multimodal input and produce text output. Models like Sora that generate images or videos are currently better suited to integration with software, especially visual processing software.
Why didn't Apple join Sora's "battle"?
A person from an AIGC video application maker told reporters that, from a product and business perspective, Apple will only focus on areas that are relatively mature and offer a clearer, more visible return on investment. In mobile hardware interaction, there are more scenarios for using text. From research and development investment to actual inference costs, this field is also more cost-effective given Apple's current technological accumulation.
Another industry technician stated that today's LLM (large language model) services have basically reached breakeven for text, but not necessarily for text-to-image and text-to-video. This is also an important reason why Apple has not yet integrated video AIGC capabilities at WWDC.
In contrast to Apple's approach, the domestic large-model market currently has high expectations for video. In April this year, Professor Zhu Jun, vice president of the Artificial Intelligence Research Institute of Tsinghua University and co-founder and chief scientist of Shengshu Technology, released China's first video model, Vidu, on behalf of Tsinghua University and Shengshu Technology. Not long ago, the "Kling" video model launched by Kuaishou also sparked heated discussion.
The reporter fed the prompt text of Sora's representative videos into Kuaishou's "Kling" to generate videos for comparison. Taking "a girl walking down a Tokyo street" as an example, Sora's video at the time had errors such as deformed legs, legs crossing out of place or swapping sides, and the right leg stepping forward twice in a row. Kuaishou's "Kling" has similar problems.
TF Securities believes that Kuaishou's improvements to its 3D VAE + DiT architecture across computing power, model, and data quality show that it can deliver commercially viable results. At the same time, customizable duration and aspect ratio greatly improve the usability of the generated material. Although it lags behind Sora in some complex semantic understanding, there is little difference in somewhat simpler scenarios.
Multimodality becomes an opportunity in China's large-model race
An excellent video generation model needs to consider four core elements - model design, data assurance, computational efficiency, and the expansion of model capabilities.
Regarding the immaturity of Sora, OpenAI has stated that Sora may have difficulty accurately simulating the physical principles of complex scenes, may not understand causal relationships, may confuse spatial details of prompts, and may have difficulty accurately describing events that occur over time, such as following specific camera trajectories.
But this seems more like a common problem. Wang Changhu, founder of Aishi Technology, previously stated that current video models learn physics directly from video data, but real videos contain so much information that it is difficult to learn each physical law accurately in isolation. Adding 3D modeling information, such as human hands and animal tails, as constraints alongside the visual input can assist the large model's learning and improve its results.
The Kling large model adopts a native text-to-video technical route, replacing the combination of image generation and a temporal module. At present, mainstream video generation models usually reuse Stable Diffusion's 2D VAE for spatial compression when encoding to and decoding from latent space, but this leaves significant information redundancy for video. The Kuaishou large-model team has therefore developed its own 3D VAE network, trying to balance training performance and quality. In addition, for temporal information modeling, the team has designed a 3D Attention mechanism as a spatio-temporal modeling module.
Tang Jiayu, CEO of Shengshu Technology, mentioned that research on multimodal large models is still in its early stages and the technology is not yet mature. This differs from the popular language models, where overseas players are already a generation ahead. Therefore, rather than fighting it out over language models, Tang Jiayu believes that multimodality is an important opportunity for domestic teams to gain ground in the large-model race. Zhou Zhifeng, a partner at Qiming Venture Partners, holds a similar view and believes that today's large models have gradually moved from pure language toward multimodal exploration.
Lin Yonghua, Vice President and Chief Engineer of the Beijing Academy of Artificial Intelligence, told Yicai reporters that China has some chance of overtaking on the curve in the multimodal field, but the success factors of multimodal models still lie in computing power, algorithms, and data. At present, there is no significant gap between Chinese and American teams at the algorithmic level, and the industry still has ways to solve computing power problems. However, obtaining massive amounts of high-quality data remains very difficult.
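CandyLake.com is an information publishing platform and only provides information storage services.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.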