
Interface News Reporter | Zhou Shuqi
For a long time, Li Auto struggled to keep up with Huawei's marketing push on intelligent driving. Intelligent driving, which was not originally among consumers' top three purchase considerations, has become the front-line battleground between the two car brands.
Li Auto salespeople would play down the fact that the company's intelligent driving lagged behind by highlighting other product strengths such as the cockpit experience, and would even avoid mentioning Huawei directly. At the time, the new Wenjie (AITO) M7, a direct competitor to Li Auto's L series, raised users' awareness of intelligent driving features and drove sales at a pace that shocked the industry.
This year, however, the plot has reversed. Li Auto has become the second carmaker after Huawei whose intelligent driving can be used nationwide, overtaking NIO and XPeng, which had previously been ahead.
Li Auto salespeople have also begun actively steering customers toward the MAX versions equipped with advanced intelligent driving software. The latest data show that the share of orders for AD Max versions rose from 37% in May to 49% in July, and the MAX version accounts for 75% of L9 orders.
What unsettles some competitors is how quickly Li Auto seems to have closed its intelligent driving gap. A latecomer that showed no lead in the era of hand-written rules, it staged a sudden comeback after switching to end-to-end, the industry's cutting-edge approach, drawing wide attention from outside.
The industry's attitude toward Li Auto's intelligent driving is subtly shifting. A Huawei intelligent driving engineer told Interface News that internal discussions used to focus only on the R&D progress of Tesla and XPeng, but Li Auto is now part of the conversation as well.
Li Auto was once the least favored of the "Wei-Xiao-Li" trio (NIO, XPeng, and Li Auto), and its commitment to extended-range powertrains was criticized as outdated technology. Today its sales rank first among China's new EV makers, and more and more carmakers are adopting the extended-range route.
The same plot has now played out in intelligent driving. In interviews with Interface News and other media, Lang Xianpeng, Li Auto's Vice President of Intelligent Driving R&D, and Jia Peng, Head of Intelligent Driving R&D, recounted how the intelligent driving "underperformer" iterated through three versions in under two years and ultimately narrowed its gap with Tesla to within six months.
Because it pursued profit and efficiency above all in its early years, Li Auto invested conservatively in intelligent driving and was always an industry follower.
On the timeline, Huawei began autonomous driving R&D in the year Li Auto was founded. By the time Li Auto started developing its own intelligent driving system in 2021, NIO and XPeng had already launched highway navigation-assisted driving (NOA).
Last year, as the industry raced to launch urban NOA, Li Auto CEO Li Xiang acknowledged at the company's autumn strategy meeting in September that Li Auto had invested in intelligent driving too late. He stated for the first time that intelligent driving was a core company strategy, with the goal of becoming the absolute leader in intelligent driving by 2024.
Li Auto then began hiring at scale, becoming one of the few companies at the time offering high salaries and many open positions. Back then, the company believed that by emulating Huawei's "legion" style of organization it could trade talent density for R&D speed.
Implementation did not go smoothly. Over the course of a year, Li Auto tried two approaches, Neural Prior Network (NPN) and a map-free solution, investing heavily in iteration, updates, and testing, but never achieved human-like driving. XPeng and Huawei each reached their goals of opening city NOA at scale nationwide early this year, whereas Li Auto's goal of opening map-free NOA in 100 cities nationwide by the end of last year was scaled back to a commuting mode.
Repeatedly switching technical routes soon made Lang Xianpeng realize that the routes themselves were the bottleneck. In his view, real-world scenarios are effectively infinite, and no one can ever define every situation in advance. To solve the problem at its root, end-to-end is currently the best technical path.
Unlike a traditional autonomous driving system, which is split into modules such as perception, localization, planning, and decision-making, an end-to-end architecture integrates perception and decision-making. Its biggest advantage is that it reduces the information lost when results are passed between modules, raising the ceiling of intelligent driving capability. Progress then comes from artificial intelligence itself rather than from detailed maps and hand-written code.
Tesla was the first carmaker to switch to this approach, followed closely by Chinese smart EV makers and autonomous driving suppliers such as Huawei. Alongside the race to open more cities, carmakers are now launching a new round of end-to-end competition, and Li Auto is once again trying a new technical path.
In practice, XPeng and Huawei use a "segmented end-to-end" approach, replacing perception and planning/control with separate models, while Tesla and Li Auto take the more aggressive "One Model" route, a single large model. For safety redundancy, Li Auto left the control module outside the model.
But end-to-end alone is not enough. Lang Xianpeng told Interface News that end-to-end models and traditional perception-decision models alike are trained, or hand-designed, on known data to cover specific scenario conditions. The potential problem is that in an unfamiliar scenario the system will not perform well.
A typical example came when Jia Peng tried version 12.3 of Tesla's Full Self-Driving (FSD) software in the United States. He found a marked difference in the experience between cities on the East and West Coasts. Going from Boston to New York, cities the system is less familiar with and where road conditions are more complex, Tesla's intelligent driving performance dropped sharply and the takeover rate rose significantly.
Road scenarios in China are even more varied than in New York. Given the limited computing power of in-vehicle chips, a standalone end-to-end model can hardly guarantee flawless operation. To make autonomous driving truly think like a human, Li Auto introduced a vision-language model (VLM) and began pre-research on an "end-to-end + VLM" dual system in September last year.
Li Xiang first revealed the dual-system concept publicly at the China Auto Chongqing Forum in June this year. System 1 runs the end-to-end model and meets the need to respond promptly to road conditions while driving; System 2 can read navigation maps and other information much as a human does, handling complex, generalized scenarios that require logical reasoning.
Jia Peng further told Interface News that the VLM's role in the architecture is to provide decision results and reference trajectories to System 1, but the end-to-end model does not necessarily use this information. This preserves System 1's sole decision-making authority and keeps the two systems from fighting each other during operation.
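The division of labor Jia Peng describes can be pictured as an interface in which System 2 only ever supplies an optional suggestion and System 1 alone emits the trajectory. The following is a minimal Python sketch under stated assumptions, not Li Auto's code: `Advice`, `system1_step`, and the `trust` weight are hypothetical names, and in the real system the choice of whether to use the VLM's reference trajectory is made inside the learned end-to-end network rather than by a hand-written blend.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) waypoint in metres; the coordinate frame is an assumption


@dataclass(frozen=True)
class Advice:
    """Hypothetical System 2 (VLM) output: a decision label plus a reference trajectory."""
    decision: str
    reference: List[Point]


def system1_step(own_plan: List[Point], advice: Optional[Advice], trust: float) -> List[Point]:
    """Return the trajectory that is actually executed; only System 1 produces it.

    `trust` in [0, 1] is a hypothetical stand-in for the learned choice of how much to
    use System 2's reference. Zero trust, or no advice at all, means System 2 is ignored.
    """
    if advice is None or trust <= 0.0:
        return list(own_plan)
    if len(advice.reference) != len(own_plan):
        raise ValueError("reference and plan must have the same number of waypoints")
    w = min(trust, 1.0)
    # Point-wise blend: System 1 still owns the output; System 2 can only pull on it.
    return [((1 - w) * x0 + w * x1, (1 - w) * y0 + w * y1)
            for (x0, y0), (x1, y1) in zip(own_plan, advice.reference)]


own = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
advice = Advice("nudge_left", [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
print(system1_step(own, advice, trust=0.5))       # [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
print(system1_step(own, None, trust=0.9) == own)  # True
```

The point of the shape, rather than the blend itself, is that System 2's output is an input to System 1 and never a second command channel, which is what keeps the two systems from issuing conflicting controls.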
However, every team betting on end-to-end still faces the same problem: how to test and validate an end-to-end model's capabilities.
The neural network in an end-to-end architecture is a "black box", and so is the VLM. The biggest drawback of both is that their failure modes are unclear. This gives the architecture a much higher ceiling than in the era of rule-based planning and control, but it also leads to low-level errors for which it is hard to provide a safety net.
Because there is no explicit code that sorts failures into categories, filtering and searching for these problems is also more complicated. An intelligent driving R&D engineer explained to Interface News that without knowing where the end-to-end model goes wrong, it is impossible to collect targeted data and develop targeted training strategies.
Li Auto's solution is to introduce a world model to test System 1 and System 2. This exam model, used to verify training results, is called System 3.
On one hand, System 3's question bank draws on carefully selected "real questions" and "wrong questions" from Li Auto owners' actual driving, and the owners whose driving qualifies as such "questions" make up less than 3% of the total. On the other hand, Li Auto creates "simulated questions" through reconstruction and generation to cover more scenarios. Only after a model scores well on this exam is it gradually pushed to users.
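A release gate of this kind can be sketched in a few lines. This is an illustration under assumptions, not Li Auto's System 3: the three question sources come from the article, but the `Question` type, the per-source pass rates, the requirement that every source clear the bar, and the 0.9 `threshold` are all hypothetical choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Question:
    source: str    # "real", "wrong" or "simulated", the three kinds named in the article
    scenario: str  # hypothetical scenario identifier


def score_by_source(passes: Callable[[Question], bool],
                    bank: List[Question]) -> Dict[str, float]:
    """Pass rate of a candidate model on each kind of question."""
    total: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for q in bank:
        total[q.source] += 1
        passed[q.source] += int(bool(passes(q)))
    return {src: passed[src] / total[src] for src in total}


def may_push(passes: Callable[[Question], bool], bank: List[Question],
             threshold: float = 0.9) -> Tuple[bool, Dict[str, float]]:
    """Gate a release: every kind of question must meet the (hypothetical) threshold."""
    if not bank:
        raise ValueError("an empty question bank cannot justify a release")
    scores = score_by_source(passes, bank)
    return all(s >= threshold for s in scores.values()), scores


bank = [
    Question("real", "unprotected_left_turn"),
    Question("wrong", "lane_change_at_red_light"),
    Question("simulated", "jaywalker_at_night"),
    Question("simulated", "flooded_underpass"),
]


def candidate(q: Question) -> bool:
    """Stand-in for running a candidate model on one exam question."""
    return q.scenario != "flooded_underpass"


print(may_push(candidate, bank))
# (False, {'real': 1.0, 'wrong': 1.0, 'simulated': 0.5})
```

Scoring each source separately, rather than pooling everything, is one way to stop a large pool of easy questions from hiding a regression on the rarer "wrong questions".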
Li Auto uses System 3 to replace the traditional method in which developers road-tested each version by driving hundreds or thousands of kilometers. This not only speeds up model iteration but also saves substantial labor costs.
NIO is also leveraging virtual simulation. Last month, the EV maker released what it described as China's first intelligent driving world model. The model has spatial reconstruction and temporal reasoning capabilities and can deduce 216 possible scenarios within 100 milliseconds to find the optimal decision.
Jia Peng noted that imitating Sora's purely generative video approach would produce more hallucinations and could not be used directly in production. Li Auto instead reconstructs real scenes, then generalizes and generates new scenes from them, providing references that obey physical laws.
In the era of autonomous driving, carmakers compete not only on depth of talent but also on data and computing power, both of which directly determine the ceiling of end-to-end capability.
Lang Xianpeng noted that Li Auto's models share a similar structure, so camera configurations and mounting positions are consistent across vehicles, which allows data to be shared. Moreover, Li Auto has been developing a data closed loop since the first-generation Li ONE in 2019, accumulating more than 1.2 billion kilometers of effective training data, starting earlier and accumulating more than the other two leading EV startups.
XPeng CEO He Xiaopeng has argued that having a lot of data does not necessarily mean doing autonomous driving well. Lang Xianpeng likewise pointed out that beyond the quantity and quality of data, the harder part is the data mix, that is, the proportions in which different kinds of data are combined.
Early in this year's dual-system development, the Li Auto intelligent driving team found that the test car kept trying to change lanes while waiting at red lights. They later traced the cause to the deletion of data in which users had waited a long time at red lights. This data, originally dismissed, was the key information that let the model distinguish two different waiting scenarios: waiting at a red light and being stuck in traffic.
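The failure is easy to reproduce in miniature. The sketch below assumes a hypothetical cleaning rule that discards clips in which the car stands still for more than 30 seconds; the article does not say what rule or threshold Li Auto actually used. Such a rule silently removes every long red-light wait, leaving the model no examples of a long, normal stop at a red light.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Clip:
    scene: str           # illustrative labels: "red_light" or "traffic_jam"
    stationary_s: float  # seconds the car stood still in the clip


def naive_filter(clips: List[Clip], max_wait_s: float = 30.0,
                 exempt: FrozenSet[str] = frozenset()) -> List[Clip]:
    """Hypothetical cleaning rule: drop clips where the car waits longer than
    max_wait_s, unless the clip's scene is explicitly exempted."""
    return [c for c in clips if c.stationary_s <= max_wait_s or c.scene in exempt]


def long_waits(clips: List[Clip], scene: str, long_s: float = 30.0) -> int:
    """Count clips of one scene in which the car waits longer than long_s seconds."""
    return sum(1 for c in clips if c.scene == scene and c.stationary_s > long_s)


clips = [Clip("red_light", 10), Clip("red_light", 45), Clip("red_light", 70),
         Clip("traffic_jam", 20), Clip("traffic_jam", 60)]
print(long_waits(clips, "red_light"))                # 2
print(long_waits(naive_filter(clips), "red_light"))  # 0: every long red-light wait is gone
print(long_waits(naive_filter(clips, exempt=frozenset({"red_light"})), "red_light"))  # 2
```

Auditing what a filter removes, scene by scene, before training is one inexpensive guard against this; the `exempt` parameter shows only the crudest possible fix.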
In fact, Li Auto could identify and fix this problem quickly because it had built multiple small cloud-side models, such as data-mining models and scene-understanding models. This complete toolchain and foundational capability are essential for filtering and cleaning autonomous driving data.
Lang Xianpeng compares this to seeing a doctor at a hospital. When a problem scenario appears, an internal "triage table" automatically determines which type of scenario the problem belongs to and gives triage suggestions for the model. Those suggestions are then used to find data from similar scenarios, which is added to the training samples before the next iteration.
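The triage loop can be sketched as a lookup followed by retrieval. This is a hypothetical illustration: `TRIAGE_TABLE`, its symptom and category names, and `next_training_set` are invented for the example, and in Li Auto's pipeline the classification is reportedly done by cloud-side models such as scene-understanding models rather than a fixed dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Clip:
    clip_id: str
    category: str  # scenario category, assumed to come from a scene-understanding model


# Hypothetical triage table: observed symptom -> scenario category (the "department").
TRIAGE_TABLE: Dict[str, str] = {
    "lane_change_at_red_light": "intersection_waiting",
    "hesitation_on_merge": "highway_merge",
}


def triage(symptom: str) -> Optional[str]:
    """Map a problem report to a scenario category, or None if it is unknown."""
    return TRIAGE_TABLE.get(symptom)


def next_training_set(train: List[Clip], pool: List[Clip], symptom: str,
                      limit: int = 1000) -> List[Clip]:
    """Add up to `limit` unseen clips from the problem's category to the training set."""
    category = triage(symptom)
    if category is None:
        return list(train)  # unknown symptom: needs a human diagnosis first
    have = {c.clip_id for c in train}
    similar = [c for c in pool if c.category == category and c.clip_id not in have]
    return list(train) + similar[:limit]


train = [Clip("a", "highway_merge")]
pool = [Clip("a", "highway_merge"), Clip("b", "intersection_waiting"),
        Clip("c", "intersection_waiting"), Clip("d", "parking")]
print([c.clip_id for c in next_training_set(train, pool, "lane_change_at_red_light")])
# ['a', 'b', 'c']
```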
Jia Peng believes that in the future most intelligent driving engineers will work on data and model testing, while designing the structure of the model itself may not require many engineers.
As its way of developing software changed, Li Auto adjusted its staffing and organizational structure. In the traditional modular organization for autonomous driving, large numbers of people are needed at every step from scenario design to R&D, testing, delivery, and bug fixing. After the switch to end-to-end, human involvement in data collection, sample production, training, and iteration, which are now largely automated, has dropped significantly.
After a round of expansion, the Li Auto intelligent driving team laid off many people. Lang Xianpeng explained: "At that time, we wanted to expand the intelligent driving team. In terms of process, we had to cover the whole country and needed more R&D engineers and testers. But even if I could put resources into recruiting these people, it still wouldn't solve the problem of moving to higher capability."
Li Auto's intelligent driving team now works along two main lines, RD and PD. The former handles technical pre-research and explores the direction of next-generation artificial intelligence; the latter handles mass production, delivering the current version to users and maintaining it.
To outsiders, Li Auto's intelligent driving appears to have advanced very quickly. Yet since September last year, the intelligent driving team has held weekly meetings on artificial intelligence at which engineers regularly share topics on autonomous driving, intelligent space, and other areas of AI with Li Xiang. The dual-system idea emerged gradually from these discussions.
Li Auto's rapid progress has led observers to conclude that in intelligent driving no leader can rest easy. But Lang Xianpeng pointed out that it is actually getting harder and harder for newcomers to join the game. Competition in autonomous driving is not only about technology, but also about funding and corporate profitability.
The most telling figure: Li Auto currently spends 1 billion yuan a year just renting GPUs for computing power. As it moves into R&D for higher levels of autonomous driving, annual spending on training compute is expected to reach 1 billion US dollars. Li Auto's and XPeng's latest cloud computing reserves for intelligent driving are reported to be 4.5 EFLOPS and 2.51 EFLOPS, respectively.
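A back-of-the-envelope check puts these figures in proportion. The exchange rate is an assumption (roughly the 2024 level of about 7.1 yuan per US dollar), not a number from the article, and the comparison is loose because today's figure covers only rented GPUs while the future figure covers all training compute.

```python
CNY_PER_USD = 7.1  # ASSUMPTION: approximate 2024 exchange rate, not stated in the article


def spend_growth(current_cny: float, future_usd: float,
                 cny_per_usd: float = CNY_PER_USD) -> float:
    """How many times larger the future annual compute spend is than today's."""
    return future_usd * cny_per_usd / current_cny


print(round(spend_growth(1e9, 1e9), 1))  # 7.1: roughly a sevenfold increase
print(round(4.5 / 2.51, 2))              # 1.79: Li Auto's cloud reserve relative to XPeng's
```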
In the past, China's new EV makers studied the direction of Tesla's technology and followed it step by step. But now that Tesla no longer discloses its technical approach publicly, Li Auto offers a new methodology for breaking through the end-to-end fog. The Huawei R&D engineer quoted above told Interface News that this will help China's intelligent driving industry stop imitating Tesla step by step.
However, whether end-to-end is the ultimate answer for higher-level autonomous driving is a question that neither Lang Xianpeng nor the other pioneers of China's intelligent driving boom may yet be able to answer.
For actual car buyers, the choice of autonomous driving technology has never been the focus. Practical measures of experience such as safety, reliability, usability, and stability remain the constant criteria by which they judge it.
Interface News has excerpted its conversations with Lang Xianpeng and Jia Peng below, edited without altering their original meaning:
Standing at the edge of no man's land
Q: What is Li Auto's current end-to-end + VLM intelligent driving architecture based on, and how will it develop?
Lang Xianpeng: At last year's strategy meeting, we studied intelligent driving solutions including Tesla FSD and found major challenges in reaching the goal of autonomous driving. Whether end-to-end or traditional perception-decision models, the approach is to supply a large amount of data and then train models, or hand-design rules, on that known data to satisfy those scenario conditions. The potential problem is that if the system has never seen a scenario before, it cannot handle it well.
To enable the system to handle complex or unknown scenarios correctly, we explored how to give vehicles the ability to think, decide, judge, and reason like a human. We adopted a dual-system architecture modeled on the thinking and cognitive processes of the human brain, using an end-to-end model for System 1 and a VLM for System 2. Will there be other ways to implement this in the future? We are still iterating, but for now this framework and experimental approach look better suited to future autonomous driving.
Jia Peng: When we test-drove Tesla FSD V12.3, we found a marked difference in its performance between the East and West Coasts. That made us think that in China, given the demands of autonomous driving and the limited computing power of in-vehicle chips, a single model might not be as effective. Our idea at the time was to add, on top of end-to-end, a system with true generalization and logical reasoning ability, and VLM came naturally to mind. Although it does not control the vehicle directly, it can provide decisions.
In the future, as computing power improves and models grow, System 1 and System 2 could become relatively tightly coupled. We could also follow the current trend in multimodal models and unify speech, vision, and LiDAR. This paradigm can support us in reaching L4 and may be the ultimate answer to achieving true artificial intelligence. It may eventually take us into true no man's land and enable autonomous driving at mass-production scale, but we have not yet seen any company get there.
Q: How do the end-to-end and VLM systems work together?
Jia Peng: Both systems run in real time all the time. The end-to-end model is smaller and runs at a higher frame rate, for example above ten hertz. The VLM has far more parameters, 2.2 billion, and can currently run at roughly 3 to 4 Hz. The VLM is always running and passes its decision results and reference trajectories to System 1; after its own inference, the end-to-end model decides whether to use this information.
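Because the two systems run at different rates, System 1 cannot wait for the VLM; it simply uses the newest System 2 output available when it ticks. The deterministic simulation below illustrates this under assumed numbers: the interview gives only "above ten hertz" for System 1 and roughly 3 to 4 Hz for the VLM, so the 20 Hz and 4 Hz rates, and the assumption that each VLM result is published one period after it starts, are hypothetical.

```python
from typing import List, Optional, Tuple


def advice_seen_per_tick(duration_ms: int = 1000, s1_period_ms: int = 50,
                         s2_period_ms: int = 250) -> List[Tuple[int, Optional[int]]]:
    """For each System 1 tick, return (tick time, publish time of the newest VLM output).

    Assumes the VLM publishes results at s2_period_ms, 2 * s2_period_ms, ... and that
    System 1 never blocks on it: before the first result it drives on its own (None).
    """
    ticks = []
    for t in range(0, duration_ms, s1_period_ms):
        published = t // s2_period_ms  # number of VLM results finished by time t
        ticks.append((t, published * s2_period_ms if published else None))
    return ticks


ticks = advice_seen_per_tick()
print(len(ticks))  # 20 System 1 ticks in one second
print(ticks[:6])   # [(0, None), (50, None), (100, None), (150, None), (200, None), (250, 250)]
```

Under these assumptions each VLM result is reused for five consecutive System 1 ticks, which is why the fast model, not the slow one, has to own the final decision.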
Q: Is VLM necessary now, and how necessary is it?
Lang Xianpeng: At L3, the main role is still played by end-to-end, which represents how a person drives under normal circumstances; at L4, it must be the VLM or a large model that plays the more important role. It may sit idle more than 90% of the time, but its effectiveness is the key factor in whether the system is L3 or L4 and whether it can truly cope with unknown scenarios.
Q: How do you test and validate end-to-end models so as to establish a regular cycle of pushes to users?
Lang Xianpeng: A major challenge of the end-to-end era is the uncertainty of capability evaluation and testing. Besides System 1 and System 2, which we implement with end-to-end and VLM, there is also an exam model called System 3. It is essentially an examination system that uses its reconstruction and generation capabilities to produce exam questions.
We have our own bank of real questions for this exam, covering correct driving behavior on the road. It is designed around subjective evaluations by users, the product team, and the vehicle team, together with some of our experienced internal drivers, to set the standard for an experienced driver. Every one of our 800,000 owners has been given a score, and those scoring above 90 are called experienced drivers; they make up about 3% of all our drivers.
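The selection step itself is a simple threshold, sketched below. The cutoff of "above 90" and the 800,000 owners and roughly 3% share come from the interview; the owner IDs and scores are made up, and how Li Auto actually computes a driving score is not described.

```python
from typing import Dict, List

EXPERIENCED_CUTOFF = 90.0  # "those who score above 90", per the interview


def experienced_drivers(scores: Dict[str, float],
                        cutoff: float = EXPERIENCED_CUTOFF) -> List[str]:
    """IDs of owners whose driving score is strictly above the cutoff."""
    return sorted(owner for owner, score in scores.items() if score > cutoff)


# Hypothetical scores for four owners.
scores = {"owner_a": 95.0, "owner_b": 88.5, "owner_c": 90.0, "owner_d": 91.2}
print(experienced_drivers(scores))  # ['owner_a', 'owner_d']

# Scale implied by the interview: about 3% of 800,000 owners.
print(800_000 * 3 // 100)  # 24000
```

Note that a score of exactly 90 does not qualify under a strict "above 90" reading; whether the real cutoff is inclusive is not stated.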
During normal testing and driving, user takeovers and exits become part of our bank of wrong questions. We also need to generate some simulated questions. Based on each model version's exam score, we decide whether it can be pushed to vehicles for further validation.
Jia Peng: Some problems are so long-tail that the data simply cannot be obtained from real driving, so part of the work is generation. Our world model is not purely generative. We feel that purely generative models produce many hallucinations and cannot really be used. We combine reconstruction and generation to produce scenes that conform to the laws of the real world and of physics.
More important than scale and quality is the data mix
Q: How do you plan to collect data, or to collect it more efficiently?
Lang Xianpeng: Our L7, L8, and L9 look quite similar, but the big underlying benefit is that our data can be shared: the camera configurations on all the cars, including mounting positions, are essentially consistent. And since the first-generation Li ONE in 2019, we have been developing a data closed loop. With the L7, L8, and L9, we have 800,000 owners and have accumulated more than 1.2 billion kilometers of effective training data, one of the largest datasets in China.
XPeng only started doing this in 2021, and its models have changed a lot, spanning sedans, SUVs, and MPVs with different body forms. NIO started with the ET7 and used a supplier solution before that, so its start came later, around 2022.
Q: How do you approach data filtering and cleaning? Most of the effort now goes into data work; how much of your resources will it take up?
Lang Xianpeng: We have found that training end-to-end models is not much different from ancient alchemy: the question is how to get the mix right so that the autonomous driving experience improves. When we started the project earlier this year, we found that after training, the car behaved strangely when waiting at a red light, always wanting to change to the next lane. Later we realized that during training we had deleted a lot of data of cars waiting at red lights. We felt that waiting for tens of seconds or even a minute was not enough