Latest research from NVIDIA: real-time perception without maps opens up new possibilities for robot navigation
楚一帆
Posted yesterday at 17:15
Researchers from the University of California and NVIDIA recently released a new visual language model called "NaVILA". Its main contribution is a new approach to robot navigation.
Related papers on the NaVILA model
A Visual Language Model (VLM) is a multimodal generative AI model that can reason over text, image, and video prompts. It combines a Large Language Model (LLM) with a visual encoder, giving the LLM the ability to "see".
Traditional robot navigation often relies on pre-drawn maps and complex sensor systems. The NaVILA model does not require a prebuilt map. The robot only needs to "understand" natural language instructions from a human and combine them with real-time camera images and LiDAR data. It can then perceive paths, obstacles, and moving targets in its environment in real time and navigate autonomously to a designated location.
NaVILA not only removes the dependence on maps but also extends navigation from wheeled robots to legged robots, with the aim of letting robots handle more complex scenarios, cross obstacles, and plan paths adaptively.
In the paper, researchers from the University of California ran experiments with the Unitree Go2 robot dog and the G1 humanoid robot. According to the team's real-world tests, NaVILA achieved a navigation success rate of up to 88% in real environments such as homes, outdoor areas, and workspaces, and a success rate of 75% on complex tasks.
The Go2 robot dog receives the instruction: "Turn left a little and walk towards the portrait poster; you will see an open door."
The G1 humanoid robot receives the instruction: "Immediately turn left and go straight, step onto the mat, and keep moving forward until you approach the trash can, then stop."
The reported characteristics of NVIDIA's NVILA visual language model are:
Optimizing accuracy and efficiency: The NVILA model reduces training costs by a factor of 4.5 and the memory required for fine-tuning by a factor of 3.4. Prefill and decoding latency is also improved by nearly a factor of two (compared with another large visual model, LLaVA-OneVision).
High-resolution input: Instead of shrinking photos and videos to make the input cheaper to process, the NVILA model takes high-resolution images and multiple video frames as input so that no details are lost.
Compression technology: NVIDIA pointed out that training visual language models is very expensive and that fine-tuning them is also very memory intensive; a 7B-parameter model requires over 64 GB of GPU memory. NVIDIA has therefore adopted an approach described as "scale first, then compress": after scaling up the input, it reduces the input size by compressing the visual information into fewer tokens, grouping pixels so that important information is preserved. This balances the accuracy and efficiency of the model.
Multimodal reasoning ability: The NVILA model is capable of answering multiple queries based on a single image or video, demonstrating strong multimodal reasoning capabilities.
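To make the idea of "compressing visual information into fewer tokens by grouping pixels" more concrete, here is a minimal sketch in plain Python. It is not NVIDIA's implementation: the function name group_tokens, the 2x2 block size, and the choice to merge neighbours by concatenation are all assumptions made for illustration. Each 2x2 neighbourhood of visual tokens is merged into one longer token, so the token count drops by a factor of four while every original value is kept.

```python
from typing import List

Token = List[float]


def group_tokens(grid: List[List[Token]], block: int = 2) -> List[List[Token]]:
    """Merge each block x block neighbourhood of tokens into one longer token.

    Hypothetical helper for illustration only; concatenation is an assumed
    merge rule, not something stated in the article.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % block or cols % block:
        raise ValueError("grid height and width must be divisible by block")
    out: List[List[Token]] = []
    for r in range(0, rows, block):
        out_row: List[Token] = []
        for c in range(0, cols, block):
            merged: Token = []
            # Concatenate the tokens of the neighbourhood in row-major order.
            for dr in range(block):
                for dc in range(block):
                    merged.extend(grid[r + dr][c + dc])
            out_row.append(merged)
        out.append(out_row)
    return out


# Toy 4x4 grid of 3-dimensional tokens (values are arbitrary).
grid = [[[float(r), float(c), 1.0] for c in range(4)] for r in range(4)]
compressed = group_tokens(grid)

tokens_before = len(grid) * len(grid[0])
tokens_after = len(compressed) * len(compressed[0])
print(tokens_before, "->", tokens_after)
```

In this toy case the language model would see 4 tokens instead of 16, each four times as wide. A real system would typically follow such a step with a learned projection, which this sketch leaves out.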
In video benchmarks, NVILA outperforms GPT-4o Mini and is competitive with GPT-4o, Sonnet 3.5, and Gemini 1.5 Pro. It also narrowly outperforms Llama 3.2.
NVIDIA stated that it has not yet released the model on the Hugging Face platform, and it has promised to release the code and model soon to support reproducibility.