Nvidia and other giants exposed for illegally using YouTube data to train models, involving more than 170,000 videos
六月清晨搅
Posted on 2024-7-17 15:00:42
According to media reports, several large tech companies, including Apple, NVIDIA, Salesforce, and Anthropic, used data taken from YouTube, Google's video platform, without authorization to train their AI models. The companies relied on a dataset supplied by a third party that contains a large volume of subtitle text scraped from YouTube videos, in violation of YouTube's ban on unauthorized scraping of platform content. The report says all of these companies trained AI models on a dataset called "YouTube Subtitles", which is 5.7 GB in size and contains 489 million words from 173,500 videos across more than 48,000 YouTube channels. The dataset consists of plain-text video subtitles, including both subtitles uploaded by video creators and text automatically transcribed by YouTube. In addition to English, it often includes translations into languages such as Japanese, German, and Arabic.
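CandyLake.com is an information publishing platform and provides information storage space services only.
Disclaimer: The views in this article are solely those of the author. This article does not represent the position of CandyLake.com and does not constitute advice; please treat it with caution.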