XDOF Emerges to Solve the Critical Data Bottleneck in Physical AI

📅2 hours ago⏱3 min read

As the race for physical intelligence heats up with OpenAI relaunching its robotics program, a new challenge has surfaced: the lack of high-fidelity training data. While Large Language Models (LLMs) thrived on the vast expanse of the public internet, robotics requires precise, physical interaction data that current datasets simply cannot provide.

The Data Gap: Why LLMs Won't Solve Robotics

The main hurdle in developing capable robots isn't just compute or model architecture; it is the absence of a data source comparable to the internet text used to train GPT models. Current alternatives, such as YouTube videos or low-fidelity footage captured by gig workers, are difficult to reconcile with the complex physical realities of robotic movement. This "chicken-and-egg" problem, in which data is needed to train models but models are needed to collect data efficiently, has become the primary bottleneck for the industry.

XDOF, a startup emerging from stealth, is positioning itself as the infrastructure layer to solve this. Having raised $70 million from heavyweights including Thrive Capital, Spark Capital, a16z, Lux, and WndrCo, the company is building the pipelines, collection tools, and annotation systems that frontier AI labs are struggling to build in-house.

Building the ABC Dataset and the Data Pyramid

To jumpstart the ecosystem, XDOF is partnering with UC Berkeley’s AI Research lab to release "ABC," a massive collection of high-quality robot training data. This dataset includes:

130,000 trajectories of robot manipulation data.
300 hours of simulation data.
100 hours of evaluations.

Using this data, teams have already successfully trained robots on granular tasks such as folding T-shirts, flattening boxes, and performing delicate operations like loading AirPods into their cases.

XDOF’s strategy follows a three-tier "data pyramid" to ensure comprehensive learning. The most valuable tier involves teleoperation data collected directly on the target robot. This is followed by general data gathered via devices like GELLO (a low-cost teleoperation system developed by XDOF co-founders Philippe Wu and Fred Shentu). The final tier involves "egocentric" data, where humans perform everyday tasks while wearing XDOF’s proprietary sensors to capture first-person physical movement.
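As a rough illustration of the three tiers described above, the sketch below tags recorded episodes by tier and totals the hours collected in each. All names here (`DataTier`, `Episode`, `hours_per_tier`) and the sample durations are hypothetical and chosen only for illustration; they are not XDOF's schema or tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataTier(IntEnum):
    """Hypothetical labels for the pyramid; lower value = closer to the target robot."""
    TARGET_ROBOT_TELEOP = 1  # teleoperation collected directly on the target robot
    GENERAL_TELEOP = 2       # general data gathered via devices such as GELLO
    EGOCENTRIC_HUMAN = 3     # first-person human data from wearable sensors


@dataclass(frozen=True)
class Episode:
    """One recorded episode (illustrative fields only)."""
    episode_id: str
    tier: DataTier
    duration_s: float


def hours_per_tier(episodes):
    """Sum recorded hours per tier, keyed in pyramid order (tier 1 first)."""
    totals = {tier: 0.0 for tier in DataTier}
    for ep in episodes:
        totals[ep.tier] += ep.duration_s / 3600.0
    return totals


# Made-up sample data, not taken from the ABC dataset.
episodes = [
    Episode("ep-001", DataTier.EGOCENTRIC_HUMAN, 1800.0),
    Episode("ep-002", DataTier.TARGET_ROBOT_TELEOP, 900.0),
    Episode("ep-003", DataTier.GENERAL_TELEOP, 2700.0),
    Episode("ep-004", DataTier.TARGET_ROBOT_TELEOP, 900.0),
]

totals = hours_per_tier(episodes)
for tier, hours in totals.items():
    print(f"{tier.name}: {hours:.2f} h")
```

Using an ordered enum keeps the tier ranking explicit, so a training pipeline could, for example, sort or weight episodes by how close they are to the target robot.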

Out-Scaling the Frontier Labs

A critical question for investors is why the large AI labs don't simply build these data factories themselves. According to CEO Philippe Wu, the operational complexity is immense. Running a data collection operation requires hundreds of thousands of square feet of warehouse space, hundreds of calibrated robots, and a large, trained workforce of teleoperators.

By specializing in this "unglamorous" work, including data cleaning and hardware-specific calibration, XDOF lets AI labs focus on model architecture while it handles the massive logistical burden of producing physical data. The company's name, a play on "degrees of freedom," reflects its goal of providing data for any arbitrary complexity of movement, from the seven degrees of freedom of a human arm to the 30 degrees of freedom of a humanoid robot.

Key Takeaways

Infrastructure over models: XDOF is addressing the "physical AI" bottleneck by providing the specialized data pipelines and annotation tools that LLM-focused labs lack.
High-fidelity datasets: The release of the ABC dataset provides unprecedented scale for the industry, including 130,000 manipulation trajectories.
Operational outsourcing: XDOF frees frontier labs from the massive capital and logistical requirements of running large-scale physical data warehouses and fleets of teleoperators.
