XDOF Emerges to Solve the Critical Data Bottleneck in Physical AI



As the race for physical intelligence heats up with OpenAI relaunching its robotics program, a new challenge has surfaced: the lack of high-fidelity training data. While Large Language Models (LLMs) thrived on the vast expanse of the public internet, robotics requires precise, physical interaction data that current datasets simply cannot provide.

The Data Gap: Why LLMs Won't Solve Robotics

The main hurdle in developing capable robots isn't just compute or model architecture; it is the absence of a "data moat" comparable to the text used for GPT models. Current alternatives, such as YouTube videos or low-fidelity footage captured by gig workers, are difficult to reconcile with the complex physical realities of robotic movement. This creates a "chicken-and-egg" problem: data is needed to train models, but models are needed to collect data efficiently. That loop has become the industry's central bottleneck.

XDOF, a startup emerging from stealth, is positioning itself as the infrastructure layer to solve this. Having raised $70 million from heavyweights including Thrive Capital, Spark Capital, a16z, Lux, and WndrCo, the company is building the pipelines, collection tools, and annotation systems that frontier AI labs are struggling to build in-house.

Building the ABC Dataset and the Data Pyramid

To jumpstart the ecosystem, XDOF is partnering with UC Berkeley’s AI Research lab to release "ABC," a massive collection of high-quality robot training data. This dataset includes:

130,000 trajectories of robot manipulation data.
300 hours of simulation data.
100 hours of evaluations.

Using this data, teams have already successfully trained robots on granular tasks such as folding T-shirts, flattening boxes, and performing delicate operations like loading AirPods into their cases.

XDOF’s strategy follows a three-tier "data pyramid" to ensure comprehensive learning. The most valuable tier involves teleoperation data collected directly on the target robot. This is followed by general data gathered via devices like GELLO (a low-cost teleoperation system developed by XDOF co-founders Philippe Wu and Fred Shentu). The final tier involves "egocentric" data, where humans perform everyday tasks while wearing XDOF’s proprietary sensors to capture first-person physical movement.
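The three tiers described above can be pictured as a simple ordered label attached to each recorded episode. The following is a minimal sketch for illustration only, not XDOF's actual schema: the names `DataTier`, `Episode`, `episode_id`, and `hours` are hypothetical, and only the tier ordering comes from the description above.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataTier(IntEnum):
    """Tiers of the data pyramid; a lower value means a more valuable tier."""
    TARGET_ROBOT_TELEOP = 1  # teleoperation collected on the target robot
    GENERAL_TELEOP = 2       # general data from devices such as GELLO
    EGOCENTRIC_HUMAN = 3     # first-person human data from wearable sensors


@dataclass(frozen=True)
class Episode:
    episode_id: str  # hypothetical identifier
    tier: DataTier
    hours: float     # hypothetical duration field


def most_valuable_first(episodes):
    """Return episodes ordered from the top of the pyramid downward."""
    return sorted(episodes, key=lambda ep: ep.tier)


def hours_per_tier(episodes):
    """Sum recorded hours for each tier, keyed in pyramid order."""
    totals = {tier: 0.0 for tier in DataTier}
    for ep in episodes:
        totals[ep.tier] += ep.hours
    return totals
```

Tagging episodes this way would let a training pipeline report how much of its data comes from each tier, or prioritize target-robot teleoperation when sampling.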

Scaling Beyond the Frontier Labs

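For investors, a key question is why the major AI labs don't simply build these data factories themselves. According to CEO Philippe Wu, the operational complexity is immense. Running a data collection operation requires hundreds of thousands of square feet of warehouse space, hundreds of calibrated robots, and a large, trained team of teleoperators.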

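By focusing on this "unglamorous" work, including data cleaning and hardware-specific calibration, XDOF lets AI labs concentrate on model architecture while it manages the enormous logistical burden of physical data production. The company's name plays on the term "degrees of freedom" and reflects its goal: to provide data for motion of any complexity, from the 7 degrees of freedom of a human arm to the 30 degrees of freedom of a humanoid robot.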

Key Takeaways

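Infrastructure over models: XDOF is addressing the "physical AI" bottleneck by providing the specialized data pipelines and annotation tools that LLM-centric labs lack.
High-fidelity datasets: The release of the ABC dataset gives the industry unprecedented scale, with 130,000 manipulation trajectories.
Operational outsourcing: XDOF relieves frontier labs of the enormous capital and logistical demands of managing large-scale physical data warehouses and teleoperation fleets.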
