Nvidia Researchers Enable Robots to Teach Themselves Using AI Coding Agents

The bottleneck of manual data collection and constant human intervention in robotics is finally being addressed. Using AI coding agents, researchers have built a system in which robots can autonomously write their own training code and improve their skills in real-world environments.

Breaking the Manual-Labor Bottleneck with ENPIRE

Traditionally, teaching a robot a complex task such as dexterous manipulation requires human engineers to reset the environment, collect datasets, and tune algorithms by hand. This labor-intensive process creates a major bottleneck in scaling robot intelligence. To address it, researchers from Nvidia, Carnegie Mellon University, and UC Berkeley introduced ENPIRE, a framework that turns the training process into a self-running feedback loop.

Rather than waiting for human instructions, ENPIRE uses an AI coding agent to manage the entire lifecycle: resetting the workspace, executing a motion policy, evaluating the result, and immediately iterating on the code to improve performance. This moves robotics from "human-in-the-loop" to "agent-in-the-loop."
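The four-step cycle described here (reset, execute, evaluate, iterate) can be illustrated with a toy loop. This is only a sketch under heavy assumptions: `Policy`, `reset_workspace`, `execute`, `evaluate`, and `revise` are hypothetical stand-ins invented for illustration, and the "code revision" is reduced to nudging a single number. It is not ENPIRE's implementation.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Policy:
    """Hypothetical stand-in for training code: one tunable parameter."""
    gain: float


def reset_workspace() -> float:
    """Pretend reset: returns the target value the policy should reach."""
    return 1.0


def execute(policy: Policy, rng: random.Random) -> float:
    """Pretend rollout: the outcome is the gain plus a little noise."""
    return policy.gain + rng.gauss(0.0, 0.01)


def evaluate(outcome: float, target: float) -> float:
    """Score in (0, 1]; 1.0 means the outcome hit the target exactly."""
    return 1.0 / (1.0 + abs(outcome - target))


def revise(policy: Policy, rng: random.Random) -> Policy:
    """Stand-in for the agent editing code: propose a perturbed policy."""
    return Policy(gain=policy.gain + rng.uniform(-0.2, 0.2))


def agent_loop(iterations: int = 50, seed: int = 0) -> Tuple[Policy, float]:
    rng = random.Random(seed)
    best = Policy(gain=0.0)
    best_score = evaluate(execute(best, rng), reset_workspace())
    for _ in range(iterations):
        target = reset_workspace()         # reset the workspace
        candidate = revise(best, rng)      # iterate on the "code"
        outcome = execute(candidate, rng)  # execute the motion policy
        score = evaluate(outcome, target)  # evaluate the result
        if score > best_score:             # keep only improvements
            best, best_score = candidate, score
    return best, best_score
```

Calling `agent_loop()` returns the best policy found and its score. The point of the sketch is the control flow: no step inside the loop waits on a person.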

How Autonomous Coding Agents Drive Dexterity

ENPIRE operates in two distinct phases. In the first phase, the agent sets up the workspace with minimal human guidance, often just a few minutes of video showing successful and failed attempts. Crucially, the agent writes its own reward functions. For example, during pin-insertion tasks, the agent built a custom check that combines visual alignment, gripper height, and estimated force to determine success.
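A success check of the kind described for pin insertion might look roughly like the following. The field names, units, and thresholds are all assumptions made for illustration. The article names only the three signals (visual alignment, gripper height, estimated force), not how they were combined.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    alignment_error_px: float  # assumed: visual offset between pin and hole
    gripper_height_m: float    # assumed: gripper height above the fixture
    estimated_force_n: float   # assumed: estimated contact force


def insertion_succeeded(
    obs: Observation,
    max_error_px: float = 5.0,   # hypothetical threshold
    max_height_m: float = 0.02,  # hypothetical threshold
    max_force_n: float = 15.0,   # hypothetical threshold
) -> bool:
    """Return True only when all three signals are within their limits."""
    return (
        obs.alignment_error_px <= max_error_px
        and obs.gripper_height_m <= max_height_m
        and obs.estimated_force_n <= max_force_n
    )
```

Requiring all three conditions at once is one plausible design: a pin that looks aligned but sits too high, or that is being forced, would not count as inserted.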

In the second phase, the agents operate fully autonomously. They read research papers, form hypotheses, and edit the training code directly. They can choose between methods such as behavior cloning (imitating human motion) or reinforcement learning (trial and error), depending on which yields the best real-world signal. In the experiments, the researchers used high-performing models including Codex (with GPT-5.5), Claude Code (with Opus 4.7), and Kimi Code (with Kimi K2.6), with Codex proving the strongest.
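The article does not say how an agent decides which method gives the better real-world signal. One simple possibility, shown here purely as an assumption, is to compare measured success rates from a handful of hardware trials per method. The function name and the trial data are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def pick_method(trial_results: Dict[str, List[bool]]) -> str:
    """Return the method with the highest observed success rate.

    `trial_results` maps a method name to its per-trial outcomes.
    Methods with no trials are ignored.
    """
    rates = {
        name: mean([float(ok) for ok in outcomes])
        for name, outcomes in trial_results.items()
        if outcomes
    }
    if not rates:
        raise ValueError("no trial results to compare")
    return max(rates, key=rates.get)


# Made-up outcomes for illustration only.
example = {
    "behavior_cloning": [True, False, False],
    "reinforcement_learning": [True, True, False],
}
chosen = pick_method(example)
```

With so few trials the comparison is noisy, so a real system would presumably want more rollouts or some confidence estimate before committing to a method.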

Scaling with a Git-Powered Robot Fleet

One of the most innovative aspects of this research is the coordination of a fleet of eight dual-arm YAM robot stations. Rather than working in isolation, these stations act as a distributed research team. They share their findings, successful "recipes," and failed hypotheses using Git, the standard version control tool used in software engineering.
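How the stations structure what they share is not described, so the following is a sketch under assumptions: each station appends its findings to its own JSON Lines file in a shared repository and then commits and pushes. The directory layout, field names, and station name are hypothetical. The Git commands themselves (`add`, `commit -m`, `pull --rebase`, `push`) are standard.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def record_finding(
    repo: Path,
    station: str,
    hypothesis: str,
    success_rate: float,
    run_git: bool = False,
) -> List[List[str]]:
    """Append a finding to the station's log and return the Git commands.

    The commands are executed only when run_git is True, which requires
    `repo` to be a Git working tree with a configured remote.
    """
    findings_dir = repo / "findings"
    findings_dir.mkdir(parents=True, exist_ok=True)
    path = findings_dir / f"{station}.jsonl"  # one file per station (assumed)
    entry = {
        "station": station,
        "hypothesis": hypothesis,
        "success_rate": success_rate,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")

    message = f"{station}: {hypothesis} ({success_rate:.0%})"
    commands = [
        ["git", "add", path.relative_to(repo).as_posix()],
        ["git", "commit", "-m", message],
        ["git", "pull", "--rebase"],  # pick up other stations' findings
        ["git", "push"],
    ]
    if run_git:
        for cmd in commands:
            subprocess.run(cmd, cwd=repo, check=True)
    return commands
```

Giving each station its own file is a deliberate choice in this sketch: two stations never edit the same file, so concurrent pushes should rebase without merge conflicts.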

This fleet-based approach yields large time savings:

The Reality Gap: Simulation vs. Hardware

Despite these breakthroughs, the research highlights the "sim-to-real" gap. While all three tested agents solved the Push-T test in simulation, two out of three failed when transitioned to physical hardware due to unpredictable variables like friction and robot dynamics. However, ENPIRE demonstrated superior performance in the RoboCasa simulation compared to established models like GR00T.

As the industry moves toward general-purpose robotics, the ability of machines to "self-research" through code will be key to moving beyond narrow, pre-programmed motions toward adaptable intelligence.

Key Takeaways