Subquadratic Claims Major Breakthrough in Solving the LLM Quadratic Bottleneck
The AI industry is buzzing over Subquadratic, a Miami-based startup that claims to have solved a mathematical bottleneck that has constrained Large Language Models (LLMs) for nearly a decade. Although early skepticism ran high, recent independent verification suggests that the company's new "SubQ" architecture could fundamentally change how generative AI is built.
The Problem: The Quadratic Cost of Dense Attention
To understand why Subquadratic's claim matters, you need to understand the "Transformer" architecture introduced by Google in 2017. Most modern LLMs rely on a mechanism called dense attention. In this process, every token (a word or a piece of a word) in a sequence is scored against every other token to capture context.
This creates a heavy computational burden known as quadratic scaling. Double the length of the text and the compute required roughly quadruples. For a 10,000-word document (treating each word as one token), that means roughly 100 million pairwise attention scores, or about 50 million when each token attends only to itself and earlier tokens, as in causal models. This inefficiency is the main reason LLMs are known as "power hogs," requiring large amounts of energy and expensive hardware to process long contexts.
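The arithmetic behind quadratic scaling can be checked with a few lines of Python. This is a minimal sketch that only counts token pairs; the function names `dense_pairs` and `causal_pairs` are hypothetical helpers introduced for illustration, and the count ignores the per-pair cost of the dot product itself.

```python
def dense_pairs(n_tokens: int) -> int:
    """Pairs scored when every token attends to every token, itself included."""
    return n_tokens * n_tokens


def causal_pairs(n_tokens: int) -> int:
    """Pairs scored when each token attends only to itself and earlier tokens."""
    return n_tokens * (n_tokens + 1) // 2


if __name__ == "__main__":
    for n in (10_000, 20_000):
        print(f"{n} tokens: dense={dense_pairs(n)}, causal={causal_pairs(n)}")
```

At 10,000 tokens the dense count is 100,000,000 and the causal count is 50,005,000. Doubling the input to 20,000 tokens raises the dense count to 400,000,000, four times as many.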
The Solution: Scaling with Sparse Attention
Subquadratic's SubQ model aims to abandon dense attention in favor of sparse attention. The core idea is that not every relationship between words matters for understanding a document. Instead of scoring every token against every other token, sparse attention selects only the most important relationships to compute.
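To make the difference concrete, here is a minimal pure-Python sketch that compares dense attention with one well-known sparse pattern, a sliding window in which each token attends only to its nearest neighbors. This is an assumption chosen for illustration: the article does not describe how SubQ selects which relationships to keep, so nothing here should be read as Subquadratic's method. The `attention` function and its `window` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values, window=None):
    """Scaled dot-product attention.

    window=None gives dense attention (every token sees every token).
    window=w gives sliding-window sparse attention: token i only sees
    positions i-w .. i+w. Returns (outputs, number_of_scores_computed).
    """
    n = len(queries)
    scale = 1.0 / math.sqrt(len(queries[0]))
    out_dim = len(values[0])
    outputs, scored = [], 0
    for i in range(n):
        if window is None:
            idx = range(n)
        else:
            idx = range(max(0, i - window), min(n, i + window + 1))
        scores = [dot(queries[i], keys[j]) * scale for j in idx]
        scored += len(scores)
        weights = softmax(scores)
        # Weighted sum of the value vectors for the positions that were kept.
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, idx)) for d in range(out_dim)]
        )
    return outputs, scored


if __name__ == "__main__":
    tokens = [[float(i), 1.0] for i in range(8)]  # toy 2-dimensional embeddings
    _, dense_cost = attention(tokens, tokens, tokens)
    _, sparse_cost = attention(tokens, tokens, tokens, window=1)
    print("dense scores:", dense_cost, "sparse scores:", sparse_cost)
```

With a fixed window the number of scores grows linearly with sequence length rather than quadratically. The trade-off is that relationships outside the window are never considered, and that quality gap is what Subquadratic claims to have closed.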
Although sparse attention is not a new idea, earlier attempts have struggled to retain the level of reasoning ability and nuance found in dense-attention models. Subquadratic claims to have closed this gap, building a model that delivers the efficiency of sparse attention without the usual loss of capability.
Verifying the Claims: Results from Appen
Following the early skepticism, during which some critics went so far as to compare the unverified claims to an "AI Theranos," Subquadratic has released third-party benchmarks from Appen, a leading AI evaluation company. Appen's independent test results validated the SubQ architecture, describing the results as astonishing and a potential "game changer."
According to the startup, SubQ offers several transformative technical advantages:
- Context Window: SubQ can process up to 12 times more text at once compared to most current models, making it ideal for analyzing entire codebases or massive document libraries.
- Performance: Despite the leaner architecture, SubQ matches the performance of models from industry leaders such as OpenAI, Google DeepMind, and Anthropic on critical tasks such as coding.
- Efficiency: The model is significantly faster, cheaper, and more energy-efficient than existing transformer-based models.
A New Era Beyond Transformers?
Subquadratic is not just looking to optimize current models; it wants to replace the industry's foundational architecture. CEO Justin Dangel has stated that the company believes the era of building on Transformers may be coming to an end. If SubQ continues to prove itself at scale, the transition from dense to sparse attention could be the most significant shift in AI architecture since the invention of the Transformer itself.
Key Takeaways
- Breaking the Quadratic Barrier: SubQ uses sparse attention to avoid the quadratic growth in computation required by traditional dense attention.
- Superior Context Handling: According to the startup, the model can process up to 12 times more text at once than most current models, enabling analysis of entire codebases and large document libraries.
- Verified Efficiency: Independent testing by Appen supports the claim that SubQ performs on par with models from OpenAI, Google DeepMind, and Anthropic at a fraction of the cost and energy.