Mixture of Experts (MoE): How It Works and When to Use It
Suppose you want to scale from a 7B model to a 70B model without buying four more GPUs.
Someone then proposes Mixture of Experts (MoE), claiming you can get 70B-class performance for roughly the compute of a 7B model.
It sounds like a free lunch, but there is a catch.
So how does it work?
A dense Transformer such as Llama 3.2 uses 100% of its parameters for every token. Scaling from 7B to 70B multiplies both memory and compute by 10.
MoE decouples the two. The model holds more parameters (memory cost goes up) but uses only a small subset of them for each token (compute cost stays contained).
Trade-offs:
• Dense 7B: total parameters 7B | active 7B | compute 7B | memory 14 GB
• Dense 70B: total parameters 70B | active 70B | compute 70B | memory 140 GB
• MoE 45B: total parameters 45B | active ~13B | compute ~14B | memory ~90 GB
The catch: you still pay the memory cost of a large model. You cannot run Mixtral on a single 24 GB GPU. You need enough VRAM to hold every expert, including the ones that are not being used for the current token.
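The memory column above is consistent with about 2 bytes per parameter (16-bit weights). That precision is an assumption, not something the text states, and the sketch below counts weights only, ignoring activations and the KV cache. The function names are made up for illustration.

```python
def weight_memory_gb(total_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption.
    """
    return total_params_billion * 1e9 * bytes_per_param / 1e9


def fits_on_gpu(total_params_billion: float, vram_gb: float, bytes_per_param: int = 2) -> bool:
    """True if the weights alone fit in the given VRAM budget."""
    return weight_memory_gb(total_params_billion, bytes_per_param) <= vram_gb


# Total parameter counts (in billions) taken from the trade-off list above.
MODELS = {"Dense 7B": 7, "Dense 70B": 70, "MoE 45B": 45}

for name, size in MODELS.items():
    gb = weight_memory_gb(size)
    print(f"{name}: ~{gb:.0f} GB of weights, fits in 24 GB: {fits_on_gpu(size, 24)}")
```

Note that the function takes the total parameter count, not the active one: for the MoE row, it is the 45B total that decides whether the model fits.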
Architecture:
In a sparse MoE, the standard Feed-Forward Network (FFN) is replaced by several "expert" FFNs plus a learnable router.
- The router receives a token.
- It assigns a score to each expert.
- It selects the top-k experts (k=2 for Mixtral).
- It sends the token only to the selected experts.
- It combines their outputs.
The router is not a hand-written scheduler. It is a learned layer: it learns, for example, to send math tokens to one expert and code tokens to another.
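The routing steps above can be sketched in NumPy. This is a minimal illustration, assuming the router is a single linear layer and that gate weights come from a softmax over the k selected scores; the names `route_top_k` and `router_weights` are hypothetical, and random weights stand in for learned ones.

```python
import numpy as np


def route_top_k(tokens, router_weights, k=2):
    """Return (expert indices, gate weights) for each token.

    tokens:         (num_tokens, d_model)
    router_weights: (d_model, num_experts), standing in for the learned router
    """
    logits = tokens @ router_weights                    # one score per expert
    top_idx = np.argsort(-logits, axis=-1)[:, :k]       # k highest-scoring experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # Softmax over the selected experts only, so the k gate weights sum to 1.
    shifted = top_logits - top_logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    gates = exp / exp.sum(axis=-1, keepdims=True)
    return top_idx, gates


rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))          # 4 tokens, d_model = 16
router_weights = rng.normal(size=(16, 8))  # 8 experts
idx, gates = route_top_k(tokens, router_weights, k=2)
print(idx)    # which 2 experts each token is sent to
print(gates)  # how their outputs would be weighted when combined
```

In a real model `router_weights` is trained together with the experts, which is what lets the routing specialize.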
Training challenges:
The biggest concern is "router collapse". Without countermeasures, the router may send every token to the same two experts. Those experts then improve further, so the router sends them even more traffic. The remaining experts end up useless.
Engineers address this with an auxiliary load-balancing loss, which penalizes the model when it does not use all experts evenly.
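One common form of such a penalty can be sketched as follows. The text does not say which formulation is meant; this sketch assumes the Switch-Transformer-style loss, N * sum_i(f_i * P_i), with top-1 routing for simplicity. Here f_i is the fraction of tokens sent to expert i and P_i is the mean router probability for expert i. The function name is hypothetical.

```python
import numpy as np


def load_balancing_loss(router_probs, expert_idx, num_experts):
    """Switch-Transformer-style auxiliary loss (an assumed formulation).

    router_probs: (num_tokens, num_experts) softmax output of the router
    expert_idx:   (num_tokens,) expert chosen for each token (top-1)
    """
    # f_i: fraction of tokens dispatched to expert i
    f = np.bincount(expert_idx, minlength=num_experts) / len(expert_idx)
    # P_i: mean router probability assigned to expert i
    p = router_probs.mean(axis=0)
    return num_experts * float(np.sum(f * p))


num_experts = 4

# Balanced routing: uniform probabilities, tokens spread evenly.
balanced_probs = np.full((8, num_experts), 1.0 / num_experts)
balanced_idx = np.arange(8) % num_experts
balanced_loss = load_balancing_loss(balanced_probs, balanced_idx, num_experts)

# Collapsed routing: every token goes to expert 0 with probability 1.
collapsed_probs = np.zeros((8, num_experts))
collapsed_probs[:, 0] = 1.0
collapsed_idx = np.zeros(8, dtype=int)
collapsed_loss = load_balancing_loss(collapsed_probs, collapsed_idx, num_experts)

print(balanced_loss, collapsed_loss)
```

Under this formulation the loss is lowest when routing is uniform and grows toward the number of experts as traffic concentrates, so adding it (scaled by a small coefficient) to the training loss pushes the router away from collapse.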
When to avoid MoE:
• You need consistent latency: MoE response times vary more.
• VRAM is limited: if you have only a single GPU with less than 48 GB, use a dense model.
• You are building a small model: below 3B parameters, the overhead is too large.
• You need simple infrastructure: MoE requires complex expert parallelism and custom kernels.
MoE is the best fit when your baseline dense model would be 30B parameters or larger and you have the memory to support it.
Mixture of Experts (MoE): How It Works Internally and When It Shows Its True Value
In the evolution of large language models (LLMs), increasing the parameter count has become the standard way to gain capability. The problem is that compute cost (FLOPs) grows in proportion to the parameter count.
This is where Mixture of Experts (MoE) comes in. MoE is an architecture that lets a model grow its parameter count dramatically while keeping inference compute low.
This article explains how MoE works internally and in which situations you can benefit from it.
What Is Mixture of Experts (MoE)?
In a conventional "dense" model, every parameter in the model is used in the computation for every input token. The larger the model, the more computation is needed to process a single token.
Mixture of Experts (MoE), by contrast, is a "sparse" model. An MoE model splits part of the network into several smaller networks called "experts". For each input token, not all experts run: only a small subset, chosen by the **gating network**, performs computation.
How It Works: The Gating Network and the Experts
MoE is built mainly from two components.
1. Experts
The experts are a set of small networks, each equivalent to an ordinary feed-forward network (FFN). During training, each expert tends to specialize in particular kinds of patterns or knowledge (for example, one expert may become strong at grammar and another at mathematical concepts).
2. Gating Network (Router)
The gating network (or router) acts as the "traffic controller" that decides which experts each input token is sent to.
The process is as follows:
- A token is input.
- The gating network analyzes the token's features.
- The gating network selects the top $k$ experts best suited to process that token (usually $k=1$ or $2$).
- Only the selected experts run their computation, and their results are combined.
This mechanism of "running only the parts that are needed" is called **sparse activation**.
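The process above, taken end to end, can be sketched as a single forward pass. This is a minimal NumPy illustration under stated assumptions: each expert is a two-layer ReLU FFN, the gate weights are a softmax over the selected scores, and all weights are random. The name `moe_forward` and the shapes are hypothetical.

```python
import numpy as np


def moe_forward(x, router_w, experts, k=2):
    """Sparse MoE layer: x is (num_tokens, d_model); experts is a list of (w_in, w_out)."""
    logits = x @ router_w
    top_idx = np.argsort(-logits, axis=-1)[:, :k]            # top-k experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)               # weights for combining

    out = np.zeros_like(x)
    expert_calls = 0                                          # (token, expert) pairs computed
    for e, (w_in, w_out) in enumerate(experts):
        token_ids, slot = np.nonzero(top_idx == e)            # tokens that selected expert e
        if token_ids.size == 0:
            continue                                          # unselected expert does no work
        hidden = np.maximum(x[token_ids] @ w_in, 0.0)         # ReLU FFN
        out[token_ids] += gates[token_ids, slot][:, None] * (hidden @ w_out)
        expert_calls += token_ids.size
    return out, expert_calls


rng = np.random.default_rng(0)
d_model, d_hidden, num_experts = 8, 16, 4
x = rng.normal(size=(5, d_model))
router_w = rng.normal(size=(d_model, num_experts))
experts = [(rng.normal(size=(d_model, d_hidden)), rng.normal(size=(d_hidden, d_model)))
           for _ in range(num_experts)]

out, expert_calls = moe_forward(x, router_w, experts, k=2)
print(out.shape, expert_calls)
```

With 5 tokens and $k=2$, the layer evaluates 10 token-expert pairs, whereas running all 4 experts on every token would take 20. All four experts' weights still had to be allocated.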
Why MoE Matters (Benefits)
1. Better compute efficiency
The biggest advantage of MoE is that it decouples "parameter count" from "compute cost". For example, even in an MoE model with 1 trillion parameters, if only a fraction of them (say, 100B parameters, or 10%) actually runs for each token, the compute cost stays at roughly the level of a 100B-parameter dense model. This allows fast inference while retaining a huge knowledge capacity.
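The arithmetic in this example can be made explicit. The sketch below uses the 1T total and 100B active figures from the paragraph, plus a common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. That rule is an approximation introduced here, not a figure from the text, and the function names are hypothetical.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that run for one token."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter (rule of thumb, an assumption)."""
    return 2.0 * active_params


total = 1e12    # 1 trillion parameters held in memory
active = 100e9  # 100B parameters used per token

fraction = active_fraction(total, active)
moe_flops = forward_flops_per_token(active)
dense_same_size_flops = forward_flops_per_token(total)  # dense model with the same total size

print(f"active fraction: {fraction:.0%}")
print(f"compute saving vs. dense 1T: {dense_same_size_flops / moe_flops:.0f}x")
```

Under these assumptions the MoE model costs about as much per token as a dense 100B model, and about a tenth of what a dense 1T model would.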
2. Easier scaling
With MoE you can expand model capacity (the amount of knowledge it can hold) without an explosive increase in compute resources. This is a major advantage when you want to build a higher-performing model within a limited compute budget.
Challenges and Trade-offs of MoE
MoE is not a magic wand. It comes with several important challenges.
1. Memory (VRAM) consumption
Compute cost is kept down, but all of the model's parameters must still be loaded into memory. Running a 1-trillion-parameter MoE model requires VRAM for the full 1 trillion parameters, even though only part of them is computed at inference time. This makes the hardware requirements very high.
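As a rough worked example, assume 16-bit weights, so $b = 2$ bytes per parameter (the precision is an assumption; the text does not specify one). The weights alone then need:

$$
M_{\text{weights}} = N_{\text{total}} \cdot b = 10^{12} \times 2\ \text{bytes} = 2 \times 10^{12}\ \text{bytes} = 2\ \text{TB}
$$

Spread across hypothetical 80 GB accelerators, that is at least $\lceil 2000 / 80 \rceil = 25$ devices before counting activations or the KV cache. The number of active parameters does not appear in this formula, which is why sparse activation does not reduce the memory bill.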
2. Training instability and expert imbalance
During training, a handful of experts may be selected again and again while the others are hardly used at all, a phenomenon known as expert collapse. Preventing this requires special techniques such as a load balancing loss, which adjusts training so that the experts are used and trained evenly.
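A simple way to notice this kind of imbalance is to count how often each expert is selected. The sketch below is a hypothetical diagnostic, not a method from the text: it computes each expert's share of routed tokens and an imbalance ratio defined here as the busiest expert's share times the number of experts.

```python
import numpy as np


def expert_utilization(top_idx, num_experts):
    """Fraction of routed (token, expert) slots that each expert received."""
    counts = np.bincount(np.asarray(top_idx).ravel(), minlength=num_experts)
    return counts / counts.sum()


def imbalance_ratio(utilization):
    """1.0 means perfectly even use; num_experts means one expert gets everything."""
    return float(utilization.max() * len(utilization))


num_experts = 8

# Each row holds the two experts chosen for one token (top-2 routing).
balanced = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
collapsed = np.array([[0, 1], [0, 1], [1, 0], [0, 1]])

balanced_ratio = imbalance_ratio(expert_utilization(balanced, num_experts))
collapsed_ratio = imbalance_ratio(expert_utilization(collapsed, num_experts))
print(balanced_ratio, collapsed_ratio)
```

Tracking a number like this per layer during training gives an early signal that routing is concentrating on a few experts.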
3. Communication overhead
In distributed training or large-scale inference, when different experts are placed on different GPUs, sending each token to the right expert requires all-to-all communication, and this can become a bottleneck.
Summary: When Should You Adopt MoE?
MoE shows its true value in situations like these:
- You want to maximize the model's knowledge capacity while saving compute (FLOPs).
- You want to use a very large model while keeping inference latency down.
Conversely, a dense model is the better choice when memory capacity is limited or when you want to keep the model itself small.
Today, as seen in Mixtral 8x7B and, reportedly, GPT-4 (this is only a rumor), MoE has become a central architecture in next-generation AI development.
Optional learning community: https://t.me/GyaanSetuAi