(update) [ENG / JPN] D-RNA : ３値化成功 (1.58 bit) ／３値モデル継続学習成功

(update) D-RNA: Successful ternarization (1.58 bit) / Successful continued training of the ternary model

This is the real-time log of a ternarization run: 158b_train_sample/158_log.txt

D-RNA repository: https://github.com/muooon/DRNA
It is a new architecture that can be ternarized without using STE (the straight-through estimator)
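
For readers new to the term, "1.58 bit" is the information content of one three-valued weight, log2(3) ≈ 1.585. The sketch below shows a generic absmean ternarizer, only to illustrate what a ternary weight set looks like. It is not D-RNA's method, and the name `ternarize_absmean` and the sample weights are made up for this example.

```python
import math

def ternarize_absmean(weights, eps=1e-8):
    """Map floats to {-1, 0, +1} plus one shared scale (generic absmean rule).

    Illustrative only: this is NOT how D-RNA ternarizes its weights.
    """
    # Shared scale: mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

bits_per_weight = math.log2(3)  # information content of one ternary weight

weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]  # hypothetical sample values
ternary, scale = ternarize_absmean(weights)
print(ternary, round(scale, 4), round(bits_per_weight, 3))
```

Small weights collapse to 0 and the rest keep only their sign, so the model stores one of three states per weight plus a single scale.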

D-RNA fully inherits the Transformer structure and evolves it, and it is also capable of pure ternary training and inference
The log above records a ternarization run, which progresses gradually as shown there

To begin with, D-RNA is a new architecture that excels at resonance contraction and long-term memory
These properties are what make stable ternarization possible

With STE-based ternary models, optimizers such as Prodigy behave inconsistently
With D-RNA ternarization, however, that constraint does not apply and such optimizers work normally
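
For contrast, the STE approach mentioned above can be sketched in a few lines: the forward pass uses the quantized weight, while the backward pass treats the quantizer as the identity and applies the gradient to a latent full-precision weight. The toy below is a generic one-weight illustration in plain Python with hypothetical names (`quantize`, `ste_step`). It reproduces neither D-RNA nor Prodigy.

```python
def quantize(w):
    """Round a latent weight to the nearest value in {-1, 0, +1}."""
    return max(-1, min(1, round(w)))

def ste_step(w_latent, x, target, lr=0.1):
    """One SGD step on loss = (q * x - target)**2 using the STE.

    Forward: uses the ternary weight q = quantize(w_latent).
    Backward: d(q)/d(w_latent) is treated as 1, so the gradient computed
    at q is applied directly to the latent full-precision weight.
    """
    q = quantize(w_latent)
    grad_q = 2.0 * (q * x - target) * x   # exact gradient w.r.t. q
    return w_latent - lr * grad_q         # STE: reuse it for w_latent

w = 0.2                      # latent weight, quantizes to 0
history = []
for _ in range(5):
    history.append((round(w, 2), quantize(w)))
    w = ste_step(w, x=1.0, target=1.0)
print(history)
```

In this toy run the gradient is evaluated at the ternary weight but applied to the latent one, so the latent weight keeps moving while the quantized output stays at 0 until it crosses the rounding threshold.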

::: PS :::

Anyone can run a training test of the ternary model with this sample
https://github.com/muooon/DRNA/tree/drna/158b_train_sample

The sample also supports additional (continued) training of a ternary model
Ternarizing an existing model (conversion plus full fine-tuning) is also possible

As far as I know, this is the "world's first" successful ternarization that does not use STE
Any model can be ternarized stably once it is converted to the D-RNA type

※ "Additional training" (continued training) is possible while the source model stays ternary
※ An existing ordinary model can also be converted to the D-RNA type and then ternarized with full fine-tuning
※ Of course, additional training after conversion is also possible / in other words, pure ternary training

To summarize the workflow

A: New ternary (1.58b) model -> Gradient reconstruction / Additional training / Ternary recrystallization -> New ternary model
B: Existing (fp16/32) model -> Ternary training -> Crystallization -> New ternary model -> Cycle A above

The D-RNA architecture makes this possible

これは３値化のリアルタイムlogです：158b_train_sample/158_log.txt

D-RNA repository：https://github.com/muooon/DRNA
STEをつかわない３値化可能な新アーキテクチャです

Transformer 構造を完全継承し、進化を遂げた D-RNA は純粋３値学習と推論も可能です
上掲logは、３値化のlogです、このように徐々に３値化を進行します

そもそも D-RNA は、共鳴収縮と長期記憶を得意とする新アーキですが
この特徴を活用することで、安定的に３値化を行うことが可能になります

STE 方式３値モデルでは Prodigy などの optim は不整合を起こしますが
この D-RNA による３値化では、その制約を受けずに正常化します

::: 追記 :::

こちらを実行することで３値モデルの学習テストを誰でも行えます
https://github.com/muooon/DRNA/tree/drna/158b_train_sample

これは３値モデルの追加学習(継続)も可能です
既存モデルを３値化する(変換･微調整(フルファイン)) も実行可能です

これは知る限り "世界初" の成功例です / STE を不使用
どんなモデルも D-RNA 型にすると、安定的に３値化することが可能です

※ 学習元を３値モデルのまま｢追加学習｣(継続学習)可能です
※ 既存の通常モデルを D-RNA型へ変換し、さらに｢３値化･微調整｣(フルファイン)も可能
※ もちろん変換後の追加学習も行えます／つまり純粋３値学習です

ワークフローを整理しましょう

Ａ：新規３値(1.58b)モデル -> 勾配再構成･追加学習･３値再結晶 -> 新３値モデル
Ｂ：既存(fp16/32)モデル ->３値学習 -> 結晶化 -> 新３値モデル ->上記Ａサイクル

D-RNA アーキテクチャは、これを実現しています
