SisyphusPI

Modular Addition
( a + b ) mod N  —  sweeping all hyperparameters
AIs trained: 30 / 8,640
0.3% complete  •  0 grokked  •  0.0% grok rate
ETA: 4 days, 19:34:28
avg time: 48.3s/model
remaining: 8,610
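
The ETA above is consistent with multiplying the remaining model count by the average time per model. The following is a minimal sketch in Python, assuming that is how the figure is derived; the function name `estimate_eta` is hypothetical and not taken from the dashboard's code.

```python
from datetime import timedelta


def estimate_eta(remaining: int, avg_seconds: float) -> timedelta:
    """Estimated time left, rounded to whole seconds (assumed formula)."""
    return timedelta(seconds=round(remaining * avg_seconds))


# Values copied from the progress panel: 8,610 models left at 48.3 s each.
eta = estimate_eta(8_610, 48.3)
print(eta)  # 4 days, 19:31:03
```

With the rounded 48.3 s figure the result falls a few minutes short of the displayed 4 days, 19:34:28. The displayed value would correspond to an unrounded average of about 48.32 s per model.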

Summary

(wd, lr) Combos: 4
Total Sacrifices: 30
Total Grokked: 0
Grok Rate: 0.0%

Grok Rate — weight_decay × learning_rate

Darker green = more grokking. Grey = no data yet.

Grok Rate vs N

Steps to Grok vs N

Median training steps until test accuracy > 97%. Only shown for N values with at least one grokked model.
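
The caption's criterion can be sketched as follows. This is an illustration only: it assumes each run logs `(step, test_accuracy)` pairs with accuracy as a fraction, and the names `steps_to_grok` and `median_steps_to_grok` are hypothetical. The sample runs are made up and are not dashboard data.

```python
from statistics import median

GROK_THRESHOLD = 0.97  # test accuracy must exceed this, per the caption


def steps_to_grok(history, threshold=GROK_THRESHOLD):
    """Return the first step whose test accuracy exceeds threshold, else None.

    history is an assumed format: (step, test_accuracy) pairs in step order.
    """
    for step, acc in history:
        if acc > threshold:
            return step
    return None


def median_steps_to_grok(histories):
    """Median over the runs that grokked; None when no run did."""
    steps = [s for s in map(steps_to_grok, histories) if s is not None]
    return median(steps) if steps else None


# Made-up example runs (not taken from the sweep).
runs = [
    [(1000, 0.20), (2000, 0.98)],  # groks at step 2000
    [(1000, 0.30), (2000, 0.50)],  # never groks
    [(1000, 0.99)],                # groks at step 1000
]
print(median_steps_to_grok(runs))  # 1500.0
```

Returning `None` when no run crossed the threshold mirrors the caption's rule that the chart only shows N values with at least one grokked model.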

Average Training Curve

Per-N Statistics

N    Sacrifices  Grokked  Grok %  Avg Train Time (s)  Median Steps to Grok
5    3           0        0.0     62.23               10000.0
10   3           0        0.0     74.91               10000.0
15   3           0        0.0     85.72               10000.0
20   3           0        0.0     108.34              10000.0
25   3           0        0.0     122.66              10000.0
30   3           0        0.0     144.80              10000.0
35   2           0        0.0     164.88              10000.0
40   2           0        0.0     197.88              10000.0
45   2           0        0.0     224.43              10000.0
50   2           0        0.0     252.55              10000.0
55   2           0        0.0     290.71              10000.0
60   2           0        0.0     323.02              10000.0

Grok Rate by weight_decay

Grok Rate by learning_rate

Fix weight_decay → see learning_rate breakdown

Fix learning_rate → see weight_decay breakdown