AI ALIGNMENT FORUM

lewis smith

Posts

lewis smith's Shortform (8 karma, 9mo, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 2mo, 6 comments)
A Problem to Solve Before Building a Deception Detector (33 karma, 4mo, 1 comment)
The ‘strong’ feature hypothesis could be wrong (98 karma, 10mo, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 1y, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 1y, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 1y, 0 comments)
