AI ALIGNMENT FORUM

lewis smith

Posts


lewis smith's Shortform (8 karma, 9mo, 0 comments)

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (58 karma, 2mo, 6 comments)

A Problem to Solve Before Building a Deception Detector (33 karma, 4mo, 1 comment)

The ‘strong’ feature hypothesis could be wrong (98 karma, 10mo, 0 comments)

Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 1y, 32 comments)

[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 1y, 3 comments)

[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 1y, 0 comments)

Wikitag Contributions

Comments
