AI ALIGNMENT FORUM

Connor Kissane

Posts

Sorted by New
SAEs are highly dataset dependent: a case study on the refusal direction (karma 34, 7mo, 0 comments)
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing (karma 23, 7mo, 3 comments)
Base LLMs refuse too (karma 35, 8mo, 10 comments)
SAEs (usually) Transfer Between Base and Chat Models (karma 28, 11mo, 0 comments)
Attention Output SAEs Improve Circuit Analysis (karma 18, 1y, 0 comments)
We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To (karma 34, 1y, 0 comments)
Attention SAEs Scale to GPT-2 Small (karma 28, 1y, 0 comments)
Sparse Autoencoders Work on Attention Layer Outputs (karma 35, 1y, 3 comments)

