AI ALIGNMENT FORUM

Top Questions

51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 4mo · 19
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 1y · 2
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 9mo · 14

Recent Activity

51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 4mo · 19
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 1y · 2
7 · Egan's Theorem? · johnswentworth · 5y · 7
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13
14 · Is weak-to-strong generalization an alignment technique? · cloud · 4mo · 1
9 · What is the most impressive game LLMs can play well? · Cole Wyeth · 5mo · 8
4 · How counterfactual are logical counterfactuals? · Donald Hobson · 6mo · 9
16 · Are You More Real If You're Really Forgetful? · Thane Ruthenis, Charlie Steiner · 7mo · 4
6 · Why not tool AI? · smithee, Ben Pace · 6y · 2
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 9mo · 14
7 · Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? · David Scott Krueger · 9mo · 5