AI ALIGNMENT FORUM

Top Questions

51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 4mo · 19
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 1y · 2
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 9mo · 14

Recent Activity

51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 4mo · 19
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 1y · 2
7 · Egan's Theorem? · johnswentworth · 5y · 7
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13
14 · Is weak-to-strong generalization an alignment technique? · cloud · 4mo · 1
9 · What is the most impressive game LLMs can play well? · Cole Wyeth · 5mo · 8
4 · How counterfactual are logical counterfactuals? · Donald Hobson · 6mo · 9
16 · Are You More Real If You're Really Forgetful? · Thane Ruthenis, Charlie Steiner · 7mo · 4
6 · Why not tool AI? · smithee, Ben Pace · 6y · 2
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 9mo · 14
7 · Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? · David Scott Krueger · 9mo · 5