What decision theory is for
(It’s for winning.)
See also: Newcomb’s Problem and Regret of Rationality
Omega, the alien philosopher-troll, tells you:
“I decided, before looking at pi, to make you the following offer:
If the third digit of pi is 0-4, I ask you for $10.
If the third digit of pi is 5-9, I give you $10,000 if I predicted you would’ve given me $10 if the digit was 0-4.
The third digit of pi is 4. Would you want to pay me $10?”
Should you pay?
Let’s say that the way Omega makes its prediction in a pi = 3.18 world is by simulating you, except that every time you think of pi or try to calculate it, it alters your recollections and the outputs of your computers to say that pi starts with 3.14. And then it sees whether you pay it.
Do you want to be the kind of entity that pays Omega in this situation?
I’m pretty sure the answer is yes.
In this situation, you don’t want the parts of you that make decisions to trust their inputs on what pi is; the way you make decisions should assume that you might be inside Omega’s mechanism for predicting what you would do in a counterlogical world. It doesn’t matter that you’re experiencing pi as starting with 3.14. To win, you want to be — you can be — the kind of person who pays in situations like this one, because you can no longer rely on your knowledge of what pi is. (The sense that you’re experiencing the real world is also an input into your decision-making algorithms, and you should ignore that input too.)
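To put rough numbers on that argument: evaluate the policy “pay when asked” before you trust anything you’ve seen about pi. Let p be your prior that the third digit is in 0-4, so that you’re in the branch where Omega asks (or in its simulation of it); the post doesn’t pick a number, so p = 1/2 below is purely illustrative. In the 0-4 branch the policy costs you $10; in the 5-9 branch it earns you $10,000, because Omega’s simulation of you pays.

$$
\mathbb{E}[\text{pay}] = p\cdot(-\$10) + (1-p)\cdot(+\$10{,}000), \qquad \mathbb{E}[\text{refuse}] = \$0.
$$

With p = 1/2, committing to pay nets +$4,995 in expectation, and it stays positive for any p below 1000/1001.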
Same for transparent Newcomb’s. See two transparent boxes? Both have money? Want to be the kind of entity that sees both boxes with money and gets a million dollars? Well, there’s a simple way to do that! There’s one thing dependent on you here: your choice. And you can choose to win.
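Here’s a minimal sketch of that policy-level view of transparent Newcomb’s. It assumes a perfectly accurate predictor and the conventional $1,000 in the small box alongside the $1,000,000 the post mentions; the details are illustrative, not anyone’s canonical formalization.

```python
# Toy model of transparent Newcomb's problem: a policy maps what you see
# to an action, and the predictor decides how to fill the boxes by running
# that same policy. Amounts and a perfect predictor are assumed here.

SMALL, BIG = 1_000, 1_000_000

def play(policy):
    # The predictor fills the big box only if it predicts, by simulating
    # your policy on the "both boxes full" observation, that you'd one-box.
    big_box = BIG if policy("both boxes full") == "one-box" else 0
    # Now the real game: you see whatever the predictor actually did.
    seen = "both boxes full" if big_box else "big box empty"
    action = policy(seen)
    return big_box if action == "one-box" else big_box + SMALL

one_boxer = lambda seen: "one-box"   # the kind of entity the post describes
two_boxer = lambda seen: "two-box"   # "the money's already there, take both"

print(play(one_boxer))  # 1000000: sees both boxes full, walks away with the million
print(play(two_boxer))  # 1000: sees the big box empty, grabs the small one
```

The only thing that differs between the two runs is the policy, which is exactly the one thing dependent on you here.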
The purpose of studying decision theory isn’t to come up with some interesting math and look at its consequences. The purpose is to describe, in math, systematized winning.
This is why logical decision theories — UDT, FDT — are superior to the older EDT and CDT: they’re simply better at winning, better at describing what a winning agent is doing, and better at prescribing what a winning agent should implement to win systematically.
If we’re trying to reason about AIs that are grown rather than crafted, or are otherwise heavily optimized, it’s likely that these AIs will be good at winning: if there’s a way for an AI to be that wins more, optimization processes will prefer it, and so good decision theories will describe such an AI’s decision-making better.
And if we’re trying to build an AI, we want it to win according to our values, and that means we want it to have a decision theory that’s good at winning.
And the question we want to answer isn’t “what does [a particular decision theory] prescribe our AI do in this situation?”, so that we can make the AI do that; it’s “what move would win, according to our values?”, so that we can make an AI that does that.
That’s the purpose of doing decision theory: winning and developing a language to then talk about agents more broadly.[1]
[1] Including perhaps developing “agent foundations” and being able to talk about what it means for agents to be aligned with each other, and how we can optimize for finding an agent aligned with humans.