is the two envelopes problem important for understanding Bayesian inference?

the two envelopes problem is a paradox about reasoning
it is not a core tool for bayesian inference

some formulae

the opened envelope contains a number \(W_1\)
the unopened envelope contains an unknown number \(W_2\):

\[W_2 = \begin{cases} W_1/2 \newline 2W_1 \end{cases}\]

the incorrect deduction from the problem is that:

\[P(W_2 = 2W_1 | W_1) = P(W_2 = \frac{1}{2} W_1| W_1) = 0.5\]

such an erroneous deduction would lead to the conclusion that:

\[E(W_2 | W_1) = \frac{1}{2}(0.5 W_1 + 2W_1) = 1.25W_1\]

so first thing to understand… why is

\[P(W_2 = 2W_1 | W_1) = P(W_2 = \frac{1}{2} W_1| W_1) = 0.5\]

an incorrect deduction?

well, before we have seen \(W_1\), all we know is that \(W_2\) can be half or double \(W_1\), and the two cases are equally likely.

however, once we have seen \(W_1\), equal probabilities for \(W_2\) being half or double \(W_1\) is only one of many possibilities.

without putting figures on it: after seeing \(W_1\), there are ranges of values where \(W_2\) being double is more likely and ranges where \(W_2\) being half is more likely.
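
to put some figures on it, here is a minimal sketch in Python. the prior over the smaller amount (`PRIOR_WEIGHTS`) is made up purely for illustration, and `prob_other_is_double` is a hypothetical helper name, not something from the book. it computes \(P(W_2 = 2W_1 | W_1 = w)\) by Bayes' rule for each amount you could see.

```python
from fractions import Fraction

# ASSUMPTION: an invented prior over the SMALLER amount in the pair.
# Keys are amounts, values are relative weights (they need not sum to 1).
PRIOR_WEIGHTS = {1: 1, 2: 2, 4: 3, 8: 2, 16: 1}
TOTAL_WEIGHT = sum(PRIOR_WEIGHTS.values())


def prior(smaller):
    """Prior probability that the smaller envelope holds `smaller`."""
    return Fraction(PRIOR_WEIGHTS.get(smaller, 0), TOTAL_WEIGHT)


def prob_other_is_double(w):
    """P(W_2 = 2w | W_1 = w) via Bayes' rule.

    Seeing w means either the pair is (w, 2w) and we hold the smaller one,
    or the pair is (w/2, w) and we hold the larger one. Each envelope of a
    pair is picked with probability 1/2, so that factor cancels.
    """
    as_smaller = prior(w)
    as_larger = prior(w // 2) if w % 2 == 0 else Fraction(0)
    if as_smaller + as_larger == 0:
        raise ValueError(f"amount {w} is impossible under this prior")
    return as_smaller / (as_smaller + as_larger)


if __name__ == "__main__":
    for w in (1, 2, 4, 8, 16, 32):
        print(w, prob_other_is_double(w))
```

under this made-up prior the probability that the other envelope holds double falls from 1 (at \(w = 1\)) to 0 (at \(w = 32\)) and is never exactly 0.5. a different prior would give different numbers, but the dependence on \(w\) is the point.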

i have to admit the book i am reading does not do a good job of explaining the point i just made above:

[image: excerpt from the book]

however it gives us a conditional expectation

\[E(W_2 | W_1 = w) = p(w) \cdot 2w + (1 - p(w)) \cdot \frac{1}{2} w\]

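so how did they get to this?

i’m not sure what the probabilities \(p(w)\) are. they can't be the probability of \(W_2\) taking the value \(w\), because \(W_2\) never takes \(w\): it takes \(2w\) or \(\frac{1}{2}w\). matching the terms of the formula, \(p(w)\) has to be the probability that \(W_2 = 2w\) given \(W_1 = w\).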
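
one way to pin \(p(w)\) down, as a sketch under an assumption these notes do not state: suppose the smaller amount has a discrete prior \(\pi\) (my notation, not the book's), and each envelope of the pair is equally likely to be the one opened. then Bayes' rule gives

\[p(w) = P(W_2 = 2w | W_1 = w) = \frac{\pi(w)}{\pi(w) + \pi(w/2)}\]

and the conditional expectation simplifies to

\[E(W_2 | W_1 = w) = p(w) \cdot 2w + (1 - p(w)) \cdot \frac{1}{2} w = \frac{w}{2}(1 + 3p(w))\]

so switching beats keeping \(w\) only when \(p(w) > \frac{1}{3}\), and forcing \(p(w) = 0.5\) for every \(w\) recovers the erroneous \(1.25w\).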

Rough Notes from ChatGPT

notes from reading chapter 2

wages and education
  • notation convention: capital letter for a random variable, small letter for an observed value

  • interesting example of wages versus level of education.
  • they model the two probabilistically using a gamma distribution. i see two random variables in there: the usual random variable is wages, but one of the parameters is expressed in terms of the other random variable, education.
  • the probability of one random variable given the other fixes one of them at a value.
  • changing the value of education changes the form of the gamma distribution.

  • we have made obtaining wages our aim
  • wages are the random variable
  • probabilistic approach: probability of wages given education
  • different levels of education generate different pdfs with varying parameter values

  • statistical approach: we fix wages and look at the graph over education levels
  • strictly, this is not a pdf
  • it is instead known as the likelihood function
  • we determine the most likely value of the parameter (education) for given wages
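
the pdf versus likelihood distinction in these bullets can be sketched numerically. everything specific here is an assumption for illustration and not from the book: the shape value, the linear link `scale_for` from education to the gamma scale, and the wage of 30 in arbitrary units. the same function `wage_density` is integrated once over wages (education fixed) and once over education (wages fixed).

```python
import math

SHAPE = 2.0  # ASSUMPTION: illustrative gamma shape parameter


def scale_for(education_years):
    """ASSUMPTION: hypothetical link from education to the gamma scale."""
    return 5.0 + education_years


def wage_density(wage, education_years):
    """Gamma density of wages, with the scale set by education."""
    theta = scale_for(education_years)
    return (wage ** (SHAPE - 1.0) * math.exp(-wage / theta)
            / (math.gamma(SHAPE) * theta ** SHAPE))


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Probabilistic view: fix education, vary wages. This is a pdf (area ~ 1).
area_over_wages = integrate(lambda w: wage_density(w, 12), 0.0, 1000.0)

# Statistical view: fix wages, vary education. This is a likelihood,
# and its area need not be 1.
area_over_education = integrate(lambda e: wage_density(30.0, e), 0.0, 25.0)

# Most likely education (on a whole-year grid) for an observed wage of 30.
most_likely_education = max(range(26), key=lambda e: wage_density(30.0, e))

if __name__ == "__main__":
    print(area_over_wages, area_over_education, most_likely_education)
```

with these made-up numbers the area over wages comes out at about 1, while the area over education is roughly 0.37, which is why the second curve is a likelihood and not a pdf. the maximum of that likelihood is the "most likely value of the parameter" from the last bullet.
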
envelope problem

are you familiar with the two envelopes problem in Bayesian thinking?

Yes.

The two-envelopes problem is a classic paradox in Bayesian thinking and decision theory. You’re given two envelopes, one containing twice as much money as the other. You pick one at random and may switch after seeing (or sometimes not seeing) its contents.

The apparent paradox comes from this reasoning: If your envelope has amount \(x\), the other envelope has a 50% chance of containing \(2x\) and a 50% chance of containing \(x/2\). The expected value of switching seems to be

\[0.5(2x) + 0.5(x/2) = 1.25x,\]

so switching looks beneficial, no matter what \(x\) is. That can’t be right, since the same logic applies symmetrically to both envelopes.

Bayesian resolution: The mistake is treating \(x\) as a fixed value while also assigning symmetric probabilities without a prior over the possible amounts. Once you specify a proper prior for the smaller amount (or the process that generated the envelopes), the symmetry breaks. With a proper prior that has a finite mean, the expected gain from switching is zero before you look, and after you look it depends on the observed amount, so the paradox disappears.

Bottom line: There’s no universal advantage to switching. The “always switch” conclusion comes from an incoherent prior or mixing up conditional expectations.
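
A quick simulation illustrates the bottom line. This is a minimal sketch, assuming an invented proper prior (the list `SMALLER_AMOUNTS`) and a hypothetical helper `simulate`; neither comes from the conversation. It compares the average payout of always keeping the opened envelope with that of always switching.

```python
import random

# ASSUMPTION: an invented proper prior; the smaller amount is drawn
# uniformly from this list.
SMALLER_AMOUNTS = [1, 2, 4, 8, 16]


def simulate(trials=100_000, seed=0):
    """Return (average if you always keep, average if you always switch)."""
    rng = random.Random(seed)
    keep_total = 0
    switch_total = 0
    for _ in range(trials):
        smaller = rng.choice(SMALLER_AMOUNTS)
        pair = [smaller, 2 * smaller]
        rng.shuffle(pair)  # pick one of the two envelopes at random
        opened, other = pair
        keep_total += opened
        switch_total += other
    return keep_total / trials, switch_total / trials


if __name__ == "__main__":
    keep_avg, switch_avg = simulate()
    print(f"always keep:   {keep_avg:.3f}")
    print(f"always switch: {switch_avg:.3f}")
```

Under this prior both averages should land close to 9.3, which is 1.5 times the mean smaller amount of 6.2. Nothing like a 25% edge for switching shows up.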

the mistake is treating x - the amount - as a fixed value while also assigning symmetric probabilities without a prior over the possible amounts. this goes over my head

That reaction is very common. Here’s a cleaner, intuitive way to see it.

Where the reasoning quietly breaks

When you say:

“My envelope has amount \(x\). The other one is either \(2x\) or \(x/2\) with 50–50 probability.”

you are doing two incompatible things at once:

Treating \(x\) as a known, fixed number (“I am holding exactly $x.”)

Treating \(x\) as a random draw from an unknown process (because you’re assigning probabilities to what it could be relative to another envelope).

You can’t do both without saying how likely different values of \(x\) were in the first place.

A concrete analogy

Suppose I say:

“I randomly picked a person. There’s a 50% chance they’re twice as tall as average and a 50% chance they’re half as tall.”

That sounds symmetric, but it’s nonsense—because people of different heights are not equally likely to be picked. Extremely tall or extremely short people are rarer.

Same issue here.

What the missing prior does

A prior answers this question:

Before I opened the envelope, how likely was it that the smaller amount was $10? $100? $1,000,000?

Without answering that, you don’t actually know:

how often your observed \(x\) would arise as the smaller amount

versus how often it would arise as the larger amount

Those two probabilities are not automatically 50–50.

Once you pick a realistic prior:

large values of \(x\) are much more likely to be the larger envelope

small values of \(x\) are more likely to be the smaller envelope

That cancels the fake “1.25x” advantage.

One-sentence takeaway

The paradox happens because you assign 50–50 probabilities without justifying why those probabilities should be equal, and that requires a prior over the amounts.

If you want, I can also explain it without any formulas at all, or walk through a numerical example that makes the symmetry failure obvious.

i kind of have a loose grasp. but it is not firm. i feel a bit stupid

You’re not stupid at all. This problem confuses very smart people precisely because the mistake feels invisible. Your reaction actually means you’re noticing the right thing: something feels slippery.

Here’s the firm version, stripped to the core.

The solid intuition

The bad step is this sentence (even if it’s implicit):

“Given that I see \(x\), it’s equally likely that \(x\) is the smaller amount or the larger amount.”

That feels reasonable — but it’s false unless you assume a very specific (and unrealistic) way the amounts were generated.

Why? Because big numbers don’t show up as often as small ones.

One concrete picture (no math)

Imagine the smaller envelope is chosen first.

Most of the time it’s something modest: $10, $20, $50. Very rarely it’s huge: $1,000,000.

Now you open your envelope and see $1,000,000.

Ask yourself honestly:

Is it really 50–50 that this is the smaller amount?

Or is it overwhelmingly likely this is the larger envelope?

Obviously the second. That alone breaks the paradox.

The “always switch” argument pretends this intuition doesn’t exist.

Why your brain rebels

Your brain is being asked to:

condition on a value after seeing it

while pretending that value had no effect on how likely the situation was

That’s a subtle conditioning error, not a lack of intelligence.

The grounding sentence (remember this)

You don’t get to assign probabilities after seeing a value unless you say how that value was generated.

That’s it. Once that clicks, the paradox is gone.

If you want, I can lock this in with one tiny numerical toy example that removes all ambiguity — but only if you want.

ok there could be a value where it feels 50 50 but most of the time it does not

Yes — that’s exactly right. And that’s the firm understanding.

Here’s the clean way to hold it in your head:

There can be values of \(x\) where, given how the amounts were generated, it really is close to 50–50 whether \(x\) is the smaller or larger envelope.

But that is not true for all \(x\).

The paradox comes from pretending it is true for every \(x\).

Once you accept that the probabilities depend on the size of \(x\), the “always switch” argument collapses.

A good mental shortcut:

Small amounts → more likely the smaller envelope

Large amounts → more likely the larger envelope

Only in the middle might it feel roughly 50–50

That’s not hand-waving — that’s exactly what a prior does.
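
To see a prior doing exactly this, the sketch below computes the expected gain from switching for each amount you might observe, using exact fractions. The prior `PRIOR` over the smaller amount is invented for illustration, and the function names are hypothetical helpers, not part of the conversation.

```python
from fractions import Fraction

# ASSUMPTION: an invented prior over the smaller amount, for illustration.
PRIOR = {1: Fraction(1, 9), 2: Fraction(2, 9), 4: Fraction(3, 9),
         8: Fraction(2, 9), 16: Fraction(1, 9)}


def weight_of_seeing(x):
    """Return (P(see x and it is the smaller), P(see x and it is the larger))."""
    as_smaller = PRIOR.get(x, Fraction(0)) / 2
    as_larger = PRIOR.get(x // 2, Fraction(0)) / 2 if x % 2 == 0 else Fraction(0)
    return as_smaller, as_larger


def expected_gain_from_switching(x):
    """E[other - x | you see x]: gain x if x is smaller, lose x/2 if larger."""
    as_smaller, as_larger = weight_of_seeing(x)
    total = as_smaller + as_larger
    return (as_smaller * x - as_larger * Fraction(x, 2)) / total


# Every amount that can actually turn up in the opened envelope.
observable = sorted(set(PRIOR) | {2 * s for s in PRIOR})

# Average the conditional gains, weighted by how often each x is seen.
overall_gain = sum(
    sum(weight_of_seeing(x)) * expected_gain_from_switching(x)
    for x in observable
)

if __name__ == "__main__":
    for x in observable:
        print(x, expected_gain_from_switching(x))
    print("overall:", overall_gain)
```

With this prior, switching gains on the small amounts (for example +8/5 when you see 4), breaks even at 16, and loses 16 when you see 32. Weighted by how often each amount is seen, the gains and the loss cancel to exactly zero.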

You’ve got it. This is a subtle conditioning issue, not a math trick.

but before we opened the envelope, didn't we think it was 50 50?

Yes — before opening the envelope, it really is 50–50. That’s the key distinction, and you’re asking the right question.

Here’s the clean separation that makes everything click:

Before opening the envelope

You randomly pick one of two envelopes.

One is larger, one is smaller.

Probability you picked the larger one: 50%.

No controversy. This is correct.

After opening the envelope and seeing \(x\)

You are now asking a different question:

“Given that I see \(x\), what’s the probability this envelope is the larger one?”

That probability is not generally 50% anymore, because seeing \(x\) gives you information.

Simple real-world intuition

Before drawing a card:

Probability it’s an ace: 4/52.

After drawing a card and seeing it’s not a face card (no jack, queen, or king):

Probability it’s an ace is no longer 4/52; it is now 4/40.

Same idea. Information updates probabilities.
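
A brute-force check of card updates like this one is easy to write. The sketch below enumerates a standard 52-card deck; the helper `prob` and the particular conditions are chosen here for illustration. It also shows that not every observation carries information about the question being asked: learning only the suit leaves the chance of an ace unchanged, while ruling out face cards raises it.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event, given=lambda card: True):
    """P(event | given) by counting equally likely cards."""
    pool = [card for card in DECK if given(card)]
    hits = [card for card in pool if event(card)]
    return Fraction(len(hits), len(pool))


def is_ace(card):
    return card[0] == "A"


p_ace = prob(is_ace)
p_ace_given_heart = prob(is_ace, given=lambda c: c[1] == "hearts")
p_ace_given_not_face = prob(is_ace, given=lambda c: c[0] not in ("J", "Q", "K"))

if __name__ == "__main__":
    print(p_ace, p_ace_given_heart, p_ace_given_not_face)
```

The three results are 1/13, 1/13, and 1/10. In the envelope problem the observed amount is the informative kind of observation, which is why the 50–50 does not survive it.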

Why the paradox happens

The bad argument quietly does this:

Uses the 50–50 from before opening

Pretends it still applies after observing \(x\)

That’s the entire error.

Lock-in sentence

Picking the envelope is 50–50; identifying what you picked after seeing its value is not.

Once you keep those two moments separate, the paradox disappears completely.

so what was the 1.25x argument… the one that said it is always better to go for the envelope you didn't open?

Good — this is the last piece. Once you see this clearly, the whole thing settles.

What the 1.25x argument actually did

It pretended to answer this question:

“After I open my envelope and see \(x\), what is the expected value of the other envelope?”

But it secretly used probabilities from before you saw anything.

Specifically, it assumed:

“Given that I see \(x\), there’s a 50% chance the other envelope has \(2x\) and a 50% chance it has \(x/2\).”

That assumption is exactly the mistake.

Why that assumption is illegitimate

The 50–50 statement was true only before opening:

50% chance you picked the larger envelope

50% chance you picked the smaller

Once you see \(x\), those probabilities must be updated using how likely it was for \(x\) to appear in each role.

The 1.25x calculation never does that update. It treats “I picked larger vs smaller” as still equally likely after observing \(x\).

The sleight of hand (this is the trick)

\(x\) is treated as fixed and known

while also being treated as typical and symmetric

without asking how often such an \(x\) would arise

That’s why it looks compelling and wrong at the same time.

One-sentence diagnosis

The 1.25x argument mixes probabilities from before and after you observe the amount, which makes the expectation meaningless.

That’s all it is. No hidden cleverness — just a conditioning error that’s unusually well camouflaged.

so there is a funny thing here because i might not have a prior. i just might not know what numbers are reasonable BUT that somehow doesn’t matter because we know that any prior does not have 50 50. i think

Yes — that’s a very good way to put it, and you’re basically there. Let me firm it up just a bit.

“I don’t have a prior”

That’s fine. You don’t need to know the prior.

What matters is this:

You are not allowed to assume a 50–50 split after seeing \(x\) unless some prior makes that true.

And in fact:

There is no proper, realistic prior for which “given I observe \(x\), this envelope is equally likely to be the smaller or larger one” holds for all \(x\).

So even if your prior is unknown, you already know the 50–50 claim can’t be universally right.
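
A short derivation makes this concrete, under the assumption of a discrete prior \(\pi\) on the smaller amount (the symbol \(\pi\) is introduced here, not taken from the conversation). By Bayes' rule, the chance that the observed \(x\) is the smaller amount is

\[P(\text{smaller} | x) = \frac{\pi(x)}{\pi(x) + \pi(x/2)}\]

For this to equal \(\frac{1}{2}\) at every \(x\) that can be observed, we would need \(\pi(x) = \pi(x/2)\) at each of them, and therefore

\[\pi(x) = \pi(2x) = \pi(4x) = \pi(8x) = \cdots\]

That is infinitely many amounts with the same positive probability, so the probabilities cannot sum to 1. A prior like that is improper, which is why no proper prior supports 50–50 for all \(x\).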

Why ignorance doesn’t rescue the 1.25x argument

Saying “I have no idea what numbers are reasonable” does not imply:

“Every value of \(x\) should be treated symmetrically.”

Ignorance is not symmetry.

Any actual way the envelopes could have been filled has probabilities that sum to 1, so it:

makes extreme values rare

makes middling values comparatively common

That alone forces the conditional probabilities away from 50–50 for at least some observed \(x\).

The funny-but-important point

Even without a prior, you can still say:

“Whatever the prior is, it cannot justify the same 50–50 calculation for every observed \(x\).”

That’s enough to kill the paradox.

Final anchor sentence

You don’t need to know your prior to reject the 1.25x argument. You only need to see that no proper prior could support it for all values.

That’s a mature Bayesian insight, not a beginner one.

