Wednesday, August 26, 2015

St Petersburg Paradox

I was having lunch with a teacher friend the other day, and discussing some interesting examples of how statistics and probability can get kind of weird. He loved the Birthday Problem and decided to use it for his class, but was particularly fascinated by the trickier St Petersburg Paradox.

The problem goes thus: there is a game that costs X dollars to play, which simply involves tossing a coin. You start with a pot of $2, and every time the coin comes up heads the banker doubles the pot. As soon as the coin comes up tails the game ends, and you get to walk away with the pot. The question is, what is a reasonable amount of money X to pay to play the game?

Where the paradox comes in is how statistics defines 'fair'. Usually we calculate the average, or "expected", amount of money to be made from the game by totalling up every possible outcome, each weighted by how likely it is. In this game, we have a 50:50 chance of getting $2 (the first throw being a tail), then a 1/4 chance of getting $4 (a head, then a tail), then a 1/8 chance of getting $8 (heads, heads, tails) and so on. That means we can expect on average $1 from the worst-case scenario (it's $2, and happens half the time, and $2 x 1/2 = $1), and another $1 from the heads-tails scenario ($4 x 1/4 = $1) and so on. This process goes on forever - it's always possible to get more heads - so the average amount we expect to win in this game is $1 + $1 + $1 + .... = infinite money, and that's how much we should apparently spend to play the game.
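To see the sum growing without bound, here is a minimal Python sketch that adds up the expected value while only counting games that end within a fixed number of tosses. The function name and the cut-off parameter are my own, purely for illustration.

```python
from fractions import Fraction


def truncated_expected_value(max_tosses):
    """Expected winnings counting only games that end within max_tosses tosses.

    A game that ends on toss k (k-1 heads, then a tail) pays $2**k
    and happens with probability 1/2**k.
    """
    return sum(Fraction(1, 2 ** k) * 2 ** k for k in range(1, max_tosses + 1))


for n in (1, 10, 100):
    # Every extra toss we allow adds exactly $1 to the total.
    print(n, truncated_expected_value(n))
```

Each extra toss allowed adds exactly $1, so the total simply equals the cut-off and never settles down.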

This obviously doesn't make sense. For a start, you're always going to lose at some point, so it's physically impossible for you to make infinite money no matter how many times you get heads. The problem is that the idea of an expected amount of money depends on the assumption that we want to know what happens in the long run, so it assumes we are playing this game infinitely many times and taking the average. But when we play infinitely many times, we suddenly have access to the end of the rainbow where we're making infinite money - the idea is that infinity is a mathematical construct that we never see in reality. Usually we can deal with it pretty happily without weird things happening, but this is a weird game, and breaks our usual assumptions.

What we can do instead is see what's most likely to happen to our winnings as we keep playing. For a single game, it's pretty clear that most of the time we'll win either $2 or $4 (with a 50% and 25% chance respectively), and occasionally $8 (12.5%), but we're not likely to win much more than that. If we play two games, then our worst-case scenario is that we'll win $4, with a 1/2 x 1/2 = 1/4 chance. There are two ways we can win $6 - we can win $2 then $4, or $4 then $2. Each of these options has a 1/2 x 1/4 = 1/8 chance of happening, so overall we've got a 1/4 chance of that happening too. We can calculate the other possibilities the same way - obviously we have to stop at some point, but we can go far enough to get a decent idea. We can then keep going and see what happens when we play more and more games in a row, as bigger jackpots get more and more likely.
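The two-game calculation can be sketched in Python by combining the single-game distribution with itself. This is only an illustration of the idea, not the code used for the graph; the function names and the `max_heads` cut-off are assumptions of mine.

```python
from collections import defaultdict
from fractions import Fraction


def single_game(max_heads=20):
    """Payout -> probability for one game, ignoring runs longer than max_heads."""
    # k heads followed by a tail pays $2**(k+1) with probability 1/2**(k+1).
    return {2 ** (k + 1): Fraction(1, 2 ** (k + 1)) for k in range(max_heads + 1)}


def add_game(total_dist, game_dist):
    """Distribution of total winnings after playing one more game."""
    combined = defaultdict(Fraction)
    for total, p_total in total_dist.items():
        for payout, p_payout in game_dist.items():
            combined[total + payout] += p_total * p_payout
    return dict(combined)


two_games = add_game(single_game(), single_game())
for total in (4, 6, 8, 10):
    print(total, two_games[total])
```

Calling `add_game` repeatedly gives the distribution for three, four, or more games, although the number of possible totals grows quickly.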

Of course, the best way to do this is with a computer to avoid all those pesky calculations. Here is a graph of the possibilities over the course of 100 games:

Lighter colours represent where a possibility is relatively likely, and dark colours where it is unlikely. You can see little waves towards the top-left of the graph - this is where after a few games there's a small but decent chance of getting a single big win which overwhelms all of the other winnings. Especially when not many games have been played, it's more likely that you'll get a single big win and a lot of small wins than multiple medium-sized wins.

The blue line represents the median average win, and is surrounded by red interquartile lines - the idea is that half of the time, your winnings per game after a certain number of games will be between the two red lines. For example, after 50 games, it's 50-50 whether your average winnings are above or below $8.20 (the median), and half the time your average winnings will be between $6.12 and $12.44. So if you paid only $6 a game, you're probably doing pretty well at this point!
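A rough way to check numbers like these is to simulate lots of 50-game sessions and look at the quartiles of the average winnings. The sketch below is a plain Monte Carlo simulation rather than the exact calculation behind the graph, so its output will wobble a little around the figures above; the function names, the number of trials and the seed are all arbitrary choices of mine.

```python
import random
import statistics


def play_once(rng):
    """Play one game: start with $2 and double the pot on every head."""
    pot = 2
    while rng.random() < 0.5:  # heads
        pot *= 2
    return pot


def average_winnings(n_games, rng):
    """Average winnings per game over one session of n_games games."""
    return sum(play_once(rng) for _ in range(n_games)) / n_games


def summarise(n_games, n_trials=5000, seed=0):
    """Lower quartile, median and upper quartile of the per-game average."""
    rng = random.Random(seed)
    averages = [average_winnings(n_games, rng) for _ in range(n_trials)]
    q1, median, q3 = statistics.quantiles(averages, n=4)
    return q1, median, q3


if __name__ == "__main__":
    print(summarise(50))
```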

The most important part of this graph is that these numbers are going up as we keep playing games, meaning that the game becomes more and more reliably profitable. Further along the graph, the computer can no longer keep track of the higher numbers of winnings (which is why the red line disappears) so we need to find another way to work out what happens with more than 100 games. Using results cited in this paper, we can actually estimate the median winnings as

$2.55 + log2(number of games)

So after 100 games, $9.20 looks like a reasonable price - paying that price, half the time we'll end up ahead, the other half we won't. Note that the distribution is what statisticians call skewed - even though we only come out ahead half the time after 100 games, the "good" half is a lot better than the "bad" half is bad.
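The formula is easy to play with in Python. The sketch below evaluates it and also turns it around to ask how many games it takes before a given price is a 50-50 bet. It takes the approximation above at face value, and the function names are just illustrative.

```python
import math


def median_win_per_game(n_games):
    """Estimated median winnings per game after n_games games."""
    return 2.55 + math.log2(n_games)


def games_to_break_even(price):
    """Number of games after which the median winnings per game reach price."""
    return 2 ** (price - 2.55)


print(round(median_win_per_game(50), 2))   # compare with the graph at 50 games
print(round(median_win_per_game(100), 2))
print(round(games_to_break_even(9.20)))
```

Because the formula is logarithmic, every extra dollar on the price doubles the number of games needed to reach that break-even point.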

Let's say that we really want to milk this game for all it's worth, and we've found a game online that we can make our computer play for us. If we can play a million games a second, and leave our computer running for a year, that's over 30 trillion games. If we put that into our formula, we get a median win of $47.40 per game. If we paid that much per game to play, we'd expect to lose a lot of money at the start but make it back as the games wore on and we got more and more jackpots, breaking even after a year. However, if we only paid $9.20 as before, we'd expect to be doing OK by 100 games (i.e. after 100 microseconds), and by the time our program had been running for a year, we'd be looking at profits of around $1200 trillion - 700 times Australia's GDP and enough to basically rule the world.
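For anyone who wants to check the arithmetic, here is a back-of-the-envelope version in Python. It assumes a 365-day year and, like the paragraph above, uses the median estimate as a stand-in for typical winnings per game; the variable and function names are my own.

```python
import math

GAMES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def yearly_summary(price_per_game):
    """Games played in a year, median win per game, and rough total profit."""
    n_games = GAMES_PER_SECOND * SECONDS_PER_YEAR
    median_win = 2.55 + math.log2(n_games)
    profit = (median_win - price_per_game) * n_games
    return n_games, median_win, profit


n_games, median_win, profit = yearly_summary(9.20)
print(f"{n_games:.3e} games, ${median_win:.2f} per game, ${profit:.3e} profit")
```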

Unfortunately, no casino will ever host this game, online or otherwise, for exactly this reason. Sooner or later, the house will always lose.

Thursday, April 23, 2015

Quadruple rainbow!

A couple of days ago, someone at a train station in New York tweeted this photo of a quadruple rainbow:


Like most people, I'd never even heard of such a thing! Some reasonably reputable sites assured me that such a thing exists, and is caused by the combination of two things:

The first is that you can get two different paths of reflection of rays of light happening within water droplets, which gives us another "secondary" rainbow.

The second is the effect of a body of water, usually behind the observer along with the sun, reflecting the sun - it acts just like another (though less bright) sun shining from a different location, and gives us another pair of rainbows. Because the second pair of rainbows is from another "virtual sun", the centre of the rainbow is in a different place so they're offset a bit from the first pair, hence the weird shapes.

So, that makes sense. But there's water everywhere! Rainbows aren't that uncommon, and even double rainbows are seen occasionally, so why are quadruple rainbows so rare? I've never seen one, and I've seen plenty of double rainbows!

First, the reason that we don't see rainbows all the time is that we need the sun to be shining behind us and rain to be falling in front of us, so that there are water droplets for the sunlight to reflect off. Often the weather is one or the other - either all rain and clouds (hence no sun) or no rain. Also, if the sun is too high in the sky, a rainbow can't happen - a raindrop has to bend the light a certain amount (40-42° for a normal rainbow, 50-53° for a secondary one), so you can't have the sun and the reflections you need for a rainbow in the sky reaching your eye at the same time if the sun's elevation is more than 40° above the horizon. For a secondary rainbow it only needs to be less than 53° above the horizon, but the reflections are a lot weaker (the rays of light have to bounce off the inside of the raindrop twice rather than once) so it's a lot harder to see unless the conditions are just right.
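The elevation limit can be written as a tiny Python sketch. It assumes a flat horizon and an observer at ground level, and it uses the outer angles quoted above (42° and 53°) as the bow angles; the function names are just for illustration.

```python
def bow_top_elevation(sun_elev_deg, bow_angle_deg):
    """Elevation of the top of a rainbow, in degrees above the horizon.

    The bow is a circle of radius bow_angle_deg around the point directly
    opposite the sun, which sits sun_elev_deg below the horizon.
    """
    return bow_angle_deg - sun_elev_deg


def is_above_horizon(sun_elev_deg, bow_angle_deg):
    """True if at least part of the bow is above a flat horizon."""
    return bow_top_elevation(sun_elev_deg, bow_angle_deg) > 0


for sun in (10, 45, 60):
    print(sun, is_above_horizon(sun, 42), is_above_horizon(sun, 53))
```

With the sun 45° up, for example, the primary bow is entirely below the horizon while the top of the secondary one would still just be visible.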

Here's a drawing from this site showing the path of light from the sun (yellow lines) at sunrise or sunset, and how they bounce off raindrops to create rainbows at the blue and red colours (these are reflected at different angles, hence the colours of a rainbow). The higher the sun in the sky, the more downward-pointing those yellow lines will be, and the closer to the horizon and harder to see the rainbow will be (to see the effect, tilt your head to the left and imagine the ground is still horizontal from your perspective).



To get the other two rainbows, we also need to have a body of water the right distance away behind you (it's also possible in front of you, but more difficult) to create another "sun" that will also make two rainbows - this will already be more difficult because the reflected sun will be less bright depending on how good a mirror the water body is.


We can do a bit of geometry and work out what the required distances would be. Given that the sunlight is reflecting off a raindrop a certain distance in front of us, this plot gives the height of the raindrop (above the ground) and the distance of the water (behind us), both relative to the distance of the rain, that we'd need for the primary and secondary (dotted in the plot) reflected rainbows to occur:



We can work out a few things from this. First, the apparent height of the original and secondary rainbows (green) is never much larger than the distance the rain is away - the higher the sun is in the sky, the lower the height relative to distance. As the rain goes right to the ground, this doesn't really restrict us at all until the rainbow goes below the ground and we can't see it.

However, for the reflected rainbows (black), it's the opposite - the higher the sun is in the sky, the higher we expect the raindrops to be, and they're almost always going to be at least as high as the rain is far away. We'd expect raindrops to usually be less than 6km high based on this site, so the rain should be closer than that at least.

Using the timestamp in the Twitter post (changed to New York time, 5:57am) and this site, we can actually work out where the sun was in relation to New York when the picture was posted (and, hopefully, taken). It turns out it was not long after sunrise, so the sun was quite low in the sky at about 8.2° in height. The blue line on the graph represents this. The highest points of our secondary original rainbow (green, dotted) and our primary reflected rainbow (black, plain) should be at around the same place, with the original slightly lower, and this looks to be the case on the image. So far so good! Also, the primary original (green, plain) and secondary reflected (black, dotted) are below and above these two respectively.

The next thing we need to check is if the water lines up - the water for the primary reflected rainbow should be about 7 times further than the rain, and the secondary reflected about 12 times further. The direction of the sun at that time of morning was about 81°, so just north of east. If we assume the rain was about 1km away, looking at the map of the location there are two likely-looking patches of shallow, calm water about 7km and 12km in that direction at Oyster Bay and Cold Spring Harbor respectively.
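For anyone wanting to reproduce these ratios, here is a rough Python reconstruction of the geometry. It is my own sketch rather than the calculation behind the plot, and it assumes flat ground, an observer at ground level, only the top of each bow, and bow angles of 42° (primary) and 53° (secondary) taken from the ranges quoted earlier. The function names are hypothetical.

```python
import math


def bow_heights(sun_elev_deg, bow_angle_deg, rain_distance=1.0):
    """Height of the raindrop at the top of the original and reflected bows.

    The original bow is centred sun_elev_deg below the horizon; the bow from
    the sun's reflection is centred the same angle above it.
    """
    e = math.radians(sun_elev_deg)
    a = math.radians(bow_angle_deg)
    original = rain_distance * math.tan(a - e)
    reflected = rain_distance * math.tan(a + e)
    return original, reflected


def water_distance(sun_elev_deg, bow_angle_deg, rain_distance=1.0):
    """Distance behind the observer at which the water must reflect the sun.

    The reflected ray climbs at the sun's elevation angle from the water
    up to the raindrop at the top of the reflected bow.
    """
    _, height = bow_heights(sun_elev_deg, bow_angle_deg, rain_distance)
    return height / math.tan(math.radians(sun_elev_deg)) - rain_distance


for name, angle in (("primary", 42), ("secondary", 53)):
    print(name, round(water_distance(8.2, angle), 1))
```

Under these assumptions the two distances land in the same ballpark as the 7 and 12 times quoted above; different choices of angle within each range shift them a little.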

So after doing some detective work, it looks like the quadruple rainbow is not only plausible, but the result of a series of unlikely but very possible events!