03 April 2008

Don't Bet on April 31st...

The Nenana Ice Classic guess submission deadline is the day after tomorrow, Saturday April 5. Don't miss it! Guess early and guess often. No, that's voting... Guess late and don't go crazy, that's the best strategy... let me explain.

In my last post I outlined, quite convincingly I feel, what a hopeless guesser I am: how the first year Rationality and Reason prevailed, and how every year since I've picked past winning times kinda willy-nilly, according to the birthdays of relatives and pretty girls. Nevertheless, I'm now going to segue directly into convincing you how to guess scientifically... It's back to Reason for me too this year...

First, I established in a comment to the last post that the best guess is April 30, since that is the day on which the most Breakups have occurred. It seems to be the "average" day, and the guess of mine that is almost always closest. But, the question arises, has April 30 always been the average day?

Below is a graph of Breakup date over the years:




OK... "Year" is on the X-axis, and "Day after Jan 1" on the Y-axis is just another way to note the date... What do you see? If you're like me, you see dots... Another way to look at the same exact data is to connect the dots:




OK... Now what do you see? These connected dots show "trends" better, and to me, it looks like the jaggedy line is kind of going down-ish... But IS IT? Well, statistics to the rescue... going back to the original dot view, and plotting a regression line we get...



The red regression line points down. Notice that the left end of the line is at about the level of 129 "Days after Jan 1," i.e. about May 9. Notice that the right end of the line is at about the level of 121 "Days after Jan 1," i.e. about May 1. So, there it is. Breakup is happening *on average* about a week earlier now than when the Ice Classic started.

That fact is "statistically significant." All those numbers to the right of the graph tell us that, particularly the one that says "p(uncorr):0.0009916." Roughly, that's the chance of getting a line tilted this much by pure luck if Breakup weren't really trending earlier or later at all (i.e. if the true slope were zero). That chance is less than 0.001, meaning pure luck would cough up a trend like this less than 1 time in 1,000.

OK, I'm convinced... but... has it been *gradually* getting earlier? Like, a day a decade, e.g.? Well, look back at the connect-the-dots graph... To me it looks like the jaggedy line starts going south in a hurry about 1970... what do you think? ... If I break up the dataset into two periods, 1917-1970 and 1971-2007, and do the same statistical test on each period individually, we get the following. First, 1917-1970:



Well now... turns out that the average date of Breakup didn't get statistically significantly earlier or later from 1917-1970! Wow... that's interesting... how about 1971-now?



Wow, that's striking. The date of Breakup has gotten way earlier since 1970. What's going on? I haven't a clue.
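For the do-it-yourselfers: here's a rough sketch of the "fit a line, then fit it again on each half" idea. It's plain least squares in Python, not necessarily how the graphs above were made, and it only gives you the slope (for a p-value you'd want a real stats package). The function names and the split year argument are my own, and the numbers at the bottom are MADE UP to show the shape of the thing; they are not real Ice Classic records.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_by_period(years, days, split_year):
    """Fit one line to years <= split_year and another to years > split_year."""
    early = [(y, d) for y, d in zip(years, days) if y <= split_year]
    late = [(y, d) for y, d in zip(years, days) if y > split_year]
    early_slope, _ = fit_line(*zip(*early))
    late_slope, _ = fit_line(*zip(*late))
    return early_slope, late_slope


# MADE-UP toy data (NOT real Breakup dates): flat through 1970,
# then one day earlier every five years.
years = list(range(1950, 1991, 5))          # 1950, 1955, ..., 1990
days = [125, 125, 125, 125, 125, 124, 123, 122, 121]

early_slope, late_slope = slope_by_period(years, days, split_year=1970)
print("early period: %+.2f days per decade" % (early_slope * 10))
print("late period:  %+.2f days per decade" % (late_slope * 10))
```

A slope near zero means a flat line (no trend); a negative slope means Breakup is creeping earlier. Swap in the real year and day-number columns to try it on the actual record.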

But, you might fairly ask, great, but how is this going to make me rich? Well, all this tells you is the best date to guess: the current "average" date. So, the end of that red line "now" is at about day 119-120 or something, i.e. April 30...
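If you don't trust my day-number-to-date arithmetic (fair), here's a tiny Python sketch to check it. It assumes Jan 1 counts as day 1, which is my reading of the graph axis since it makes day 121 land on May 1; the function name is just something I made up.

```python
from datetime import date, timedelta


def day_number_to_date(day_number, year=2007):
    """Turn a day number (ASSUMING Jan 1 = day 1) into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_number - 1)


for day_number in (120, 121, 129):
    d = day_number_to_date(day_number)
    print(day_number, "->", d.strftime("%B %d"))

# In a leap year the same day number lands one calendar day earlier
# once you are past February.
print(120, "in 2008 ->", day_number_to_date(120, year=2008).strftime("%B %d"))
```

That leap-year wobble is worth keeping in mind whenever a day number gets turned back into a date to write on a ticket.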

But there has to be something "beneath" that variation or trend in Breakups... Breakup is a combination of things: ice melting because of warmer temperatures in spring, snow melting and flushing into the river and pushing the ice up and out, etc... so, one would imagine that if the ice is thinner, these things would cause Breakup to happen earlier... Is that the case? Well, ice thickness has been measured at the tripod, not since the beginning, but since 1989. Below is a graph of the data. On the Y-axis is Breakup date; on the X-axis is not "year," but instead "ice thickness," as measured on about April 1 every year. You don't need to know which dots come from which years, and it doesn't matter (see P.S. if you're interested):



So, here's what this means. Thicker ice in spring tends to result in later Breakups. If ice thickness didn't matter, then the line would be flat. It's not. You can imagine that ice thickness measured on December 1 might result in a flat line, since there's a lot that can happen between then and Breakup: extreme cold snaps, extreme warm snaps, etc... i.e. the predictive power of an ice measurement on April 1 is much better than the predictive power of one on Dec 1. Duh...

But how "good" a predictor is April 1 ice? Damn good, actually. It can explain about 42% (say the stats on the right) of the "variation." If it explained 100%, then all the dots would be right on the line, and you'd know exactly when Breakup would be just based on ice thickness. But 42% is darn good for a single "predictor variable." Other things explain the rest. Things like air temperature, or how much snow is in the hills contributing to "flushing" the ice out. But we don't have that data. All we have is ice thickness.

So... almost done... what you should do is go to the Ice Classic website, figure out what the measurement for April 1 was this year, find that thickness on the X-axis, and see what date to guess... that's what I did in 2004, and I did, meh... OK... Anyway, this is the best reason to guess late: to wait for that April 1 ice measurement, the best predictor.

Oh, as for time of day. Throw a dart at a clock... No, just kidding. The early afternoon seems good... Oh, and of course, you can ignore all this and just go with your intuition... In any case, GOOD LUCK!
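(For the code-inclined: the "find the thickness on the X-axis and read off the date" step can be sketched in a few lines of Python. This is a plain least-squares fit plus the "percent of variation explained" number, a.k.a. R-squared. The function names are mine, the thickness units are whatever the measurements come in, and the data below are MADE UP for illustration; they are not the real 1989-2007 measurements, so don't bet on what it prints.)

```python
def fit_line_with_r2(xs, ys):
    """Least-squares line through (xs, ys): return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variation in ys that the line accounts for (0 to 1).
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def predict_day(thickness, slope, intercept):
    """Read the fitted line at a given thickness to get a Breakup day number."""
    return slope * thickness + intercept


# MADE-UP toy data (NOT real measurements): April 1 ice thickness and
# the Breakup day number for five imaginary years.
thickness = [30, 35, 40, 45, 50]
breakup_day = [118, 121, 120, 124, 125]

slope, intercept, r_squared = fit_line_with_r2(thickness, breakup_day)
print("variation explained: %.0f%%" % (100 * r_squared))
print("predicted Breakup day for thickness 42: %.1f" % predict_day(42, slope, intercept))
```

The toy dots hug their line much more tightly than the real ones do (the real graph says about 42%), so treat any single predicted day as the middle of a wide spread of possibilities.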


(P.S. There is, apparently, no correlation between year and date or year and thickness in the 1989-2007 dataset... interesting)

Links:
http://www.nenanaakiceclassic.com/

Advanced Reading:

An article in the prestigious peer-reviewed journal Science by a Stanford scientist (wherein Tanana is misspelled Tenana):
http://www.sciencemag.org/cgi/content/full/294/5543/811
And its interpretation:
http://news-service.stanford.edu/news/2001/october31/alaskabet-1031.html

A caveat (that I haven't addressed) written by the same scientist in the other of the world's two most prestigious journals, Nature:
http://www.nature.com/nature/journal/v414/n6864/full/414600a.html
And its interpretation:
http://news-service.stanford.edu/pr/01/leapyear1212.html

A lengthy retort by some guy on the internet (who has way more graphs than I do, no fair!):
http://www.john-daly.com/nenana.htm

Some journalists have noticed the Nenana Ice Classic over the years
(Thanks Google "News Archive Search"!):



E.g. in 1973, about the time things were heating up... the NYT ran an article:
http://select.nytimes.com/gst/abstract.html?res=F20813F93954137A93CAA9178ED85F478785F9

An article by Sagarin "In Support of Observational Studies":
http://www.esajournals.org/perlserv/?request=get-abstract&doi=10.1890%2F1540-9295(2007)5%5B294%3AWBSOOS%5D2.0.CO%3B2&ct=1

... this may not be my last Ice Classic Post... It's about all I think about this month...

1 comment:

brittany said...

I find it fascinating that all the pretty girls you know were born in April or May. What are the odds of that?