Tuesday, January 18, 2011

Regression Toward the Mean

Source = [http://davidmlane.com/hyperstat/B153351.html]

A person who scored 750 out of a possible 800 on the quantitative portion of the SAT takes the SAT again (a different form of the test is used). Assuming the second test is of the same difficulty as the first and that there was no learning or practice effect, what score would you expect the person to get on the second test? The surprising answer is that the person is more likely to score below 750 than above 750; the best guess is that the person would score about 725. If this surprises you, you are not alone. This phenomenon, called regression toward the mean, is counterintuitive and confusing to many professionals as well as students.

The conclusion that the expected score on the second test is below 750 depends on the assumption that scores on the test are, at least in some small part, due to chance or luck. Assume that there is a large number, say 1,000, of parallel forms of a test and that (a) someone takes all 1,000 tests and (b) there are no learning, practice, or fatigue effects. Differences in the scores on these tests are therefore due to luck. This luck may be a function of simple guessing or may be a function of knowing more of the answers on some tests than on others.

Define the mean of these 1,000 scores as the person's "true" score. On some tests, the person will have scored below the "true" score; on these tests the person was less lucky than average. On some tests the person will have scored above the true score; on these tests the person was more lucky than average. Consider the ways in which someone could score 750. There are three possibilities: (1) their "true" score is 750 and they had exactly average luck, (2) their "true" score is below 750 and they had better than average luck, and (3) their "true" score is above 750 and they had worse than average luck. Consider which is more likely, possibility #2 or possibility #3. There are very few people with "true" scores above 750 (roughly 6 in 1,000); there are many more people with true scores between 700 and 750 (roughly 17 in 1,000). Since there are so many more people with true scores between 700 and 750 than between 750 and 800, it is more likely that someone scoring 750 is from the former group and was lucky than from the latter group and was unlucky.
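
The "6 in 1,000" and "17 in 1,000" figures can be checked with a short calculation. The sketch below assumes that "true" scores follow a normal distribution with a mean of 500 and a standard deviation of 100, and it ignores the 800-point ceiling; the function names are illustrative only.

```python
import math

MEAN, SD = 500.0, 100.0  # assumed distribution of "true" scores


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_between(low, high):
    """Proportion of a normal(MEAN, SD) population between low and high."""
    return normal_cdf((high - MEAN) / SD) - normal_cdf((low - MEAN) / SD)


# Proportion with "true" scores above 750 (the ceiling of 800 is ignored).
above_750 = 1.0 - normal_cdf((750 - MEAN) / SD)
# Proportion with "true" scores between 700 and 750.
between_700_750 = proportion_between(700, 750)

print(f"above 750:  {above_750 * 1000:.1f} per 1,000")
print(f"700 to 750: {between_700_750 * 1000:.1f} per 1,000")
```

Under these assumptions the second group is between two and three times as large as the first, which is what drives the argument.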

There are just not many people who can afford to be unlucky and still score as high as 750. A person scoring 750 was, more likely than not, luckier than average. Since, by definition, luck does not hold from one administration of the test to another, a person scoring 750 on one test is expected to score below 750 on a second test. This does not mean that they necessarily will score less than 750, just that it is likely. The same logic can be applied to someone scoring 250. Since there are more people with "true" scores between 250 and 300 than between 200 and 250, a person scoring 250 is more likely to have a "true" score above 250 and be unlucky than a "true" score below 250 and be lucky. This means that a person scoring 250 would be expected to score higher on the second test. For both the person scoring 750 and the person scoring 250, their expected score on the second test is between the score they received on the first test and the mean.
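
The argument can also be checked by simulation. The following is a minimal sketch, not the simulation linked from this page: it assumes that an observed score is a normally distributed "true" score plus independent normally distributed luck, that observed scores have a mean of 500 and a standard deviation of 100, and that "true" scores account for 90% of the variance in observed scores. Scores are not clipped to the 200 to 800 range.

```python
import random


def simulate_retest(n_people=200_000, reliability=0.90, seed=1):
    """Return retest scores of simulated people who scored near 750 at first."""
    rng = random.Random(seed)
    mean, sd = 500.0, 100.0                    # assumed observed-score scale
    sd_true = sd * reliability ** 0.5          # spread of "true" scores
    sd_luck = sd * (1.0 - reliability) ** 0.5  # spread of luck on one test
    retests = []
    for _ in range(n_people):
        true_score = rng.gauss(mean, sd_true)
        first = true_score + rng.gauss(0.0, sd_luck)
        if 740.0 <= first <= 760.0:            # scored about 750 the first time
            # Luck is drawn afresh: it does not carry over to the retest.
            retests.append(true_score + rng.gauss(0.0, sd_luck))
    return retests


retests = simulate_retest()
mean_retest = sum(retests) / len(retests)
share_below = sum(s < 750 for s in retests) / len(retests)
print(f"people scoring about 750 at first: {len(retests)}")
print(f"mean retest score: {mean_retest:.0f}")
print(f"share scoring below 750 on the retest: {share_below:.2f}")
```

With these settings the mean retest score should come out near 725, with well over half of the group scoring below 750 the second time.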

This is the phenomenon called "regression toward the mean." Regression toward the mean occurs any time people are chosen based on observed scores that are determined in part or entirely by chance. On any task that contains both luck and skill, people who score above the mean are likely to have been luckier than people who score below the mean. Since luck does not hold from trial to trial, people who score above the mean can be expected to do worse on a subsequent trial. This counterintuitive phenomenon is illustrated concretely by a simulation found here [http://onlinestatbook.com/stat_sim/reg_to_mean/].

In regression with standardized variables [http://davidmlane.com/hyperstat/B133621.html], the regression equation is:

Zy' = (r)Zx

where Zy' is the predicted standardized score, Zx is the standardized score on the predictor, and r is Pearson's correlation [http://davidmlane.com/hyperstat/A34739.html]. This means that the predicted standardized score will be closer to the mean of zero whenever the correlation is not perfect (not -1 or 1).

For example, if the SAT had a mean of 500 and a standard deviation of 100, then a score of 750 would have a standard score equivalent of 2.5, since 750 is two and a half standard deviations above the mean. If the test-retest correlation were 0.90, then the predicted standard score for someone with a standard score of 2.5 would be (0.90)(2.5) = 2.25. Therefore, they would be predicted to be 2.25 standard deviations above the mean on the retest, which is 500 + (2.25)(100) = 725.
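
The same arithmetic can be written as a small function. This is only a sketch of the standardized regression equation above; the function name and argument names are made up for illustration.

```python
def predicted_retest_score(score, mean, sd, r):
    """Predict a retest score using Zy' = (r)Zx."""
    z_x = (score - mean) / sd   # standardize the first score
    z_y_pred = r * z_x          # predicted standardized retest score
    return mean + z_y_pred * sd  # convert back to the test's scale


high = predicted_retest_score(750, mean=500, sd=100, r=0.90)
low = predicted_retest_score(250, mean=500, sd=100, r=0.90)
print(round(high), round(low))
```

For a first score of 750 this gives about 725, and for a first score of 250 it gives about 275; both predictions lie between the first score and the mean. Setting r to 1 returns the first score unchanged, and setting r to 0 returns the mean.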

Regression toward the mean will occur if one chooses the lowest-scoring subjects in an experiment. Since the lowest-scoring subjects can be expected to have been unlucky and therefore have scored lower than their "true" scores, they will, on average, improve on a retest. This can easily mislead the unwary researcher. Suppose a researcher chose the first-grade children in a school system who scored the worst on a reading test, administered a drug that was supposed to improve reading, and retested the children on a parallel form of the reading test. Because of regression toward the mean, the mean reading score on the retest would almost certainly be higher than the mean score on the first test. The researcher would be mistaken to claim that the drug was responsible for the improvement since it would be expected to occur simply on the basis of regression toward the mean.
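
The reading-test scenario can be sketched as a simulation in which the drug does nothing at all. Everything numeric here is a hypothetical choice made for illustration: 5,000 children, a reading scale with a mean of 50 and a standard deviation of 10, "true" scores accounting for 80% of the score variance, and selection of the lowest-scoring 10%.

```python
import random


def retest_lowest_scorers(n_children=5000, reliability=0.80,
                          fraction=0.10, seed=7):
    """Mean first and second scores of the lowest scorers, with no treatment."""
    rng = random.Random(seed)
    mean, sd = 50.0, 10.0                      # hypothetical reading-test scale
    sd_true = sd * reliability ** 0.5
    sd_luck = sd * (1.0 - reliability) ** 0.5
    children = []
    for _ in range(n_children):
        true_score = rng.gauss(mean, sd_true)
        first = true_score + rng.gauss(0.0, sd_luck)
        second = true_score + rng.gauss(0.0, sd_luck)  # the drug has no effect
        children.append((first, second))
    children.sort(key=lambda pair: pair[0])    # order by first-test score
    chosen = children[: int(n_children * fraction)]
    first_mean = sum(f for f, _ in chosen) / len(chosen)
    second_mean = sum(s for _, s in chosen) / len(chosen)
    return first_mean, second_mean


first_mean, second_mean = retest_lowest_scorers()
print(f"lowest scorers, first test: {first_mean:.1f}")
print(f"lowest scorers, retest:     {second_mean:.1f}")
```

The selected group improves on the retest even though nothing was done to it, and it still scores below the overall mean.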

Consider an actual study that received considerable media attention. This study sought to determine whether a drug that reduces anxiety could raise SAT scores by reducing test anxiety. A group of students whose SAT scores were surprisingly low (given their grades) was chosen to be in the experiment.

These students, who presumably scored lower than expected on the SAT because of test anxiety, were administered the anti-anxiety drug before taking the SAT for the second time. The results supported the hypothesis that the drug could improve SAT scores by lowering anxiety: the SAT scores were higher the second time than the first time. Since SAT scores normally go up from the first to the second administration of the test, the researchers compared the improvement of the students in the experiment with nationwide data on how much students usually improve. The students in the experiment improved significantly [http://davidmlane.com/hyperstat/A6642.html] more than the average improvement nationwide. The problem with this study is that by choosing students who scored lower than expected on the SAT, the researchers inadvertently chose students whose scores on the SAT were lower than their "true" scores. The increase on the retest could just as easily be due to regression toward the mean as to the effect of the drug. The degree to which there is regression toward the mean depends on the relative role of skill and luck in the test.

Consider the following situation in which luck alone determines performance: A coin is flipped 100 times and each student in a class is asked to predict the outcome of each flip. The student who did the best in the class was right 62 times out of 100. The test is now given for a second time. What is the best prediction for the number of times this student will predict correctly? Since this test is all luck, the best prediction for any given student, including this one, is 50 correct guesses. Therefore, the prediction is that the score of 62 would regress all the way to the mean of 50. To take the other extreme, if a test were entirely skill and involved no luck at all, then there would be no regression toward the mean.
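
The coin-flipping example is easy to simulate. The sketch below assumes a class of 30 students (the text does not give a class size) and repeats the experiment over 1,000 simulated classes. Because every guess is right with probability one half, a score is modeled as the number of 1 bits among 100 random bits.

```python
import random


def best_guesser_retest(n_classes=1000, class_size=30, n_flips=100, seed=3):
    """Average first and second scores of each class's best guesser."""
    rng = random.Random(seed)

    def score():
        # Number of correct guesses out of n_flips, each correct with
        # probability 1/2: count the 1 bits in n_flips random bits.
        return bin(rng.getrandbits(n_flips)).count("1")

    best_first, best_retest = [], []
    for _ in range(n_classes):
        first_scores = [score() for _ in range(class_size)]
        best_first.append(max(first_scores))
        # The best student is retested. Since the test is pure luck, the
        # retest score is a fresh draw that ignores who the student is.
        best_retest.append(score())
    return sum(best_first) / n_classes, sum(best_retest) / n_classes


mean_best_first, mean_best_retest = best_guesser_retest()
print(f"best student, first test: {mean_best_first:.1f}")
print(f"best student, retest:     {mean_best_retest:.1f}")
```

The best first-test score in a class averages around 60, while the same student's retest averages about 50: complete regression to the mean.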

HyperStat Online Statistics Textbook