We see a lot of strong claims about how behavior x is good for you these days. Or that certain lifestyles make people happier. Pretty much every other day, you can find a major study of this kind being talked up on the front page of your favorite newspaper.
These studies are often necessarily based on observational data. No one is going to randomly assign people to go vegetarian for a decade, or to get married when they otherwise would not have, just so that we can see what happens to them. The classic concern with these types of studies is that people who perform healthy behavior x also exercise more (or that those who are healthier, wealthier, and more successful in their careers have an easier time finding suitable mates), and not even your crankiest crank (I don't think) doubts that exercise is important to health (or that health, wealth, and career achievement might influence life satisfaction). So one might question whether brushing your teeth prevents heart attacks by positing that people who exercise regularly might also brush more regularly.
The obvious response is that the people conducting these studies are not morons, and of course they controlled for exercise.
But does controlling for exercise solve the problem?
Let's go a step further. I suspect that most of my readers are already sympathetic to the idea that there is at least some cause for concern that maybe we shouldn't read too much into the putative relationship between brushing your teeth and heart health.
So how about a slightly more controversial example: vegetarianism.
Many studies purport to show that vegetarians are at a lower risk of heart attack. Less meat -> less cholesterol -> better heart health. Simple enough.
Maybe. Seems logical enough to a non-specialist like me. But, just for the sake of argument, let us ask: might it not look like vegetarianism is associated with heart health even if it weren't, and, more importantly, even if we controlled for exercise?
I'm going to try to convince you that the answer is not only yes (duh) but that this could be true under circumstances that I don't think are especially implausible.
As usual, I encourage you to play along at home. I'm providing Stata code, but it's not hard to convert this to R, if that's your preferred software.
set obs 10000

I create 10,000 observations, and begin by defining two subsets of the population. By prog and con, I'm drawing a distinction between those who hug trees, drive hybrids, and eat granola versus those who shoot guns, drive pickups, and eat red meat. This is obviously not unrelated to politics, but it is also a question of lifestyle choices and values.
gen prog = 0
gen ltype = uniform()
replace prog = 1 if ltype<0.2
gen con = 0
replace con = 1 if ltype>0.2 & ltype<0.4
What I've done is randomly assigned 20% of the population to each of these two stylized groups, leaving the remaining 60% as a baseline category of the general public. Obviously this is extremely simplified, and these are but two of many continuous dimensions along which people vary. I know. But the argument hangs together either way, and this is nice and easy, so please go with it.
gen veg = 0
gen lveg = uniform()
replace veg = 1 if lveg < 0.30 & prog==1
replace veg = 1 if lveg < 0.05 & prog==0 & con==0
Now I generate a binary indicator for vegetarianism. Yes, I know, there is variation in meat consumption that is not captured by this distinction, even without getting into the difference between pesco-vegetarians, vegans, and other groups. It's stylized, but stick with me.
I've assumed that 30% of progressives are vegetarians, 5% of the general public, and 0% of conservatives. I don't know how accurate those numbers are, and I'm sure a lot of us know at least one conservative vegetarian. But again, the argument works even if I make things more realistic. I'm just trying to keep this quick and simple.
gen exer = .
drawnorm lxprog lxcon lxrest, means(0.5 0.5 0.3) sds(0.075 0.075 0.15)
replace exer = lxprog if prog==1
replace exer = lxcon if con==1
replace exer = lxrest if prog==0 & con==0
replace exer = 0 if exer<0
I then generate three normally distributed variables for how much people exercise, truncated at zero so that in practice they run from 0 to roughly 1. I assume the average progressive and the average conservative score in the middle at 0.5, while the average member of the general public is well below that at 0.3.
Now, that gap may be unreasonable, and there may be a difference between the two elite groups. I don't know. The important thing for this illustration is that vegetarianism is positively related to a commitment to a healthy lifestyle in other ways, including exercise, while acknowledging that there are also many people who are committed to fitness who are not especially likely to be vegetarian. If you want to play with these numbers, feel free. You'll have to change them a fair amount to make my conclusion fall apart.
gen exer_obs = .
gen fudge = uniform()
replace exer_obs = (exer*2+fudge)/3 if prog==1
replace exer_obs = (exer*2+fudge)/3 if con==1
replace exer_obs = (exer+fudge)/2 if prog==0 & con==0 & exer<0.5
replace exer_obs = (exer*2+fudge)/3 if prog==0 & con==0 & exer>0.5
Here's where the heart of the argument shows up.
I'm assuming it is highly impractical, though obviously not impossible, to directly observe how often people exercise. Instead, we must rely on self-reports. I'm assuming people will misstate the truth for lots of reasons, not least that those who don't follow a strict regimen may not actually know the exact truth off the top of their heads. But they should be close, so I double-count the truth relative to the fudge factor for both progressive and conservative health nuts. I also do the same for members of the general public who do in fact exercise a fair amount -- those scoring above 0.5.
However, I'm also assuming people who are less active are systematically more likely to misstate how much they exercise. Obviously that's going to be our big problem here. But notice how I introduce that. All I've done is take away the assumption that the truth counts twice as much as the fudge factor, which is an entirely random uniformly distributed variable. Thus, in some cases, this means that even inactive people are understating the frequency with which they exercise.
It is important to stress that I am not introducing an absurdly large amount of measurement error here. What I want to emphasize is how it would only take a modest amount of error to badly bias all of these types of studies. Which means that you need to be very, very confident that inactive people are no more likely to misrepresent their activity levels if you are going to stand by the types of observational studies that appear regularly in the news.
In my simulated data, the average levels of exer and exer_obs are both very nearly 0.5 amongst both progressive and conservative health nuts. For the general population, the mean value of exer is almost exactly 0.3, and the mean value of exer_obs is almost exactly 0.4. So we'd still see that health nuts exercise more, but we'd be understating the difference by a fair margin. But notice this still gives us a better than 0.5 correlation between exer and exer_obs. I have not assumed, by any means, that self-reports are meaningless. I've even assumed they are incredibly accurate for the most active groups.
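If you'd rather play along in Python than Stata, here is a rough sketch of the simulation up to this point that checks those numbers. The seed, helper names, and groupings are my own choices, not part of the original code:

```python
import math
import random

random.seed(42)
N = 10_000

def truncated_normal(mu, sigma):
    # Mirror the Stata code: draw a normal and floor it at zero.
    return max(0.0, random.gauss(mu, sigma))

prog, con, exer, exer_obs = [], [], [], []
for _ in range(N):
    ltype = random.random()
    p = 1 if ltype < 0.2 else 0
    c = 1 if 0.2 <= ltype < 0.4 else 0
    # Health nuts average 0.5 on true exercise; the general public averages 0.3.
    e = truncated_normal(0.5, 0.075) if (p or c) else truncated_normal(0.3, 0.15)
    fudge = random.random()
    # Truth counts double for health nuts and for active members of the general
    # public; for everyone else, truth and fudge are weighted equally.
    obs = (2 * e + fudge) / 3 if (p or c or e > 0.5) else (e + fudge) / 2
    prog.append(p); con.append(c); exer.append(e); exer_obs.append(obs)

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

nuts_e = [e for e, p, c in zip(exer, prog, con) if p or c]
gen_e = [e for e, p, c in zip(exer, prog, con) if not (p or c)]
gen_obs = [o for o, p, c in zip(exer_obs, prog, con) if not (p or c)]

print("mean true exer, health nuts:       ", round(mean(nuts_e), 2))
print("mean true exer, general public:    ", round(mean(gen_e), 2))
print("mean reported exer, general public:", round(mean(gen_obs), 2))
print("corr(exer, exer_obs):              ", round(corr(exer, exer_obs), 2))
```

With 10,000 observations the sampling noise is small, so the printed means should land very close to the 0.5 / 0.3 / 0.4 figures described above.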
gen hattack = 0

Next, I assume that the risk of heart attacks is partly unrelated to exercise and meat consumption. That might be genetics, it might be other lifestyle choices not accounted for by these variables, or whatever. More importantly, I assume that about half of one's risk of heart attack is determined by exercise. I create a continuous measure of risk that is partly random and partly (inversely) related to exercise, and further assume that heart attacks occur when the risk level exceeds 0.8. Again, this is very simple. But I don't think it's crazy.
gen lattack = uniform()
gen risk = (lattack + (1-exer))/2
replace hattack = 1 if risk>0.8
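To see what this risk rule implies: hattack fires exactly when lattack > 0.6 + exer, so the probability of a heart attack at a true exercise level e is max(0, 0.4 - e), and anyone whose true exer exceeds 0.4 is safe by construction. A quick Python sketch of that arithmetic (my own check, not part of the original code):

```python
import random

random.seed(1)

def attack_prob_analytic(exer):
    # risk = (u + (1 - exer)) / 2 > 0.8  <=>  u > 0.6 + exer, u ~ Uniform(0, 1)
    return max(0.0, 0.4 - exer)

def attack_prob_simulated(exer, trials=200_000):
    # Brute-force check of the same probability by drawing lattack repeatedly.
    hits = 0
    for _ in range(trials):
        u = random.random()
        if (u + (1 - exer)) / 2 > 0.8:
            hits += 1
    return hits / trials

for e in (0.1, 0.3, 0.5):
    print(e, round(attack_prob_analytic(e), 3), round(attack_prob_simulated(e), 3))
```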
Finally, let's estimate a model.
logit hattack exer_obs veg

In my sample, I estimate negative and significant coefficients for both exer_obs -- which, recall, is the self-reported level rather than the true level -- and the binary indicator for vegetarianism. This despite the fact that, by construction, there is no relationship between vegetarianism and the risk of heart attacks.
I've run the same code a few times, and do not always get exactly the same results. There's a lot of room for variability in this setup. I even sometimes find veg to have a negative and significant coefficient estimate when replacing exer_obs with exer. But very rarely. So if you are playing along at home, I encourage you to clear the data set and re-run the code a few times. You should find that most of the time, controlling for exer_obs is not sufficient to prevent us from falsely concluding that vegetarianism prevents heart attacks, while controlling for exer, which is by assumption impractical if not impossible, typically solves that problem.
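For readers replicating this outside Stata, here is a rough pure-Python version of the whole exercise. It uses a hand-rolled Newton-Raphson logit rather than Stata's logit command, so it produces point estimates only, no standard errors; the seed and all function names are mine, and, like the Stata version, exact results will vary from run to run:

```python
import math
import random

random.seed(7)
N = 10_000

# Re-create the simulated data: group membership, vegetarianism, true and
# self-reported exercise, and heart attacks, following the Stata code.
rows = []
for _ in range(N):
    ltype = random.random()
    p = ltype < 0.2                     # progressive
    c = 0.2 <= ltype < 0.4              # conservative
    mu, sd = (0.5, 0.075) if (p or c) else (0.3, 0.15)
    e = max(0.0, random.gauss(mu, sd))  # true exercise, floored at zero
    lveg = random.random()
    veg = 1 if (p and lveg < 0.30) or (not p and not c and lveg < 0.05) else 0
    fudge = random.random()
    obs = (2 * e + fudge) / 3 if (p or c or e > 0.5) else (e + fudge) / 2
    hattack = 1 if (random.random() + (1 - e)) / 2 > 0.8 else 0
    rows.append((e, obs, veg, hattack))

def solve(A, v):
    # Gaussian elimination with partial pivoting, for small linear systems.
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def logit_fit(X, y, iters=25):
    # Plain Newton-Raphson on the logit log-likelihood; X includes the
    # intercept column. Point estimates only.
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            pi = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(b, xi))))
            w = pi * (1.0 - pi)
            for a in range(k):
                grad[a] += (yi - pi) * xi[a]
                for d in range(k):
                    hess[a][d] += w * xi[a] * xi[d]
        b = [bi + si for bi, si in zip(b, solve(hess, grad))]
    return b

y = [r[3] for r in rows]
b_obs = logit_fit([[1.0, r[1], r[2]] for r in rows], y)   # control: exer_obs
b_true = logit_fit([[1.0, r[0], r[2]] for r in rows], y)  # control: true exer

print("veg coefficient, controlling for exer_obs:", round(b_obs[2], 3))
print("veg coefficient, controlling for exer:    ", round(b_true[2], 3))
```

On most seeds, the veg coefficient comes out clearly negative when we control for exer_obs and hovers near zero when we control for the true exer, which is the pattern described above.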
Maybe this is all obvious. The general idea that measurement error can bias our results is obviously nothing new. The point here is that when you start getting into specific examples, you find that the assumptions you need to make in order to tell a story whereby most of what gets reported in the news is bunk are not all that crazy. If we needed to be talking about enormous amounts of measurement error before things go wrong, there'd be no problem. But we don't.
So when someone expresses concern about the results of some health study (or on happiness, where we also conveniently find that lifestyles that both progressive and conservative elites typically endorse, like getting married and having kids, makes you happy), particularly because they think that individuals who are most inclined to engage in behaviors endorsed by prevailing cultural norms may also be more likely to exercise (or in the case of happiness, it's perhaps just maybe possible that those who are healthy, wealthy and successful in their careers are more likely to attract spouses in the first place), and someone defends these studies by saying that of course the researchers are not idiots, so they controlled for exercise (health, income, career), remember that the question isn't whether they controlled for those things. The question is whether they did so using self-reported values that are likely to be misstated, particularly by those who exercise less (are less healthy, wealthy, and successful). It doesn't take a great deal of misrepresentation by such individuals to render these studies all but meaningless.
And I don't know about you, but I don't find it hard to believe that at least some non-trivial proportion of people who exercise less, are less healthy, less wealthy, and less successful, just might be somewhat disinclined to acknowledge their socially undesirable traits to social scientists. If you want me to believe the latest study saying I need to live the life cultural elites think I should live if I want to be healthy and happy, you need to do a lot more than tell me that the people conducting these studies included crude controls for obvious confounders.
You need to persuade me that people with socially undesirable traits are generally quite comfortable fessing up to that when answering surveys.
UPDATE: I should have included a few more caveats.
It's entirely possible that individuals on vegetarian diets find it easier to stick to a certain level of caloric intake. Vegetable-based diets might be higher in micronutrients. There are obvious moral concerns surrounding the consumption of meat. I'm not trying to argue against vegetarianism. I'm not even claiming to know or believe that vegetarian diets fail to promote heart health. That very well could be true.
The bigger point is that your typical study about health or happiness arrives at a conclusion that you ought to be doing the things that cultural elites already thought you should be doing, yet the evidence provided is typically insufficient to warrant the conclusions offered. That is so because there typically remain legitimate concerns about omitted variable bias, even when the researchers include controls for the obvious confounders.
We might not be able to reasonably expect the researchers to go past self-reports, because it is so impractical to rely on anything else. Fine. But whether the researchers did the best that can be reasonably expected of them is not the same as whether the evidence they provide is sufficient to establish their conclusions. As long as there is reason to worry that people with socially undesirable traits misreport private information, there is serious concern about what we can and cannot learn about the determinants of health and happiness from observational studies.