An interesting set of survey results came out a few weeks ago. Folks were asked how large they thought certain groups were (e.g. what % of the US is Black, what % earns over a million dollars, etc.). Below are the results for the median estimate, which is much more accurate than the mean estimate (means, especially for small/large groups, are thrown off by folks who are wildly inaccurate).

I had fun reading the corresponding article, because I learned that apparently demographic group size estimation is a thriving line of research in psychology and political science? Very cool, I guess. One thing that made sense in the article: don’t gawk at the innumerate Americans who are way off in their estimates. Instead, see that folks tend to pull estimates of extremely large or small groups towards 50%. This is a bit easier to see if we plot real and perceived group sizes against one another.

The dashed line is perfect scoring, where real and perceived are the same value. You see that perceived group sizes are higher than real ones among small groups and lower among large groups. People are most accurate when estimating the size of groups that are about 50% of the population. Let’s fit a simple line between real and perceived.

It looks to me like it’s really hard to get folks to say any group is smaller than 20% of the population or larger than 70%. So presumably you could ask a random sample of people, “What percent of the US population is Tom VanHeuvelen?” and they’d discount their proper understanding (very, very few) towards the middle and maybe say: “Well, very few, so probably about 15% of the US population is Tom VanHeuvelen.” Same with giant groups. “What percent of the US population has at least one human organ?” “Well, just about everyone, so maybe like 70%?”

To me, that means that the relative accuracy isn’t against the faint blue dashed line, where a 1% group is perceived to be 1%, but rather the pink line, where 1% groups should be expected to be identified as 20% of the population. That means that we should estimate a simple linear regression model predicting perceived group size using real group size as the single independent variable. The residuals, or the difference between predicted and observed, will let us see which groups folks actually over- and under-estimate.
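For anyone who wants to see the mechanics, here is a minimal sketch of that kind of one-variable least squares fit in plain Python. The function name fit_line and the toy data are my own illustrative assumptions, not the survey data or the code behind the post; the toy points are placed exactly on a made-up line (20 + 0.5 * real) so it is easy to check that the fit recovers it.

```python
def fit_line(real, perceived):
    """Ordinary least squares for: perceived = intercept + slope * real."""
    n = len(real)
    mean_x = sum(real) / n
    mean_y = sum(perceived) / n
    # Covariance and variance terms (both unscaled; the scaling cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(real, perceived))
    sxx = sum((x - mean_x) ** 2 for x in real)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data, NOT the survey: points lying exactly on 20 + 0.5 * real.
toy_real = [0, 10, 50, 90, 100]
toy_perceived = [20 + 0.5 * x for x in toy_real]

intercept, slope = fit_line(toy_real, toy_perceived)
print(round(intercept, 2), round(slope, 2))  # prints: 20.0 0.5
```

With real survey medians in place of the toy lists, the same two numbers (constant term and slope) are all the regression produces.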

Here it is. We see the constant term is 23, and the slope coefficient between real and perceived size is 0.53. That means that when a group is 0.0001% of the population, folks will say that it’s 23% of the US population. When a group is 100% of the population, folks will say it’s 23+(.53*100)=76% of the population. The bounds are roughly 25% and 75% for very small and very large groups, respectively.
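The arithmetic is easy to reproduce. This is a small sketch using only the two rounded coefficients reported above (23 and 0.53); the names INTERCEPT, SLOPE, and predicted_perception are illustrative assumptions. It also solves for the group size where the fitted line crosses the perfect-accuracy line, which is where the regression predicts no over- or under-estimation.

```python
INTERCEPT = 23.0  # constant term reported in the post (rounded)
SLOPE = 0.53      # slope reported in the post (rounded)


def predicted_perception(real_pct):
    """Expected perceived group size (%) for a given real group size (%)."""
    return INTERCEPT + SLOPE * real_pct


for real_pct in (0, 50, 100):
    print(real_pct, round(predicted_perception(real_pct), 1))

# Where predicted == real: real = INTERCEPT + SLOPE * real
crossover = INTERCEPT / (1 - SLOPE)
print(round(crossover, 1))
```

The crossover lands just under 49%, which fits the earlier observation that people are most accurate for groups that are about half the population.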

Let’s look at groups with the smallest residuals, to get a sense of which group sizes are most “accurately” predicted.

“real” is the actual group size. “estimate” is the perceived group size. “predict” is the predicted estimated group size based on our simple regression. “res” is the residual, or estimate minus predicted, and “absres” is the absolute value of the residual (I want to see which groups most dramatically differ from the regression line, above or below it).

We see that folks do a really good job predicting certain groups that are about half the population: folks who are Republican and folks who are married. Each ~ half the population. Nailed it. We also see that, when accounting for the fact that folks over/under estimate small/big groups, certain small groups (Gay/Lesbian, Vegans/Vegetarians, Jewish individuals) are pretty accurately estimated. People have a good sense that the very large majority of the US eats meat, is heterosexual, and is not Jewish.

Let’s look at the biggest discrepancies.

This gets a little funky, because a discrepancy can mean that respondents were either weirdly inaccurate or weirdly accurate.

Here’s one where folks are weirdly accurate: transgender individuals. This group is small, much like Jewish or vegetarian Americans. Yet folks were much more accurate in identifying the small group size. When you expect a small group to be identified as 23%, saying it’s 12% is actually very precise. Same with households earning over $1 million. You could gawk at the raw 10% and think, “Stupid Americans and their dreams and bootstraps! No wonder they’re all so bad!” But actually, folks do a great job realizing that very few households have millionaire status. We expect folks to say 23%. So if anything, perhaps folks underestimate the high income earning space that exists in the US labor market…or perhaps income segregation really warps folks’ sense of the economic scene out there.

Gun ownership is interesting. Apparently 32% of Americans own guns, but whereas we’d expect folks to estimate this as 40%, folks actually estimate it as 52%. Either there are a lot of folks who lie to survey administrators about their gun ownership and people are correct, or else the high resonance of gun rights / gun violence in the news makes people think there are way more gun owners in the US.

Here are two that are fun to see together: smartphone ownership and book learnin’. Folks were really, really accurate in predicting smartphone ownership, saying 80% when we’d expect them to discount to 68%. So people understand the ubiquity of smartphones. Not surprising, since you can see smartphone usage in your own network and out in the world. People thought only half of Americans read a book last year, when in fact 77% did. Given that real size, we’d expect an estimate of about 64%, so this is an underestimate of 14 percentage points relative to the regression line. Hey, I’m a professor, so I know I worry about the general appetite for reading in the population. But this suggests that people do in fact read, and yet the perception of ignorant and unread America has a real distortionary effect. Maybe people aren’t reading the books you want them to read, but there’s more reading than people generally suspect.

Three groups that are very overrepresented in people’s minds are Black, obese, and military veteran individuals. The true group sizes are 12%, 42%, and 6%, respectively. We’d expect people to say these group sizes are 29%, 45%, and 26%, but people actually say 40%, 56%, and 37%. This could be due to any number of mechanisms, more or less positive, more or less rooted in status threat or media representation. I must admit that I’m surprised at how few people are veterans. I’d have probably said 15-20%. I also wonder how much people substitute “Black” for the broader category “racial minority.”
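The residuals for the groups discussed above can be roughly recomputed from the figures quoted in the text. This is a sketch under a couple of assumptions: it uses the rounded coefficients (23 and 0.53) rather than the full fitted model, it takes the residual as the perceived estimate minus the regression prediction (the convention that matches the numbers quoted here, e.g. 52 vs. 40 for gun ownership), and the names quoted and residual_row are illustrative. Only groups whose real and perceived sizes are both stated in the text are included.

```python
INTERCEPT = 23.0  # rounded constant term from the post
SLOPE = 0.53      # rounded slope from the post

# group: (real %, median estimate %), as quoted in the text
quoted = {
    "own a gun": (32, 52),
    "read a book last year": (77, 50),
    "Black": (12, 40),
    "obese": (42, 56),
    "military veteran": (6, 37),
}


def residual_row(real, estimate):
    """Return (predict, res, absres) for one group."""
    predict = INTERCEPT + SLOPE * real
    res = estimate - predict  # positive = overestimated relative to the line
    return predict, res, abs(res)


rows = {group: residual_row(r, e) for group, (r, e) in quoted.items()}

# Largest discrepancies first, whether above or below the regression line.
for group, (predict, res, absres) in sorted(
    rows.items(), key=lambda kv: kv[1][2], reverse=True
):
    print(f"{group:<22} predict={predict:5.1f} res={res:+6.1f} absres={absres:5.1f}")
```

Because the coefficients are rounded, the recomputed values can differ from the original table by a few tenths of a point.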

So when I look at these results, here are some of my big takeaways (some talked about above, some not):

People are pretty good, sometimes weirdly good, at identifying group sizes of sexual minorities.
People way overestimate the proportion of the country that is obese, that is Black, that is connected to the military, that owns guns.
People are actually generally pretty good at understanding the shape of the distribution of incomes (weirdly accurate at predicting the % of households over $25k and the small size of the $1M+ group). But I think folks tend to be a little bit overly pessimistic about opportunities for reasonably high standards of living (see the pessimism about home ownership; people also tended to really underestimate the % of households with $100k or more) and a little too optimistic about the extent of economic precarity (folks overestimated how many people flew on airplanes, how many people are labor union members, etc).