The Power Of Survey Taxonomies To Skew The Results The Way You Want Them

I went to the doctor’s office this week and they asked if I would participate in a short Federal survey. I said sure.

“What is your nationality?”
“American,” I said.
“That is not an option,” said the lady.
“What are the options?” I asked.
“Hispanic, Asian, Asian Black, African, Central American, Chicano, Cuban, Hispanic Latino, Mexican, Native American, Native Hawaiian, South American, Spanish, White, White Hispanic, Other, Unknown and Refused,” she said.
“I am Native American, White, Spanish and Mexican,” I said.
“You can only pick one,” she said.
“I am also Welsh, English, Scottish, German, Dutch, Irish and married to a Bohemian Mexican, Spanish French man. Put me down as Refused!”
She said “Most people are putting down Refused or Other.”

I figured that, since I was at the doctor’s office, the question had a medical purpose: many groups have known predispositions to certain diseases. Medical predispositions of some sort (whether susceptibility to certain diseases or response to certain drugs) might actually have been why they were asking; at least it’s quite plausible. Of course, there’s still a problem with the lack of granularity, whether they’re doing research or predicting risk.

One example is the ingestion of fava beans, which may be fatal for some people of Mediterranean descent. I’ve heard anecdotes about a U.S. Army cook serving up meals with fava beans, and the infirmary subsequently dealing with an influx of very sick people.

I don’t have a reference for the latest version of the OMB Directive (it is still the 1997 one), but I came across the FDA’s “Guidance for Industry: Collection of Race and Ethnicity Data in Clinical Trials”, which says, in part:

“Differences in response to medical products have already been observed in racially and ethnically distinct subgroups of the U.S. population. …For example, in the United States, Whites are more likely than persons of Asian and African heritage to have abnormally low levels of an important enzyme (CYP2D6) that metabolizes drugs belonging to a variety of therapeutic areas, such as antidepressants, antipsychotics, and beta blockers (XIE 2001). Other studies have shown that Blacks respond poorly to several classes of antihypertensive agents (beta blockers and angiotensin converting enzyme (ACE inhibitors) (Exner 2001 and Yancy 2001). …Clinical trials have demonstrated lower responses to interferon-alpha used in the treatment of hepatitis C among Blacks when compared with other racial subgroups.”

Ashkenazi Jews are known to be especially vulnerable to certain diseases, e.g., breast cancer. And from a journal of the American Association for Cancer Research: “62% of the Taiwanese colorectal tumor specimens analyzed exhibited Eps8 over expression.”

Those would be excellent reasons to do this survey. Nope! This classification does not serve them. The groups are incredibly unbalanced. Chinese, Korean, Indian, Malay, and so on are all placed in the single class “Asian”: half the world’s population under one classification. Then there is “African.” Africa is a huge continent with many phenotypes, and all are grouped into a single lump. “White” does not distinguish German, Scandinavian, English, or French, and most Spanish and Portuguese are Caucasian in origin as well.

More background. Many have tried to classify mankind. Bodin’s color classifications in the mid 1500’s were descriptive using neutral terms based on skin color such as “duskish colour, like roasted quinze, black, chestnut, and farish white.”

By the 1600’s Bernier settled on four subgroups based on the four quarters of the globe and used Europeans, Far Easterners, Negroes (blacks), and Lapps.

In the 1800’s Louis Agassiz made a case for a genre of scientific racism based on creationism and gained a wide following. We have Arthur de Gobineau to thank for the theory of the superior races and the Aryan race. He saw the intermingling of races, like French marrying Germans, as a degenerative process. Thomas Huxley and Charles Darwin were believers in monogenism (all humans descended from one evolutionary process). Huxley separated mankind into 9 types, four of them on the African continent, and three types of Mongoloid. Darwin argued that they were all one species, and in the Descent of Man, chapter VII, he points out how widely authorities disagreed on whether all “should be classed as a single species or race, or as two (Virey), as three (Jacquinot), as four (Kant), five (Blumenbach), six (Buffon), seven (Hunter), eight (Agassiz), eleven (Pickering), fifteen (Bory St. Vincent), sixteen (Desmoulins), twenty-two (Morton), sixty (Crawfurd), or as sixty-three, according to Burke. This diversity of judgment does not prove that the races ought not to be ranked as species, but it shews that they graduate into each other, and that it is hardly possible to discover clear distinctive characters between them.”

In the later 19th and 20th centuries there were a lot of mental excursions into classifications based on intelligence, skull shape, etc. By the 1930’s people had stopped trying to do these types of classifications, and the rise of the Nazis underscored how damaging such classifications can be, leading to ethnic cleansing by a supposedly superior group.

In 1954 UNESCO condemned all approaches to classification by race saying that we should not make examples of the Caucasian, Negroid and Mongoloid races but rather talk about ethnic groups which share common cultural ties.

So what is the government doing? Recent news articles have heralded a 40% level of Hispanics in the US. Is that true? Do I have to be only one classification? How reliable are surveys where, of the 18 classifications available, 8 could be roughly grouped as Hispanic (what happened to Iberian?)? Aren’t the Spanish a combination of Moors and Celts? Why do we try to do this?

An interesting way to trace our thinking is to follow the US Census categories. In the 1790 Census the count was made on White Males, White Females, other free persons, and Slaves (all types). In 1940 Mexican was counted as white. In 2010 the census allowed for an entire question on Hispanic origin, including Argentinean, Salvadoran, etc., and an additional 15 categories for Race. Wikipedia itself has 35 entries for race and ethnicity. Seven of those are Hispanic, with an additional one for non-Hispanic whites.

The American Anthropological Association made recommendations for the classifications for 2010 but they were not accepted by the Census Bureau. There is still no “American” option for those of us who do not fit into one or even two classifications. Let’s see: 8 out of 18 classifications is … 44.4 percent. The news says that the Hispanics are 40% of the population. I wonder what the Irish are. If we had a classification for Central Europeans, would they be a bigger part of the population?
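The division is easy to check. The short Python sketch below uses only the two counts given in the text (18 answer options, 8 of them roughly Hispanic); the function name is my own, purely for illustration. Note that the figure describes the option list itself.

```python
# Counts taken from the text: 18 answer options, 8 of them roughly Hispanic.
TOTAL_OPTIONS = 18
HISPANIC_OPTIONS = 8


def option_share(group_options: int, total_options: int) -> float:
    """Return the percentage of answer options that belong to one broad group."""
    if total_options <= 0:
        raise ValueError("total_options must be positive")
    return 100 * group_options / total_options


share = option_share(HISPANIC_OPTIONS, TOTAL_OPTIONS)
print(f"{share:.1f} percent of the answer options")
```

Rounded to one decimal place, 8 out of 18 comes to 44.4 percent.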

This shows the power of the classification system in surveys. If you want to get a certain answer then you make that percentage of questions or answer options the percentage you hope for. How many Chinese Americans? They are under Asian. How many people from India? Look under Asian. Japanese, Filipino, Thai, Vietnamese, guess where to look. All are classed together. Want to know how many Arabs? Tough!

What if we were to let people put in their own classification? What would the answer be? The 1980 and 1990 censuses came close to that option, but they did not allow multiple entries. You could be either Black or White. If you said White/Black you were classed as White; if you put Black/White you were coded as Black. I do understand that the big mainframe computers of that age had fixed-length fields and coded options with limited sort options. But those days are long past. Now we handle variable-length fields of text and multiple subfields, and we can sort and aggregate information in many ways.
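The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the coding rule described above, not the Census Bureau’s actual procedure; the function names and the “/” separator are assumptions of mine.

```python
def code_first_listed(answer: str) -> str:
    """Old fixed-field style: keep only the first term of an answer like 'White/Black'."""
    return answer.split("/")[0].strip()


def code_all_listed(answer: str) -> list[str]:
    """Multi-valued style: keep every term the respondent wrote, in order."""
    return [part.strip() for part in answer.split("/") if part.strip()]


for answer in ("White/Black", "Black/White"):
    print(answer, "->", code_first_listed(answer), "|", code_all_listed(answer))
```

Under the first rule, the order in which a person happens to write the terms decides the category. Under the second, both terms survive and can be counted or combined later.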

What would I do?

Work from the data. People got really annoyed with the census. Some refused to answer at all. The options were not ones they felt comfortable with. I would let people put in their own assessment. That would give a realistic assessment of what people prefer to think of themselves as. What if we decide to collect the information to see how diverse we really are? Actually we do not have this data, but we should try to collect it. It is not too early to decide on what to collect for the 2020 Census. Perhaps by then we can see …20/20.
Ensure that the balance of the surveys is truly reflective of the data group. Do not bias it by the questions asked. If 44.4% of the answer options point to a single grouping, is that really a fair survey? I would not allow surveys that try to cram everyone into a single class (multiple broader terms should be allowed). I would allow as many listings (race or ethnicity) as people want to take the time to put in. We are a melting pot of a country. We used to be proud of it. Now we try to segment and separate, which drives wedges and divides.
Provide associations. If you let people do their own classification, allowing free associations, then the results would provide linkages the creator of a survey instrument could not foresee. The related terms in a thesaurus or the links in the semantic web are a bonus in richness of expression.
Make a hierarchy. Are all those classifications equal, or are some subdivisions of others? Could someone choose a higher level because they are Cuban and Latino? Some want to be grouped as Hispanic? It does not have to be a single flat list. Let people decide how discretely they want to be classified. That would tell us a lot about the nation. This step takes a lot of care; it is where an unscrupulous or careless group would have the power to really slant the survey by the way it organizes the hierarchy.
Does it matter what the group calls itself? There are shorthand ways of describing every ethnic group and race. Can we allow them to use those names and translate them into officialdom? I think that would make the results a better source of information about the groups themselves. Someone could decide on the preferred term usage, but not at the data collection level. That would interfere with the real data collection.
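
The steps above can be sketched as a tiny controlled vocabulary in Python. Everything in it is hypothetical: the three lookup tables hold a handful of made-up entries purely to show equivalence, hierarchy, and association at work, and a real vocabulary would be built from what respondents actually write.

```python
from collections import Counter

# Illustrative entries only (assumptions, not an official scheme).
EQUIVALENT = {"latino": "Hispanic", "hispanic latino": "Hispanic"}  # variant -> preferred term
BROADER = {"Cuban": "Hispanic", "Mexican": "Hispanic", "Chicano": "Hispanic"}  # term -> broader term
RELATED = {"Spanish": {"Hispanic"}}  # term -> associated terms


def preferred(term: str) -> str:
    """Map a respondent's own wording to the preferred term, after collection."""
    cleaned = " ".join(term.split())
    return EQUIVALENT.get(cleaned.lower(), cleaned)


def with_broader(term: str) -> list[str]:
    """Return the term followed by its broader terms (assumes no cycles in BROADER)."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def related(term: str) -> list[str]:
    """Return the terms associated with a term, in alphabetical order."""
    return sorted(RELATED.get(term, set()))


def tally(responses: list[list[str]], roll_up: bool = False) -> Counter:
    """Count each respondent once per term; optionally also count broader terms."""
    counts: Counter = Counter()
    for answers in responses:
        terms = set()
        for raw in answers:
            term = preferred(raw)
            terms.update(with_broader(term) if roll_up else [term])
        counts.update(terms)
    return counts


# Each respondent may list as many terms as they like.
responses = [["Cuban", "latino"], ["Mexican", "Spanish"], ["Welsh"]]
print(tally(responses))
print(tally(responses, roll_up=True))
```

The raw wording is kept at collection time and only mapped to preferred terms afterward, and the same answers can be reported flat or rolled up to a broader level. The roll-up table is also exactly where a careless or unscrupulous designer could slant the totals.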

Summary

If the census and other surveys were built on controlled vocabulary principles, then there would be Associative, Equivalence and Hierarchical options. Working from the data instead of imposing a preferred order on the subjects would give a significantly enhanced data set. In this digital age, we should be able to do much better. We are no longer bound by old-style mainframe computing or tallying all results by hand. Let’s catch the census and other surveys up to current standard practice in information management.

Marjorie M.K. Hlava
President, Access Innovations
