Monday, July 13, 2015

Making Data Make Sense: Missing Data


In almost any research study, there will be missing or incomplete data. Data can go missing for a number of reasons: participants fail to respond to questions, subjects withdraw from (or quit) studies before they are completed, or errors are made during data entry.

The problem with missing data is that nearly all statistical techniques assume or require complete data. Not all missing data is an error, however; some data are legitimately missing. An example might be a survey question that asks whether the respondent is married and, if so, for how long. If you are not married, then you would be correct in leaving the "how long" portion of the question blank.

It is also important to realize that legitimately missing data can be meaningful. The missing data allows a validity check and may inform the status of an individual. Osborne (2013) provides a great example. In cleaning the data from an adolescent health risk survey, he noticed that some individuals indicated on one question that they had never used illegal drugs, but later in the survey, when asked how many times they had used marijuana, gave an answer greater than 0. In other words, a question they should have skipped (and that should therefore have been missing) showed an unexpected number. The author suggests several possible explanations, such as the subject not paying attention and answering in error. However, a more intriguing possibility is that some subjects did not view marijuana as an illegal drug, which could be examined in future research.
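A validity check like this one can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the records, the field names (ever_used_drugs, marijuana_times), and the find_inconsistent helper are all hypothetical and are not taken from the survey described above. It simply flags respondents who said they had never used illegal drugs but gave a marijuana count greater than 0.

```python
# Hypothetical survey records; None marks a skipped (missing) answer.
# Field names and values are made up for illustration.
responses = [
    {"id": 1, "ever_used_drugs": False, "marijuana_times": None},  # legitimately skipped
    {"id": 2, "ever_used_drugs": True, "marijuana_times": 3},      # consistent answer
    {"id": 3, "ever_used_drugs": False, "marijuana_times": 2},     # should have been skipped
    {"id": 4, "ever_used_drugs": True, "marijuana_times": None},   # missing, but not legitimately
]


def find_inconsistent(records):
    """Return ids of respondents who denied drug use but reported marijuana use > 0."""
    flagged = []
    for record in records:
        times = record["marijuana_times"]
        if record["ever_used_drugs"] is False and times is not None and times > 0:
            flagged.append(record["id"])
    return flagged


inconsistent_ids = find_inconsistent(responses)
print(inconsistent_ids)
```

The check only identifies the cases. Deciding whether a flagged case reflects inattention or a different understanding of the question is still up to the researcher.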

One way of dealing with legitimately missing data is to treat the cases with missing data and the cases with present data as two separate groups. Using the marriage survey example, we could exclude unmarried individuals from any analysis of how long people have been married, and examine marital status itself as a separate question. So instead of asking the silly research question, "How long, on average, do all people, even unmarried people, stay married?", we can ask two more refined questions: "What are the predictors of whether someone is currently married?" and "Of those who are currently married, how long, on average, have they been married?"
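As a rough sketch of this two-group approach, the Python below separates a small set of made-up records by marital status before computing an average. The data, field names (married, years_married), and variable names are assumptions for illustration only. Note that it computes only the share of respondents who are married, which would be the outcome for the first question; it does not model any predictors.

```python
from statistics import mean

# Hypothetical records; years_married is None when the question was left blank.
people = [
    {"id": 1, "married": True, "years_married": 10},
    {"id": 2, "married": False, "years_married": None},  # legitimately missing
    {"id": 3, "married": True, "years_married": 4},
    {"id": 4, "married": False, "years_married": None},  # legitimately missing
    {"id": 5, "married": True, "years_married": None},   # married but did not answer
]

# Question 1 uses everyone: who is currently married?
married = [p for p in people if p["married"]]
proportion_married = len(married) / len(people)

# Question 2 uses only married respondents who answered the follow-up.
durations = [p["years_married"] for p in married if p["years_married"] is not None]
mean_years = mean(durations)

# Keep the two kinds of blanks apart: skipped correctly vs. actually missing.
legitimately_missing = sum(
    1 for p in people if not p["married"] and p["years_married"] is None
)
truly_missing = sum(1 for p in married if p["years_married"] is None)

print(proportion_married, mean_years, legitimately_missing, truly_missing)
```

Counting the two kinds of blanks separately keeps the unmarried respondents from being treated as a missing data problem, while the married respondent who skipped the follow-up still shows up as one.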

Next time we will consider categories of missing data. Do you have an issue or a question that you would like me to discuss in a future post? Would you like to be a guest writer? Send me your ideas! leann.stadtlander@waldenu.edu

Osborne, J. W. (2013). Best practices in data cleaning. Washington, DC: Sage.
