Wednesday, July 15, 2015

Categories of Missing Data

There are two categories of missing data: data that are missing at random (MAR) and data that are missing not at random (MNAR). If data are missing at random, we can assume that they will not bias the results. However, data missing not at random may be a strong biasing influence.

Let's use an example from Osborn (2013) of an employee satisfaction survey given to school teachers. The teachers are surveyed twice: once in September and once in June. Data missing at random would mean that the data missing in June had no relationship to any variable from the September survey (such as satisfaction in September, age, or years of teaching). An example might be if we randomly selected 50% of the people who responded in September to complete the survey again in June: we would legitimately be missing half of the data in June (the 50% of people we did not ask). The missing data would be random and not related to a specific variable (such as satisfaction, age, or years of teaching).

On the other hand, suppose only teachers who were satisfied responded to the survey in June (people who were dissatisfied were less likely to respond). Then the missing data are considered missing not at random (MNAR) and may substantially bias the results: the June survey would show a higher satisfaction score than the true value, because dissatisfied people did not participate.
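
To make the contrast concrete, here is a small simulation sketch in Python. Everything in it is made up for illustration and is not from Osborn (2013): the 1-5 satisfaction scale, the number of teachers, the response probabilities, and the function name simulate_june_survey are all assumptions. It compares the average June satisfaction for all teachers, for a randomly chosen half, and for a group where satisfied teachers are more likely to respond.

```python
import random
import statistics


def simulate_june_survey(n_teachers=10_000, seed=0):
    """Compare mean June satisfaction under random vs. non-random missingness."""
    rng = random.Random(seed)

    # Hypothetical June satisfaction scores on a 1-5 scale (assumed, not real data).
    june = [rng.randint(1, 5) for _ in range(n_teachers)]

    # Random case: we pick half of the teachers at random to survey again.
    random_half = rng.sample(june, n_teachers // 2)

    # MNAR case: the chance of responding rises with satisfaction.
    # These response probabilities are assumptions chosen for illustration.
    respond_prob = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.9}
    self_selected = [s for s in june if rng.random() < respond_prob[s]]

    return {
        "all": statistics.mean(june),
        "random_half": statistics.mean(random_half),
        "self_selected": statistics.mean(self_selected),
    }


means = simulate_june_survey()
for label, value in means.items():
    print(f"{label}: {value:.2f}")
```

Under these assumptions, the random half should land very close to the average for all teachers, while the self-selected group should come out noticeably higher, which is the bias described above.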

Next time we will consider how to deal with missing data. Do you have an issue or a question that you would like me to discuss in a future post? Would you like to be a guest writer? Send me your ideas!

Osborn, J. W. (2013). Best practices in data cleaning. DC: Sage. 
