Friday, April 7, 2017

Categories of Missing Data

There are two categories of missing data: data missing at random and data missing not at random. If data are missing at random, we can assume that they will not bias the results. However, data missing not at random may be a strong biasing influence.

Let us use an example from Osborn (2013) of an employee satisfaction survey given to schoolteachers. The teachers are surveyed twice, once in September and once in June. Data missing at random would mean that the data missing in June had no relationship to any variable from the September survey (such as satisfaction in September, age, and years of teaching). For example, if we randomly selected 50% of the people who responded in September to complete the survey again in June, we would legitimately be missing half of the data in June (the 50% of people we did not ask). The missing data would be random and not related to a specific variable such as satisfaction, age, or years of teaching.

On the other hand, suppose that mostly satisfied teachers responded to the survey in June (i.e., people who were dissatisfied were less likely to respond). Then the missing data are considered missing not at random and may substantially bias the results. The June survey would show a higher satisfaction score than the true average for all teachers, because dissatisfied people are underrepresented among the responses.
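To make the contrast concrete, here is a minimal simulation sketch of the two situations described above. Everything in it is an assumption for illustration and does not come from Osborn's example: the 1-to-5 satisfaction scale, the number of teachers, the specific response probabilities, and the function name simulate_june_survey are all hypothetical.

```python
import random
import statistics


def simulate_june_survey(n_teachers=10_000, seed=0):
    """Compare mean satisfaction under random vs. non-random missingness.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical "true" June satisfaction for every teacher (1 = low, 5 = high).
    june = [rng.randint(1, 5) for _ in range(n_teachers)]
    true_mean = statistics.mean(june)

    # Missing at random: each teacher is asked again with probability 0.5,
    # regardless of how satisfied they are.
    random_responses = [s for s in june if rng.random() < 0.5]

    # Missing not at random: the chance of responding rises with satisfaction
    # (assumed probabilities), so dissatisfied teachers are underrepresented.
    response_prob = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.9}
    nonrandom_responses = [s for s in june if rng.random() < response_prob[s]]

    return (
        true_mean,
        statistics.mean(random_responses),
        statistics.mean(nonrandom_responses),
    )


if __name__ == "__main__":
    true_mean, random_mean, nonrandom_mean = simulate_june_survey()
    print(f"All teachers:            {true_mean:.2f}")
    print(f"Missing at random:       {random_mean:.2f}")
    print(f"Missing not at random:   {nonrandom_mean:.2f}")
```

Under these assumptions, the randomly reduced sample should give a mean close to the mean for all teachers, while the sample that depends on satisfaction should give a noticeably higher mean, even though both samples keep roughly half of the teachers.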

Next time we will consider how to deal with your missing data. Do you have an issue or a question that you would like me to discuss in a future post? Would you like to be a guest writer? Send me your ideas! leann.stadtlander@waldenu.edu
