There are two categories of missing
data: data missing at random and data missing not at random. If data are
missing at random, we can assume that they will not bias the results.
However, data missing not at random may be a strong biasing influence.
Let us use an example from Osborn
(2013), of an employee satisfaction survey given to schoolteachers. The
teachers are surveyed twice, once in September and once in June. Data missing
at random would mean that data missing in June had no relationship to any variable from
the September survey (such as satisfaction in Sept., age, and years of
teaching). For example, if we randomly selected 50% of the people who
responded in September to complete the survey again in June, we would
legitimately be missing half of the data in June (the 50% of people we did not
ask). The missing data would be random and not related to a specific variable
such as satisfaction, age, or years of teaching.
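To make this concrete, here is a small simulation sketch in Python. It is only an illustration: the satisfaction scores are made up (a 1 to 10 scale for 10,000 hypothetical teachers), and the variable names are my own. It draws a random half of the teachers, as in the example above, and compares their average score to the average for everyone.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result each run

# Made-up June satisfaction scores (1-10) for 10,000 hypothetical teachers.
june_scores = [random.randint(1, 10) for _ in range(10_000)]

# Randomly select 50% of the teachers to complete the June survey.
observed = random.sample(june_scores, k=len(june_scores) // 2)

full_mean = statistics.mean(june_scores)
observed_mean = statistics.mean(observed)

print(f"All teachers:        {full_mean:.2f}")
print(f"Randomly chosen 50%: {observed_mean:.2f}")
```

Because the half we did not ask was chosen by chance, the two averages should land very close to each other, even though half of the June data are missing.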
On the other hand, suppose mostly
teachers who were satisfied responded to the survey in June (i.e., people who
were dissatisfied were less likely to respond to the survey). Then the missing
data are considered missing not at random and may substantially bias the
results. Thus, the June survey would show a higher satisfaction score than the
true average for all teachers (because dissatisfied people did not participate).
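A short simulation sketch in Python can show this kind of bias. Everything here is assumed for illustration: the scores are made up (1 to 10 for 10,000 hypothetical teachers), and the response rule, where a teacher's chance of responding equals their score divided by 10, is simply one convenient way to make dissatisfied teachers less likely to respond.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Made-up June satisfaction scores (1-10) for 10,000 hypothetical teachers.
june_scores = [random.randint(1, 10) for _ in range(10_000)]

# Assumed response rule: the chance of responding rises with satisfaction,
# so a teacher scoring 2 responds 20% of the time and one scoring 9, 90%.
responders = [score for score in june_scores if random.random() < score / 10]

true_mean = statistics.mean(june_scores)
responder_mean = statistics.mean(responders)

print(f"All teachers:    {true_mean:.2f}")
print(f"Responders only: {responder_mean:.2f}")
```

Under this assumed rule, the average among responders should come out clearly higher than the average for all teachers, which is the overstated satisfaction score described above.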
Next time we will consider how to
deal with your missing data. Do you have an issue or a question that you would
like me to discuss in a future post? Would you like to be a guest writer? Send
me your ideas! leann.stadtlander@waldenu.edu