There are two categories of missing data: data that are
missing at random (MAR) and data that are missing not at random (MNAR). If data
are missing randomly, we can assume that they will not bias the results.
However, data missing not at random may be a strong biasing influence.
Let's use an example from Osborne (2013) of an employee
satisfaction survey given to school teachers. The teachers are surveyed twice, once
in September and once in June. Missing at random data would mean that data that
were missing in June had no relationship to any variable from the September
survey (such as satisfaction in September, age, or years of teaching). An example
might be if we randomly selected 50% of the people who responded in September
to complete the survey again in June. We would legitimately be missing half of
the data in June (the 50% of people we did not ask). The missing data would be
random and not related to a specific variable such as satisfaction, age, or years
of teaching.
On the other hand, suppose only teachers who were satisfied
responded to the survey in June (people who were dissatisfied were less likely
to respond). Then the missing data are considered missing not at
random (MNAR) and may substantially bias the results. The June survey
would show a higher satisfaction score than the teachers as a whole actually
hold (because dissatisfied people did not participate).
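To make the contrast concrete, here is a small simulation sketch in Python. Everything in it is made up for illustration and is not from Osborne's book: the 1-5 satisfaction scale, the 10,000 teachers, the function name simulate_june_survey, and the response probabilities are all my own assumptions. It compares the average June satisfaction when we re-survey a random half of the teachers with the average when dissatisfied teachers are less likely to respond.

```python
import random
import statistics

def simulate_june_survey(n_teachers=10_000, seed=42):
    """Compare random vs. satisfaction-dependent nonresponse (hypothetical data)."""
    rng = random.Random(seed)

    # Hypothetical June satisfaction scores on an assumed 1-5 scale.
    june = [rng.randint(1, 5) for _ in range(n_teachers)]

    # Missing at random (as in the example above): re-survey a random 50%.
    random_half = rng.sample(june, n_teachers // 2)

    # Missing not at random: the chance of responding rises with satisfaction.
    # These response probabilities are assumptions chosen for illustration.
    respond_prob = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.9}
    mnar = [score for score in june if rng.random() < respond_prob[score]]

    return {
        "true_mean": statistics.mean(june),
        "random_half_mean": statistics.mean(random_half),
        "mnar_mean": statistics.mean(mnar),
    }

results = simulate_june_survey()
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the random half should give an average very close to the average for all teachers, while the average among the teachers who chose to respond should come out clearly higher, because the low scores are the ones most likely to be missing.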
Next time we will consider how to deal with the missing
data. Do you have an issue or a question that you would like me to discuss in a
future post? Would you like to be a guest writer? Send me your ideas!
leann.stadtlander@waldenu.edu
Osborne, J. W. (2013). Best practices in data cleaning. DC: Sage.