There are two categories of missing data: data that are missing at random (MAR)
and data that are missing not at random (MNAR). If data are missing at random,
we can assume that they will not bias the results. However, data missing not at
random may be a strong biasing influence.
Let's use an example from Osborn (2013): an employee satisfaction survey given
to school teachers. The teachers are surveyed twice, once in September and once
in June. Missing at random would mean that whether a teacher's June data are
missing has no relationship to any variable from the September survey (such as
satisfaction in September, age, or years of teaching). For example, suppose we
randomly selected 50% of the people who responded in September to complete the
survey again in June. We would legitimately be missing half of the data in June
(the 50% of people we did not ask), but those missing data would be random and
not related to a specific variable such as satisfaction, age, or years of
teaching.
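To see why random missingness leaves the results essentially unbiased, here is a minimal simulation sketch. The satisfaction scores, the 1 to 10 scale, and the sample size of 1,000 teachers are all hypothetical assumptions made for illustration; they are not data from Osborn (2013). Half of the teachers are picked at random to be asked in June, so whether a score is missing has nothing to do with the score itself.

```python
import random
import statistics

# Hypothetical June satisfaction scores (1-10 scale) for 1,000 teachers.
# Simulated values for illustration only, not real survey data.
rng = random.Random(42)
june_scores = [min(10.0, max(1.0, rng.gauss(6.0, 2.0))) for _ in range(1000)]

# Randomly choose 50% of teachers to ask in June. Missingness is decided by
# chance alone, so it is unrelated to any teacher's satisfaction.
asked = rng.sample(range(len(june_scores)), k=len(june_scores) // 2)
observed = [june_scores[i] for i in asked]

true_mean = statistics.mean(june_scores)
observed_mean = statistics.mean(observed)
print(f"mean of all teachers:  {true_mean:.2f}")
print(f"mean of random 50%:    {observed_mean:.2f}")
print(f"difference:            {observed_mean - true_mean:+.2f}")
```

With randomly missing data, the observed mean should differ from the full-sample mean only by ordinary sampling noise (typically well under a tenth of a point here), even though half of the June responses are absent.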
On the other hand, suppose only teachers who were satisfied responded to the
survey in June (people who were dissatisfied were less likely to respond). Then
the missing data are considered missing not at random (MNAR) and may
substantially bias the results: the June survey would show a higher than
expected satisfaction score because dissatisfied teachers did not participate.
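A minimal sketch of the MNAR case, again using simulated, hypothetical scores rather than real survey data. The response model is an assumption chosen for illustration: a teacher's chance of answering in June rises linearly with satisfaction, from 10% at a score of 1 to 90% at a score of 10. Because the chance of being missing depends on the very value being measured, the responders over-represent satisfied teachers.

```python
import random
import statistics

# Hypothetical June satisfaction scores (1-10 scale) for 1,000 teachers.
# Simulated values for illustration only, not real survey data.
rng = random.Random(7)
june_scores = [min(10.0, max(1.0, rng.gauss(6.0, 2.0))) for _ in range(1000)]


def responds(score: float) -> bool:
    """Assumed response model: probability of answering grows with
    satisfaction, from 0.1 at score 1 to 0.9 at score 10."""
    p = 0.1 + 0.8 * (score - 1.0) / 9.0
    return rng.random() < p


# Only teachers who choose to respond are observed (missing not at random).
observed = [s for s in june_scores if responds(s)]

true_mean = statistics.mean(june_scores)
observed_mean = statistics.mean(observed)
print(f"mean of all teachers:  {true_mean:.2f}")
print(f"mean of responders:    {observed_mean:.2f}")
print(f"bias:                  {observed_mean - true_mean:+.2f}")
```

Under these assumptions the responders' mean comes out noticeably above the true mean (on the order of half a point or more), which is the inflated June satisfaction described above. Collecting more responses would not remove this bias, because it comes from who responds rather than how many.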
Next time we will consider how to deal with the missing data. Do you have an
issue or a question that you would like me to discuss in a future post? Would
you like to be a guest writer? Send me your ideas!
leann.stadtlander@waldenu.edu
Osborn, J. W. (2013). Best practices in data cleaning. Thousand Oaks, CA: Sage.