Sample size risks in experiments with users
ABSTRACT:
One critical decision that researchers must make when designing
experiments with users is how many participants to study. In our field,
the determination of sample size is often based on heuristics and
limited by practical constraints such as time and finances. As a result,
many studies are underpowered and it is common to see researchers make
statements like "With more participants significance might have been
detected," but what does this mean? What does it mean for a study to be
underpowered? How does this affect what we are able to discover about
information search behavior, how we interpret study results, and how we
make choices about what to study next? How does one determine an
appropriate sample size? What does it even mean for a sample size to be
appropriate?
In this talk, I will discuss the use of statistical power analysis for
sample size estimation in experiments. Statistical power analysis does
not necessarily give researchers a magic number, but rather allows them
to understand the risks of Type I and Type II errors given an expected
effect size. In discussing this topic, the issues of effect size, Type I
and Type II errors, and experimental design, including the choice of
statistical procedures, will also be addressed. I hope this
talk will function as a conversation starter about issues related to
sample size in experimental interactive information retrieval.
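As a concrete illustration (my own sketch, not material from the talk), the kind of calculation involved can be run in Python with the statsmodels package. The numbers below are assumptions chosen for the example: a two-group, independent-samples t-test, a medium expected effect size (Cohen's d = 0.5), alpha = 0.05, and a target power of 0.80.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Per-group sample size needed to detect a medium effect (d = 0.5)
# with a 5% Type I error rate and 80% power (two-sided test).
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05,
                                   power=0.80, ratio=1.0,
                                   alternative='two-sided')
print(f"Participants needed per group: {n_per_group:.1f}")  # about 64

# Conversely: with a hypothetical 12 participants per group, what power
# does the study actually have to detect that same effect?
achieved_power = analysis.solve_power(effect_size=0.5, alpha=0.05,
                                      nobs1=12, ratio=1.0,
                                      alternative='two-sided')
print(f"Power with 12 per group: {achieved_power:.2f}")  # roughly 0.2

Read this way, power analysis is less a formula for a single "correct" number and more a way of making the risk trade-offs explicit: the second call shows how likely a small study is to miss a real, medium-sized effect.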