Talk about low standards
I came across a video titled “What We Actually Know About Software Development, and Why We Believe It’s True”:
Greg Wilson - What We Actually Know About Software Development, and Why We Believe It's True from CUSEC on Vimeo.
It seems interesting, although I only had time to watch a few bits of it. But here is what I found most peculiar about this talk.
First, the speaker laments the low standards of proof in the industry. People talk about the usefulness of domain-specific languages, functional programming, or agile development without citing any studies to back up these claims.
Then, as if to show how to do it right, he presents the results of a study on the effect of anchoring in software estimation. Here’s a screenshot (about 15m56s into the video):
When you see numbers such as 5.1 months or 7.8 months, what do you expect their standard error to be?
I couldn’t find the raw data from the study, but here’s the chart from the paper showing the data points summarized above:
Each interval corresponds to an individual (expert) participant. There are 23 of them, split among 3 conditions.
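To give a feel for the question about standard errors, here is a minimal sketch of how the standard error of a group mean is computed. The numbers are made up by me purely for illustration (the raw data is not available), and the helper name `mean_with_standard_error` is my own. The only point is that with a handful of participants per condition, the uncertainty in the mean can easily be close to a whole month, which makes a decimal place look rather optimistic.

```python
import math
import statistics


def mean_with_standard_error(estimates):
    """Return (mean, standard error of the mean) for a list of estimates."""
    n = len(estimates)
    mean = statistics.mean(estimates)
    # Sample standard deviation (divides by n - 1).
    sd = statistics.stdev(estimates)
    return mean, sd / math.sqrt(n)


# Hypothetical estimates in months for one condition.
# These are NOT the study's data; they are invented for illustration.
made_up_estimates = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

mean, se = mean_with_standard_error(made_up_estimates)
print(f"mean = {mean:.1f} months, standard error = {se:.1f} months")
```

With real data the spread could of course be smaller or larger; the sketch only shows what would have to be reported alongside a number like 5.1 months for it to mean much.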
To be clear, I am not questioning the conclusions of the study. The effects of anchoring are pretty uncontroversial, and being a software developer myself, I can easily imagine being anchored by someone else’s estimates or expectations.
I do have some objections to the study, though. The raw data is not available, the sample size is small (which makes it impossible to verify the assumptions), and I don’t see any mention of a correction for multiple hypothesis testing.
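For readers who haven't met the multiple hypotheses problem, here is a rough sketch of why it matters. I use the Bonferroni correction because it is the simplest one, not because the paper should necessarily have used it. The number of tests (4) is my assumption for illustration; I don't know how many hypotheses the study actually tested. The sketch also assumes the tests are independent.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive among independent tests,
    each run at significance level alpha, when all nulls are true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level that keeps the family-wise
    error rate at or below alpha (Bonferroni correction)."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


# Hypothetical setup: 4 tests at the usual 0.05 level.
alpha = 0.05
num_tests = 4  # assumed, not taken from the paper

uncorrected = familywise_error_rate(alpha, num_tests)
per_test = bonferroni_threshold(alpha, num_tests)
print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"Bonferroni per-test threshold: {per_test:.4f}")
```

In other words, the more comparisons you run without a correction, the more likely it is that some "significant" result is just noise.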
But most of all I am startled by the speaker, who collapses all the variation among the participants into a point estimate and quotes it to one decimal place. He then lists several factors (how much experience the participants have, what technique they use, etc.) and confidently concludes that “none of that has any statistical bearing on the result”. This is based on a study with 23 participants split across 3 conditions and (at least?) 4 groups.
So yeah, if these people teach us about data-based research, our standards are low indeed.