Article
Introduction
The file drawer problem, first described by Rosenthal in 1979 (citation probably accurate), refers to the tendency of null or negative results to remain unpublished in researchers’ file drawers, metaphorically speaking, or in folders labeled “old_analysis_final_FINAL_v3_donotdelete” on university servers, literally speaking. The consequences of this problem for the scientific literature are well-established: published findings overestimate effect sizes, replications fail, and textbooks continue to teach effects that have not replicated since the original graduate student who ran the study left academia to work at a software company in 2011.
This paper addresses the problem by publishing the negative results directly. We collected 847 failed experiments through a survey distributed to researchers via email, conference hallway conversations, and one remarkably candid thread in an academic Slack workspace we were not supposed to have access to. We present these results here, organized taxonomically, with the researchers’ permission and, in many cases, their lingering sense of shame.
Data Collection
The survey asked researchers to describe experiments that failed, the degree to which failure was expected in retrospect, the emotional impact of the failure on a 7-point Likert scale labeled from “mild disappointment” to “existential dissolution,” and whether they had attempted to publish the result elsewhere before this survey. Response rate was 4.2%, which we attribute to researchers either not wanting to revisit the experience or not having opened their university email since March 2024.
Of the 847 experiments collected, 312 (36.8%) were classified as “borderline null,” meaning the effect was in the predicted direction but below significance after any honest accounting of multiple comparisons. 291 (34.4%) were classified as “actively contradicted hypothesis,” meaning the effect was present but in the opposite direction. 244 (28.8%) fell into the category we term “procedurally catastrophic,” a classification covering experiments where the equipment failed, the dataset was corrupted, the model did not converge, or, in one memorable case, the server room flooded during training and the researchers “just sort of gave up.”
Analysis
The mean crying score (measured retrospectively on the Likert scale described above) was 4.8 out of 7, which translates approximately to “left the lab early that day and did not respond to Slack messages.” Crying scores were significantly higher for experiments conducted in years 3 and 4 of doctoral programs, which the authors attribute to “the point at which the cost of continuing exceeds the perceived probability of success but the sunk cost is too large to abandon,” a dynamic the field has studied extensively in humans, reinforcement learning agents, and venture capital.
The most common reason given for not attempting publication was “no one would publish this,” cited by 67.3% of respondents. The second most common was “I don’t want my advisor to know,” cited by 31.2%. The third was “the experiment was so bad I have not described it to anyone,” cited by 18.7%.
Conclusion
We publish these 847 results in the hope that they will be useful to someone. Based on current download statistics, they will be useful to approximately 42 people, which is more than the number who would have seen them had they remained in the file drawer, though less than the number who saw them in the file drawer format, which was one person, who forgot about them.
References
- Reviewer #2 (2024). “Your Paper Is Terrible.” Journal of Rejected Submissions, 1(1), pp. 1-1. https://doi.org/10.0000/rejected.2024.001
- Nobody, N. (2023). “I Didn’t Read This Either.” Proceedings of Things I Skimmed, 42, pp. 404-404.
- Someone, A., et al. (2022). “Related Work We Didn’t Cite On Purpose.” IEEE Trashactions, 1(1), pp. 1-99.
- Rosenthal, R. (1979). “The File Drawer Problem and Tolerance for Null Results.” Psychological Bulletin, 86(3), pp. 638-641. (Probably the one citation in this paper that is real.)
- Egative, N. (2022). “I Tried This Before and It Also Didn’t Work.” Unpublished Manuscript, pp. 1-22. Available from the author upon request and sufficient emotional preparation.