Introduction
Peer review is the mechanism by which the scientific community evaluates and validates new knowledge. It is also the mechanism by which three people who may or may not be domain experts, operating under time pressure, possibly in the middle of their own review season, determine whether a piece of work enters the permanent scientific record. These two descriptions are not in tension. They describe the same process.
This paper presents a ten-year empirical study of that process. We collected 10,000 paper submissions from nine venues, along with their complete review histories and eventual dispositions. For papers that were ultimately accepted elsewhere, we collected the subsequent citation record. For papers that were never accepted anywhere, we collected what data we could, which primarily consisted of the arXiv preprint and, in three cases, a note from the corresponding author explaining that they had left academia and did not wish to be contacted.
The core question motivating this study is simple: do papers get rejected because they are bad? The answer, we show, is no, or at least: not in a way that is detectable using any measure of quality we could construct.
Results
We coded the stated reasons for rejection into four primary archetypes.

The Methodological Dismissal (“the experiments are insufficient,” “the ablations are incomplete,” “the baseline comparisons are missing”) accounted for 38.4% of rejections. It showed no correlation with whether additional experiments were requested or performed: subsequent resubmissions were accepted at the same rate whether or not the experiments were added.

The Related Work Objection (“the paper ignores important prior work,” “the contribution is incremental given the existing literature”) accounted for 31.7% and was most common among reviewers whose own papers were among the missing citations.

The Scope Mismatch (“this paper would be better suited for a different venue”) accounted for 19.2% and was, we found, roughly equivalent to a coin flip: papers redirected to other venues were accepted there at approximately the same rate as they would have been at the original venue.

The Bad Day archetype, identified through a combination of tonal analysis and reviewer comment length (shorter reviews correlate significantly with negative mood in our annotation), accounted for 10.7% and was most prevalent in December and in the week after the NeurIPS deadline.
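The archetype shares reduce to a simple frequency tally over the coded labels. The sketch below illustrates the computation only; the counts are invented to reproduce the reported percentages, and the label strings are our own shorthand, not the study’s coding scheme.

```python
from collections import Counter

# Invented coded labels, one per rejection, chosen so the shares
# match the percentages reported in the text (1,000 rejections).
coded = (["methodological_dismissal"] * 384
         + ["related_work_objection"] * 317
         + ["scope_mismatch"] * 192
         + ["bad_day"] * 107)

counts = Counter(coded)
total = sum(counts.values())

# Share of each archetype as a percentage, rounded to one decimal.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
```

With these counts, `shares["methodological_dismissal"]` comes out to 38.4, matching the figure above.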
The correlation between initial rejection and eventual citation count was r = 0.03 (95% CI: [-0.07, 0.13]). We interpret this as “none.”
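Readers who wish to check whether their own rejection pile is equally uninformative can reproduce this kind of statistic with nothing beyond a standard library. The sketch below is an illustration, not our analysis pipeline: the function names are ours, and the Fisher z-transform is one standard way to get an approximate 95% confidence interval for a Pearson correlation (an interval of roughly ±0.10 around r ≈ 0.03 corresponds to a sample of a few hundred papers).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z-transform.

    z = atanh(r) is approximately normal with SE = 1/sqrt(n - 3);
    the interval is transformed back to the r scale with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For example, `fisher_ci(0.03, 400)` yields an interval close to the [-0.07, 0.13] reported above; the interval comfortably contains zero, which is the formal version of “none.”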
Conclusion
We recommend that researchers treat a rejection as information about the review process rather than information about their paper. We further recommend that program chairs consider whether any system producing these results is functioning as intended. We have submitted this recommendation to three venues. It has been rejected twice for insufficient baselines and once because it was outside scope.
References
- Reviewer #2 (2024). “Your Paper Is Terrible.” Journal of Rejected Submissions, 1(1), pp. 1-1. https://doi.org/10.0000/rejected.2024.001
- Nobody, N. (2023). “I Didn’t Read This Either.” Proceedings of Things I Skimmed, 42, pp. 404-404.
- Someone, A., et al. (2022). “Related Work We Didn’t Cite On Purpose.” IEEE Trashactions, 1(1), pp. 1-99.
- Ejected, R. (2020). “Early Evidence That This Was Going to Be Hard to Publish.” Proceedings of Preliminary Findings, 1, pp. 1-8.
- Emoralized, D., & Ejected, R. (2018). “A Pilot Study We Have Not Followed Up On.” Workshop on Incomplete Research Programs, pp. 14-19.