BLATANTLY MISLEADING PAPERS IN SOME OF OUR BEST MEDICAL JOURNALS ADVERSELY AFFECT INFERTILITY PRACTICE
By David H. Barad, MD, MS, Director of Clinical ART and Research at the Center for Human Reproduction in NYC, where he is also a Senior Scientist. He can be reached directly at dbarad@thechr.com.
Demonstrating the competence of his MS degree in medical statistics, the CHR’s David Barad, MD, MS, offers here an important commentary on how poorly peer-reviewed publications can negatively affect infertility practice, especially when they appear in more prestigious medical journals like the ASRM’s Fertility and Sterility (F&S), whose standing lends a paper credence. In his article he uses a recent paper published in F&S not only to take apart, elegantly, that paper’s study design, but also to point out the special responsibility that the editors and the (excessively large) editorial board of F&S, the primary publication of the ASRM, have not to contribute further to the many misdirections that already exist in routine fertility practice.
This publication is the edited version of an article that appeared in the September-October issue of the CHRVOICE.
People are increasingly asking, “what is it these days with Fertility and Sterility (F&S)?” Not only does this journal publish more commentary articles (i.e., “expert” opinions, the lowest level of evidence) than research articles, but many of its original research articles and commentaries are highly misleading, potentially inducing readers to pursue clinical practices that are often ridiculous.
In the literature review section pertaining to Reproductive Medicine of the July/August issue of the CHRVOICE, the CHR already addressed this issue in a very detailed commentary, in conjunction with the publication in F&S of a really bad paper by Chinese investigators that claimed utility for preimplantation genetic testing for aneuploidy (PGT-A) in older in vitro fertilization (IVF) patients with no more than three blastocysts for transfer (the paper indeed won one of the WORST PAPER AWARDS for that issue of the CHRVOICE).1
A main reason why we then criticized acceptance of this paper by Chinese investigators was that the concept of embryo selection by any method, including PGT-A, can only make even minimal sense in patients with large enough embryo numbers. After all, in the absence of large embryo numbers, especially in older patients, nobody needs any form of embryo selection with only three transferable embryos.
But F&S recently published another paper (at the time of this communication the paper is still “ahead of print” and only available electronically), once again by Chinese investigators (there really appears to be good reason why the Chinese government, according to recent reports, is clamping down on poor-quality submissions often associated with so-called paper mills), which appears to make as little sense in its initial thesis as the earlier PGT-A paper. This paper’s title is “GnRH antagonist protocol is associated with higher oocyte yield in young women at high risk for low oocyte retrieval: a retrospective study using three statistical methods.”2
It was accepted for publication even though the title alone already communicates an obviously counterintuitive message: it claims that a GnRH antagonist protocol increases oocyte yield in women at high risk for poor response, yet no suppressive treatment (and antagonist cycles are suppressive) ever increases oocyte yields. To expect such an improved response in known poor responders should raise suspicion in every credible reviewer, inviting serious scrutiny of the study’s definitions, methodology, and clinical implications. That, apparently, did not happen, neither from the reviewers nor from the editors.
At the heart of the article is an unconventional redefinition of “low response,” operationalized as retrieval of fewer than 10 oocytes. This threshold significantly exceeds those defined by international consensus. The Bologna and POSEIDON criteria, widely accepted in clinical and research contexts, typically define poor ovarian response as fewer than 4 or 5 oocytes retrieved. The authors’ rationale for this redefinition stems from one of their earlier publications that contained a nomogram predicting lower live birth probability when fewer than 10 oocytes were obtained. However, prognostic modeling does not justify reclassification of diagnostic categories. Using this elevated threshold risks inflating the effect size and misclassifies patients with normal ovarian performance as “low responders,” thereby undermining the interpretability and clinical relevance of all findings.
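To make the consequence of the threshold concrete, the following minimal Python sketch counts how many cycles in a small cohort would be labeled “low responders” under a conventional cutoff of fewer than 4 oocytes versus the cutoff of fewer than 10 used in the paper. The oocyte counts are hypothetical illustration values, not data from the study, and the function name is our own.

```python
def count_low_responders(oocyte_counts, threshold):
    """Count cycles in which fewer than `threshold` oocytes were retrieved."""
    return sum(1 for n in oocyte_counts if n < threshold)


# Hypothetical oocyte yields for ten cycles (illustration only, not study data).
hypothetical_yields = [2, 3, 5, 6, 7, 8, 9, 11, 14, 18]

conventional = count_low_responders(hypothetical_yields, threshold=4)
elevated = count_low_responders(hypothetical_yields, threshold=10)

total = len(hypothetical_yields)
print(f"Fewer than 4 oocytes:  {conventional} of {total} cycles")
print(f"Fewer than 10 oocytes: {elevated} of {total} cycles")
```

In this made-up cohort, most of the cycles that the higher cutoff labels as “low response” would be considered normal responses under the conventional definition.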
Methodologically, the most serious limitation of the study lies in its retrospective, non-randomized design. Treatment assignment was determined by individual providers, not by a pre-specified protocol or random allocation. As a result, patients in the GnRH antagonist and progestin-primed ovarian stimulation (PPOS) groups differed markedly in their baseline characteristics. Before matching, the antagonist group had substantially higher ovarian reserve by every available metric—AMH was more than double (2.2 vs. 1.0 ng/mL), AFC was nearly twice as high (10.8 vs. 5.8), and baseline FSH was significantly lower (9.2 vs. 13.0 IU/L). These differences persisted even after propensity score matching: AMH remained higher (1.7 vs. 1.3), AFC remained higher (9.4 vs. 7.2), and FSH remained lower (9.1 vs. 11.0) in the antagonist group. These imbalances favor the outcome—oocyte yield—and strongly suggest that clinicians were more likely to select the antagonist protocol for patients with better prognoses.
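The size of these imbalances can be checked with simple arithmetic. The short Python sketch below computes the antagonist-to-PPOS ratio of the group means quoted above, before and after matching. It uses only the means given in this commentary; because standard deviations are not quoted here, it does not attempt standardized mean differences, and the variable names are our own.

```python
# Group means as quoted in this commentary: (antagonist, PPOS).
before_matching = {"AMH": (2.2, 1.0), "AFC": (10.8, 5.8), "FSH": (9.2, 13.0)}
after_matching = {"AMH": (1.7, 1.3), "AFC": (9.4, 7.2), "FSH": (9.1, 11.0)}


def mean_ratios(groups):
    """Return the antagonist/PPOS ratio of means for each marker."""
    return {marker: antagonist / ppos for marker, (antagonist, ppos) in groups.items()}


ratios_before = mean_ratios(before_matching)
ratios_after = mean_ratios(after_matching)

for marker in before_matching:
    print(f"{marker}: ratio before matching {ratios_before[marker]:.2f}, "
          f"after matching {ratios_after[marker]:.2f}")
```

A ratio of 1.00 would indicate balanced means. Matching moves every marker toward 1.00, but AMH and AFC means remain roughly 30% higher in the antagonist group.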
These differences, therefore, reflect clear underlying selection biases. Propensity score matching, while intended to reduce confounding, cannot correct for a treatment allocation that is systematically associated with unmeasured or inadequately balanced prognostic variables. Indeed, the authors’ own use of Bayes factors shows that these differences were not only statistically significant but supported by overwhelming evidence (BF₁₀ > 1000 for AMH and AFC). Such residual imbalance precludes reliable inference about the effects of treatment. Under these conditions, any observed difference in oocyte yield is far more likely to reflect baseline differences in ovarian reserve than a true effect of the GnRH antagonist protocol.
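The mechanism can be illustrated with a toy simulation. In the Python sketch below, oocyte yield depends only on a patient's antral follicle count, the protocol has no effect at all by construction, and clinicians are simply more likely to choose the antagonist protocol for patients with higher counts. Every number in it (the AFC range, the assignment rule, the yield formula) is an assumption chosen for illustration; it is not a re-analysis of the published data.

```python
import random


def simulate(n_patients=2000, seed=0):
    """Toy cohort: protocol choice depends on AFC, yield depends on AFC only."""
    rng = random.Random(seed)
    yields = {"antagonist": [], "ppos": []}
    for _ in range(n_patients):
        afc = rng.uniform(2, 16)  # hypothetical antral follicle count
        # Assumed clinician preference: higher AFC makes antagonist more likely.
        p_antagonist = (afc - 2) / 14
        protocol = "antagonist" if rng.random() < p_antagonist else "ppos"
        # Yield is driven by AFC plus noise; the protocol contributes nothing.
        oocytes = max(0.0, 0.7 * afc + rng.gauss(0, 1.5))
        yields[protocol].append(oocytes)
    return {name: sum(values) / len(values) for name, values in yields.items()}


means = simulate()
print(f"Mean oocytes, antagonist: {means['antagonist']:.1f}")
print(f"Mean oocytes, PPOS:       {means['ppos']:.1f}")
```

Even though the simulated protocol has zero effect, the antagonist group shows a clearly higher mean yield, purely because of who was assigned to it. This is the pattern that residual imbalance in AMH and AFC would be expected to produce.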
The selection of PPOS as the sole comparator is also highly problematic. While increasingly used in “freeze-all” cycles, PPOS is not a widely accepted standard protocol for young women with low or diminished ovarian reserve. The study excludes other well-established approaches, such as the long GnRH agonist or microdose flare protocols, seriously limiting the applicability of its findings for that reason alone.
In our own work at the CHR, we have demonstrated that for patients with low ovarian reserve, clinical outcomes can be significantly improved not by protocol selection alone, but by tailoring the timing of ovulation trigger to patient-specific factors, particularly age and follicular dynamics. Highly Individualized Egg Retrieval (HIER), which we have described in the literature previously, improves pregnancy rates by avoiding premature luteinization, especially in women with premature ovarian aging.3 This approach, along with individualized hormonal priming and cycle management, forms the foundation of the CHR’s treatment philosophy in women with low functional ovarian reserve, as outlined in our recent contribution to Optimizing Management of Fertility in Women over 40.4 Within this framework, the observed benefit of the antagonist protocol in Teng et al.’s study may reflect the limitations of the chosen comparator arm more than any intrinsic superiority of the antagonist itself.
While the authors report a higher mean oocyte yield in the antagonist group after matching (8.3 vs. 5.3), other laboratory outcomes—mature oocyte rate, fertilization rate, cleavage rate, and good-quality embryo rate—were essentially equivalent between groups. These similarities suggest that the additional oocytes retrieved in the antagonist group may not have translated into improved developmental competence (though even the improved oocyte numbers lack credibility in women with truly low functional ovarian reserve). Furthermore, the primary outcome measure—the incidence of “low oocyte retrieval”—is circular, as it merely reflects the same arbitrary <10 oocyte threshold used to define the study population and does not correlate with validated clinical endpoints.
Most critically, the study does not report implantation, pregnancy, or live birth rates, leaving the ultimate clinical utility of the proposed approach untested.
In conclusion, while the authors set out to examine protocol selection in a clinically challenging population, their findings are significantly undermined by methodological shortcomings. Most notably, the lack of randomization introduced systematic bias in treatment allocation, resulting in two study groups with markedly different ovarian reserve profiles. These baseline disparities far more plausibly explain the observed differences in oocyte yield than any inherent effect of the GnRH antagonist protocol.
Although the authors attempt to control for confounding with propensity score matching, key indicators such as AMH, AFC, and FSH remained significantly imbalanced—even more so when quantified using Bayes factors—making any causal inference unreliable. Moreover, the use of an unconventional definition of “low response” and exclusion of clinically relevant comparators further limit the study’s applicability.
As the field moves toward individualized treatment strategies, we must remain vigilant in distinguishing between biologically driven treatment effects and artifacts of study design. Innovation in ovarian stimulation must be grounded in rigorous methodology, validated clinical outcomes, and diagnostic clarity—not in retrospective associations that cannot withstand the bias introduced by selective treatment assignment.
We are deeply concerned that somewhat uneducated consumers of the previously addressed PGT-A paper1 and the antagonist paper2 reviewed here may be seriously misled into adjusting their practice patterns by both of these papers, which, after all, have appeared in the most prestigious and main journal published by the ASRM. It seems to us that the ASRM, more than any other publisher in the field, has a responsibility to protect the integrity of infertility practice. Yet another paper that, based on its title alone, should have attracted increased scrutiny quite apparently did not receive even the minimally needed scrutiny. Editors, of course, cannot review every paper in detail. They should, however, be able to notice papers that simply make no sense. And while that alone should, of course, not be enough to reject a paper, it should be more than enough reason to pick a paper out for more detailed review, even amid the large onslaught of submissions editors routinely face these days.
Finally, and with considerable admiration for the many scientifically important research achievements of Chinese colleagues over recent decades, it must be noted once more that both papers discussed here came out of China, and that basic science and medical journals,5 as well as the lay press,6 have recently been publishing increasing numbers of horror stories regarding Chinese (and other countries’) paper mills, apparently increasing the output of fake papers by 1505 every six months.
Shouldn’t the editors of our medical journals, therefore, pay more and better attention to submissions from countries where such paper mills are known to exist?
References
1. Ou Z, et al. Effects of preimplantation genetic testing for aneuploidy on embryo transfer outcomes in women of advanced reproductive age with no more than three retrieved oocytes. Fertil Steril 2025;123(6):991-998.
2. Teng et al. Fertil Steril 2025; https://doi.org/10.1016/j.fertnstert.2025.07.018; online, ahead of print.
3. Wu YG, Barad DH, Kushnir VA, Wang Q, Zhang L, Darmon SK, et al. With low ovarian reserve, Highly Individualized Egg Retrieval (HIER) improves IVF results by avoiding premature luteinization. J Ovarian Res. 2018;11:23.
4. Barad DH, Gleicher N. (2024). Optimal IVF protocols for women over 40 and low functional ovarian reserve. In: Optimizing Management of Fertility in Women over 40. Cambridge University Press. https://www.amazon.com/Optimizing-Management-Fertility-Women-over/dp/1316516822
5. Richardson et al. PNAS 2025;122(32):e2420092122.
6. Agius MW. Fake Studies on the Rise, Destroying Trust in Science. DW. Published August 10, 2025. Accessed October 13, 2025. https://www.dw.com/en/fake-studies-on-the-rise-destroying-trust-in-science/a-73533918