To the Editor of Supportive Care in Cancer,

Your recent publication “Bismuth adjuvant ameliorates adverse effects of high-dose chemotherapy in patients with multiple myeloma and malignant lymphoma undergoing autologous stem cell transplantation: A randomised, double-blind, prospective pilot study” by Hansen and Penkowa (H+P) [1] was brought to our attention through recent coverage in the public media.

In the article, the authors investigate the effect of a bismuth adjuvant on various features of chemotherapy-related toxicity. However, several of the statistically significant results reported in the manuscript appear to contradict the data presented in the figures, and the reported p values are considerably lower than the data would support. The discrepancies are large enough that several of the conclusions are wrong.

The paper contains sufficient information about the statistical methods and the data to allow its key results to be reproduced. However, when we attempt this, we obtain very different values.

Below, we reanalyze the results from some of the figures. We focus on the two-by-two tables that can be deduced from the text in Figs. 1–3 and 5 relating to patients with multiple myeloma. According to the paper, the p values were obtained from the chi-square test and/or Fisher’s exact test. Using these tests, we obtain the results in Table 1; the calculations were performed in the R statistical software [2] and verified using a trial version of GraphPad Prism 7.02 (GraphPad Software, La Jolla, CA, USA, www.graphpad.com). In a few cases, the reported percentages do not correspond to any possible number of patients (e.g., 30% of 6 patients is reported for one group, which is impossible; it must have been 33%, i.e., 2 of 6), and in these cases we have rounded the implied number of patients to the nearest integer. These discrepancies between the numbers of patients and the percentages are minor. We calculate p values based on both the Yates-corrected χ² and the uncorrected χ² (see Table 1).
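As an illustration of the tests named above, the following minimal Python sketch computes, for a 2×2 table [[a, b], [c, d]], the uncorrected and Yates-corrected χ² p values and a two-sided Fisher’s exact p value, using only the standard library. It is an illustrative implementation, not the R or GraphPad code behind Table 1, and the function names are hypothetical. The χ² p value uses the identity that the upper tail of a χ² distribution with 1 degree of freedom at x equals erfc(√(x/2)); the two-sided Fisher p value follows the convention used by R’s fisher.test, summing the probabilities of all tables with the same margins that are no more probable than the observed one.

```python
import math


def chi2_1df_pvalue(stat):
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction, floored at zero (as R's chisq.test does for 2x2 tables).
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / denom
    return stat, chi2_1df_pvalue(stat)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, all margins fixed.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum over all tables no more probable than the observed one (small tolerance for ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

The counts a, b, c and d have to be reconstructed from the percentages and group sizes reported in the figures, as described above; calling chi2_2x2(a, b, c, d, yates=True) gives the Yates-corrected version.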

Table 1 Reanalysis of data from Hansen and Penkowa (H+P)

As is apparent from the table, according to our calculations only one comparison gives a p value below the 0.05 level; the rest are clearly non-significant. For Fig. 1a, we obtain a p value of 0.0393 using Fisher’s exact test and a p value of 0.0259 using the uncorrected χ². This is formally significant, but given that multiple endpoints are considered, with no stated distinction between primary and secondary endpoints, and that these are comparisons within a subgroup, such a result should be interpreted with considerable caution.
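To make the multiplicity point concrete, the sketch below applies a Bonferroni adjustment (chosen here purely for illustration; no specific correction is named in this letter) to the two Fig. 1a p values quoted above. Because the total number of endpoints k is not stated here, the hypothetical helper simply finds the smallest k at which the adjusted p value exceeds 0.05. For the smallest p value in a family, Holm’s step-down procedure uses the same threshold α/k, so the conclusion would be the same under Holm.

```python
# The two Fig. 1a p values quoted in the text.
P_FISHER = 0.0393
P_CHI2_UNCORRECTED = 0.0259


def bonferroni_adjust(p, k):
    """Bonferroni-adjusted p value when k endpoints are tested (capped at 1)."""
    return min(1.0, k * p)


def smallest_k_losing_significance(p, alpha=0.05):
    """Smallest number of endpoints k for which the adjusted p value exceeds alpha."""
    if p <= 0:
        raise ValueError("p must be positive")
    k = 1
    while bonferroni_adjust(p, k) <= alpha:
        k += 1
    return k


if __name__ == "__main__":
    for label, p in [("Fisher", P_FISHER), ("uncorrected chi2", P_CHI2_UNCORRECTED)]:
        k = smallest_k_losing_significance(p)
        print(f"{label}: p = {p}, adjusted p exceeds 0.05 once k >= {k}")
```

Both values already fail at k = 2 (adjusted p values of 0.0786 and 0.0518), so with any family of two or more endpoints the nominal significance disappears.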

The manuscript contains other statistical results that cannot be directly validated from the published information. However, where quantitative variables can be analyzed using two-sample t tests, we find p values that are larger than those reported in the paper.
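Where a figure gives only group means, standard deviations and sample sizes, a two-sample t test can be computed from these summaries alone. The following sketch does this for the pooled-variance (Student) version using only the Python standard library, obtaining the two-sided p value from the regularized incomplete beta function (evaluated by the continued fraction described in Numerical Recipes). The function names are hypothetical, the choice of the pooled rather than the Welch variant is an assumption, and any summary values passed to it would have to be read off the figures.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(x, a, b):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t test from summary statistics; returns (t, df, p)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (m1 - m2) / se
    # Two-sided tail probability of Student's t: I_{df/(df+t^2)}(df/2, 1/2).
    p = regularized_beta(df / (df + t * t), df / 2.0, 0.5)
    return t, df, p
```

For a Welch test, the standard error and degrees of freedom would be replaced by the Welch–Satterthwaite versions; the p value step stays the same.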

In summary, the conclusions reached by the authors do not follow from the data presented, and the appropriate conclusion from the present study is that there is no evidence of an effect of bismuth.