“To understand God's thoughts, we must study statistics, for these are the measure of his purpose.” FLORENCE NIGHTINGALE

“Absence of evidence is not evidence of absence.” CARL SAGAN

Dear Editor,

Like most cardiac surgeons in this country, I grew up with a less than passing familiarity with the world of numbers and, of course, with no great enthusiasm to make further acquaintance with that forbidding, unfathomable and, yes, incredibly boring subject. The only numbers that truly mattered to us were how many operations we had done that week or that month or, perhaps more importantly, what those numbers were for our rivals, of whom, unfortunately, there are always plenty. Our attitude to outcome data was much more relaxed, rather like the attitude to the gross domestic product (GDP) figures often bandied about in our newspapers, which differ wildly depending on whether it is the ruling government or the opposition whose data one is reading. The outcome data presented in our meetings, especially in the eighties and nineties, were similar, echoing Mark Twain: “Facts are stubborn, but statistics are more pliable!” If the mortality figures in a presentation were high, the chairperson often congratulated the speaker for “an honest” report! Most of the data were about hospital survivors, with very few, if any, long-term survival analyses or risk stratification. I suspect this is true even today. To be charitable, data collection and maintenance were never the priority, as we had so many more pressing matters to sort out. Establishing the safety of procedures came first, given the constraints of cost, availability of manpower, infrastructure, etc. The list is endless, and it would be churlish to deny the collective efforts of our community in establishing the current standards of cardiac surgery in our country. A leading German cardiac surgeon, while visiting our unit a few years ago, remarked that Indian public infrastructure might be subject to the jibe of “third world”, but cardiac surgery standards were first world! And yet, the value of data, collected with rigor and analyzed sensibly, is undeniable.
All of us use scoring systems almost every day to aid in decision-making. Perhaps if we peeked into that world of numbers, who knows, we might find a friend!

My own interest in statistics was kindled by a set of fortuitous circumstances. As our transplant numbers increased, we felt the dire need for data collection, as we were unsure whether, given the reality of our water and air pollution, any of the hospital survivors would survive very long. All the hospitals I have worked in had an extremely efficient computer department that could track every single syringe or needle used but had NO INTEREST in tracking clinical outcomes. With the help of a non-government organization (NGO) called Aishwarya Trust, we hired two full-time data entry operators who entered all data into an Excel sheet. This coincided with the mobile Internet revolution in our country: every patient had a smartphone and was able to send us data about immunosuppression, hospitalisations, lab reports and, yes, death. Transplanted patients tend to maintain close contact with the transplanting team forever, and while this can sometimes be a bit tiresome when dealing with trivial non-issues, from the perspective of data collection it is priceless. While all this data was lying in an Excel sheet, we did nothing useful with it, as we were plodding along as usual with our mundane daily surgical work.

Then the coronavirus disease (COVID) pandemic struck, with the ensuing lockdown. This once-in-a-century catastrophe of unimaginable horror had some silver linings. Extracorporeal membrane oxygenation (ECMO) became more sophisticated and mainstream, and several units developed the capability of supporting patients on ECMO for several months with significant success, especially during the second wave. And we had time on our hands, as no surgery was possible.

I thought of hiring a professional statistician to analyze our data and in that process learnt two very important lessons. The first was that the data has to be clean; otherwise, it is meaningless. Creatinine values of 0.5 and 5 are very different, and yet the data entry person can enter one for the other, having no knowledge of the subject and even less interest. Left atrial pressures of 0.7, 7 and 70 are very different, but a carelessly entered decimal can have far-reaching consequences when analyzing the data. The second lesson was that the person doing the statistical analysis needs some domain knowledge to be truly useful. We hired three statisticians before I realized the value of data cleaning and domain knowledge, and decided to learn statistics by doing an online course in R (https://www.r-project.org/about.html) and the STATA statistical package (https://www.stata.com). And I got hooked. Data, as Chuck Huber of STATA often states, “is like fresh ingredients used in cooking”. If it is clean and good, the statistics can be extremely revealing with very little effort, exactly as good ingredients make the food taste good. And along the way, as I was deep diving into statistics and had the time, I delved a bit into the lives of the great personalities who have contributed so much to this great science.
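The first lesson can be illustrated with a simple plausibility check run before any analysis. What follows is only a minimal sketch in Python: the field names and the limits are hypothetical, chosen for illustration, and are not clinical reference ranges or the checks we actually used.

```python
# Hypothetical plausibility limits, for illustration only
# (assumed values, not clinical reference ranges).
PLAUSIBLE_RANGES = {
    "creatinine_mg_dl": (0.2, 15.0),
    "left_atrial_pressure_mmhg": (1.0, 40.0),
}


def flag_implausible(records):
    """Return (row_index, field, value) for every value outside its plausible range."""
    flagged = []
    for index, row in enumerate(records):
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = row.get(field)
            if value is None:
                continue  # a missing value is a separate problem, not flagged here
            if not (low <= value <= high):
                flagged.append((index, field, value))
    return flagged


# Made-up rows: the same pressure typed as 7, 0.7 and 70.
records = [
    {"creatinine_mg_dl": 0.9, "left_atrial_pressure_mmhg": 7.0},
    {"creatinine_mg_dl": 0.9, "left_atrial_pressure_mmhg": 0.7},
    {"creatinine_mg_dl": 0.8, "left_atrial_pressure_mmhg": 70.0},
]

for index, field, value in flag_implausible(records):
    print(f"row {index}: check {field} = {value}")
```

A check of this kind catches a left atrial pressure of 0.7 or 70, but not a creatinine of 5 typed in place of 0.5, since both values are plausible. That kind of error can be found only by someone with domain knowledge going back to the source record.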

Pie chart

One of the most commonly used graphic charts in statistics is the pie chart. It was introduced by William Playfair in 1801 but, interestingly, it was Florence Nightingale (yes, of nursing fame!) who made the most memorable use of a variant of it. We often think of her as the original nurse and a self-sacrificing angel of mercy, but she was also a formidable, self-educated statistician. During the Crimean War (fought in 1853–1856 between the Russian empire on one side and an alliance of the Ottoman empire with France and the UK on the other; more than 150 years later, the war is still on, with the players merely changing sides!), Nightingale observed the appalling death rates in military hospitals. To advocate for improved sanitation, she used a creative adaptation of the pie chart, called the “Coxcomb Diagram”. It resembled a colorful, segmented circle, with each segment representing the number of deaths in a specific month and further divided into causes like wounds and diseases. Unlike dry statistics, the vibrant colors and varying segment lengths instantly captured the eye and effectively highlighted the shocking fact: more soldiers died from preventable diseases and infection than from battle wounds. This compelling visual evidence swayed the “obtuse, seemingly ignorant army generals” and fuelled her campaign for sanitary reforms. She created various statistical diagrams throughout her career, using the power of visual communication to make complex data accessible and drive social change. Her work not only influenced data visualization practices, but also paved the way for modern infographics and for visualization tools such as ggplot2 [1]. And the irony, in our country, is how little we value knowledge, innovation and the spirit of enquiry in our nurses, and how poorly we pay them, often while invoking the name of Nightingale!

P value and confidence intervals

These are so commonly used in the medical literature that P values often assume the role of “The Gods have spoken!”: conclusive proof, or lack of it, in an experiment. Have they been around forever? Interestingly, not. To unravel their origins, one needs to travel back in time to the Cambridge of the 1920s. On a summer afternoon, a group of academics were having tea when someone suggested that tea tasted different depending on whether the tea or the milk was poured first! This seemingly frivolous observation was immediately picked up by a thin, short man with thick glasses and a Vandyke beard who said, “Let us test the proposition” [2] (this book is a “must read” for those interested in the non-mathematical aspects of statistics). The man was Ronald Fisher, of Fisher’s exact test, which is used to compare two groups when the samples are small. He is credited with developing the theory behind the P value. The concepts of confidence intervals and hypothesis testing were developed in the 1930s by Jerzy Neyman, a Polish mathematician, and Egon Pearson, an English statistician and the son of Karl Pearson of the famous chi-square test. Karl Pearson was a controversial figure who tried to use statistical methods to propose racial superiority in the context of Darwinian evolution and had an ongoing bitter feud with Fisher. Interestingly, confidence intervals became popular in the medical literature only in the 1970s and early 1980s and, in cardiac surgery, owing to the efforts of Eugene Blackstone and John Kirklin [3]. Increasingly, though, the absolute reliance on the P value is being questioned. “To P or not to P is the question?” [4, 5].
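The tea-tasting proposition makes a compact worked example of what a P value is. The sketch below assumes the classic design that Fisher later described, which is not detailed in this letter: eight cups, four with milk poured first, and a taster who must pick out those four. The function name is mine. It computes the chance of doing at least that well by pure guessing, which is the one-sided P value of Fisher’s exact test for this table.

```python
from math import comb


def p_at_least(k_correct, n_milk_first=4, n_cups=8):
    """One-sided probability of picking at least k_correct of the milk-first
    cups by pure guessing (hypergeometric tail, as in Fisher's exact test)."""
    n_tea_first = n_cups - n_milk_first
    total_ways = comb(n_cups, n_milk_first)  # all ways of choosing 4 cups out of 8
    favourable = sum(
        comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
        for k in range(k_correct, n_milk_first + 1)
    )
    return favourable / total_ways


print(p_at_least(4))  # all four correct: 1 chance in 70
print(p_at_least(3))  # three or more correct: 17 chances in 70
```

Under these assumptions, a taster who gets all four cups right has done something that guessing alone achieves about once in 70 attempts (P of roughly 0.014), while three out of four is unremarkable (P of roughly 0.24).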

Kaplan–Meier curve

It is ubiquitous in survival studies and is used to estimate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment. It is named after Edward Kaplan and Paul Meier, who each submitted similar manuscripts to the Journal of the American Statistical Association [6]. The journal editor convinced them to combine their work into one paper, which has been cited more than 34,000 times since its publication in 1958.
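For readers curious about what lies behind the curve, here is a minimal sketch of the product-limit calculation in plain Python. The follow-up times are made up for illustration, and the function name is mine; in practice one would use the survival routines in R or STATA rather than hand-written code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up duration for each patient
    events : 1 if the patient died at that time, 0 if censored (alive at last contact)
    Returns a list of (time, estimated survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of those at risk who survived this time point.
            survival *= 1 - deaths / n_at_risk
            curve.append((current_time, survival))
        # Deaths and censored patients both leave the risk set afterwards.
        n_at_risk -= leaving
    return curve


# Made-up follow-up in months for seven patients (1 = died, 0 = censored).
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

for month, surv in kaplan_meier(times, events):
    print(f"month {month}: estimated survival {surv:.3f}")
```

The point to notice is how censored patients are handled: they count among those at risk up to their last contact and then quietly drop out, which is why long-term follow-up data of the kind transplant patients send us is so valuable.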

Student’s t test

This has an extremely interesting origin. The young Lord Guinness inherited the Guinness brewing company in Dublin at the turn of the twentieth century and wanted to introduce scientific methods into the art of brewing beer by hiring university graduates from Oxford. William Sealy Gosset, a 23-year-old with a degree in mathematics and chemistry, joined the firm. He published a paper in 1907 dealing with the amount of yeast to be added to a jar to produce the best tasting beer, showing that yeast counts could be modelled with a probability distribution called the “Poisson distribution”. (The Poisson distribution had been described decades earlier and had famously been used to model the number of soldiers in the Prussian army who died of a horse kick.) Since it was against company policy to allow publication by its employees, Gosset used the name “Student”! It was under the same pseudonym that, in 1908, he published the paper that gave us the t test.

Interestingly, Indian statisticians have been very prominent in the field. The pioneer Prasanta Chandra Mahalanobis, a contemporary of Fisher, founded the famous Indian Statistical Institute and described the Mahalanobis D² statistic.

The world of statistics evokes deep passion. “In God we trust and for everything else we need data” screams one headline. “Lies, damned lies and statistics” screams another. Where does the truth lie? Somewhere in between, I guess!