Setup and Context¶
Introduction¶
Dr Ignaz Semmelweis was a Hungarian physician, born in 1818, who worked at the Vienna General Hospital. At a time when illness was still attributed to "bad air" or miasma, Semmelweis was building a case from numbers — and something in those numbers was deeply wrong.
In the early 1840s, one in ten women who gave birth at Vienna General Hospital died from childbed fever (puerperal fever). That rate fluctuated month to month, but it never went away. Semmelweis noticed it wasn't uniform across the hospital. Two maternity clinics sat side by side — same building, same city — and their mortality records looked nothing alike.
This analysis re-examines the same data Semmelweis published in 1861, covering monthly birth and death counts from 1841 to 1849. I want to see whether the numbers, examined fresh today, support the conclusion he drew: that a single procedural change — mandatory handwashing with chlorinated lime — was the difference between life and death.
The Data Source¶
Dr Semmelweis published his research in 1861. I found the scanned pages of the full text, with the original tables, in German; an excellent English translation can be found here.
Environment Setup¶
Import Statements¶
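The import cell's contents aren't shown, so this is a sketch of a minimal set covering everything the analysis below uses: pandas for the monthly tables, NumPy for the np.where labelling, SciPy for the t-test. The Plotly import is commented out because it is my assumption about the plotting library, inferred from the histnorm='percent' parameter mentioned later.

```python
import numpy as np       # np.where labelling of pre/post-handwashing months
import pandas as pd      # monthly birth and death tables
from scipy import stats  # independent-samples t-test (stats.ttest_ind)

# The plots described below (box plots, histnorm='percent', marginal boxes)
# match the Plotly Express API, so the notebook presumably also imports:
# import plotly.express as px
```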
Notebook Presentation¶
Read the Data¶
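The loading cell is also omitted. Below is a self-contained sketch of the step; the column names (date, births, deaths) and the two sample rows are stand-ins of my own, not the notebook's actual identifiers or values.

```python
import io
import pandas as pd

# Stand-in for the real CSV of monthly counts; names and values are illustrative.
raw = io.StringIO(
    "date,births,deaths\n"
    "1841-01-01,254,37\n"
    "1841-02-01,239,18\n"
)
df = pd.read_csv(raw, parse_dates=["date"])

# Monthly death rate in percent, the quantity analysed throughout.
df["pct_deaths"] = df["deaths"] / df["births"] * 100
```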
Did Handwashing Change the Distribution?¶
The aggregate rates already tell a story — the average monthly death rate drops from ~10.5% to ~5.0% after June 1847. But a mean alone can be pulled by a handful of extreme months. I want to see the full shape of each period: how wide the spread is, where the median sits, and whether the shift is broad-based or driven by outliers.
I labelled each month by handwashing status using np.where, then plotted both periods as box plots.
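A sketch of that labelling step, assuming date and pct_deaths column names and a June 1847 policy cut-off (both are my assumptions); the Plotly call is left commented since rendering needs a display.

```python
import numpy as np
import pandas as pd

# Two stand-in months, one from each period; the real frame has 98 rows.
df = pd.DataFrame({
    "date": pd.to_datetime(["1846-03-01", "1848-03-01"]),
    "pct_deaths": [11.2, 4.1],
})

# Label each month by whether mandatory handwashing was in force.
handwashing_start = pd.Timestamp("1847-06-01")  # assumed cut-off
df["washing_hands"] = np.where(df["date"] < handwashing_start, "No", "Yes")

# Box plot of the two periods (uncomment if plotly is installed):
# import plotly.express as px
# px.box(df, x="washing_hands", y="pct_deaths", color="washing_hands").show()
```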
Overlapping Distributions¶
The box plot gives summary statistics but compresses the full shape. I want to see how often the death rate landed in any given range — and whether the two periods actually separate cleanly or overlap substantially.
A normalised histogram (histnorm='percent') puts both periods on a comparable scale despite the handwashing window being shorter. The marginal box lanes keep the quartile positions in view above the bars.
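Plotly's histnorm='percent' rescales each bar to the share of its own group's months, which is what makes a short period comparable to a long one. The same normalisation can be computed by hand with NumPy; the rate values here are invented for illustration.

```python
import numpy as np

# Invented monthly death rates (%) for two periods of unequal length.
before = np.array([9.8, 11.5, 10.2, 12.0, 8.9, 10.7])
after = np.array([4.2, 5.5, 3.9])

bins = np.arange(0, 14, 2)  # shared bin edges so the bars line up

# histnorm='percent': each bar is 100 * count / group size, so every
# group's bars sum to 100 no matter how many months it contains.
pct_before = 100 * np.histogram(before, bins=bins)[0] / before.size
pct_after = 100 * np.histogram(after, bins=bins)[0] / after.size
```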
Kernel Density Estimate¶
Histograms are sensitive to bin width and can hide distribution shape. A kernel density estimate gives a smoother read of the underlying spread.
The first pass uses default parameters — which extend the density curve into negative values, a mathematical artefact that has no meaning here. The second pass clips to [0, 1] to keep the estimate grounded in what the data can actually represent.
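Both passes can be sketched with scipy.stats.gaussian_kde on invented proportions. "Clipping" here simply means evaluating the density only on [0, 1]; it is not a boundary-corrected KDE, just a restriction of where the curve is drawn.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Invented monthly death rates as proportions (the real ones sit near 0.05-0.10).
rates = np.clip(rng.normal(loc=0.08, scale=0.03, size=50), 0.001, 0.3)

kde = gaussian_kde(rates)

# Default pass: the Gaussian kernels assign positive density below zero,
# a mathematical artefact with no meaning for a death rate.
wide = np.linspace(-0.1, 0.4, 200)
density_default = kde(wide)

# Clipped pass: evaluate only on [0, 1], the range a rate can occupy.
grid = np.linspace(0.0, 1.0, 200)
density_clipped = kde(grid)
```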
Testing Whether the Drop Is Real¶
The visual evidence is striking, but two distributions can look different by chance — especially with a sample this size (98 months total). I want a formal answer: is the gap between periods statistically significant, or could random variation explain it?
An independent-samples t-test compares the difference in means relative to the variability within each group. A p-value below 0.01 means we can reject the null hypothesis — that handwashing had no effect — at the 99% confidence level.
How the T-Test Works¶
Hypotheses¶
We test two competing claims:
Null hypothesis $H_0$: there is no real difference between the average monthly death rate before and after handwashing. Any gap is random variation.
Alternative hypothesis $H_1$: the means differ — the handwashing policy had a real effect.
The Test Statistic¶
The t-statistic measures the difference in means relative to the expected random variation:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
A large numerator (big difference in means) combined with a small denominator (low variability within each group) produces a large t — one unlikely to occur by chance.
The P-Value¶
The p-value answers: if there were truly no difference, how probable is a t-statistic this extreme?
$$ p = 2 \cdot \left(1 - F_{T_\nu}(|t_{\text{observed}}|)\right) $$
where $F_{T_\nu}$ is the CDF of the t-distribution with $\nu$ degrees of freedom.
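The statistic and p-value can be assembled directly from these formulas and checked against SciPy. The samples are invented, drawn to loosely mimic the two periods' means; I use the Welch form throughout, since the t-statistic above is already written with per-group variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.5, 2.0, size=75)  # invented "before" monthly rates (%)
b = rng.normal(5.0, 2.0, size=23)   # invented "after" monthly rates (%)

v1, v2 = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size

# t-statistic straight from the formula above.
t_manual = (a.mean() - b.mean()) / np.sqrt(v1 + v2)

# Welch degrees of freedom, then the two-sided p from the t CDF.
nu = (v1 + v2) ** 2 / (v1**2 / (a.size - 1) + v2**2 / (b.size - 1))
p_manual = 2 * (1 - stats.t.cdf(abs(t_manual), df=nu))

# SciPy agrees when told not to pool the variances.
t_scipy, p_scipy = stats.ttest_ind(a, b, equal_var=False)
```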
Degrees of Freedom¶
Assuming equal variances (SciPy default):
$$ \nu = n_1 + n_2 - 2 $$
Welch's approximation (used when equal_var=False) adjusts this downward when the two groups have unequal variance:
$$ \nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(\dfrac{s_1^2}{n_1}\right)^2}{n_1-1} + \dfrac{\left(\dfrac{s_2^2}{n_2}\right)^2}{n_2-1}} $$
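A quick numeric check of both degree-of-freedom formulas on invented samples with deliberately unequal variances, so the downward adjustment is visible (the group sizes loosely mirror the 98-month split, but the values are made up).

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(10.5, 1.0, size=75)  # invented: large, low-variance group
b = rng.normal(5.0, 3.0, size=23)   # invented: small, high-variance group

# Equal-variance (pooled) degrees of freedom.
nu_pooled = a.size + b.size - 2  # 96

# Welch's approximation: dominated here by the small, noisy group,
# so it lands well below the pooled value.
v1, v2 = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
nu_welch = (v1 + v2) ** 2 / (v1**2 / (a.size - 1) + v2**2 / (b.size - 1))
```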
Result¶
- $t = 3.804$
- $p = 0.0002504$
If handwashing had no real effect, there would be only a 0.025% chance of observing a difference this large or larger. Since $p \ll 0.01$, we reject the null hypothesis at the 99% confidence level. The drop in death rate is not noise — it is statistically significant.
Conclusions¶
The data makes a compelling case:
| Metric | Before June 1847 | After June 1847 |
|---|---|---|
| Avg. monthly death rate | ~10.5% | ~5.0% |
| Absolute reduction | — | ~5.5 pp |
| Relative reduction | — | ~52% |
| p-value (t-test) | — | 0.00025 |
The handwashing policy introduced in June 1847 roughly halved the average monthly death rate. The t-test rules out chance at the 99% confidence level — this is not a statistical artefact.
What makes this finding historically remarkable is the structural evidence that predates the intervention. Clinic 1, staffed by doctors who also performed autopsies, had a death rate roughly three times higher than Clinic 2, staffed by midwives with no autopsy contact. That difference — visible in the annual data years before handwashing began — is the analytical key. It isolates the contamination hypothesis without a controlled experiment.
Semmelweis couldn't explain why it worked. Germ theory was still decades away. But the numbers were unambiguous, and they remain so today.