Setup and Context¶
Introduction¶
On November 27, 1895, Alfred Nobel signed his last will in Paris. When it was opened after his death, the will caused a lot of controversy, as Nobel had left much of his wealth for the establishment of a prize.
Alfred Nobel dictated that his entire remaining estate should be used to endow “prizes to those who, during the preceding year, have conferred the greatest benefit to humankind”.
Every year the Nobel Prize is given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace.
Let's see what patterns we can find in the data of the past Nobel laureates. What can we learn about the Nobel prize and our world more generally?
Upgrade plotly (only Google Colab Notebook)¶
Google Colab may not be running the latest version of plotly. If you're working in Google Colab, uncomment the line below, run the cell, and restart your notebook server.
Requirement already satisfied: plotly in /opt/anaconda3/envs/safety_mark1/lib/python3.12/site-packages (6.5.2) Requirement already satisfied: narwhals>=1.15.1 in /opt/anaconda3/envs/safety_mark1/lib/python3.12/site-packages (from plotly) (2.15.0) Requirement already satisfied: packaging in /opt/anaconda3/envs/safety_mark1/lib/python3.12/site-packages (from plotly) (24.2) Note: you may need to restart the kernel to use updated packages.
Import Statements¶
Notebook Presentation¶
Read the Data¶
| year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Netherlands | Male | Berlin University | Berlin | Germany | NLD |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | France | Male | NaN | NaN | NaN | FRA |
Caveats: The exact birth dates for Michael Houghton, Venkatraman Ramakrishnan, and Nadia Murad are unknown. I've substituted them with a mid-year estimate of July 2nd.
Dataset Dimensions and Time Range¶
The dataset contains 962 records across 16 columns, spanning 1901 to 2020 (the 2020 laureates occupy the final rows of the table). Each row represents one prize share awarded to one laureate or organisation.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 962 entries, 0 to 961
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   year                   962 non-null    int64
 1   category               962 non-null    object
 2   prize                  962 non-null    object
 3   motivation             874 non-null    object
 4   prize_share            962 non-null    object
 5   laureate_type          962 non-null    object
 6   full_name              962 non-null    object
 7   birth_date             934 non-null    object
 8   birth_city             931 non-null    object
 9   birth_country          934 non-null    object
 10  birth_country_current  934 non-null    object
 11  sex                    934 non-null    object
 12  organization_name      707 non-null    object
 13  organization_city      707 non-null    object
 14  organization_country   708 non-null    object
 15  ISO                    934 non-null    object
dtypes: int64(1), object(15)
memory usage: 120.4+ KB
Completeness Assessment¶
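Organisation laureates account for the 28 NaN values in birth_date and sex; all 28 fall in the Peace category (135 Peace prizes, of which 107 have a recorded sex), and no other category has a missing sex value. Institution fields are empty where laureates worked independently or affiliation data is unavailable.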
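The original code cells for these checks are not shown, so the following is only a minimal sketch of how the duplicate and NaN checks could be run with pandas. The three-row frame and its names (Person A, Organisation B, University X) are hypothetical stand-ins, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the laureate table (not the real data).
df = pd.DataFrame({
    "full_name": ["Person A", "Organisation B", "Person C"],
    "laureate_type": ["Individual", "Organization", "Individual"],
    "birth_date": ["1852-08-30", np.nan, "1839-03-16"],
    "organization_name": ["University X", np.nan, np.nan],
})

# True if any row is an exact duplicate of an earlier row.
has_duplicates = df.duplicated().values.any()

# True if any cell in the frame is missing.
has_nan = df.isna().values.any()

# Missing values per column, sorted so the sparsest columns come last.
nan_counts = df.isna().sum().sort_values()

# Rows with no birth date, to see which laureates they belong to.
missing_birth = df.loc[df["birth_date"].isna(), ["full_name", "laureate_type"]]

print(has_duplicates, has_nan)
print(nan_counts)
print(missing_birth)
```

Filtering on the missing column and printing laureate_type alongside it is what makes it easy to confirm that the gaps belong to organisations rather than to individuals with lost records.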
Check for Duplicates¶
np.False_
Check for NaN Values¶
np.True_
year                       0
category                   0
prize                      0
prize_share                0
laureate_type              0
full_name                  0
birth_date                28
birth_country             28
birth_country_current     28
sex                       28
ISO                       28
birth_city                31
motivation                88
organization_country     254
organization_name        255
organization_city        255
dtype: int64
| full_name | birth_date | |
|---|---|---|
| 24 | Institut de droit international (Institute of International Law) | NaN |
| 60 | Bureau international permanent de la Paix (Permanent International Peace Bureau) | NaN |
| 89 | Comité international de la Croix Rouge (International Committee of the Red Cross) | NaN |
| 200 | Office international Nansen pour les Réfugiés (Nansen International Office for Refugees) | NaN |
| 215 | Comité international de la Croix Rouge (International Committee of the Red Cross) | NaN |
| year | category | full_name | birth_city | birth_country | birth_country_current | |
|---|---|---|---|---|---|---|
| 725 | 2001 | Literature | Sir Vidiadhar Surajprasad Naipaul | NaN | Trinidad | Trinidad |
| 837 | 2010 | Peace | Liu Xiaobo | NaN | China | China |
| 957 | 2020 | Medicine | Michael Houghton | NaN | United Kingdom | United Kingdom |
| full_name | category | organization_country | organization_city | organization_name | |
|---|---|---|---|---|---|
| 627 | Kary B. Mullis | Chemistry | United States of America | La Jolla, CA | NaN |
| 705 | Martinus J.G. Veltman | Physics | Netherlands | Bilthoven | NaN |
| 721 | William S. Knowles | Chemistry | United States of America | St. Louis, MO | NaN |
| 777 | J. Robin Warren | Medicine | Australia | Perth | NaN |
| 831 | Richard F. Heck | Chemistry | United States of America | NaN | University of Delaware |
Type Conversions and Feature Engineering¶
Parsing birth_date to datetime enables age calculations. The prize_share fraction (e.g. 1/2) is converted to a decimal percentage to support yearly share trend analysis.
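A minimal sketch of both conversions, assuming birth_date holds ISO-formatted strings and prize_share holds strings of the form numerator/denominator, as in the table above. The three sample rows are illustrative only.

```python
import pandas as pd

# Illustrative rows only; the column names match the dataset.
df = pd.DataFrame({
    "birth_date": ["1852-08-30", "1839-03-16", None],
    "prize_share": ["1/1", "1/2", "1/4"],
})

# Parse the date strings; missing values become NaT.
df["birth_date"] = pd.to_datetime(df["birth_date"])

# Split "numerator/denominator" into two columns and convert both to numbers.
parts = df["prize_share"].str.split("/", expand=True)
numerator = pd.to_numeric(parts[0])
denominator = pd.to_numeric(parts[1])

# Express the share as a percentage, e.g. "1/2" becomes 50.0.
df["share_pct"] = numerator / denominator * 100

print(df)
```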
Convert Year and Birth Date to Datetime¶
The birth_date entries are datatype: ---datetime64[ns]---
Add a Column with the Prize Share as a Percentage¶
| year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | share_pct | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Netherlands | Male | Berlin University | Berlin | Germany | NLD | 100.00 |
Plotly Donut Chart: Percentage of Male vs. Female Laureates¶
Of the 934 individually tracked prizes, 58 went to women — approximately 6.2%. The donut chart below makes the scale of that disparity immediately visible.
sex Male 876 Female 58 Name: count, dtype: int64
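As a quick check on the percentage quoted above, the shares can be recomputed directly from the two counts in the value_counts() output. This sketch only reproduces the arithmetic; it does not draw the chart.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({"Male": 876, "Female": 58}, name="count")

# Share of each group among laureates with a recorded sex, in percent.
pct = (counts / counts.sum() * 100).round(1)

print(pct)
```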
The First Three Female Nobel Laureates¶
Marie Curie (Physics, 1903), Bertha von Suttner (Peace, 1905), and Selma Lagerlöf (Literature, 1909) were the first three women to win. Curie and von Suttner were born in states whose names and borders have since changed (the Russian Empire and the Austrian Empire); Lagerlöf was born in Sweden.
| year | category | full_name | birth_city | birth_country | |
|---|---|---|---|---|---|
| 18 | 1903 | Physics | Marie Curie, née Sklodowska | Warsaw | Russian Empire (Poland) |
| 29 | 1905 | Peace | Baroness Bertha Sophie Felicita von Suttner, n... | Prague | Austrian Empire (Czech Republic) |
| 51 | 1909 | Literature | Selma Ottilia Lovisa Lagerlöf | Mårbacka | Sweden |
Six winners have received the Nobel Prize more than once — including Marie Curie (Physics 1903, Chemistry 1911) and Linus Pauling (Chemistry 1954, Peace 1962).
There are 6 winners who were awarded the prize more than once.
| year | category | laureate_type | full_name | |
|---|---|---|---|---|
| 18 | 1903 | Physics | Individual | Marie Curie, née Sklodowska |
| 62 | 1911 | Chemistry | Individual | Marie Curie, née Sklodowska |
| 89 | 1917 | Peace | Organization | Comité international de la Croix Rouge (Intern... |
| 215 | 1944 | Peace | Organization | Comité international de la Croix Rouge (Intern... |
| 278 | 1954 | Chemistry | Individual | Linus Carl Pauling |
| 283 | 1954 | Peace | Organization | Office of the United Nations High Commissioner... |
| 297 | 1956 | Physics | Individual | John Bardeen |
| 306 | 1958 | Chemistry | Individual | Frederick Sanger |
| 340 | 1962 | Peace | Individual | Linus Carl Pauling |
| 348 | 1963 | Peace | Organization | Comité international de la Croix Rouge (Intern... |
| 424 | 1972 | Physics | Individual | John Bardeen |
| 505 | 1980 | Chemistry | Individual | Frederick Sanger |
| 523 | 1981 | Peace | Organization | Office of the United Nations High Commissioner... |
Comité international de la Croix Rouge (International Committee of the Red Cross): Peace (1917), Peace (1944), Peace (1963)
Frederick Sanger: Chemistry (1958), Chemistry (1980)
John Bardeen: Physics (1956), Physics (1972)
Linus Carl Pauling: Chemistry (1954), Peace (1962)
Marie Curie, née Sklodowska: Physics (1903), Chemistry (1911)
Office of the United Nations High Commissioner for Refugees (UNHCR): Peace (1954), Peace (1981)
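One way to find repeat winners is to flag every row whose full_name appears more than once. The sketch below assumes names are spelled identically across awards, which holds for the rows shown above; the five-row sample is taken from those tables.

```python
import pandas as pd

# A five-row sample using rows shown in the tables above.
df = pd.DataFrame({
    "year": [1903, 1911, 1954, 1962, 1969],
    "category": ["Physics", "Chemistry", "Chemistry", "Peace", "Economics"],
    "full_name": [
        "Marie Curie, née Sklodowska",
        "Marie Curie, née Sklodowska",
        "Linus Carl Pauling",
        "Linus Carl Pauling",
        "Jan Tinbergen",
    ],
})

# keep=False marks every occurrence of a repeated name, not just the later ones.
is_repeat = df.duplicated(subset="full_name", keep=False)
multiple_winners = df[is_repeat]

# Number of distinct laureates with more than one prize.
n_repeat_winners = multiple_winners["full_name"].nunique()

print(f"There are {n_repeat_winners} winners who were awarded the prize more than once.")
```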
Prizes are awarded across six categories. Medicine leads with 222 prizes; Economics, introduced in 1969, has the fewest at 86.
6
category
Medicine      222
Physics       216
Chemistry     186
Peace         135
Literature    117
Economics      86
Name: count, dtype: int64
The Economics Prize — Awarded from 1969¶
Unlike the original five categories dating to 1901, the Economics prize was first awarded in 1969 to Jan Tinbergen and Ragnar Frisch.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 962 entries, 0 to 961
Data columns (total 17 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   year                   962 non-null    int64
 1   category               962 non-null    object
 2   prize                  962 non-null    object
 3   motivation             874 non-null    object
 4   prize_share            962 non-null    object
 5   laureate_type          962 non-null    object
 6   full_name              962 non-null    object
 7   birth_date             934 non-null    datetime64[ns]
 8   birth_city             931 non-null    object
 9   birth_country          934 non-null    object
 10  birth_country_current  934 non-null    object
 11  sex                    934 non-null    object
 12  organization_name      707 non-null    object
 13  organization_city      707 non-null    object
 14  organization_country   708 non-null    object
 15  ISO                    934 non-null    object
 16  share_pct              962 non-null    float64
dtypes: datetime64[ns](1), float64(1), int64(1), object(14)
memory usage: 127.9+ KB
| category | year | full_name | |
|---|---|---|---|
| 393 | Economics | 1969 | Jan Tinbergen |
| 394 | Economics | 1969 | Ragnar Frisch |
| 402 | Economics | 1970 | Paul A. Samuelson |
Peace and Literature show the highest proportion of female laureates. Physics has the lowest: only 4 women among 216 prizes.
| category | sex | prize | |
|---|---|---|---|
| 11 | Physics | Male | 212 |
| 7 | Medicine | Male | 210 |
| 1 | Chemistry | Male | 179 |
| 5 | Literature | Male | 101 |
| 9 | Peace | Male | 90 |
| 3 | Economics | Male | 84 |
| 8 | Peace | Female | 17 |
| 4 | Literature | Female | 16 |
| 6 | Medicine | Female | 12 |
| 0 | Chemistry | Female | 7 |
| 10 | Physics | Female | 4 |
| 2 | Economics | Female | 2 |
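A table like the one above can be produced with a two-key groupby. The sketch below uses a hypothetical six-row sample (the prize labels p1 to p6 are placeholders) and also computes the female share per category, which is the quantity the comparison between categories rests on.

```python
import pandas as pd

# Hypothetical mini-sample; one Peace row has no recorded sex (an organisation).
df = pd.DataFrame({
    "category": ["Physics", "Physics", "Physics", "Peace", "Peace", "Peace"],
    "sex": ["Male", "Male", "Female", "Male", "Female", None],
    "prize": ["p1", "p2", "p3", "p4", "p5", "p6"],
})

# Count prizes per (category, sex); groupby drops rows with a missing key.
cat_sex = (
    df.groupby(["category", "sex"], as_index=False)
      .agg({"prize": "count"})
      .sort_values("prize", ascending=False)
)

# Female share within each category, among laureates with a recorded sex.
known = df.dropna(subset=["sex"])
female_share = known["sex"].eq("Female").groupby(known["category"]).mean() * 100

print(cat_sex)
print(female_share)
```

Applied to the counts in the table above, the same calculation gives roughly 1.9% for Physics (4 of 216) and 2.3% for Economics (2 of 86), so Physics is lowest by proportion even though Economics has fewer women in absolute terms.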
Prize counts dipped during both World Wars. The rolling average reveals a steady increase from the 1950s onward, partly because more prizes are now shared among multiple recipients.
The secondary axis (inverted) plots the 5-year rolling average of prize share percentage. As annual prize counts rise, the average share per laureate falls — prizes are being split more widely.
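The two series behind such a chart can be sketched as follows, assuming a share_pct column like the one added earlier. The ten rows are made up for illustration. Note that rolling(window=5) counts rows, so it matches five calendar years only if every year is present in the index.

```python
import pandas as pd

# Made-up rows: one row per laureate, with that laureate's share in percent.
df = pd.DataFrame({
    "year": [1901, 1901, 1902, 1903, 1903, 1903, 1904, 1905, 1905, 1906],
    "share_pct": [100, 100, 100, 50, 50, 100, 100, 50, 50, 100],
})

# Number of laureate rows per year and the mean share per year.
prizes_per_year = df.groupby("year")["share_pct"].count()
avg_share = df.groupby("year")["share_pct"].mean()

# 5-row rolling means; the first four entries are NaN because the window is incomplete.
rolling_prizes = prizes_per_year.rolling(window=5).mean()
rolling_share = avg_share.rolling(window=5).mean()

print(rolling_prizes)
print(rolling_share)
```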
The Countries with the Most Nobel Prizes¶
Birth country at current borders is used rather than country at time of birth, reducing distortion from historical boundary changes. The United States leads with 281 prizes — more than twice the UK's 105.
| birth_country_current | prize | |
|---|---|---|
| 7 | Belgium | 9 |
| 31 | Hungary | 9 |
| 33 | India | 9 |
| 2 | Australia | 10 |
| 20 | Denmark | 12 |
| 54 | Norway | 12 |
| 13 | China | 12 |
| 51 | Netherlands | 18 |
| 3 | Austria | 18 |
| 39 | Italy | 19 |
| 68 | Switzerland | 19 |
| 11 | Canada | 20 |
| 61 | Russia | 26 |
| 40 | Japan | 27 |
| 57 | Poland | 27 |
| 67 | Sweden | 29 |
| 25 | France | 57 |
| 26 | Germany | 84 |
| 73 | United Kingdom | 105 |
| 74 | United States of America | 281 |
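A ranking like the one above can be built by grouping on birth_country_current and counting prizes. This is a sketch on a hypothetical seven-row sample (prize labels p1 to p7 are placeholders); the ascending sort mirrors the table, which lists the largest total last.

```python
import pandas as pd

# Hypothetical sample; the last row stands for an organisation with no birth country.
df = pd.DataFrame({
    "birth_country_current": [
        "United States of America", "United States of America", "United Kingdom",
        "Germany", "United States of America", "United Kingdom", None,
    ],
    "prize": ["p1", "p2", "p3", "p4", "p5", "p6", "p7"],
})

# Count prizes per country; rows with a missing country are dropped by groupby.
country_counts = (
    df.groupby("birth_country_current", as_index=False)
      .agg({"prize": "count"})
      .sort_values("prize")
)

# Keep the 20 largest totals (fewer if the sample has fewer countries).
top20_countries = country_counts.tail(20)

print(top20_countries)
```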
Use a Choropleth Map to Show the Number of Prizes Won by Country¶
- Create this choropleth map using the plotly documentation:
- Experiment with plotly's available colours. I quite like the sequential colour matter on this map.
Hint: You'll need to use a 3 letter country code for each country.
| birth_country_current | ISO | prize | |
|---|---|---|---|
| 74 | United States of America | USA | 281 |
| 73 | United Kingdom | GBR | 105 |
| 26 | Germany | DEU | 84 |
| 25 | France | FRA | 57 |
| 67 | Sweden | SWE | 29 |
Breaking down country totals by category reveals distinct national specialisms. Germany and Japan lag the US most sharply in Economics. France leads Germany in Literature and Peace.
| birth_country_current | category | prize | |
|---|---|---|---|
| 204 | United States of America | Medicine | 78 |
| 206 | United States of America | Physics | 70 |
| 201 | United States of America | Chemistry | 55 |
| 202 | United States of America | Economics | 49 |
| 198 | United Kingdom | Medicine | 28 |
| 195 | United Kingdom | Chemistry | 27 |
| 76 | Germany | Physics | 26 |
| 71 | Germany | Chemistry | 26 |
| 200 | United Kingdom | Physics | 24 |
| 205 | United States of America | Peace | 19 |
| birth_country_current | category | cat_prize | total_prize | |
|---|---|---|---|---|
| 109 | India | Physics | 1 | 9 |
| 77 | Hungary | Medicine | 2 | 9 |
| 71 | Hungary | Physics | 2 | 9 |
| 68 | India | Medicine | 2 | 9 |
| 67 | India | Literature | 2 | 9 |
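The cat_prize and total_prize columns above suggest a per-category count merged with a per-country total. The sketch below shows one way to do that; the five rows are hypothetical and the intermediate names cat_country and totals are placeholders, not names from the original notebook.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "birth_country_current": ["India", "India", "India", "Hungary", "Hungary"],
    "category": ["Physics", "Medicine", "Medicine", "Physics", "Physics"],
    "prize": ["p1", "p2", "p3", "p4", "p5"],
})

# Prizes per (country, category).
cat_country = (
    df.groupby(["birth_country_current", "category"], as_index=False)
      .agg({"prize": "count"})
      .rename(columns={"prize": "cat_prize"})
)

# Prizes per country across all categories.
totals = (
    df.groupby("birth_country_current", as_index=False)
      .agg({"prize": "count"})
      .rename(columns={"prize": "total_prize"})
)

# Attach each country's total to every one of its category rows.
merged = cat_country.merge(totals, on="birth_country_current")
merged = merged.sort_values("total_prize")

print(merged)
```

Carrying the total on every row lets a stacked bar chart be ordered by the country total while each segment is sized by the category count.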
Number of Prizes Won by Each Country Over Time¶
- When did the United States eclipse every other country in terms of the number of prizes won?
- Which country or countries were leading previously?
- Calculate the cumulative number of prizes won by each country in every year. Again, use the birth_country_current of the winner to calculate this.
- Create a plotly line chart where each country is a coloured line.
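The cumulative count in the steps above can be sketched as a per-year count followed by a running total within each country. The six rows are hypothetical, and the column name cumulative is a placeholder.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "year": [1901, 1901, 1902, 1903, 1903, 1904],
    "birth_country_current": [
        "Germany", "France", "Germany",
        "United States of America", "Germany", "United States of America",
    ],
    "prize": ["p1", "p2", "p3", "p4", "p5", "p6"],
})

# Prizes per country per year, ordered chronologically.
prize_by_year = (
    df.groupby(["birth_country_current", "year"], as_index=False)
      .agg({"prize": "count"})
      .sort_values("year")
)

# Running total of prizes within each country.
prize_by_year["cumulative"] = (
    prize_by_year.groupby("birth_country_current")["prize"].cumsum()
)

print(prize_by_year)
```

Because the frame is sorted by year before the cumulative sum, each country's running total increases in chronological order, which is what a line chart with one line per country needs.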
The top institutions are heavily US-dominated. Harvard University and the University of Chicago are the two leading affiliated organisations by laureate count.
New York stands out as the top research hub among all cities where prize-winning research took place.
New York is the most common birth city in the dataset. Among the top five cities, three are in the United States.
The sunburst chart reveals that US research clusters are especially concentrated in specific cities — particularly Boston and New York. Germany's and France's contributions are more spread across institutions.
The laureate's age in the year of the ceremony is computed from birth_date and year and stored as winning_age.
Index(['year', 'category', 'prize', 'motivation', 'prize_share',
'laureate_type', 'full_name', 'birth_date', 'birth_city',
'birth_country', 'birth_country_current', 'sex', 'organization_name',
'organization_city', 'organization_country', 'ISO', 'share_pct',
'winning_age'],
dtype='object')
| year | category | full_name | winning_age | |
|---|---|---|---|---|
| 0 | 1901 | Chemistry | Jacobus Henricus van 't Hoff | 49.00 |
| 1 | 1901 | Literature | Sully Prudhomme | 62.00 |
| 2 | 1901 | Medicine | Emil Adolf von Behring | 47.00 |
| 3 | 1901 | Peace | Frédéric Passy | 79.00 |
| 4 | 1901 | Peace | Jean Henry Dunant | 73.00 |
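A minimal sketch of the age calculation, assuming birth_date has already been parsed to datetime. It subtracts the birth year from the award year, which reproduces the first two values in the table above (49 and 62); whether the original notebook used exactly this expression is an assumption.

```python
import pandas as pd

# The first two rows of the dataset, as shown in the tables above.
df = pd.DataFrame({
    "year": [1901, 1901],
    "full_name": ["Jacobus Henricus van 't Hoff", "Sully Prudhomme"],
    "birth_date": pd.to_datetime(["1852-08-30", "1839-03-16"]),
})

# Age in the award year: award year minus birth year.
# This ignores whether the birthday falls before or after the ceremony.
df["winning_age"] = df["year"] - df["birth_date"].dt.year

print(df[["year", "full_name", "winning_age"]])
```

On the full dataset the column comes out as float (49.00 rather than 49) because organisations have no birth date and NaN forces a floating-point dtype.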
Malala Yousafzai is the youngest winner at 17 (Peace, 2014); John Goodenough is the oldest at 97 (Chemistry, 2019). The mean is approximately 60, and 75% of laureates were aged 69 or younger.
| year | category | full_name | winning_age | |
|---|---|---|---|---|
| 937 | 2019 | Chemistry | John Goodenough | 97.00 |
| year | category | full_name | winning_age | |
|---|---|---|---|---|
| 885 | 2014 | Peace | Malala Yousafzai | 17.00 |
Descriptive Statistics for the Laureate Age at Time of Award¶
- Calculate the descriptive statistics for the age at the time of the award.
- Then visualise the distribution in the form of a histogram using Seaborn's .histplot() function.
- Experiment with the bin size. Try 10, 20, 30, and 50.
count   934.00
mean     59.95
std      12.62
min      17.00
25%      51.00
50%      60.00
75%      69.00
max      97.00
Name: winning_age, dtype: float64
Age Trend Over Time¶
The LOWESS trend line shows winning ages around 55–57 in the early 20th century trending toward 65–70 today — consistent with prizes recognising work done decades before the award.
Winning Age Across the Nobel Prize Categories¶
How does the age of laureates vary by category?
- Use Seaborn's .boxplot() to show how the median, quartiles, max, and minimum values vary across categories. Which category has the longest "whiskers"?
- In which prize category are the average winners the oldest?
- In which prize category are the average winners the youngest?
Age Trends by Category¶
Separate panels per category reveal diverging trends. Some categories show ages rising sharply; Peace has the widest spread. The combined chart with hue makes cross-category comparison direct.
Key Findings¶
- US dominance: 281 prizes (about 30% of the 934 individual prizes), more than 2× the UK's 105 and more than 3× Germany's 84.
- Gender gap: 58 of 934 individually tracked prizes went to women (6.2%). Physics has the lowest proportion of female laureates: 4 out of 216 (1.9%). Economics has the fewest in absolute terms: 2 out of 86.
- Repeat winners: 6 winners received the prize more than once. Marie Curie is the only person to have won in two different scientific categories (Physics 1903, Chemistry 1911).
- Prize sharing: The average prize share per laureate has declined over time as prizes are increasingly split among multiple recipients.
- Laureate age: Mean winning age is ~60 years. The youngest winner was Malala Yousafzai at 17 (Peace, 2014); the oldest was John Goodenough at 97 (Chemistry, 2019).
- Age trend: The LOWESS regression shows winning ages have risen by roughly 8–10 years over the full 120-year span, consistent with prizes recognising earlier work.
- Economics: Newest category, first awarded 1969. Has the fewest prizes (86) and the highest proportion of US winners.
- World Wars: Both conflicts produced visible dips in the annual prize count, clearest in the 5-year rolling average.
- Research geography: The United States, United Kingdom, and Germany together account for the majority of affiliated institution records. Harvard and University of Chicago lead by laureate count.