Setup and Context

Introduction

On November 27, 1895, Alfred Nobel signed his last will in Paris. When it was opened after his death, the will caused a lot of controversy, as Nobel had left much of his wealth for the establishment of a prize.

Nobel dictated that his entire remaining estate should be used to endow “prizes to those who, during the preceding year, have conferred the greatest benefit to humankind”.

Every year, the Nobel Prize is awarded to scientists and scholars in six categories: chemistry, literature, physics, physiology or medicine, economics, and peace.


Let's see what patterns we can find in the data on past Nobel laureates. What can we learn about the Nobel Prize, and about our world more generally?

Upgrade plotly (Google Colab only)

Google Colab may not be running the latest version of plotly. If you're working in Google Colab, uncomment the line below, run the cell, and restart your notebook server.

Requirement already satisfied: plotly in /opt/anaconda3/envs/safety_mark1/lib/python3.12/site-packages (6.5.2)
Requirement already satisfied: narwhals>=1.15.1 in /opt/anaconda3/envs/safety_mark1/lib/python3.12/site-packages (from plotly) (2.15.0)
Requirement already satisfied: packaging in /opt/anaconda3/envs/safety_mark1/lib/python3.12/site-packages (from plotly) (24.2)
Note: you may need to restart the kernel to use updated packages.

Import Statements

Notebook Presentation
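The presentation cell is also missing. The two-decimal outputs later in the notebook (100.00, 49.00) are consistent with a pandas display option like the one below; treat it as an assumption about what the cell contained.

```python
import pandas as pd

# Display floats with two decimals and a thousands separator (assumed setting)
pd.options.display.float_format = "{:,.2f}".format

print(pd.Series([100.0, 49.0, 59.95]))
```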

Read the Data

year category prize motivation prize_share laureate_type full_name birth_date birth_city birth_country birth_country_current sex organization_name organization_city organization_country ISO
0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services ... 1/1 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Netherlands Male Berlin University Berlin Germany NLD
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit... 1/1 Individual Sully Prudhomme 1839-03-16 Paris France France Male NaN NaN NaN FRA
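The loading cell isn't shown either. A minimal sketch of the pattern, reading a small in-memory sample with a subset of the columns so it runs standalone; in the notebook you would pass the real file path instead (the name `nobel_prize_data.csv` in the comment is hypothetical):

```python
import io
import pandas as pd

# Stand-in for the real file. In the notebook this would be something like
# pd.read_csv("nobel_prize_data.csv"); that path is hypothetical.
csv_text = """year,category,full_name,birth_date,prize_share,sex,ISO
1901,Chemistry,Jacobus Henricus van 't Hoff,1852-08-30,1/1,Male,NLD
1901,Literature,Sully Prudhomme,1839-03-16,1/1,Male,FRA
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first rows, as in the preview above
```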

Caveat: The exact birth dates for Michael Houghton, Venkatraman Ramakrishnan, and Nadia Murad are unknown, so I've substituted a mid-year estimate of July 2.

Dataset Dimensions and Time Range

The dataset contains 962 records across 16 columns, spanning 1901 to 2020. Each row represents one prize share awarded to one laureate or organisation.
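Shape and year range can be confirmed in two lines. The rows below are illustrative placeholders, not the real data:

```python
import pandas as pd

# Illustrative rows only; the real frame is much larger
df = pd.DataFrame({
    "year": [1901, 1901, 2019, 2020],
    "category": ["Chemistry", "Literature", "Chemistry", "Medicine"],
})

n_rows, n_cols = df.shape                       # (rows, columns)
first, last = df["year"].min(), df["year"].max()
print(f"{n_rows} rows, {n_cols} columns, {first}-{last}")
```

On the full data, `df.shape` should report the 962 entries and 16 columns listed by `df.info()`.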

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 962 entries, 0 to 961
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   year                   962 non-null    int64 
 1   category               962 non-null    object
 2   prize                  962 non-null    object
 3   motivation             874 non-null    object
 4   prize_share            962 non-null    object
 5   laureate_type          962 non-null    object
 6   full_name              962 non-null    object
 7   birth_date             934 non-null    object
 8   birth_city             931 non-null    object
 9   birth_country          934 non-null    object
 10  birth_country_current  934 non-null    object
 11  sex                    934 non-null    object
 12  organization_name      707 non-null    object
 13  organization_city      707 non-null    object
 14  organization_country   708 non-null    object
 15  ISO                    934 non-null    object
dtypes: int64(1), object(15)
memory usage: 120.4+ KB

Completeness Assessment

The 28 missing values in birth_date, sex, and the birth-country columns correspond to organisation laureates, which are only eligible for the Peace Prize. Institution fields are empty where laureates worked independently or affiliation data is unavailable.

Check for Duplicates

np.False_
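The `np.False_` output is what `DataFrame.duplicated()` followed by `.values.any()` returns when no row is repeated; that this is the exact call used is an assumption. A laureate who won twice does not count as a duplicate, because the year and category differ:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1903, 1911],
    "category": ["Physics", "Chemistry"],
    "full_name": ["Marie Curie, née Sklodowska"] * 2,
})

# True only if some row matches another row in every column
has_duplicates = df.duplicated().values.any()
print(has_duplicates)  # prints False: same name, different year and category
```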

Check for NaN Values

np.True_
year                       0
category                   0
prize                      0
prize_share                0
laureate_type              0
full_name                  0
birth_date                28
birth_country             28
birth_country_current     28
sex                       28
ISO                       28
birth_city                31
motivation                88
organization_country     254
organization_name        255
organization_city        255
dtype: int64
full_name birth_date
24 Institut de droit international (Institute of International Law) NaN
60 Bureau international permanent de la Paix (Permanent International Peace Bureau) NaN
89 Comité international de la Croix Rouge (International Committee of the Red Cross) NaN
200 Office international Nansen pour les Réfugiés (Nansen International Office for Refugees) NaN
215 Comité international de la Croix Rouge (International Committee of the Red Cross) NaN
     year  category    full_name                          birth_city  birth_country   birth_country_current
725  2001  Literature  Sir Vidiadhar Surajprasad Naipaul  NaN         Trinidad        Trinidad
837  2010  Peace       Liu Xiaobo                         NaN         China           China
957  2020  Medicine    Michael Houghton                   NaN         United Kingdom  United Kingdom
     full_name              category   organization_country      organization_city  organization_name
627  Kary B. Mullis         Chemistry  United States of America  La Jolla, CA       NaN
705  Martinus J.G. Veltman  Physics    Netherlands               Bilthoven          NaN
721  William S. Knowles     Chemistry  United States of America  St. Louis, MO      NaN
777  J. Robin Warren        Medicine   Australia                 Perth              NaN
831  Richard F. Heck        Chemistry  United States of America  NaN                University of Delaware
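The outputs above follow a common three-step pattern: check whether anything is missing, count missing values per column, then pull the affected rows. A sketch on two illustrative rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "full_name": ["Sully Prudhomme",
                  "Institut de droit international (Institute of International Law)"],
    "birth_date": ["1839-03-16", np.nan],
    "organization_name": [np.nan, np.nan],
})

print(df.isna().values.any())          # is anything missing at all?
print(df.isna().sum().sort_values())   # missing count per column, ascending
# Which rows lack a birth date?
print(df.loc[df["birth_date"].isna(), ["full_name", "birth_date"]])
```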

Type Conversions and Feature Engineering

Parsing birth_date to datetime enables age calculations. The prize_share fraction (e.g. 1/2) is converted to a percentage (50.0) to support analysis of how prize shares change over time.

Convert Birth Date to Datetime

After conversion, the birth_date entries have dtype datetime64[ns]; year remains an integer column.
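The conversion is a single `pd.to_datetime` call; a sketch, assuming the raw column holds ISO-formatted strings like `1852-08-30` as in the preview:

```python
import pandas as pd

df = pd.DataFrame({"birth_date": ["1852-08-30", "1839-03-16", None]})

# Parse ISO date strings; missing entries (organisations) become NaT
df["birth_date"] = pd.to_datetime(df["birth_date"])
print(df["birth_date"].dtype)
```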

Add a Column with the Prize Share as a Percentage
year category prize motivation prize_share laureate_type full_name birth_date birth_city birth_country birth_country_current sex organization_name organization_city organization_country ISO share_pct
0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services ... 1/1 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Netherlands Male Berlin University Berlin Germany NLD 100.00
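One way to derive `share_pct` from strings such as `1/2` is to split on the slash and divide. The notebook's exact code isn't shown, so treat this as a sketch:

```python
import pandas as pd

df = pd.DataFrame({"prize_share": ["1/1", "1/2", "1/4", "1/3"]})

# Split "numerator/denominator" into two integer columns, then divide
parts = df["prize_share"].str.split("/", expand=True).astype(int)
df["share_pct"] = parts[0] / parts[1] * 100

print(df)
```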

Plotly Donut Chart: Percentage of Male vs. Female Laureates

Of the 934 prizes awarded to individuals, 58 went to women, approximately 6.2%. The donut chart below makes the scale of that disparity immediately visible.

sex
Male      876
Female     58
Name: count, dtype: int64

The First Three Female Nobel Laureates

Marie Curie (Physics, 1903), Bertha von Suttner (Peace, 1905), and Selma Lagerlöf (Literature, 1909) were the first three women to win. Curie and von Suttner were born in empires that no longer exist (Warsaw was then in the Russian Empire, Prague in the Austrian Empire), so birth_country records both the historical state and the current one; Lagerlöf was born in Sweden.

    year  category    full_name                                          birth_city  birth_country
18  1903  Physics     Marie Curie, née Sklodowska                        Warsaw      Russian Empire (Poland)
29  1905  Peace       Baroness Bertha Sophie Felicita von Suttner, n...  Prague      Austrian Empire (Czech Republic)
51  1909  Literature  Selma Ottilia Lovisa Lagerlöf                      Mårbacka    Sweden
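Filtering to women and sorting by year yields the first three laureates; a sketch on a few illustrative rows (von Suttner's name shortened):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1909, 1903, 1905, 1901],
    "category": ["Literature", "Physics", "Peace", "Chemistry"],
    "full_name": ["Selma Ottilia Lovisa Lagerlöf", "Marie Curie, née Sklodowska",
                  "Bertha von Suttner", "Jacobus Henricus van 't Hoff"],
    "sex": ["Female", "Female", "Female", "Male"],
})

# Women only, earliest first, first three rows
first_women = df[df["sex"] == "Female"].sort_values("year").head(3)
print(first_women[["year", "category", "full_name"]])
```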

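Six laureates have received the Nobel Prize more than once: four individuals, including Marie Curie (Physics 1903, Chemistry 1911) and Linus Pauling (Chemistry 1954, Peace 1962), and two organisations, the International Committee of the Red Cross and the UNHCR.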

There are 6 winners who were awarded the prize more than once.
year category laureate_type full_name
18 1903 Physics Individual Marie Curie, née Sklodowska
62 1911 Chemistry Individual Marie Curie, née Sklodowska
89 1917 Peace Organization Comité international de la Croix Rouge (Intern...
215 1944 Peace Organization Comité international de la Croix Rouge (Intern...
278 1954 Chemistry Individual Linus Carl Pauling
283 1954 Peace Organization Office of the United Nations High Commissioner...
297 1956 Physics Individual John Bardeen
306 1958 Chemistry Individual Frederick Sanger
340 1962 Peace Individual Linus Carl Pauling
348 1963 Peace Organization Comité international de la Croix Rouge (Intern...
424 1972 Physics Individual John Bardeen
505 1980 Chemistry Individual Frederick Sanger
523 1981 Peace Organization Office of the United Nations High Commissioner...
Comité international de la Croix Rouge (International Committee of the Red Cross): Peace (1917), Peace (1944), Peace (1963)
Frederick Sanger: Chemistry (1958), Chemistry (1980)
John Bardeen: Physics (1956), Physics (1972)
Linus Carl Pauling: Chemistry (1954), Peace (1962)
Marie Curie, née Sklodowska: Physics (1903), Chemistry (1911)
Office of the United Nations High Commissioner for Refugees (UNHCR): Peace (1954), Peace (1981)
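Repeat winners can be found by flagging every row whose `full_name` appears more than once (`keep=False`) and then grouping by name. A sketch on illustrative rows, printing one summary line per repeat winner:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1903, 1911, 1954, 1962, 1956],
    "category": ["Physics", "Chemistry", "Chemistry", "Peace", "Physics"],
    "full_name": ["Marie Curie, née Sklodowska", "Marie Curie, née Sklodowska",
                  "Linus Carl Pauling", "Linus Carl Pauling", "John Bardeen"],
})

# keep=False marks every occurrence of a repeated name, not just the later ones
repeats = df[df.duplicated(subset="full_name", keep=False)]
print(f"There are {repeats['full_name'].nunique()} winners who were awarded the prize more than once.")

for name, group in repeats.groupby("full_name"):
    awards = ", ".join(f"{c} ({y})" for c, y in zip(group["category"], group["year"]))
    print(f"{name}: {awards}")
```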

Prizes are awarded across six categories. Medicine leads with 222 prizes; Economics, introduced in 1969, has the fewest at 86.

6
category
Medicine      222
Physics       216
Chemistry     186
Peace         135
Literature    117
Economics      86
Name: count, dtype: int64
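The category count and the per-category totals come from `nunique()` and `value_counts()`; a sketch on a handful of illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({"category": ["Medicine", "Physics", "Medicine",
                                "Economics", "Peace", "Physics"]})

print(df["category"].nunique())        # number of distinct categories
print(df["category"].value_counts())   # rows per category, largest first
```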

The Economics Prize — Awarded from 1969

Unlike the original five categories dating to 1901, the Economics prize was first awarded in 1969 to Jan Tinbergen and Ragnar Frisch.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 962 entries, 0 to 961
Data columns (total 17 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   year                   962 non-null    int64         
 1   category               962 non-null    object        
 2   prize                  962 non-null    object        
 3   motivation             874 non-null    object        
 4   prize_share            962 non-null    object        
 5   laureate_type          962 non-null    object        
 6   full_name              962 non-null    object        
 7   birth_date             934 non-null    datetime64[ns]
 8   birth_city             931 non-null    object        
 9   birth_country          934 non-null    object        
 10  birth_country_current  934 non-null    object        
 11  sex                    934 non-null    object        
 12  organization_name      707 non-null    object        
 13  organization_city      707 non-null    object        
 14  organization_country   708 non-null    object        
 15  ISO                    934 non-null    object        
 16  share_pct              962 non-null    float64       
dtypes: datetime64[ns](1), float64(1), int64(1), object(14)
memory usage: 127.9+ KB
category year full_name
393 Economics 1969 Jan Tinbergen
394 Economics 1969 Ragnar Frisch
402 Economics 1970 Paul A. Samuelson
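Filtering to Economics and sorting by year recovers the first laureates; a sketch on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1970, 1969, 1969, 1901],
    "category": ["Economics", "Economics", "Economics", "Chemistry"],
    "full_name": ["Paul A. Samuelson", "Jan Tinbergen", "Ragnar Frisch",
                  "Jacobus Henricus van 't Hoff"],
})

# Earliest Economics laureates
econ = df[df["category"] == "Economics"].sort_values("year")
print(econ[["category", "year", "full_name"]].head(3))
print(econ["year"].min())
```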

Peace and Literature show the highest proportion of female laureates. Physics has the lowest: only 4 women among 216 prizes.

category sex prize
11 Physics Male 212
7 Medicine Male 210
1 Chemistry Male 179
5 Literature Male 101
9 Peace Male 90
3 Economics Male 84
8 Peace Female 17
4 Literature Female 16
6 Medicine Female 12
0 Chemistry Female 7
10 Physics Female 4
2 Economics Female 2
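The table above gives raw counts (likely from a `groupby(["category", "sex"])` count, though the exact call isn't shown), which can mislead when category sizes differ. Converting to the female share within each category, using counts copied from that table (individual laureates only, so Peace excludes organisations), makes the comparison explicit:

```python
import pandas as pd

# Counts copied from the table above
counts = pd.DataFrame({
    "category": ["Physics", "Physics", "Economics", "Economics", "Peace", "Peace"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "prize": [212, 4, 84, 2, 90, 17],
})

# One row per category, one column per sex, then the female percentage
wide = counts.pivot(index="category", columns="sex", values="prize")
wide["female_pct"] = wide["Female"] / (wide["Female"] + wide["Male"]) * 100
print(wide.sort_values("female_pct").round(1))
```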

Prize counts dipped during both World Wars. The rolling average reveals a steady increase from the 1950s onward, partly because more prizes are now shared among multiple recipients.
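The yearly counts and their rolling average can be computed with `groupby(...).size()` and `rolling(window=5).mean()`. A sketch on made-up counts (not the real series); the first four years are NaN because a full 5-year window is needed:

```python
import pandas as pd

# Illustrative laureate rows: 6, 7, 7, 6, 5, 6 prizes in 1901-1906
years = [1901] * 6 + [1902] * 7 + [1903] * 7 + [1904] * 6 + [1905] * 5 + [1906] * 6
df = pd.DataFrame({"year": years})

prize_per_year = df.groupby("year").size()          # laureates per year
rolling = prize_per_year.rolling(window=5).mean()   # 5-year moving average
print(rolling)
```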


The secondary axis (inverted) plots the 5-year rolling average of the prize share percentage. As annual prize counts rise, the average share per laureate falls: prizes are being split among more recipients.
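The share line is the yearly mean of `share_pct`, smoothed the same way. In matplotlib, the secondary axis would come from `ax.twinx()` and the inversion from `invert_yaxis()`; those plotting lines are omitted here, and the window is shortened to 2 so the toy data stays small:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1901, 1901, 1902, 1902, 1902, 1903, 1903, 1903, 1903],
    "share_pct": [100.0, 100.0, 50.0, 50.0, 100.0, 25.0, 25.0, 50.0, 100.0],
})

# Mean share per laureate in each year, then a rolling mean
# (the notebook uses a 5-year window; 2 keeps this example short)
yearly_share = df.groupby("year")["share_pct"].mean()
rolling_share = yearly_share.rolling(window=2).mean()
print(rolling_share)
```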


The Countries with the Most Nobel Prizes

Birth country at current borders is used rather than country at time of birth, reducing distortion from historical boundary changes. The United States leads with 281 prizes, more than twice the UK's 105.

birth_country_current prize
7 Belgium 9
31 Hungary 9
33 India 9
2 Australia 10
20 Denmark 12
54 Norway 12
13 China 12
51 Netherlands 18
3 Austria 18
39 Italy 19
68 Switzerland 19
11 Canada 20
61 Russia 26
40 Japan 27
57 Poland 27
67 Sweden 29
25 France 57
26 Germany 84
73 United Kingdom 105
74 United States of America 281
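Counting rows per `birth_country_current` and keeping the 20 largest produces a table of this shape. A sketch on illustrative rows; grouping drops the missing countries (organisation laureates) automatically:

```python
import pandas as pd

df = pd.DataFrame({"birth_country_current": [
    "United States of America", "United States of America", "United Kingdom",
    "Germany", "United States of America", None]})

# Prizes per current birth country, ascending so the leader is last
top_countries = (df.groupby("birth_country_current").size()
                   .rename("prize")
                   .sort_values()
                   .tail(20))
print(top_countries)
```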

Use a Choropleth Map to Show the Number of Prizes Won by Country

  • Create this choropleth map using the plotly documentation.
  • Experiment with plotly's available colour scales. I quite like the sequential colour scale matter on this map.

Hint: You'll need to use a three-letter (ISO 3166-1 alpha-3) country code for each country.

birth_country_current ISO prize
74 United States of America USA 281
73 United Kingdom GBR 105
26 Germany DEU 84
25 France FRA 57
67 Sweden SWE 29

Breaking down country totals by category reveals distinct national specialisms. Germany and Japan lag the US most sharply in Economics. France leads Germany in Literature and Peace.

birth_country_current category prize
204 United States of America Medicine 78
206 United States of America Physics 70
201 United States of America Chemistry 55
202 United States of America Economics 49
198 United Kingdom Medicine 28
195 United Kingdom Chemistry 27
76 Germany Physics 26
71 Germany Chemistry 26
200 United Kingdom Physics 24
205 United States of America Peace 19
birth_country_current category cat_prize total_prize
109 India Physics 1 9
77 Hungary Medicine 2 9
71 Hungary Physics 2 9
68 India Medicine 2 9
67 India Literature 2 9
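The second table combines per-category counts with each country's total so that smaller countries can be inspected. One way to build it is a groupby followed by a merge; the column names `cat_prize` and `total_prize` come from the output, the rest is an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "birth_country_current": ["India", "India", "India", "Hungary", "Hungary"],
    "category": ["Physics", "Medicine", "Medicine", "Physics", "Medicine"],
})

# Prizes per (country, category)
cat_country = (df.groupby(["birth_country_current", "category"])
                 .size().reset_index(name="cat_prize"))
# Total prizes per country, merged back for context and sorting
totals = df.groupby("birth_country_current").size().reset_index(name="total_prize")
merged = cat_country.merge(totals, on="birth_country_current").sort_values("total_prize")
print(merged)
```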

Number of Prizes Won by Each Country Over Time

  • When did the United States eclipse every other country in terms of the number of prizes won?
  • Which country or countries were leading previously?
  • Calculate the cumulative number of prizes won by each country in every year. Again, use the birth_country_current of the winner to calculate this.
  • Create a plotly line chart where each country is a coloured line.
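A sketch of the cumulative calculation: count prizes per country and year, then take a running sum within each country. The rows are illustrative, and years without a prize simply have no row, so a line chart draws straight segments across them:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1901, 1901, 1902, 1903, 1903, 1903],
    "birth_country_current": ["France", "Germany", "Germany",
                              "France", "United States of America", "France"],
})

# Prizes per country per year, then a running total within each country
per_year = (df.groupby(["birth_country_current", "year"])
              .size().reset_index(name="prize"))
per_year["cumulative"] = per_year.groupby("birth_country_current")["prize"].cumsum()
print(per_year)

# One line per country, e.g.:
# px.line(per_year, x="year", y="cumulative", color="birth_country_current")
```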

The top institutions are heavily US-dominated. Harvard University and the University of Chicago are the two leading affiliated organisations by laureate count.
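The institution ranking is a `value_counts()` on `organization_name`; a sketch on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({"organization_name": [
    "Harvard University", "University of Chicago", "Harvard University",
    None, "Berlin University"]})

# Most frequent affiliations; value_counts ignores missing values
top_orgs = df["organization_name"].value_counts().head(20)
print(top_orgs)
```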

New York stands out as the top research hub among all cities where prize-winning research took place.

New York is the most common birth city in the dataset. Among the top five cities, three are in the United States.
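The research-city and birth-city rankings use the same approach on `organization_city` and `birth_city`; a sketch on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "organization_city": ["New York, NY", "Berlin", "New York, NY", None],
    "birth_city": ["Rotterdam", "Paris", "New York, NY", "New York, NY"],
})

# Where prize-winning research happened vs. where laureates were born
top_research_cities = df["organization_city"].value_counts().head(20)
top_birth_cities = df["birth_city"].value_counts().head(20)
print(top_research_cities.head(1), top_birth_cities.head(1), sep="\n")
```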

The sunburst chart reveals that US research is especially concentrated in a few cities, particularly Boston and New York, whereas Germany's and France's contributions are spread more evenly across institutions.

The laureate's age in the year of the ceremony is computed from birth_date and year and stored as winning_age.

Index(['year', 'category', 'prize', 'motivation', 'prize_share',
       'laureate_type', 'full_name', 'birth_date', 'birth_city',
       'birth_country', 'birth_country_current', 'sex', 'organization_name',
       'organization_city', 'organization_country', 'ISO', 'share_pct',
       'winning_age'],
      dtype='object')
year category full_name winning_age
0 1901 Chemistry Jacobus Henricus van 't Hoff 49.00
1 1901 Literature Sully Prudhomme 62.00
2 1901 Medicine Emil Adolf von Behring 47.00
3 1901 Peace Frédéric Passy 79.00
4 1901 Peace Jean Henry Dunant 73.00
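The first rows above (49 for van 't Hoff, born 1852 and awarded in 1901) are consistent with subtracting the birth year from the award year, ignoring month and day. A sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1901, 1901],
    "full_name": ["Jacobus Henricus van 't Hoff", "Sully Prudhomme"],
    "birth_date": pd.to_datetime(["1852-08-30", "1839-03-16"]),
})

# Age in the award year: award year minus birth year (month and day ignored)
df["winning_age"] = df["year"] - df["birth_date"].dt.year
print(df[["year", "full_name", "winning_age"]])
```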

Malala Yousafzai is the youngest winner at 17 (Peace, 2014) and John Goodenough the oldest at 97 (Chemistry, 2019). The mean winning age is approximately 60, and three quarters of laureates were 69 or younger when they won.

year category full_name winning_age
937 2019 Chemistry John Goodenough 97.00
year category full_name winning_age
885 2014 Peace Malala Yousafzai 17.00
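`nlargest` and `nsmallest` pick out the extremes; a sketch using rows from the outputs above:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2019, 2014, 1901],
    "category": ["Chemistry", "Peace", "Literature"],
    "full_name": ["John Goodenough", "Malala Yousafzai", "Sully Prudhomme"],
    "winning_age": [97.0, 17.0, 62.0],
})

oldest = df.nlargest(n=1, columns="winning_age")
youngest = df.nsmallest(n=1, columns="winning_age")
print(oldest, youngest, sep="\n")
```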

Descriptive Statistics for the Laureate Age at Time of Award

  • Calculate the descriptive statistics for the age at the time of the award.
  • Then visualise the distribution in the form of a histogram using Seaborn's .histplot() function.
  • Experiment with the number of bins: try 10, 20, 30, and 50.
count   934.00
mean     59.95
std      12.62
min      17.00
25%      51.00
50%      60.00
75%      69.00
max      97.00
Name: winning_age, dtype: float64
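`Series.describe()` produces the summary above. For the bin experiment, the plot itself would be `sns.histplot(data=df, x="winning_age", bins=...)`; this sketch instead shows, on illustrative ages, how the bin width shrinks as the bin count grows:

```python
import numpy as np
import pandas as pd

# Illustrative ages only; the real column has 934 non-null values
ages = pd.Series([17, 49, 62, 47, 79, 73, 97, 60, 55, 68],
                 name="winning_age", dtype=float)

print(ages.describe())

# More bins means narrower bins: width = (max - min) / bins
for bins in (10, 20, 30, 50):
    counts, edges = np.histogram(ages, bins=bins)
    print(bins, "bins, width", round(edges[1] - edges[0], 2))
```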

Age Trend Over Time

The LOWESS trend line shows winning ages of around 55–57 in the early 20th century rising toward 65–70 today, consistent with prizes recognising work done decades before the award.
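Seaborn's `regplot(..., lowess=True)` delegates the smoothing to statsmodels. A sketch calling statsmodels' `lowess` directly on synthetic data (not the laureate data) with a built-in upward drift; `frac=0.5` is an arbitrary choice:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic ages drifting upward by 0.1 year per award year, plus noise
rng = np.random.default_rng(0)
years = np.arange(1901, 2021)
ages = 55 + 0.1 * (years - 1901) + rng.normal(0, 8, size=years.size)

# frac is the share of points used in each local fit (larger = smoother)
smoothed = lowess(ages, years, frac=0.5)   # columns: sorted x, fitted y
print(smoothed[0], smoothed[-1])
```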


Winning Age Across the Nobel Prize Categories

How does the age of laureates vary by category?

  • Use Seaborn's .boxplot() to show how the median, quartiles, and overall spread of ages vary across categories. Which category has the longest "whiskers"?
  • In which prize category are the average winners the oldest?
  • In which prize category are the average winners the youngest?
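The boxplot call would be along the lines of `sns.boxplot(data=df, x="category", y="winning_age")`. The average-age questions can be answered numerically with a groupby; the ages below are illustrative, so the printed ranking says nothing about the real data:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Physics", "Physics", "Peace", "Peace", "Economics", "Economics"],
    "winning_age": [35.0, 55.0, 17.0, 79.0, 66.0, 70.0],
})

# Mean winning age per category, oldest first
mean_age = df.groupby("category")["winning_age"].mean().sort_values(ascending=False)
print(mean_age)
print("oldest:", mean_age.idxmax(), "youngest:", mean_age.idxmin())
```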

Age Trends by Category

Separate panels per category reveal diverging trends: some categories show ages rising sharply, and Peace has the widest spread. Plotting all categories on one chart, coloured by category with hue, makes cross-category comparison easier.
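The per-category panels are typical of Seaborn's `lmplot(..., row="category", lowess=True)`, though the exact call isn't shown. As a tabular counterpart rather than a regression, this sketch averages age by category and half-century on illustrative rows:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1905, 1915, 1995, 2005, 1908, 2012],
    "category": ["Physics", "Physics", "Physics", "Physics", "Peace", "Peace"],
    "winning_age": [50.0, 54.0, 66.0, 70.0, 65.0, 45.0],
})

# Bucket award years into half-centuries (1900, 1950, 2000)
df["period"] = (df["year"] // 50) * 50
table = df.pivot_table(index="period", columns="category",
                       values="winning_age", aggfunc="mean")
print(table)
```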


Key Findings

  • US dominance: 281 prizes (29% of all 962 prizes, or 30% of the 934 awarded to individuals), more than 2× the UK's 105 and more than 3× Germany's 84.
  • Gender gap: 58 of the 934 prizes awarded to individuals went to women (6.2%). Physics has the fewest female laureates: 4 out of 216.
  • Repeat winners: Six laureates (four people and two organisations) received the prize more than once. Marie Curie is the only person to have won in two different scientific categories (Physics 1903, Chemistry 1911).
  • Prize sharing: The average prize share per laureate has declined over time as prizes are increasingly split among multiple recipients.
  • Laureate age: Mean winning age is ~60 years. The youngest winner was Malala Yousafzai at 17 (Peace, 2014); the oldest was John Goodenough at 97 (Chemistry, 2019).
  • Age trend: The LOWESS regression shows winning ages have risen by roughly 8–10 years over the full 120-year span, consistent with prizes recognising earlier work.
  • Economics: Newest category, first awarded 1969. Has the fewest prizes (86) and the highest proportion of US winners.
  • World Wars: Both conflicts produced visible dips in the annual prize count, clearest in the 5-year rolling average.
  • Research geography: The United States, United Kingdom, and Germany together account for the majority of affiliated institution records. Harvard and University of Chicago lead by laureate count.