Open data resources spanning multiple areas of science
I’m interested in data sets that span a wide range of topics, including but not limited to social, climate, energy, sports, and healthcare.
Some of the useful data sources I’ve looked at to understand the world around me better are listed below in no particular order:
-
The World Bank provides a data catalog that is versatile and searchable. Specifically, this URL is useful for downloading a particular data set and analyzing it with R / Python.
The R package wbstats provides an interface to access the data.
-
The Air Quality Life Index (AQLI) estimates how much longer we could live if we breathed clean air. The data is broken down to the regional level.
-
The National Oceanic and Atmospheric Administration (NOAA) provides free access to archives of global historical weather and climate data.
-
The dataset is the World Health Organization’s principal health statistics repository. It contains statistics for more than 1000 indicators for 194 member states of the WHO.
-
The datasets by FiveThirtyEight range from sports to politics. Some of the data is well-curated as CSV files and ready to use. I specifically like their polling data and soccer data.
-
UN Data brings together a variety of data resources compiled by the UN. They cover a vast range of statistical themes, from agriculture to trade, grouped by country.
-
The Happy Planet Index (HPI) is another unique dataset, which measures “sustainable wellbeing” by country. I was surprised to find that Costa Rica ranked first on the HPI.
Soccer Power Index (SPI) - FiveThirtyEight
The FiveThirtyEight SPI is powered mainly by the R package engsoccerdata. The SPI provides predictions for the soccer world and is used by platforms such as ESPN.
-
The “living while black” dataset was collected by Baratunde Thurston. He talks about this topic in a fantastic TED talk, and I encourage everyone to visit https://www.baratunde.com/livingwhileblack. This is a unique dataset.
-
The Gapminder dataset accompanies the fantastic book Factfulness. The goal of the Gapminder dataset and the book is to provide a more fact-based view of the world. The overdramatized picture often shown in the media does not particularly help our worldview.
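For the "download a data set and analyze it with R / Python" step mentioned under the World Bank catalog, here is a rough Python sketch of a first step I find handy: reshaping a wide, one-column-per-year CSV into long rows. The column names ("Country Name", "Indicator Code"), the optional `skip_rows` for metadata lines, and the function name `wide_to_long` are my assumptions about a typical export, not something the catalog guarantees, and the sample values are made up purely to show the shape.

```python
import csv


def wide_to_long(csv_text, skip_rows=0):
    """Turn a wide CSV (one column per year) into a list of long-format rows.

    Assumes the header contains "Country Name" and "Indicator Code" plus
    four-digit year columns; adjust these names to match the actual file.
    """
    lines = csv_text.splitlines()[skip_rows:]
    reader = csv.DictReader(lines)
    records = []
    for row in reader:
        for column, value in row.items():
            # Keep only columns whose header looks like a year, e.g. "2019".
            if column is None or not (column.isdigit() and len(column) == 4):
                continue
            # Skip missing observations (empty cells or short rows).
            if value is None or value.strip() == "":
                continue
            records.append({
                "country": row["Country Name"],
                "indicator": row["Indicator Code"],
                "year": int(column),
                "value": float(value),
            })
    return records


# Made-up placeholder data, only to illustrate the layout.
SAMPLE = """Country Name,Country Code,Indicator Name,Indicator Code,2018,2019
Exampleland,EXL,Example indicator,EX.IND,1.5,
Sampleland,SML,Example indicator,EX.IND,2.0,2.5
"""

rows = wide_to_long(SAMPLE)
for r in rows:
    print(r["country"], r["year"], r["value"])
```

With a real download, you would read the file's text and pass it in, setting `skip_rows` if the file starts with metadata lines before the header. Long rows like these are usually easier to filter, group, and plot than the wide layout.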