
I watch a lot of movies

Jul 8, 2020 · Last updated: Aug 7, 2020
Data & Viz. in 📚 ref

⚠️ This page is currently under migration. The VEGA charts will not render.

For better or for worse, I have watched over 900 titles. It seems natural to put that data to good use, at least academically. I expect this post to serve three main purposes:

  1. a way for me to ask as many questions as possible on a simple dataset to keep the creative juices flowing

  2. to keep thinking about informative ways to visualize the data that is all around us

  3. a quick reference for Altair, a declarative charting library


IMDb allows users to rate any movie or TV show on an integer scale of 1 to 10. Conveniently, it also collects these ratings into a list, which I have made public.

I wrote a tiny web spider using Playwright which collects some basic information into a CSV file: title, release year, genres, ratings (including mine), and total number of votes. The code and data are available at activatedgeek/imdb-ratings. The spider is a pretty straightforward set of CSS selectors. I do some further organization in a Jupyter notebook using Pandas DataFrames to make charting easier.
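To give a flavor of what such a spider might look like, here is a minimal sketch using Playwright's synchronous API. It is not the code from activatedgeek/imdb-ratings: the selectors, field names, and helper functions (`clean_row`, `scrape`, `write_csv`) are all hypothetical and would need to be adapted to the actual page markup.

```python
import csv
from typing import Dict, List

# Hypothetical selectors and column names, for illustration only; the real
# spider lives in activatedgeek/imdb-ratings and may look quite different.
ITEM_SELECTOR = ".lister-item"
FIELD_SELECTORS = {
    "title": ".lister-item-header a",
    "year": ".lister-item-year",
    "genres": ".genre",
}


def clean_row(raw: Dict[str, str]) -> Dict[str, str]:
    """Strip whitespace and drop parentheses around the year, e.g. '(1999)'."""
    row = {key: value.strip() for key, value in raw.items()}
    row["year"] = row.get("year", "").strip("()")
    return row


def scrape(url: str) -> List[Dict[str, str]]:
    """Collect one row per list item using CSS selectors."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    rows = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        for item in page.query_selector_all(ITEM_SELECTOR):
            raw = {}
            for field, selector in FIELD_SELECTORS.items():
                node = item.query_selector(selector)
                raw[field] = node.inner_text() if node else ""
            rows.append(clean_row(raw))
        browser.close()
    return rows


def write_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write the scraped rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(FIELD_SELECTORS))
        writer.writeheader()
        writer.writerows(rows)
```

Keeping the selectors in one dictionary means a markup change on the page only touches a single place, and the cleaning step can be tested without launching a browser.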

My rule of thumb for storing or organizing data is to use the format I would design for a typical relational database. All downstream analysis can then be summarized, more or less, as operations in relational algebra: Cartesian product, projection, selection, union, and difference. The more SQL-esque counterparts are table merges and joins.
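As a rough illustration of how these operations map onto Pandas, here is a small sketch on toy data. The column names and rows are made up for the example and are not the schema of the actual CSV; the cross merge assumes pandas 1.2 or newer.

```python
import pandas as pd

# Toy rows standing in for the scraped CSV; the column names are assumptions.
raw = pd.DataFrame({
    "title": ["Alien", "Heat", "Up"],
    "year": [1979, 1995, 2009],
    "genres": ["Horror, Sci-Fi", "Crime, Drama", "Animation, Adventure"],
    "my_rating": [9, 8, 8],
})

# Normalize into two tables: titles, and (title, genre) pairs.
titles = raw[["title", "year", "my_rating"]]  # projection
title_genres = (
    raw[["title", "genres"]]
    .assign(genre=raw["genres"].str.split(", "))
    .explode("genre")[["title", "genre"]]
    .reset_index(drop=True)
)

# Selection: titles released before 2000.
older = titles[titles["year"] < 2000]

# Join: attach genres back to the selected titles.
older_genres = older.merge(title_genres, on="title", how="inner")

# Union and difference between two selections of the same table.
rated_nine = titles[titles["my_rating"] == 9]
union = pd.concat([older, rated_nine]).drop_duplicates()
difference = older[~older["title"].isin(rated_nine["title"])]

# Cartesian product of years and genres (requires pandas >= 1.2),
# e.g. to build a complete grid for a heatmap.
grid = titles[["year"]].drop_duplicates().merge(
    title_genres[["genre"]].drop_duplicates(), how="cross"
)
```

Splitting the comma-separated genres into their own table is what makes per-genre counts and joins straightforward later on.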


This is a list of questions I've thought of visualizing so far for a qualitative inspection of the statistics.

Do not forget to scroll, as some charts may be larger than they appear.

Count based

Movies watched by release year

specUrl: /vega/hist_year.alt.json

Genre heatmap by release year

specUrl: /vega/heatmap_genre.alt.json

Votes based

Distribution of votes

specUrl: /vega/hist_votes.alt.json

Distribution of votes by year

specUrl: /vega/hist_votes_year.alt.json

Ratings based

Distribution over all ratings

specUrl: /vega/ratings.alt.json

Distribution by release year

specUrl: /vega/ratings_year.alt.json

Distribution by genre

specUrl: /vega/ratings_genre.alt.json

© 2021 Sanyam Kapoor