<-- home

Starry Night Plots

I think good data science reads like a good story. In that it flows. Has an arc. And is compelling.

But data science has a dirty secret. For every piece that works, there are about nine others that didn’t. Nine other stories that look like they were typed up by a monkey with a typewriter (and in a finite amount of time!)

The problem with this problem is that unless you did the work you never get to see the monkey scripts. The stuff that never worked. The stuff that got thrown out.

This creates, I think, unrealistic expectations about how data science gets done. Because every tutorial, textbook, lesson, workshop, talk, and blog post about data science just works. And we forget that this stuff has been designed and vetted to work. Most real world stuff doesn’t just work…

I initially wanted this post to be about the Super Bowl. To build some simple toy model on “Points For/Against” and push it through a new plot that I’ve been working on. But nothing was working. The data and my toy model told a story that literally made zero sense.

But then it struck me. This is perfect. I was able to catch and diagnose a garbage model exactly because of the plot I wanted to talk about!

So. Here’s my monkey script and the “Starry Night Plot” that helped me throw it out:




df <- read_csv(paste0(

These data are scraped from pro-football-reference.com with purrr and rvest. If you want to learn more about webscraping within the tidverse I have a quick post on the process here.


I was trying to build a toy model to predict who might win Super Bowl LI. I don’t know much about the NFL, but I thought that I might be able to do something with “Season Average Points For” and “Season Average Points Against”. Basically, I wanted to tease out the relative importance of Defense and Offense for past Super Bowl contenders:

mod1 <- glm(
    win ~ pf + pa,
    family = "binomial",
    data = df)

# summary(mod1)

If you run summary(mod1) you can see that the p-values aren’t traditionally “significant”, however, pf hits 0.1 which gets you a little . if you’re “star-gazing” the printout.

Pumping the model through a prediction grid of all likely/possible points combinations:

pred_grid <- expand.grid(
    pf = 0:40, 
    pa = 0:40)

pred_grid$prob <- predict(
    mod1, newdata = pred_grid, type = "response")

And the putting that grid through a Tile Plot gets you:

pred_grid %>% 
    ggplot(aes(x = pf, y = pa)) + 
    geom_tile(aes(fill = prob))


Sweet. Looks alright. But what does it mean? It’s kind of hard to tell…

To improve the interpretability of geom_tile I added a contour layer and made the breaks hit the same sequence.

br <- 0.2

pred_grid %>% 
    ggplot(aes(x = pf, y = pa)) + 
    geom_tile(aes(fill = prob)) + 
    geom_contour(aes(z = prob), color = "white", 
        lty = 2, binwidth = br) + 
        breaks = seq(0, 1, br),
        labels = scales::percent_format())


Now I can see that my dumb toy model thinks that “Points For” is more predictive than “Points Against”. And that “Season Average Points For” is a bad thing when it comes to winning the Super Bowl.

Basically, if I put Atlanta and New England through this model the better team on paper would be expected to lose the Super Bowl (lol).

I wanted to figure out why my toy model was behaving so strangely, so I added in the underlying data through a couple of geom_point layers (Os for actual wins, Xs for actual losses) and a bit of polish:

br <- 0.2

pred_grid %>% 
    ggplot(aes(x = pf, y = pa)) + 
    geom_tile(aes(fill = prob)) + 
    geom_contour(aes(z = prob), 
        color = "white", lty = 2, binwidth = br) +
        data = (df %>% filter(win == 1)),
        color = "white", shape = 16, size = 2) +
        data = (df %>% filter(win == 0)),
        color = "white", shape = 4, size = 2) +
        direction = 1, end = 0.85, option = "D", 
        breaks = seq(0, 1, br),
        labels = scales::percent_format()) + 
    coord_cartesian(xlim = c(10, 40), ylim = c(10, 40)) + 
    labs(title = "Super Dumb Super Bowl Model", 
        x = "Points For", y = "Points Against", fill = "Prob") + 
    theme(panel.background = element_rect(fill = "white"))


The behaviour of this model suddenly makes some sense (small data, and big upsets).

There’s nothing really new about this type of plot. It’s just a bunch of tile, contour and point layers and the viridis colourmap but I’ve been calling it the “Starry Night Plot” because it kind of looks like the painting. And I wanted to write about it because it has been super helpful in diagnosing and explaining model behaviour to other people.

I guess I hope you might be able to use it.


P.S. I also tried building models on point differentials for the exact matchups and conditioning the points for/against on Strength of Schedule… but then I realized that I was p-hacking…

P.S.S. Go Falcons.