Time to make a graph. First we’re going to replicate this one:
First, we have to load up the four R libraries we will need. You need to do this every time you open R! We also have to load our fonts up. I think technically you can skip loadfonts but it was behaving erratically for me so might as well do it.
library(ggplot2)
library(grid)
library(nyloncalc)
library(extrafont)
loadfonts()
The data is in threes_by_year.csv, so first load it up and take a look at how it’s formatted:
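The loading step itself isn't shown here, so this is a minimal sketch of one way to do it. It assumes the file sits in your working directory (that path is my assumption, so adjust it to wherever you saved the file) and it names the data frame data, since that is the name the rest of the code uses.

```r
# Assumed location: change this to wherever you saved the file.
csv_path <- "threes_by_year.csv"

if (file.exists(csv_path)) {
  data <- read.csv(csv_path)  # read the CSV into a data frame named `data`
  print(head(data))           # first six rows
  str(data)                   # column names and types
} else {
  message("Can't find ", csv_path, " in ", getwd())
}
```

The same pattern should work for the other CSV files later on; only the file name changes.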
So that I don’t have to use the dollar sign constantly, I’m going to attach data. Then I’m going to set the size of my plotting region. For the moment, all graphs need to be 8x6, which we set using dev.new(). This tells R to open a new plotting region of a specific size (I am working on the whole 8x6 thing but at the moment, if you deviate from that size it will cause problems for the Nylon Calculus badge).
attach(data)
dev.new(width=8,height=6)
A blank window will open. Just ignore that for now. We're going to 'build up' our plot. By that I just mean that we are going to create a plot object, and then we're going to make gradual modifications to it until we have it the way we want. The first step is to tell R what your Y and X axes are going to be.
plot <- ggplot(data,aes(x=year,y=threes))
This creates the plot object, tells ggplot that we are going to be using columns in the data dataframe, and then tells ggplot what those columns are. aes stands for aesthetic and this has never been the most intuitive thing about ggplot for me but our interaction with it is going to be pretty limited. Here, I tell ggplot that part of the graphic aesthetic is that the x-axis is year and the y-axis is threes (R cares about capitalizations by the way, notice that in our first tutorial dataframe year was capitalized, and now it's not). Now we have a graph but there's basically nothing in it. Before we can see anything, we have to tell ggplot what kind of plot this is. In ggplot this is done using geoms. In this case, we want geom_line:
plot <- plot + geom_line(size=1.2,alpha=.8,colour="cyan")
So if you look at that command, what we're doing is overwriting the plot object with the plot object + this new thing. The new thing is a cyan colored line plot with a line width of 1.2 and a transparency of 0.8 (ggplot accepts either colour or color; I spell it colour throughout). You can experiment for yourself with putting nothing in the parentheses, you'll just get a thinner black line. Ok, let's plot what we've got:
plot
Pretty simple! Your dev window should now have something in it:
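If the 'build up' idea still feels abstract, here is a small sketch of what is happening to the plot object. It uses mtcars, a dataset that ships with R, as a stand-in for the tutorial data, and the variable names are just for illustration.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the tutorial data here.
p <- ggplot(mtcars, aes(x = wt, y = mpg))
n_before <- length(p$layers)   # no geoms yet, so no layers

# Adding a geom returns a new plot object with one more layer.
p <- p + geom_line(alpha = .8, colour = "cyan")
n_after <- length(p$layers)

print(c(before = n_before, after = n_after))
```

Every plot <- plot + something line in this tutorial works the same way: it takes the current object, adds one piece, and stores the result back under the same name.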
plot <- plot + theme_nyloncalc()
plot <- plot + ggtitle("Increasing Reliance on the 3-pointer")
plot <- plot + ylab("% of Shots That are 3s")
plot <- plot + xlab("Year")
plot <- plot + scale_x_continuous(breaks=seq(1996,2013,2))
plot <- plot + scale_y_continuous(breaks=seq(16,30,2))
x_min = ggplot_build(plot)$panel$ranges[[1]]$x.range[1]
x_max = ggplot_build(plot)$panel$ranges[[1]]$x.range[2]
y_min = ggplot_build(plot)$panel$ranges[[1]]$y.range[1]
y_max = ggplot_build(plot)$panel$ranges[[1]]$y.range[2]
domain = x_max - x_min
range = y_max - y_min
plot <- plot + annotate("rect",xmin=x_max-.32*(domain),xmax=x_max,ymin=y_min,ymax=y_min+(range*.07),alpha=.8)
plot <- plot + annotate("text",x=((x_max-.32*(domain))+x_max)/2,y=((y_min+(range*.08))+y_min)/2,label="Nylon Calculus",colour="#ffffff",family="Chalk Line Outline",size=4.3,vjust=.5,hjust=.5)
ggsave(filename="/Users/austinc/Desktop/nyloncalc_line.png")
Here it is all together, in window:
Let's replicate this graph:
First, load up the data in the percomparison.csv file:
This data illustrates something very important about how your data should be formatted: every row should be a single x,y coordinate. If you were just putting this data in a spreadsheet, you might have a row for Tim Duncan, and then columns for his first year, second year, third year, and so on. But for our purposes, it's best to have each row representing just a single data point on the graph. Notice that I have a 'grouping' variable here, to tell ggplot that we want a line for each player. ggplot will also automatically create a key for me based on this column (the player column), as we'll see in a second.
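If your own data starts out in the spreadsheet-style layout, here is a rough sketch of converting it with base R's reshape(). The players and numbers below are made-up placeholders, not values from percomparison.csv, and the column names Y1 and Y2 are my own invention.

```r
# Placeholder data in the "wide" layout: one row per player,
# one column per season. These numbers are invented.
wide <- data.frame(
  Player = c("Player A", "Player B"),
  Y1 = c(10, 12),
  Y2 = c(11, 14)
)

# Convert to the "long" layout: one row per player-season.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Y1", "Y2"),  # the columns to stack
  v.names   = "PER",          # name for the stacked values
  timevar   = "Year",         # name for the season index
  times     = 1:2,
  idvar     = "Player"
)

print(long)
```

Each row of long is now a single point on the graph, with Player available as the grouping column.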
As before, I'm going to attach the dataset and set up the plotting window.
attach(data)
dev.new(width=8,height=6)
If your session has been open for a while and you've reused the data object, you may start getting these warning messages from attach() telling you that you are masking other objects. You can ignore these. I start building my graph the same way with one critical difference. I'm going to add a colour parameter to my graphing aesthetic and set that colour parameter equal to player. That's because I want each player to have a different colored line. I don't have to change my geom_line at all - it already knows I want grouped lines thanks to the colour aesthetic. And of course I won't set a colour for the line this time, since that's already set in the aesthetic.
plot <- ggplot(data,aes(x=Year,y=PER,colour=Player))
plot <- plot + geom_line(size=1.2,alpha=.8)
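To see the grouping at work without the tutorial file, here is a sketch using Orange, a dataset built into R that has several measurements for each of five trees. It is only a stand-in; the column names are Orange's, not ours.

```r
library(ggplot2)

# Orange is a built-in dataset: one row per tree per measurement.
p <- ggplot(Orange, aes(x = age, y = circumference, colour = Tree)) +
  geom_line(alpha = .8)

# layer_data() returns what ggplot will actually draw for a layer.
drawn <- layer_data(p, 1)
n_lines   <- length(unique(drawn$group))   # one group per tree
n_colours <- length(unique(drawn$colour))  # one colour per tree

print(c(lines = n_lines, colours = n_colours))
```

Mapping colour to a categorical column is what splits the data into separate lines, which is why geom_line needs no extra arguments.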
Now to set the theme and label my axes:
plot <- plot + theme_nyloncalc()
plot <- plot + ggtitle("Player PER Over Career")
plot <- plot + ylab("Player Efficiency Rating")
plot <- plot + xlab("Years Since Draft")
plot <- plot + scale_colour_brewer(palette="Dark2")
And now the exact same code adds the Nylon Calculus badge:
x_min = ggplot_build(plot)$panel$ranges[[1]]$x.range[1]
x_max = ggplot_build(plot)$panel$ranges[[1]]$x.range[2]
y_min = ggplot_build(plot)$panel$ranges[[1]]$y.range[1]
y_max = ggplot_build(plot)$panel$ranges[[1]]$y.range[2]
domain = x_max - x_min
range = y_max - y_min
plot <- plot + annotate("rect",xmin=x_max-.32*(domain),xmax=x_max,ymin=y_min,ymax=y_min+(range*.07),alpha=.8)
plot <- plot + annotate("text",x=((x_max-.32*(domain))+x_max)/2,y=((y_min+(range*.08))+y_min)/2,label="Nylon Calculus",colour="#ffffff",family="Chalk Line Outline",size=4.3,vjust=.5,hjust=.5)
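Since the badge code is identical every time, one option is to wrap it in a function. The sketch below is my own: add_badge and panel_range are hypothetical helpers, not part of the nyloncalc package. It also looks for the axis ranges under layout$panel_params first, which is where ggplot2 3.0 and later keep them, and falls back to the panel$ranges location used in this tutorial.

```r
library(ggplot2)

# Hypothetical helper: find the x and y ranges of the first panel.
panel_range <- function(p) {
  built <- ggplot_build(p)
  pr <- built$layout$panel_params[[1]]            # ggplot2 3.0 and later
  if (is.null(pr)) pr <- built$panel$ranges[[1]]  # older location, as in this tutorial
  list(x = pr$x.range, y = pr$y.range)
}

# Hypothetical helper: same rectangle and text as the tutorial's badge code.
add_badge <- function(p, label = "Nylon Calculus", family = "Chalk Line Outline") {
  r <- panel_range(p)
  x_min <- r$x[1]; x_max <- r$x[2]
  y_min <- r$y[1]; y_max <- r$y[2]
  domain <- x_max - x_min
  y_span <- y_max - y_min
  left <- x_max - .32 * domain   # badge covers the right 32% of the x-axis
  p +
    annotate("rect", xmin = left, xmax = x_max,
             ymin = y_min, ymax = y_min + y_span * .07, alpha = .8) +
    annotate("text", x = (left + x_max) / 2,
             y = ((y_min + y_span * .08) + y_min) / 2,
             label = label, colour = "#ffffff", family = family,
             size = 4.3, vjust = .5, hjust = .5)
}

# Usage: plot <- add_badge(plot)
```

The .07 and .08 multipliers are copied from the original lines as they are, so the text sits very slightly above the center of the rectangle, just as it does there.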
At this point you can type plot and hit enter to see the graph, or you can just save it directly using ggsave:
ggsave(filename="/Users/austinc/Desktop/groupedlines.png")
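By default ggsave() saves the last plot displayed and takes its size from the current plotting window. If you'd rather not depend on the window, a sketch like the following passes the plot and the 8x6 size explicitly. The mtcars plot and the temporary file are stand-ins of mine; swap in your own plot object and path.

```r
library(ggplot2)

# Stand-in plot; in the tutorial this would be your `plot` object.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# tempfile() is a placeholder; use your own path in practice.
out_file <- tempfile(fileext = ".png")

# Name the plot and the size explicitly instead of relying on the open window.
ggsave(filename = out_file, plot = p, width = 8, height = 6, units = "in", dpi = 300)
```

ggsave() picks the file type from the extension, so changing .png to .pdf in the filename should be enough to get a PDF instead.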
Next up, this graph I used in an early NC article about forcing midrange shots:
The data for this one is midshots.csv. Load the data up:
Attach the data and open a new plotting window:
attach(data)
dev.new(width=8,height=6)
And now we can start the plot. For this one, we're going to be using a different type of geom. It's geom_point:
plot<-ggplot(data,aes(x=mid*100,y=drtg,label=Over))
plot<-plot+geom_point(shape=1,alpha=.8)
I'm going to change the breaks a bit (remember you can type plot and hit enter at any time to see what the graph you've built to this point looks like), add labels, and apply the NC theme.
plot <- plot + theme_nyloncalc()
plot <- plot + ggtitle("Teams That Force Midrange Shots Excel Defensively")
plot <- plot + ylab("Defensive Rating")
plot <- plot + xlab("% of Opponent Shots Taken from Midrange")
plot <- plot + scale_x_continuous(breaks=seq(27,38,2))
plot <- plot + scale_y_continuous(breaks=seq(94,110,2))
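The breaks arguments are just numeric vectors, and seq() is a quick way to build them. A small sketch of what those two calls produce (the variable names are mine):

```r
# seq(from, to, by): start at `from`, step by `by`, never go past `to`
x_breaks <- seq(27, 38, 2)
y_breaks <- seq(94, 110, 2)

print(x_breaks)  # stops at 37, because the next step (39) would pass 38
print(y_breaks)  # lands exactly on 110
```

If you want a tick at a specific endpoint, pick a start and step that actually land on it.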
And throw the badge on there:
x_min = ggplot_build(plot)$panel$ranges[[1]]$x.range[1]
x_max = ggplot_build(plot)$panel$ranges[[1]]$x.range[2]
y_min = ggplot_build(plot)$panel$ranges[[1]]$y.range[1]
y_max = ggplot_build(plot)$panel$ranges[[1]]$y.range[2]
domain = x_max - x_min
range = y_max - y_min
plot <- plot + annotate("rect",xmin=x_max-.32*(domain),xmax=x_max,ymin=y_min,ymax=y_min+(range*.07),alpha=.8)
plot <- plot + annotate("text",x=((x_max-.32*(domain))+x_max)/2,y=((y_min+(range*.08))+y_min)/2,label="Nylon Calculus",colour="#ffffff",family="Chalk Line Outline",size=4.3,vjust=.5,hjust=.5)
Now let's pause and see what we have. When I type plot and hit enter this is what I see:
Not exactly what I promised. We're missing two things: a fit line and team labels. Let's deal with the fit line first because that's really easy. Lines of best fit are just a geom type you add to the graph. In this case I'm going to use geom_smooth, but try replacing it with geom_line and seeing what happens, and give geom_smooth(method="lm") a whirl too.
plot <- plot + geom_smooth()
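Here is a short sketch comparing the smoothing options side by side. It uses mtcars as a stand-in dataset, and the object names are only for illustration.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 1, alpha = .8)

# Default: a flexible curve (loess for small datasets) with a shaded band
curved <- base + geom_smooth()

# method = "lm": a straight least-squares line, still with the band
straight <- base + geom_smooth(method = "lm")

# se = FALSE drops the shaded confidence band
no_band <- base + geom_smooth(method = "lm", se = FALSE)

# Type curved, straight, or no_band and hit enter to compare them.
```

Which one to use is a judgment call: the default curve follows local wiggles in the data, while the lm line shows only the overall linear trend.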
If you plot that, you can see we're pretty close to a finished product. The last thing is to add team labels. This can be a little tricky. The way to do it is to use geom_text, like this:
plot <- plot + geom_text(size=2.5,family="Gulim")
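If you plot that it's pretty messy. Team names are sitting right on top of the points for the team. So let's add a little space. (Each geom_text call adds another layer of labels, so rebuild the plot up through the geom_smooth step before running this, or you'll get two sets of labels.)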
plot <- plot + geom_text(size=2.5,family="Gulim",vjust=1.7,hjust=.3)
Notice the vjust and hjust arguments. These are vertical justification and horizontal justification, and both default to 0.5, which centers the label on the point. Raising vjust to 1.7 moved the labels down, while lowering hjust to .3 nudged them to the right a bit (a larger hjust moves labels left). If you plot it now you should see what I have below.
It's much better, and maybe you could publish this, but the perfectionist in me wants to clean it up further. Unfortunately there's no simple way to do it. I created two columns in my data, vjust and hjust, and fiddled with the values for each team, looking at the graph and choosing a vjust and hjust value for that specific team. Then I set vjust and hjust equal to those columns so that each team's label gets its own offset. Here's the final code (as before, add it to a plot that doesn't already have a geom_text layer):
plot <- plot + geom_text(size=2.5,family="Gulim",vjust=data$vjust+.3,hjust=data$hjust+.5)
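Another way to get per-team offsets, sketched below, is to map the two columns inside aes() so ggplot pairs each row with its own values. This is an alternative of mine, not the code used for the original chart, and the three rows are invented placeholders rather than values from midshots.csv.

```r
library(ggplot2)

# Placeholder rows; the real values live in midshots.csv.
teams <- data.frame(
  mid   = c(30, 33, 36),
  drtg  = c(104, 101, 98),
  Over  = c("AAA", "BBB", "CCC"),
  vjust = c(1.7, -0.7, 1.7),
  hjust = c(0.3, 0.5, 0.8)
)

p <- ggplot(teams, aes(x = mid, y = drtg, label = Over)) +
  geom_point(shape = 1, alpha = .8) +
  geom_text(aes(vjust = vjust, hjust = hjust), size = 2.5)

# Check which justification each label ended up with.
labels_drawn <- layer_data(p, 2)
print(labels_drawn[, c("label", "hjust", "vjust")])
```

Mapping through aes() keeps the offsets tied to their rows, so they stay matched to the right team even if the data frame is later filtered or reordered.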
And now we can save it:
ggsave(filename="/Users/austinc/Desktop/midscatter.png")
Last one for this category. This is a replication of a chart Ian used.
I'll let you inspect the data - it's in fryemagic.csv. You've seen these lines many times now...
attach(data)
dev.new(width=8,height=6)
Just like with our grouped lines example above, we have a grouping variable, group. The x-axis is height of player, and the y-axis is percent of shots that are 3s. Here's the aesthetic and geom:
plot <- ggplot(data,aes(x=Height,y=X3PTA.FGA,colour=group))
plot <- plot + geom_point(shape=1,alpha=.8,size=3)
Labeling my axes and applying the theme...
plot <- plot + theme_nyloncalc()
plot <- plot + ggtitle("Visualizing the Rise of the Stretch 4")
plot <- plot + ylab("3PTA/FGA")
plot <- plot + xlab("Height in Inches")
Throw the badge on:
x_min = ggplot_build(plot)$panel$ranges[[1]]$x.range[1]
x_max = ggplot_build(plot)$panel$ranges[[1]]$x.range[2]
y_min = ggplot_build(plot)$panel$ranges[[1]]$y.range[1]
y_max = ggplot_build(plot)$panel$ranges[[1]]$y.range[2]
domain = x_max - x_min
range = y_max - y_min
plot <- plot + annotate("rect",xmin=x_max-.32*(domain),xmax=x_max,ymin=y_min,ymax=y_min+(range*.07),alpha=.8)
plot <- plot + annotate("text",x=((x_max-.32*(domain))+x_max)/2,y=((y_min+(range*.08))+y_min)/2,label="Nylon Calculus",colour="#ffffff",family="Chalk Line Outline",size=4.3,vjust=.5,hjust=.5)
And we're done! If you plot that you should get the finished product above.
ggsave(filename="/Users/austinc/Desktop/fryespacing.png")