Data Visualization with ggplot2

CSTEP R course

Meenakshi Kushwaha, July 28, 2022

1 / 55

Grammar of Graphics

  • First published in 1999

    • Foundation for many graphic applications
  • Grammar can be applied to every type of plot

  • Concisely describe components

  • Construct and deconstruct

2 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

3 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Your dataset

  • Tidy format

4 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • This is how we tell R which variables we want to plot

  • Aesthetics mapping
    Links variable in the data to graphical properties

  • Facets mapping
    Links variable in data to panels in the plot layout

5 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Even tidy data may need some transformation

  • Transform input variables to displayed values

    • Bins for histogram
    • Summary statistics for boxplot
    • No. of observations in a category for bar chart
  • Implicit in many plot types

6 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Help you interpret the plot

    • Categories -> color
    • Numeric -> position
  • Automatically generated in ggplot and can be customized

    • log scale
    • time series
7 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Aesthetics as graphical representations

  • Determines your plot type

    • bar chart
    • scatter
    • boxplot
    • ...
8 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Divide your data into panels using one or two groups

  • Allows you to look at smaller subsets of data

9 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Positions are interpreted by the coordinate system

  • Defines the physical mapping of the aesthetics

10 / 55

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Overall look of the plot

  • Spans every part of the graphic that is not linked to the data

    • "non-data ink"
11 / 55

12 / 55

Getting Started

  • Load the tidyverse package
library(tidyverse)
  • If this is your first time you may have to install it first
install.packages("tidyverse")
library(tidyverse)
13 / 55

Do cars with big engines use more fuel than cars with small engines?

14 / 55

Data set mpg

Observations collected by US EPA on 38 models of cars

head(ggplot2::mpg)
# A tibble: 6 × 11
manufacturer model displ year cyl trans drv cty hwy fl class
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
  • displ : car's engine size (engine displacement, in litres)

  • hwy : car's fuel efficiency on the highway in miles per gallon

  • type ?mpg to learn more about the dataset
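
Before plotting, a quick numeric check can hint at the answer to the engine-size question. The sketch below uses only the displ and hwy columns described above; using a Pearson correlation here is an illustrative assumption, not part of the course material.

```r
library(ggplot2)  # provides the mpg dataset

# Number of rows and columns in the dataset
dim(mpg)

# Correlation between engine size and highway mileage:
# a negative value means bigger engines tend to get fewer miles per gallon
displ_hwy_cor <- cor(mpg$displ, mpg$hwy)
displ_hwy_cor
```

A single number hides the shape of the relationship, which is why the slides go on to plot it.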

15 / 55

A car with low fuel efficiency consumes more fuel than a car with high fuel efficiency over the same distance.

Your first ggplot

ggplot(data=mpg)

15 / 55

Your first ggplot

ggplot(data=mpg)+
aes(x=displ)

15 / 55

Your first ggplot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)

15 / 55

Your first ggplot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

15 / 55

What did we need?

Source: ggplot2 workshop by @thomasp85

16 / 55

What did we need?

Source: ggplot2 workshop by @thomasp85

All other components use defaults
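
To make "defaults" concrete, here is a minimal sketch that writes out the components ggplot2 fills in when only data, mapping and geometry are given. The object names p_short and p_explicit are illustrative, and the comparison through layer_data() is just one possible check.

```r
library(ggplot2)

# Short form: only data, mapping and geometry are supplied
p_short <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# Long form: the same plot with the defaults written out
p_explicit <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(stat = "identity", position = "identity") +  # statistics
  scale_x_continuous() +                                   # scales
  scale_y_continuous() +
  coord_cartesian() +                                      # coordinates
  facet_null() +                                           # facets
  theme_gray()                                             # theme

# Compare the point positions computed for both versions
same_xy <- isTRUE(all.equal(
  layer_data(p_short)[, c("x", "y")],
  layer_data(p_explicit)[, c("x", "y")]
))
same_xy
```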

16 / 55

A template

ggplot(data = <DATA>) +

<GEOM_FUNCTION> (mapping = aes(<MAPPINGS>))

17 / 55

A template

ggplot(data = <DATA>) +

<GEOM_FUNCTION> (mapping = aes(<MAPPINGS>))

ggplot(data=mpg)+ geom_point(mapping= aes(x=displ, y=hwy))

17 / 55

Use of + to link code instead of %>%

ggplot was built before %>% was introduced

mpg %>%
ggplot()+
aes(x=displ)+
aes(y=hwy)+
geom_point()

is the same as

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()
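
The two operators often appear in the same chunk: %>% passes data from one step to the next, and + adds layers once ggplot() has been called. A small sketch, assuming dplyr is loaded (it is part of the tidyverse); the name compact_cars is illustrative.

```r
library(dplyr)    # filter() and %>%
library(ggplot2)

# Data step: use %>%
compact_cars <- mpg %>%
  filter(class == "compact")

# Plot step: switch to + after ggplot()
compact_cars %>%
  ggplot() +
  aes(x = displ, y = hwy) +
  geom_point()
```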
18 / 55

Common Problems

  • Make sure that every ( is matched with a )

  • Make sure that every " is paired with another "

  • Make sure you use + and not %>%

  • Make sure that + is in the right place: it has to come at the end of the line, not the start. The following code will not work

ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
  • Look for help by typing ?function_name

    • scroll down to examples
  • Look at the error message

    • try googling the error message
19 / 55

As you start to run R code, you’re likely to run into problems. Don’t worry — it happens to everyone. I have been writing R code for years, and every day I still write code that doesn’t work! Start by carefully comparing the code that you’re running to the code in the book. R is extremely picky, and a misplaced character can make all the difference.

Quiz

What is the minimum requirement to make a plot using ggplot()?

  1. data, scales, theme
  2. data, facets, geometries
  3. data, statistics, coordinates
  4. data, mapping, geometries
20 / 55

Quiz

Which of these is the correct syntax for ggplot()?

a) ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))

b) ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))

c) ggplot(data = mpg) %>%
geom_point(mapping = aes(x = displ, y = hwy))

d) ggplot(data = mpg)
geom_point(mapping = aes(x = displ, y = hwy))

21 / 55

Let's look at the plot again

22 / 55

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

22 / 55

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
aes(color=class)+
geom_point()

22 / 55

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

22 / 55

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
aes(shape=class)+
geom_point()

22 / 55

Aesthetics

Setting the properties of geom manually

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

23 / 55

Aesthetics

Setting the properties of geom manually

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

Here, the color "blue" doesn’t convey information about a variable, but only changes the appearance of the plot
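
A common slip is to put the fixed colour inside aes(). The sketch below contrasts the two cases; the object names are illustrative, and layer_data() is used only to peek at the colours ggplot2 ends up drawing.

```r
library(ggplot2)

# Setting: colour is a fixed property of the layer, outside aes()
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Mapping by mistake: inside aes(), "blue" is treated as a data value,
# so ggplot2 picks a colour from its default palette and adds a legend
p_map <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Colours actually used to draw the points in each plot
set_colours <- unique(layer_data(p_set)$colour)
map_colours <- unique(layer_data(p_map)$colour)
set_colours
map_colours
```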

23 / 55

Aesthetics

To set a geometric property manually, place it outside of aes()

  • The name of a color as a character string

  • The size of a point in mm

  • The shape of a point as a number

24 / 55

Aesthetics

To set a geometric property manually, place it outside of aes()

  • The name of a color as a character string

  • The size of a point in mm

  • The shape of a point as a number

R has 25 built-in shapes that are identified by numbers
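
As an illustration of the three kinds of values listed above, this sketch sets colour, size and shape together. The specific values (darkgreen, 3, 17) are arbitrary choices for the example.

```r
library(ggplot2)

p_manual <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    color = "darkgreen",  # colour name as a character string
    size = 3,             # point size in mm
    shape = 17            # shape number 17: filled triangle
  )
p_manual
```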

24 / 55

Aesthetics

Remember aesthetics depend on geometry...

25 / 55

Geometric Objects

Both plots have the same x and y axes but use different geoms or geometries

26 / 55

Plots are often described by their geoms: boxplots, line plots, etc.

Geometric objects

ggplot(data = mpg)

26 / 55

Geometric objects

ggplot(data = mpg) +
geom_smooth(mapping = aes(
x = displ,
y = hwy,
linetype = drv))

26 / 55

Multiple geoms

ggplot(data = mpg)

26 / 55

Multiple geoms

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy))

26 / 55

Multiple geoms

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy)) +
geom_smooth(mapping = aes(
x = displ,
y = hwy))

26 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes
(x = displ,
y = hwy))

26 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes
(x = displ,
y = hwy)) +
geom_point()

26 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes
(x = displ,
y = hwy)) +
geom_point() +
geom_smooth()

26 / 55

If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy))

26 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class))

26 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class)) +
geom_smooth()

26 / 55

Where to place aes()

  • If aes() is placed inside ggplot(), the same mapping is used for all layers

  • If aes() is placed inside a geom function, the mapping is used for that layer only

  • Multiple aes() can be defined for multiple geometries within the same plot
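
A short sketch of a local mapping overriding a global one: colour is mapped globally, then removed for the smoothing layer only. Using method = "lm" and the name p_layers are assumptions made to keep the example small.

```r
library(ggplot2)

p_layers <- ggplot(data = mpg,
                   mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +                  # inherits x, y and color from ggplot()
  geom_smooth(
    mapping = aes(color = NULL),  # local mapping: drop color for this layer
    method = "lm",
    se = FALSE
  )
p_layers

# Number of groups each layer is drawn with
n_point_groups  <- length(unique(layer_data(p_layers, 1)$group))
n_smooth_groups <- length(unique(layer_data(p_layers, 2)$group))
n_point_groups
n_smooth_groups
```

The points are coloured by drive train while a single trend line is fitted to all cars.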

27 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy))

27 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class))

27 / 55

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class)) +
geom_smooth(data =
filter(mpg,
class == "suv"),
se = FALSE)

27 / 55

Exercises

28 / 55

Quiz

Which of the following is NOT true?

  1. If aes() is placed inside ggplot(), the same mapping is used for all layers

  2. If aes() is placed inside a geom function, the mapping is used for that layer only

  3. Multiple aes() can be defined for multiple geometries within the same plot

  4. Variables or columns to be plotted can be placed outside aes()

29 / 55

Statistical Transformations

  • Linked to geometries

  • Every geom has a default stat and vice versa

  • Can use geom_*() and stat_*() interchangeably, but the former is more common
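
As a minimal sketch of that interchangeability, the two plots below are built once with a geom function and once with its matching stat function. It uses the diamonds dataset that ships with ggplot2; the object names are illustrative.

```r
library(ggplot2)

# Same bar chart, written from the geom side and from the stat side
p_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# Both layers compute the same counts per category
same_counts <- isTRUE(all.equal(
  layer_data(p_geom)$count,
  layer_data(p_stat)$count
))
same_counts
```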

30 / 55

Statistical Transformations

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))

31 / 55

Statistical Transformations

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))

Where does count on y-axis come from?

31 / 55

Statistical Transformations

Some plots calculate new values from the data

  • Bar charts and histograms

  • smoothing functions

  • boxplots

32 / 55

Statistical Transformations

Some plots calculate new values from the data

  • Bar charts and histograms

  • smoothing functions

  • boxplots

The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation

32 / 55

Statistical Transformations

You can find out which stat each geom uses by looking at the default value of the stat argument of the help page.

What is the default stat for geom_bar()?
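
Besides the help page, the layer returned by a geom function can be inspected directly. This is a sketch that relies on the structure of ggplot2 layer objects, so treat it as a convenience check rather than the documented route.

```r
library(ggplot2)

# A geom function returns a layer object; its stat field holds the default stat
bar_layer <- geom_bar()
default_stat <- class(bar_layer$stat)[1]
default_stat

# Likewise, a stat function returns a layer with a default geom
default_geom <- class(stat_count()$geom)[1]
default_geom
```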

33 / 55

Statistical Transformations

  • Overriding default options
  • Here, display bar chart of proportions instead of count
# after_stat() replaces the older stat() helper (ggplot2 >= 3.3.0)
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop),
group = 1))
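
To see what the stat computed for the proportion chart, the values behind the bars can be pulled out with layer_data(). This sketch assumes ggplot2 3.3.0 or later for after_stat(); group = 1 makes the proportions relative to the whole dataset, whereas without it each bar is its own group and every proportion equals 1.

```r
library(ggplot2)

p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Values computed by the stat for each bar
bar_values <- layer_data(p_prop)[, c("x", "count", "prop")]
bar_values

# The proportions across all bars add up to one
total_prop <- sum(bar_values$prop)
total_prop
```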

34 / 55

Scales

Source: ggplot2 workshop by @thomasp85

  • Everything inside aes() will have a scale by default

  • scale_<aesthetic>_<type>()

  • <type> can either be a generic (continuous, discrete, or binned) or specific (e.g. area, for scaling size to circle area)
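
The naming pattern can be read off real function names. The sketch below combines two generic types and one specific type in a single plot; the axis and legend titles are made up for the example.

```r
library(ggplot2)

p_scales <- ggplot(data = mpg) +
  geom_point(aes(x = displ, y = hwy, colour = drv, size = cty),
             alpha = 0.4) +
  # aesthetic x, generic type continuous
  scale_x_continuous(name = "Engine displacement (litres)") +
  # aesthetic colour, generic type discrete
  scale_colour_discrete(name = "Drive train") +
  # aesthetic size, specific type area
  scale_size_area(max_size = 4)
p_scales
```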

35 / 55

Scales

ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, colour = class))

36 / 55

Scales

ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, colour = class))+
scale_colour_brewer(type = 'qual')

37 / 55

Scales

ggplot(mpg) +
geom_point(aes(x = displ, y = hwy)) +
scale_x_continuous(breaks = c(3, 5, 6)) +
scale_y_continuous(trans = 'log10')

38 / 55

Facets

Source: ggplot2 workshop by @thomasp85

  • Split data into multiple panels

  • Another way to add an additional variable

  • Useful for categorical variables

  • Facet by a single variable: facet_wrap()

  • Facet by two variables: facet_grid()

39 / 55

Facets

ggplot(data = mpg)

39 / 55

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy))

39 / 55

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy)) +
facet_wrap(~ class, nrow = 2)

39 / 55

Facets

ggplot(data = mpg)

39 / 55

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy))

39 / 55

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy)) +
facet_grid(drv ~ cyl)

39 / 55

Exercises

40 / 55

Coordinates

Source: ggplot2 workshop by @thomasp85

  • Defining your plot canvas

    • How should x and y be interpreted?
  • Default is the Cartesian coordinate system

  • Useful for spatial data (map projections)
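
One way to see how the coordinate system changes the reading of x and y is to draw the same layer twice. In this sketch, coord_polar() is chosen purely as an illustration, and the object names are hypothetical.

```r
library(ggplot2)

bars <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = class), show.legend = FALSE)

# Default: Cartesian coordinates, x runs left to right
bars

# Same data and geom, but x is now read as an angle around a circle
polar_bars <- bars + coord_polar()
polar_bars
```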

41 / 55

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy))

41 / 55

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy)) +
geom_boxplot()

41 / 55

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy)) +
geom_boxplot() +
coord_flip()

41 / 55

Themes

Source: ggplot2 workshop by @thomasp85

  • Style changes that are not related to data

  • Can apply built-in themes or modify each element separately

  • Follows a hierarchy, i.e. changes at an upper level percolate to lower levels
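
A small sketch of the hierarchy: changing the top-level text element affects every piece of text, while changing axis.title touches only the axis titles. The font family and face used here are arbitrary choices, and the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# Upper level: text is the parent of all text elements
p_all_text <- base_plot +
  theme(text = element_text(family = "serif"))

# Lower level: only the axis titles change
p_axis_titles <- base_plot +
  theme(axis.title = element_text(face = "bold"))

p_all_text
p_axis_titles

# Resolve one leaf element to see what it inherits from its parents
calc_element(
  "axis.title.x",
  theme_gray() + theme(text = element_text(family = "serif"))
)
```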

42 / 55

Themes

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_classic()

42 / 55

Themes

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()

42 / 55

Themes

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_dark()

42 / 55

Themes

ggplot(data=mpg, aes(x=displ, y=hwy))+geom_point()+
theme(
panel.grid.major = element_line('white',size = 0.5),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank(),
panel.border = element_rect(colour = "blue", fill = NA, linetype = 2),
panel.background = element_rect(fill = "aliceblue"),
axis.title = element_text(colour = "blue", face = "bold", family = "Times"),
axis.text=element_text(face="bold")
)

43 / 55

Check out the ggthemes package for many more theme options

Adding labels to your plot

ggplot(data=mpg)

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")

43 / 55

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")+
labs(caption="Source:mpg dataset")

43 / 55

ggplot object

myplot <- ggplot(data=mpg)
43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)
43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)
43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()
43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot

43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")

43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")

43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")

43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")

43 / 55

ggplot object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")+
labs(caption="Source:mpg dataset")

43 / 55

A ggplot template

ggplot(data = <DATA>) +
<GEOM_FUNCTION>(
mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>
) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>

In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.
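
For reference, here is one way to fill in every slot of the template. The choices (diamonds, geom_bar, dodge, coord_flip, facet_wrap) are illustrative only; any valid combination would do.

```r
library(ggplot2)

p_full <- ggplot(data = diamonds) +          # <DATA>
  geom_bar(                                  # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),  # <MAPPINGS>
    stat = "count",                          # <STAT>
    position = "dodge"                       # <POSITION>
  ) +
  coord_flip() +                             # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                        # <FACET_FUNCTION>
p_full
```

Here stat = "count" only restates the default for geom_bar(), so leaving it out gives the same plot.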

44 / 55

BEYOND ggplot2

How to better compose, annotate and highlight your plots

45 / 55

46 / 55

Plot Composition with patchwork

library(ggplot2)
library(patchwork)
p1 <- ggplot(mpg) + geom_point(aes(displ, hwy)) # first plot
p2 <- ggplot(mpg) + geom_boxplot(aes(displ, hwy, group = class)) # second plot
p1+p2 # combined plot output using patchwork package

47 / 55

Plot Composition with patchwork

p3 <- ggplot(mpg, aes(displ, hwy))+geom_point(aes(color=class))+geom_smooth(aes(color=class))
p4 <- ggplot(mpg) + geom_bar(aes(class))
p5<-(p1 | p2 | p3) / p4
p5+plot_annotation('This is a title', caption = 'Source: mpg dataset',
theme = theme(plot.caption = element_text(size = 14),
plot.title = element_text(size = 18)))

48 / 55

49 / 55

Plot Annotation without ggrepel

ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
geom_point() +
geom_text(aes(label = model))

50 / 55

Plot Annotation with ggrepel

library(ggrepel)
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
geom_point() +
geom_text_repel(aes(label = model))

51 / 55

52 / 55

Highlighting geoms

library(gghighlight)
52 / 55

Highlighting geoms

library(gghighlight)
ggplot(airquality)

52 / 55

Highlighting geoms

library(gghighlight)
ggplot(airquality)+
geom_line(aes(Day, Temp,
color=factor(Month)))

52 / 55

Highlighting geoms

library(gghighlight)
ggplot(airquality)+
geom_line(aes(Day, Temp,
color=factor(Month)))+
theme_bw()

52 / 55

Highlighting geoms

library(gghighlight)
ggplot(airquality)+
geom_line(aes(Day, Temp,
color=factor(Month)))+
theme_bw()+
labs(x = "Day of Month",
y = "Temperature")

52 / 55

Highlighting geoms

library(gghighlight)
ggplot(airquality)+
geom_line(aes(Day, Temp,
color=factor(Month)))+
theme_bw()+
labs(x = "Day of Month",
y = "Temperature") +
theme(legend.position = "top")

52 / 55

Highlighting geoms

library(gghighlight)
ggplot(airquality)+
geom_line(aes(Day, Temp,
color=factor(Month)))+
theme_bw()+
labs(x = "Day of Month",
y = "Temperature") +
theme(legend.position = "top")+
gghighlight(max(Temp) > 93,
label_key = Month)

52 / 55

What next?

53 / 55

Resources Used

54 / 55

THANK YOU

55 / 55
