class: center, middle, inverse, title-slide

.title[
# Data Visualization with ggplot2
]
.subtitle[
## CSTEP R course
]
.author[
### Meenakshi Kushwaha, July 28, 2022
]

---

<style type="text/css">
/* This chunk for sequential highlighting */
/* Label class appropriately for slides that need this */
.highlight-last-item > ul > li,
.highlight-last-item > ol > li {
  opacity: 0.5;
}
.highlight-last-item > ul > li:last-of-type,
.highlight-last-item > ol > li:last-of-type {
  opacity: 1;
}
</style>

# Grammar of Graphics

.pull-right[![](images/wilkinson.jpeg)]

.pull-left[
- First published in 1999
- Foundation for many graphic applications
- Grammar can be applied to every type of plot
- Concisely describe components
- Construct and deconstruct
]

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

![](images/tom1.png)

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_data.png)]

- Your dataset
- Tidy format

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_mapping.png)]

- This is how we tell R which variables we want to plot
- *Aesthetic mapping*: links variables in the data to graphical properties
- *Facet mapping*: links variables in the data to panels in the plot layout

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_statistics.png)]

- Even tidy data may need some transformation
- Transform input variables to displayed values
  - Bins for histogram
  - Summary statistics for boxplot
  - No.
of observations in a category for bar chart
- Implicit in many plot types

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_scales.png)]

- Help you interpret the plot
  - Categories -> color
  - Numeric -> position
- Automatically generated in ggplot and can be customized
  - log scale
  - time series

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_geometries.png)]

- Aesthetics as graphical representations
- Determines your plot type
  - bar chart
  - scatter
  - boxplot
  - ...

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_facets.png)]

- Divide your data into panels using one or two groups
- Allows you to look at smaller subsets of data

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_coordinates.png)]

- Positions are interpreted by the coordinate system
- Defines the physical mapping of the aesthetics

---
# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_theme.png)]

- Overall look of the plot
- Spans every part of the graphic that is not linked to the data
  - "non-data ink"

---
![](images/horst_ggplot.jpeg)

---
# Getting Started
- Load the tidyverse package

```r
library(tidyverse)
```

- If this is your first time you may have to install it first

```r
install.packages("tidyverse")
library(tidyverse)
```

---
class: center, middle

## Do cars with big engines use more fuel than cars with small engines?
---
# Data set `mpg`
Observations collected by US EPA on 38 models of cars

```r
head(ggplot2::mpg)
```

```
# A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
```

- `displ` : car's engine size
- `hwy` : car's fuel efficiency on the highway in miles per gallon
- type `?mpg` to learn more about the dataset

???
a car with low fuel efficiency consumes more fuel than a car with high fuel efficiency for the same distance

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
*ggplot(data=mpg)
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_01_output-1.png)<!-- -->
]

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
ggplot(data=mpg)+
* aes(x=displ)
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_02_output-1.png)<!-- -->
]

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
* aes(y=hwy)
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_03_output-1.png)<!-- -->
]

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
* geom_point()
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-my_cars-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-my_cars-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-my_cars-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
#What did we need?
.footnote[Source: ggplot2 workshop by @thomasp85]
.pull-right[![](images/tom_need.png)]

--

.center[
.pull-left[
##All other components use defaults
]
]

---
class:middle
# A template

###`ggplot(data = <DATA>) +`
###`<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))`

--

`ggplot(data=mpg)+ geom_point(mapping= aes(x=displ, y=hwy))`

--

---
# Use of `+` to link code instead of `%>%`

`ggplot` was built before `%>%` was introduced

```r
mpg %>%
  ggplot()+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()
```

is the same as

```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()
```

---
# Common Problems
- Make sure that every `(` is matched with a `)`
- Make sure that every `"` is paired with another `"`
- Make sure you use `+` and not `%>%`
- Make sure that `+` is in the right place: it has to come at the end of the line, not the start. The following code will **not work**

```r
ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
```

- Look for help by typing `?function_name`
  - scroll down to examples
- Look at the error message
  - *try googling the error message*

???
As you start to run R code, you’re likely to run into problems. Don’t worry, it happens to everyone. I have been writing R code for years, and every day I still write code that doesn’t work!

Start by carefully comparing the code that you’re running to the code in the book. R is extremely picky, and a misplaced character can make all the difference.

---
# Quiz
What is the minimum requirement to make a plot using `ggplot()`?

1. data, scales, theme
2. data, facets, geometries
3. data, statistics, coordinates
4. data, mapping, geometries

---
# Quiz
Which of these is the correct syntax for `ggplot()`?
a) `ggplot(data = mpg)`
`+ geom_point(mapping = aes(x = displ, y = hwy))`

b) `ggplot(data = mpg) + `
`geom_point(mapping = aes(x = displ, y = hwy))`

c) `ggplot(data = mpg) %>%`
` geom_point(mapping = aes(x = displ, y = hwy))`

d) `ggplot(data = mpg)`
` geom_point(mapping = aes(x = displ, y = hwy))`

---
####*Let's look at the plot again*

![](images/cars.png)

---
count: false

#Aesthetics
.panel1-my_cars3-non_seq[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()
```
]

.panel2-my_cars3-non_seq[
![](index_files/figure-html/my_cars3_non_seq_01_output-1.png)<!-- -->
]

---
count: false

#Aesthetics
.panel1-my_cars3-non_seq[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
* aes(color=class)+
  geom_point()
```
]

.panel2-my_cars3-non_seq[
![](index_files/figure-html/my_cars3_non_seq_02_output-1.png)<!-- -->
]

<style>
.panel1-my_cars3-non_seq { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-my_cars3-non_seq { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-my_cars3-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
count: false

#Aesthetics
.panel1-my_cars4-non_seq[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()
```
]

.panel2-my_cars4-non_seq[
![](index_files/figure-html/my_cars4_non_seq_01_output-1.png)<!-- -->
]

---
count: false

#Aesthetics
.panel1-my_cars4-non_seq[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
* aes(shape=class)+
  geom_point()
```
]

.panel2-my_cars4-non_seq[
![](index_files/figure-html/my_cars4_non_seq_02_output-1.png)<!-- -->
]

<style>
.panel1-my_cars4-non_seq { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-my_cars4-non_seq { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-my_cars4-non_seq { color: black; width: NA%; hight: 33%;
float: left; padding-left: 1%; font-size: 80% }
</style>

---
# Aesthetics
Setting the properties of a `geom` manually

```r
ggplot(data = mpg) +
* geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```

![](index_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

--

Here, the color "blue" doesn’t convey information about a variable, but only changes the appearance of the plot

---
# Aesthetics
To set a geometric property manually, place it outside of `aes()`

- The name of a color as a character string
- The size of a point in mm
- The shape of a point as a number

--

![](images/shapes-1.png)

*R has 25 built-in shapes that are identified by numbers*

---
# Aesthetics
![](images/common-aesthetics-1.png)

Remember aesthetics depend on geometry...

---
# Geometric Objects

.pull-left[
![](images/cars.png)
]

.pull-right[
![](images/cars_line.png)
]

Both plots have the same `x` and `y` axes but use different `geoms` or geometries

--

???
Plots are often described by their geoms: boxplots, line plots, etc.
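
A minimal sketch of the comparison on this slide, assuming the right-hand image was drawn with `geom_smooth()` (the slide itself only shows the two pictures). The names `p_points` and `p_smooth` are illustrative and not part of the original deck.

```r
library(ggplot2)

# Same data and the same x/y mappings in both plots; only the geom differs.
p_points <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))    # one point per car

p_smooth <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))   # a fitted smooth line

p_points
p_smooth
```

Storing each plot in a variable before printing makes it easy to show the two side by side and to point out that only one line of code changes.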
---
count: false

#Geometric objects
.panel1-line-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-line-auto[
![](index_files/figure-html/line_auto_01_output-1.png)<!-- -->
]

---
count: false

#Geometric objects
.panel1-line-auto[
```r
ggplot(data = mpg) +
* geom_smooth(mapping = aes(
*   x = displ,
*   y = hwy,
*   linetype = drv))
```
]

.panel2-line-auto[
![](index_files/figure-html/line_auto_02_output-1.png)<!-- -->
]

<style>
.panel1-line-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-line-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-line-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_01_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
ggplot(data = mpg) +
* geom_point(mapping = aes(
*   x = displ,
*   y = hwy))
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_02_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy)) +
* geom_smooth(mapping = aes(
*   x = displ,
*   y = hwy))
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_03_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy)) +
  geom_smooth(mapping = aes(
    x = displ,
    y = hwy))
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-geoms-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-geoms-auto { color:
black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
*ggplot(data = mpg,
*       mapping = aes
*       (x = displ,
*         y = hwy))
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_01_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
ggplot(data = mpg,
       mapping = aes
       (x = displ,
         y = hwy)) +
* geom_point()
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_02_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
ggplot(data = mpg,
       mapping = aes
       (x = displ,
         y = hwy)) +
  geom_point() +
* geom_smooth()
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_03_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
ggplot(data = mpg,
       mapping = aes
       (x = displ,
         y = hwy)) +
  geom_point() +
  geom_smooth()
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-geoms2-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-geoms2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

???
If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.
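
The note above on local versus global mappings can be sketched as follows. This is an illustration under assumptions rather than part of the original deck: the variable names `base`, `p_global` and `p_local` are made up here, and `drv` is chosen only because it is a convenient categorical column of `mpg`.

```r
library(ggplot2)

# Global mappings: set once in ggplot() and inherited by every layer.
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = drv))

# Both layers inherit x, y and colour, so there is one smooth line per drv.
p_global <- base + geom_point() + geom_smooth()

# A local mapping changes one layer only: colour = NULL removes the inherited
# colour from the smooth layer, while the points stay coloured by drv.
p_local <- base + geom_point() + geom_smooth(mapping = aes(colour = NULL))

p_global
p_local
```

Comparing the two printed plots shows the effect directly: the points are identical, and only the smooth layer changes.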
---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = displ,
*         y = hwy))
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_01_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
* geom_point(
*   mapping = aes(
*     color = class))
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_02_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
* geom_smooth()
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_03_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
  geom_smooth()
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms3-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-geoms3-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-geoms3-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
##Where to place `aes()`

- If the `aes()` function is placed inside `ggplot()`, the same `aes` is used for all layers
- If `aes()` is placed outside the `ggplot()` function, its definition is used for that specific layer only
- Multiple `aes()` can be defined for multiple geometries within the same plot

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = displ,
*         y = hwy))
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_01_output-1.png)<!-- -->
]

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x =
         displ,
         y = hwy)) +
* geom_point(
*   mapping = aes(
*     color = class))
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_02_output-1.png)<!-- -->
]

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
* geom_smooth(data =
*   filter(mpg,
*          class == "suv"),
*   se = FALSE)
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_03_output-1.png)<!-- -->
]

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
  geom_smooth(data =
    filter(mpg,
           class == "suv"),
    se = FALSE)
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms4-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-geoms4-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-geoms4-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class:inverse, middle, center

#Exercises

---
# Quiz
Which of the following is NOT true?

1. If the `aes()` function is placed inside `ggplot()`, the same `aes` is used for all layers
2. If `aes()` is placed outside the `ggplot()` function, its definition is used for that specific layer only
3. Multiple `aes()` can be defined for multiple geometries within the same plot
4.
Variables or columns to be plotted can be placed outside `aes()`

---
# Statistical Transformations

.pull-right[![](images/tom_statistics.png)]

.pull-left[
- Linked to geometries
- Every `geom` has a default `stat` and vice versa
- Can use `geom_*()` and `stat_*()` interchangeably, but the former is more common
]

---
# Statistical Transformations

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
```

![](index_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

--

Where does the count on the y-axis come from?

---
# Statistical Transformations

.pull-left[
![](images/ggplot4.png)
]

.pull-right[
Some plots calculate new values from the data

- Bar charts and histograms
- smoothing functions
- boxplots
]

--

The algorithm used to calculate new values for a graph is called a **stat**, short for statistical transformation

---
# Statistical Transformations

![](images/visualization-stat-bar.png)

You can find out which `stat` each `geom` uses by looking at the default value of the `stat` argument on its help page. What is the default `stat` for `geom_bar()`?

---
# Statistical Transformations
- Overriding default options
- Here, display a bar chart of proportions instead of counts

```r
ggplot(data = diamonds) +
* geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
```

![](index_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---
#Scales
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_scales.png)]

.pull-left[
- Everything inside `aes()` will have a scale by default
- `scale_<aesthetic>_<type>()`
- `<type>` can be either generic (continuous, discrete, or binned) or specific (e.g.
area, for scaling size to circle area)
]

---
# Scales

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class))
```

![](index_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---
# Scales

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class))+
* scale_colour_brewer(type = 'qual')
```

![](index_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

---
#Scales

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
* scale_x_continuous(breaks = c(3, 5, 6)) +
* scale_y_continuous(trans = 'log10')
```

![](index_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---
# Facets
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_facets.png)]

.pull-left[
- Split data into multiple panels
- Another way to add an additional variable
- Useful for categorical variables
- Facet by a single variable `facet_wrap()`
- Facet by two variables `facet_grid()`
]

---
count: false

#Facets
.panel1-facets-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-facets-auto[
![](index_files/figure-html/facets_auto_01_output-1.png)<!-- -->
]

---
count: false

#Facets
.panel1-facets-auto[
```r
ggplot(data = mpg) +
* geom_point(mapping = aes(
*   x = displ,
*   y = hwy))
```
]

.panel2-facets-auto[
![](index_files/figure-html/facets_auto_02_output-1.png)<!-- -->
]

---
count: false

#Facets
.panel1-facets-auto[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy)) +
* facet_wrap(~ class, nrow = 2)
```
]

.panel2-facets-auto[
![](index_files/figure-html/facets_auto_03_output-1.png)<!-- -->
]

<style>
.panel1-facets-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-facets-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-facets-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
count: false

#Facets
.panel1-facets2-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-facets2-auto[
![](index_files/figure-html/facets2_auto_01_output-1.png)<!-- -->
]

---
count: false

#Facets
.panel1-facets2-auto[
```r
ggplot(data = mpg) +
* geom_point(mapping = aes(
*   x = displ,
*   y = hwy))
```
]

.panel2-facets2-auto[
![](index_files/figure-html/facets2_auto_02_output-1.png)<!-- -->
]

---
count: false

#Facets
.panel1-facets2-auto[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy)) +
* facet_grid(drv ~ cyl)
```
]

.panel2-facets2-auto[
![](index_files/figure-html/facets2_auto_03_output-1.png)<!-- -->
]

<style>
.panel1-facets2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-facets2-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-facets2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
class:inverse, middle, center

#Exercises

---
# Coordinates
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_coordinates.png)]

.pull-left[
- Defining your plot canvas
- How should x and y be interpreted?
- Default is the Cartesian coordinate system
- Useful for spatial data (map projections)
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = class,
*         y = hwy))
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_01_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
* geom_boxplot()
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_02_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
  geom_boxplot() +
* coord_flip()
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_03_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
  geom_boxplot() +
  coord_flip()
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-coord-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-coord-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-coord-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
# Themes
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_theme.png)]

.pull-left[
- Style changes that are not related to data
- Can apply built-in themes or modify each element separately
- Follows a hierarchy, i.e.
changes in the upper level percolate to lower levels
]

---
count: false

#Themes
.panel1-themes1-rotate[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
* theme_classic()
```
]

.panel2-themes1-rotate[
![](index_files/figure-html/themes1_rotate_01_output-1.png)<!-- -->
]

---
count: false

#Themes
.panel1-themes1-rotate[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
* theme_minimal()
```
]

.panel2-themes1-rotate[
![](index_files/figure-html/themes1_rotate_02_output-1.png)<!-- -->
]

---
count: false

#Themes
.panel1-themes1-rotate[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
* theme_dark()
```
]

.panel2-themes1-rotate[
![](index_files/figure-html/themes1_rotate_03_output-1.png)<!-- -->
]

<style>
.panel1-themes1-rotate { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-themes1-rotate { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-themes1-rotate { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
#Themes

```r
ggplot(data=mpg, aes(x=displ, y=hwy))+geom_point()+
  theme(
    panel.grid.major = element_line('white',size = 0.5),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.border = element_rect(colour = "blue", fill = NA, linetype = 2),
    panel.background = element_rect(fill = "aliceblue"),
    axis.title = element_text(colour = "blue", face = "bold", family = "Times"),
    axis.text=element_text(face="bold")
  )
```

![](index_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

???
Check out `ggthemes` package for many more theme options

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
*ggplot(data=mpg)
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_01_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
* aes(x=displ)
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_02_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
* aes(y=hwy)
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_03_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
* geom_point()
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_04_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
* theme_minimal()
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_05_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()+
* labs(x="Displacement")
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_06_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()+
  labs(x="Displacement")+
* labs(y="Highway Mileage")
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_07_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()+
  labs(x="Displacement")+
  labs(y="Highway Mileage")+
* labs(title="My first GGPLOT")
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_08_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()+
  labs(x="Displacement")+
  labs(y="Highway Mileage")+
  labs(title="My first GGPLOT")+
* labs(subtitle="This is the subtitle")
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_09_output-1.png)<!-- -->
]

---
count: false

##Adding labels to your plot
.panel1-lables1-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()+
  labs(x="Displacement")+
  labs(y="Highway Mileage")+
  labs(title="My first GGPLOT")+
  labs(subtitle="This is the subtitle")+
* labs(caption="Source:mpg dataset")
```
]

.panel2-lables1-auto[
![](index_files/figure-html/lables1_auto_10_output-1.png)<!-- -->
]

<style>
.panel1-lables1-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-lables1-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-lables1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
*myplot <- ggplot(data=mpg)
```
]

.panel2-lables2-auto[

]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
* aes(x=displ)
```
]

.panel2-lables2-auto[

]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
* aes(y=hwy)
```
]

.panel2-lables2-auto[

]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
* geom_point()
```
]

.panel2-lables2-auto[

]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
* theme_minimal()
```
]

.panel2-lables2-auto[

]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()
*myplot
```
]

.panel2-lables2-auto[
![](index_files/figure-html/lables2_auto_06_output-1.png)<!-- -->
]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()
myplot+
* labs(x="Displacement")
```
]

.panel2-lables2-auto[
![](index_files/figure-html/lables2_auto_07_output-1.png)<!-- -->
]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()
myplot+
  labs(x="Displacement")+
* labs(y="Highway Mileage")
```
]

.panel2-lables2-auto[
![](index_files/figure-html/lables2_auto_08_output-1.png)<!-- -->
]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()
myplot+
  labs(x="Displacement")+
  labs(y="Highway Mileage")+
* labs(title="My first GGPLOT")
```
]

.panel2-lables2-auto[
![](index_files/figure-html/lables2_auto_09_output-1.png)<!-- -->
]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()
myplot+
  labs(x="Displacement")+
  labs(y="Highway Mileage")+
  labs(title="My first GGPLOT")+
* labs(subtitle="This is the subtitle")
```
]

.panel2-lables2-auto[
![](index_files/figure-html/lables2_auto_10_output-1.png)<!-- -->
]

---
count: false

##ggplot object
.panel1-lables2-auto[
```r
myplot <- ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
  geom_point()+
  theme_minimal()
myplot+
  labs(x="Displacement")+
  labs(y="Highway Mileage")+
  labs(title="My first GGPLOT")+
  labs(subtitle="This is the subtitle")+
* labs(caption="Source:mpg dataset")
```
]

.panel2-lables2-auto[
![](index_files/figure-html/lables2_auto_11_output-1.png)<!-- -->
]

<style>
.panel1-lables2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel2-lables2-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% }
.panel3-lables2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% }
</style>

---
# A `ggplot` template

`ggplot(data = <DATA>) + `
` <GEOM_FUNCTION>(`
`mapping = aes(<MAPPINGS>),`
`stat = <STAT>,`
` position = <POSITION>`
`) +`
` <COORDINATE_FUNCTION> +`
`<FACET_FUNCTION>`

*In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.*

---
class: inverse, middle, center

# BEYOND ggplot2
How to better compose, annotate and highlight your plots

---
![](images/horst_patchwork_1.jpg)

---
# Plot Composition with `patchwork`

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg) + geom_point(aes(displ, hwy)) # first plot
p2 <- ggplot(mpg) + geom_boxplot(aes(displ, hwy, group = class)) # second plot
p1+p2 # combined plot output using patchwork package
```

![](index_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---
# Plot Composition with `patchwork`

```r
p3 <- ggplot(mpg, aes(displ, hwy))+geom_point(aes(color=class))+geom_smooth(aes(color=class))
p4 <- ggplot(mpg) + geom_bar(aes(class))
*p5<-(p1 | p2 | p3) / p4
p5+plot_annotation('This is a title', caption = 'Source: mpg dataset', theme = theme(plot.caption = element_text(size = 14), plot.title = element_text(size = 18)))
```

![](index_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

---
![](images/horst_ggrepel.jpeg)

---
# Plot Annotation without `ggrepel`

```r
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
  geom_point() +
* geom_text(aes(label = model))
```

![](index_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---
# Plot Annotation with `ggrepel`

```r
*library(ggrepel)
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
  geom_point() +
* geom_text_repel(aes(label = model))
```


![](index_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

---

![](images/horst_gghighlight.jpeg)

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
*library(gghighlight)
```
]

.panel2-highlight1-auto[

]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
*ggplot(airquality)
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_02_output-1.png)<!-- -->
]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
ggplot(airquality)+
*  geom_line(aes(Day, Temp,
*                color=factor(Month)))
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_03_output-1.png)<!-- -->
]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
ggplot(airquality)+
  geom_line(aes(Day, Temp,
                color=factor(Month)))+
*  theme_bw()
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_04_output-1.png)<!-- -->
]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
ggplot(airquality)+
  geom_line(aes(Day, Temp,
                color=factor(Month)))+
  theme_bw()+
*  labs(x = "Day of Month",
*       y = "Temperature")
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_05_output-1.png)<!-- -->
]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
ggplot(airquality)+
  geom_line(aes(Day, Temp,
                color=factor(Month)))+
  theme_bw()+
  labs(x = "Day of Month",
       y = "Temperature") +
*  theme(legend.position = "top")
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_06_output-1.png)<!-- -->
]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
ggplot(airquality)+
  geom_line(aes(Day, Temp,
                color=factor(Month)))+
  theme_bw()+
  labs(x = "Day of Month",
       y = "Temperature") +
  theme(legend.position = "top")+
*  gghighlight(max(Temp) > 93,
*              label_key = Month)
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_07_output-1.png)<!-- -->
]

---
count: false

#Highlighting geoms
.panel1-highlight1-auto[
```r
library(gghighlight)
ggplot(airquality)+
  geom_line(aes(Day, Temp,
                color=factor(Month)))+
  theme_bw()+
  labs(x = "Day of Month",
       y = "Temperature") +
  theme(legend.position = "top")+
  gghighlight(max(Temp) > 93,
              label_key = Month)
```
]

.panel2-highlight1-auto[
![](index_files/figure-html/highlight1_auto_08_output-1.png)<!-- -->
]

<style>
.panel1-highlight1-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-highlight1-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-highlight1-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
# What next?

- `ggplot2` [extensions](https://exts.ggplot2.tidyverse.org/gallery/)
- RStudio [cheatsheet](https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)
- BBC Visual and Data Journalism [cookbook for R graphics](https://bbc.github.io/rcookbook/)

![](images/bbc.png)

---
# Resources Used

- [`flipbookr`](https://github.com/EvaMaeRey/flipbookr) package by Gina Reynolds
- [`xaringan`](https://github.com/yihui/xaringan) package by Yihui Xie
- [R for Data Science](https://r4ds.had.co.nz/) book by Hadley Wickham & Garrett Grolemund
- [`ggplot2` workshop](https://www.youtube.com/watch?v=h29g21z0a68) by Thomas Lin Pedersen
- [Illustrations](https://github.com/allisonhorst/stats-illustrations) by Allison Horst

---
class: inverse, middle, center

# THANK YOU