---
title: "Creating Plots from Dataframes in R"
author: "Nayel Bettache"
date: "2024-09-18"
output: html_notebook
editor_options: 
  markdown: 
    wrap: 72
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction to Plotting with R

Welcome to our comprehensive lecture on creating plots from dataframes
in R! Data visualization is a crucial skill in data analysis, allowing
us to communicate complex information clearly and efficiently. In this
session, we'll explore various plotting techniques using built-in R
dataframes and the powerful `ggplot2` library.

Throughout this lecture, we'll cover different types of plots, their
purposes, and when to use them. Each section will include detailed
explanations and practical exercises to reinforce your learning.

## 1. Basic Plotting with Base R

### Introduction

We'll start our journey with R's built-in plotting functions. These
functions provide a quick and straightforward way to visualize data.
While they may not be as flexible as more advanced libraries,
understanding base R plotting is fundamental and can be useful for quick
data exploration.

### Purpose and Utility

Scatter plots, which we'll create in this section, are excellent for
visualizing relationships between two continuous variables. They're
particularly useful when you want to:

-   Identify correlations between variables
-   Detect outliers or unusual patterns in your data
-   Understand the distribution of data points across two dimensions

Scatter plots are widely used in various fields, including:

-   Economics: plotting GDP against life expectancy
-   Biology: comparing gene expression levels
-   Environmental science: examining the relationship between
    temperature and pollution levels

```{r}
# Load the mtcars dataset
data(mtcars)

# Create a simple scatter plot
plot(mtcars$wt, mtcars$mpg, 
     main = "Car Weight vs. Miles Per Gallon",
     xlab = "Weight (1000 lbs)", 
     ylab = "Miles Per Gallon",
     pch = 19, 
     col = "blue")
```

In this example, we:

1.  Load the `mtcars` dataset, which is built into R.
2.  Use the `plot()` function to create a scatter plot.
3.  Set the main title with `main`, the x-axis label with `xlab`, and the
    y-axis label with `ylab`.
4.  Use `pch = 19` for solid circle points and `col = "blue"` for blue
    points.
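Base R's `plot()` also accepts a formula interface, `y ~ x`, which looks
the columns up in a data frame. The sketch below redraws the same
scatter plot that way and, going slightly beyond the original example,
overlays a least-squares regression line with `lm()` and `abline()`.
The variable name `fit` is illustrative only.

```{r}
# Formula interface: mpg (y) against wt (x), columns taken from mtcars
plot(mpg ~ wt, data = mtcars,
     main = "Car Weight vs. Miles Per Gallon",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles Per Gallon",
     pch = 19,
     col = "blue")

# Fit a simple linear regression (`fit` is an illustrative name)
fit <- lm(mpg ~ wt, data = mtcars)

# Draw the fitted line on top of the existing scatter plot
abline(fit, col = "red", lwd = 2)

# Intercept and slope of the fitted line
coef(fit)
```

The slope is about -5.34: in this dataset, each additional 1000 lbs of
weight is associated with roughly 5.3 fewer miles per gallon.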

### Exercises

1.  Create a scatter plot using the `mtcars` dataset to visualize the
    relationship between horsepower (`hp`) and quarter-mile time
    (`qsec`). Use red triangles for the points.

2.  Using the `iris` dataset (another built-in R dataset), create a
    scatter plot of sepal length vs. sepal width. Color the points based
    on the species. Hint: You'll need to use the `col` parameter with a
    vector of colors corresponding to the species.

## 2. Introduction to ggplot2

### Introduction

Now we'll dive into `ggplot2`, a powerful and flexible plotting library
in R. `ggplot2` is based on the Grammar of Graphics, a coherent system
for describing and building graphs. This system allows for highly
customizable and layered graphics.
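To make the "grammar" more concrete, the sketch below spells out
components that `ggplot()` usually fills in with defaults: the data, an
aesthetic mapping, a geometric object (geom), scales, a coordinate
system, and a theme. It is for illustration only: `coord_cartesian()`
and `theme_grey()` are the defaults and could be dropped, and the
`scale_*()` calls only set the axis names. The name `p_grammar` is
arbitrary.

```{r}
library(ggplot2)

# `p_grammar` is an illustrative name
p_grammar <- ggplot(data = mtcars) +                 # data
  geom_point(mapping = aes(x = wt, y = mpg)) +      # geom + aesthetic mapping
  scale_x_continuous(name = "Weight (1000 lbs)") +  # scale for x
  scale_y_continuous(name = "Miles Per Gallon") +   # scale for y
  coord_cartesian() +                               # coordinate system (default)
  theme_grey()                                      # theme (default)

p_grammar
```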

### Purpose and Utility

The `ggplot2` library offers several advantages over base R plotting:

-   Consistent and intuitive syntax
-   A layered approach to building complex graphics
-   Polished default aesthetics (themes, legends, and color scales)
-   Extensive customization options

`ggplot2` is particularly useful when:

-   Creating publication-quality graphics
-   Building complex, multi-layered plots
-   Needing to quickly change aesthetic properties of plots
-   Working with data frames in which each variable is a column that can
    be mapped directly to an aesthetic

```{r}
# Install and load ggplot2 if not already installed
# install.packages("ggplot2")
library(ggplot2)

# Create a scatter plot using ggplot2
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Car Weight vs. Miles Per Gallon",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon") +
  theme_minimal()
```

Here's what we did:

1.  Load the `ggplot2` library.
2.  Use `ggplot()` to initialize the plot, specifying the data and
    aesthetics.
3.  Add points with `geom_point()`.
4.  Set labels with `labs()`.
5.  Apply a minimal theme with `theme_minimal()`.
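Because a ggplot is an ordinary R object, the same plot can also be
built step by step. The sketch below stores the base plot in a variable
and adds layers separately, which is convenient when trying several
variations of one plot. The names `p` and `p_final` and the output file
name are illustrative; the file is written to a temporary directory.

```{r}
library(ggplot2)

# Data + aesthetics only (`p` is an illustrative name)
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add the geom, labels, and theme as a separate step
p_final <- p +
  geom_point() +
  labs(title = "Car Weight vs. Miles Per Gallon",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon") +
  theme_minimal()

# Printing the object draws it
p_final

# Optionally save it; the temporary path here is only for illustration
ggsave(file.path(tempdir(), "weight_vs_mpg.pdf"), p_final,
       width = 6, height = 4)
```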

### Exercises

1.  Using `ggplot2` and the `economics` dataset (comes with ggplot2),
    create a line plot of unemployment over time. Use the `date` column
    for the x-axis and `unemploy` for the y-axis. Add appropriate labels
    and a title.

2.  With the `mpg` dataset (also included in ggplot2), create a scatter
    plot of engine displacement (`displ`) vs. highway miles per gallon
    (`hwy`). Color the points by the `class` of the vehicle. Add a title
    and appropriate axis labels.

## 3. Enhancing Plots with Color and Shape

### Introduction

In this section, we'll explore how to enhance our plots by incorporating
additional variables through color and shape. This technique allows us
to display multidimensional data in a two-dimensional plot, increasing
the information density of our visualizations.

### Purpose and Utility

Adding color and shape to plots serves several important purposes:

-   Grouping: It helps viewers quickly identify different categories or
    groups within the data.
-   Pattern recognition: It makes it easier to spot trends or patterns
    specific to certain groups.
-   Information density: It allows for the representation of additional
    variables without adding more dimensions to the plot.

This technique is particularly useful when:

-   Comparing multiple categories within a dataset
-   Identifying how different factors interact with the main variables
    being plotted
-   Presenting complex, multivariable data in a single, comprehensible
    visualization

```{r}
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), shape = factor(am))) +
  geom_point(size = 3) +
  labs(title = "Car Weight vs. MPG",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon",
       color = "Cylinders",
       shape = "Transmission") +
  scale_color_brewer(palette = "Set1") +
  theme_light()
```

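In this example:

1.  `color = factor(cyl)` colors points by number of cylinders. Wrapping
    the numeric `cyl` column in `factor()` makes it discrete; without
    it, ggplot2 would map a continuous color gradient, and the discrete
    `scale_color_brewer()` would fail.
2.  `shape = factor(am)` changes point shapes based on transmission type
    (0 = automatic, 1 = manual). Shape can only be mapped to a discrete
    variable, so `factor()` is needed here as well.
3.  `scale_color_brewer()` applies a color palette from ColorBrewer.
4.  `theme_light()` gives a light background theme.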
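Two refinements are worth showing. First, aesthetics inside `aes()` are
*mapped* to data and get a legend, whereas values passed directly to a
geom (such as `size = 3`) are *set* to a constant for every point.
Second, `factor(am)` shows the raw codes 0 and 1 in the legend. The
sketch below builds a relabelled copy of the data so the legend reads
naturally; the names `cars`, `cylinders`, and `transmission` are
illustrative, and `mtcars` itself is left unchanged.

```{r}
library(ggplot2)

# Illustrative copy of mtcars with descriptive factor columns
# (in mtcars, am = 0 is automatic and am = 1 is manual)
cars <- transform(
  mtcars,
  cylinders    = factor(cyl),
  transmission = factor(am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))
)

ggplot(cars, aes(x = wt, y = mpg,
                 color = cylinders,        # mapped: varies with data, has a legend
                 shape = transmission)) +  # mapped: varies with data, has a legend
  geom_point(size = 3, alpha = 0.8) +      # set: same for every point, no legend
  labs(title = "Car Weight vs. MPG",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon",
       color = "Cylinders",
       shape = "Transmission") +
  scale_color_brewer(palette = "Set1") +
  theme_light()

# How many cars fall into each transmission group
table(cars$transmission)
```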

### Exercises

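1.  Using the `diamonds` dataset (included in ggplot2), create a scatter
    plot of price vs. carat. Use color to represent the cut quality and
    shape to represent the clarity. Add appropriate labels and a title.
    Note that `clarity` has eight levels while ggplot2's default shape
    palette supports at most six, so expect a warning and some points
    to be dropped.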

2.  With the `iris` dataset, create a scatter plot of petal length vs.
    petal width. Use color to represent the species. Instead of using
    different shapes, vary the size of the points based on the sepal
    width. Add a legend for both color and size.

## 4. Creating Bar Plots

### Introduction

Bar plots are one of the most common and effective ways to visualize
categorical data. They allow for easy comparison of quantities across
different categories or groups.

### Purpose and Utility

Bar plots are particularly useful for:

-   Comparing quantities or frequencies across different categories
-   Displaying the distribution of a categorical variable
-   Showing changes in a quantity over time (when categories are time
    periods)
-   Presenting survey results or other categorical data

You might use bar plots when:

-   Analyzing market share across different products or companies
-   Comparing sales figures across different regions
-   Visualizing the distribution of responses in a survey
-   Presenting budget allocations across different departments

```{r}
# Prepare data
cylinders <- as.data.frame(table(mtcars$cyl))
colnames(cylinders) <- c("Cylinders", "Count")

ggplot(cylinders, aes(x = Cylinders, y = Count, fill = Cylinders)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of Cars by Cylinder Count",
       x = "Number of Cylinders",
       y = "Count") +
  theme_classic() +
  scale_fill_brewer(palette = "Pastel1")
```

Here's what we did:

1.  Create a summary dataframe of cylinder counts.
2.  Use `geom_bar()` with `stat = "identity"` so the bar heights are
    taken directly from the `Count` column.
3.  Fill bars with different colors based on cylinder count.
4.  Apply a classic theme and a pastel color palette.
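Precomputing the counts is optional. By default `geom_bar()` uses
`stat = "count"` and tallies the rows itself, while `geom_col()` is the
shorthand for `geom_bar(stat = "identity")` when the heights are already
in the data. The sketch below shows both forms; `p_count` and `p_col`
are illustrative names, and `cylinders` is re-created so the chunk runs
on its own.

```{r}
library(ggplot2)

# Option 1 (`p_count` is illustrative): geom_bar() counts rows per cylinder value
p_count <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  labs(title = "Number of Cars by Cylinder Count",
       x = "Number of Cylinders", y = "Count", fill = "Cylinders") +
  scale_fill_brewer(palette = "Pastel1") +
  theme_classic()
p_count

# Option 2 (`p_col` is illustrative): heights precomputed, geom_col() uses them as-is
cylinders <- as.data.frame(table(mtcars$cyl))
colnames(cylinders) <- c("Cylinders", "Count")
p_col <- ggplot(cylinders, aes(x = Cylinders, y = Count, fill = Cylinders)) +
  geom_col() +
  scale_fill_brewer(palette = "Pastel1") +
  theme_classic()
p_col
```

Both charts show the same bars: 11, 7, and 14 cars with 4, 6, and 8
cylinders respectively.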

### Exercises

1.  Using the `mpg` dataset, create a bar plot showing the count of cars
    for each manufacturer. Order the bars from highest to lowest count.
    Add appropriate labels and a title.

2.  With the `diamonds` dataset, create a stacked bar plot showing the
    proportion of different cuts (Fair, Good, Very Good, Premium, Ideal)
    for each clarity category. Use different colors for each cut. Add a
    legend and appropriate labels.

## 5. Box Plots for Comparing Distributions

### Introduction

Box plots, also known as box-and-whisker plots, are an excellent tool
for visualizing the distribution of a continuous variable across
different categories. They provide a concise summary of the data's
central tendency, spread, and potential outliers.

### Purpose and Utility

Box plots are particularly useful for:

-   Comparing distributions across different groups or categories
-   Identifying the median, quartiles, and potential outliers in a
    dataset
-   Detecting skewness in the data distribution
-   Comparing the spread of data across different groups

You might use box plots when:

-   Comparing salary distributions across different departments
-   Analyzing the distribution of test scores across different schools
-   Examining the variability of measurement data in scientific
    experiments
-   Comparing the performance of different algorithms or methods

```{r}
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot() +
  labs(title = "Distribution of MPG by Number of Cylinders",
       x = "Number of Cylinders",
       y = "Miles Per Gallon") +
  theme_bw() +
  scale_fill_brewer(palette = "Set2") +
  theme(legend.position = "none")
```

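In this example:

1.  We create box plots using `geom_boxplot()`.
2.  We group and fill by number of cylinders. Because `cyl` is numeric,
    `factor(cyl)` is needed to get one box per cylinder count; with a
    continuous x, all the data would collapse into a single box.
3.  We remove the legend because it repeats the information shown by
    the x-axis labels.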
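Each box summarizes its group with the lower hinge (25th percentile),
the median, and the upper hinge (75th percentile); the whiskers extend
to the most extreme points within 1.5 times the interquartile range of
the hinges, and points beyond that are drawn as outliers. The sketch
below computes the group medians and quartiles directly and compares
them with the statistics ggplot2 computed, retrieved with
`layer_data()`. The names `p_box`, `medians`, and `box_stats` are
illustrative.

```{r}
library(ggplot2)

# `p_box` is an illustrative name
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Median mpg for each cylinder group, computed directly
medians <- tapply(mtcars$mpg, mtcars$cyl, median)
medians

# Quartiles for each group (the lower edge, middle line, and upper edge of each box)
tapply(mtcars$mpg, mtcars$cyl, quantile, probs = c(0.25, 0.5, 0.75))

# Statistics ggplot2 computed for each box
box_stats <- layer_data(p_box, 1)[, c("x", "ymin", "lower", "middle",
                                      "upper", "ymax")]
box_stats
```

The medians are 26.0, 19.7, and 15.2 miles per gallon for 4, 6, and 8
cylinders, matching the `middle` column.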

### Exercises

1.  Using the `diamonds` dataset, create a box plot showing the
    distribution of price for each cut category. Add color to the boxes
    based on the cut. Include appropriate labels and a title.

2.  With the `gapminder` dataset (you may need to install the gapminder
    package), create a box plot showing the distribution of life
    expectancy for each continent. Arrange the continents in descending
    order of median life expectancy. Add color and appropriate labels.

## 6. Histograms and Density Plots

### Introduction

Histograms and density plots are powerful tools for visualizing the
distribution of a single continuous variable. They provide insights into
the shape, central tendency, and spread of the data.

### Purpose and Utility

Histograms and density plots are particularly useful for:

-   Visualizing the overall distribution of a continuous variable
-   Identifying the mode(s) of a distribution
-   Detecting skewness or unusual patterns in the data
-   Comparing the distribution of a variable across different groups

You might use these plots when:

-   Analyzing the distribution of ages in a population
-   Examining the distribution of response times in a psychology
    experiment
-   Investigating the distribution of prices in a real estate market
-   Comparing the distribution of a variable before and after an
    intervention

```{r}
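ggplot(mtcars, aes(x = mpg)) +
  # after_stat(density) puts the bars on a density scale (total area = 1)
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "skyblue", color = "black") +
  # linewidth (ggplot2 >= 3.4.0) sets line thickness, replacing size
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Distribution of Miles Per Gallon",
       x = "Miles Per Gallon",
       y = "Density") +
  theme_minimal()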
```

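Here's what we did:

1.  Create a histogram with `geom_histogram()`, setting
    `y = after_stat(density)` so the bars are on a density scale instead
    of showing raw counts. Older code often writes this as
    `y = ..density..`, a form deprecated since ggplot2 3.4.0.
2.  Overlay a density curve with `geom_density()`; its line thickness is
    controlled by `linewidth` (formerly `size`).
3.  Customize colors and labels for clarity.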
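Why use the density scale? With `after_stat(density)` (formerly
`..density..`), each bar's height is its count divided by the total
count times the bin width, so the bar areas add up to 1, the same total
area as under a density curve. That is what makes the two layers
comparable. The sketch below checks this with `layer_data()` and then
draws a base R counterpart using `hist(freq = FALSE)`. The names
`p_hist` and `bars` are illustrative.

```{r}
library(ggplot2)

# `p_hist` is an illustrative name
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "skyblue", color = "black")

# Bar heights and edges computed by ggplot2 (`bars` is illustrative)
bars <- layer_data(p_hist, 1)

# Area of each bar = density * bin width; the areas sum to 1
sum(bars$density * (bars$xmax - bars$xmin))

# Base R counterpart: freq = FALSE also uses a density scale.
# Breaks are 2-mpg bins covering the data range (10.4 to 33.9).
hist(mtcars$mpg, breaks = seq(9, 35, by = 2), freq = FALSE,
     main = "Distribution of Miles Per Gallon", xlab = "Miles Per Gallon")
lines(density(mtcars$mpg), col = "red", lwd = 2)
```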

### Exercises

1.  Using the `diamonds` dataset, create a histogram of the `price`
    variable. Experiment with different bin widths to see how they
    affect the visualization. Add a density curve on top of the
    histogram. Include appropriate labels and a title.

2.  With the `faithful` dataset (built into R), create two density plots
    on the same graph: one for eruption duration and one for waiting
    time between eruptions. Use different colors for each density curve
    and add a legend. Because the two variables span very different
    ranges, standardize them (for example with `scale()`) so both curves
    can share the same axes. Add appropriate labels and a title.

## 7. Faceting for Multi-panel Plots

### Introduction

Faceting is a powerful technique in data visualization that allows you
to create multiple panels or subplots based on categorical variables.
This approach is particularly useful when you want to compare patterns
across different subgroups of your data.

### Purpose and Utility

Faceting is especially useful for:

-   Comparing trends or patterns across different categories
-   Visualizing how the relationship between variables changes across
    different groups
-   Displaying multiple aspects of a dataset in a single, organized
    figure
-   Reducing overplotting in complex datasets

You might use faceting when:

-   Comparing sales trends across different regions over time
-   Analyzing how the relationship between two variables varies across
    different categories
-   Visualizing multiple related metrics for different groups
-   Exploring how a distribution changes based on one or more
    categorical variables

```{r}
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~cyl, nrow = 1) +
  labs(title = "Weight vs. MPG by Cylinders and Transmission",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon",
       color = "Transmission") +
  theme_bw() +
  scale_color_brewer(palette = "Set1", labels = c("Automatic", "Manual"))
```

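In this example:

1.  We use `facet_wrap()` to create separate panels for each cylinder
    count; `nrow = 1` places them side by side.
2.  We add linear trend lines with `geom_smooth(method = "lm")`. Because
    color is mapped to transmission, a separate line is fitted for each
    transmission type within each panel, and `se = FALSE` hides the
    confidence bands.
3.  We color points and lines by transmission type; the `labels`
    argument of `scale_color_brewer()` renames the codes 0 and 1 to
    "Automatic" and "Manual".
4.  We customize labels and the theme for better readability.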
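`facet_wrap()` lays out panels for one variable. When two categorical
variables should define the rows and columns, `facet_grid()` is the
usual alternative. The sketch below moves transmission from color into
the grid rows, so each panel holds one transmission/cylinder
combination; `labeller = label_both` prints the variable name with its
value, so the row strips read `am: 0` (automatic) and `am: 1` (manual).
Some panels contain only two or three cars, so their fitted lines
should be read with caution. The name `p_grid` is illustrative.

```{r}
library(ggplot2)

# `p_grid` is an illustrative name
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_grid(am ~ cyl, labeller = label_both) +  # rows: am, columns: cyl
  labs(title = "Weight vs. MPG by Transmission (rows) and Cylinders (columns)",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon") +
  theme_bw()
p_grid

# Number of cars in each panel (rows: am, columns: cyl)
table(mtcars$am, mtcars$cyl)
```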

### Exercises

1.  Using the `diamonds` dataset, create a scatter plot of price vs.
    carat. Facet the plot by cut, creating a 2x3 grid of subplots
    (`cut` has five levels, so one cell stays empty). Color the points
    by clarity. Add a smooth trend line to each facet. Include
    appropriate labels and a title.

2.  With the `mpg` dataset, create a box plot of highway fuel efficiency
    (hwy) for different car classes. Facet the plot by the number of
    cylinders (cyl). Color the boxes by the type of drive (drv). Arrange
    the facets in a single row. Add appropriate labels and a title.

## Conclusion

This lecture has covered a range of plotting techniques in R, from basic
scatter plots to more complex, multi-layered visualizations. Remember,
the key to effective data visualization is choosing the right plot type
for your data and research question. Practice with different datasets
and experiment with various `ggplot2` functions to become proficient in
creating informative and visually appealing plots.

## Additional Resources

-   ggplot2 documentation: <https://ggplot2.tidyverse.org/>
-   R Graphics Cookbook: <https://r-graphics.org/>
-   Datacamp ggplot2 tutorial:
    <https://www.datacamp.com/community/tutorials/ggplot2-tutorial-r>

Happy plotting!
