---
title: "Creating Plots from Dataframes in R"
author: "Nayel Bettache"
date: "2024-09-18"
output: html_notebook
editor_options: 
  markdown: 
    wrap: 72
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction to Plotting with R

Welcome to our comprehensive lecture on creating plots from dataframes
in R! Data visualization is a crucial skill in data analysis, allowing
us to communicate complex information clearly and efficiently. In this
session, we'll explore various plotting techniques using built-in R
dataframes and the powerful `ggplot2` library.

Throughout this lecture, we'll cover different types of plots, their
purposes, and when to use them. Each section will include detailed
explanations and practical exercises to reinforce your learning.

## 1. Basic Plotting with Base R

### Introduction

We'll start our journey with R's built-in plotting functions. These
functions provide a quick and straightforward way to visualize data.
While they may not be as flexible as more advanced libraries,
understanding base R plotting is fundamental and can be useful for quick
data exploration.

### Purpose and Utility

Scatter plots, which we'll create in this section, are excellent for
visualizing relationships between two continuous variables. They're
particularly useful when you want to:

-   Identify correlations between variables
-   Detect outliers or unusual patterns in your data
-   Understand the distribution of data points across two dimensions

Scatter plots are widely used in various fields, including:

-   Economics: plotting GDP against life expectancy
-   Biology: comparing gene expression levels
-   Environmental science: examining the relationship between
    temperature and pollution levels

```{r}
# Load the mtcars dataset
data(mtcars)

# Create a simple scatter plot
plot(mtcars$wt, mtcars$mpg, 
     main = "Car Weight vs. Miles Per Gallon",
     xlab = "Weight (1000 lbs)", 
     ylab = "Miles Per Gallon",
     pch = 19, 
     col = "blue")
```

In this example, we:

1.  Load the `mtcars` dataset, which is built into R.
2.  Use the `plot()` function to create a scatter plot.
3.  Set the main title with `main`, x-axis label with `xlab`, and
    y-axis label with `ylab`.
4.  Use `pch = 19` for solid circle points and `col = "blue"` for blue
    color.
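
To put a number on the relationship seen in the scatter plot, one
option is to compute the Pearson correlation and overlay a fitted line.
The following is a minimal sketch using only base R; the variable names
`r` and `fit` are illustrative and not part of the original example.

```{r}
# Pearson correlation between weight and fuel efficiency
r <- cor(mtcars$wt, mtcars$mpg)
round(r, 2)

# Redraw the scatter plot and overlay a least-squares line
fit <- lm(mpg ~ wt, data = mtcars)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles Per Gallon",
     pch = 19, col = "blue")
abline(fit, col = "red", lwd = 2)
```

A correlation close to -1 indicates a strong negative linear
relationship: heavier cars tend to get fewer miles per gallon.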

### Exercises

1.  Create a scatter plot using the `mtcars` dataset to visualize the
    relationship between horsepower (`hp`) and quarter-mile time
    (`qsec`). Use red triangles for the points.

2.  Using the `iris` dataset (another built-in R dataset), create a
    scatter plot of sepal length vs. sepal width. Color the points based
    on the species. Hint: You'll need to use the `col` parameter with a
    vector of colors corresponding to the species.

## 2. Introduction to ggplot2

### Introduction

Now we'll dive into `ggplot2`, a powerful and flexible plotting library
in R. `ggplot2` is based on the Grammar of Graphics, a coherent system
for describing and building graphs. This system allows for highly
customizable and layered graphics.

### Purpose and Utility

The `ggplot2` library offers several advantages over base R plotting:

-   Consistent and intuitive syntax
-   Layered approach to building complex graphics
-   Beautiful default aesthetics
-   Extensive customization options

`ggplot2` is particularly useful when:

-   Creating publication-quality graphics
-   Building complex, multi-layered plots
-   Needing to quickly change aesthetic properties of plots
-   Working with large datasets

```{r}
# Install and load ggplot2 if not already installed
# install.packages("ggplot2")
library(ggplot2)

# Create a scatter plot using ggplot2
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Car Weight vs. Miles Per Gallon",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon") +
  theme_minimal()
```

Here's what we did:

1.  Load the `ggplot2` library.
2.  Use `ggplot()` to initialize the plot, specifying the data and
    aesthetics.
3.  Add points with `geom_point()`.
4.  Set labels with `labs()`.
5.  Apply a minimal theme with `theme_minimal()`.
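
The layered approach can also be seen by storing a plot in a variable
and adding layers one at a time. This is a small sketch; the object
names `p_base` and `p_full` are made up for illustration.

```{r}
library(ggplot2)

# Start with data and aesthetic mappings only; no layers yet
p_base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a point layer, then a linear trend line on top of it
p_full <- p_base +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_full
```

Because each `+` returns a new plot object, `p_base` can be reused to
try different layers without retyping the data and mappings.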

### Exercises

1.  Using `ggplot2` and the `economics` dataset (comes with ggplot2),
    create a line plot of unemployment over time. Use the `date` column
    for the x-axis and `unemploy` for the y-axis. Add appropriate labels
    and a title.

2.  With the `mpg` dataset (also included in ggplot2), create a scatter
    plot of engine displacement (`displ`) vs. highway miles per gallon
    (`hwy`). Color the points by the `class` of the vehicle. Add a title
    and appropriate axis labels.

## 3. Enhancing Plots with Color and Shape

### Introduction

In this section, we'll explore how to enhance our plots by incorporating
additional variables through color and shape. This technique allows us
to display multidimensional data in a two-dimensional plot, increasing
the information density of our visualizations.

### Purpose and Utility

Adding color and shape to plots serves several important purposes:

-   Grouping: It helps viewers quickly identify different categories or
    groups within the data.
-   Pattern recognition: It makes it easier to spot trends or patterns
    specific to certain groups.
-   Information density: It allows for the representation of additional
    variables without adding more dimensions to the plot.

This technique is particularly useful when:

-   Comparing multiple categories within a dataset
-   Identifying how different factors interact with the main variables
    being plotted
-   Presenting complex, multivariable data in a single, comprehensible
    visualization

```{r}
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl), shape = factor(am))) +
  geom_point(size = 3) +
  labs(title = "Car Weight vs. MPG",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon",
       color = "Cylinders",
       shape = "Transmission") +
  scale_color_brewer(palette = "Set1") +
  theme_light()
```

In this example:

1.  We use `color = factor(cyl)` to color points by number of cylinders.
2.  `shape = factor(am)` changes point shapes based on transmission
    type.
3.  `scale_color_brewer()` applies a color palette from ColorBrewer.
4.  `theme_light()` gives a light background theme.
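
In the plot above, the legends show the raw codes stored in `mtcars`
(for example `0` and `1` for `am`). One possible way to get readable
legend entries is to recode the variables as labeled factors before
plotting. The sketch below assumes the coding documented in `?mtcars`
(0 = automatic, 1 = manual); the data frame `mtcars_labeled` and its new
columns are illustrative names.

```{r}
library(ggplot2)

# Copy the data and add labeled factor columns
mtcars_labeled <- mtcars
mtcars_labeled$transmission <- factor(mtcars_labeled$am,
                                      levels = c(0, 1),
                                      labels = c("Automatic", "Manual"))
mtcars_labeled$cylinders <- factor(mtcars_labeled$cyl)

# The legend now shows "Automatic" and "Manual" instead of 0 and 1
ggplot(mtcars_labeled,
       aes(x = wt, y = mpg, color = cylinders, shape = transmission)) +
  geom_point(size = 3) +
  labs(x = "Weight (1000 lbs)", y = "Miles Per Gallon",
       color = "Cylinders", shape = "Transmission") +
  theme_light()
```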

### Exercises

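1.  Using the `diamonds` dataset (included in ggplot2), create a scatter
    plot of price vs. carat. Use color to represent the cut quality and
    shape to represent the clarity. Add appropriate labels and a title.
    Hint: `clarity` has eight levels, but the default shape palette
    provides only six shapes, so you will need `scale_shape_manual()`.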

2.  With the `iris` dataset, create a scatter plot of petal length vs.
    petal width. Use color to represent the species. Instead of using
    different shapes, vary the size of the points based on the sepal
    width. Add a legend for both color and size.

## 4. Creating Bar Plots

### Introduction

Bar plots are one of the most common and effective ways to visualize
categorical data. They allow for easy comparison of quantities across
different categories or groups.

### Purpose and Utility

Bar plots are particularly useful for:

-   Comparing quantities or frequencies across different categories
-   Displaying the distribution of a categorical variable
-   Showing changes in a quantity over time (when categories are time
    periods)
-   Presenting survey results or other categorical data

You might use bar plots when:

-   Analyzing market share across different products or companies
-   Comparing sales figures across different regions
-   Visualizing the distribution of responses in a survey
-   Presenting budget allocations across different departments

```{r}
# Prepare data
cylinders <- as.data.frame(table(mtcars$cyl))
colnames(cylinders) <- c("Cylinders", "Count")

ggplot(cylinders, aes(x = Cylinders, y = Count, fill = Cylinders)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of Cars by Cylinder Count",
       x = "Number of Cylinders",
       y = "Count") +
  theme_classic() +
  scale_fill_brewer(palette = "Pastel1")
```

Here's what we did:

1.  Create a summary dataframe of cylinder counts.
2.  Use `geom_bar()` with `stat = "identity"` to create bars of
    specified heights.
3.  Fill bars with different colors based on cylinder count.
4.  Apply a classic theme and a pastel color palette.
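
When the data have not been summarized in advance, `geom_bar()` can do
the counting itself, since its default statistic counts the rows in each
category. A minimal sketch of that alternative (the name `p_count` is
illustrative):

```{r}
library(ggplot2)

# geom_bar() with its default stat = "count" tallies rows per category
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "steelblue") +
  labs(x = "Number of Cylinders", y = "Count")

p_count

# The computed bar heights match a table() summary of the same column
layer_data(p_count)$count
table(mtcars$cyl)
```

For heights that are already computed, `geom_col()` is a shorthand for
`geom_bar(stat = "identity")`.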

### Exercises

1.  Using the `mpg` dataset, create a bar plot showing the count of cars
    for each manufacturer. Order the bars from highest to lowest count.
    Add appropriate labels and a title.

2.  With the `diamonds` dataset, create a stacked bar plot showing the
    proportion of different cuts (Fair, Good, Very Good, Premium, Ideal)
    for each clarity category. Use different colors for each cut. Add a
    legend and appropriate labels.

## 5. Box Plots for Comparing Distributions

### Introduction

Box plots, also known as box-and-whisker plots, are an excellent tool
for visualizing the distribution of a continuous variable across
different categories. They provide a concise summary of the data's
central tendency, spread, and potential outliers.

### Purpose and Utility

Box plots are particularly useful for:

-   Comparing distributions across different groups or categories
-   Identifying the median, quartiles, and potential outliers in a
    dataset
-   Detecting skewness in the data distribution
-   Comparing the spread of data across different groups

You might use box plots when:

-   Comparing salary distributions across different departments
-   Analyzing the distribution of test scores across different schools
-   Examining the variability of measurement data in scientific
    experiments
-   Comparing the performance of different algorithms or methods

```{r}
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot() +
  labs(title = "Distribution of MPG by Number of Cylinders",
       x = "Number of Cylinders",
       y = "Miles Per Gallon") +
  theme_bw() +
  scale_fill_brewer(palette = "Set2") +
  theme(legend.position = "none")
```

In this example:

1.  We create box plots using `geom_boxplot()`.
2.  Group and fill by number of cylinders.
3.  Remove the legend as it's redundant with x-axis labels.
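
The numbers that a box plot summarizes can also be computed directly.
The sketch below uses base R to get the five-number summary per cylinder
group; the object names `mpg_summary` and `mpg_medians` are
illustrative.

```{r}
# Minimum, quartiles, and maximum of mpg within each cylinder group
mpg_summary <- tapply(mtcars$mpg, mtcars$cyl, quantile)
mpg_summary

# Medians only, which correspond to the line inside each box
mpg_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
mpg_medians
```

Note that the whiskers drawn by `geom_boxplot()` extend at most 1.5
times the interquartile range from the box, so the minimum and maximum
shown here may appear as separate outlier points rather than whisker
ends.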

### Exercises

1.  Using the `diamonds` dataset, create a box plot showing the
    distribution of price for each cut category. Add color to the boxes
    based on the cut. Include appropriate labels and a title.

2.  With the `gapminder` dataset (you may need to install the gapminder
    package), create a box plot showing the distribution of life
    expectancy for each continent. Arrange the continents in descending
    order of median life expectancy. Add color and appropriate labels.

## 6. Histograms and Density Plots

### Introduction

Histograms and density plots are powerful tools for visualizing the
distribution of a single continuous variable. They provide insights into
the shape, central tendency, and spread of the data.

### Purpose and Utility

Histograms and density plots are particularly useful for:

-   Visualizing the overall distribution of a continuous variable
-   Identifying the mode(s) of a distribution
-   Detecting skewness or unusual patterns in the data
-   Comparing the distribution of a variable across different groups

You might use these plots when:

-   Analyzing the distribution of ages in a population
-   Examining the distribution of response times in a psychology
    experiment
-   Investigating the distribution of prices in a real estate market
-   Comparing the distribution of a variable before and after an
    intervention

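```{r}
ggplot(mtcars, aes(x = mpg)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "skyblue", color = "black") +
  # linewidth replaces the deprecated size argument for lines
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Distribution of Miles Per Gallon",
       x = "Miles Per Gallon",
       y = "Density") +
  theme_minimal()
```

Here's what we did:

1.  Create a histogram with `geom_histogram()`, setting
    `y = after_stat(density)` to put the bars on a density scale
    (ggplot2 3.4.0 deprecated the older `..density..` notation).
2.  Overlay a density curve with `geom_density()`, using `linewidth`
    rather than the deprecated `size` to set the line thickness.
3.  Customize colors and labels for clarity.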
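
On a density scale, bar heights are chosen so that the total area of the
bars equals one, which is what lets the histogram and the density curve
share a y-axis. This can be checked with base R's `hist()`; the break
points below are an assumption chosen to mirror a bin width of 2, and
the names `h` and `bar_areas` are illustrative.

```{r}
# Compute (but do not draw) a histogram of mpg with bins of width 2
h <- hist(mtcars$mpg, breaks = seq(10, 34, by = 2), plot = FALSE)

# Area of each bar = density * bin width; the areas sum to 1
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)
```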

### Exercises

1.  Using the `diamonds` dataset, create a histogram of the `price`
    variable. Experiment with different bin widths to see how they
    affect the visualization. Add a density curve on top of the
    histogram. Include appropriate labels and a title.

2.  With the `faithful` dataset (built into R), create two density plots
    on the same graph: one for eruption duration (`eruptions`) and one
    for waiting time between eruptions (`waiting`). Use different colors
    for each density curve and add a legend. Because the two variables
    are measured on very different scales, standardize them first (for
    example with `scale()`) so that both curves can share the same
    axes. Add appropriate labels and a title.

## 7. Faceting for Multi-panel Plots

### Introduction

Faceting is a powerful technique in data visualization that allows you
to create multiple panels or subplots based on categorical variables.
This approach is particularly useful when you want to compare patterns
across different subgroups of your data.

### Purpose and Utility

Faceting is especially useful for:

-   Comparing trends or patterns across different categories
-   Visualizing how the relationship between variables changes across
    different groups
-   Displaying multiple aspects of a dataset in a single, organized
    figure
-   Reducing overplotting in complex datasets

You might use faceting when:

-   Comparing sales trends across different regions over time
-   Analyzing how the relationship between two variables varies across
    different categories
-   Visualizing multiple related metrics for different groups
-   Exploring how a distribution changes based on one or more
    categorical variables

```{r}
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~cyl, nrow = 1) +
  labs(title = "Weight vs. MPG by Cylinders and Transmission",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon",
       color = "Transmission") +
  theme_bw() +
  scale_color_brewer(palette = "Set1", labels = c("Automatic", "Manual"))
```

In this example:

1.  We use `facet_wrap()` to create separate panels for each cylinder
    count.
2.  Add trend lines with `geom_smooth()`.
3.  Color points and lines by transmission type.
4.  Customize labels and theme for better readability.
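
When two categorical variables define the panels, `facet_grid()`
arranges them in rows and columns. A brief sketch, reusing `mtcars` with
transmission as rows and cylinders as columns (the name `p_grid` is
illustrative):

```{r}
library(ggplot2)

# Rows: transmission (am); columns: number of cylinders (cyl)
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl, labeller = label_both) +
  labs(x = "Weight (1000 lbs)", y = "Miles Per Gallon") +
  theme_bw()

p_grid
```

Here `label_both` prints the variable name alongside each value in the
strip labels, which helps when the raw values are bare numbers.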

### Exercises

1.  Using the `diamonds` dataset, create a scatter plot of price vs.
    carat. Facet the plot by cut, creating a 2x3 grid of subplots. Color
    the points by clarity. Add a smooth trend line to each facet.
    Include appropriate labels and a title.

2.  With the `mpg` dataset, create a box plot of highway fuel efficiency
    (`hwy`) for different car classes. Facet the plot by the number of
    cylinders (`cyl`). Color the boxes by the type of drive (`drv`).
    Arrange the facets in a single row. Add appropriate labels and a
    title.

## Conclusion

This lecture has covered a range of plotting techniques in R, from basic
scatter plots to more complex, multi-layered visualizations. Remember,
the key to effective data visualization is choosing the right plot type
for your data and research question. Practice with different datasets
and experiment with various `ggplot2` functions to become proficient in
creating informative and visually appealing plots.

## Additional Resources

-   ggplot2 documentation: <https://ggplot2.tidyverse.org/>
-   R Graphics Cookbook: <https://r-graphics.org/>
-   Datacamp ggplot2 tutorial:
    <https://www.datacamp.com/community/tutorials/ggplot2-tutorial-r>

Happy plotting!
