[The introduction remains the same as in the previous version]
[This section remains the same until the exercises]
Create a scatter plot using the mtcars dataset to visualize the relationship between horsepower (hp) and quarter-mile time (qsec). Use red triangles for the points.
Solution:
plot(mtcars$hp, mtcars$qsec,
     main = "Horsepower vs. Quarter-Mile Time",
     xlab = "Horsepower",
     ylab = "Quarter-Mile Time (seconds)",
     pch = 17,  # filled triangle
     col = "red")
Using the iris dataset (another built-in R dataset), create a scatter plot of sepal length vs. sepal width. Color the points based on the species.
Solution:
# Create a color vector based on species
colors <- c("setosa" = "red", "versicolor" = "blue", "virginica" = "green")
# Look colors up by name; a bare factor index would use its integer codes
species_colors <- colors[as.character(iris$Species)]
plot(iris$Sepal.Length, iris$Sepal.Width,
     main = "Iris Sepal Length vs. Sepal Width",
     xlab = "Sepal Length",
     ylab = "Sepal Width",
     pch = 19,
     col = species_colors)
# Add a legend
legend("topright", legend = levels(iris$Species),
       col = colors, pch = 19, title = "Species")
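A subtlety of base R matters when colors are looked up per species: subsetting a named vector with a factor uses the factor's integer codes, not its labels. When the names are listed in the same order as the factor levels, both forms agree, but they diverge as soon as the order differs. A minimal sketch, using a hypothetical palette `pal` whose names are deliberately out of level order:

```r
# Hypothetical palette, deliberately NOT in the order of levels(iris$Species)
pal <- c(virginica = "green", setosa = "red", versicolor = "blue")

# A factor index is treated as integer codes: setosa is level 1,
# so it picks the first element of pal ("green"), which is wrong here
by_code <- pal[iris$Species]

# Converting to character indexes by name, independent of the order of pal
by_name <- pal[as.character(iris$Species)]

unname(by_code[1])  # "green"
unname(by_name[1])  # "red"
```

Indexing with `as.character()` keeps the lookup correct even if the palette is later reordered or extended.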
# The same plot with ggplot2 (assumes library(ggplot2) has been run)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Iris Sepal Length vs. Sepal Width",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +
  theme_minimal()
[This section remains the same until the exercises]
Using ggplot2 and the economics dataset (comes with ggplot2), create a line plot of unemployment over time. Use the date column for the x-axis and unemploy for the y-axis. Add appropriate labels and a title.
Solution:
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "blue") +
  labs(title = "Unemployment Over Time",
       x = "Date",
       y = "Number of Unemployed (in thousands)") +
  theme_minimal()
Using the mpg dataset (also included in ggplot2), create a scatter plot of engine displacement (displ) vs. highway miles per gallon (hwy). Color the points by the class of the vehicle. Add a title and appropriate axis labels.
Solution:
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Displacement vs. Highway MPG",
       x = "Engine Displacement (L)",
       y = "Highway MPG",
       color = "Vehicle Class") +
  theme_minimal()
[This section remains the same until the exercises]
Using the diamonds dataset (included in ggplot2), create a scatter plot of price vs. carat. Use color to represent the cut quality and shape to represent the clarity. Add appropriate labels and a title.
Solution:
ggplot(diamonds, aes(x = carat, y = price, color = cut, shape = clarity)) +
  geom_point(alpha = 0.6) +
  # clarity has 8 levels, but the default shape palette covers only 6.
  # Without a manual scale, ggplot2 warns about this and drops the
  # 5445 rows in the remaining levels. It also advises against using
  # shapes for an ordered variable such as clarity.
  scale_shape_manual(values = 0:7) +
  labs(title = "Diamond Price vs. Carat",
       x = "Carat",
       y = "Price (USD)",
       color = "Cut Quality",
       shape = "Clarity") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")
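The shape mapping in this exercise is constrained by ggplot2's default shape palette, which covers six discrete values, while clarity has more levels than that. A quick check, assuming ggplot2 is installed so that the diamonds data is available, counts the levels and the rows that fall in the levels beyond the sixth:

```r
library(ggplot2)  # provides the diamonds dataset

# Number of distinct clarity levels that would each need a shape
n_levels <- nlevels(diamonds$clarity)

# Rows per clarity level, in level order
clarity_counts <- table(diamonds$clarity)

# Rows in the levels beyond the sixth, which a six-shape palette cannot draw
rows_beyond_six <- sum(clarity_counts[-(1:6)])

n_levels         # 8
rows_beyond_six  # 5445
```

Checking `nlevels()` before mapping a variable to shape is a cheap way to decide whether a manual shape scale, or a different aesthetic altogether, is needed.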
Using the iris dataset, create a scatter plot of petal length vs. petal width. Use color to represent the species. Instead of using different shapes, vary the size of the points based on the sepal width. Add a legend for both color and size.
Solution:
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species, size = Sepal.Width)) +
  geom_point(alpha = 0.6) +
  labs(title = "Iris Petal Length vs. Petal Width",
       x = "Petal Length",
       y = "Petal Width",
       color = "Species",
       size = "Sepal Width") +
  theme_minimal() +
  scale_color_brewer(palette = "Set2")
[This section remains the same until the exercises]
Using the mpg dataset, create a bar plot showing the count of cars for each manufacturer. Order the bars from highest to lowest count. Add appropriate labels and a title.
Solution:
# Prepare data (count(), arrange(), and %>% come from dplyr)
manufacturer_counts <- mpg %>%
  count(manufacturer) %>%
  arrange(desc(n))
# reorder(manufacturer, -n) sorts the bars from highest to lowest count
ggplot(manufacturer_counts, aes(x = reorder(manufacturer, -n), y = n)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Number of Cars by Manufacturer",
       x = "Manufacturer",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Using the diamonds dataset, create a stacked bar plot showing the proportion of different cuts (Fair, Good, Very Good, Premium, Ideal) for each clarity category. Use different colors for each cut. Add a legend and appropriate labels.
Solution:
ggplot(diamonds, aes(x = clarity, fill = cut)) +
  geom_bar(position = "fill") +
  labs(title = "Proportion of Diamond Cuts by Clarity",
       x = "Clarity",
       y = "Proportion",
       fill = "Cut") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")
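With position = "fill", each bar is rescaled so that its segments sum to 1, which means the plot shows the row-wise proportions of a contingency table. The same numbers can be computed directly in base R. A minimal sketch, using the built-in mtcars data as a stand-in (cyl in the role of clarity and gear in the role of cut, an assumption made so the example runs without ggplot2):

```r
# Cross-tabulate the x-axis variable (rows) against the fill variable (columns)
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# margin = 1 divides each row by its row total, so every row sums to 1,
# just as every bar reaches 1 under position = "fill"
props <- prop.table(counts, margin = 1)

round(props, 2)
rowSums(props)  # each entry is 1
```

For the exercise data, the analogous call would be `prop.table(table(diamonds$clarity, diamonds$cut), margin = 1)`.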
[This section remains the same until the exercises]
Using the diamonds dataset, create a box plot showing the distribution of price for each cut category. Add color to the boxes based on the cut. Include appropriate labels and a title.
Solution:
ggplot(diamonds, aes(x = cut, y = price, fill = cut)) +
  geom_boxplot() +
  labs(title = "Distribution of Diamond Prices by Cut",
       x = "Cut",
       y = "Price (USD)") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  theme(legend.position = "none")
Using the gapminder dataset (you may need to install the gapminder package), create a box plot showing the distribution of life expectancy for each continent. Arrange the continents in descending order of median life expectancy. Add color and appropriate labels.
Solution:
# Install and load gapminder if not already installed
# install.packages("gapminder")
library(gapminder)
# Calculate median life expectancy for each continent
continent_order <- gapminder %>%
  group_by(continent) %>%
  summarize(median_lifeExp = median(lifeExp)) %>%
  arrange(desc(median_lifeExp)) %>%
  pull(continent)
ggplot(gapminder, aes(x = factor(continent, levels = continent_order), y = lifeExp, fill = continent)) +
  geom_boxplot() +
  labs(title = "Distribution of Life Expectancy by Continent",
       x = "Continent",
       y = "Life Expectancy (years)") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none")
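Ordering groups by a summary statistic can also be done without a separate summary table. Base R's reorder() takes a factor, a numeric vector, and a summary function, and returns the factor with its levels sorted by that summary. A minimal sketch, using the built-in iris data as a stand-in for gapminder (Species in place of continent and Sepal.Length in place of lifeExp, both assumptions for illustration):

```r
# Negating the values makes the ascending sort of reorder() come out as
# descending order of the group medians
species_by_median <- reorder(iris$Species, -iris$Sepal.Length, FUN = median)

# Levels now run from the highest median to the lowest
levels(species_by_median)

# Group medians, for comparison with the level order
tapply(iris$Sepal.Length, iris$Species, median)
```

In a ggplot2 call the same idea could be written inline, for example `aes(x = reorder(continent, -lifeExp, FUN = median), y = lifeExp)`.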
[This section remains the same until the exercises]
Using the diamonds dataset, create a histogram of the price variable. Experiment with different bin widths to see how they affect the visualization. Add a density curve on top of the histogram. Include appropriate labels and a title.
Solution:
ggplot(diamonds, aes(x = price)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)), binwidth = 500, fill = "lightblue", color = "black") +
  # linewidth replaces size for line widths (ggplot2 3.4.0 and later)
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Distribution of Diamond Prices",
       x = "Price (USD)",
       y = "Density") +
  theme_minimal()
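Overlaying a density curve only works when the histogram is drawn on the density scale, where bar heights are chosen so that the total bar area is 1. A minimal base R sketch of that property, using the built-in faithful data as a stand-in for diamonds and a hypothetical bin width of 5:

```r
x <- faithful$waiting

# Fixed-width bins of width 5 that cover the full range of x
bin_width <- 5
breaks <- seq(floor(min(x) / bin_width) * bin_width,
              ceiling(max(x) / bin_width) * bin_width,
              by = bin_width)

# plot = FALSE returns the bin counts and densities without drawing
h <- hist(x, breaks = breaks, plot = FALSE)

# density = count / (n * bin width), so the bar areas add up to 1
total_area <- sum(h$density * diff(h$breaks))
total_area
```

Changing `bin_width` changes the individual bar heights but not the total area, which is why the density curve stays comparable across bin widths.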
Using the faithful dataset (built into R), create two density plots on the same graph: one for eruption duration and one for waiting time between eruptions. Use different colors for each density curve and add a legend. Normalize the scales so that both curves use the same y-axis. Add appropriate labels and a title.
Solution:
# Prepare data (pivot_longer() comes from tidyr)
faithful_long <- faithful %>%
  pivot_longer(cols = everything(), names_to = "variable", values_to = "value")
ggplot(faithful_long, aes(x = value, fill = variable)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plots of Old Faithful Eruptions",
       x = "Duration (minutes) / Waiting Time (minutes)",
       y = "Density",
       fill = "Variable") +
  # Values and labels follow the alphabetical order: eruptions, waiting
  scale_fill_manual(values = c("blue", "red"),
                    labels = c("Eruption duration", "Waiting time")) +
  theme_minimal()
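Eruption durations run from 1.6 to 5.1 minutes and waiting times from 43 to 96 minutes, so on a shared raw axis the two curves sit far apart. One way to put them on a comparable scale, offered here as an assumption about what normalizing could mean rather than as the intended solution, is to standardize each variable to z-scores before estimating the densities. A minimal base R sketch:

```r
# Standardize each column: subtract its mean, divide by its standard deviation
z <- as.data.frame(scale(faithful))

# Kernel density estimates on the common z-score scale
d_eruptions <- density(z$eruptions)
d_waiting   <- density(z$waiting)

# Draw both curves on one set of axes sized to fit both
plot(d_eruptions, col = "blue",
     xlim = range(d_eruptions$x, d_waiting$x),
     ylim = range(0, d_eruptions$y, d_waiting$y),
     main = "Standardized Old Faithful Densities",
     xlab = "z-score")
lines(d_waiting, col = "red")
legend("topright", legend = c("Eruption duration", "Waiting time"),
       col = c("blue", "red"), lty = 1)
```

The trade-off is that the x-axis no longer reads in minutes, so the plot compares the shapes of the two distributions rather than their actual values.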
[This section remains the same until the exercises]
Using the diamonds dataset, create a scatter plot of price vs. carat. Facet the plot by cut, creating a 2x3 grid of subplots. Color the points by clarity. Add a smooth trend line to each facet. Include appropriate labels and a title.
Solution:
ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~cut, nrow = 2) +
  labs(title = "Diamond Price vs. Carat by Cut and Clarity",
       x = "Carat",
       y = "Price (USD)",
       color = "Clarity") +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")
Using the mpg dataset, create a box plot of highway fuel efficiency (hwy) for different car classes. Facet the plot by the number of cylinders (cyl). Color the boxes by the type of drive (drv). Arrange the facets in a single row. Add appropriate labels and a title.
Solution:
ggplot(mpg, aes(x = class, y = hwy, fill = drv)) +
  geom_boxplot() +
  facet_wrap(~cyl, nrow = 1) +
  labs(title = "Highway MPG by Class, Cylinders, and Drive Type",
       x = "Vehicle Class",
       y = "Highway MPG",
       fill = "Drive Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Set2")
[The conclusion remains the same as in the previous version]