How to Build an R Markdown File

R Markdown (.Rmd) is a powerful tool that allows you to combine code and text in a single document, which can be rendered into different formats (HTML, PDF, Word).

Steps to Create an R Markdown File

  1. Open RStudio.
  2. Go to File -> New File -> R Markdown....
  3. Choose a title, author name, and output format (e.g., HTML).
  4. Write your document in the editor. You can include:
    • Plain text: Just type normally for descriptions.
    • Code chunks: Open a chunk with a line containing ```{r} and close it with a line containing three backticks; the R code goes in between.
    • Markdown syntax for headers, bold, italics, lists, etc.
  5. To render the document into the chosen output format, click the Knit button at the top of the editor.

Example of a Simple Rmd File

# This is a header

## This is a subheader

Some text here.
# R code here
summary(cars)
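
The same example can also be written and rendered from the R console. The following is a minimal sketch, assuming the rmarkdown package and pandoc are installed; the YAML header values and the temporary file are illustrative choices, not part of the original example. The chunk delimiters are built with strrep() so that they are easy to see.

```r
# Three backticks, assembled programmatically so the chunk delimiters are explicit
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Example\"",
  "output: html_document",
  "---",
  "",
  "# This is a header",
  "",
  "## This is a subheader",
  "",
  "Some text here.",
  "",
  paste0(fence, "{r}"),   # opening delimiter of the R chunk
  "# R code here",
  "summary(cars)",
  fence                   # closing delimiter of the R chunk
)

# Write the document to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Render only when the rmarkdown package and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Clicking Knit in RStudio performs the same rendering step for the file open in the editor.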

The S3 System

The S3 system is the simplest and most widely used system in R for object-oriented programming. The idea is to allow you to group related information (data) together and write code that can handle different types of data in different ways. This makes your code more organized and flexible.

R has multiple object-oriented systems, but S3 is the easiest to learn and use. Let’s start by understanding three key terms in the S3 system: classes, generic functions, and methods.

Basic Concepts of S3

  • Class: A class is a type that is assigned to an object using an attribute.

Think of a class as a label you put on an object to tell R what kind of data it is working with. Just like how a real-world object, such as a car, can belong to a class like “Vehicle” or “Sedan,” objects in R can belong to a class like “numeric,” “character,” or a custom class you create.

In S3, assigning a class to an object is as simple as giving it an attribute. This tells R how the object should be treated.

# Create a list to represent a person
person <- list(name = "Alice", age = 30)

# Assign the class "Person" to this list
class(person) <- "Person"

In this case, we are telling R that person belongs to the class “Person.” This gives R the context to treat this object differently from other objects, like numbers or vectors.
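
A short sketch of how the label can be inspected once it is set. It reuses the person object from the text; nothing here goes beyond base R.

```r
person <- list(name = "Alice", age = 30)
class(person) <- "Person"

class(person)               # "Person"
attr(person, "class")       # the same label, read directly as an attribute
inherits(person, "Person")  # TRUE
is.list(person)             # TRUE: the underlying data is still a list
names(unclass(person))      # "name" "age": unclass() removes the label, not the data
```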

  • Generic function: A function that behaves differently based on the class of its argument.

A generic function is a function that does different things depending on the class of the object you give it. It’s called “generic” because it doesn’t know in advance what type of data it will receive. Instead, it waits until you give it an object and then decides how to handle that object based on its class.

The simplest example of a generic function in R is the print() function. You might have used print() before to display something on the screen:

print(42)
print("Hello, world!")

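What’s happening here is that print() is a generic function: it inspects the class of the object it receives and hands the work to a matching method. Plain numbers and strings have no dedicated print method in base R, so both calls above end up in the fallback method print.default(), which formats each value according to its type (note the quotes around the string). Objects with a more specific class, such as data frames, get their own method.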

  • Method: A function that implements behavior for a specific class.

A method is the specific function that gets executed when a generic function is called on an object of a certain class. Think of it like this: the generic function is a manager, and the method is the worker that actually does the job. The manager (generic function) checks the type of object you passed and sends it to the appropriate worker (method) to handle it.

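For example, if you give the print() function a data frame, the method for the “data.frame” class (print.data.frame()) is called to display it as a table. If you give it a factor, print.factor() is called instead. Plain numbers and strings have no method of their own, so the fallback print.default() handles them.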
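
To see which print methods actually exist in a session, one option is getS3method() from the utils package. This is a small sketch; the last result reflects a plain R session and could differ if an add-on package defines such a method.

```r
# Methods that base R provides for print()
has_df_method      <- is.function(utils::getS3method("print", "data.frame"))
has_factor_method  <- is.function(utils::getS3method("print", "factor"))
has_default_method <- is.function(utils::getS3method("print", "default"))

# optional = TRUE returns NULL instead of raising an error when no method exists
numeric_method <- utils::getS3method("print", "numeric", optional = TRUE)

has_df_method            # TRUE
has_factor_method        # TRUE
has_default_method       # TRUE
is.null(numeric_method)  # expected to be TRUE in a plain R session
```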

In the S3 system, methods are defined in the format generic.class. So if you create your own class, say “Person,” you can also create a method that handles how objects of class “Person” are printed:

# Define a method for printing objects of class "Person"
# (the extra ... matches the signature of the print() generic)
print.Person <- function(x, ...) {
  cat("Name:", x$name, "\nAge:", x$age, "\n")
}

# Create an object of class "Person"
person <- list(name = "Alice", age = 30)
class(person) <- "Person"

# Call the generic print() function
print(person)

In this example, when you call print(person), R looks at the class of person, sees that it’s “Person,” and uses the print.Person method to display the information.

In R, the cat() function concatenates its arguments and writes them out as plain text. Unlike print(), which adds structure to its output (such as the [1] index prefix and quotes around character strings), cat() writes exactly the text you give it, separated by spaces. It does not add a line break at the end, which is why the example includes "\n".

How Does S3 Work?

S3 is described as dynamic because methods are not tightly attached to objects. A method belongs to a generic function, not to an object or its class, and there is no formal class definition to write in advance, as there is in languages such as Java. Instead, the method is chosen at the moment you call a generic function: R reads the class of the object and looks for the corresponding method to execute.
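
A minimal sketch of this dynamic behavior. The generic area() and the class “Square” are hypothetical names chosen for illustration. The method is defined only after the object exists, and R picks it up on the next call.

```r
# A hypothetical generic and class, used only for illustration
area <- function(shape) UseMethod("area")
sq <- structure(list(side = 2), class = "Square")

# No method exists yet, so dispatch fails; catch the error to inspect it
before <- tryCatch(area(sq), error = function(e) "no method found")

# Define the method after the object already exists
area.Square <- function(shape) shape$side^2

# The same call now succeeds because the lookup happens at call time
after <- area(sq)

before  # "no method found"
after   # 4
```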

To summarize:

  1. Class: The label or type you assign to an object, which tells R what kind of object it is.
  2. Generic function: A function that behaves differently based on the class of the object it receives.
  3. Method: The specific function that gets executed based on the class of the object passed to the generic function.

Why is the S3 System Useful?

The flexibility of S3 makes it ideal for beginners because you don’t need to define everything in advance. You can easily create new classes and methods on the fly, and R will handle the rest for you.

For example, if you are analyzing different types of data (e.g., numbers, people, cars), you can create custom classes and methods for each type. This allows you to write code that’s more organized and easier to maintain as your project grows.

Exercises

  1. Create an object with two fields and assign an S3 class to it.
  2. Write a custom print method for your class.

Attributes

In R, attributes are extra pieces of information that you can attach to an object. They act as metadata, providing context or describing some properties of the object. Attributes can hold various information, such as names, dimensions, or even custom information specific to a user-defined object.

Attributes are an essential part of R’s flexibility. They allow you to extend the behavior of objects without changing their underlying structure. For example, a numeric vector might have additional information like units, or a matrix might have row and column names. These attributes don’t change the values of the object, but they give R more details about how to treat it.

Basic Concepts of Attributes

  1. What is an Attribute?

An attribute is simply a piece of information (a label, a name, or a tag) that can be attached to any object in R. It doesn’t modify the actual data in the object, but it can provide extra information or context that might be useful in certain operations.

In R, common attributes include:

  • names (for named vectors)
  • dim (for matrices or arrays)
  • class (for objects of a particular class, like factors)
  • tsp (for time series data)

You can think of attributes as an optional “extra layer” of information about the object.

  2. Attaching and Retrieving Attributes

You can attach attributes to almost any object in R using the attr() function. To retrieve an attribute, you also use the attr() function. Let’s see how you can assign and retrieve an attribute:

# Create a numeric vector
vec <- c(1, 2, 3)

# Add a custom attribute
attr(vec, "description") <- "This is a numeric vector"

# Retrieve the attribute
attr(vec, "description")

In this example:

  • The numeric vector vec is just a simple set of numbers (1, 2, 3).
  • We use attr() to add an attribute called “description” to this vector.
  • The attribute holds a short description: “This is a numeric vector”.
  • We can then retrieve the description using attr(vec, "description").

Notice that the values of the vector remain unchanged. The attribute only serves as extra information.
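
The following sketch checks that claim and adds one caveat worth knowing: most custom attributes are dropped when you subset a vector. The variable names are illustrative.

```r
vec <- c(1, 2, 3)
attr(vec, "description") <- "This is a numeric vector"

total <- sum(vec)        # arithmetic works on the values and ignores the attribute
all_attrs <- attributes(vec)  # a named list holding every attribute of vec

# Subsetting keeps the selected values but drops the custom attribute
subset_vec <- vec[1:2]

total                            # 6
names(all_attrs)                 # "description"
attr(subset_vec, "description")  # NULL
```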

  3. Attributes and Core R Structures

Attributes are often used behind the scenes with common R objects like matrices, data frames, and time series. These objects have special attributes that make them behave differently from other objects, even though their underlying structure might be similar to simpler objects like vectors.

Examples of attributes in common R structures:

Names for vectors: You can attach names to the elements of a vector using the names() function, which is essentially setting an attribute.

# Create a numeric vector and assign names
vec <- c(1, 2, 3)
names(vec) <- c("A", "B", "C")

# The 'names' attribute now holds the labels "A", "B", and "C"
attr(vec, "names")

Dimensions for matrices: A matrix in R is just a vector (numeric in this example, but any vector type works) with a dim attribute that specifies the number of rows and columns.

# Create a numeric vector
vec <- c(1, 2, 3, 4, 5, 6)

# Assign dimensions (2 rows, 3 columns)
dim(vec) <- c(2, 3)

# The vector now behaves like a matrix
print(vec)

In this example, by adding a dim attribute, the vector vec becomes a 2x3 matrix.
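
A short sketch of how the dim attribute alone controls the shape. The same six values are viewed as a 2x3 matrix, then as a 3x2 matrix, and finally as a plain vector again.

```r
vec <- c(1, 2, 3, 4, 5, 6)
was_matrix <- is.matrix(vec)   # FALSE: no dim attribute yet

dim(vec) <- c(2, 3)
is_matrix <- is.matrix(vec)    # TRUE
a <- vec[1, 2]                 # 3: values fill the matrix column by column

dim(vec) <- c(3, 2)            # same values, now 3 rows and 2 columns
b <- vec[1, 2]                 # 4

dim(vec) <- NULL               # removing the attribute gives back a plain vector
back_to_vector <- !is.matrix(vec)  # TRUE
```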

Setting and Getting Attributes

# Create a vector
x <- c(1, 2, 3)

# Set an attribute
attr(x, "description") <- "A simple vector"

# Get the attribute
attr(x, "description")

Common Attributes

  • names(): Names for the elements of an object.
  • dim(): Dimensions for matrices and arrays.
  • class(): The class of an object.
  • attributes(): Not an attribute itself, but the function that retrieves all attributes of an object as a list.
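
As a quick illustration, attributes() can be used to look inside some familiar objects. This sketch also uses structure(), a base R function that attaches several attributes in one call; the object names are illustrative.

```r
# A factor is an integer vector with "levels" and "class" attributes
f <- factor(c("low", "high", "low"))
factor_attrs <- names(attributes(f))

# A matrix carries only a "dim" attribute
m <- matrix(1:6, nrow = 2)
matrix_dim <- attributes(m)$dim

# structure() sets names and a custom attribute at the same time
x <- structure(c(1, 2, 3), names = c("A", "B", "C"), description = "A simple vector")
x_attrs <- names(attributes(x))

factor_attrs  # contains "levels" and "class"
matrix_dim    # 2 3
x_attrs       # contains "names" and "description"
```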

Why Are Attributes Useful?

Attributes are helpful because they allow you to add information to objects without modifying their core structure. For example, in data analysis, you might want to label certain data with units, descriptions, or class information without changing the actual values of the data.

  • Flexibility: You can store additional context (such as metadata) without affecting the data itself.
  • Custom behavior: You can define attributes like class and create custom methods (as we saw with the S3 system) that give special behavior to certain objects.

Exercises

  1. Create a numeric vector and assign a custom description attribute to it. Check the attribute using attr().
  2. Explore the dim attribute by creating a matrix and changing its dimensions.

Generic Functions

Generic functions are a core concept in the S3 system. A generic function is a special type of function that behaves differently depending on the class of the object passed to it. When you call a generic function, R doesn’t immediately execute a specific piece of code. Instead, it checks the class of the object you pass to it and then decides which method (i.e., function) to use based on the object’s class.

This flexibility allows you to use the same function name for a wide range of classes, with each class having its own specific implementation of the function. Let’s break this down step by step for beginners.

What is a Generic Function?

A generic function is like a dispatcher. Its role is to figure out what type of object it has been given and then send it to the appropriate function (known as a method) that’s designed to handle that specific type of object. It does not actually implement the operation itself, but rather determines which method (specific function) should be used.

Here’s a simple analogy: Imagine you have a delivery service, and you can deliver different items like packages, letters, or groceries. When you receive an item to deliver, you check what kind of item it is, and based on that, you follow the specific delivery process for that item. The generic function is like the person in charge of deciding which delivery process to follow based on the type of item.

Generic Functions in Practice

In R, many common functions are generic functions. Some examples include print(), summary(), and plot(). These functions behave differently depending on the type (or class) of the object you pass to them.

For example, calling print() on a numeric value will print the number, while calling print() on a data frame will print the contents of the data frame in a table-like format. The print() function is a generic function because it behaves differently based on the class of the object being printed.

Example 1: Generic print() function

# Print a number
print(42)
# Output: [1] 42

# Print a character string
print("Hello, world!")
# Output: [1] "Hello, world!"

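In this case, the print() function displays the numeric value 42 in one way and the character string “Hello, world!” in another (with quotes). Neither value has a dedicated print method, so both calls are handled by the fallback method print.default(), which formats the output according to the type of the object.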

How Generic Functions Work in S3

Let’s look at the steps involved in how a generic function works under the S3 system.

  1. You call the generic function: When you pass an object to a generic function, R doesn’t immediately know what to do with it. Instead, it looks at the class of the object.

  2. R looks up the class of the object: R checks the class of the object you passed to the generic function using class().

  3. R looks for the method: Based on the class of the object, R looks for a method that corresponds to both the generic function and the class of the object. Methods are named in the format generic.class (e.g., print.data.frame for data frames). If no such method exists, R falls back to generic.default (e.g., print.default).

  4. R runs the method: Once it finds the appropriate method, R executes that method and produces the result.

This process happens every time you call a generic function, which allows for great flexibility. You don’t need to know in advance what type of object will be passed to the function; the correct method will be chosen automatically based on the class of the object.
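
The lookup steps can be imitated by hand. The function find_method_name() below is a hypothetical helper written for this illustration; real dispatch is performed by UseMethod() and also takes implicit classes into account, so treat this as a simplified sketch rather than R's actual algorithm.

```r
# Walk through the object's classes in order, then try "default",
# and report the name of the first method that exists.
find_method_name <- function(generic, x) {
  for (cls in c(class(x), "default")) {
    method <- utils::getS3method(generic, cls, optional = TRUE)
    if (!is.null(method)) {
      return(paste(generic, cls, sep = "."))
    }
  }
  NULL  # no method and no default
}

df <- data.frame(a = 1:2)
print_choice   <- find_method_name("print", df)
summary_choice <- find_method_name("summary", factor("a"))

print_choice    # "print.data.frame"
summary_choice  # "summary.factor"
```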

How to Define a Generic Function

In the S3 system, you can define your own generic functions. This is useful when you want to create functions that can handle different types of objects in different ways.

Let’s go through the steps to create a simple generic function and corresponding methods.

Step 1: Create the Generic Function

You can create a generic function using the UseMethod() function. This function tells R to dispatch the call to the appropriate method based on the class of the object.

# Define a generic function called 'describe'
describe <- function(x) {
  UseMethod("describe")
}

In this case, describe() is now a generic function. It doesn’t do anything by itself, but it will call a method that’s specific to the class of the object x.

Step 2: Create Methods for Specific Classes

Now that we have a generic function, we need to define methods for specific classes. For example, let’s create a method for numeric vectors and another for character vectors.

# Method for numeric vectors
describe.numeric <- function(x) {
  cat("This is a numeric vector with", length(x), "elements.\n")
}

# Method for character vectors
describe.character <- function(x) {
  cat("This is a character vector with", length(x), "elements.\n")
}

Step 3: Call the Generic Function

When you call describe() with an object, R will automatically dispatch the call to the appropriate method based on the class of the object:

# Call describe on a numeric vector
describe(c(1, 2, 3))
# Output: This is a numeric vector with 3 elements.

# Call describe on a character vector
describe(c("apple", "banana", "cherry"))
# Output: This is a character vector with 3 elements.

In each case, R looks at the class of the object passed to describe() and calls either describe.numeric or describe.character based on the class.
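
Two edge cases are worth trying. In this sketch the methods return their message as a string instead of calling cat(), purely so the results are easy to inspect; that is a change made for illustration. Integer vectors reach describe.numeric through their implicit class, while logical vectors have no matching method and cause an error.

```r
describe <- function(x) UseMethod("describe")
describe.numeric <- function(x) {
  paste("This is a numeric vector with", length(x), "elements.")
}
describe.character <- function(x) {
  paste("This is a character vector with", length(x), "elements.")
}

from_double  <- describe(c(1.5, 2.5))
from_integer <- describe(1:3)   # implicit class is c("integer", "numeric")

# No describe.logical and no describe.default exist, so dispatch fails
from_logical <- tryCatch(describe(TRUE), error = function(e) "dispatch failed")

from_double   # "This is a numeric vector with 2 elements."
from_integer  # "This is a numeric vector with 3 elements."
from_logical  # "dispatch failed"
```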

Exercises

  1. Define a generic function called describe() that behaves differently for numeric and character vectors. For numeric vectors, it should print “This is a numeric vector with X elements,” and for character vectors, it should print “This is a character vector with X elements.” Create a method for each class.
  2. Create a new class called “Person”. Define a method for the generic function print() that prints a custom message for objects of class “Person.” The message should display the name and age of the person.

Methods

Methods are functions that handle specific types of objects. They are invoked when a generic function is called, and the class of the argument is checked to find the appropriate method.

Defining a Method

In R, methods are the specific implementations of a generic function for objects of a particular class. When you call a generic function (like print() or summary()), R selects the appropriate method based on the class of the object you pass to it. This allows the same function to behave differently depending on the type of data it’s working with.

In simpler terms, methods are the “workers” that carry out the instructions given by a generic function. When you pass an object to a generic function, R identifies what kind of object it is (its class) and then calls the correct method to handle it.

What are Methods?

A method is a function that is associated with a particular class of objects. The name of a method is always in the format generic.class, where:

  • generic is the name of the generic function.
  • class is the name of the class for which this method applies.

For example, if you call the print() function on a data frame, R automatically calls the method print.data.frame(), which knows how to print data frames in a tabular format. If you call print() on a plain numeric vector, there is no print.numeric() method in base R, so R falls back to print.default(), which prints the numbers.

How Methods Work in S3

Here’s how the process of calling a generic function and selecting a method works in the S3 system:

  1. You call the generic function: You pass an object to the generic function (e.g., print()).

  2. R checks the class of the object: R checks the class of the object using class().

  3. R looks for the method: Based on the class of the object, R looks for a method named generic.class. If it finds this method, it calls it. If no method is found for the specific class, it looks for a default method.

  4. R calls the method: R runs the method that corresponds to the class of the object.

Example of Methods in Action

Let’s use the print() function as an example. When you call print() on an object, R looks at the class of the object and then calls the appropriate method.

Example 1: Printing a Numeric Vector

# Create a numeric vector
num_vec <- c(1, 2, 3, 4)

# Call the print() function
print(num_vec)
# Output: [1] 1 2 3 4

In this case, print() is called on a numeric vector. Base R has no print.numeric() method, so R dispatches to the fallback method print.default(), which knows how to handle numeric vectors.

Example 2: Printing a Data Frame

# Create a data frame
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# Call the print() function
print(df)
# Output:
#    name age
# 1 Alice  25
# 2   Bob  30

When print() is called on a data frame, R dispatches to the method print.data.frame(), which prints the data frame in a table-like format.
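
One way to see that the class label is what triggers the table layout is to remove it with unclass() and print again. This is a small sketch using capture.output() to compare the two printouts.

```r
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# With the "data.frame" label, print() uses print.data.frame()
with_class <- capture.output(print(df))

# Without the label, the same data prints as an ordinary list
without_class <- capture.output(print(unclass(df)))

with_class
without_class
identical(with_class, without_class)  # FALSE
```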

Defining Your Own Methods

One of the strengths of the S3 system is that you can define your own methods for generic functions. Let’s walk through how you can create methods for your own classes.

Step 1: Define a Generic Function

Let’s first define a generic function called describe():

describe <- function(x) {
  UseMethod("describe")
}

This function doesn’t do anything by itself. Instead, it will dispatch to a specific method based on the class of the object x.

Step 2: Create a Class

Next, let’s create a new class called “Person”. A “Person” object will store a person’s name and age.

# Create a list representing a Person
person <- list(name = "John Doe", age = 30)

# Assign the class 'Person' to the object
class(person) <- "Person"

Step 3: Define a Method for the “Person” Class

Now, let’s define a describe() method for objects of class “Person”. This method will print out information about the person’s name and age.

# Define a method for the 'Person' class
describe.Person <- function(x) {
  cat("This is a person named", x$name, "who is", x$age, "years old.\n")
}

Step 4: Call the Generic Function

When you call describe() on an object of class “Person”, R will dispatch to describe.Person():

# Call the describe() function on the 'person' object
describe(person)
# Output: This is a person named John Doe who is 30 years old.

R looks at the class of person (which is “Person”), and calls the appropriate method, describe.Person().

Method Selection

When R dispatches to a method, it follows a simple process:

  1. Look for a specific method: R checks if a method exists for the class of the object (e.g., describe.Person).
  2. Use a default method: If no specific method is found for the object’s class, R looks for a default method named describe.default(). This is the fallback if no class-specific method exists. If there is no default method either, R stops with a “no applicable method” error.

For example, you can provide a default behavior for any type of object that doesn’t have a specific method:

# Define a default method for describe()
describe.default <- function(x) {
  cat("No specific method for this type of object.\n")
}
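
Putting the two selection steps together, here is a self-contained sketch. The methods return strings rather than calling cat() so that the results can be stored and compared; that detail is an illustrative choice.

```r
describe <- function(x) UseMethod("describe")

# Class-specific method
describe.Person <- function(x) paste("This is a person named", x$name)

# Fallback for every other kind of object
describe.default <- function(x) "No specific method for this type of object."

person <- structure(list(name = "John Doe", age = 30), class = "Person")

specific <- describe(person)  # uses describe.Person()
fallback <- describe(TRUE)    # no describe.logical(), so describe.default() runs

specific  # "This is a person named John Doe"
fallback  # "No specific method for this type of object."
```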

Exercises

  1. Create a class called “Animal” with fields species and age. Define a method for the print() generic function that prints the species and age of the animal.
  2. Define a generic function called info(). Create two methods: info.data.frame (for data frames) that prints the number of rows and columns in the data frame, and info.matrix (for matrices) that prints the dimensions of the matrix.

Classes

In R, classes are used to define the type or structure of an object. A class essentially tells R what kind of object it is dealing with, and this in turn determines how certain functions or methods behave when applied to the object. Classes are central to object-oriented programming in R, particularly in the S3 system.

When you assign a class to an object, R knows how to interact with that object using specific methods designed for that class. For example, if you create an object of class “data.frame”, R knows how to print it in a tabular format because there is a print.data.frame() method.

What is a Class?

A class in R is essentially a label that you assign to an object. It helps R know what kind of object it’s working with and how to handle it. The class determines the behavior of generic functions (like print(), summary(), etc.) because these functions will look for methods that correspond to the class of the object.

For example:

  • A numeric vector has the class “numeric”.
  • A data frame has the class “data.frame”.
  • A list can have a custom class that you define (e.g., “Person”).

You can check the class of an object using the class() function. You can also assign a new class to an object by modifying its class attribute.

Example: Checking the Class of an Object

# Create a numeric vector
x <- c(1, 2, 3)

# Check the class of the vector
class(x)
# Output: [1] "numeric"

Assigning a Class

In R, you can assign a class to an object using the class() function or by setting the class as an attribute of the object. Once an object has a class, R will treat it according to that class.

Example: Assigning a Custom Class

Let’s create a simple object and assign it a custom class “Person”:

# Create a list representing a person
person <- list(name = "Alice", age = 25)

# Assign the class "Person" to the list
class(person) <- "Person"

# Check the class of the object
class(person)
# Output: [1] "Person"

Multiple classes

In R, objects can have multiple classes. This is often referred to as class inheritance. When an object has more than one class, R will look for methods in the order the classes are specified. The first class has the highest priority, followed by the next, and so on.

You can assign multiple classes to an object by assigning a character vector of class names through the class() function.

Example: Assigning Multiple Classes

# Create a list representing a student
student <- list(name = "Bob", age = 22)

# Assign two classes: "Student" and "Person"
class(student) <- c("Student", "Person")

# Check the class of the object
class(student)
# Output: [1] "Student" "Person"

In this case, student has both “Student” and “Person” as classes. When R looks for a method, it will first look for Student-specific methods, and if it doesn’t find any, it will then look for Person-specific methods.
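
A brief sketch of that lookup order. The generic greet() is a hypothetical name used only here. At first only a Person method exists, so the student falls through to it; once a Student method is added, it takes priority.

```r
student <- structure(list(name = "Bob", age = 22), class = c("Student", "Person"))

greet <- function(x) UseMethod("greet")   # hypothetical generic
greet.Person <- function(x) paste("Hello,", x$name)

first_call <- greet(student)              # no greet.Student yet, so greet.Person is used

is_person <- inherits(student, "Person")                # TRUE
position  <- inherits(student, "Person", which = TRUE)  # 2: position in the class vector

# Adding a method for the first class changes which method is selected
greet.Student <- function(x) paste("Hi, student", x$name)
second_call <- greet(student)

first_call   # "Hello, Bob"
second_call  # "Hi, student Bob"
```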

Defining Custom Classes in S3

The S3 system allows you to define custom classes and methods for those classes. Custom classes are extremely flexible and easy to create because R doesn’t require a formal structure for them.

Step 1: Create an Object

You can create an object of any type (usually a list) to store the information you want for the class.

# Create a list representing a car
car <- list(brand = "Toyota", year = 2015)

Step 2: Assign a Class

You can then assign a class to this object using the class() function.

# Assign the class "Car" to the object
class(car) <- "Car"

Now, the object car has the class “Car”, and you can define specific methods for that class.
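
Because S3 does not check what goes into an object, a common convention is to wrap the two steps in a small constructor function that validates its inputs. The helper new_car() below is a hypothetical example of that convention, not something provided by R.

```r
# new_car() is a hypothetical helper: it checks the inputs, builds the list,
# and attaches the class in one place.
new_car <- function(brand, year) {
  stopifnot(is.character(brand), length(brand) == 1)
  stopifnot(is.numeric(year), length(year) == 1)
  structure(list(brand = brand, year = year), class = "Car")
}

car <- new_car("Toyota", 2015)
class(car)  # "Car"
car$brand   # "Toyota"

# Invalid input is rejected before an object is created
bad <- tryCatch(new_car("Toyota", "2015"), error = function(e) "rejected")
bad         # "rejected"
```

Every “Car” object created this way has the same fields, which makes methods for the class safer to write.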

Methods and Classes in S3

Once you’ve defined a custom class, you can create methods specifically for that class. The methods you define will be automatically called when a generic function is applied to an object of that class.

Example: Creating a Print Method for the “Car” Class

You can define a custom print method for objects of class “Car”.

# Define a custom print method for "Car" class
# (the extra ... matches the signature of the print() generic)
print.Car <- function(x, ...) {
  cat("Car brand:", x$brand, "\nYear:", x$year, "\n")
}

# Test the print method with the "car" object
print(car)
# Output:
# Car brand: Toyota
# Year: 2015

Here:

  • We defined a method print.Car() for objects of class “Car”.
  • When we call print() on an object of class “Car”, R automatically calls print.Car() and prints the custom message.

Built-in Classes in R

R comes with many built-in classes. Here are some common ones:

  • numeric: For numeric vectors.
  • character: For character vectors (strings).
  • factor: For categorical data.
  • data.frame: For data frames (tabular data).
  • matrix: For matrices.
  • list: For lists (which can store multiple types of data).

Each of these classes has methods associated with it, so when you call a generic function like print(), R knows how to handle objects of that class.

Example: The Class of a Data Frame

# Create a data frame
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))

# Check the class of the data frame
class(df)
# Output: [1] "data.frame"

R automatically assigns the class “data.frame” to the df object when it’s created.
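
The other built-in classes can be checked the same way. This sketch uses inherits() for the matrix because the exact value of class() for matrices depends on the R version.

```r
class_double    <- class(c(1.5, 2))           # "numeric"
class_integer   <- class(1:3)                 # "integer"
class_character <- class("text")              # "character"
class_factor    <- class(factor(c("a", "b"))) # "factor"
class_list      <- class(list(1, "a"))        # "list"

m <- matrix(1:6, nrow = 2)
is_matrix_class <- inherits(m, "matrix")      # TRUE; R 4.0.0 and later also report "array"
```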

Class Inheritance in S3

In S3, classes can inherit behavior from other classes. If R can’t find a method for a specific class, it will look at the next class in line and try to find a method for that one. This is useful when you have a hierarchy of classes, and you want to reuse methods.

Example: Inheritance in Action

# Create a list representing a hybrid vehicle
hybrid <- list(brand = "Toyota", year = 2020, fuel = "Hybrid")

# Assign two classes: "HybridCar" and "Car"
class(hybrid) <- c("HybridCar", "Car")

# Define a print method for the "Car" class
# (the extra ... matches the signature of the print() generic)
print.Car <- function(x, ...) {
  cat("Car brand:", x$brand, "\nYear:", x$year, "\n")
}

# Call the print method for the "hybrid" object
print(hybrid)
# Output:
# Car brand: Toyota
# Year: 2020

Here, we assigned two classes to the hybrid object: “HybridCar” and “Car”. Since no print.HybridCar() method exists, R falls back to print.Car() and uses that method.
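
A more specific class can also add its own behavior and then hand over to the next class in line with NextMethod(), a base R function. The sketch below assumes a print.HybridCar() method that prints the fuel type first and then reuses print.Car(); the method body is an illustration, not part of the original example.

```r
print.Car <- function(x, ...) {
  cat("Car brand:", x$brand, "\nYear:", x$year, "\n")
  invisible(x)
}

# Print the extra field, then continue with the method of the next class ("Car")
print.HybridCar <- function(x, ...) {
  cat("Fuel:", x$fuel, "\n")
  NextMethod()
}

hybrid <- structure(
  list(brand = "Toyota", year = 2020, fuel = "Hybrid"),
  class = c("HybridCar", "Car")
)

# Capture the printed lines so they can be inspected
output <- capture.output(print(hybrid))
output  # three lines: fuel, brand, year
```

With this design, changes to print.Car() automatically carry over to hybrid cars, and print.HybridCar() only needs to describe what is different.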

Exercises

  1. Create a class called “Book” with fields title, author, and pages. Define a method for the print() generic function that prints the book’s title and author.
  2. Create a class called “Employee” with fields name, position, and salary. Define a method for a generic function info() that prints the employee’s name and position.
