── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.8 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.0
✔ readr 2.1.2 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ ggplot2::%+%() masks crayon::%+%()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Data frames in R
rstudio
rstats
Quick bits for manipulating data frames in R.
Some quick bits of code that I reach for often to manipulate data.frames in R.
I ❤️ tidyverse so let’s load it first up!
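The loading call itself is not shown here. Presumably it is the usual one-liner below, which is an assumption on my part, but it matches the attach message listing ggplot2, dplyr, tidyr and friends.

```r
# Assumed load step: attaches the core tidyverse packages
# (ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, forcats)
library(tidyverse)
```

If you only need the pieces used in this post, loading dplyr and tidyr on their own should also be enough.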
Matching
Create two data frames, then reorder the rows of one to match the other based on a column they have in common.
# data frame 1
producers <- data.frame(
  surname = c("Tarantino", "Scorsese", "Spielberg", "Hitchcock", "Polanski"),
  nationality = c("US", "US", "US", "UK", "Poland"),
  stringsAsFactors = FALSE
)
# data frame 2
movies <- data.frame(
  surname = c("Spielberg",
              "Scorsese",
              "Hitchcock",
              "Tarantino",
              "Polanski"),
  title = c("Super 8",
            "Taxi Driver",
            "Psycho",
            "Reservoir Dogs",
            "Chinatown"),
  stringsAsFactors = FALSE
)
Then we match the data frames using the surname column.
idx <- match(producers$surname, movies$surname)
movies_matched <- movies[idx, ]
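To see what match() is doing, here is a minimal sketch on toy data (the data frames x and y and their values are made up for illustration, not part of the example above). match() returns, for each value in its first argument, the position of that value in the second, so indexing with the result reorders the second data frame to follow the first.

```r
# Toy data (hypothetical values, for illustration only)
x <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(id = c("c", "a", "b"),
                value = c(3, 1, 2),
                stringsAsFactors = FALSE)

# For each x$id, find its row position in y$id
idx <- match(x$id, y$id)

# Reorder y so its rows follow the order of x
y_matched <- y[idx, ]

# Sanity check: both data frames now share the same row order
stopifnot(identical(x$id, y_matched$id))
```

Keep in mind that match() returns NA for values with no partner, and indexing with NA produces a row of NAs, so it is worth checking anyNA(idx) before relying on the result.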
Join data frames
Use full_join() to merge two data frames based on a common column.
Join options include:
- inner_join(): includes only rows with a match in both x and y.
- left_join(): includes all rows in x.
- right_join(): includes all rows in y.
- full_join(): includes all rows in x or y.
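The join options above can be compared side by side with a rough sketch like the one below. The data frames x and y are small made-up subsets in the spirit of the earlier example, chosen so that each has one surname the other lacks.

```r
library(dplyr)

# Toy data (illustrative subsets; each side has one unmatched surname)
x <- data.frame(surname = c("Tarantino", "Scorsese", "Hitchcock"),
                nationality = c("US", "US", "UK"),
                stringsAsFactors = FALSE)
y <- data.frame(surname = c("Scorsese", "Hitchcock", "Polanski"),
                title = c("Taxi Driver", "Psycho", "Chinatown"),
                stringsAsFactors = FALSE)

# Keep every surname from either side; gaps are filled with NA
joined_full <- full_join(x, y, by = "surname")

# Keep only surnames present in both x and y
joined_inner <- inner_join(x, y, by = "surname")

# Keep every row of x (left) or every row of y (right)
joined_left <- left_join(x, y, by = "surname")
joined_right <- right_join(x, y, by = "surname")
```

Spelling out by = "surname" avoids the message dplyr prints when it has to guess the join column.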
Rename column
Using dplyr, we can rename a column by referring to it by name.
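A minimal sketch of the dplyr version, assuming a data frame df with a column called oldName (both names are placeholders for illustration). Note that rename() takes new_name = old_name, in that order.

```r
library(dplyr)

# Placeholder data frame with a column we want to rename
df <- data.frame(oldName = 1:3,
                 other = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# rename() syntax is new_name = old_name
df <- rename(df, newName = oldName)
```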
Alternatively, without dplyr, we can refer to the column by its position.
colnames(df)[1] <- "newName"
If you already have a character vector of names (one per column), assign it directly.
colnames(df) <- new_names
If the column names are stored in row 2 of the data frame, use that row instead.
colnames(df) <- df[2,]
Find and replace
df["colname"][df["colname"] == "existing value"] <- "new value"
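Here is the same pattern on a small made-up data frame, to show what it does. The inner comparison builds a logical mask over the column, and the assignment overwrites only the cells where the mask is TRUE. The column names and values are placeholders.

```r
# Placeholder data frame for illustration
df <- data.frame(colname = c("existing value", "other", "existing value"),
                 n = 1:3,
                 stringsAsFactors = FALSE)

# Replace every exact match of "existing value" in colname
df["colname"][df["colname"] == "existing value"] <- "new value"

# df$colname is now: "new value", "other", "new value"
```

This only replaces exact, whole-cell matches. For partial matches inside strings, something like stringr::str_replace() is probably a better fit.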
Pivot wide and long
Load the palmerpenguins package for some fun example data.
library(palmerpenguins)
data(package = 'palmerpenguins')
df <- penguins_raw
df <- dplyr::select(df, studyName, `Sample Number`, Species, `Culmen Length (mm)`, `Flipper Length (mm)`, `Body Mass (g)`)
Pivot long
df_long <- df %>%
  pivot_longer(
    cols = 4:length(df),
    names_to = "measurements",
    values_to = "value")
DT::datatable(df_long)
Pivot wide
df_wide <- df_long %>%
  pivot_wider(
    names_from = "measurements",
    values_from = "value")
DT::datatable(df_wide)
Misc.
Paste options