We are going to explore four functions from the package tidyr
, which belongs to the tidyverse:
gather()
(to be replaced by pivot_longer()
): Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed.
spread()
(to be replaced by pivot_wider()
): Spread a key-value pair across multiple columns. It takes two columns (key & value) and spreads into multiple columns. It makes “long” data wider.
separate()
: Turns a single character column into multiple columns.
unite()
: Convenience function to paste multiple columns into one.
For more details see this chapter on tidy data. More functions are described here.

The following example is taken from financetrain.com.
Function gather()
: It takes multiple columns as input which are selected and transforms them into key-value pairs. It makes “wide” data longer. Here’s an example of how you might use gather()
on a sample dataset. In this experiment we’ve given sample data for people who applied for loan in a bank.
library(tidyverse)
personalDetails <- data.frame(
name = c("Shankar", "Shashi"),
age = c(30,45),
job_status = c("employed","unemployed"),
education = c("masters","tertiary")
)
personalDetails
R>> name age job_status education
R>> 1 Shankar 30 employed masters
R>> 2 Shashi 45 unemployed tertiary
personalDetails %>%
gather(c("job_status", "education"), key = "key", value = "value")
R>> name age key value
R>> 1 Shankar 30 job_status employed
R>> 2 Shashi 45 job_status unemployed
R>> 3 Shankar 30 education masters
R>> 4 Shashi 45 education tertiary
Function gather()
– Stock Data Example: Let us consider a sample dataframe of stocks. In the dataframe, we have 3 stocks A, B, and C with their date-wise stock closing prices.
stocks <- data.frame(
Date = as.Date('2009-01-01') + 0:6,
A = runif(7, 100, 145),
B = runif(7, 350, 560),
C = runif(7, 34, 89)
)
stocks
R>> Date A B C
R>> 1 2009-01-01 115.9862 385.3819 61.72505
R>> 2 2009-01-02 142.8031 512.1114 78.66316
R>> 3 2009-01-03 107.0498 436.1819 42.42916
R>> 4 2009-01-04 120.5471 394.4558 67.39277
R>> 5 2009-01-05 111.6828 537.5897 63.22199
R>> 6 2009-01-06 100.0423 500.4622 40.56708
R>> 7 2009-01-07 105.2267 534.0351 45.21401
In some situations, we may want to have one column called “stock” and another column called “prices” and then each row containing the observations. This can be done as follows:
stocks_long <- stocks %>%
gather(-"Date", key = "stock", value = "price")
stocks_long
R>> Date stock price
R>> 1 2009-01-01 A 115.98617
R>> 2 2009-01-02 A 142.80315
R>> 3 2009-01-03 A 107.04982
R>> 4 2009-01-04 A 120.54713
R>> 5 2009-01-05 A 111.68283
R>> 6 2009-01-06 A 100.04230
R>> 7 2009-01-07 A 105.22675
R>> 8 2009-01-01 B 385.38192
R>> 9 2009-01-02 B 512.11140
R>> 10 2009-01-03 B 436.18192
R>> 11 2009-01-04 B 394.45576
R>> 12 2009-01-05 B 537.58966
R>> 13 2009-01-06 B 500.46223
R>> 14 2009-01-07 B 534.03513
R>> 15 2009-01-01 C 61.72505
R>> 16 2009-01-02 C 78.66316
R>> 17 2009-01-03 C 42.42916
R>> 18 2009-01-04 C 67.39277
R>> 19 2009-01-05 C 63.22199
R>> 20 2009-01-06 C 40.56708
R>> 21 2009-01-07 C 45.21401
Function spread()
: It spreads a key-value pair across multiple columns. It takes two columns (key & value) and spreads into multiple columns. It makes “long” data wider. We can use our stock prices data to demonstrate the spread()
function. Consider the long-form data that we got after applying the gather()
function. We can bring it back to its original form using the spread()
function.
stocks_long %>%
spread("stock", "price")
R>> Date A B C
R>> 1 2009-01-01 115.9862 385.3819 61.72505
R>> 2 2009-01-02 142.8031 512.1114 78.66316
R>> 3 2009-01-03 107.0498 436.1819 42.42916
R>> 4 2009-01-04 120.5471 394.4558 67.39277
R>> 5 2009-01-05 111.6828 537.5897 63.22199
R>> 6 2009-01-06 100.0423 500.4622 40.56708
R>> 7 2009-01-07 105.2267 534.0351 45.21401
Alternatively, we can also spread it with another column such as “Date”, which will create a column for each date.
stocks_long %>%
spread("Date", "price")
R>> stock 2009-01-01 2009-01-02 2009-01-03 2009-01-04 2009-01-05 2009-01-06 2009-01-07
R>> 1 A 115.98617 142.80315 107.04982 120.54713 111.68283 100.04230 105.22675
R>> 2 B 385.38192 512.11140 436.18192 394.45576 537.58966 500.46223 534.03513
R>> 3 C 61.72505 78.66316 42.42916 67.39277 63.22199 40.56708 45.21401
Function separate()
: It turns a single character column into multiple columns.
In some cases a single column contains multiple variables in the data separated by a special character, if we want to split them into two columns we use separate()
function.
df <- data.frame(Names = c("koti.minnun", "AB.Diwillars", "ST.Joesph"))
df
R>> Names
R>> 1 koti.minnun
R>> 2 AB.Diwillars
R>> 3 ST.Joesph
df_sep <- df %>%
separate("Names", c("firstName", "lastName"))
df_sep
R>> firstName lastName
R>> 1 koti minnun
R>> 2 AB Diwillars
R>> 3 ST Joesph
Function unite()
: It combines multiple columns into single column. This function is the opposite of separate()
, which merges multiple columns into single column.
Consider our earlier example:
df_sep %>%
unite("Name", c("firstName", "lastName"), sep = "_")
R>> Name
R>> 1 koti_minnun
R>> 2 AB_Diwillars
R>> 3 ST_Joesph