Code
library(readxl)
library(dplyr)
library(purrr)
library(janitor)
To read data from multiple sheets in an Excel workbook, combine them into a single dataframe using R and to deal with mixed date formats using the janitor package
Zahier Nasrudin
March 8, 2023
Welcome welcome.
In this blog post, we will be exploring:
janitor
packageBy the end of this post, you & me should have a solid understanding of how to read and combine data from multiple sheets in an Excel workbook using R, and how to handle mixed date formats to ensure consistent data analysis.
First, let’s define the path to the Excel file that we want to read in. In this example, we’ll be using the here
package to make the path relative to the current project directory. This data set can be obtained from my github repository
To read all the sheets:
excel_sheets()
function from the readxl
package is used to extract the names of all the sheets in the Excel file.map_df()
function from the purrr
package meanwhile is used to apply a function to each element of a vector and combine the results into a data framemap_df()
to read in each sheet of the Excel file; using the read_excel()
function from the readxl package
.sheet_name
and contains the name of the sheet that the data came from.The dataframe:
Year | Date | Store | Sales | sheet_name |
---|---|---|---|---|
2012 | 1/26/2012 | A | 2000 | Year1 |
2012 | 1/26/2012 | B | 2500 | Year1 |
2012 | 1/28/2012 | C | 3400 | Year1 |
2012 | Jan 21, 2012 | D | 6000 | Year1 |
2013 | 1/26/2013 | A | 2000 | Year2 |
2013 | 1/21/2013 | B | 2500 | Year2 |
2013 | 1/28/2013 | C | 3400 | Year2 |
2013 | Jan 25, 2013 | D | 6000 | Year2 |
2013 | 41302 | E | 5500 | Year2 |
2014 | 1/26/2014 | A | 2000 | Year3 |
2014 | 1/28/2014 | B | 2500 | Year3 |
2014 | 2/28/2014 | C | 3400 | Year3 |
2014 | March 25, 2014 | D | 6000 | Year3 |
2014 | 41789 | E | 5500 | Year3 |
We have successfully loaded the information into one dataframe. However, it appears that this dataframe contains a column (Date
) with mixed date formats that have been imported from Excel. This can often be a problem when working with date data, as it can make it difficult to perform consistent operations and analyses on the data. To fix this issue, we will need to standardize the date format across the entire column, so that all dates can be treated consistently. One way to accomplish this is to use a function like convert_to_date()
from the janitor
package, which can convert the dates to a consistent format while also handling any errors or inconsistencies in the data:
convert_to_date()
function from the janitor
package is used to convert the Date
column to a consistent date format.string_conversion_failure
parameter is set to “warning”, to indicate that any non-date strings in the Date
column will be converted to NA
values and a warning message will be displayed, just to be safe.character_fun
parameter is set to lubridate::mdy
, which indicates the expected order of month, day, and year in the date string.Please refer to Date_change
for the standardized date column:
Year | Date | Store | Sales | sheet_name | Date_change |
---|---|---|---|---|---|
2012 | 1/26/2012 | A | 2000 | Year1 | 2012-01-26 |
2012 | 1/26/2012 | B | 2500 | Year1 | 2012-01-26 |
2012 | 1/28/2012 | C | 3400 | Year1 | 2012-01-28 |
2012 | Jan 21, 2012 | D | 6000 | Year1 | 2012-01-21 |
2013 | 1/26/2013 | A | 2000 | Year2 | 2013-01-26 |
2013 | 1/21/2013 | B | 2500 | Year2 | 2013-01-21 |
2013 | 1/28/2013 | C | 3400 | Year2 | 2013-01-28 |
2013 | Jan 25, 2013 | D | 6000 | Year2 | 2013-01-25 |
2013 | 41302 | E | 5500 | Year2 | 2013-01-28 |
2014 | 1/26/2014 | A | 2000 | Year3 | 2014-01-26 |
2014 | 1/28/2014 | B | 2500 | Year3 | 2014-01-28 |
2014 | 2/28/2014 | C | 3400 | Year3 | 2014-02-28 |
2014 | March 25, 2014 | D | 6000 | Year3 | 2014-03-25 |
2014 | 41789 | E | 5500 | Year3 | 2014-05-30 |