It's frustrating to see your code choke partway through while trying to apply a function in R. You may know that something in one of those objects caused a problem, but how do you track down the offender?
The purrr package's possibly() function is one easy way.
In this example, I'll demo code that imports multiple CSV files. Most files' Value columns import as characters, but one of them comes in as numbers. Running a function that expects characters as input will cause an error.
For setup, the code below loads several libraries I need and then uses base R's list.files() function to return a sorted vector with the names of all the files in my data directory.
library(purrr)
library(readr)
library(rio)
library(dplyr)
my_data_files <- list.files("data_files", full.names = TRUE) %>%
  sort()
I can then import the first file and look at its structure.
x <- rio::import("data_files/file1.csv")
str(x)
'data.frame': 3 obs. of 3 variables:
 $ Category     : chr "A" "B" "C"
 $ Value        : chr "$4,256.48 " "$438.22" "$945.12"
 $ MonthStarting: chr "12/1/20" "12/1/20" "12/1/20"
Both the Value and MonthStarting columns are importing as character strings. What I ultimately want is Value as numbers and MonthStarting as dates.
I sometimes deal with issues like this by writing a small function, such as the one below, to make changes in a file after import. It uses dplyr's transmute() to create a new Month column from MonthStarting as Date objects, and a new Total column from Value as numbers. I also make sure to keep the Category column (transmute() drops all columns not explicitly mentioned).
library(dplyr)
library(lubridate)
process_file <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      Total = readr::parse_number(Value)
    )
}
I like to use readr's parse_number() function for converting values that come in as character strings because it deals with commas, dollar signs, or percent signs in numbers. However, parse_number() requires character strings as input. If a value is already a number, parse_number() will throw an error.
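Both behaviors can be checked directly by calling parse_number() on a character vector and on a numeric one. This is a minimal sketch with made-up values; the numeric call is wrapped in tryCatch() because the exact error message may differ between readr versions.

```r
library(readr)

# Character input: currency symbols, grouping commas and percent signs
# are dropped before the number is parsed
from_strings <- parse_number(c("$4,256.48 ", "$438.22", "45%"))

# Numeric input: wrapped in tryCatch() so the script keeps running and
# the condition object can be inspected instead of halting everything
from_numbers <- tryCatch(
  parse_number(c(3738, 723)),
  error = function(e) e
)

print(from_strings)
print(inherits(from_numbers, "error"))
```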
My new function works fine when I test it on the first two files in my data directory using purrr's map_df() function.
my_results <- map_df(my_data_files[1:2], process_file)
But if I try running my function on all the files, including the one where Value imports as numbers, it will choke.
all_results <- map_df(my_data_files, process_file)
Error: Problem with `mutate()` input `Total`.
x is.character(x) is not TRUE
ℹ Input `Total` is `readr::parse_number(Value)`.
Run `rlang::last_error()` to see where the error occurred.
That error tells me the input for Total is not a character column in one of the files, but I'm not sure which one. Ideally, I'd like to run through all the files, marking the one(s) with problems as errors but still processing all of them instead of stopping at the error.
possibly() lets me do this by creating a brand new function from my original function:
safer_process_file <- possibly(process_file, otherwise = "Error in file")
The first argument for possibly() is my original function, process_file. The second argument, otherwise, tells possibly() what to return if there's an error.
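To see the two arguments at work outside the file-import example, here is a minimal sketch that wraps base R's log(), which errors on character input. The name safe_log and the sample inputs are made up for illustration.

```r
library(purrr)

# First argument: the function to wrap. Second argument: the fallback value.
safe_log <- possibly(log, otherwise = NA_real_)

safe_log(10)    # the usual log() result
safe_log("a")   # the `otherwise` value, because log("a") throws an error

# Mapping over mixed input no longer stops at the bad element
results <- map_dbl(list(1, "a", 100), safe_log)
print(results)
```

Choosing a fallback of the same type as a successful result (a number here) keeps type-specific helpers such as map_dbl() usable.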
To apply my new safer_process_file() function to all my files, I'll use the map() function and not purrr's map_df() function. That's because safer_process_file() needs to return a list, not a data frame. And that's because if there's an error, those error results won't be a data frame; they'll be the character string that I told otherwise to generate.
all_results <- map(my_data_files, safer_process_file)
str(all_results, max.level = 1)
List of 5
 $ :'data.frame': 3 obs. of 3 variables:
 $ :'data.frame': 3 obs. of 3 variables:
 $ :'data.frame': 3 obs. of 3 variables:
 $ : chr "Error in file"
 $ :'data.frame': 3 obs. of 3 variables:
You can see here that the fourth item, from my fourth file, is the one with the error. That's easy to see with only five items, but wouldn't be quite so easy if I had a thousand files to import and three had errors.
If I name the list with my original file names, it's easier to identify the problem file:
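The same pattern can be tried without any CSV files. The sketch below uses in-memory data frames and a hypothetical stand-in, toy_process(), that fails whenever Value is not character; neither name comes from the code above.

```r
library(purrr)

# Hypothetical stand-in for process_file(): errors if Value is not character
toy_process <- function(df) {
  stopifnot(is.character(df$Value))
  data.frame(
    Category = df$Category,
    Total = as.numeric(gsub("[$,]", "", df$Value))
  )
}

safer_toy_process <- possibly(toy_process, otherwise = "Error in file")

# Made-up inputs: the second one has a numeric Value column
inputs <- list(
  a = data.frame(Category = "A", Value = "$1,000.50"),
  b = data.frame(Category = "B", Value = 250)
)

# map() returns a list, so a data frame and a character string can coexist
toy_results <- map(inputs, safer_toy_process)
str(toy_results, max.level = 1)
```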
names(all_results) <- my_data_files
str(all_results, max.level = 1)
List of 5
 $ data_files/file1.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file2.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file3.csv:'data.frame': 3 obs. of 3 variables:
 $ data_files/file4.csv: chr "Error in file"
 $ data_files/file5.csv:'data.frame': 3 obs. of 3 variables:
I can even save the results of str() to a text file for further examination.
str(all_results, max.level = 1) %>%
  capture.output(file = "results.txt")
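With many files, the problem entries can also be pulled out programmatically rather than read off the str() output. This is a sketch on a made-up named list shaped like the results above; toy_results, problem_files and good_results are illustrative names, not part of the original code.

```r
library(purrr)
library(dplyr)

# Made-up results: two data frames and one fallback string
toy_results <- list(
  "data_files/file1.csv" = data.frame(Category = "A", Total = 1),
  "data_files/file2.csv" = "Error in file",
  "data_files/file3.csv" = data.frame(Category = "B", Total = 2)
)

# TRUE where processing succeeded, FALSE where possibly() returned the fallback
is_ok <- map_lgl(toy_results, is.data.frame)

# Names of the files that need a closer look
problem_files <- names(toy_results)[!is_ok]

# The successful results can still be combined into one data frame
good_results <- bind_rows(toy_results[is_ok])

print(problem_files)
```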
Now that I know file4.csv is the problem, I can import just that one and confirm what the issue is.
x4 <- rio::import(my_data_files[4])
str(x4)
'data.frame': 3 obs. of 3 variables:
 $ Category     : chr "A" "B" "C"
 $ Value        : num 3738 723 5494
 $ MonthStarting: chr "9/1/20" "9/1/20" "9/1/20"
Ah, Value is indeed coming in as numeric. I'll revise my process_file() function to account for the possibility that Value isn't a character string with an ifelse() check:
process_file2 <- function(myfile) {
  rio::import(myfile) %>%
    dplyr::transmute(
      Category = as.character(Category),
      Month = lubridate::mdy(MonthStarting),
      Total = ifelse(is.character(Value), readr::parse_number(Value), Value)
    )
}
Now if I use purrr's map_df() with my new process_file2() function, it should work and give me a single data frame.
all_results2 <- map_df(my_data_files, process_file2)
str(all_results2)
'data.frame': 15 obs. of 3 variables:
 $ Category: chr "A" "B" "C" "A" ...
 $ Month   : Date, format: "2020-12-01" "2020-12-01" "2020-12-01" ...
 $ Total   : num 4256 4256 4256 3156 3156 ...
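One detail in the str() output above may be worth a second look: the first three Total values are all 4256, although file1.csv's Value column held three different amounts. A likely cause is that ifelse() returns a result shaped like its test, and is.character(Value) is a single TRUE or FALSE, so only the first parsed value comes back and is then recycled down the column. The sketch below shows that behavior on made-up values and one possible alternative using a plain if/else; the helper name to_number() is hypothetical.

```r
library(readr)

vals <- c("$4,256.48", "$438.22", "$945.12")

# ifelse() with a length-1 test returns a length-1 result
ifelse_result <- ifelse(is.character(vals), parse_number(vals), vals)
print(length(ifelse_result))

# Hypothetical helper: if/else chooses between whole vectors instead
to_number <- function(x) {
  if (is.character(x)) parse_number(x) else x
}

print(to_number(vals))           # parses every element
print(to_number(c(3738, 723)))   # numeric input passes through unchanged
```

Inside transmute(), this would amount to writing Total = to_number(Value), assuming the helper is defined before the function is called.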
That's just the data and format I wanted, thanks to wrapping my original function in possibly() to create a new, error-handling function.
For more R tips, head to the “Do More With R” page on InfoWorld or check out the “Do More With R” YouTube playlist.
Copyright © 2020 IDG Communications, Inc.