<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Purrr on rdata.lu Blog | Data science with R</title>
    <link>/tags/purrr/</link>
    <description>Recent content in Purrr on rdata.lu Blog | Data science with R</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <copyright>Copyright (c) rdata.lu. All rights reserved. &lt;br&gt; Content reblogged by &lt;a href=&#39;https://www.r-bloggers.com/&#39; target=&#39;_blank&#39;&gt;R-bloggers&lt;/a&gt; &amp; &lt;a href=&#39;http://www.rweekly.org/&#39; target=&#39;_blank&#39;&gt;RWeekly&lt;/a&gt;</copyright>
    <lastBuildDate>Thu, 21 Dec 2017 00:00:00 +0000</lastBuildDate>
    
        <atom:link href="/tags/purrr/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Skip errors in R loops by not writing loops</title>
      <link>/post/2017-12-21-skip-errors-in-r-by-not-writing-loops/</link>
      <pubDate>Thu, 21 Dec 2017 00:00:00 +0000</pubDate>
      
      <guid>/post/2017-12-21-skip-errors-in-r-by-not-writing-loops/</guid>
      <description>&lt;p&gt;&lt;img src=&#34;/images/loops_error.jpg&#34; /&gt;&lt;!-- --&gt;&lt;/p&gt;
&lt;p&gt;You probably have encountered situations similar to this one:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;result = vector(&amp;quot;list&amp;quot;, length(some_numbers))

for(i in seq_along(some_numbers)){
  result[[i]] = some_function(some_numbers[[i]])
}

print(result)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First I initialize &lt;code&gt;result&lt;/code&gt;, an empty list of size equal to the length of &lt;code&gt;some_numbers&lt;/code&gt; which will contains the results of applying &lt;code&gt;some_function()&lt;/code&gt; to each element of &lt;code&gt;some_numbers&lt;/code&gt;. Then, using a for loop, I apply the function. This is what I get back:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;NaNs producedError in sqrt(x) : non-numeric argument to mathematical function&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s take a look at &lt;code&gt;some_numbers&lt;/code&gt; and &lt;code&gt;some_function()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(some_numbers)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [[1]]
## [1] -1.9
## 
## [[2]]
## [1] 20
## 
## [[3]]
## [1] &amp;quot;-88&amp;quot;
## 
## [[4]]
## [1] -42&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;some_function&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## function(x){
##   if(x == 0) res = 0
##   if(x &amp;lt; 0) res = -sqrt(-x)
##   if(x &amp;gt; 0) res = sqrt(x)
##   return(res)
## }&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So the function simply returns the square root of &lt;code&gt;x&lt;/code&gt; (or minus the square root of &lt;code&gt;-x&lt;/code&gt; if &lt;code&gt;x&lt;/code&gt; is negative), but the number in third position of the list &lt;code&gt;some_numbers&lt;/code&gt; is actually a character. This type of mistakes can commonly happen. The &lt;code&gt;result&lt;/code&gt; list looks like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(result)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[[1]]
[1] -1.378405

[[2]]
[1] 4.472136

[[3]]
NULL

[[4]]
NULL&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you see, even though the fourth element could have been computed, the error made the whole loop stop. In such a simple example, you could correct this and then run your function. But what if the list you want to apply your function to is very long and the computation take a very, very long time? Perhaps you simply want to skip these errors and get back to them later. One way of doing that is using &lt;code&gt;tryCatch()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;result = vector(&amp;quot;list&amp;quot;, length(some_numbers))

for(i in seq_along(some_numbers)){
  result[[i]] = tryCatch(some_function(some_numbers[[i]]), 
                         error = function(e) paste(&amp;quot;something wrong here&amp;quot;))
}

print(result)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [[1]]
## [1] -1.378405
## 
## [[2]]
## [1] 4.472136
## 
## [[3]]
## [1] &amp;quot;something wrong here&amp;quot;
## 
## [[4]]
## [1] -6.480741&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This works, but it’s verbose and easy to mess up. My advice here is that if you want to skip errors in loops you don’t write loops! This is quite easy with the &lt;code&gt;purrr&lt;/code&gt; package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(purrr)

result = map(some_numbers, some_function)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There’s several advantages here already; no need to initialize an empty structure to hold your result, and no need to think about indices, which can sometimes get confusing. This however does not work either; there’s still the problem that we have a character inside &lt;code&gt;some_numbers&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Error in sqrt(x) : non-numeric argument to mathematical function&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, &lt;code&gt;purrr&lt;/code&gt; contains some very amazing functions for error handling, &lt;code&gt;safely()&lt;/code&gt; and &lt;code&gt;possibly()&lt;/code&gt;. Let’s try &lt;code&gt;possibly()&lt;/code&gt; first:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;possibly_some_function = possibly(some_function, otherwise = &amp;quot;something wrong here&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;possibly()&lt;/code&gt; takes a function as argument as well as &lt;code&gt;otherwise&lt;/code&gt;; this is where you define a return value in case something is wrong. &lt;code&gt;possibly()&lt;/code&gt; then returns a new function that skips errors:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;result = map(some_numbers, possibly_some_function)

print(result)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [[1]]
## [1] -1.378405
## 
## [[2]]
## [1] 4.472136
## 
## [[3]]
## [1] &amp;quot;something wrong here&amp;quot;
## 
## [[4]]
## [1] -6.480741&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you use &lt;code&gt;possibly()&lt;/code&gt; on a function, you’re politely telling R “would you kindly apply the function wherever possible, and if not, tell me where there was an issue”. What about &lt;code&gt;safely()&lt;/code&gt;?&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;safely_some_function = safely(some_function)


result = map(some_numbers, safely_some_function)

str(result)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## List of 4
##  $ :List of 2
##   ..$ result: num -1.38
##   ..$ error : NULL
##  $ :List of 2
##   ..$ result: num 4.47
##   ..$ error : NULL
##  $ :List of 2
##   ..$ result: NULL
##   ..$ error :List of 2
##   .. ..$ message: chr &amp;quot;invalid argument to unary operator&amp;quot;
##   .. ..$ call   : language -x
##   .. ..- attr(*, &amp;quot;class&amp;quot;)= chr [1:3] &amp;quot;simpleError&amp;quot; &amp;quot;error&amp;quot; &amp;quot;condition&amp;quot;
##  $ :List of 2
##   ..$ result: num -6.48
##   ..$ error : NULL&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The major difference with &lt;code&gt;possibly()&lt;/code&gt; is that &lt;code&gt;safely()&lt;/code&gt; returns a more complex object: it returns a list of lists. There are as many lists as there are elements in &lt;code&gt;some_numbers&lt;/code&gt;. Let’s take a look at the first one:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(result[[1]])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $result
## [1] -1.378405
## 
## $error
## NULL&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;result[[1]]&lt;/code&gt; is a list with a &lt;code&gt;result&lt;/code&gt; and an &lt;code&gt;error&lt;/code&gt;. If there was no error, we get a value in &lt;code&gt;result&lt;/code&gt; and &lt;code&gt;NULL&lt;/code&gt; in &lt;code&gt;error&lt;/code&gt;. If there was an &lt;code&gt;error&lt;/code&gt;, this is what we see:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(result[[3]])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## $result
## NULL
## 
## $error
## &amp;lt;simpleError in -x: invalid argument to unary operator&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because lists of lists are not easy to handle, I like to use &lt;code&gt;possibly()&lt;/code&gt;, but if you use &lt;code&gt;safely()&lt;/code&gt; you might want to know about &lt;code&gt;transpose()&lt;/code&gt;, which is another function from &lt;code&gt;purrr&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;result2 = transpose(result)

str(result2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## List of 2
##  $ result:List of 4
##   ..$ : num -1.38
##   ..$ : num 4.47
##   ..$ : NULL
##   ..$ : num -6.48
##  $ error :List of 4
##   ..$ : NULL
##   ..$ : NULL
##   ..$ :List of 2
##   .. ..$ message: chr &amp;quot;invalid argument to unary operator&amp;quot;
##   .. ..$ call   : language -x
##   .. ..- attr(*, &amp;quot;class&amp;quot;)= chr [1:3] &amp;quot;simpleError&amp;quot; &amp;quot;error&amp;quot; &amp;quot;condition&amp;quot;
##   ..$ : NULL&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;result2&lt;/code&gt; is now a list of two lists: a &lt;code&gt;result&lt;/code&gt; list holding all the results, and an &lt;code&gt;error&lt;/code&gt; list holding all the error message. You can get results with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;result2$result&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [[1]]
## [1] -1.378405
## 
## [[2]]
## [1] 4.472136
## 
## [[3]]
## NULL
## 
## [[4]]
## [1] -6.480741&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I hope you enjoyed this blog post, and that these functions will make your life easier!&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Functional peace of mind</title>
      <link>/post/2017-11-14-functional-peace-of-mind/</link>
      <pubDate>Tue, 14 Nov 2017 00:00:00 +0000</pubDate>
      
      <guid>/post/2017-11-14-functional-peace-of-mind/</guid>
      <description>&lt;p&gt;I think what I enjoy the most about functional programming is the peace of mind that comes with it. With functional programming, there’s a lot of stuff you don’t need to think about. You can write functions that are general enough so that they solve a variety of problems. For example, imagine for a second that R does not have the &lt;code&gt;sum()&lt;/code&gt; function anymore. If you want to compute the sum of, say, the first 100 integers, you could write a loop that would do that for you:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;numbers = 0

for (i in 1:100){
  numbers = numbers + i
}

print(numbers)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 5050&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem with this approach, is that you cannot reuse any of the code there, even if you put it inside a function. For instance, what if you want to merge 4 datasets together? You would need something like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
data(mtcars)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mtcars1 = mtcars %&amp;gt;%
  mutate(id = &amp;quot;1&amp;quot;)

mtcars2 = mtcars %&amp;gt;%
  mutate(id = &amp;quot;2&amp;quot;)

mtcars3 = mtcars %&amp;gt;%
  mutate(id = &amp;quot;3&amp;quot;)

mtcars4 = mtcars %&amp;gt;%
  mutate(id = &amp;quot;4&amp;quot;)

datasets = list(mtcars1, mtcars2, mtcars3, mtcars4)

temp = datasets[[1]]

for(i in 1:3){
  temp = full_join(temp, datasets[[i+1]])
}&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Joining, by = c(&amp;quot;mpg&amp;quot;, &amp;quot;cyl&amp;quot;, &amp;quot;disp&amp;quot;, &amp;quot;hp&amp;quot;, &amp;quot;drat&amp;quot;, &amp;quot;wt&amp;quot;, &amp;quot;qsec&amp;quot;, &amp;quot;vs&amp;quot;, &amp;quot;am&amp;quot;, &amp;quot;gear&amp;quot;, &amp;quot;carb&amp;quot;, &amp;quot;id&amp;quot;)
## Joining, by = c(&amp;quot;mpg&amp;quot;, &amp;quot;cyl&amp;quot;, &amp;quot;disp&amp;quot;, &amp;quot;hp&amp;quot;, &amp;quot;drat&amp;quot;, &amp;quot;wt&amp;quot;, &amp;quot;qsec&amp;quot;, &amp;quot;vs&amp;quot;, &amp;quot;am&amp;quot;, &amp;quot;gear&amp;quot;, &amp;quot;carb&amp;quot;, &amp;quot;id&amp;quot;)
## Joining, by = c(&amp;quot;mpg&amp;quot;, &amp;quot;cyl&amp;quot;, &amp;quot;disp&amp;quot;, &amp;quot;hp&amp;quot;, &amp;quot;drat&amp;quot;, &amp;quot;wt&amp;quot;, &amp;quot;qsec&amp;quot;, &amp;quot;vs&amp;quot;, &amp;quot;am&amp;quot;, &amp;quot;gear&amp;quot;, &amp;quot;carb&amp;quot;, &amp;quot;id&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;glimpse(temp)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Observations: 128
## Variables: 12
## $ mpg  &amp;lt;dbl&amp;gt; 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19....
## $ cyl  &amp;lt;dbl&amp;gt; 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, ...
## $ disp &amp;lt;dbl&amp;gt; 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1...
## $ hp   &amp;lt;dbl&amp;gt; 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, ...
## $ drat &amp;lt;dbl&amp;gt; 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9...
## $ wt   &amp;lt;dbl&amp;gt; 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3...
## $ qsec &amp;lt;dbl&amp;gt; 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2...
## $ vs   &amp;lt;dbl&amp;gt; 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ am   &amp;lt;dbl&amp;gt; 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ gear &amp;lt;dbl&amp;gt; 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, ...
## $ carb &amp;lt;dbl&amp;gt; 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, ...
## $ id   &amp;lt;chr&amp;gt; &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, the logic is very similar as before, but you need to think carefully about the structure holding your elements (which can be numbers, datasets, characters, etc…) as well as be careful about indexing correctly… and depending on the type of objects you are working on, you might need to tweak the code further.&lt;/p&gt;
&lt;p&gt;How would a functional programming approach make this easier? Of course, you could use &lt;code&gt;purrr::reduce()&lt;/code&gt; to solve these problems. However, since I assumed that &lt;code&gt;sum()&lt;/code&gt; does not exist, I will also assume that &lt;code&gt;purrr::reduce()&lt;/code&gt; does not exist either and write my own, clumsy implementation. Here’s the code:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;my_reduce = function(a_list, a_func, init = NULL, ...){

  if(is.null(init)){
    init = `[[`(a_list, 1)
    a_list = tail(a_list, -1)
  }

  car = `[[`(a_list, 1)
  cdr = tail(a_list, -1)
  init = a_func(init, car, ...)

  if(length(cdr) != 0){
    my_reduce(cdr, a_func, init, ...)
  }
  else {
    init
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This can look much more complicated than before, but the idea is quite simple; &lt;em&gt;if you know about recursive functions&lt;/em&gt; (recursive functions are functions that call themselves). I won’t explain how the function works, because it is not the main point of the article (but if you’re curious, I encourage you to play around with it). The point is that now, I can do the following:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;my_reduce(list(1,2,3,4,5), `+`)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 15&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;my_reduce(datasets, full_join) %&amp;gt;% glimpse&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Joining, by = c(&amp;quot;mpg&amp;quot;, &amp;quot;cyl&amp;quot;, &amp;quot;disp&amp;quot;, &amp;quot;hp&amp;quot;, &amp;quot;drat&amp;quot;, &amp;quot;wt&amp;quot;, &amp;quot;qsec&amp;quot;, &amp;quot;vs&amp;quot;, &amp;quot;am&amp;quot;, &amp;quot;gear&amp;quot;, &amp;quot;carb&amp;quot;, &amp;quot;id&amp;quot;)
## Joining, by = c(&amp;quot;mpg&amp;quot;, &amp;quot;cyl&amp;quot;, &amp;quot;disp&amp;quot;, &amp;quot;hp&amp;quot;, &amp;quot;drat&amp;quot;, &amp;quot;wt&amp;quot;, &amp;quot;qsec&amp;quot;, &amp;quot;vs&amp;quot;, &amp;quot;am&amp;quot;, &amp;quot;gear&amp;quot;, &amp;quot;carb&amp;quot;, &amp;quot;id&amp;quot;)
## Joining, by = c(&amp;quot;mpg&amp;quot;, &amp;quot;cyl&amp;quot;, &amp;quot;disp&amp;quot;, &amp;quot;hp&amp;quot;, &amp;quot;drat&amp;quot;, &amp;quot;wt&amp;quot;, &amp;quot;qsec&amp;quot;, &amp;quot;vs&amp;quot;, &amp;quot;am&amp;quot;, &amp;quot;gear&amp;quot;, &amp;quot;carb&amp;quot;, &amp;quot;id&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Observations: 128
## Variables: 12
## $ mpg  &amp;lt;dbl&amp;gt; 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19....
## $ cyl  &amp;lt;dbl&amp;gt; 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, ...
## $ disp &amp;lt;dbl&amp;gt; 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 1...
## $ hp   &amp;lt;dbl&amp;gt; 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, ...
## $ drat &amp;lt;dbl&amp;gt; 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.9...
## $ wt   &amp;lt;dbl&amp;gt; 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3...
## $ qsec &amp;lt;dbl&amp;gt; 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 2...
## $ vs   &amp;lt;dbl&amp;gt; 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ am   &amp;lt;dbl&amp;gt; 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...
## $ gear &amp;lt;dbl&amp;gt; 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, ...
## $ carb &amp;lt;dbl&amp;gt; 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, ...
## $ id   &amp;lt;chr&amp;gt; &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;1...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And if I need to merge another dataset, I don’t need to change anything at all. Plus, because &lt;code&gt;my_reduce()&lt;/code&gt; is very general, I can even use it for situation I didn’t write it for in the first place:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;my_reduce(list(&amp;quot;a&amp;quot;, &amp;quot;b&amp;quot;, &amp;quot;c&amp;quot;, &amp;quot;d&amp;quot;, &amp;quot;e&amp;quot;), paste)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;a b c d e&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, &lt;code&gt;paste()&lt;/code&gt; is vectorized, so you could just as well do &lt;code&gt;paste(1, 2, 3, 4, 5)&lt;/code&gt;, but again, I want to insist on the fact that writing or using such functions allows you to abstract over a lot of thing. There is nothing specific to any type of object in &lt;code&gt;my_reduce()&lt;/code&gt;, whereas the loops have to be tailored for the kind of object you’re working with. As long as the &lt;code&gt;a_func&lt;/code&gt; argument is a binary operator that combines the elements inside &lt;code&gt;a_list&lt;/code&gt;, it’s going to work. And I don’t need to think about indexing, about having temporary variables or thinking about the structure that will hold my results.&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Let&#39;s make ggplot2 purrr again</title>
      <link>/post/2017-10-09-make-ggplot2-purrr-again/</link>
      <pubDate>Mon, 09 Oct 2017 06:45:48 +0200</pubDate>
      
      <guid>/post/2017-10-09-make-ggplot2-purrr-again/</guid>
      <description>&lt;p&gt;&lt;em&gt;Update&lt;/em&gt;: I’ve included another way of saving a separate plot by group in this article, as pointed out by &lt;a href=&#34;https://twitter.com/monitus/status/849033025631297536&#34;&gt;&lt;code&gt;@monitus&lt;/code&gt;&lt;/a&gt;. Actually, this is the preferred solution; using &lt;code&gt;dplyr::do()&lt;/code&gt; is deprecated, according to Hadley Wickham &lt;a href=&#34;https://twitter.com/hadleywickham/status/719542847045636096&#34;&gt;himself&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I’ll be honest: the title is a bit misleading. I will not use &lt;code&gt;purrr&lt;/code&gt; that much in this blog post. Actually, I will use one single &lt;code&gt;purrr&lt;/code&gt; function, at the very end. I use &lt;code&gt;dplyr&lt;/code&gt; much more. However &lt;em&gt;Make ggplot2 purrr&lt;/em&gt; sounds better than &lt;em&gt;Make ggplot dplyr&lt;/em&gt; or whatever the verb for &lt;code&gt;dplyr&lt;/code&gt; would be.&lt;/p&gt;
&lt;p&gt;Also, this blog post was inspired by a stackoverflow question and in particular one of the &lt;a href=&#34;http://stackoverflow.com/a/29035145/1298051&#34;&gt;answers&lt;/a&gt;. So I don’t bring anything new to the table, but I found this stackoverflow answer so useful and so underrated (only 16 upvotes as I’m writing this!) that I wanted to write something about it.&lt;/p&gt;
&lt;p&gt;Basically the idea of this blog post is to show how to create graphs using &lt;code&gt;ggplot2&lt;/code&gt;, but by grouping by a factor variable beforehand. To illustrate this idea, let’s use the data from the &lt;a href=&#34;http://www.rug.nl/ggdc/productivity/pwt/&#34;&gt;Penn World Tables 9.0&lt;/a&gt;. The easiest way to get this data is to install the package called &lt;code&gt;pwt9&lt;/code&gt; with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;pwt9&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then load the data with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(&amp;quot;pwt9.0&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, let’s load the needed packages. I am also using &lt;code&gt;ggthemes&lt;/code&gt; which makes themeing your ggplots very easy. I’ll be making &lt;a href=&#34;https://en.wikipedia.org/wiki/Edward_Tufte&#34;&gt;Tufte&lt;/a&gt;-style plots.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)
library(ggthemes)
library(dplyr)
library(tidyr)
library(purrr)
library(pwt9)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First let’s select a list of countries:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;country_list &amp;lt;- c(&amp;quot;France&amp;quot;, &amp;quot;Germany&amp;quot;, &amp;quot;United States of America&amp;quot;, &amp;quot;Luxembourg&amp;quot;, &amp;quot;Switzerland&amp;quot;, &amp;quot;Greece&amp;quot;)

small_pwt &amp;lt;- pwt9.0 %&amp;gt;%
  filter(country %in% country_list)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s us also order the countries in the data frame as I have written them in &lt;code&gt;country_list&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;small_pwt &amp;lt;- small_pwt %&amp;gt;%
  mutate(country = factor(country, levels = country_list, ordered = TRUE))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You might be wondering why this is important. At the end of the article, we are going to save the plots to disk. If we do not re-order the countries inside the data frame as in &lt;code&gt;country_list&lt;/code&gt;, the name of the files will not correspond to the correct plots!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update&lt;/em&gt;: While this can still be interesting to know, especially if you want to order the bars of a barplot made with &lt;code&gt;ggplot2&lt;/code&gt;, I included a suggestion by &lt;a href=&#34;https://twitter.com/expersso/status/846986357792739328&#34;&gt;&lt;code&gt;@expersso&lt;/code&gt;&lt;/a&gt; that does not require your data to be ordered!&lt;/p&gt;
&lt;p&gt;Now when you want to plot the same variable by countries, say &lt;code&gt;avh&lt;/code&gt; (&lt;em&gt;Average annual hours worked by persons engaged&lt;/em&gt;), the usual way to do this is with one of &lt;code&gt;facet_wrap()&lt;/code&gt; or &lt;code&gt;facet_grid()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(data = small_pwt) + theme_tufte() +
  geom_line(aes(y = avh, x = year)) +
  facet_wrap(~country)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-10-09-make-ggplot2-purrr-again_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot(data = small_pwt) + theme_tufte() +
  geom_line(aes(y = avh, x = year)) +
  facet_grid(country~.)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-10-09-make-ggplot2-purrr-again_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, for this particular example, &lt;code&gt;facet_grid()&lt;/code&gt; is not very useful, but do notice its argument, &lt;code&gt;country~.&lt;/code&gt;, which is different from &lt;code&gt;facet_wrap()&lt;/code&gt;’s argument. This way, I get the graphs stacked horizontally. If I had used &lt;code&gt;facet_grid(~country)&lt;/code&gt; the graphs would be side by side and completely unreadable.&lt;/p&gt;
&lt;p&gt;Now, let’s go to the meat of this post: what if you would like to have one single graph for each country? You’d probably think of using &lt;code&gt;dplyr::group_by()&lt;/code&gt; to form the groups and then the graphs. This is the way to go, but you also have to use &lt;code&gt;dplyr::do()&lt;/code&gt;. This is because as far as I understand, &lt;code&gt;ggplot2&lt;/code&gt; is not &lt;code&gt;dplyr&lt;/code&gt;-aware, and using an arbitrary function with groups is only possible with &lt;code&gt;dplyr::do()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update&lt;/em&gt;: As explained in the intro above, I also added the solution that uses &lt;code&gt;tidyr::nest()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Ancient, deprecated way of doing this
plots &amp;lt;- small_pwt %&amp;gt;%
  group_by(country) %&amp;gt;%
  do(plot = ggplot(data = .) + theme_tufte() +
       geom_line(aes(y = avh, x = year)) +
       ggtitle(unique(.$country)) +
       ylab(&amp;quot;Year&amp;quot;) +
       xlab(&amp;quot;Average annual hours worked by persons engaged&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And this is the approach that uses &lt;code&gt;tidyr::nest()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Preferred approach
plots &amp;lt;- small_pwt %&amp;gt;%
  group_by(country) %&amp;gt;%
  nest() %&amp;gt;%
  mutate(plot = map2(data, country, ~ggplot(data = .x) + theme_tufte() +
       geom_line(aes(y = avh, x = year)) +
       ggtitle(.y) +
       ylab(&amp;quot;Year&amp;quot;) +
       xlab(&amp;quot;Average annual hours worked by persons engaged&amp;quot;)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you know &lt;code&gt;dplyr&lt;/code&gt; at least a little bit, the above lines should be easy for you to understand. But notice how we get the title of the graphs, with &lt;code&gt;ggtitle(unique(.$country))&lt;/code&gt;, which was actually the point of the stackoverflow question.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update:&lt;/em&gt; The modern version uses &lt;code&gt;tidyr::nest()&lt;/code&gt;. Its documentation tells us:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;There are many possible ways one could choose to nest columns inside a data frame. &lt;code&gt;nest()&lt;/code&gt; creates a list of data frames containing all the nested variables: this seems to be the most useful form in practice.&lt;/em&gt; Let’s take a closer look at what it does exactly:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;small_pwt %&amp;gt;%
  group_by(country) %&amp;gt;%
  nest() %&amp;gt;%
  head()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   country                  data              
##   &amp;lt;ord&amp;gt;                    &amp;lt;list&amp;gt;            
## 1 Switzerland              &amp;lt;tibble [65 × 46]&amp;gt;
## 2 Germany                  &amp;lt;tibble [65 × 46]&amp;gt;
## 3 France                   &amp;lt;tibble [65 × 46]&amp;gt;
## 4 Greece                   &amp;lt;tibble [65 × 46]&amp;gt;
## 5 Luxembourg               &amp;lt;tibble [65 × 46]&amp;gt;
## 6 United States of America &amp;lt;tibble [65 × 46]&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is why I love lists in R; we get a &lt;code&gt;tibble&lt;/code&gt; where each element of the column &lt;code&gt;data&lt;/code&gt; is itself a &lt;code&gt;tibble&lt;/code&gt;. We can now apply any function that we know works on lists.&lt;/p&gt;
&lt;p&gt;What might be surprising though, is the object that is created by this code. Let’s take a look at &lt;code&gt;plots&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(plots)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 3
##   country                  data               plot    
##   &amp;lt;ord&amp;gt;                    &amp;lt;list&amp;gt;             &amp;lt;list&amp;gt;  
## 1 Switzerland              &amp;lt;tibble [65 × 46]&amp;gt; &amp;lt;S3: gg&amp;gt;
## 2 Germany                  &amp;lt;tibble [65 × 46]&amp;gt; &amp;lt;S3: gg&amp;gt;
## 3 France                   &amp;lt;tibble [65 × 46]&amp;gt; &amp;lt;S3: gg&amp;gt;
## 4 Greece                   &amp;lt;tibble [65 × 46]&amp;gt; &amp;lt;S3: gg&amp;gt;
## 5 Luxembourg               &amp;lt;tibble [65 × 46]&amp;gt; &amp;lt;S3: gg&amp;gt;
## 6 United States of America &amp;lt;tibble [65 × 46]&amp;gt; &amp;lt;S3: gg&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As &lt;code&gt;dplyr::do()&lt;/code&gt;’s documentation tells us, the return values get stored inside a list. And this is exactly what we get back; a list of plots! Lists are a very flexible and useful class, and you cannot spell &lt;em&gt;list&lt;/em&gt; without &lt;code&gt;purrr&lt;/code&gt; (at least not when you’re a ne&lt;code&gt;R&lt;/code&gt;d).&lt;/p&gt;
&lt;p&gt;Here are the final lines that use &lt;code&gt;purrr::map2()&lt;/code&gt; to save all these plots at once inside your working directory:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Update&lt;/em&gt;: I have changed the code below which does not require your data frame to be ordered according to the variable &lt;code&gt;country_list&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# file_names &amp;lt;- paste0(country_list, &amp;quot;.pdf&amp;quot;)

map2(paste0(plots$country, &amp;quot;.pdf&amp;quot;), plots$plot, ggsave)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As I said before, if you do not re-order the countries inside the data frame, the names of the files and the plots will not match. Try running all the code without re-ordering, you’ll see!&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
