<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Web Data on rdata.lu Blog | Data science with R</title>
    <link>/tags/web-data/</link>
    <description>Recent content in Web Data on rdata.lu Blog | Data science with R</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <copyright>Copyright (c) rdata.lu. All rights reserved. &lt;br&gt; Content reblogged by &lt;a href=&#39;https://www.r-bloggers.com/&#39; target=&#39;_blank&#39;&gt;R-bloggers&lt;/a&gt; &amp; &lt;a href=&#39;http://www.rweekly.org/&#39; target=&#39;_blank&#39;&gt;RWeekly&lt;/a&gt;</copyright>
    <lastBuildDate>Tue, 20 Feb 2018 00:00:00 +0000</lastBuildDate>
    
        <atom:link href="/tags/web-data/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>BIKE SERVICES API &#43; SHINY = NICE APP</title>
      <link>/post/2018-02-20-api-shiny-nice-app/</link>
      <pubDate>Tue, 20 Feb 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018-02-20-api-shiny-nice-app/</guid>
      <description>
&lt;p&gt;Hi everyone,&lt;/p&gt;
&lt;p&gt;I will keep this post short and simply introduce our &lt;a href=&#34;http://blog.rdata.lu/visualization/bike/&#34;&gt;Shiny application&lt;/a&gt; for self-service bike stations.&lt;/p&gt;
&lt;div class=video&gt;
&lt;video width=&#34;600&#34; height=&#34;400&#34; controls autoplay&gt;
&lt;source src=&#34;/video/demo_bike.mp4&#34; type=&#34;video/mp4&#34;&gt;
Your browser does not support the video tag. &lt;/video&gt;
&lt;/div&gt;
&lt;p&gt;The code is in two parts: the &lt;a href=&#34;https://github.com/krosamont/shiny_bike/blob/master/ui.R&#34;&gt;ui.R&lt;/a&gt; file for the interface and the &lt;a href=&#34;https://github.com/krosamont/shiny_bike/blob/master/server.R&#34;&gt;server.R&lt;/a&gt; file for the backend. The code is on &lt;a href=&#34;https://github.com/krosamont/shiny_bike/&#34;&gt;GitHub&lt;/a&gt; if you want to download it and run it on your computer.&lt;/p&gt;
&lt;p&gt;JCDecaux provides an API that gives us real-time information on each self-service bike station. This information is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Station id&lt;/li&gt;
&lt;li&gt;Station name&lt;/li&gt;
&lt;li&gt;Address&lt;/li&gt;
&lt;li&gt;Position (latitude/longitude)&lt;/li&gt;
&lt;li&gt;Presence of a payment terminal&lt;/li&gt;
&lt;li&gt;Presence of a bonus station&lt;/li&gt;
&lt;li&gt;Whether the station is open&lt;/li&gt;
&lt;li&gt;Number of bike stands in the station&lt;/li&gt;
&lt;li&gt;Number of bike stands available in the station (stands with no bike on them)&lt;/li&gt;
&lt;li&gt;Number of bikes available&lt;/li&gt;
&lt;li&gt;Time of the last API update&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The JCDecaux API returns the data in the following format:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;{
  &amp;quot;number&amp;quot;: 123,
  &amp;quot;contract_name&amp;quot; : &amp;quot;Paris&amp;quot;,
  &amp;quot;name&amp;quot;: &amp;quot;stations name&amp;quot;,
  &amp;quot;address&amp;quot;: &amp;quot;address of the station&amp;quot;,
  &amp;quot;position&amp;quot;: {
    &amp;quot;lat&amp;quot;: 48.862993,
    &amp;quot;lng&amp;quot;: 2.344294
  },
  &amp;quot;banking&amp;quot;: true,
  &amp;quot;bonus&amp;quot;: false,
  &amp;quot;status&amp;quot;: &amp;quot;OPEN&amp;quot;,
  &amp;quot;bike_stands&amp;quot;: 20,
  &amp;quot;available_bike_stands&amp;quot;: 15,
  &amp;quot;available_bikes&amp;quot;: 5,
  &amp;quot;last_update&amp;quot;: &amp;lt;timestamp&amp;gt;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hence, our Shiny application gets real-time information on bike stations in 27 cities.&lt;/p&gt;
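&lt;p&gt;To give an idea of how such a record can be handled in R, here is a minimal sketch that parses a record shaped like the one above with &lt;code&gt;jsonlite::fromJSON()&lt;/code&gt;. It is only an illustration and not necessarily how the app does it: the &lt;code&gt;last_update&lt;/code&gt; value is made up, and treating it as milliseconds since the Unix epoch is an assumption.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;jsonlite&amp;quot;)

# A sample record shaped like the format above.
# The last_update value is hypothetical (assumed: milliseconds since the Unix epoch).
station_json = &amp;#39;{
  &amp;quot;number&amp;quot;: 123,
  &amp;quot;contract_name&amp;quot;: &amp;quot;Paris&amp;quot;,
  &amp;quot;name&amp;quot;: &amp;quot;stations name&amp;quot;,
  &amp;quot;address&amp;quot;: &amp;quot;address of the station&amp;quot;,
  &amp;quot;position&amp;quot;: {&amp;quot;lat&amp;quot;: 48.862993, &amp;quot;lng&amp;quot;: 2.344294},
  &amp;quot;banking&amp;quot;: true,
  &amp;quot;bonus&amp;quot;: false,
  &amp;quot;status&amp;quot;: &amp;quot;OPEN&amp;quot;,
  &amp;quot;bike_stands&amp;quot;: 20,
  &amp;quot;available_bike_stands&amp;quot;: 15,
  &amp;quot;available_bikes&amp;quot;: 5,
  &amp;quot;last_update&amp;quot;: 1519084800000
}&amp;#39;

# A JSON object becomes a named list; nested objects become nested lists
station = fromJSON(station_json)
station$position$lat

# Free stands plus available bikes add up to the number of stands in this record
station$available_bike_stands + station$available_bikes == station$bike_stands

# Convert the (assumed) millisecond timestamp to a date-time
as.POSIXct(station$last_update / 1000, origin = &amp;quot;1970-01-01&amp;quot;, tz = &amp;quot;UTC&amp;quot;)&lt;/code&gt;&lt;/pre&gt;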
&lt;p&gt;We also used the modern open-source library leaflet to display the city maps (OpenStreetMap).&lt;/p&gt;
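&lt;p&gt;As a rough sketch of what a leaflet map of a station could look like (not the code of the app itself), the snippet below puts one marker on OpenStreetMap tiles. The coordinates come from the sample record above, and the popup text is a made-up example.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;leaflet&amp;quot;)

# One marker at the coordinates of the sample record; the popup text is hypothetical
station_map = leaflet() %&amp;gt;%
  addTiles() %&amp;gt;%   # default tiles are OpenStreetMap
  addMarkers(lng = 2.344294, lat = 48.862993,
             popup = &amp;quot;stations name: 5 bikes available&amp;quot;)

station_map&lt;/code&gt;&lt;/pre&gt;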
&lt;p&gt;This application works better on a computer than on a smartphone because Shiny is not fully smartphone-friendly. However, the Shiny interface itself is user-friendly.&lt;/p&gt;
&lt;p&gt;If you want to build your own Shiny app, I advise you to check this &lt;a href=&#34;https://shiny.rstudio.com/gallery/&#34;&gt;gallery&lt;/a&gt;. It contains a lot of examples that will serve as a good introduction.&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Analysis of the Renert - Part 3: Visualizations</title>
      <link>/post/2018-01-26-analysis-of-the-renert-part-3/</link>
      <pubDate>Fri, 26 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018-01-26-analysis-of-the-renert-part-3/</guid>
      <description>
&lt;p style=&#34;text-align:center&#34;&gt;
&lt;img src=&#34;/images/renert.jpg&#34; style=&#34;width:60vh; &#34;&gt;
&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is part 3 of a 3 part blog post. This post uses the data that was scraped in part 1 and prepared in part 2.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now that we have the data in a nice format, let’s make a frequency plot! First let’s load the data and the packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tidyverse&amp;quot;)
library(&amp;quot;ggthemes&amp;quot;) # To use different themes and colors
renert_tokenized = readRDS(&amp;quot;renert_tokenized.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using the &lt;code&gt;ggplot2&lt;/code&gt; package, I can produce a plot of the most frequent words.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_tokenized %&amp;gt;%
  count(word, sort = TRUE) %&amp;gt;%
  filter(n &amp;gt; 50) %&amp;gt;%
  mutate(word = reorder(word, n)) %&amp;gt;%
  ggplot(aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip() +
  theme_minimal() &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-01-26-analysis-of-the-renert-part-3_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;So, the most frequent word is &lt;em&gt;kinnek&lt;/em&gt;, meaning &lt;em&gt;King&lt;/em&gt;! &lt;em&gt;kinnek&lt;/em&gt; is mentioned more times than &lt;em&gt;renert&lt;/em&gt;, the name of the hero. Next are &lt;em&gt;här&lt;/em&gt; and &lt;em&gt;wollef&lt;/em&gt; meaning &lt;em&gt;mister&lt;/em&gt; and &lt;em&gt;wolf&lt;/em&gt;. In fifth position we have &lt;em&gt;fuuss&lt;/em&gt;, for &lt;em&gt;fox&lt;/em&gt;. I’ll let you use Google Translate for the other words 😄.&lt;/p&gt;
&lt;p&gt;Now, I’m also doing sentiment analysis using the AFINN list of words. Each word in this list has a score that indicates its sentiment. You can download the original list from &lt;a href=&#34;http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Because such a list is not available in Luxembourgish, I have translated it using Google’s Translate API. Here is the code to do that:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tidyverse&amp;quot;)
library(&amp;quot;translate&amp;quot;) # google translate api
library(&amp;quot;tidytext&amp;quot;) # to load the AFINN dictionary

api_key = &amp;quot;api_key_goes_here&amp;quot;

set.key(api_key)

afinn = get_sentiments(&amp;quot;afinn&amp;quot;)

# I wrap the `translate()` function in `purrr::possibly()` so that if a translation
# fails, I get &amp;quot;error&amp;quot; for that word and keep the translations that worked.

possibly_translate = purrr::possibly(translate::translate, otherwise = &amp;quot;error&amp;quot;)

afinn_lux = afinn %&amp;gt;%
  mutate(lux = map(word, possibly_translate, source = &amp;quot;en&amp;quot;, target = &amp;quot;lb&amp;quot;)) %&amp;gt;%
  mutate(lux = unlist(lux))

write_csv(afinn_lux, &amp;quot;afinn_lux.csv&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the above code to work, you need to have a Google cloud account, which you can create for free.&lt;/p&gt;
&lt;p&gt;I did not check the quality of the translations, and I’m sure it’s far from perfect. The translated list is also available in the GitHub repository &lt;a href=&#34;https://github.com/b-rodrigues/stopwords_lu&#34;&gt;here&lt;/a&gt;. Again, contributions are more than welcome!&lt;/p&gt;
&lt;p&gt;Now, I need to merge the dictionary with the data from each song. First, let’s load the dictionary:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;afinn_lux = read.csv(&amp;quot;afinn_lux.csv&amp;quot;)

# I only keep the `lux` column (renamed to `word`) and the `score` column
afinn_lux = afinn_lux %&amp;gt;%
  select(word = lux, score)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What does this dictionary look like? Let’s see:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(afinn_lux)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##             word score
## 1       opzeginn    -2
## 2     verloossen    -2
## 3       opzeginn    -2
## 4 entfouert ginn    -2
## 5       entlooss    -2
## 6      entfouert    -2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s load the tokenized songs, and merge them with the dictionary:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_songs_tokenized = readRDS(&amp;quot;renert_songs_tokenized.rds&amp;quot;)
  
renert_songs_sentiment = map(renert_songs_tokenized, ~full_join(., afinn_lux))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I can now merge the data in a single data frame and do some further cleaning:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_songs_sentiment = renert_songs_sentiment %&amp;gt;%
  bind_rows() %&amp;gt;%
  filter(!is.na(score)) %&amp;gt;%
  filter(!is.na(gesank))&lt;/code&gt;&lt;/pre&gt;
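&lt;p&gt;A side note on the join: a &lt;code&gt;full_join()&lt;/code&gt; followed by dropping the rows with a missing &lt;code&gt;score&lt;/code&gt; or &lt;code&gt;gesank&lt;/code&gt; keeps only the words found in both tables, which is what an &lt;code&gt;inner_join()&lt;/code&gt; does directly. The sketch below checks this on toy data; the words and scores are made up, and the equivalence assumes neither table contains missing values before the join.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;dplyr&amp;quot;)

# Toy stand-ins for one tokenized song and for the dictionary (hypothetical values)
song = tibble(word = c(&amp;quot;fest&amp;quot;, &amp;quot;kinnek&amp;quot;, &amp;quot;räich&amp;quot;), gesank = &amp;quot;éischte&amp;quot;)
dict = tibble(word = c(&amp;quot;fest&amp;quot;, &amp;quot;räich&amp;quot;, &amp;quot;traureg&amp;quot;), score = c(2, 3, -2))

# Approach used above: full join, then drop unmatched rows
via_full = song %&amp;gt;%
  full_join(dict, by = &amp;quot;word&amp;quot;) %&amp;gt;%
  filter(!is.na(score), !is.na(gesank))

# Same result in one step
via_inner = song %&amp;gt;%
  inner_join(dict, by = &amp;quot;word&amp;quot;)

all.equal(via_full, via_inner)&lt;/code&gt;&lt;/pre&gt;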
&lt;p&gt;What does the final data look like? Here it is:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(renert_songs_sentiment)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 3
##   word  gesank  score
##   &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;   &amp;lt;int&amp;gt;
## 1 rifft éischte    -2
## 2 léiw  éischte    -3
## 3 fest  éischte     2
## 4 fest  éischte     2
## 5 räich éischte     2
## 6 räich éischte     3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We see that some words appear several times, with different scores. That’s most probably because several English words with different scores were translated to the same Luxembourgish word, and because the translation of the dictionary was not very good. Oh well, let’s do a boxplot of the sentiment for each song:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;order =  c(&amp;quot;éischte&amp;quot;, &amp;quot;zwete&amp;quot;, &amp;quot;drëtte&amp;quot;, &amp;quot;véierte&amp;quot;, &amp;quot;fënnefte&amp;quot;, &amp;quot;sechste&amp;quot;, &amp;quot;siwente&amp;quot;,
                   &amp;quot;aachte&amp;quot;, &amp;quot;néngte&amp;quot;, &amp;quot;zéngte&amp;quot;, &amp;quot;elefte&amp;quot;, &amp;quot;zwielefte&amp;quot;, &amp;quot;dräizengte&amp;quot;, &amp;quot;véierzengte&amp;quot;)

renert_songs_sentiment %&amp;gt;%
  ggplot(aes(gesank, score)) + 
  scale_x_discrete(limits = order) + 
  geom_boxplot() + 
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-01-26-analysis-of-the-renert-part-3_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As we can see, there is no discernible pattern. This can mean two things: either the general sentiment inside each song is fairly neutral, or the quality of the translation was too bad for the results to make any sense.&lt;/p&gt;
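&lt;p&gt;One possible way to deal with dictionary words that appear several times with different scores would be to average the scores per word before joining. This was not done for the plot above; the snippet is only a sketch on a made-up toy dictionary.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;dplyr&amp;quot;)

# Toy dictionary with duplicated words (hypothetical scores)
dict = tibble(word = c(&amp;quot;fest&amp;quot;, &amp;quot;fest&amp;quot;, &amp;quot;räich&amp;quot;, &amp;quot;räich&amp;quot;),
              score = c(2, 2, 2, 3))

# Keep one row per word, with the mean of its scores
dict_unique = dict %&amp;gt;%
  group_by(word) %&amp;gt;%
  summarise(score = mean(score))

dict_unique&lt;/code&gt;&lt;/pre&gt;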
&lt;p&gt;That’s it for this series of posts! I hope you enjoyed reading it as much as I enjoyed writing it and analyzing the data!&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Analysis of the Renert - Part 2: Data Processing</title>
      <link>/post/2018-01-24-analysis-of-the-renert-part-2/</link>
      <pubDate>Wed, 24 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018-01-24-analysis-of-the-renert-part-2/</guid>
      <description>
&lt;p style=&#34;text-align:center&#34;&gt;
&lt;img src=&#34;/images/renert.jpg&#34; style=&#34;width:60vh; &#34;&gt;
&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is part 2 of a 3 part blog post. This post uses the data that we scraped in &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-22-analysis-of-the-renert-part-1/&#34;&gt;part 1&lt;/a&gt; and prepares it for further analysis, which is quite technical. If you’re only interested in the results of the analysis, skip to &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-26-analysis-of-the-renert-part-3/&#34;&gt;part 3&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;First, let’s load the data that we prepared in &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-22-analysis-of-the-renert-part-1/&#34;&gt;part 1&lt;/a&gt;. Let’s start with the full text:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tidyverse&amp;quot;)
library(&amp;quot;tidytext&amp;quot;)
renert = readRDS(&amp;quot;renert_full.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I want to study the frequencies of words, so for this, I will use a function from the &lt;code&gt;tidytext&lt;/code&gt; package called &lt;code&gt;unnest_tokens()&lt;/code&gt; which breaks the text down into tokens. Each token is a word, which will then make it possible to compute the frequencies of words.&lt;/p&gt;
&lt;p&gt;So, let’s unnest the tokens:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert = renert %&amp;gt;%
  unnest_tokens(word, text)&lt;/code&gt;&lt;/pre&gt;
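&lt;p&gt;To see what &lt;code&gt;unnest_tokens()&lt;/code&gt; does, here is a small sketch on two lines of the text, stored in a toy data frame (the name &lt;code&gt;toy&lt;/code&gt; is only for illustration). By default the function lowercases the text and strips punctuation, so each row of the result is one lowercase word.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tidyverse&amp;quot;)
library(&amp;quot;tidytext&amp;quot;)

# Two lines of the first song, one line per row
toy = tibble(text = c(&amp;quot;Et war esou ëm d&amp;#39;Päischten,&amp;quot;,
                      &amp;quot;&amp;#39;T stung Alles an der Bléi,&amp;quot;))

# One row per token, stored in a new column called `word`
toy_tokens = toy %&amp;gt;%
  unnest_tokens(word, text)

toy_tokens&lt;/code&gt;&lt;/pre&gt;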
&lt;p&gt;We still need to do some cleaning before continuing. In Luxembourgish, &lt;em&gt;the&lt;/em&gt; is written &lt;em&gt;d’&lt;/em&gt; for feminine nouns. For example &lt;em&gt;d’Kaz&lt;/em&gt; for &lt;em&gt;the cat&lt;/em&gt;. There’s also a bunch of &lt;em&gt;’t&lt;/em&gt;s in the text, which means &lt;em&gt;it&lt;/em&gt;. For example, the second line of the first song:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;’T stung Alles an der Bléi,&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Everything (it) was on flower,&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;We can remove these with a couple lines of code:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_tokenized = renert %&amp;gt;%
  mutate(word = str_replace_all(word, &amp;quot;d&amp;#39;&amp;quot;, &amp;quot;&amp;quot;)) %&amp;gt;%
  mutate(word = str_replace_all(word, &amp;quot;&amp;#39;t&amp;quot;, &amp;quot;&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
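&lt;p&gt;Note that the pattern &lt;code&gt;d&amp;#39;&lt;/code&gt; matches anywhere inside a token, not only at its start. If that ever turns out to be a problem, one option is to anchor the pattern with &lt;code&gt;^&lt;/code&gt;. A minimal sketch, where the tokens are hypothetical and the last one is made up to show a match in the middle of a word:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;stringr&amp;quot;)

# Hypothetical tokens; the last one is made up to show a mid-word match
words = c(&amp;quot;d&amp;#39;kaz&amp;quot;, &amp;quot;d&amp;#39;villercher&amp;quot;, &amp;quot;grand&amp;#39;rue&amp;quot;)

# Unanchored: removes d&amp;#39; wherever it occurs
unanchored = str_replace_all(words, &amp;quot;d&amp;#39;&amp;quot;, &amp;quot;&amp;quot;)
unanchored
# &amp;quot;kaz&amp;quot; &amp;quot;villercher&amp;quot; &amp;quot;granrue&amp;quot;

# Anchored with ^: only removes d&amp;#39; at the start of a token
anchored = str_replace(words, &amp;quot;^d&amp;#39;&amp;quot;, &amp;quot;&amp;quot;)
anchored
# &amp;quot;kaz&amp;quot; &amp;quot;villercher&amp;quot; &amp;quot;grand&amp;#39;rue&amp;quot;&lt;/code&gt;&lt;/pre&gt;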
&lt;p&gt;But that’s not all! We still need to remove so-called stop words. Stop words are words that are very frequent, such as “and”, and these words usually do not add anything to the analysis. There are no set rules for defining a list of stop words, so I took inspiration from the stop word lists for English and German, and created my own, which you can get on &lt;a href=&#34;https://github.com/b-rodrigues/stopwords_lu&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;stopwords = read.csv(&amp;quot;stopwords_lu.csv&amp;quot;, header = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For my Luxembourgish-speaking compatriots, I’d be glad to get help to make this list better! This list is far from perfect, certainly contains typos, or even words that have no reason to be there! Please help 😅.&lt;/p&gt;
&lt;p&gt;Using this list of stop words, I can remove words that don’t add anything to the analysis. Creating a list of stop words for the Luxembourgish language is very challenging, because some stop words come from German, such as “awer”, from the German “aber”, meaning &lt;em&gt;but&lt;/em&gt;, while you could also use “mä”, from the French &lt;em&gt;mais&lt;/em&gt;, also meaning &lt;em&gt;but&lt;/em&gt;. Plus, as kids, we never really learned how to write Luxembourgish. Actually, most Luxembourgers don’t know how to write Luxembourgish 100% correctly. This is because for a very long time, Luxembourgish was used for oral communication, and French for formal written correspondence. This is changing, and more and more people are learning how to write correctly. I definitely have a lot to learn! Thus, I have certainly missed a lot of stop words in the list, but I am hopeful that others will contribute to the list and make it better! In the meantime, that’s what I’m going to use.&lt;/p&gt;
&lt;p&gt;Let’s take a look at some lines of the stop words data frame:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(stopwords, 20)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##          word
## 1           a
## 2           à
## 3         äis
## 4          är
## 5         ärt
## 6        äert
## 7        ären
## 8         all
## 9       allem
## 10      alles
## 11   alleguer
## 12        als
## 13       also
## 14         am
## 15         an
## 16 anerefalls
## 17        ass
## 18        aus
## 19       awer
## 20        bei&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can remove the stop words from our tokens using an &lt;code&gt;anti_join()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_tokenized = renert_tokenized %&amp;gt;%
  anti_join(stopwords)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Joining, by = &amp;quot;word&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Column `word` joining character vector and factor, coercing into
## character vector&lt;/code&gt;&lt;/pre&gt;
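&lt;p&gt;The warning above appears because, in R versions before 4.0.0, &lt;code&gt;read.csv()&lt;/code&gt; converts strings to factors by default, so the &lt;code&gt;word&lt;/code&gt; column of the stop words is a factor while the tokens are characters. The result of the join is still correct. If you prefer to avoid the warning, one option is to read the file with &lt;code&gt;stringsAsFactors = FALSE&lt;/code&gt;. The sketch below shows the idea on toy data frames with made-up content:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;dplyr&amp;quot;)

# Toy stand-ins for the tokens and the stop words (hypothetical content)
toy_tokens = tibble(word = c(&amp;quot;kinnek&amp;quot;, &amp;quot;an&amp;quot;, &amp;quot;fuuss&amp;quot;, &amp;quot;awer&amp;quot;))
toy_stopwords = data.frame(word = c(&amp;quot;an&amp;quot;, &amp;quot;awer&amp;quot;), stringsAsFactors = FALSE)

# Both `word` columns are characters, so the join does not need to coerce anything
toy_clean = toy_tokens %&amp;gt;%
  anti_join(toy_stopwords, by = &amp;quot;word&amp;quot;)

toy_clean$word
# &amp;quot;kinnek&amp;quot; &amp;quot;fuuss&amp;quot;&lt;/code&gt;&lt;/pre&gt;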
&lt;p&gt;Let’s save this for later use:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;saveRDS(renert_tokenized, &amp;quot;renert_tokenized.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I now have to do the same for the data that is stored by song. Because this is a list where each element is a data frame, I have to use &lt;code&gt;purrr::map()&lt;/code&gt; to map each of the functions I used before to each data frame:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_songs = readRDS(&amp;quot;renert_songs_df.rds&amp;quot;)
renert_songs = map(renert_songs, ~unnest_tokens(., word, text))
renert_songs = map(renert_songs, ~anti_join(., stopwords))
renert_songs = map(renert_songs, ~mutate(., word = str_replace_all(word, &amp;quot;d&amp;#39;&amp;quot;, &amp;quot;&amp;quot;)))
renert_songs = map(renert_songs, ~mutate(., word = str_replace_all(word, &amp;quot;&amp;#39;t&amp;quot;, &amp;quot;&amp;quot;)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s take a look at the object we have:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(renert_songs[[1]])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 1
##   word     
##   &amp;lt;chr&amp;gt;    
## 1 éischte  
## 2 gesank   
## 3 edit     
## 4 päischten
## 5 stung    
## 6 bléi&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looks pretty nice! But I can make it nicer by adding a column containing which song the data refers to. Indeed, the first line of each data frame contains the number of the song. I can extract this information and add it to each data set:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_songs = map(renert_songs, ~mutate(., gesank = pull(.[1,1])))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s take a look again:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(renert_songs[[1]])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 2
##   word      gesank 
##   &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;  
## 1 éischte   éischte
## 2 gesank    éischte
## 3 edit      éischte
## 4 päischten éischte
## 5 stung     éischte
## 6 bléi      éischte&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can save this object for later use:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;saveRDS(renert_songs, &amp;quot;renert_songs_tokenized.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the final part of this series, I will use the tokenized data as well as the list of songs to create a couple of visualizations!&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Analysis of the Renert - Part 1: Scraping</title>
      <link>/post/2018-01-22-analysis-of-the-renert-part-1/</link>
      <pubDate>Mon, 22 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018-01-22-analysis-of-the-renert-part-1/</guid>
      <description>
&lt;p style=&#34;text-align:center&#34;&gt;
&lt;img src=&#34;/images/renert.jpg&#34; style=&#34;width:60vh; &#34;&gt;
&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is part 1 of a 3 part blog post. This post presents the Luxembourgish language as well as the literary work I am going to analyze using the R programming language. &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-24-analysis-of-the-renert-part-2/&#34;&gt;Part 2&lt;/a&gt; deals with preparing the data for analysis, and finally &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-26-analysis-of-the-renert-part-3/&#34;&gt;part 3&lt;/a&gt; is the analysis. Hope you enjoy!&lt;/em&gt;&lt;/p&gt;
&lt;div id=&#34;luxembourg-and-the-luxembourgish-language&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Luxembourg and the Luxembourgish language&lt;/h2&gt;
&lt;p&gt;Luxembourg is a small European country, squeezed between France, Belgium and Germany. Over the course of its history, it’s been invaded over and over by either France or Prussia (later Germany). It eventually became a state under the personal possession of William I of the Netherlands in 1815, with a… Prussian garrison to guard its capital, Luxembourg City, from further French invasions. After the Belgian Revolution, the Treaty of London of 1839 ceded the purely French-speaking part of the country to Belgium, and the Luxembourgish-speaking part became what is known today as the Grand-Duchy of Luxembourg. What’s a Grand-Duchy, you might wonder? &lt;br&gt;&lt;br&gt; Luxembourg is the only remaining Grand-Duchy in the world. A Grand-Duchy is like a Kingdom, but instead of a King, we have a Grand Duke. The current monarch is Henri. Luxembourg is a constitutional monarchy: the Grand Duke is the head of state, and the head of government is the prime minister, Xavier Bettel. As you can imagine, Luxembourg’s history has had a very important impact on the languages we speak today in the country; there are three official languages: French, German, and Luxembourgish. Unlike other countries with several official languages, Luxembourg has no French-, German-, or Luxembourgish-speaking part. In Luxembourg, you use one of the three languages based on context.&lt;br&gt;&lt;br&gt; For example, the laws are all written in French, and French is mostly the language used for official or formal written correspondence. German has traditionally been the language of the press and the police. And finally, Luxembourgish is the language Luxembourgers use to speak with one another. 
This means that on a given day, most people here might switch between these three languages; of course, add English to the pile, which is rapidly growing in the country due to all the English-speaking expats that come here to work (&lt;em&gt;cough&lt;/em&gt;brexit&lt;em&gt;cough&lt;/em&gt;).&lt;br&gt;&lt;br&gt; There is also a sizable Portuguese community in Luxembourg, so you’ll hear a lot of Portuguese on the streets too, as well as Italian. Around 50% of the inhabitants of Luxembourg are foreign born, mostly from other EU countries. Italians, Portuguese and many others immigrated to Luxembourg starting in the 1960s to work in the metallurgical sector and, later, in the construction sector. The children of these immigrants usually speak five languages: their mother tongue, say, Portuguese, the three official languages of the country, and finally English. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;You might wonder what Luxembourgish sounds like. Here is a video of our Prime Minister talking in Luxembourgish:&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;100%&#34; src=&#34;https://www.youtube.com/embed/NnUf6nkZInM&#34; frameborder=&#34;0&#34; allowfullscreen style=&#34;max-width:100%; height:55vh;&#34;&gt;
&lt;/iframe&gt;
&lt;p&gt;Here is another video of him speaking French:&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;100%&#34; src=&#34;https://www.youtube.com/embed/U4G8P_z84GU&#34; frameborder=&#34;0&#34; allowfullscreen style=&#34;max-width:100%; height:55vh;&#34;&gt;
&lt;/iframe&gt;
&lt;p&gt;Here he’s speaking German :&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;100%&#34; src=&#34;https://www.youtube.com/embed/tblafXTQ2_w?start=120&#34; frameborder=&#34;0&#34; allowfullscreen style=&#34;max-width:100%; height:55vh;&#34;&gt;
&lt;/iframe&gt;
&lt;p&gt;And here English :&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;100%&#34; src=&#34;https://www.youtube.com/embed/-ensRTwpjXk?start=185&#34; frameborder=&#34;0&#34; allowfullscreen style=&#34;max-width:100%; height:55vh;&#34;&gt;
&lt;/iframe&gt;
&lt;p&gt;In the English video, you might notice the typical accent Luxembourgers have when speaking English 😄&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-text-were-analysing&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The text we’re analysing&lt;/h2&gt;
&lt;p&gt;The text I’ll be analyzing is called &lt;em&gt;Renert oder de Fuuss am Frack an a Maansgréisst&lt;/em&gt;, published in 1872 by Michel Rodange. My high school was named after Michel Rodange by the way! &lt;em&gt;Renert&lt;/em&gt; is a fable featuring a sly fox as the main character, called Renert. He gets in trouble because of his shenanigans and gets sentenced to death by the Lion King. However, through further lies and deceptions, he manages to escape. After some tribulations, he proves his worth to the King by winning a duel against the wolf and becomes an aristocrat. Because it was written in the 19th century, the way some words are written may be different from how we write them in modern Luxembourgish, which might create some problems when analyzing the text.&lt;/p&gt;
&lt;p&gt;Now starts the technical part. If you’re only interested in the results, you can skip to &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-26-analysis-of-the-renert-part-3/&#34;&gt;part 3&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;scraping-the-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Scraping the data&lt;/h2&gt;
&lt;p&gt;First of all, let’s install (if you don’t have them yet) and load the needed packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(c(&amp;quot;tidyverse&amp;quot;,
                   &amp;quot;tidytext&amp;quot;,
                   &amp;quot;janitor&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(&amp;quot;tidyverse&amp;quot;)
library(&amp;quot;tidytext&amp;quot;)
library(&amp;quot;janitor&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;tidyverse&lt;/code&gt; is a collection of packages that are very useful for a lot of different tasks. If you are not familiar with these packages, check out the tidyverse &lt;a href=&#34;https://www.tidyverse.org/&#34;&gt;website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;tidytext&lt;/code&gt; is a package that uses the same principles as the &lt;code&gt;tidyverse&lt;/code&gt;, but for text analysis. You can learn more about it &lt;a href=&#34;https://www.tidytextmining.com/&#34;&gt;here&lt;/a&gt;, in the book I took inspiration from for this series of blog posts.&lt;/p&gt;
&lt;p&gt;The full text of the Renert is available &lt;a href=&#34;https://wikisource.org/wiki/Renert&#34;&gt;here&lt;/a&gt;, so I’m going to use &lt;code&gt;rvest&lt;/code&gt; to get the text into R:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_link = &amp;quot;https://wikisource.org/wiki/Renert&amp;quot;

renert_raw = renert_link %&amp;gt;%
  xml2::read_html() %&amp;gt;%
  rvest::html_nodes(&amp;quot;.mw-parser-output&amp;quot;) %&amp;gt;%
  rvest::html_text() %&amp;gt;%
  str_split(&amp;quot;\n&amp;quot;, simplify = TRUE) %&amp;gt;%
  .[1, -c(1:24)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I download the text using &lt;code&gt;read_html()&lt;/code&gt; from the &lt;code&gt;xml2&lt;/code&gt; package (which gets installed with the &lt;code&gt;tidyverse&lt;/code&gt;) and then find the nodes that interest me, in this case those with the class &lt;code&gt;mw-parser-output&lt;/code&gt;. Then I extract the text from this node, and split it on the &lt;code&gt;\n&lt;/code&gt; character, to get a big vector where each element is a line of text. I also remove the first 24 lines, which are mostly blank. Let’s take a look at the first five lines:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_raw[1:5]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;Éischte Gesank.[edit]&amp;quot;       &amp;quot;&amp;quot;                           
## [3] &amp;quot;Et war esou ëm d&amp;#39;Päischten,&amp;quot; &amp;quot;&amp;#39;T stung Alles an der Bléi,&amp;quot;
## [5] &amp;quot;An d&amp;#39;Villercher di songen&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Renert is divided into 14 songs, so I’d like to create a list with 14 elements, where each element is the text of a song. Every song is titled “First Song”, “Second Song”, etc., so I first check on which lines I find the word &lt;em&gt;Gesank&lt;/em&gt; (&lt;em&gt;song&lt;/em&gt;), which identifies the start of a song.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;(indices = which(grepl(&amp;quot;Gesank&amp;quot;, renert_raw)))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1]    1  605  885 1172 1555 1906 2441 2664 2995 3686 4214 4625 5116 5963&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;indices&lt;/code&gt; contains the indices of where the songs start. So I need to create the indices of where the songs end. If you think about it, the first song ends where the second song begins, minus 1. So I create a new vector of indices by first removing the index for the first song, subtracting 1, and then appending the index of the last line (using &lt;code&gt;length(renert_raw)&lt;/code&gt;).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;(indices2 = c(indices[-1] - 1, length(renert_raw)))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  [1]  604  884 1171 1554 1905 2440 2663 2994 3685 4213 4624 5115 5962 6506&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I can now create a list of sequences, called &lt;code&gt;song_lines&lt;/code&gt; which contains the indices for all the songs:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;song_lines = map2(indices, indices2,  ~seq(.x,.y))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And using this list of indices, I can now extract the songs into a list:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_songs = map(song_lines, ~`[`(renert_raw, .))&lt;/code&gt;&lt;/pre&gt;
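&lt;p&gt;To make the start/end bookkeeping concrete, here is a minimal sketch on a made-up five-line vector. The names &lt;code&gt;toy_raw&lt;/code&gt;, &lt;code&gt;starts&lt;/code&gt;, &lt;code&gt;ends&lt;/code&gt; and &lt;code&gt;toy_songs&lt;/code&gt; are hypothetical and only serve the illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(purrr)

# Toy text (assumption): two songs, each starting with a title line containing &amp;quot;Gesank&amp;quot;
toy_raw = c(&amp;quot;First Gesank.&amp;quot;, &amp;quot;line a&amp;quot;, &amp;quot;line b&amp;quot;,
            &amp;quot;Second Gesank.&amp;quot;, &amp;quot;line c&amp;quot;)

starts = grep(&amp;quot;Gesank&amp;quot;, toy_raw)             # where each song begins
ends   = c(starts[-1] - 1, length(toy_raw))  # one line before the next song, then the last line

# One character vector per song
toy_songs = map2(starts, ends, ~ toy_raw[.x:.y])
lengths(toy_songs)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 3 2&lt;/code&gt;&lt;/pre&gt;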
&lt;p&gt;I’ll save this object for later use, using &lt;code&gt;saveRDS()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;saveRDS(renert_songs, &amp;quot;renert_songs.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I will also save a version of the above list, but where each element of the list is a data frame. This will make analysis much easier later.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_songs_df = map(renert_songs, ~data_frame(text = .))
saveRDS(renert_songs_df, &amp;quot;renert_songs_df.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I also need to have the full text as a single character object, so I reduce my list into a single object and also save it:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;renert_full = reduce(renert_songs, c)

renert_full = data_frame(text = renert_full) %&amp;gt;%
  filter(!grepl(&amp;quot;Gesank&amp;quot;, text))

saveRDS(renert_full, &amp;quot;renert_full.rds&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
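&lt;p&gt;As a side note, here is a minimal sketch of what &lt;code&gt;reduce(..., c)&lt;/code&gt; does, using a made-up list (&lt;code&gt;toy_songs&lt;/code&gt; and &lt;code&gt;toy_full&lt;/code&gt; are hypothetical names). It concatenates the elements pairwise from left to right, so for a plain list of character vectors the result should match what &lt;code&gt;unlist()&lt;/code&gt; returns:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(purrr)

# Toy list (assumption): two short songs
toy_songs = list(c(&amp;quot;title 1&amp;quot;, &amp;quot;line a&amp;quot;),
                 c(&amp;quot;title 2&amp;quot;, &amp;quot;line b&amp;quot;, &amp;quot;line c&amp;quot;))

# c(c(song 1, song 2), song 3), ... collapses the list into one vector
toy_full = reduce(toy_songs, c)
length(toy_full)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 5&lt;/code&gt;&lt;/pre&gt;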
&lt;p&gt;This is the end of part 1. In &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-24-analysis-of-the-renert-part-2/&#34;&gt;part 2&lt;/a&gt;, we are going to prepare the data for analysis, and in &lt;a href=&#34;http://www.blog.rdata.lu/post/2018-01-26-analysis-of-the-renert-part-3/&#34;&gt;part 3&lt;/a&gt; we are going to analyze it. &lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on Twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our YouTube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Map unemployment using R with ggplot2</title>
      <link>/post/2018-01-03-mapping-unemployment-luxembourg/</link>
      <pubDate>Wed, 03 Jan 2018 00:00:00 +0000</pubDate>
      
      <guid>/post/2018-01-03-mapping-unemployment-luxembourg/</guid>
      <description>&lt;p&gt;In this blog post, I show various ways to create maps using R. You’ll need to install a number of packages and download two data sets: the unemployment rate in Luxembourg and a shapefile.&lt;/p&gt;
&lt;p&gt;To get the unemployment rate in Luxembourg, you can take a look at our &lt;a href=&#34;http://www.blog.rdata.lu/post/2017-08-21-scraping-data-from-statec-s-public-tables/&#34;  target=&#34;_blank&#34;&gt;previous blog post&lt;/a&gt; or simply run the following lines:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rvest)
library(dplyr)
library(purrr)
library(janitor)
library(tidyr)

page_unemp = read_html(&amp;quot;http://www.statistiques.public.lu/stat/TableViewer/tableViewHTML.aspx?ReportId=12950&amp;amp;IF_Language=eng&amp;amp;MainTheme=2&amp;amp;FldrName=3&amp;amp;RFPath=91&amp;quot;)

data_raw = page_unemp %&amp;gt;%
  html_nodes(&amp;quot;.b2020-datatable&amp;quot;) %&amp;gt;% .[[1]] %&amp;gt;% html_table(fill = TRUE)

colnames(data_raw) = data_raw[1, ]

colnames(data_raw)[1:2] = c(&amp;quot;division&amp;quot;, &amp;quot;variable&amp;quot;)

data_raw = data_raw[-c(1,2), ]

unemp_lux = data_raw %&amp;gt;%
  map_df(function(x)(gsub(&amp;quot;,&amp;quot;, &amp;quot;.&amp;quot;, x = x))) %&amp;gt;%
  mutate_at(vars(matches(&amp;quot;\\d{4}&amp;quot;)), as.numeric) %&amp;gt;%
  gather(key=year, value, -division, -variable) %&amp;gt;%
  spread(variable, value) %&amp;gt;%
  clean_names()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These lines scrape the data off the public tables of STATEC (the national institute of statistics) and put the raw data into a tidy data frame. Let’s take a look:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(unemp_lux)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   division year active_population of_which_non_wage_earners
## 1 Beaufort 2001               688                        85
## 2 Beaufort 2002               742                        85
## 3 Beaufort 2003               773                        85
## 4 Beaufort 2004               828                        80
## 5 Beaufort 2005               866                        96
## 6 Beaufort 2006               893                        87
##   of_which_wage_earners total_employed_population unemployed
## 1                   568                       653         35
## 2                   631                       716         26
## 3                   648                       733         40
## 4                   706                       786         42
## 5                   719                       815         51
## 6                   746                       833         60
##   unemployment_rate_in_percent
## 1                         5.09
## 2                         3.50
## 3                         5.17
## 4                         5.07
## 5                         5.89
## 6                         6.72&lt;/code&gt;&lt;/pre&gt;
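&lt;p&gt;If the &lt;code&gt;gather()&lt;/code&gt;/&lt;code&gt;spread()&lt;/code&gt; step is hard to picture, here is a minimal sketch on a made-up table that mimics the layout of the raw data (one row per division and variable, one column per year, decimal commas). The values and the names &lt;code&gt;toy&lt;/code&gt; and &lt;code&gt;toy_tidy&lt;/code&gt; are invented for illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(tidyr)

# Toy table (assumption): two divisions, two variables, two years
toy = data.frame(
  division = c(&amp;quot;A&amp;quot;, &amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;B&amp;quot;),
  variable = c(&amp;quot;Unemployed&amp;quot;, &amp;quot;Active population&amp;quot;,
               &amp;quot;Unemployed&amp;quot;, &amp;quot;Active population&amp;quot;),
  `2015` = c(&amp;quot;10,5&amp;quot;, &amp;quot;200&amp;quot;, &amp;quot;7&amp;quot;, &amp;quot;150&amp;quot;),
  `2016` = c(&amp;quot;12&amp;quot;, &amp;quot;210&amp;quot;, &amp;quot;8,5&amp;quot;, &amp;quot;155&amp;quot;),
  check.names = FALSE, stringsAsFactors = FALSE
)

toy_tidy = toy %&amp;gt;%
  # decimal comma to decimal point, then to numeric, for the year columns only
  mutate_at(vars(matches(&amp;quot;\\d{4}&amp;quot;)), ~ as.numeric(gsub(&amp;quot;,&amp;quot;, &amp;quot;.&amp;quot;, .))) %&amp;gt;%
  # long format: one row per division, variable and year
  gather(key = year, value, -division, -variable) %&amp;gt;%
  # one column per variable: one row per division and year
  spread(variable, value)

dim(toy_tidy)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 4 4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result has one row per division and year, with &lt;code&gt;Active population&lt;/code&gt; and &lt;code&gt;Unemployed&lt;/code&gt; as separate numeric columns.&lt;/p&gt;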
&lt;p&gt;Once you have the unemployment data, install the next packages you’ll need to follow the rest of the post:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(
  c(
    &amp;quot;viridis&amp;quot;,  # Optional, but better color scheme than the default
    &amp;quot;broom&amp;quot;,    # For tidy()
    &amp;quot;ggplot2&amp;quot;,  # To create a basic map
    &amp;quot;ggthemes&amp;quot; # To change the theme of the map
    )
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then install two further packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;#39;rgeos&amp;#39;, type=&amp;#39;source&amp;#39;) # Dependency of rgdal
install.packages(&amp;#39;rgdal&amp;#39;, type=&amp;#39;source&amp;#39;) # To read in the shapefile&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;rgdal&lt;/code&gt; might be tricky to install on macOS and Linux. If you’re using Ubuntu, you have to install &lt;code&gt;libgdal-dev&lt;/code&gt;, and on macOS you’ll need to install &lt;code&gt;gdal&lt;/code&gt; using Homebrew.&lt;/p&gt;
&lt;p&gt;There’s a final package to install, but you have to get it from GitHub (and thus need &lt;code&gt;devtools&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;devtools::install_github(&amp;quot;dgrtwo/gganimate&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To draw a map, you will need a so-called shapefile. These files contain the geometry of the countries, regions, etc so that it is possible to plot them. The shapefile for Luxembourg can be obtained from &lt;a href=&#34;https://data.public.lu/en/datasets/limites-administratives-du-grand-duche-de-luxembourg/&#34;&gt;Luxembourg’s Open data Portal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Download the zip, and look for the file called &lt;code&gt;LIMADM_COMMUNES.shp&lt;/code&gt;, which contains the geometry of the Luxembourgish communes. Leave it inside the folder &lt;code&gt;Limadmin_SHP&lt;/code&gt;, as it contains other files needed by &lt;code&gt;rgdal::readOGR()&lt;/code&gt; to read in the shapefile.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(broom)
library(dplyr)
library(purrr)
library(ggplot2)
library(viridis)
library(rgdal)
library(ggthemes)
library(gganimate)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can read the data, and do some basic cleaning. I comment every step, but run the code line by line to really understand what’s going on!&lt;/p&gt;
&lt;p&gt;Read the shapefile:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;communes = readOGR(&amp;quot;Limadmin_SHP/LIMADM_COMMUNES.shp&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## OGR data source with driver: ESRI Shapefile
## Source: &amp;quot;Limadmin_SHP/LIMADM_COMMUNES.shp&amp;quot;, layer: &amp;quot;LIMADM_COMMUNES&amp;quot;
## with 105 features
## It has 4 fields&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in readOGR(&amp;quot;Limadmin_SHP/LIMADM_COMMUNES.shp&amp;quot;): Z-dimension
## discarded&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;“Convert” it to a data frame using &lt;code&gt;broom::tidy()&lt;/code&gt;. In the past, this was done with &lt;code&gt;ggplot2::fortify()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;communes_df = broom::tidy(communes, region = &amp;quot;COMMUNE&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Remove the &lt;em&gt;cantons&lt;/em&gt; from the data, as well as the unemployment rate for the whole country. Then only select the relevant columns and rename them in one go:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;unemp_lux = unemp_lux %&amp;gt;%
  filter(!grepl(&amp;quot;Canton&amp;quot;, division), division != &amp;quot;Grand Duchy of Luxembourg&amp;quot;) %&amp;gt;%
  select(commune = division, year, unemp_rate = unemployment_rate_in_percent)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The names of two communes are written differently in the shapefile than in the data. We can change that using &lt;code&gt;gsub()&lt;/code&gt;. Change “Haute-Sûre” to “Haute Sûre” in the unemployment data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;unemp_lux$commune = gsub(&amp;quot;Haute-Sûre&amp;quot;, &amp;quot;Haute Sûre&amp;quot;, unemp_lux$commune)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Change “Redange” to “Redange-sur-Attert” in the data frame containing the geometry of the communes:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;communes_df$id = gsub(&amp;quot;Redange&amp;quot;, &amp;quot;Redange-sur-Attert&amp;quot;, communes_df$id)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Select relevant columns from the communes data frame, and rename them:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;communes_df = communes_df %&amp;gt;%
    select(long, lat, commune = id) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, join the communes data frame (containing the geometry) with the unemployment data, by communes:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;final_data = left_join(communes_df, unemp_lux, by = &amp;quot;commune&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
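&lt;p&gt;One way to spot names that are spelled differently in the two sources is &lt;code&gt;dplyr::anti_join()&lt;/code&gt;, which returns the rows of the first table that have no match in the second. Below is a minimal sketch on made-up data; &lt;code&gt;geo_toy&lt;/code&gt; and &lt;code&gt;unemp_toy&lt;/code&gt; are hypothetical stand-ins for the geometry and unemployment data frames, and the rates are invented:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

# Toy data (assumption): one commune is spelled differently in each source
geo_toy = data.frame(commune = c(&amp;quot;Beaufort&amp;quot;, &amp;quot;Haute Sûre&amp;quot;),
                     stringsAsFactors = FALSE)
unemp_toy = data.frame(commune = c(&amp;quot;Beaufort&amp;quot;, &amp;quot;Haute-Sûre&amp;quot;),
                       unemp_rate = c(6.7, 4.2),
                       stringsAsFactors = FALSE)

# Communes of the geometry table without a match in the unemployment table
unmatched = anti_join(geo_toy, unemp_toy, by = &amp;quot;commune&amp;quot;)
unmatched$commune&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;Haute Sûre&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After a &lt;code&gt;left_join()&lt;/code&gt;, such unmatched communes would end up with &lt;code&gt;NA&lt;/code&gt; as their unemployment rate, so an empty result here is a good sign.&lt;/p&gt;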
&lt;p&gt;Let’s plot the unemployment rate for the latest available year:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;final_data2016 = final_data %&amp;gt;%
  filter(year == 2016)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, let’s plot a basic map using &lt;code&gt;ggplot2&lt;/code&gt;. Even if you’re not familiar with &lt;code&gt;ggplot2&lt;/code&gt; the code below should be very straightforward:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ggplot_map = ggplot() +
  geom_polygon(data = final_data2016,
               aes(x = long, y = lat, group = commune, fill = unemp_rate)) +
    labs(title = &amp;quot;Unemployment rate in Luxembourg in 2016&amp;quot;,
         y = &amp;quot;&amp;quot;, x = &amp;quot;&amp;quot;, fill = &amp;quot;Unemployment rate&amp;quot;) +
    theme_tufte() +
    theme(axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank()) +
    scale_fill_viridis()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, print the map:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;print(ggplot_map)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-01-03-mapping-unemployment-luxembourg_files/figure-html/unnamed-chunk-18-1.jpg&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;It is also possible to create a map per year using &lt;code&gt;facet_wrap()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;facet_map = ggplot() +
  geom_polygon(data = final_data,
               aes(x = long, y = lat,
                   group = commune, fill = unemp_rate)) +
    labs(title = &amp;quot;Unemployment rate in Luxembourg&amp;quot;, y = &amp;quot;&amp;quot;, x = &amp;quot;&amp;quot;, fill = &amp;quot;Unemployment rate&amp;quot;) +
    theme_tufte() +
    theme(axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank()) +
    facet_wrap(~year) +
    scale_fill_viridis()

print(facet_map)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2018-01-03-mapping-unemployment-luxembourg_files/figure-html/unnamed-chunk-19-1.jpg&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We clearly see that unemployment has risen in Luxembourg over the past 15 years. This series of maps is great for printing, but since you’re reading this on a screen, why not try to animate the maps? This is possible with &lt;code&gt;gganimate()&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(gganimate)

map_anim = ggplot() +
    geom_polygon(data = final_data,
                 aes(x = long, y = lat, group = commune, fill = unemp_rate, frame = year)) +
    labs(title = &amp;quot;Unemployment rate in Luxembourg&amp;quot;, y = &amp;quot;&amp;quot;, x = &amp;quot;&amp;quot;, fill = &amp;quot;Unemployment rate&amp;quot;) +
    theme_tufte() +
    theme(axis.text.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank()) +
    scale_fill_viridis()


gganimate(map_anim, &amp;quot;map_lux.mp4&amp;quot;, interval = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can create an &lt;code&gt;.mp4&lt;/code&gt; video as well as a &lt;code&gt;.gif&lt;/code&gt;. Just change the extension inside the &lt;code&gt;gganimate()&lt;/code&gt; function. &lt;/p&gt;
  &lt;p style=&#34;text-align:center;&#34;&gt;&lt;img style=&#34;width: 20rem;&#34; src=&#34;/images/map_lux.gif&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That’s it for now. You can also check out our &lt;a href=&#34;http://blog.rdata.lu/visualization/unemployment/&#34; target=&#34;_blank&#34;&gt;interactive map of unemployment&lt;/a&gt; in our visualization section.
  &lt;!-- In the next post, I will show you how to create interactive maps using R and javascript! --&gt;
&lt;/p&gt;
&lt;p&gt;Don&#39;t hesitate to follow us on Twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;@rdata_lu&lt;/a&gt;
  &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt;
  and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our YouTube channel. &lt;br&gt;
  You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Visualizing box office revenue by genre</title>
      <link>/post/2017-12-04-visualizing-box-office-revenue-by-genre/</link>
      <pubDate>Mon, 04 Dec 2017 06:34:55 +0200</pubDate>
      
      <guid>/post/2017-12-04-visualizing-box-office-revenue-by-genre/</guid>
      <description>&lt;p&gt;After watching Justice League at the cinema, I was impressed by the quality of the special effects. I started wondering: how much does a movie like that cost? And, most importantly, how big is the box-office revenue for this kind of blockbuster? I found an answer on &lt;a href=&#34;http://www.the-numbers.com/movie/budgets/all&#34;&gt;The Numbers&lt;/a&gt;. I then decided to build a database from the data available on this website, and retrieved the 500 biggest movie budgets. Initially I just had a database with 5 variables on movies:&lt;br&gt; • the release date&lt;br&gt; • the name &lt;br&gt; • the production budget &lt;br&gt; • the domestic gross &lt;br&gt; • the worldwide gross &lt;br&gt; Thereafter, I combined sources to get more variables: data was scraped from Wikipedia and IMDb. We finally get a dataset with 30 variables such as lists of actors, poster URLs, distributions, the rating and the number of raters from IMDb, etc…&lt;br&gt; You can find a complete description of the dataset on &lt;a href=&#34;https://github.com/krosamont/Cinema&#34;&gt;GitHub&lt;/a&gt;. All the data was scraped with the package &lt;code&gt;rvest&lt;/code&gt;.&lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;In this post, I describe the different steps leading to the treemap: &lt;br&gt;&lt;/p&gt;
&lt;div id=&#34;tmp1&#34; class=&#34;tmap&#34;&gt;

&lt;/div&gt;
&lt;div id=&#34;starting-point&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;STARTING POINT&lt;/h1&gt;
&lt;p&gt;First of all we read the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;db = read.csv(&amp;quot;https://cdn.rawgit.com/krosamont/Cinema/dd7eca65/moviedb500.csv&amp;quot;,
              stringsAsFactors = FALSE)
# You can execute the following line to get more information about the variable types.
#str(db) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we want to convert the money-related variables to numeric variables and the movie release dates to a date variable, using the &lt;code&gt;tidyverse&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyverse)

db = db %&amp;gt;%
        mutate( Release.Date = as.Date(Release.Date, &amp;quot;%m/%d/%Y&amp;quot;), 
                Running.time = as.numeric(stringr::str_sub(Running.time,1,3)),
                Rate = as.numeric(Rate),
                Raters = as.numeric(gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;, Raters)),
                Production.Budget = as.numeric(gsub(&amp;quot;[,$]&amp;quot;, &amp;quot;&amp;quot;,
                                                 Production.Budget)),
                Domestic.Gross = as.numeric(gsub(&amp;quot;[,$]&amp;quot;, &amp;quot;&amp;quot;,
                                                 Domestic.Gross)),
                Worldwide.Gross = as.numeric(gsub(&amp;quot;[,$]&amp;quot;, &amp;quot;&amp;quot;,
                                                 Worldwide.Gross)) ) %&amp;gt;%
        arrange(desc(Worldwide.Gross))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The dataset looks better. As you saw at the top of this post, we want to design a treemap chart to visualize box-office revenue by genre. Let’s see how many movie genres are present in the data frame:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;UniqueGenres = unique(db$Genres)
length(UniqueGenres)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 224&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(UniqueGenres, 5)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;Action Adventure Fantasy Sci-Fi&amp;quot;                  
## [2] &amp;quot;Action Adventure Sci-Fi&amp;quot;                          
## [3] &amp;quot;Action Crime Thriller&amp;quot;                            
## [4] &amp;quot;Adventure Drama Fantasy Mystery&amp;quot;                  
## [5] &amp;quot;Animation Adventure Comedy Family Fantasy Musical&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 224 combinations of genres, which is far too many. We need to reduce them so that each movie has at most 2 genres: a main genre and a subgenre.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;main-genres&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;MAIN GENRES&lt;/h1&gt;
&lt;p&gt;Let’s start with a simple barplot to visualize the most-represented genre from the 224 combinations.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggthemes)

all_genres = separate_rows(db %&amp;gt;% 
                           group_by(Genres) %&amp;gt;% 
                           select(Genres) %&amp;gt;% 
                           filter(row_number() ==1),
                           Genres, sep=&amp;quot;[[:space:]]&amp;quot;)

name_order = names(sort(table(all_genres)))

ggplot(all_genres, aes(Genres)) +
                theme_minimal( ) + 
        geom_bar( stat = &amp;quot;count&amp;quot;, fill=&amp;quot;#007acc&amp;quot; ) +
        coord_flip() +
        scale_x_discrete(limits = name_order)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-12-04-visualizing-box-office-revenue-by-genre_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We see that Adventure and Action are the most important genres, followed by those between Comedy and Sci-Fi. The genres that come after Sci-Fi are present in fewer than 60 combinations of genres. Hence we will consider them as subgenres. We have 8 main genres: &lt;br&gt; • Adventure &lt;br&gt; • Action &lt;br&gt; • Comedy &lt;br&gt; • Drama &lt;br&gt; • Family &lt;br&gt; • &lt;del&gt;Fantasy&lt;/del&gt; &lt;br&gt; • Thriller &lt;br&gt; • &lt;del&gt;Sci-Fi&lt;/del&gt; &lt;br&gt; But we also know that Sci-Fi and Fantasy can be seen as subgenres of Adventure or Action. Therefore, we finally keep 6 genres. &lt;br&gt; We have to check that every movie can be given a main genre from the 6 genres that we have chosen. For that, we simply check that each combination contains at least one of the main genres:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mainGenres= paste(c(&amp;quot;Adventure&amp;quot;, &amp;quot;Action&amp;quot;,  &amp;quot;Comedy&amp;quot;, 
                    &amp;quot;Drama&amp;quot;, &amp;quot;Family&amp;quot;, &amp;quot;Thriller&amp;quot;),
                  collapse=&amp;quot;|&amp;quot;)

# grepl() returns TRUE for each genre combination that contains at least one main genre;
# the number of TRUEs divided by the number of rows is the share of movies covered
sum(grepl(mainGenres, db$Genres))/length(db$Genres)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Apparently, this is the case :)&lt;/p&gt;
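&lt;p&gt;For this kind of coverage check, the share of matching combinations is simply the mean of the logical vector returned by &lt;code&gt;grepl()&lt;/code&gt;. Here is a minimal sketch on made-up genre strings (&lt;code&gt;toy_genres&lt;/code&gt;, &lt;code&gt;toy_pattern&lt;/code&gt; and &lt;code&gt;covered&lt;/code&gt; are hypothetical names), which also lists the combinations that are not covered:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy data (assumption): four genre combinations, one without any main genre
toy_genres  = c(&amp;quot;Action Sci-Fi&amp;quot;, &amp;quot;Horror&amp;quot;, &amp;quot;Comedy Romance&amp;quot;, &amp;quot;Drama War&amp;quot;)
toy_pattern = paste(c(&amp;quot;Action&amp;quot;, &amp;quot;Comedy&amp;quot;, &amp;quot;Drama&amp;quot;), collapse = &amp;quot;|&amp;quot;)

covered = grepl(toy_pattern, toy_genres)

mean(covered)          # share of combinations with at least one main genre
toy_genres[!covered]   # combinations that would be left without a main genre&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.75
## [1] &amp;quot;Horror&amp;quot;&lt;/code&gt;&lt;/pre&gt;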
&lt;div id=&#34;first-reduction&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;FIRST REDUCTION&lt;/h3&gt;
&lt;p&gt;We finally add a main genre to all movies.&lt;br&gt; &lt;strong&gt;Be careful: the main genre of each movie will depend on the order in which you assign the main genres. So the final shape of the output will depend on this step.&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#1
db$Genresl1=ifelse(grepl(&amp;quot;Family&amp;quot;,db$Genres),
                   &amp;quot;Family&amp;quot;, db$Genres)

#2
db$Genresl1=ifelse(grepl(&amp;quot;Drama&amp;quot;, db$Genresl1),
                   &amp;quot;Drama&amp;quot;, db$Genresl1)

#3
db$Genresl1=ifelse( grepl(&amp;quot;Thriller&amp;quot;, db$Genresl1),
                    &amp;quot;Thriller&amp;quot;, db$Genresl1)

#4
db$Genresl1=ifelse(grepl(&amp;quot;Action&amp;quot;, db$Genresl1),
                   &amp;quot;Action&amp;quot;, db$Genresl1)

#5
db$Genresl1 =ifelse(grepl(&amp;quot;Adventure&amp;quot;, db$Genresl1),
                    &amp;quot;Adventure&amp;quot;, db$Genresl1)

#6
db$Genresl1=ifelse(grepl(&amp;quot;Comedy&amp;quot;, db$Genresl1),
                   &amp;quot;Comedy&amp;quot;, db$Genresl1)&lt;/code&gt;&lt;/pre&gt;
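&lt;p&gt;To see why the order matters, here is a minimal sketch that wraps the same chain of &lt;code&gt;ifelse(grepl(...))&lt;/code&gt; calls in a small helper. The function &lt;code&gt;assign_main()&lt;/code&gt; is hypothetical and not part of the original script. Once a value has been overwritten with a single genre, the later patterns can no longer match it, so the first matching genre in the sequence wins:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: apply the replacements one after the other, in the given order
assign_main = function(genres, genre_order) {
  out = genres
  for (g in genre_order) {
    out = ifelse(grepl(g, out), g, out)
  }
  out
}

x = &amp;quot;Action Comedy Family&amp;quot;

assign_main(x, c(&amp;quot;Family&amp;quot;, &amp;quot;Action&amp;quot;, &amp;quot;Comedy&amp;quot;))
assign_main(x, c(&amp;quot;Comedy&amp;quot;, &amp;quot;Action&amp;quot;, &amp;quot;Family&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;Family&amp;quot;
## [1] &amp;quot;Comedy&amp;quot;&lt;/code&gt;&lt;/pre&gt;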
&lt;p&gt;Now that the main genres have been assigned, let’s focus on the subgenres.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;subgenres&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;SUBGENRES&lt;/h1&gt;
&lt;p&gt;We have seen that only 6 genres could be considered as main genres. However, in this part we will consider that all genres can be considered as subgenres. Now one of the difficulties is to decide which subgenre to select when there is more than one option. Association rules can help us in this task. We can see which subgenres are the most present for each genre and their level of dependency.&lt;/p&gt;
&lt;div id=&#34;association-rules&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;ASSOCIATION RULES&lt;/h2&gt;
&lt;p&gt;Let’s analyze the different genre combinations through an association rule analysis. We first need to read the data as transactions. For that we use the package &lt;code&gt;arules&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(arules)
#no duplicate combinations!
item_genres = read.transactions(&amp;quot;https://cdn.rawgit.com/krosamont/Cinema/dd7eca65/itemGenres.csv&amp;quot;,
                                format = &amp;quot;basket&amp;quot;, sep=&amp;quot;:&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this post, we will focus on 2 association rule indicators: &lt;strong&gt;the support&lt;/strong&gt; and &lt;strong&gt;the confidence&lt;/strong&gt;. &lt;br&gt; Support and confidence are displayed as in the result below when the rules are printed with &lt;code&gt;arules::inspect()&lt;/code&gt;. &lt;br&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;##     lhs              rhs      support     confidence lift     count
## [1] {Documentary} =&amp;gt; {Drama}  0.004444444 1.0000000  2.777778  1   
## [2] {War}         =&amp;gt; {Drama}  0.057777778 0.9285714  2.579365 13   
## [3] {History}     =&amp;gt; {Drama}  0.080000000 0.9473684  2.631579 18   
## [4] {Animation}   =&amp;gt; {Family} 0.208888889 0.9591837  2.731852 47&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;• &lt;strong&gt;Support&lt;/strong&gt; indicates how frequently the genres in columns lhs and rhs appear together in the 224 combinations. The second row of the result above means that War and Drama appear together in 5.78% of combinations.&lt;/p&gt;
&lt;p&gt;• &lt;strong&gt;Confidence&lt;/strong&gt; is an indication of how often the rule has been found to be true. It can also be seen as a conditional probability: { X =&amp;gt; Y } means P(Y | X), the probability that genre Y is also present when we already know that genre X is present. { War =&amp;gt; Drama } = 0.929 from the second line of the result above means that Drama is present in 92.9% of the combinations where War is present.&lt;br&gt; &lt;strong&gt;But be careful, this relation is not necessarily true in the opposite direction!&lt;/strong&gt;&lt;/p&gt;
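&lt;p&gt;Both indicators can be computed by hand. The following minimal sketch uses four made-up genre combinations (&lt;code&gt;baskets&lt;/code&gt; and &lt;code&gt;has()&lt;/code&gt; are hypothetical names, and the numbers are unrelated to the movie data) and shows that the confidence of a rule changes when its direction is reversed:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy transactions (assumption): four genre combinations
baskets = list(c(&amp;quot;War&amp;quot;, &amp;quot;Drama&amp;quot;),
               c(&amp;quot;War&amp;quot;, &amp;quot;Drama&amp;quot;, &amp;quot;Action&amp;quot;),
               c(&amp;quot;Drama&amp;quot;),
               c(&amp;quot;Action&amp;quot;, &amp;quot;Adventure&amp;quot;))

# Hypothetical helper: TRUE for each combination that contains the genre
has = function(item) vapply(baskets, function(b) item %in% b, logical(1))

# Support: share of combinations containing both genres
support_war_drama = mean(has(&amp;quot;War&amp;quot;) &amp;amp; has(&amp;quot;Drama&amp;quot;))           # 2 / 4 = 0.5

# Confidence: support of the pair divided by the support of the left-hand side
confidence_war_drama = support_war_drama / mean(has(&amp;quot;War&amp;quot;))    # 0.5 / 0.5  = 1
confidence_drama_war = support_war_drama / mean(has(&amp;quot;Drama&amp;quot;))  # 0.5 / 0.75 = 0.67 (rounded)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here every War combination is also a Drama, but only two Dramas out of three are War movies.&lt;/p&gt;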
&lt;p&gt;To see all association rules starting from a confidence level of 30% between 2 genres we write: &lt;br&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rules = apriori(item_genres, 
                parameter=list(support=(1/nrow(item_genres)), 
                confidence=0.3, minlen=2, maxlen=2)  )
ins_rules = inspect(rules) 

ins_rules&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we want to focus on the relationship between subgenres and main genres, we can filter on the rhs column.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mainGenres = unlist(strsplit(mainGenres, &amp;quot;|&amp;quot;, fixed = TRUE))
ins_rules = ins_rules %&amp;gt;% 
        #removing the arrow =&amp;gt;
        .[,-2] %&amp;gt;%
        #removing the brackets for both columns, lhs and rhs
        mutate(lhs = trimws(gsub(&amp;quot;\\{|\\}&amp;quot;,&amp;quot;&amp;quot;,lhs)),
               rhs = trimws(gsub(&amp;quot;\\{|\\}&amp;quot;,&amp;quot;&amp;quot;,rhs))) %&amp;gt;%
        filter(rhs %in% mainGenres) %&amp;gt;%
        group_by(lhs) %&amp;gt;%
        filter(row_number() == 3) %&amp;gt;%
        arrange(lhs, desc(confidence))

ins_rules&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 17 x 6
## # Groups:   lhs [17]
##    lhs       rhs       support confidence  lift count
##    &amp;lt;chr&amp;gt;     &amp;lt;chr&amp;gt;       &amp;lt;dbl&amp;gt;      &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
##  1 Adventure Action     0.293       0.574 1.17  66.0 
##  2 Animation Adventure  0.169       0.776 1.52  38.0 
##  3 Biography Adventure  0.0133      0.333 0.652  3.00
##  4 Comedy    Adventure  0.187       0.512 1.00  42.0 
##  5 Crime     Comedy     0.0489      0.355 0.974 11.0 
##  6 Drama     Adventure  0.133       0.370 0.725 30.0 
##  7 Family    Adventure  0.249       0.709 1.39  56.0 
##  8 Fantasy   Action     0.129       0.387 0.791 29.0 
##  9 History   Adventure  0.0356      0.421 0.824  8.00
## 10 Musical   Family     0.0622      0.875 2.49  14.0 
## 11 Mystery   Adventure  0.0667      0.500 0.978 15.0 
## 12 Romance   Family     0.0533      0.300 0.854 12.0 
## 13 Sci-Fi    Family     0.0889      0.312 0.890 20.0 
## 14 Sport     Family     0.0222      0.500 1.42   5.00
## 15 Thriller  Adventure  0.138       0.431 0.842 31.0 
## 16 War       Adventure  0.0222      0.357 0.699  5.00
## 17 Western   Adventure  0.0222      0.625 1.22   5.00&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;barplot&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;BARPLOT&lt;/h3&gt;
&lt;p&gt;We create a new variable named &lt;code&gt;withoutMainGenres&lt;/code&gt;. This variable is the combination of genres without the main genre. If a movie has the combination “Drama War Action Biography” and its main genre is “Drama”, then the value of &lt;code&gt;withoutMainGenres&lt;/code&gt; will be “War Action Biography”. If it’s not clear enough, I suggest that you run the code and compare the variables &lt;code&gt;withoutMainGenres&lt;/code&gt; and &lt;code&gt;Genres&lt;/code&gt;. Once this new variable is made, we draw a barplot again to see the distribution of genres.&lt;/p&gt;
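&lt;p&gt;Before running it on the whole dataset, the removal step can be tried on two made-up rows. This is only a sketch: &lt;code&gt;main_toy&lt;/code&gt; and &lt;code&gt;genres_toy&lt;/code&gt; are hypothetical vectors, and &lt;code&gt;unname()&lt;/code&gt; is added just to keep the printed output short. &lt;code&gt;mapply()&lt;/code&gt; calls &lt;code&gt;gsub()&lt;/code&gt; once per row, with the main genre of that row as the pattern:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy data (assumption): the main genre and the full combination of two movies
main_toy   = c(&amp;quot;Drama&amp;quot;, &amp;quot;Action&amp;quot;)
genres_toy = c(&amp;quot;Drama War Action Biography&amp;quot;, &amp;quot;Action Adventure Sci-Fi&amp;quot;)

# Row by row: gsub(pattern = main genre, replacement = &amp;quot;&amp;quot;, x = combination)
unname(trimws(mapply(gsub, main_toy, &amp;quot;&amp;quot;, genres_toy)))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;War Action Biography&amp;quot; &amp;quot;Adventure Sci-Fi&amp;quot;&lt;/code&gt;&lt;/pre&gt;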
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;db$withoutMainGenres = trimws(mapply(gsub, db$Genresl1, &amp;quot;&amp;quot;, db$Genres))

all_genres = separate_rows(db %&amp;gt;% 
                           group_by(withoutMainGenres) %&amp;gt;% 
                           select(withoutMainGenres) %&amp;gt;% 
                           filter(row_number() ==1),
                           withoutMainGenres, 
                           sep=&amp;quot;[[:space:]]&amp;quot;) %&amp;gt;% 
             rename( Genres=withoutMainGenres) %&amp;gt;%
             filter(nchar(Genres)&amp;gt;0)

name_order = names(sort(table(all_genres)))

ggplot(all_genres, aes(Genres)) +
                theme_minimal( ) + 
        geom_bar( stat = &amp;quot;count&amp;quot;, fill=&amp;quot;#007acc&amp;quot; ) +
        coord_flip() +
        scale_x_discrete(limits = name_order)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-12-04-visualizing-box-office-revenue-by-genre_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We see that there are still a lot of adventure movies. We use the results of the association rules and the barplot to build the subgenres.&lt;br&gt; We begin with the genre Animation because we want to group all of these movies in the same category. Then we add subgenres in ascending order, from the least important to the most important.&lt;br&gt; However, movies from the musical, music and horror genres are added at the end of the script because the attribution of these genres to the movies in our dataset is questionable.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;db$Genresl2=ifelse(grepl(&amp;quot;Animation&amp;quot;,db$withoutMainGenres), 
                   &amp;quot;Animation&amp;quot;, db$withoutMainGenres)
db$Genresl2=ifelse(grepl(&amp;quot;Documentary&amp;quot;,db$Genresl2),
                   &amp;quot;Documentary&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Biography&amp;quot;, db$Genresl2), 
                   &amp;quot;Biography&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Western&amp;quot;,db$Genresl2), 
                   &amp;quot;Western&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Sport&amp;quot;,db$Genresl2), 
                   &amp;quot;Sport&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;War&amp;quot;,db$Genresl2), 
                   &amp;quot;War&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Mystery&amp;quot;,db$Genresl2), 
                   &amp;quot;Mystery&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Romance&amp;quot;,db$Genresl2), 
                   &amp;quot;Romance&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Crime&amp;quot;,db$Genresl2), 
                   &amp;quot;Crime&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Drama&amp;quot;,db$Genresl2), 
                   &amp;quot;Drama&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Fantasy&amp;quot;,db$Genresl2), 
                   &amp;quot;Fantasy&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Sci-Fi&amp;quot;,db$Genresl2), 
                   &amp;quot;Sci-Fi&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Comedy&amp;quot;,db$Genresl2), 
                   &amp;quot;Comedy&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Thriller&amp;quot;,db$Genresl2), 
                   &amp;quot;Thriller&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Adventure&amp;quot;,db$Genresl2), 
                   &amp;quot;Adventure&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Musical&amp;quot;,db$Genresl2), 
                   &amp;quot;Musical&amp;quot;, db$Genresl2)
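#Word boundaries keep &amp;quot;Musical&amp;quot; from being relabelled as &amp;quot;Music&amp;quot;.
db$Genresl2=ifelse(grepl(&amp;quot;\\bMusic\\b&amp;quot;,db$Genresl2, perl=TRUE), 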
                   &amp;quot;Music&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(grepl(&amp;quot;Horror&amp;quot;,db$Genresl2), 
                   &amp;quot;Horror&amp;quot;, db$Genresl2)
db$Genresl2=ifelse(db$Genresl2==&amp;quot;&amp;quot;,
                   db$Genresl1, db$Genresl2)&lt;/code&gt;&lt;/pre&gt;
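&lt;p&gt;The cascade above can also be written as a loop over an ordered vector of genres. The sketch below is only an illustration on a made-up vector: &lt;code&gt;toy&lt;/code&gt;, &lt;code&gt;genre_order&lt;/code&gt; and &lt;code&gt;first_genre()&lt;/code&gt; are hypothetical names that are not part of the original script, and the shortened genre list is an assumption.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Hypothetical helper: keep the first genre of genre_order found in each string.
first_genre = function(x, genre_order){
        out = x
        done = rep(FALSE, length(x))
        for(g in genre_order){
                #\\b marks word boundaries, so &amp;quot;Music&amp;quot; does not match &amp;quot;Musical&amp;quot;
                hit = !done &amp;amp; grepl(paste0(&amp;quot;\\b&amp;quot;, g, &amp;quot;\\b&amp;quot;), x, perl = TRUE)
                out[hit] = g
                done = done | hit
        }
        out
}

#Made-up examples, not taken from the dataset.
toy = c(&amp;quot;Animation, Comedy&amp;quot;, &amp;quot;Drama, Thriller&amp;quot;, &amp;quot;Musical, Romance&amp;quot;, &amp;quot;Music&amp;quot;, &amp;quot;&amp;quot;)
genre_order = c(&amp;quot;Animation&amp;quot;, &amp;quot;Romance&amp;quot;, &amp;quot;Drama&amp;quot;, &amp;quot;Thriller&amp;quot;, &amp;quot;Musical&amp;quot;, &amp;quot;Music&amp;quot;)
first_genre(toy, genre_order)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Strings that match no genre are left unchanged, so the last step of the original script, which falls back to &lt;code&gt;Genresl1&lt;/code&gt;, would still be needed.&lt;/p&gt;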
&lt;p&gt;Now that we have our 2 levels of genres, we can build our treemap!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;treemap-with-treemapify&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;TREEMAP WITH TREEMAPIFY&lt;/h1&gt;
&lt;p&gt;To design the treemap, we need to group the movies by main genre and subgenre, then sum their Worldwide Gross revenue.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;summary.Genre = db %&amp;gt;%
        group_by(Genresl1, Genresl2) %&amp;gt;%
        summarise(Sum_Gross = sum(Worldwide.Gross))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally we design the treemap using &lt;code&gt;ggplot2&lt;/code&gt; and &lt;code&gt;treemapify&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(treemapify)

ggplot(summary.Genre, aes(area = Sum_Gross ,
                          fill = Genresl1, label = Genresl2,
                          subgroup =Genresl1)) +
        geom_treemap() +
        geom_treemap_subgroup_border() +
        geom_treemap_subgroup_text(place = &amp;quot;centre&amp;quot;, 
                                   grow = T, 
                                   alpha = 0.5, 
                                   colour = &amp;quot;black&amp;quot;, 
                                   fontface = &amp;quot;italic&amp;quot;, 
                                   min.size = 0) +
        geom_treemap_text(colour = &amp;quot;white&amp;quot;, 
                          place = &amp;quot;topleft&amp;quot;, 
                          reflow = T)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-12-04-visualizing-box-office-revenue-by-genre_files/figure-html/unnamed-chunk-18-1.png&#34; width=&#34;672&#34; /&gt; &lt;br&gt;&lt;/p&gt;
&lt;p&gt;Here we have a first result but we can do better by adding some interactivity.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;treemap-with-highcharter&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;TREEMAP WITH HIGHCHARTER&lt;/h1&gt;
&lt;p&gt;Let’s add some interactivity using the package &lt;code&gt;highcharter&lt;/code&gt;. We use the GitHub version, which has more functions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;devtools::install_github(&amp;quot;jbkunst/highcharter&amp;quot;)

library(highcharter)
hctreemap2(data = db,
           group_vars = c(&amp;quot;Genresl1&amp;quot;, &amp;quot;Genresl2&amp;quot;),
           size_var = &amp;quot;Worldwide.Gross&amp;quot;,
           color_var = &amp;quot;Genresl2&amp;quot;,
           layoutAlgorithm = &amp;quot;squarified&amp;quot;,
           levelIsConstant = FALSE,
           levels = list(
                   list(level = 1, 
                        dataLabels = list(enabled = TRUE)),
                   list(level = 2, 
                        dataLabels = list(enabled = FALSE))
           )) %&amp;gt;% 
        hc_tooltip(pointFormat = &amp;quot;&amp;lt;b&amp;gt;{point.name}&amp;lt;/b&amp;gt;:&amp;lt;br&amp;gt;
                   Worldwide Gross: $ {point.value:,.0f}&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following error message appears:&lt;br&gt; &lt;font color=&#34;red&#34;&gt;&lt;strong&gt;Error in hctreemap2(data = db, group_vars = c(“Genresl1”, “Genresl2”) :&lt;br&gt; Treemap data uses same label at multiple levels.&lt;/strong&gt; &lt;/font&gt;&lt;br&gt;&lt;/p&gt;
&lt;p&gt;We can’t design a two-level treemap with &lt;code&gt;highcharter&lt;/code&gt; here because the main genres and the subgenres share some labels. R is a great tool for data manipulation, but in this case JavaScript is a better tool for the visualization. &lt;br&gt;&lt;/p&gt;
&lt;p&gt;We can easily design a two-level responsive treemap with the &lt;a href=&#34;https://www.highcharts.com/&#34;&gt;Highcharts&lt;/a&gt; library in JavaScript.&lt;/p&gt;
&lt;div id=&#34;tmp2&#34; class=&#34;tmap&#34;&gt;

&lt;/div&gt;
&lt;script
  src=&#34;https://code.jquery.com/jquery-3.2.1.min.js&#34;
  integrity=&#34;sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4=&#34;
  crossorigin=&#34;anonymous&#34;&gt;&lt;/script&gt;
&lt;script
  src=&#34;https://code.jquery.com/ui/1.12.1/jquery-ui.min.js&#34;
  integrity=&#34;sha256-VazP97ZCwtekAsvgPBSUwPFKdrwD3unUfSGVYrahUqU=&#34;
  crossorigin=&#34;anonymous&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://code.highcharts.com/highcharts.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://code.highcharts.com/modules/treemap.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://cdn.rawgit.com/krosamont/Cinema/dd7eca65/treemap/js/cinemaTreemap.js&#34;&gt;&lt;/script&gt;
&lt;p&gt;&lt;link rel=&#34;stylesheet&#34; href=&#34;https://cdn.rawgit.com/krosamont/Cinema/dd7eca65/treemap/css/styleSheet.css&#34;&gt;&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our youtube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Scraping data from the local elections</title>
      <link>/post/2017-10-27-scraping-data-from-the-local-elections/</link>
      <pubDate>Fri, 27 Oct 2017 06:34:55 +0200</pubDate>
      
      <guid>/post/2017-10-27-scraping-data-from-the-local-elections/</guid>
      <description>&lt;p&gt;One of my journalist friends was looking at the results of the local elections in Luxembourg and was dissatisfied because he was unable to compare the results of all the communes. In fact, he wanted to compare the number of women who were candidates in each commune. So I asked him to hold on, and I came back one hour later with this script, which collects the results of all communes in one table.&lt;/p&gt;
&lt;p&gt;At first this was private code, but I thought that it could be another good scraping example after the excellent post written by my colleague Bruno Rodrigues about scraping data from STATEC public tables.&lt;/p&gt;
&lt;p&gt;So let’s get started. First, let’s load some packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#If the packages are not installed on your computer,
#first run the commented lines below:
#install.packages( &amp;quot;rvest&amp;quot; )
#install.packages( &amp;quot;tidyverse&amp;quot; )
#install.packages( &amp;quot;stringr&amp;quot; )

library(rvest) #to scrape
library(dplyr) #to manipulate data
library(stringr) #to manipulate strings&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We want to collect the results of all communes in one data frame. We go to &lt;a href=&#34;http://www.elections.public.lu&#34; class=&#34;uri&#34;&gt;http://www.elections.public.lu&lt;/a&gt; and collect the data shown in the GIF below:&lt;/p&gt;
&lt;p&gt;After clicking on different communes, we notice that the URLs have the same format. They have 3 parts:&lt;br&gt; 1: “&lt;a href=&#34;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/&#34; class=&#34;uri&#34;&gt;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/&lt;/a&gt;”, the first part of the URL.&lt;br&gt; 2: “communes_names”, the name of the commune, is the second part of the URL.&lt;br&gt; 3: “.html” is the last part.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;For example, the complete URL for the commune of Luxembourg will be: &lt;a href=&#34;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/luxembourg.html&#34; class=&#34;uri&#34;&gt;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/luxembourg.html&lt;/a&gt;&lt;br&gt;&lt;br&gt;&lt;/p&gt;
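&lt;p&gt;The three parts can be pasted together with a small helper. This is only a sketch: &lt;code&gt;base_url&lt;/code&gt; and &lt;code&gt;commune_url()&lt;/code&gt; are hypothetical names that do not appear in the rest of the script.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;base_url = &amp;quot;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/&amp;quot;

#Hypothetical helper: paste the three parts of the URL together.
commune_url = function(name){
        paste0(base_url, name, &amp;quot;.html&amp;quot;)
}

commune_url(&amp;quot;luxembourg&amp;quot;)&lt;/code&gt;&lt;/pre&gt;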
&lt;p&gt;There are 103 communes, so we have to put all of them in a list. We scrape the names of the 103 communes with the script below:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;url = &amp;quot;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/bech.html&amp;quot;
communes = read_html(url) %&amp;gt;% 
        html_nodes(&amp;quot;#communes #communes-az li&amp;quot;) %&amp;gt;%
        html_text() &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We verify that we have a character vector of 103 elements, then we check the first 5 elements.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;length(communes)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 103&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(communes,5)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;\r\n        Beaufort\r\n                a rendu l&amp;#39;ensemble de ses résultats\r\n    &amp;quot; 
## [2] &amp;quot;\r\n        Bech\r\n                a rendu l&amp;#39;ensemble de ses résultats\r\n    &amp;quot;     
## [3] &amp;quot;\r\n        Beckerich\r\n                a rendu l&amp;#39;ensemble de ses résultats\r\n    &amp;quot;
## [4] &amp;quot;\r\n        Berdorf\r\n                a rendu l&amp;#39;ensemble de ses résultats\r\n    &amp;quot;  
## [5] &amp;quot;\r\n        Bertrange\r\n                a rendu l&amp;#39;ensemble de ses résultats\r\n    &amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It seems that the 103 communes are present but we still have to clean the data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#The data is not clean yet: some useless characters need to be removed,
#and the names must match the format used in the URLs.
communes = gsub(&amp;quot;a rendu l&amp;#39;ensemble de ses résultats&amp;quot;,&amp;quot; &amp;quot;, communes )
communes = trimws(gsub(&amp;quot;\r\n&amp;quot;,&amp;quot;&amp;quot;,communes ))
communes = gsub(&amp;quot;/Attert&amp;quot;,&amp;quot;-sur-attert&amp;quot;, communes )
communes = gsub(&amp;quot; - &amp;quot;, &amp;quot;-&amp;quot;,communes )
communes = gsub(&amp;quot; &amp;quot;, &amp;quot;-&amp;quot;,communes )
communes = gsub(&amp;quot;&amp;#39;&amp;quot;, &amp;quot;-&amp;quot;,communes )
communes = gsub(&amp;quot;é&amp;quot;, &amp;quot;e&amp;quot;,communes )
communes = gsub(&amp;quot;û&amp;quot;, &amp;quot;u&amp;quot;,communes )
communes = gsub(&amp;quot;ä&amp;quot;, &amp;quot;a&amp;quot;,communes )

#Lower case
communes = tolower(communes)

head(communes, 5)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;beaufort&amp;quot;  &amp;quot;bech&amp;quot;      &amp;quot;beckerich&amp;quot; &amp;quot;berdorf&amp;quot;   &amp;quot;bertrange&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we have the list of the 103 communes, we write a function that collects the data we want to put in our data frame.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Function to get the results for one commune.
result = function(x){
  #scraping data
  vote = read_html(paste(&amp;quot;http://www.elections.public.lu/fr/elections-communales/2017/resultats/communes/&amp;quot;,x,&amp;quot;.html&amp;quot;, sep=&amp;quot;&amp;quot;)) %&amp;gt;% 
          html_nodes(&amp;quot;#lux-number .lux-number ul li&amp;quot;) %&amp;gt;%
          html_text()%&amp;gt;%.[-1]
  
  #Conditions need to be added to have clean data.
  #Here we add a trick: elements 14 and 15 are the only ones that do not contain the string &amp;quot;\r\n&amp;quot;.
  #So we add &amp;quot;\r\n&amp;quot; to these elements.
  if(nchar(vote[14]) &amp;gt; 21){
          vote[14] = gsub(&amp;quot;ble&amp;quot;,&amp;quot;ble\r\n&amp;quot;, vote[14],perl = FALSE)
  }
  if(nchar(vote[15]) &amp;gt; 21){
          vote[15] = gsub(&amp;quot;mé&amp;quot;,&amp;quot;mé\r\n&amp;quot;, vote[15],perl = FALSE)
  }
  #We split the elements to separate the results (numbers) from the titles (letters).
  vote = unlist(str_split(vote, &amp;quot;\r\n&amp;quot;))
  vote = trimws(vote)
  vote = vote[vote != &amp;quot;&amp;quot;]
  
  #Here we have similar titles, so we rename them to avoid confusion.
  #Candidat Lux means Luxembourgish candidates &amp;amp; Electeur Lux means Luxembourgish voters.
  vote[7] = gsub(&amp;quot;Lux&amp;quot;, &amp;quot;Candidat Lux&amp;quot;, vote[7] )
  vote[9] = gsub(&amp;quot;Non Lu&amp;quot;, &amp;quot;Candidat Non Lu&amp;quot;, vote[9] )
  vote[13] = gsub(&amp;quot;Lux&amp;quot;, &amp;quot;Electeur Lux&amp;quot;, vote[13] )
  vote[15] = gsub(&amp;quot;Non Lu&amp;quot;, &amp;quot;Electeur Non Lu&amp;quot;, vote[15]  )
  
  #We create the data frame.
  #Elements with an even index are the results (the column val).
  pair = (1:15)*2
  
  #Elements with an odd index are the titles (the column title).
  impair = (1:15)*2 - 1
  res = data.frame(communes = rep(x,15), title = vote[impair], val = vote[pair])
  return(res)
}&lt;/code&gt;&lt;/pre&gt;
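&lt;p&gt;The even/odd indexing at the end of the function can be checked on a small vector in which titles and values alternate, as they do in &lt;code&gt;vote&lt;/code&gt; after cleaning. The vector &lt;code&gt;toy_vote&lt;/code&gt; below is made up for the illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Made-up vector: titles at odd positions, values at even positions.
toy_vote = c(&amp;quot;Total&amp;quot;, &amp;quot;10&amp;quot;, &amp;quot;Femmes&amp;quot;, &amp;quot;5&amp;quot;, &amp;quot;Hommes&amp;quot;, &amp;quot;5&amp;quot;)

n = length(toy_vote)/2
pair = (1:n)*2        #even positions: the values
impair = (1:n)*2 - 1  #odd positions: the titles

toy_res = data.frame(title = toy_vote[impair], val = toy_vote[pair])
toy_res&lt;/code&gt;&lt;/pre&gt;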
&lt;p&gt;Then we use the lapply() function to apply this function to the 103 communes and bind them in one data frame:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#We use the result function on all the communes.
res = lapply(communes, result)
#We bind the rows to get one data frame with the results from all communes.
df = do.call(rbind, res)&lt;/code&gt;&lt;/pre&gt;
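&lt;p&gt;When looping over many pages, a single failed request stops the whole loop. A possible precaution, sketched below, is to catch errors and to pause between requests. The names &lt;code&gt;safe_apply()&lt;/code&gt; and &lt;code&gt;fake_result()&lt;/code&gt; are hypothetical: &lt;code&gt;fake_result()&lt;/code&gt; only stands in for &lt;code&gt;result()&lt;/code&gt; so that the example runs without network access, and the pause length is an assumption.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Hypothetical wrapper: returns NULL instead of stopping when f(x) fails.
safe_apply = function(xs, f, pause = 0){
        lapply(xs, function(x){
                Sys.sleep(pause) #pause between two requests
                tryCatch(f(x), error = function(e){
                        message(&amp;quot;Failed for &amp;quot;, x, &amp;quot;: &amp;quot;, conditionMessage(e))
                        NULL
                })
        })
}

#Stand-in for result(): fails for one commune on purpose.
fake_result = function(x){
        if(x == &amp;quot;bech&amp;quot;) stop(&amp;quot;page not found&amp;quot;)
        data.frame(communes = x, title = &amp;quot;Total&amp;quot;, val = &amp;quot;10&amp;quot;)
}

toy_res = safe_apply(c(&amp;quot;beaufort&amp;quot;, &amp;quot;bech&amp;quot;, &amp;quot;berdorf&amp;quot;), fake_result)
#The NULL element of the failed commune is dropped by rbind.
toy_df = do.call(rbind, toy_res)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the real function, the call would look like &lt;code&gt;safe_apply(communes, result, pause = 1)&lt;/code&gt;.&lt;/p&gt;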
&lt;p&gt;To see the first 5 rows of &lt;code&gt;df&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Now our data looks like this:
head(df,5)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   communes                       title val
## 1 beaufort                       Total  10
## 2 beaufort                      Femmes   5
## 3 beaufort                      Hommes   5
## 4 beaufort     Candidat Luxembourgeois  10
## 5 beaufort Candidat Non Luxembourgeois   0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, it’s looking good, but we are not completely satisfied: we would like to reshape the data so that each result has its own column. To this end, we use the &lt;code&gt;tidyr&lt;/code&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyr)

#Reshaping the data will enable us to analyse it faster.
#We convert val to a numeric variable, then we spread the titles into columns.

tdf = df %&amp;gt;%
        mutate(val = as.numeric(gsub(&amp;quot; &amp;quot;, &amp;quot;&amp;quot;, val))) %&amp;gt;%
        spread(title, val)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To see the first 5 rows of &lt;code&gt;tdf&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#Now our data looks like this:
head(tdf,5)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    communes Blancs Candidat Luxembourgeois Candidat Non Luxembourgeois
## 1  beaufort     78                      10                           0
## 2      bech      0                       8                           0
## 3 beckerich     67                      10                           1
## 4   berdorf     23                      20                           0
## 5 bertrange     75                      45                           7
##   Dans l&amp;#39;urne Electeur Luxembourgeois Electeur Non Luxembourgeois Femmes
## 1        1144                    1170                         114      5
## 2           0                     699                          95      1
## 3        1396                    1393                         137      2
## 4         857                     850                          79      7
## 5        2935                    2972                         478     22
##   Grand total exprimé Grand total possible Hommes Inscrits Nuls Total
## 1                4021                 9234      5     1284   40    10
## 2                   0                    0      7      794    0     8
## 3                5792                11637      9     1530   36    11
## 4                4929                 7371     13      929   15    20
## 5               32880                35620     30     3450  120    52
##   Valables Votes par correspondance
## 1     1026                       71
## 2        0                        0
## 3     1293                      109
## 4      819                       57
## 5     2740                      269&lt;/code&gt;&lt;/pre&gt;
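&lt;p&gt;As a side note, in recent versions of &lt;code&gt;tidyr&lt;/code&gt; the function &lt;code&gt;spread()&lt;/code&gt; is superseded by &lt;code&gt;pivot_wider()&lt;/code&gt;. The sketch below shows the same kind of reshaping on a made-up data frame; &lt;code&gt;toy_long&lt;/code&gt; is hypothetical, and the code assumes tidyr 1.0.0 or later.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyr)

#Made-up long table: one row per commune and title.
toy_long = data.frame(communes = c(&amp;quot;a&amp;quot;, &amp;quot;a&amp;quot;, &amp;quot;b&amp;quot;, &amp;quot;b&amp;quot;),
                      title = c(&amp;quot;Femmes&amp;quot;, &amp;quot;Hommes&amp;quot;, &amp;quot;Femmes&amp;quot;, &amp;quot;Hommes&amp;quot;),
                      val = c(5, 5, 1, 7))

#One row per commune, one column per title.
toy_wide = pivot_wider(toy_long, names_from = title, values_from = val)
toy_wide&lt;/code&gt;&lt;/pre&gt;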
&lt;p&gt;Now you can export the table to Excel or play with your data in R!&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;readr::write_excel_csv(tdf,&amp;quot;election_lux.csv&amp;quot;)
#To know where your file is saved, use the following function:
#getwd()&lt;/code&gt;&lt;/pre&gt;
</description>
    </item>
    
    <item>
      <title>Barplot with ggplot2/plotly</title>
      <link>/post/2017-10-16-barplot-ggplotly/</link>
      <pubDate>Mon, 16 Oct 2017 00:00:00 +0000</pubDate>
      
      <guid>/post/2017-10-16-barplot-ggplotly/</guid>
      <description>&lt;script src=&#34;/rmarkdown-libs/htmlwidgets/htmlwidgets.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/plotly-binding/plotly.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/typedarray/typedarray.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;/rmarkdown-libs/jquery/jquery.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/crosstalk/css/crosstalk.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/crosstalk/js/crosstalk.min.js&#34;&gt;&lt;/script&gt;
&lt;link href=&#34;/rmarkdown-libs/plotlyjs/plotly-htmlwidgets.css&#34; rel=&#34;stylesheet&#34; /&gt;
&lt;script src=&#34;/rmarkdown-libs/plotlyjs/plotly-latest.min.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;Hello everyone,&lt;/p&gt;
&lt;p&gt;I just finished my MOOC on Foundations of Strategic Business Analytics. It was interesting, and at the end of the course I had to present a graph that was supposed to be relevant for a business organization. Different datasets were available: &lt;a href=&#34;http://www.stat.columbia.edu/~gelman/arm/examples/speed.dating/&#34;&gt;speed dating&lt;/a&gt;, &lt;a href=&#34;https://www.eea.europa.eu/data-and-maps/data/co2-cars-emission-8&#34;&gt;CO2 emissions&lt;/a&gt;, &lt;a href=&#34;https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset&#34;&gt;bike sharing&lt;/a&gt;, &lt;a href=&#34;https://www.lendingclub.com/info/download-data.action&#34;&gt;loans&lt;/a&gt;, &lt;a href=&#34;http://www.kdd.org/kdd-cup/view/kdd-cup-2009/Data&#34;&gt;telecom churn&lt;/a&gt;, &lt;a href=&#34;https://www.data.gouv.fr/en/datasets/prix-des-carburants-en-france/&#34;&gt;fuel prices&lt;/a&gt;, &lt;a href=&#34;http://www.ameli.fr/fileadmin/user_upload/documents/Medic_AM_mensuel_2016_-_2e_semestre_tous_regimes.zip&#34;&gt;medical expense refunds&lt;/a&gt; and more. I chose to work on the medical expense refunds. This dataset gives the amount of refunded drugs, the number of refunded drugs, the drug names and the drug categories for each month from July to December 2016. There are 84 categories of drugs.&lt;/p&gt;
&lt;p&gt;As the French health insurance is a public institution, it may be more interesting to find a way to monitor the data than to find a way to refund fewer drugs. Also, showing all 84 categories would not be readable, so I decided to select just some of them.&lt;/p&gt;
&lt;p&gt;First of all, I wanted to analyse the five most refunded drug categories per month. But I quickly realized that I had to use a line chart instead of the barplot, because the barplot was not very readable (see below).&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-10-16-barplot-ggplotly_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;1344&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I was not happy with my first result, so I decided to make a new graph of the fifteen most refunded drug categories over the whole 2nd semester of 2016.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#We need to rebuild some of our previous tables because we now select 15 categories.
res_all2 = tous_presc %&amp;gt;%
        group_by(label) %&amp;gt;%
        summarise_each(funs(sum)) %&amp;gt;%
        filter(!is.na(label)) %&amp;gt;%
        arrange(desc(`Montant remboursé \n2016-07`)) %&amp;gt;%
        filter( row_number() %in% c(1:15) ) %&amp;gt;%
        as.data.frame()

top_med = res_all2$label

res_city2 = city_presc %&amp;gt;%
        group_by(label) %&amp;gt;%
        summarise_each(funs(sum)) %&amp;gt;%
        filter(!is.na(label) &amp;amp; label %in% top_med) %&amp;gt;%
        arrange(desc(`Montant remboursé \n2016-07`)) %&amp;gt;%
        as.data.frame() 
res_city2$`type of prescriber` = &amp;quot;private practitioner&amp;quot;

res_hop2 = hop_presc %&amp;gt;%
        group_by(label) %&amp;gt;%
        summarise_each(funs(sum)) %&amp;gt;%
        filter(!is.na(label) &amp;amp; label %in% top_med) %&amp;gt;%
        arrange(desc(`Montant remboursé \n2016-07`))  %&amp;gt;%
        as.data.frame() 
res_hop2$`type of prescriber` = &amp;quot;salaried practitioner&amp;quot;

df2 = rbind(res_city2, res_hop2)
df2$`type of prescriber` = toupper(df2$`type of prescriber`)
df2$`type of drugs` = df2$label
#translate into English
df2$`type of drugs` = gsub(&amp;quot;IMMUNOSUPPRESSEURS&amp;quot;,&amp;quot;IMMUNOSUPPRESSIVES&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;MEDICAMENTS DU DIABETE&amp;quot;,&amp;quot;DIABETES MEDICINES&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;ANTITHROMBOTIQUES&amp;quot;,&amp;quot;ANTITHROMBOTICS&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;ANTIVIRAUX A USAGE SYSTEMIQUE&amp;quot;,&amp;quot;ANTIVIRALS FOR SYSTEMIC USE&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;ANTINEOPLASIQUES&amp;quot;,&amp;quot;ANTINEOPLASTICS&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;AGENTS MODIFIANT LES LIPIDES&amp;quot;,&amp;quot;LIPID MODIFYING AGENT&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;ANTIBACTERIENS A USAGE SYSTEMIQUE&amp;quot;,&amp;quot;SYSTEMIC ANTIBACTERIAL&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;IMMUNOSTIMULANTS&amp;quot;,&amp;quot;IMMUNOSTIMULANTS&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;MEDICAMENTS AGISSANT SUR LE SYSTEME RENINE-ANGIOTENSINE&amp;quot;,&amp;quot;DRUGS AFFECT THE RENIN-ANGIOTENSIN SYSTEM&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;MEDICAMENTS OPHTALMOLOGIQUES&amp;quot;,&amp;quot;OPHTHALMIC DRUGS&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;MEDICAMENTS POUR LES SYNDROMES OBSTRUCTIFS DES VOIES AERIENNES&amp;quot;,&amp;quot;DRUGS AGAINST OBSTRUCTIVE PULMONARY DISEASE&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;MEDICAMENTS POUR LES TROUBLES DE L&amp;#39;ACIDITE&amp;quot;,&amp;quot;DRUGS AGAINST ACIDITY TROUBLE&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;PSYCHOLEPTIQUES&amp;quot;,&amp;quot;PSYCHOLEPTICS&amp;quot;, df2$`type of drugs`)
df2$`type of drugs` = gsub(&amp;quot;THERAPEUTIQUE ENDOCRINE&amp;quot;,&amp;quot;ENDOCRINE THERAPY&amp;quot;, df2$`type of drugs`)

colnames(df2) = c(&amp;quot;label&amp;quot;, &amp;quot;JULY&amp;quot;, &amp;quot;AUGUST&amp;quot;, &amp;quot;SEPTEMBER&amp;quot;, &amp;quot;OCTOBER&amp;quot;, &amp;quot;NOVEMBER&amp;quot;, &amp;quot;DECEMBER&amp;quot;, &amp;quot;PRESCRIBERS&amp;quot;, &amp;quot;DRUGS&amp;quot; )
dfdata2 = melt( df2[,-1], id.vars=c(&amp;quot;DRUGS&amp;quot;, &amp;quot;PRESCRIBERS&amp;quot;)) %&amp;gt;%
        rename(montant=value, date=variable) %&amp;gt;%
        arrange(date, DRUGS, PRESCRIBERS) %&amp;gt;% 
        group_by(DRUGS, PRESCRIBERS) %&amp;gt;%
        summarise(refund=sum(montant)) %&amp;gt;%
        as.data.frame() 

dfdata2$DRUGS = reorder(dfdata2$DRUGS, desc(dfdata2$refund))
#The total amount of refunded drugs
global_amout = sum(t(
        tous_presc %&amp;gt;%
                group_by(label) %&amp;gt;%
                filter(is.na(label)) %&amp;gt;%
                .[13,-7]))

#the percentage of the total refunded drugs that represents each category
dfdata2 = dfdata2 %&amp;gt;%
        group_by(DRUGS) %&amp;gt;%
        mutate( total = sum(refund),
                perct = paste(round(100*sum(refund)/global_amout,2),&amp;quot;%&amp;quot;, sep=&amp;quot;&amp;quot;),
                perct = ifelse(PRESCRIBERS==&amp;quot;SALARIED PRACTITIONER&amp;quot;, &amp;quot; &amp;quot;, perct )) %&amp;gt;%
        as.data.frame()



q = ggplot(dfdata2, aes(x=DRUGS, y=refund, group=PRESCRIBERS, fill=DRUGS, alpha=PRESCRIBERS))+
        geom_bar(stat=&amp;quot;identity&amp;quot;,position=&amp;quot;stack&amp;quot;,color=&amp;quot;black&amp;quot;)+ 
        ggtitle(&amp;quot;Top 15 of refunded drugs categories for the 2nd semester of 2016&amp;quot;)+
        scale_alpha_manual(values=c(0.2,0.75))+
        geom_text(aes(label=perct, y=total+2),alpha=1, color=&amp;quot;black&amp;quot;, position=position_dodge(width=0.2), vjust=-0.6, size=4) + 
        scale_y_continuous(labels = function(x) paste0(formatC(x/1000000, format=&amp;quot;d&amp;quot;, digits=0, big.mark = &amp;quot;,&amp;quot;), &amp;quot; €&amp;quot;))+
        labs(x=&amp;quot; &amp;quot;, y=&amp;quot;refunded amount (in million €)&amp;quot;) + 
        annotate(&amp;quot;text&amp;quot;, x=4.25, y=821000000, label= &amp;quot;(Percentage of total refunded amount)&amp;quot;, size=4.5) +
        annotate(&amp;quot;text&amp;quot;, x=11.3, y=890000000, label= &amp;quot;Total Amount of refunded drugs: 9,384,395,518 €&amp;quot;, size=6) + 
        theme_minimal(base_size = 15)+
        theme(  
                panel.grid.major.x = element_blank(),
                panel.grid.minor.x = element_blank(),
                legend.text = element_text(size = 10),
                plot.title = element_text(size=23,face=&amp;quot;bold&amp;quot;, hjust=0.5),
                axis.text.x = element_blank(),
                axis.ticks.x = element_blank(),
                axis.title.x = element_text(size=12, face=&amp;quot;bold&amp;quot;),
                axis.title.y = element_text(size=14,face=&amp;quot;bold&amp;quot;),
                strip.text.x = element_text(face=&amp;quot;italic&amp;quot;, size=11))


print(q)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;/post/2017-10-16-barplot-ggplotly_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;1344&#34; /&gt;&lt;/p&gt;
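&lt;p&gt;The percentages shown above the bars are the total of each category divided by the global refunded amount. The computation can be sketched on made-up numbers; the data frame &lt;code&gt;toy&lt;/code&gt; and the value of &lt;code&gt;toy_global&lt;/code&gt; are invented for the illustration and do not come from the dataset.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)

#Made-up refunds for two categories and two types of prescribers.
toy = data.frame(DRUGS = c(&amp;quot;A&amp;quot;, &amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;B&amp;quot;),
                 PRESCRIBERS = c(&amp;quot;PRIVATE&amp;quot;, &amp;quot;SALARIED&amp;quot;, &amp;quot;PRIVATE&amp;quot;, &amp;quot;SALARIED&amp;quot;),
                 refund = c(30, 10, 15, 5))
#Made-up global amount, playing the role of global_amout.
toy_global = 200

toy = toy %&amp;gt;%
        group_by(DRUGS) %&amp;gt;%
        mutate(total = sum(refund),
               perct = paste0(round(100*total/toy_global, 2), &amp;quot;%&amp;quot;)) %&amp;gt;%
        as.data.frame()
toy&lt;/code&gt;&lt;/pre&gt;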
&lt;p&gt;Great, now we can add an interactive touch with the &lt;code&gt;plotly&lt;/code&gt; library.&lt;/p&gt;
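&lt;p&gt;The mechanism used in the next chunk is that &lt;code&gt;ggplotly()&lt;/code&gt; can display an extra &lt;code&gt;text&lt;/code&gt; aesthetic as the tooltip. Here is a minimal sketch on made-up data; the data frame &lt;code&gt;toy&lt;/code&gt; and the objects &lt;code&gt;p&lt;/code&gt; and &lt;code&gt;w&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)
library(plotly)

toy = data.frame(DRUGS = c(&amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;), refund = c(40, 20))

p = ggplot(toy, aes(x = DRUGS, y = refund,
                    #text is not drawn by ggplot2, it is only used by the tooltip
                    text = paste(&amp;quot;&amp;lt;b&amp;gt;refunded amount:&amp;lt;/b&amp;gt;&amp;quot;, refund, &amp;quot;€&amp;quot;))) +
        geom_bar(stat = &amp;quot;identity&amp;quot;)

#tooltip = &amp;quot;text&amp;quot; asks plotly to show only the text aesthetic
w = ggplotly(p, tooltip = &amp;quot;text&amp;quot;)&lt;/code&gt;&lt;/pre&gt;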
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#We load this library to add some interactivity to the previous graph
library(plotly)

#We add some new variables to use in the tooltip
dfdata2 = dfdata2 %&amp;gt;%
        group_by(DRUGS) %&amp;gt;%
        mutate( total = sum(refund),
                perct = paste(round(100*refund/global_amout,2),&amp;quot;%&amp;quot;, sep=&amp;quot;&amp;quot;),
                perct_in_cat = paste(round(100*refund/sum(refund),2),&amp;quot;%&amp;quot;, sep=&amp;quot;&amp;quot;),
                perct_total_cat =  paste(round(100*sum(refund)/global_amout,2),&amp;quot;%&amp;quot;, sep=&amp;quot;&amp;quot;) ) %&amp;gt;%
        as.data.frame()

q = ggplot(dfdata2, aes(x=DRUGS, y=refund, group=PRESCRIBERS, fill=DRUGS,
                         alpha=PRESCRIBERS,
                         #here we customize the tooltip
                         text = paste(&amp;quot;&amp;lt;b&amp;gt;type of drugs:&amp;lt;/b&amp;gt; &amp;quot;, tolower(DRUGS),&amp;quot;&amp;lt;/br&amp;gt;&amp;quot;,
                                      &amp;quot;&amp;lt;/br&amp;gt;&amp;lt;b&amp;gt;type of prescribers:&amp;lt;/b&amp;gt; &amp;quot;, tolower(PRESCRIBERS),
                                      &amp;quot;&amp;lt;/br&amp;gt;&amp;lt;b&amp;gt;refunded amount:&amp;lt;/b&amp;gt; &amp;quot;, paste0(formatC(refund, format=&amp;quot;d&amp;quot;, digits=0, big.mark = &amp;quot;,&amp;quot;), &amp;quot; €&amp;quot;),
                                      &amp;quot;&amp;lt;/br&amp;gt;&amp;lt;b&amp;gt;total refunded amount:&amp;lt;/b&amp;gt; &amp;quot;, paste0(formatC(total, format=&amp;quot;d&amp;quot;, digits=0, big.mark = &amp;quot;,&amp;quot;), &amp;quot; €&amp;quot;),
                                      &amp;quot;&amp;lt;/br&amp;gt;&amp;lt;b&amp;gt;percentage of total refunded amount for the prescriber:&amp;lt;/b&amp;gt; &amp;quot;, perct,
                                      &amp;quot;&amp;lt;/br&amp;gt;&amp;lt;b&amp;gt;percentage of total refunded amount for the category:&amp;lt;/b&amp;gt; &amp;quot;, perct_total_cat, 
                                      &amp;quot;&amp;lt;/br&amp;gt;&amp;lt;b&amp;gt;percentage of refunded amount in this category:&amp;lt;/b&amp;gt; &amp;quot;, perct_in_cat )
))+
        geom_bar(stat=&amp;quot;identity&amp;quot;,position=&amp;quot;stack&amp;quot;, colour=&amp;quot;black&amp;quot;, size=0.2)+ 
        scale_alpha_manual(values=c(0.2,0.75))+
        scale_y_continuous(labels = function(x) paste0(formatC(x/1000000, format=&amp;quot;d&amp;quot;, digits=0, big.mark = &amp;quot;,&amp;quot;), &amp;quot; €&amp;quot;))+
        labs(x=&amp;quot; &amp;quot;, y=&amp;quot;refunded amount (in million €)&amp;quot;) + 
        annotate(&amp;quot;text&amp;quot;, x= 8, y=930000000, label= &amp;quot;Top 15 of refunded drugs categories for the 2nd semester of 2016&amp;quot;, size=5, fontface=&amp;quot;bold&amp;quot;) + 
        annotate(&amp;quot;text&amp;quot;, x=8, y=890000000, label= &amp;quot;Total Amount of refunded drugs: 9,384,395,518 €&amp;quot;, size=4) + 
        theme_minimal(base_size = 15)+
        theme(  
                panel.grid.major.x = element_blank(),
                panel.grid.minor.x = element_blank(),
                legend.text = element_text(size = 10),
                # Remove the legend.
                legend.position = &amp;quot;none&amp;quot;,
                plot.title = element_text(size=12,face=&amp;quot;bold&amp;quot;, hjust=0.1),
                axis.text.x = element_blank(),
                axis.ticks.x = element_blank(),
                axis.title.x = element_text(size=12),
                axis.title.y = element_text(size=14),
                strip.text.x = element_text(face=&amp;quot;italic&amp;quot;, size=11))


ggplotly(q, tooltip = c(&amp;quot;text&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;div id=&#34;111fd3d56bc3&#34; style=&#34;width:100%;height:480px;&#34; class=&#34;plotly html-widget&#34;&gt;&lt;/div&gt;
&lt;script type=&#34;application/json&#34; data-for=&#34;111fd3d56bc3&#34;&gt;{&#34;x&#34;:{&#34;data&#34;:[{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:507172733.46,&#34;x&#34;:[1],&#34;y&#34;:[301150544.0365],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  immunosuppressives &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  301,150,544 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  808,323,277 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  3.21% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  8.61% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  37.26%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(248,118,109,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,IMMUNOSUPPRESSIVES)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,IMMUNOSUPPRESSIVES)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:85415486.642,&#34;x&#34;:[2],&#34;y&#34;:[569785475.295],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  diabetes medicines &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  569,785,475 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  655,200,961 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  6.07% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  6.98% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this 
category:&lt;\/b&gt;  86.96%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(229,135,0,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,DIABETES MEDICINES)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,DIABETES MEDICINES)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:448083837.821,&#34;x&#34;:[3],&#34;y&#34;:[138147767.184],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  antivirals for systemic use &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  138,147,767 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  586,231,605 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.47% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  6.25% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  23.57%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(201,152,0,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,ANTIVIRALS FOR SYSTEMIC USE)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,ANTIVIRALS FOR SYSTEMIC USE)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:114551640.954,&#34;x&#34;:[4],&#34;y&#34;:[421671876.858],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  antithrombotics &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type 
of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  421,671,876 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  536,223,517 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  4.49% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  5.71% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  78.64%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(163,165,0,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,ANTITHROMBOTICS)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,ANTITHROMBOTICS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:406105072.4015,&#34;x&#34;:[5],&#34;y&#34;:[101128764.1995],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  antineoplastics &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  101,128,764 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  507,233,836 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.08% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  5.41% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  19.94%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(107,177,0,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,ANTINEOPLASTICS)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE 
PRACTITIONER,ANTINEOPLASTICS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:87842827.247,&#34;x&#34;:[6],&#34;y&#34;:[418060300.1295],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  drugs against obstructive pulmonary disease &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  418,060,300 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  505,903,127 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  4.45% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  5.39% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  82.64%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,186,56,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,DRUGS AGAINST OBSTRUCTIVE PULMONARY DISEASE)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,DRUGS AGAINST OBSTRUCTIVE PULMONARY DISEASE)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:117019516.503,&#34;x&#34;:[7],&#34;y&#34;:[340881230.08],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  ophthalmic drugs &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  340,881,230 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  457,900,746 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  3.63% &lt;\/br&gt;&lt;b&gt;percentage 
of total refunded amount for the category:&lt;\/b&gt;  4.88% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  74.44%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,191,125,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,OPHTHALMIC DRUGS)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,OPHTHALMIC DRUGS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:27557931.5335,&#34;x&#34;:[8],&#34;y&#34;:[384844525.495],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  lipid modifying agent &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  384,844,525 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  412,402,457 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  4.1% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  4.39% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  93.32%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,192,175,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,LIPID MODIFYING AGENT)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,LIPID MODIFYING 
AGENT)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:54821807.5275,&#34;x&#34;:[9],&#34;y&#34;:[353356216.3365],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  analgesiques &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  353,356,216 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  408,178,023 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  3.77% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  4.35% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  86.57%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,188,216,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,ANALGESIQUES)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,ANALGESIQUES)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:20801887.606,&#34;x&#34;:[10],&#34;y&#34;:[326505248.53],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  drugs affect the renin-angiotensin system &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  326,505,248 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  347,307,136 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  3.48% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  3.7% 
&lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  94.01%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,176,246,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,DRUGS AFFECT THE RENIN-ANGIOTENSIN SYSTEM)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,DRUGS AFFECT THE RENIN-ANGIOTENSIN SYSTEM)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:142980237.259,&#34;x&#34;:[11],&#34;y&#34;:[168576939.7785],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  endocrine therapy &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  168,576,939 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  311,557,177 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.8% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  3.32% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  54.11%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(97,156,255,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,ENDOCRINE THERAPY)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,ENDOCRINE 
THERAPY)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:139312170.0425,&#34;x&#34;:[12],&#34;y&#34;:[148198825.947],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  psycholeptics &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  148,198,825 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  287,510,995 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.58% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  3.06% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  51.55%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(185,131,255,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,PSYCHOLEPTICS)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,PSYCHOLEPTICS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:54468175.8535,&#34;x&#34;:[13],&#34;y&#34;:[210355427.535],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  systemic antibacterial &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  210,355,427 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  264,823,603 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  2.24% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  2.82% 
&lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  79.43%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(231,107,243,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,SYSTEMIC ANTIBACTERIAL)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,SYSTEMIC ANTIBACTERIAL)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:165054852.0425,&#34;x&#34;:[14],&#34;y&#34;:[77049785.86],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  immunostimulants &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  77,049,785 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  242,104,637 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.82% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  2.58% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  31.82%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(253,97,209,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,IMMUNOSTIMULANTS)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,IMMUNOSTIMULANTS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:24503942.332,&#34;x&#34;:[15],&#34;y&#34;:[200514932.377],&#34;text&#34;:&#34;&lt;b&gt;type of 
drugs:&lt;\/b&gt;  drugs against acidity trouble &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  private practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  200,514,932 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  225,018,874 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  2.14% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  2.4% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  89.11%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(255,103,164,0.2)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(PRIVATE PRACTITIONER,DRUGS AGAINST ACIDITY TROUBLE)&#34;,&#34;legendgroup&#34;:&#34;(PRIVATE PRACTITIONER,DRUGS AGAINST ACIDITY TROUBLE)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[1],&#34;y&#34;:[507172733.46],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  immunosuppressives &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  507,172,733 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  808,323,277 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  5.4% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  8.61% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  
62.74%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(248,118,109,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,IMMUNOSUPPRESSIVES)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,IMMUNOSUPPRESSIVES)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[2],&#34;y&#34;:[85415486.642],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  diabetes medicines &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  85,415,486 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  655,200,961 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.91% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  6.98% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  13.04%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(229,135,0,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,DIABETES MEDICINES)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,DIABETES MEDICINES)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[3],&#34;y&#34;:[448083837.821],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  antivirals for systemic use &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner 
&lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  448,083,837 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  586,231,605 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  4.77% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  6.25% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  76.43%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(201,152,0,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,ANTIVIRALS FOR SYSTEMIC USE)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,ANTIVIRALS FOR SYSTEMIC USE)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[4],&#34;y&#34;:[114551640.954],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  antithrombotics &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  114,551,640 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  536,223,517 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.22% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  5.71% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  21.36%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(163,165,0,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,ANTITHROMBOTICS)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED 
PRACTITIONER,ANTITHROMBOTICS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[5],&#34;y&#34;:[406105072.4015],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  antineoplastics &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  406,105,072 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  507,233,836 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  4.33% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  5.41% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  80.06%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(107,177,0,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,ANTINEOPLASTICS)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,ANTINEOPLASTICS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[6],&#34;y&#34;:[87842827.247],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  drugs against obstructive pulmonary disease &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  87,842,827 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  505,903,127 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.94% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  5.39% 
&lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  17.36%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,186,56,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,DRUGS AGAINST OBSTRUCTIVE PULMONARY DISEASE)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,DRUGS AGAINST OBSTRUCTIVE PULMONARY DISEASE)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.9,&#34;base&#34;:0,&#34;x&#34;:[7],&#34;y&#34;:[117019516.503],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  ophthalmic drugs &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  117,019,516 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  457,900,746 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.25% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  4.88% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  25.56%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,191,125,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,OPHTHALMIC DRUGS)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,OPHTHALMIC DRUGS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[8],&#34;y&#34;:[27557931.5335],&#34;text&#34;:&#34;&lt;b&gt;type of 
drugs:&lt;\/b&gt;  lipid modifying agent &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  27,557,931 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  412,402,457 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.29% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  4.39% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  6.68%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,192,175,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,LIPID MODIFYING AGENT)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,LIPID MODIFYING AGENT)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[9],&#34;y&#34;:[54821807.5275],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  analgesiques &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  54,821,807 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  408,178,023 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.58% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  4.35% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  13.43%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,188,216,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED 
PRACTITIONER,ANALGESIQUES)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,ANALGESIQUES)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[10],&#34;y&#34;:[20801887.606],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  drugs affect the renin-angiotensin system &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  20,801,887 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  347,307,136 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.22% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  3.7% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  5.99%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(0,176,246,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,DRUGS AFFECT THE RENIN-ANGIOTENSIN SYSTEM)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,DRUGS AFFECT THE RENIN-ANGIOTENSIN SYSTEM)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[11],&#34;y&#34;:[142980237.259],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  endocrine therapy &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  142,980,237 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  311,557,177 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount 
for the prescriber:&lt;\/b&gt;  1.52% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  3.32% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  45.89%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(97,156,255,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,ENDOCRINE THERAPY)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,ENDOCRINE THERAPY)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[12],&#34;y&#34;:[139312170.0425],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  psycholeptics &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  139,312,170 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  287,510,995 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.48% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  3.06% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  48.45%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(185,131,255,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,PSYCHOLEPTICS)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED 
PRACTITIONER,PSYCHOLEPTICS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[13],&#34;y&#34;:[54468175.8535],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  systemic antibacterial &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  54,468,175 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  264,823,603 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.58% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  2.82% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  20.57%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(231,107,243,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,SYSTEMIC ANTIBACTERIAL)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,SYSTEMIC ANTIBACTERIAL)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[14],&#34;y&#34;:[165054852.0425],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  immunostimulants &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  165,054,852 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  242,104,637 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  1.76% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the 
category:&lt;\/b&gt;  2.58% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  68.18%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(253,97,209,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,IMMUNOSTIMULANTS)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,IMMUNOSTIMULANTS)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;orientation&#34;:&#34;v&#34;,&#34;width&#34;:0.899999999999999,&#34;base&#34;:0,&#34;x&#34;:[15],&#34;y&#34;:[24503942.332],&#34;text&#34;:&#34;&lt;b&gt;type of drugs:&lt;\/b&gt;  drugs against acidity trouble &lt;\/br&gt; &lt;\/br&gt;&lt;b&gt;type of prescribers:&lt;\/b&gt;  salaried practitioner &lt;\/br&gt;&lt;b&gt;refunded amount:&lt;\/b&gt;  24,503,942 € &lt;\/br&gt;&lt;b&gt;total refunded amount:&lt;\/b&gt;  225,018,874 € &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the prescriber:&lt;\/b&gt;  0.26% &lt;\/br&gt;&lt;b&gt;percentage of total refunded amount for the category:&lt;\/b&gt;  2.4% &lt;\/br&gt;&lt;b&gt;percentage of refunded amount in this category:&lt;\/b&gt;  10.89%&#34;,&#34;type&#34;:&#34;bar&#34;,&#34;marker&#34;:{&#34;autocolorscale&#34;:false,&#34;color&#34;:&#34;rgba(255,103,164,0.75)&#34;,&#34;line&#34;:{&#34;width&#34;:0.755905511811024,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;}},&#34;name&#34;:&#34;(SALARIED PRACTITIONER,DRUGS AGAINST ACIDITY TROUBLE)&#34;,&#34;legendgroup&#34;:&#34;(SALARIED PRACTITIONER,DRUGS AGAINST ACIDITY TROUBLE)&#34;,&#34;showlegend&#34;:true,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;x&#34;:[8],&#34;y&#34;:[930000000],&#34;text&#34;:&#34;Top 15 of refunded drugs categories for the 2nd semester of 
2016&#34;,&#34;hovertext&#34;:&#34;&#34;,&#34;textfont&#34;:{&#34;size&#34;:18.8976377952756,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;},&#34;type&#34;:&#34;scatter&#34;,&#34;mode&#34;:&#34;text&#34;,&#34;hoveron&#34;:&#34;points&#34;,&#34;showlegend&#34;:false,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null},{&#34;x&#34;:[8],&#34;y&#34;:[890000000],&#34;text&#34;:&#34;Total Amount of refunded drugs: 9,384,395,518 €&#34;,&#34;hovertext&#34;:&#34;&#34;,&#34;textfont&#34;:{&#34;size&#34;:15.1181102362205,&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;},&#34;type&#34;:&#34;scatter&#34;,&#34;mode&#34;:&#34;text&#34;,&#34;hoveron&#34;:&#34;points&#34;,&#34;showlegend&#34;:false,&#34;xaxis&#34;:&#34;x&#34;,&#34;yaxis&#34;:&#34;y&#34;,&#34;hoverinfo&#34;:&#34;text&#34;,&#34;frame&#34;:null}],&#34;layout&#34;:{&#34;margin&#34;:{&#34;t&#34;:30.9439601494396,&#34;r&#34;:9.9626400996264,&#34;b&#34;:35.865504358655,&#34;l&#34;:73.3914487339145},&#34;font&#34;:{&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;,&#34;family&#34;:&#34;&#34;,&#34;size&#34;:19.9252801992528},&#34;xaxis&#34;:{&#34;domain&#34;:[0,1],&#34;type&#34;:&#34;linear&#34;,&#34;autorange&#34;:false,&#34;tickmode&#34;:&#34;array&#34;,&#34;range&#34;:[0.4,15.6],&#34;ticktext&#34;:[&#34;IMMUNOSUPPRESSIVES&#34;,&#34;DIABETES MEDICINES&#34;,&#34;ANTIVIRALS FOR SYSTEMIC USE&#34;,&#34;ANTITHROMBOTICS&#34;,&#34;ANTINEOPLASTICS&#34;,&#34;DRUGS AGAINST OBSTRUCTIVE PULMONARY DISEASE&#34;,&#34;OPHTHALMIC DRUGS&#34;,&#34;LIPID MODIFYING AGENT&#34;,&#34;ANALGESIQUES&#34;,&#34;DRUGS AFFECT THE RENIN-ANGIOTENSIN SYSTEM&#34;,&#34;ENDOCRINE THERAPY&#34;,&#34;PSYCHOLEPTICS&#34;,&#34;SYSTEMIC ANTIBACTERIAL&#34;,&#34;IMMUNOSTIMULANTS&#34;,&#34;DRUGS AGAINST ACIDITY 
TROUBLE&#34;],&#34;tickvals&#34;:[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],&#34;ticks&#34;:&#34;&#34;,&#34;tickcolor&#34;:null,&#34;ticklen&#34;:4.9813200498132,&#34;tickwidth&#34;:0,&#34;showticklabels&#34;:false,&#34;tickfont&#34;:{&#34;color&#34;:null,&#34;family&#34;:null,&#34;size&#34;:0},&#34;tickangle&#34;:-0,&#34;showline&#34;:false,&#34;linecolor&#34;:null,&#34;linewidth&#34;:0,&#34;showgrid&#34;:false,&#34;gridcolor&#34;:null,&#34;gridwidth&#34;:0,&#34;zeroline&#34;:false,&#34;anchor&#34;:&#34;y&#34;,&#34;title&#34;:&#34; &#34;,&#34;titlefont&#34;:{&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;,&#34;family&#34;:&#34;&#34;,&#34;size&#34;:15.9402241594022},&#34;hoverformat&#34;:&#34;.2f&#34;},&#34;yaxis&#34;:{&#34;domain&#34;:[0,1],&#34;type&#34;:&#34;linear&#34;,&#34;autorange&#34;:false,&#34;tickmode&#34;:&#34;array&#34;,&#34;range&#34;:[-46500000,976500000],&#34;ticktext&#34;:[&#34;0 €&#34;,&#34;250 €&#34;,&#34;500 €&#34;,&#34;750 €&#34;],&#34;tickvals&#34;:[0,250000000,500000000,750000000],&#34;ticks&#34;:&#34;&#34;,&#34;tickcolor&#34;:null,&#34;ticklen&#34;:4.9813200498132,&#34;tickwidth&#34;:0,&#34;showticklabels&#34;:true,&#34;tickfont&#34;:{&#34;color&#34;:&#34;rgba(77,77,77,1)&#34;,&#34;family&#34;:&#34;&#34;,&#34;size&#34;:15.9402241594022},&#34;tickangle&#34;:-0,&#34;showline&#34;:false,&#34;linecolor&#34;:null,&#34;linewidth&#34;:0,&#34;showgrid&#34;:true,&#34;gridcolor&#34;:&#34;rgba(235,235,235,1)&#34;,&#34;gridwidth&#34;:0.66417600664176,&#34;zeroline&#34;:false,&#34;anchor&#34;:&#34;x&#34;,&#34;title&#34;:&#34;refunded amount (in million 
€)&#34;,&#34;titlefont&#34;:{&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;,&#34;family&#34;:&#34;&#34;,&#34;size&#34;:18.5969281859693},&#34;hoverformat&#34;:&#34;.2f&#34;},&#34;shapes&#34;:[{&#34;type&#34;:&#34;rect&#34;,&#34;fillcolor&#34;:null,&#34;line&#34;:{&#34;color&#34;:null,&#34;width&#34;:0,&#34;linetype&#34;:[]},&#34;yref&#34;:&#34;paper&#34;,&#34;xref&#34;:&#34;paper&#34;,&#34;x0&#34;:0,&#34;x1&#34;:1,&#34;y0&#34;:0,&#34;y1&#34;:1}],&#34;showlegend&#34;:false,&#34;legend&#34;:{&#34;bgcolor&#34;:null,&#34;bordercolor&#34;:null,&#34;borderwidth&#34;:0,&#34;font&#34;:{&#34;color&#34;:&#34;rgba(0,0,0,1)&#34;,&#34;family&#34;:&#34;&#34;,&#34;size&#34;:13.2835201328352}},&#34;barmode&#34;:&#34;stack&#34;,&#34;hovermode&#34;:&#34;closest&#34;},&#34;source&#34;:&#34;A&#34;,&#34;attrs&#34;:{&#34;111fd6695e1a6&#34;:{&#34;x&#34;:{},&#34;y&#34;:{},&#34;fill&#34;:{},&#34;alpha&#34;:{},&#34;text&#34;:{},&#34;type&#34;:&#34;ggplotly&#34;},&#34;111fd7a118de7&#34;:{&#34;x&#34;:{},&#34;y&#34;:{}},&#34;111fd167b774d&#34;:{&#34;x&#34;:{},&#34;y&#34;:{}}},&#34;cur_data&#34;:&#34;111fd6695e1a6&#34;,&#34;visdat&#34;:{&#34;111fd6695e1a6&#34;:[&#34;function (y) &#34;,&#34;x&#34;],&#34;111fd7a118de7&#34;:[&#34;function (y) &#34;,&#34;x&#34;],&#34;111fd167b774d&#34;:[&#34;function (y) &#34;,&#34;x&#34;]},&#34;config&#34;:{&#34;modeBarButtonsToAdd&#34;:[{&#34;name&#34;:&#34;Collaborate&#34;,&#34;icon&#34;:{&#34;width&#34;:1000,&#34;ascent&#34;:500,&#34;descent&#34;:-50,&#34;path&#34;:&#34;M487 375c7-10 9-23 5-36l-79-259c-3-12-11-23-22-31-11-8-22-12-35-12l-263 0c-15 0-29 5-43 15-13 10-23 23-28 37-5 13-5 25-1 37 0 0 0 3 1 7 1 5 1 8 1 11 0 2 0 4-1 6 0 3-1 5-1 6 1 2 2 4 3 6 1 2 2 4 4 6 2 3 4 5 5 7 5 7 9 16 13 26 4 10 7 19 9 26 0 2 0 5 0 9-1 4-1 6 0 8 0 2 2 5 4 8 3 3 5 5 5 7 4 6 8 15 12 26 4 11 7 19 7 26 1 1 0 4 0 9-1 4-1 7 0 8 1 2 3 5 6 8 4 4 6 6 6 7 4 5 8 13 13 24 4 11 7 20 7 28 1 1 0 4 0 7-1 3-1 6-1 7 0 2 1 4 3 6 1 1 3 4 5 6 2 3 3 5 5 6 1 2 3 5 4 9 2 3 3 7 5 10 1 3 2 6 4 10 2 4 4 7 6 9 2 
3 4 5 7 7 3 2 7 3 11 3 3 0 8 0 13-1l0-1c7 2 12 2 14 2l218 0c14 0 25-5 32-16 8-10 10-23 6-37l-79-259c-7-22-13-37-20-43-7-7-19-10-37-10l-248 0c-5 0-9-2-11-5-2-3-2-7 0-12 4-13 18-20 41-20l264 0c5 0 10 2 16 5 5 3 8 6 10 11l85 282c2 5 2 10 2 17 7-3 13-7 17-13z m-304 0c-1-3-1-5 0-7 1-1 3-2 6-2l174 0c2 0 4 1 7 2 2 2 4 4 5 7l6 18c0 3 0 5-1 7-1 1-3 2-6 2l-173 0c-3 0-5-1-8-2-2-2-4-4-4-7z m-24-73c-1-3-1-5 0-7 2-2 3-2 6-2l174 0c2 0 5 0 7 2 3 2 4 4 5 7l6 18c1 2 0 5-1 6-1 2-3 3-5 3l-174 0c-3 0-5-1-7-3-3-1-4-4-5-6z&#34;},&#34;click&#34;:&#34;function(gd) { \n        // is this being viewed in RStudio?\n        if (location.search == &#39;?viewer_pane=1&#39;) {\n          alert(&#39;To learn about plotly for collaboration, visit:\\n https://cpsievert.github.io/plotly_book/plot-ly-for-collaboration.html&#39;);\n        } else {\n          window.open(&#39;https://cpsievert.github.io/plotly_book/plot-ly-for-collaboration.html&#39;, &#39;_blank&#39;);\n        }\n      }&#34;}],&#34;cloud&#34;:false},&#34;highlight&#34;:{&#34;on&#34;:&#34;plotly_click&#34;,&#34;persistent&#34;:false,&#34;dynamic&#34;:false,&#34;selectize&#34;:false,&#34;opacityDim&#34;:0.2,&#34;selected&#34;:{&#34;opacity&#34;:1}},&#34;base_url&#34;:&#34;https://plot.ly&#34;},&#34;evals&#34;:[&#34;config.modeBarButtonsToAdd.0.click&#34;],&#34;jsHooks&#34;:{&#34;render&#34;:[{&#34;code&#34;:&#34;function(el, x) { var ctConfig = crosstalk.var(&#39;plotlyCrosstalkOpts&#39;).set({\&#34;on\&#34;:\&#34;plotly_click\&#34;,\&#34;persistent\&#34;:false,\&#34;dynamic\&#34;:false,\&#34;selectize\&#34;:false,\&#34;opacityDim\&#34;:0.2,\&#34;selected\&#34;:{\&#34;opacity\&#34;:1}}); }&#34;,&#34;data&#34;:null}]}}&lt;/script&gt; And now it’s done! I hope you enjoy this post.&lt;br&gt;&lt;/p&gt;
&lt;p&gt;
Don’t hesitate to follow us on Twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our YouTube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Scraping data from STATEC&#39;s public tables</title>
      <link>/post/2017-08-21-scraping-data-from-statec-s-public-tables/</link>
      <pubDate>Fri, 21 Apr 2017 06:34:55 +0200</pubDate>
      
      <guid>/post/2017-08-21-scraping-data-from-statec-s-public-tables/</guid>
      <description>&lt;p&gt;A lot of open data is available in Luxembourg’s &lt;a href=&#34;https://data.public.lu/en/&#34;&gt;open data portal&lt;/a&gt;, but sometimes, it is not very easy to download. In the video below, I give you an example of such data and show how you can use &lt;code&gt;rvest&lt;/code&gt; to get the data easily.&lt;/p&gt;
&lt;p&gt;After watching the video, take a look at the code below. This code does two things: first it scrapes the data, and then it puts the data in a tidy format for further processing.&lt;/p&gt;
&lt;iframe width=&#34;100%&#34; height=&#34;100%&#34; src=&#34;https://youtube.com/embed/902cgrdxZUc&#34; frameborder=&#34;0&#34; allowfullscreen style=&#34;max-width:100%; height:55vh;&#34;&gt;
&lt;/iframe&gt;
&lt;p&gt;To summarize the idea of the video: instead of clicking the buttons to download each year’s data (which you would have to do 15 times), it is easier to simply turn off JavaScript and then scrape the HTML version of the table. It would be possible, albeit with much more effort, to scrape the tables with JavaScript enabled, by using a tool such as &lt;a href=&#34;http://phantomjs.org/&#34;&gt;phantomjs&lt;/a&gt;. But since we have the possibility to view the table in HTML, why not take advantage of it?&lt;/p&gt;
&lt;p&gt;To scrape the data, you first need to install the &lt;code&gt;rvest&lt;/code&gt; package and then load it (let’s also load the other packages we need):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rvest)
library(dplyr)
library(tidyr)
library(purrr)
library(janitor)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, using &lt;code&gt;rvest::read_html()&lt;/code&gt;, we can download the whole html page:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;page_unemp &amp;lt;- read_html(&amp;quot;http://www.statistiques.public.lu/stat/TableViewer/tableViewHTML.aspx?ReportId=12950&amp;amp;IF_Language=eng&amp;amp;MainTheme=2&amp;amp;FldrName=3&amp;amp;RFPath=91&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, we need to extract the table from the html page, and we do this by using &lt;code&gt;rvest::html_nodes()&lt;/code&gt; and by providing this function with the name of the class of the object we’re interested in, namely, the table.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;page_unemp %&amp;gt;%
  html_nodes(&amp;quot;.b2020-datatable&amp;quot;) %&amp;gt;% .[[1]] %&amp;gt;% html_table(fill = TRUE) -&amp;gt; data_raw


head(data_raw)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                          X1                         X2      X3      X4
## 1                      Year                       Year    2001    2002
## 2             Specification                       Year    2001    2002
## 3 Grand Duchy of Luxembourg  Total employed population 180,084 182,004
## 4 Grand Duchy of Luxembourg     of which: Wage-earners 162,407 164,277
## 5 Grand Duchy of Luxembourg of which: Non-wage-earners  17,677  17,727
## 6 Grand Duchy of Luxembourg                 Unemployed   5,393   6,773
##        X5      X6      X7      X8      X9     X10     X11     X12     X13
## 1    2003    2004    2005    2006    2007    2008    2009    2010    2011
## 2    2003    2004    2005    2006    2007    2008    2009    2010    2011
## 3 183,419 186,325 187,380 192,095 197,486 202,203 204,127 207,923 214,094
## 4 165,509 168,214 169,194 174,045 179,176 183,705 185,369 188,983 194,893
## 5  17,910  18,111  18,186  18,050  18,310  18,498  18,758  18,940  19,201
## 6   8,359   9,426  10,653  10,297   9,670  11,496  14,816  15,567  16,159
##       X14     X15     X16     X17      X18
## 1    2012    2013    2014    2015     2016
## 2    2012    2013    2014    2015 Measures
## 3 219,168 223,407 228,423 233,130  236,100
## 4 199,741 203,535 208,238 212,530  215,430
## 5  19,427  19,872  20,185  20,600   20,670
## 6  16,963  19,287  19,362  18,806   18,185&lt;/code&gt;&lt;/pre&gt;
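&lt;p&gt;To see what the class selector does without querying the STATEC server, here is a minimal sketch on a hand-written HTML snippet. The snippet and the names &lt;code&gt;toy_page&lt;/code&gt; and &lt;code&gt;toy_table&lt;/code&gt; are made up for illustration; only the class name &lt;code&gt;b2020-datatable&lt;/code&gt; and the cell values come from the page above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rvest)

# Hand-written HTML standing in for the downloaded page (illustration only)
toy_page &amp;lt;- read_html(&#39;&amp;lt;table class=&amp;quot;b2020-datatable&amp;quot;&amp;gt;
  &amp;lt;tr&amp;gt;&amp;lt;td&amp;gt;Year&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;2001&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;2002&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;
  &amp;lt;tr&amp;gt;&amp;lt;td&amp;gt;Unemployed&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;5,393&amp;lt;/td&amp;gt;&amp;lt;td&amp;gt;6,773&amp;lt;/td&amp;gt;&amp;lt;/tr&amp;gt;
&amp;lt;/table&amp;gt;&#39;)

# The leading dot in the selector means &amp;quot;select by class&amp;quot;;
# .[[1]] keeps the first matching table before converting it
toy_page %&amp;gt;%
  html_nodes(&amp;quot;.b2020-datatable&amp;quot;) %&amp;gt;%
  .[[1]] %&amp;gt;%
  html_table() -&amp;gt; toy_table

toy_table&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the rows only contain plain cells and no header cells, the columns get generic names (&lt;code&gt;X1&lt;/code&gt;, &lt;code&gt;X2&lt;/code&gt;, …), just like in the real table above.&lt;/p&gt;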
&lt;p&gt;As you can see, we got the data in quite a nice format, but it still needs to be cleaned a bit. Let’s do this.&lt;/p&gt;
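&lt;p&gt;First, let’s use the second row as the header of the data set and then remove the first two rows. Note that in the second row the last column is labelled “Measures” rather than 2016, so this name is carried over:&lt;/p&gt;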
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;colnames(data_raw) &amp;lt;- data_raw[2, ]
colnames(data_raw)[1:2] &amp;lt;- c(&amp;quot;division&amp;quot;, &amp;quot;variable&amp;quot;)
data_raw &amp;lt;- data_raw[-c(1,2), ]
head(data_raw)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                    division                   variable    2001    2002
## 3 Grand Duchy of Luxembourg  Total employed population 180,084 182,004
## 4 Grand Duchy of Luxembourg     of which: Wage-earners 162,407 164,277
## 5 Grand Duchy of Luxembourg of which: Non-wage-earners  17,677  17,727
## 6 Grand Duchy of Luxembourg                 Unemployed   5,393   6,773
## 7 Grand Duchy of Luxembourg          Active population 185,477 188,777
## 8 Grand Duchy of Luxembourg   Unemployment rate (in %)    2.91    3.59
##      2003    2004    2005    2006    2007    2008    2009    2010    2011
## 3 183,419 186,325 187,380 192,095 197,486 202,203 204,127 207,923 214,094
## 4 165,509 168,214 169,194 174,045 179,176 183,705 185,369 188,983 194,893
## 5  17,910  18,111  18,186  18,050  18,310  18,498  18,758  18,940  19,201
## 6   8,359   9,426  10,653  10,297   9,670  11,496  14,816  15,567  16,159
## 7 191,778 195,751 198,033 202,392 207,156 213,699 218,943 223,490 230,253
## 8    4.36    4.82    5.38    5.09    4.67    5.38    6.77    6.97    7.02
##      2012    2013    2014    2015 Measures
## 3 219,168 223,407 228,423 233,130  236,100
## 4 199,741 203,535 208,238 212,530  215,430
## 5  19,427  19,872  20,185  20,600   20,670
## 6  16,963  19,287  19,362  18,806   18,185
## 7 236,131 242,694 247,785 251,936  254,285
## 8    7.18    7.95    7.81    7.46     7.15&lt;/code&gt;&lt;/pre&gt;
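&lt;p&gt;In the output above, the last column ends up being named &lt;code&gt;Measures&lt;/code&gt; rather than &lt;code&gt;2016&lt;/code&gt;, because that is the label found in row 2. A possible variant, sketched here on a toy data frame, takes the names from row 1 instead so that every year column gets a four-digit name. The name &lt;code&gt;toy_raw&lt;/code&gt; is made up, the cells are copied from the printed table, and preferring row 1 is an assumption about what is wanted, not what the code above does.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Toy version of the raw table (a few cells copied from the output above)
toy_raw &amp;lt;- data.frame(
  X1 = c(&amp;quot;Year&amp;quot;, &amp;quot;Specification&amp;quot;, &amp;quot;Grand Duchy of Luxembourg&amp;quot;),
  X2 = c(&amp;quot;Year&amp;quot;, &amp;quot;Year&amp;quot;, &amp;quot;Unemployed&amp;quot;),
  X3 = c(&amp;quot;2015&amp;quot;, &amp;quot;2015&amp;quot;, &amp;quot;18,806&amp;quot;),
  X4 = c(&amp;quot;2016&amp;quot;, &amp;quot;Measures&amp;quot;, &amp;quot;18,185&amp;quot;),
  stringsAsFactors = FALSE
)

# Assumption: the year labels in row 1 are the ones we want as column names
colnames(toy_raw) &amp;lt;- unlist(toy_raw[1, ], use.names = FALSE)
colnames(toy_raw)[1:2] &amp;lt;- c(&amp;quot;division&amp;quot;, &amp;quot;variable&amp;quot;)

# Drop the two header rows, as in the code above
toy_raw &amp;lt;- toy_raw[-c(1, 2), ]

colnames(toy_raw)&lt;/code&gt;&lt;/pre&gt;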
&lt;p&gt;This is starting to look nice, but we need to replace the “,” with “.” and then convert the columns to numeric.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data_raw %&amp;gt;%
  map_df(function(x)(gsub(&amp;quot;,&amp;quot;, &amp;quot;.&amp;quot;, x = x))) %&amp;gt;%
  mutate_at(vars(matches(&amp;quot;\\d{4}&amp;quot;)), as.numeric
            ) -&amp;gt; clean_unemp

head(clean_unemp)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 18
##   division    variable    `2001` `2002` `2003` `2004` `2005` `2006` `2007`
##   &amp;lt;chr&amp;gt;       &amp;lt;chr&amp;gt;        &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;  &amp;lt;dbl&amp;gt;
## 1 Grand Duch… Total empl… 180    182    183    186    187    192    197   
## 2 Grand Duch… of which: … 162    164    166    168    169    174    179   
## 3 Grand Duch… of which: …  17.7   17.7   17.9   18.1   18.2   18.0   18.3 
## 4 Grand Duch… Unemployed    5.39   6.77   8.36   9.43  10.7   10.3    9.67
## 5 Grand Duch… Active pop… 185    189    192    196    198    202    207   
## 6 Grand Duch… Unemployme…   2.91   3.59   4.36   4.82   5.38   5.09   4.67
## # ... with 9 more variables: `2008` &amp;lt;dbl&amp;gt;, `2009` &amp;lt;dbl&amp;gt;, `2010` &amp;lt;dbl&amp;gt;,
## #   `2011` &amp;lt;dbl&amp;gt;, `2012` &amp;lt;dbl&amp;gt;, `2013` &amp;lt;dbl&amp;gt;, `2014` &amp;lt;dbl&amp;gt;, `2015` &amp;lt;dbl&amp;gt;,
## #   Measures &amp;lt;chr&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This line: &lt;code&gt;map_df(function(x)(gsub(&amp;quot;,&amp;quot;, &amp;quot;.&amp;quot;, x = x)))&lt;/code&gt; calls &lt;code&gt;purrr::map_df()&lt;/code&gt;, which maps a function to each column of a data frame. The function in question is &lt;code&gt;function(x)(gsub(&amp;quot;,&amp;quot;, &amp;quot;.&amp;quot;, x = x))&lt;/code&gt;, which is an anonymous function (meaning it does not have a name) wrapped around &lt;code&gt;gsub&lt;/code&gt;. This function looks for the string “,” and replaces it with “.” in a single column of the data frame. But because we’re mapping this function to all the columns of the data frame with &lt;code&gt;purrr::map_df()&lt;/code&gt;, this substitution happens in each column. We’re not done yet, because these columns are still holding characters. We need to convert each column to a numeric vector and this is what happens in the next line, &lt;code&gt;mutate_at(vars(matches(&amp;quot;\\d{4}&amp;quot;)), as.numeric)&lt;/code&gt;. Each column whose name contains four consecutive digits (hence the &lt;code&gt;&amp;quot;\\d{4}&amp;quot;&lt;/code&gt;) is converted to numeric with &lt;code&gt;dplyr::mutate_at()&lt;/code&gt;. The last column, named &lt;code&gt;Measures&lt;/code&gt;, does not match this pattern and therefore stays a character column.&lt;/p&gt;
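&lt;p&gt;One thing to keep in mind: in the table above the comma looks like a thousands separator (180,084 employed people), so after replacing it with a period the counts are expressed in thousands, while the unemployment rate is unchanged. The sketch below reproduces the two steps on a toy data frame and, as a variant, removes the comma instead. The names &lt;code&gt;toy&lt;/code&gt;, &lt;code&gt;toy_dot&lt;/code&gt; and &lt;code&gt;toy_plain&lt;/code&gt; are made up, and reading the comma as a thousands separator is an assumption.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(purrr)

# Toy data with two values copied from the table above
toy &amp;lt;- data.frame(
  variable = c(&amp;quot;Unemployed&amp;quot;, &amp;quot;Unemployment rate (in %)&amp;quot;),
  `2001` = c(&amp;quot;5,393&amp;quot;, &amp;quot;2.91&amp;quot;),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Same steps as above: the comma becomes a period, then year columns become numeric
toy %&amp;gt;%
  map_df(function(x) gsub(&amp;quot;,&amp;quot;, &amp;quot;.&amp;quot;, x = x)) %&amp;gt;%
  mutate_at(vars(matches(&amp;quot;\\d{4}&amp;quot;)), as.numeric) -&amp;gt; toy_dot

# Variant (assumption: the comma is a thousands separator): drop it instead
toy %&amp;gt;%
  map_df(function(x) gsub(&amp;quot;,&amp;quot;, &amp;quot;&amp;quot;, x = x)) %&amp;gt;%
  mutate_at(vars(matches(&amp;quot;\\d{4}&amp;quot;)), as.numeric) -&amp;gt; toy_plain

toy_dot$`2001`    # 5.393 and 2.91
toy_plain$`2001`  # 5393 and 2.91&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which version is preferable depends on whether you want the counts in units or in thousands; either way, the rate column is left untouched because it contains no comma.&lt;/p&gt;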
&lt;p&gt;Now, one last step to really have the data in a nice format:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;clean_unemp %&amp;gt;% 
    gather(key=year, value, -division, -variable) %&amp;gt;%
    spread(variable, value) %&amp;gt;%
    clean_names(
           ) -&amp;gt; clean_unemp

head(clean_unemp)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 6 x 8
##   division year  active_population of_which_non_wage_e… of_which_wage_ear…
##   &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt;             &amp;lt;chr&amp;gt;                &amp;lt;chr&amp;gt;             
## 1 Beaufort 2001  688               85                   568               
## 2 Beaufort 2002  742               85                   631               
## 3 Beaufort 2003  773               85                   648               
## 4 Beaufort 2004  828               80                   706               
## 5 Beaufort 2005  866               96                   719               
## 6 Beaufort 2006  893               87                   746               
## # ... with 3 more variables: total_employed_population &amp;lt;chr&amp;gt;,
## #   unemployed &amp;lt;chr&amp;gt;, unemployment_rate_in_percent &amp;lt;chr&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By using &lt;code&gt;tidyr::gather()&lt;/code&gt; and then &lt;code&gt;tidyr::spread()&lt;/code&gt; we get a nice data set where each column is a variable and each row is an observation. I advise you to run the above code line by line and try to understand what each function does. We finish by cleaning the names of the variables with &lt;code&gt;janitor::clean_names()&lt;/code&gt; and that’s it.&lt;/p&gt;
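&lt;p&gt;If you want to follow the reshaping on something smaller, here is a minimal sketch on a toy table with two variables and two years. The names &lt;code&gt;toy_wide&lt;/code&gt; and &lt;code&gt;toy_tidy&lt;/code&gt; are made up; the Beaufort figures are copied from the output above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(tidyr)
library(janitor)

# Toy wide table: one row per variable, one column per year
toy_wide &amp;lt;- data.frame(
  division = c(&amp;quot;Beaufort&amp;quot;, &amp;quot;Beaufort&amp;quot;),
  variable = c(&amp;quot;Active population&amp;quot;, &amp;quot;of which: Wage-earners&amp;quot;),
  `2001` = c(688, 568),
  `2002` = c(742, 631),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# gather(): one row per division, variable and year
# spread(): one column per variable
# clean_names(): turn the variable labels into snake_case column names
toy_wide %&amp;gt;%
  gather(key = year, value, -division, -variable) %&amp;gt;%
  spread(variable, value) %&amp;gt;%
  clean_names() -&amp;gt; toy_tidy

toy_tidy&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result has one row per division and year, with the columns &lt;code&gt;active_population&lt;/code&gt; and &lt;code&gt;of_which_wage_earners&lt;/code&gt;. In tidyr 1.0.0 and later, &lt;code&gt;pivot_longer()&lt;/code&gt; and &lt;code&gt;pivot_wider()&lt;/code&gt; are the recommended replacements for &lt;code&gt;gather()&lt;/code&gt; and &lt;code&gt;spread()&lt;/code&gt;, which still work.&lt;/p&gt;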
&lt;p&gt;
Don’t hesitate to follow us on Twitter &lt;a href=&#34;https://twitter.com/rdata_lu&#34; target=&#34;_blank&#34;&gt;&lt;span class=&#34;citation&#34;&gt;@rdata_lu&lt;/span&gt;&lt;/a&gt; &lt;!-- or &lt;a href=&#34;https://twitter.com/brodriguesco&#34;&gt;@brodriguesco&lt;/a&gt; --&gt; and to &lt;a href=&#34;https://www.youtube.com/channel/UCbazvBnJd7CJ4WnTL6BI6qw?sub_confirmation=1&#34; target=&#34;_blank&#34;&gt;subscribe&lt;/a&gt; to our YouTube channel. &lt;br&gt; You can also contact us if you have any comments or suggestions. See you for the next post!
&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
