
How to get your very own RStudio Server and Shiny Server with DigitalOcean


(This article was first published on Dean Attali's R Blog, and kindly contributed to R-bloggers)

If you've always wanted to have an RStudio Server of your own so that you can access R from anywhere, or your own Shiny Server to host your awesome shiny apps, DigitalOcean (DO) can help you get there easily.

DigitalOcean provides virtual private servers (they call each server a droplet), which means that you can pay $5/month to have your own server "in the cloud" that you can access from anywhere and host anything on. Check out my DO droplet to see it in action! Use my referral link to get $10 in credits, which is enough to give you a private server for your first 2 months.

I only found out about DO a couple of months ago when I asked my supervisor, Jenny Bryan, if there was a way for me to do some R-ing when I'm away from my machine. She told me that she doesn't have a solution for me, but that I should check out DigitalOcean, so I did. And it turns out that it's very convenient for hosting my own RStudio Server and anything else I'd like to host, and very affordable even for my student self. :)

This post will cover how to set up a brand new DO droplet (remember: droplet = your machine in the cloud) from scratch: installing R, RStudio Server, Shiny Server, and a few other helpful features. The tutorial might seem lengthy, but it's actually very simple; I'm just breaking every step down into fine detail.


Step 1: Sign up to DigitalOcean

Go to DigitalOcean (use this referral link to get 2 months!) and sign up. Registration is quick and painless, but I think in order to verify your account (and to get your $10) you have to provide credit card details. If you go to your Billing page hopefully you will see the $10 credit.

Step 2: Create a new droplet

Now let's claim one of DO's machines as our own! It's so simple that you definitely don't need my instructions, just click on the big "Create Droplet" button and choose your settings. I chose the smallest/weakest machine ($5/month plan) and it's good enough for me. I also chose San Francisco because it's the closest to me, though it really wouldn't make much of a noticeable difference where the server is located. For OS, I chose to go with the default Ubuntu 14.04 x64. I highly recommend you add an SSH key at the last step if you know how to do that. If not, either read up on it or just proceed without an SSH key.

Note: all subsequent steps assume that you are also using the weakest server possible with Ubuntu 14.04 x64. If you chose different settings, the general instructions will still apply but some of the specific commands/URLs might need to change.

Even though you probably don't need it, here's a short GIF showing me creating a new droplet: Create droplet

Step 3: Log in to your very own shiny new server

Once the droplet is ready (it can take a few minutes), you'll be redirected to a page that shows information about the new droplet, including its IP. For the rest of this post I'll use the placeholder IP address 123.456.1.2; remember to substitute your droplet's actual IP wherever you see it.

One option to log into your droplet is through the "Access" tab on the page you were redirected to, but it's slow and ugly, so I prefer logging in from my own machine. If you're on a Unix machine, you can just use ssh root@123.456.1.2. I'm on Windows, so I use PuTTY to SSH ("log in") into other machines. Use the IP that you see on the page, with the username root. If you used an SSH key you don't need to provide a password; otherwise, a password was sent to your email.

You should be greeted with a welcome message and some stats about the server that look like this: Login screen

Step 4: Ensure you don't shoot yourself in the foot

The first thing I like to do is add a non-root user so that we won't accidentally do something stupid as "root". Let's add a user named "dean" and give him admin power. You will be asked to give some information for this new user.

adduser dean
gpasswd -a dean sudo

From now on I will generally log into this server as "dean" instead of "root". If I need to run any commands requiring admin privileges, I just prepend the command with sudo. Let's say goodbye to "root" and switch to "dean".

su - dean

Step 5: See your droplet in a browser

Right now if you try to visit http://123.456.1.2 in a browser, you'll get a "webpage not available" error. Let's make our private server serve a webpage there instead, as a nice visual reward for getting this far. Install nginx:

sudo apt-get update
sudo apt-get install nginx

Now if you visit http://123.456.1.2, you should see a welcome message to nginx. Instant gratification!

Quick nginx references

The default file that is served is located at /usr/share/nginx/html/index.html, so if you want to change what that webpage is showing, just edit that file with sudo vim /usr/share/nginx/html/index.html. For example, I just put a bit of text redirecting to other places in my index page. The configuration file is located at /etc/nginx/nginx.conf.

When you edit an HTML file, you will be able to see the changes immediately when you refresh the page, but if you make configuration changes, you need to restart nginx. In the future, you can stop/start/restart nginx with

sudo service nginx stop
sudo service nginx start
sudo service nginx restart

Step 6: Install R

To ensure we get the most recent version of R, we first need to add the CRAN repository for trusty to our sources.list:

sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list'

Now add the public keys:

gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -

Now we're ready to install R

sudo apt-get update
sudo apt-get install r-base

You should now be able to run R and hopefully be greeted with a message containing the latest R version.

R

R welcome

Now you need to quit R (quit()) because there are a couple small things to adjust on the server so that R will work well.

If you also chose the weakest machine type like I did, many packages won't be able to install because there isn't enough memory. We need to add 1G of swap space:

sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
sudo /sbin/mkswap /var/swap.1
sudo /sbin/swapon /var/swap.1
sudo sh -c 'echo "/var/swap.1 swap swap defaults 0 0 " >> /etc/fstab'

Now installing most packages will work, but before installing anything else, I always like having devtools available so that I can install GitHub packages. devtools currently can't be installed (and, annoyingly enough, it won't throw any errors; it will simply not install) because it needs libcurl support. So let's install that first:

sudo apt-get install libcurl4-gnutls-dev

Ok, now we can start installing R packages, both from CRAN and from GitHub!

R
install.packages("devtools", repos='http://cran.rstudio.com/')
devtools::install_github("daattali/shinyjs")

Feel free to play around in the R console now.

Step 7: Install RStudio Server

Great, R is working, but RStudio has become such an integral part of our lives that we can't do any R without it!

Quit R (quit()) and install some pre-requisites:

sudo apt-get install libapparmor1 gdebi-core

Download the latest RStudio Server - consult RStudio Downloads page to get the URL for the latest version. Then install the file you downloaded. These next two lines are using the latest version as of writing this post.

wget http://download2.rstudio.org/rstudio-server-0.98.1103-amd64.deb
sudo gdebi rstudio-server-0.98.1103-amd64.deb

Done! By default, RStudio uses port 8787, so to access RStudio go to http://123.456.1.2:8787 and you should be greeted with an RStudio login page. (If you forgot what your droplet's IP is, you can find out by running hostname -I)

RStudio

You can log in to RStudio with any user/password that are available on the droplet. For example, I would log in with username dean and my password. If you want to let your friend Joe have access to your RStudio, you can create a new user for them with adduser joe.

Go ahead and play around in R a bit, to make sure it works fine. I usually like to try out a ggplot2 function, to ensure that graphics are working properly.
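
For example, a quick check along these lines (a sketch only; any plot will do, and ggplot2 is just one option) confirms that both package installation and graphics work:

# quick graphics sanity check from within RStudio
install.packages("ggplot2", repos = "http://cran.rstudio.com/")
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()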

Step 8: Install Shiny Server

You can safely skip this step if you don't use shiny and aren't interested in being able to host Shiny apps yourself.

To install Shiny Server, first install the shiny package:

sudo su - -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""

Just like when we installed RStudio, again we need to get the URL of the latest Shiny Server from the Shiny Server downloads page, download the file, and then install it. These are the two commands using the version that is most up-to-date right now:

wget http://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.3.0.403-amd64.deb
sudo gdebi shiny-server-1.3.0.403-amd64.deb

Shiny Server is now installed and running. Assuming there were no problems, if you go to http://123.456.1.2:3838/ you should see Shiny Server's default homepage, which includes some instructions and two Shiny apps:

Shiny Server

If you see an error in the bottom Shiny app, it's probably because you don't have the rmarkdown R package installed (the instructions on the default Shiny Server page mention this); a short install snippet appears after the list below. After installing rmarkdown, the bottom Shiny app should work as well. I suggest you read through the instructions page at http://123.456.1.2:3838/. A few important points as reference:

  • Shiny Server log is at /var/log/shiny-server.log.
  • The default Shiny Server homepage you're seeing is located at /srv/shiny-server/index.html - you can edit it or remove it.
  • Any Shiny app directory that you place under /srv/shiny-server/ will be served as a Shiny app. For example, there is a default app at /srv/shiny-server/sample-apps/hello/, which means you can run the app by going to http://123.456.1.2:3838/sample-apps/hello/.
  • The config file for Shiny Server is at /etc/shiny-server/shiny-server.conf.
  • Important! If you look in the config file, you will see that by default, apps are run as the user "shiny". It's important to understand which user is running an app, because things like file permissions and personal R libraries differ between users, and this can cause some headaches until you realize the problem is which user the app runs as. Just keep that in mind.
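
As mentioned above the list, the rmarkdown package may need to be installed for the bottom demo app to work. A minimal sketch of one way to do that (assuming you want the package visible to the "shiny" user) is to install it system-wide from an R session started as root:

# run inside an R session started as root (e.g. via `sudo R`) so the package lands in the
# system-wide library that the "shiny" user can read
install.packages("rmarkdown", repos = "http://cran.rstudio.com/")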

Step 9: Make pretty URLs for RStudio Server and Shiny Server

This is optional and a little more advanced. You might have noticed that to access both RStudio and Shiny Server, you have to remember weird port numbers (:8787 and :3838). Not only is it hard and ugly to remember, but some workplace environments often block access to those ports, which means that many people/places won't be able to access these pages. The solution is to use a reverse proxy, so that nginx will listen on port 80 (default HTTP port) at the URL /shiny and will internally redirect that to port 3838. Same for RStudio - we can have nginx listen at /rstudio and redirect it to port 8787. This is why my Shiny apps can be reached at daattali.com/shiny/ which is an easy URL to type, but also at daattali.com:3838.

You need to edit the nginx config file /etc/nginx/sites-enabled/default:

sudo vim /etc/nginx/sites-enabled/default

Add the following lines right after the line that reads server_name localhost;:

location /shiny/ {
  proxy_pass http://127.0.0.1:3838/;
}

location /rstudio/ {
  proxy_pass http://127.0.0.1:8787/;
}

Since we changed the nginx config, we need to restart nginx for it to take effect.

sudo service nginx restart

Now you should be able to go to http://123.456.1.2/shiny/ or http://123.456.1.2/rstudio/. Much better!

Step 10: Custom domain name

If you have a custom domain that you want to host your droplet on, that's not too hard to set up. For example, my main droplet's IP is 198.199.117.12, but I also purchased the domain daattali.com so that it would be able to host my droplet with a much simpler URL.

There are two main steps involved: you need to configure your domain on DO, and you need to change your domain's nameservers at your registrar so they point to DO.

Configure your domain

In the DO website, click on "DNS" at the top, and then we want to add a domain. Select your droplet from the appropriate input box, and put in your domain name in the URL field. Do not add "www" to the beginning of your domain. Then click on "Create Domain".

You will now get to a page where you can enter more details.

  • The "A" row should have @ in the first box and the droplet's IP in the second box.
  • In the three "NS" fields, you should have the values ns1.digitalocean.com., ns2.digitalocean.com., ns3.digitalocean.com.
  • You also need to add a "CNAME" record, so click on "Add Record", choose "CNAME", enter www in the first box, and your domain's name in the second box. You need to append a dot (.) to the end of the domain name.

Here is what my domain settings look like, make sure yours look similar (note the dot suffix on all the domain names):

DigitalOcean DNS settings

Change your domain servers to DigitalOcean

You also need to configure your domain registrar by adding the 3 nameservers ns1.digitalocean.com, ns2.digitalocean.com, ns3.digitalocean.com. It's fairly simple, but the exact instructions are different based on your registrar, so here is a guide with all the common registrars and how to do this step with each of them.

I use Namecheap, so this is what my domain configuration needs to look like: Namecheap domain servers

And that's it! Now you have a nicely configured private web server with your very own RStudio and Shiny Server, and you can do anything else you'd like on it.

Resources

This is a list of the main blog/StackOverflow/random posts I had to consult while getting all this to work.

Disclaimer

I'm not a sysadmin and a lot of this stuff was learned very quickly from random Googling, so it's very possible that some steps here are not the very best way of performing some tasks. If anyone has any comments on anything in this document, I'd love to hear about it!


Exchange data between R and the Google Maps API using Shiny


(This article was first published on R Video tutorial for Spatial Statistics, and kindly contributed to R-bloggers)
A couple of years ago I wrote a post about using Shiny to exchange data between the Google Maps API and R: http://r-video-tutorial.blogspot.ch/2013/07/interfacing-r-and-google-maps.html

Back then, as far as I remember, Shiny did not allow a direct exchange of data between JavaScript and R, so I had to improvise and extract the data indirectly using an external table. In other words, that work was not really good!!

The new versions of Shiny, however, feature a function to send data directly from JavaScript to R:
Shiny.onInputChange

This function can be used to communicate any data from the Google Maps API to R. Starting from this, I thought about creating an example where I use the Google Maps API to draw a rectangle on the map, send the coordinates of the rectangle to R, create a grid of random points inside it, and then plot them as markers on the map. This way I can exchange data back and forth between the two platforms.

For this experiment we do not need a ui.R file, but a custom HTML page. Thus we need to create a folder named "www" in the shiny-server folder and add an index.html file.
Let's look at the HTML and javascript code for this page:

 <!DOCTYPE html>  
<html>
<head>
<title>TEST</title>

<!--METADATA-->
<meta name="author" content="Fabio Veronesi">
<meta name="copyright" content="©Fabio Veronesi">
<meta http-equiv="Content-Language" content="en-gb">
<meta charset="utf-8"/>


<style type="text/css">

html { height: 100% }
body { height: 100%; margin: 0; padding: 0 }
#map-canvas { height: 100%; width:100% }

</style>




<script type="text/javascript"
src="https://maps.googleapis.com/maps/api/js?&sensor=false&language=en">
</script>

<script type="text/javascript" src="http://google-maps-utility-library-v3.googlecode.com/svn/tags/markerclusterer/1.0/src/markerclusterer.js"></script>

<script src="https://maps.googleapis.com/maps/api/js?v=3.exp&signed_in=true&libraries=drawing"></script>




<script type="text/javascript">
//We need to create the variables map and cluster before the function
var cluster = null;
var map = null;

//This function takes the variable test, which is the json we will create with R and creates markers from it
function Cities_Markers() {
if (cluster) {
cluster.clearMarkers();
}
var Gmarkers = [];
var infowindow = new google.maps.InfoWindow({ maxWidth: 500,maxHeight:500 });

for (var i = 0; i < test.length; i++) {
var lat = test[i][2]
var lng = test[i][1]
var marker = new google.maps.Marker({
position: new google.maps.LatLng(lat, lng),
title: 'test',
map: map
});

google.maps.event.addListener(marker, 'click', (function(marker, i) {
return function() {
infowindow.setContent('test');
infowindow.open(map, marker);
}
})(marker, i));
Gmarkers.push(marker);
};
cluster = new MarkerClusterer(map,Gmarkers);
$("div#field_name").text("Showing Cities");
};


//Initialize the map
function initialize() {
var mapOptions = {
center: new google.maps.LatLng(54.12, -2.20),
zoom: 5
};

map = new google.maps.Map(document.getElementById('map-canvas'),mapOptions);


//This is the Drawing manager of the Google Maps API. This is the standard code you can find here:https://developers.google.com/maps/documentation/javascript/drawinglayer
var drawingManager = new google.maps.drawing.DrawingManager({
drawingMode: google.maps.drawing.OverlayType.MARKER,
drawingControl: true,
drawingControlOptions: {
position: google.maps.ControlPosition.TOP_CENTER,
drawingModes: [
google.maps.drawing.OverlayType.RECTANGLE
]
},

rectangleOptions: {
fillOpacity: 0,
strokeWeight: 1,
clickable: true,
editable: false,
zIndex: 1
}

});

//This function listens to the drawing manager; after you draw the rectangle it extracts the coordinates of the NE and SW corners
google.maps.event.addListener(drawingManager, 'rectanglecomplete', function(rectangle) {
var ne = rectangle.getBounds().getNorthEast();
var sw = rectangle.getBounds().getSouthWest();

//The following code is used to import the coordinates of the NE and SW corners of the rectangle into R
Shiny.onInputChange("NE1", ne.lat());
Shiny.onInputChange("NE2", ne.lng());
Shiny.onInputChange("SW1", sw.lat());
Shiny.onInputChange("SW2", sw.lng());

});



drawingManager.setMap(map);

}



google.maps.event.addDomListener(window, 'load', initialize);
</script>





<script type="application/shiny-singletons"></script>
<script type="application/html-dependencies">json2[2014.02.04];jquery[1.11.0];shiny[0.11.1];bootstrap[3.3.1]</script>
<script src="shared/json2-min.js"></script>
<script src="shared/jquery.min.js"></script>
<link href="shared/shiny.css" rel="stylesheet" />
<script src="shared/shiny.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="shared/bootstrap/css/bootstrap.min.css" rel="stylesheet" />
<script src="shared/bootstrap/js/bootstrap.min.js"></script>
<script src="shared/bootstrap/shim/html5shiv.min.js"></script>
<script src="shared/bootstrap/shim/respond.min.js"></script>


</head>


<body>

<div id="json" class="shiny-html-output"></div>
<div id="map-canvas"></div>


</body>
</html>

As you know an HTML page has two main elements: head and body.
In the head we put the page's style, the metadata and the JavaScript code. In the body we put the elements that will be visible to the user.

After some basic metadata, such as title, author and copyright, we find a style section with the style for the Google Maps API. This is standard code that you can find here, where they explain how to create a simple page with Google Maps: Getting Started

Below that we have some script calls where we import the elements needed to run the rest of the code. We have the scripts to run the Google Maps API itself, plus the script for the drawing manager, which is used to draw a rectangle onto the map, and the script that creates clusters from the markers, since otherwise we would have too many overlapping icons.

Afterward comes the core script of the Google Maps API; the comments mark the subdivisions I made.

First of all we need to declare two variables, map and cluster, as null. This is because these two variables are used in the subsequent functions, and if we do not declare them the functions will not work. Then we can define a function, which I call Cities_Markers() because I took the code directly from Audioramio. This function takes a json, stored in a variable called test, loops through it and creates a marker for each pair of coordinates in the json. Then it clusters the markers.

Afterward there is the code to initialize the map and the drawing manager. The code for the drawing manager can be found here: Drawing Manager

The crucial part of the whole section is the listener function. As soon as you draw a rectangle on the map, this code extracts the coordinates of the NE and SW corners and stores them in two variables. Then we can use the function Shiny.onInputChange to transfer these variables from JavaScript to R.

The final step, to allow communication back from R to JavaScript, is to create a div element in the body of the page with the class "shiny-html-output" and the ID "json". The ID is what allows Shiny to identify this element.

Now we can look at the server.R script:

 # server.R  
library(sp)
library(rjson)

shinyServer(function(input, output, session) {

output$json <- reactive({
if(length(input$NE1)>0){

#From the Google Maps API we have 4 inputs with the coordinates of the NE and SW corners
#using these coordinates we can create a polygon
pol <- Polygon(coords=matrix(c(input$NE2,input$NE1,input$NE2,input$SW1,input$SW2,input$SW1,input$SW2,input$NE1),ncol=2,byrow=T))
polygon <- SpatialPolygons(list(Polygons(list(pol),ID=1)))


#Then we can use the polygon to create 100 points randomly
grid <- spsample(polygon,n=100,type="random")

#In order to use the function toJSON we first need to create a list
lis <- list()
for(i in 1:100){
lis[[i]] <- list(i,grid$x[i],grid$y[i])
}

#This code creates the variable test directly in javascript, to export the grid to the Google Maps API
#I have taken this part from:http://stackoverflow.com/questions/26719334/passing-json-data-to-a-javascript-object-with-shiny
paste('<script>test=',
RJSONIO::toJSON(lis),
';Cities_Markers();', # call the JS function that draws the markers
'</script>')

}
})
})

For this script we need two packages: sp and rjson.
The first is needed to create the polygon and the grid, the second to create the json that we need to export to the webpage.

Shiny communicates with the page using the IDs of the elements in the HTML body. In this case we created a div called "json", and in Shiny we use output$json to send code to this element.
Within the reactive function I first inserted an if statement to prevent the script from running before a rectangle has been drawn. As soon as the user draws a rectangle on the map, the four coordinates are transmitted to R and used to create a polygon. Then we can create a random grid within the polygon area with the function spsample.

Subsequently we need to create a list with the coordinates of the points, because the function toJSON takes a list as its main argument.
The crucial part of the R script is the final paste() call. Here we take the list of coordinates, transform it into JSON, and embed it into the div element as HTML code.
This part was taken from this post: http://stackoverflow.com/questions/26719334/passing-json-data-to-a-javascript-object-with-shiny

This allows R to transmit its results to the Google Maps API as a variable named test, which contains the JSON. As you can see from the code, right after the JSON we run the function Cities_Markers(), which takes the variable test and creates markers on the map.


Conclusion
This way we have demonstrated how to exchange data back and forth between R and the Google Maps API using Shiny.
The page is publicly accessible at this address, from an Amazon Ubuntu machine:
Point Grid


Git pushing Shiny Apps with Docker & Dokku


(This article was first published on Flavio Barros » r-bloggers, and kindly contributed to R-bloggers)

In this post I will show you how to deploy Shiny apps easily with a simple git push. But what's a git push? I'm referring to the git command used with remote repositories. With this command you can deploy apps easily on a PaaS (Platform as a Service) like Heroku. If you've never heard of Heroku or know nothing about PaaS, I will show you what it is and how we can use a similar resource to easily deploy a Shiny app on Digital Ocean with Docker.

1.Heroku

Anyone who has worked with the web knows, or has at least heard, about Heroku. Heroku is a PaaS, an acronym for Platform as a Service. The idea behind Heroku is that the developer does not need to worry about the problems related to deploying their software: they simply develop, add a few files to the project, git push to Heroku, and Heroku takes care of the rest.

Heroku is an excellent service: it can scale to large apps and is easy to learn and use, BUT it can be very expensive! Just to show how easy it is to work with Heroku, I deployed an example web app made with Django; the code can be found here and you can visit the app here: http://flavio-django-blog.herokuapp.com/

 

2.Docker and Dokku

I wrote some time ago about Dockerizing a Shiny App: Dockerizing a Shiny App (read it before continuing). In fact, I did that because I was researching a method to easily deploy Shiny apps just like any regular web app (e.g. Node or Django). Some months ago I found this project, where the author claims to be able to run Shiny apps on Heroku. I tried, but was never able to make it work. Another problem was that I was looking for something I could host on my own server.

In the end, after discovering Docker, I started to use Dokku instead of Heroku. Dokku, in turn, is a kind of Heroku clone built on Docker. It works the same way as Heroku, so for web apps like those made with Django, deployment is identical. Just so you can see how it works, I installed Dokku on Digital Ocean and deployed the same Django application as before. You can check the app here: http://djangogirls.flaviobarros.net/

In my opinion Dokku is one of the best apps made with Docker. With it we can deploy multiple technologies on the very same server. In fact, you can build a cheaper version of Heroku for about US$5.00/month, as is the case with Digital Ocean. Right now I'm running a Digital Ocean server hosting this WordPress installation, two Django web apps and a Shiny app, with plenty of room for much more, all on a $10 VPS 😉

3.Git pushing the Wordcloud Shiny App

Some time ago, Dokku gained Dockerfile build support. With this feature it becomes possible to git push any app that can be built from a Dockerfile. A Dockerfile is nothing more than a recipe to build Docker images, like the Word Cloud image that I released on Docker Hub.

With this in mind, I thought: why not change the official Shiny Server Docker image to host a single Shiny app? I just needed an image that:

1) exposes port 80;

2) serves just one app;

3) can be built from a Dockerfile.

In fact, to build shiny-wordcloud, I forked rocker/shiny and implemented these features through some modifications to the Dockerfile (commits: 1, 2 and 3) and a conf file. Now I have a fully working Dockerfile that I can use to git push Shiny apps to Dokku! Next, I will show you how you can install Dokku on Digital Ocean and how I deployed the Wordcloud Shiny app to my server with a simple git push!

3.1 Git pushing to my server

Just follow the video. You can visit the app here: http://wordcloud.flaviobarros.net/

3.2 Installing Dokku on Digital Ocean

Follow this screencast. In the end you will have your server available at an IP. If you want a domain, you will have to register it (e.g. with GoDaddy) and point it to Digital Ocean's DNS. You can follow this tutorial to set up DNS on Digital Ocean. If you have any problems, let me know.

3.3 Important details

I used an SSH key that I had stored at Digital Ocean. Usually, when you spin up a Digital Ocean droplet, you get an email as soon as the process completes, letting you know the droplet's IP address and password. Although this email is convenient, there is a more secure (and faster) way of gaining access to your server without the need for email: setting up SSH keys. Follow this tutorial to get that done.

Conclusion

Recently I saw two interesting blog posts about Shiny app deployment:

1) Run Shiny app on a Ubuntu server on the Amazon Cloud

2) How to get your very own RStudio Server and Shiny Server with DigitalOcean

Compared with both of those setups, this approach has several advantages:

 – You can replicate this dokku installation on Amazon and have the same functionality.

– When you are running multiple Shiny apps on the same Shiny Server, you are using a single R instance. So, if you have more than one app deployed, your server can slow down. With this solution, each app is isolated with its own Shiny Server instance, which is more reliable.

– The deployment process is easier. Once dokku is installed you don’t need to connect to the server to deploy an app. Just use git push!

– You can deploy multiple Shiny apps and multiple web apps. You can have Shiny, WordPress, Django, etc. on the same server.

– With dokku-alt (an improved fork of Dokku) you can set up passwords to access your Shiny apps, something that is otherwise only available in Shiny Server Pro.

IMPORTANT: through any link to Digital Ocean in this post, you will get a US$10.00 credit with no commitment to keep the service. With this credit you can run a simple VPS with 512MB RAM for two months for free!



Analyzing R-Bloggers’ posts via Twitter


(This article was first published on Dean Attali's R Blog, and kindly contributed to R-bloggers)

For those who don’t know, every time a new blog post gets added to R-Bloggers, it gets a corresponding tweet by @Rbloggers, which gets seen by Rbloggers’ ~20k followers fairly fast. And every time my post gets published, I can’t help but check up on how many people gave that tweet some Twitter love, ie. “favorite”d or “retweet”ed it. It’s even more exciting than getting a Facebook “like” on a photo from Costa Rica!

Seeing all these tweets and how some tweets get much more attention than others has gotten me thinking. Are there some power users who post almost all the content, or do many blogs contribute equally? Which posts were the most shared? Which blog produces the highest quality posts consistently? Are there more posts during the weekdays than on weekends? And of course the holy grail of bloggers - is there a day when it’s better to post to get more shares?

To answer these questions, I of course turned to R. I used the twitteR package to get information about the latest 3200 tweets made by Rbloggers, Hadley’s httr to scrape each blog post to get the post’s author, and ggplot2 to visualize some cool aspects of the data. Unfortunately Twitter does not allow us to fetch any tweets older than that (if you know of a workaround, please let me know), so the data here will be looking at tweets made from September 2013 until now (mid May 2015). That’s actually a nice start date because it’s exactly when I started grad school and when I first used R. So you can think of this analysis as “R-Bloggers’ tweets since Dean’s R life started” :)

I’m going to use some terminology very loosely and interchangeably throughout this post:

  • “blog” == “author” == “contributor”
  • “tweet” == “post”
  • “successful” post == “highly shared” == “high score” == “high quality”

It’s clear that all those terms are not necessarily the same thing (for example, virality does not necessarily mean high quality), but I’ll be using them all interchangeably.

There is also an accompanying interactive document to supplement this post. That document has a few interactive plots/tables for data that is better explored interactively rather than as an image, and it also contains all the source code that was used to make the analysis and all the figures here. The source code is also available on GitHub as the raw text version of the interactive document. In this post I will not be including too much lengthy code, especially not for the plots.

Before going any further, I’d like to say that this is not intended to be a comprehensive analysis and definitely has many weaknesses. It’s just for fun. I’m not even going to be making any statistical significance tests at any point or do any quantitative analysis. Maybe titling this as “Analyzing” is wrong and should instead be “Exploring”? This post looks exclusively at data directly related to @Rbloggers tweets; I am not looking at data from any other social media or how many times the post was shared via R-Bloggers website rather than through Twitter. I’m also not looking at how much discussion (replies) a tweet generates. I wanted to include data from the number of times the “Tweet” button was pressed directly on R-Bloggers, but it looks like most older posts have 0 (maybe the button is a recent addition to R-Bloggers), so it’ll introduce an unfair bias towards new posts. And of course the biggest criticism here is that you simply can’t judge a blog post by the number of times it’s shared on Twitter, but here we go.


Data preparation

This is the boring part - get the data from Twitter, fill in missing pieces of information, clean up… Feel free to skip to the more exciting part.

Get data from Twitter

As mentioned above, I could only grab the last 3200 tweets made by @Rbloggers, which equates to all tweets since Sept 2013. For each tweet I kept several pieces of information: tweet ID, tweet date, day of the week the tweet was made, number of times tweet was favorited, number of times tweet was retweeted, tweet text, and the last URL in the tweet text. The tweet text is essentially a blog post’s title that has been truncated. I keep the last URL that appears in every tweet because that URL always points to the article on R-Bloggers. I’m only storing the date but losing the actual time, which is another weakness. Also, the dates are according to UTC, and many Rbloggers followers are in America, so it might not be the most correct.

Anyway, after authenticating with Twitter, here’s the code to get this information using twitteR:

library(twitteR)   # Twitter API access (userTimeline)
library(plyr)      # ldply
library(dplyr)     # data_frame
library(magrittr)  # %>%

MAX_TWEETS <- 3200
tweets_raw <- userTimeline('Rbloggers', n = MAX_TWEETS,
                           includeRts = FALSE, excludeReplies = TRUE)

tweets <- 
  ldply(tweets_raw, function(x) {
    data_frame(id = x$id,
               date = as.Date(x$created),
               day = weekdays(date),
               favorites = x$favoriteCount,
               retweets = x$retweetCount,
               title = x$text,
               url = x$urls %>% .[['url']] %>% tail(1)
    )
  })

rm(tweets_raw)  # being extremely memory conscious

Remember the full source code can be viewed here.

Scrape R-Bloggers to get author info

Since I had some questions about the post authors and a tweet doesn’t give that information, I resorted to scraping the R-Bloggers post linked in each tweet using httr to find the author. This part takes a bit of time to run. There were a few complications, mostly with authors whose name is their email address, which R-Bloggers attempts to hide, but here is how I accomplished this step:

library(httr)      # GET/content for fetching each post
library(XML)       # getNodeSet, xmlValue, xmlGetAttr
library(magrittr)  # %>% and %<>%
library(dplyr)     # mutate

# Get the author of a single post given an R-Bloggers post URL
get_post_author_single <- function(url) {
  if (is.null(url)) {
    return(NA)
  }

  # get author HTML node
  author_node <- 
    GET(url) %>%
    httr::content("parsed") %>%
    getNodeSet("//a[@rel='author']")
  if (author_node %>% length != 1) {
    return(NA)
  }

  # get author name
  author <- author_node %>% .[[1]] %>% xmlValue

  # r-bloggers hides email address names so grab the email a different way
  if (nchar(author) > 100 && grepl("document\\.getElementsByTagName", author)) {
    author <- author_node %>% .[[1]] %>% xmlGetAttr("title")
  }

  author  
}

# Get a list of URL --> author for a list of R-Bloggers post URLs
get_post_author <- function(urls) {
  lapply(urls, get_post_author_single) %>% unlist
}

# Add the author to each tweet.
# This will take several minutes because we're scraping r-bloggers 3200 times (don't run this frequently - we don't want to overwork our beloved R-Bloggers server)
tweets %<>% mutate(author = get_post_author(url))  

# Remove NA author (these are mostly jobs postings, plus a few blog posts that have been deleted)
tweets %<>% na.omit

The last line there removed any tweets without an author. That essentially removes all tweets that are advertising job postings and a few blog posts that have been deleted.

Clean up data

It’s time for some clean up:

  • Remove the URL and #rstats hashtag from every tweet’s title
  • Older posts all contain the text “This article was originally posted on … and kindly contributed by …” - try to remove that as well
  • Order the day factor levels in order from Monday - Sunday
  • Truncate very long author names with an ellipsis
  • Merge duplicate tweets (tweets with the same author and title that are posted within a week)

After removing duplicates and previously removing tweets about job postings, we are left with 2979 tweets (down from 3200). You can see what the data looks like here or see the code for the cleanup on that page as well.
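
As a rough idea of what the first few cleanup steps involve (a sketch only; the real code lives in the linked source document), they might look something like this:

# sketch of the title/day cleanup; assumes the `tweets` data frame built above
library(dplyr)

tweets <- tweets %>%
  mutate(
    # strip URLs and the #rstats hashtag from the tweet text
    title = gsub("http[[:graph:]]+", "", title),
    title = gsub("#rstats", "", title, ignore.case = TRUE),
    title = trimws(title),
    # order the weekday factor from Monday to Sunday
    day = factor(day, levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                 "Friday", "Saturday", "Sunday"))
  )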

Add a score metric

Now that we have almost all the info we need for the tweets, there is one thing missing. It’d be useful to have a metric for how successful a tweet is using the very little bit of information we have. This is of course very arbitrary. I chose to score a tweet’s success as a linear combination of its “# of favorites” and “# of retweets”. Since there are roughly twice as many favorites as retweets in total, retweets get twice the weight. Very simple formula :)

sum(tweets$favorites) / sum(tweets$retweets)   # result = 2.1
tweets$score <- tweets$favorites + tweets$retweets * 2

Exploration

Time for the fun stuff! I’m only going to make a few plots, you can get the data from GitHub if you want to play around with it in more depth.

Scores of all tweets

First I’d like to see a simple scatterplot showing the number of favorites and retweets for each blog post.

Tweets score

Looks like most posts are close to the (0, 0) area, with 20 favorites and 10 retweets being the maximum boundary for most. A very small fraction of tweets make it past the 40 favorites or 20 retweets.

Most successful posts

From the previous plot it seems like there are about 10 posts that are much higher up than everyone else, so let’s see what the top 10 most shared Rbloggers posts on Twitter were since Sep 2013.

| title | date | author | favorites | retweets | score |
|---|---|---|---|---|---|
| A new interactive interface for learning R online, for free | 2015.04.14 | DataCamp | 78 | 49 | 176 |
| Introducing Radiant: A shiny interface for R | 2015.05.04 | R(adiant) news | 85 | 44 | 173 |
| Choosing R or Python for data analysis? An infographic | 2015.05.12 | DataCamp | 59 | 54 | 167 |
| Why the Ban on P-Values? And What Now? | 2015.03.07 | Nicole Radziwill | 47 | 47 | 141 |
| Machine Learning in R for beginners | 2015.03.26 | DataCamp | 68 | 29 | 126 |
| Free Stanford online course on Statistical Learning (with R) starting on 19 Jan 2015 | 2014.11.22 | Yanchang Zhao | 54 | 35 | 124 |
| Four Beautiful Python, R, MATLAB, and Mathematica plots with LaTeX | 2014.12.20 | Plotly | 57 | 29 | 115 |
| In-depth introduction to machine learning in 15 hours of expert videos | 2014.09.24 | Kevin Markham | 64 | 20 | 104 |
| Learn Statistics and R online from Harvard | 2015.01.17 | David Smith | 49 | 27 | 103 |
| R Tutorial on Reading and Importing Excel Files into R | 2015.04.04 | DataCamp | 61 | 20 | 101 |

8/10 of the top 10 posts have “R” in their title… correlation or causation or random? Maybe I should start doing that too then!

Looks like the DataCamp blog is a pretty major Rbloggers contributor with 4/10 of the most tweeted posts. Which leads me perfectly into the next section.

Highest scoring authors

So as I just said, DataCamp looks like it contributes very high quality posts. I wanted to see which blogs contribute the most successful posts consistently. The following shows the authors with the highest average score per tweet.

| author | num_tweets | avg_favorites | avg_retweets | avg_score |
|---|---|---|---|---|
| R(adiant) news | 1 | 85.0 | 44.0 | 173.0 |
| Martin Schneider | 1 | 34.0 | 25.0 | 84.0 |
| Juuso Parkkinen | 1 | 34.0 | 22.0 | 78.0 |
| Tal Yarkoni | 1 | 15.0 | 25.0 | 65.0 |
| Kevin Markham | 6 | 35.8 | 13.0 | 61.8 |
| Dean Attali’s R Blog | 5 | 32.2 | 13.2 | 58.6 |
| filip Schouwenaars | 1 | 26.0 | 16.0 | 58.0 |
| Christian Groll | 1 | 13.0 | 21.0 | 55.0 |
| Josh Paulson | 1 | 23.0 | 15.0 | 53.0 |
| Kushan Shah | 1 | 33.0 | 10.0 | 53.0 |

First impression: Woo, I’m in there! :D

Posts by highest scoring authors

Now that I know which blogs have the best posts on average, I wanted to see what each of their tweets looked like.

Tweets by top-10 authors

It’ll be nice to see how these compare to all other posts. The following figure shows the scores of all tweets, and highlights the posts made by any of the top-10 authors.

Tweets by top-10 authors along with all tweets

Pretty. But it looks like the list of top 10 authors is dominated by one-hit wonders, which makes sense because it’s much easier to put a lot of effort into one single post than to constantly pump out great articles over and over again. So let’s try again seeing who has the highest average score, but only consider blogs that contributed more than one post.

| author | num_tweets | avg_favorites | avg_retweets | avg_score |
|---|---|---|---|---|
| Kevin Markham | 6 | 35.8 | 13.0 | 61.8 |
| Dean Attali’s R Blog | 5 | 32.2 | 13.2 | 58.6 |
| Matt | 2 | 27.5 | 12.0 | 51.5 |
| Bruno Rodrigues | 4 | 23.8 | 13.2 | 50.2 |
| Plotly | 7 | 25.6 | 11.6 | 48.7 |
| DataCamp | 32 | 23.3 | 12.4 | 48.0 |
| Tim Phan | 2 | 27.5 | 9.0 | 45.5 |
| Slawa Rokicki | 3 | 21.0 | 11.3 | 43.7 |
| Jan Górecki - R | 3 | 25.3 | 8.7 | 42.7 |
| Nicole Radziwill | 13 | 21.3 | 10.3 | 41.9 |

Ah, there’s DataCamp - by far more posts than the rest of us, and still a very high average score. Respect.

Who contributes the most?

I also wanted to know how many blogs contribute and how much each one contributes. R-Bloggers says on its frontpage that there are 573 blogs. According to my data, there are 420 unique authors since Sept 2013, so about 1/4 of the blogs have not posted since then. Here is the distribution of how many blog posts different blogs made:

Posts per blog

Seems like a lot of people only posted once in the past 1.5 years. That graph is actually cut off at 50 because there are a few outliers (individuals who posted way more than 50). Let’s see who these power users are, so we know who to thank for most of the content.

| author | num_tweets | avg_favorites | avg_retweets | avg_score |
|---|---|---|---|---|
| David Smith | 166 | 7.9 | 5.6 | 19.2 |
| Thinking inside the box | 118 | 1.8 | 0.9 | 3.6 |
| Joseph Rickert | 117 | 8.3 | 3.8 | 15.9 |
| xi’an | 88 | 3.1 | 1.2 | 5.6 |
| Tal Galili | 71 | 5.7 | 4.2 | 14.1 |

There you have it - the 5 people who single-handedly (or.. quintuple-handedly?) are responsible for 1/6 of the posts we’ve seen since I learned what R is.

Post success by day of week

One of my main questions was whether there is some correlation between when a post is posted and how successful it is. I also wanted to see if there are certain days of the week that are more/less active. Here is a table summarizing the number of posts made on each day of the week and how successful each day was on average.

| day | num_tweets | favorites_per_post | retweets_per_post | avg_score |
|---|---|---|---|---|
| Monday | 451 | 7.6 | 3.7 | 15.0 |
| Tuesday | 551 | 7.2 | 3.3 | 13.8 |
| Wednesday | 461 | 7.1 | 3.4 | 13.9 |
| Thursday | 487 | 7.6 | 3.6 | 14.8 |
| Friday | 429 | 7.5 | 3.7 | 14.9 |
| Saturday | 323 | 8.7 | 4.3 | 17.3 |
| Sunday | 277 | 8.9 | 3.8 | 16.5 |

Cool! This actually produced some non-boring results. I’m not going to make any significance tests, but I do see two interesting pieces of information here. First of all, it looks like the weekend (Sat-Sun) is quieter than weekdays in terms of the number of posts made. Second of all, the two days with the highest average score are also Sat-Sun. I won’t go into whether or not having ~1 more favorite and < 1 more retweet on average is significant, but it’s at least something. Maybe because there are fewer posts on the weekend, each post gets a bit more visibility and stays at the top of the feed longer, thereby having a small advantage? Or maybe the small difference in score we’re seeing is just because there are fewer posts in total and it’ll even out once n is large enough?

Whatever the case might be, here’s a plot that shows the score of every tweet grouped by day. The large points show the average of all posts made on that day.

Tweet score vs day of tweet

Significant or not, at least it looks pretty.

Wordcloud

I must admit I’m not the biggest fan of wordclouds, but it feels like no amateur R analysis can be complete without one of these bad boys these days. Here you go wordcloud-lovers - the 100 most popular terms in R-Bloggers posts’ titles since Sept 2013.

Wordcloud

I actually don’t have much to comment on that, there isn’t anything that strikes me as surprising here.
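
For readers who want to build something similar, a minimal sketch (assuming the tm and wordcloud packages and the cleaned `tweets` data frame from earlier; this is not the author's exact code) could look like this:

# build a corpus from the tweet titles and draw the 100 most frequent terms
library(tm)
library(wordcloud)

corpus <- Corpus(VectorSource(tweets$title))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

wordcloud(corpus, max.words = 100, random.order = FALSE)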

Remember to check out the accompanying interactive doc + source code!


The perfect t-test


(This article was first published on Daniel Lakens, and kindly contributed to R-bloggers)
I've created an easy-to-use R script that will import your data and then perform and write up a state-of-the-art dependent or independent t-test. The goal of this script is to examine whether more researcher-centered statistical tools (i.e., a one-click analysis script that checks normality assumptions, calculates effect sizes and their confidence intervals, creates good figures, calculates Bayesian and robust statistics, and writes the results section) increase the use of novel statistical procedures. Download the script here: https://github.com/Lakens/Perfect-t-test. For comments, suggestions, or errors, e-mail me at D.Lakens@tue.nl. The script will likely be updated - check back for updates or follow me @Lakens to be notified of updates.


Correctly comparing two groups is remarkably challenging. When performing a t-test researchers rarely manage to follow all recommendations that statisticians have made over the years. Where statisticians update their recommendations, statistical textbooks often do not. Even though reporting effect sizes and their confidence intervals has been recommended for decades (e.g., Cohen, 1990), statistical software (e.g., SPSS 22) often does not provide these statistics. Progress is slow, and Sharpe (2013) points to a lack of awareness, a lack of time, a lack of easily usable software, and a lack of education as some of the main reasons for the resistance to adopting statistical innovations.

Here, I propose a way to speed up the widespread adoption of the state-of-the-art statistical techniques by providing researchers with an easy to use script in free statistical software (R) that will perform and report all statistical analyses, practically with a single button press. The script (Lakens, 2015, available at https://github.com/Lakens/Perfect-t-test) follows state-of-the-art recommendations (see below), creates plots of the data, and writes the results section, including a minimally required interpretation of the statistical results.

Automated analyses might strike readers as a bad idea because it facilitates mindless statistics. Having performed statistics mindlessly for most of my professional career, I sincerely doubt access to this script would have reduced my level of understanding. If anything, reading an automatically generated results section of your own data that includes statistics you are not accustomed to calculate or report is likely to make you think more about the usefulness of these statistics, not less. However, the goal of this script is not to educate people. The main goal is to get researchers to perform and report the analyses they should, and make this as efficient as possible.

Comparing two groups


Keselman, Othman, Wilcox, and Fradette (2004) proposed a more robust two-sample t-test that provides better Type 1 error control in situations of variance heterogeneity and nonnormality, but their recommendations have not been widely implemented. Researchers might in general be unsure whether it is necessary to change the statistical tests they use to analyze and report comparisons between groups. As Wilcox, Granger, and Clark (2013, p. 29) remark: “All indications are that generally, the safest way of knowing whether a more modern method makes a practical difference is to actually try it.” Making sure conclusions based on multiple statistical approaches converge is an excellent way to gain confidence in your statistical inferences. This R script calculates traditional Frequentist statistics, Bayesian statistics, and robust statistics, using both a hypothesis testing and an estimation approach, to invite researchers to examine their data from different perspectives.

Since Frequentist and Bayesian statistics are based on assumptions of equal variances and normally distributed data, the R script provides boxplots and histograms with kernel density plots overlaid with a normal distribution curve to check for outliers and normality. Kernel density plots are a non-parametric technique to visualize the distribution of a continuous variable. They are similar to a histogram, but less dependent on the specific choice of bins used when creating a histogram. The graphs plot both the normal distribution and the kernel density function, making it easier to visually check whether the data are normally distributed or not. Q-Q plots are provided as an additional check for normality.

Yap and Sim (2011) show that no single test for normality will perform optimally for all possible distributions. They conclude (p. 2153): “If the distribution is symmetric with low kurtosis values (i.e. symmetric short-tailed distribution), then the D'Agostino-Pearson and Shapiro-Wilk tests have good power. For symmetric distribution with high sample kurtosis (symmetric long-tailed), the researcher can use the JB, Shapiro-Wilk, or Anderson-Darling test." All four normality tests are provided in the R script. Levene’s test for the equality of variances is provided, although for independent t-tests, Welch’s t-test (which does not require equal variances) is reported by default, following recommendations by Ruxton (2006). A short explanation accompanies all plots and assumption checks to help researchers interpret the results.
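
To make this concrete, here is a minimal sketch (not the perfect-t-test script itself) of the kinds of checks described above, using two made-up groups:

# illustrative only -- the actual script runs four normality tests, plots, and much more
library(car)

set.seed(42)
x <- rnorm(30, mean = 5.0, sd = 1.0)   # made-up group 1
y <- rnorm(30, mean = 5.6, sd = 1.5)   # made-up group 2

shapiro.test(x)                                              # one of the normality checks
leveneTest(c(x, y), factor(rep(c("g1", "g2"), each = 30)))   # Levene's test for equal variances
t.test(x, y, var.equal = FALSE)                              # Welch's t-test, the default for independent samples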

The script also creates graphs that, for example, visualize the distribution of the data points, and provide both within-subject and between-subject confidence intervals:




The script provides interpretations for effect sizes based on the classifications ‘small’, ‘medium’, and ‘large’. Default interpretations of the size of an effect based on these three categories should only be used as a last resort, and it is preferable to interpret the size of the effect in relation to other effects in the literature, or in terms of its practical significance. However, since researchers often do not interpret effect sizes (if they are reported to begin with), the default interpretation (and the suggestion to interpret effect sizes in relation to other effects in the literature) should at least function as a reminder that researchers are expected to interpret effect sizes. The common language effect size  (McGraw & Wong, 1992) is provided as an additional way to communicate the effect size.
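
As a rough illustration (not code from the script), the common language effect size for two independent groups is the probability that a randomly sampled observation from one group exceeds a randomly sampled observation from the other, which can be computed as:

# common language effect size (McGraw & Wong, 1992) for two independent groups; illustrative sketch
cl_effect_size <- function(x, y) {
  pnorm((mean(x) - mean(y)) / sqrt(sd(x)^2 + sd(y)^2))
}

set.seed(1)
cl_effect_size(rnorm(30, mean = 5.5), rnorm(30, mean = 5))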

Similarly, the Bayes Factor is classified into anecdotal, moderate, strong, very strong, and decisive evidence for the alternative or null hypothesis, following Jeffreys (1961), even though researchers are reminded that default interpretations of the strength of the evidence should not distract from the fact that strength of evidence is a continuous function of the Bayes Factor. We can expect researchers will rely less on default interpretations, the more acquainted they become with these statistics, but for novices some help in interpreting effect sizes and Bayes Factors will guide their interpretation.

Running the Markdown script


R Markdown scripts provide a way to create fully reproducible reports from data files. The script combines the commands to perform all statistical analyses with the written sections of the final output. Calculated statistics and graphs are inserted into the written report at specified locations. After installing the required packages, preparing the data, and specifying some variables in the Markdown document, the report can be generated (and thus, the analysis procedure can be performed) with a single mouse-click (scroll down for an example of the output).

The R Markdown script and the ReadMe file contain detailed instructions on how to run the script and how to install the required packages, including the PoweR package (Micheaux & Tran, 2014) to perform the normality tests, HLMdiag (Loy & Hofmann, 2014) to create the Q-Q plots, ggplot2 (Wickham, 2009) for all plots, car (Fox & Weisberg, 2011) to perform Levene's test, MBESS (Kelley, 2007) to calculate effect sizes and their confidence intervals, WRS (Wilcox & Schönbrodt, 2015) for the robust statistics, bootES (Kirby & Gerlanc, 2013) to calculate a robust effect size for the independent t-test, BayesFactor (Morey & Rouder, 2015) for the Bayes factor, and BEST (Kruschke & Meredith, 2014) to calculate the Bayesian highest density interval.

The data file (which should be stored in the same folder that contains the R Markdown script) needs to be tab-delimited with a header at the top of the file. Such a file can easily be created from SPSS by saving data through the 'save as' menu and selecting 'save as type: Tab delimited (*.dat)', or in Excel by saving the data as ‘Text (Tab delimited) (*.txt)’. For the independent t-test the data file needs to contain at least two columns (one specifying the independent variable and one specifying the dependent variable), and for the dependent t-test the data file needs to contain three columns: one subject identifier column and two columns for the two dependent variables. The script for dependent t-tests allows you to select a subgroup for the analysis, as long as the data file contains an additional grouping variable (see the demo data). The data files can contain irrelevant data, which will be ignored by the script. Researchers need to specify the names (or headers) of the independent and dependent variables, as well as grouping variables. Finally, there are some default settings researchers can change, such as the sidedness of the test, the alpha level, the percentage for the confidence intervals, and the scalar on the prior for the Bayes Factor.
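
For example (a hypothetical sketch; "data.txt" stands in for your own file), reading such a tab-delimited file into R and checking the columns looks like this:

# read a tab-delimited file with a header row and inspect its columns
dat <- read.delim("data.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)  # e.g. one grouping column and one dependent variable for the independent t-test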

The script can be used to create either a Word document or an HTML document. Researchers can easily interpret all the assumption checks, look at the data for possible outliers, and (after minor adaptations) copy-paste the result sections into their article.

The statistical results the script generates have been compared against the results provided by SPSS, JASP, ESCI, online Bayes Factor calculators, and BEST online. Minor variations in the HDI calculation between BEST online and this script are possible, depending on the burn-in samples and the number of samples, and for very large t-values there are minor variations between JASP and the latest version of the BayesFactor package used in this script. This program is distributed in the hope that it will be useful, but without any warranty. If you find an error, please contact me at D.Lakens@tue.nl.

Promoting Statistical Innovations


Statistical software is built around individual statistical tests, while researchers perform a set of procedures. Although it is not possible to create standardized procedures for all statistical analyses, most, if not all, of the steps researchers have to go through when they want to report correlations, regression analyses, ANOVAs, and meta-analyses are sufficiently structured. These tests make up a large portion of the analyses reported in journal articles. Demonstrating this, David Kenny has created R scripts that will perform and report mediation and moderator analyses, and Felix Schönbrodt has created a Shiny app that performs several meta-analytic techniques. Making statistical innovations more accessible has great potential to substantially improve the quality of the statistical tests researchers perform and report. Statisticians who take the application of generated knowledge seriously should experiment with the best ways to get researchers to use state-of-the-art techniques. R Markdown scripts are an excellent method to combine statistical analyses and a written report in free software. Shiny apps might make these analyses even more accessible, because they no longer require users to install R and R packages.

Despite the name of this script, there is probably no such thing as a ‘perfect’ report of a statistical test. Researchers might prefer to report standard errors instead of standard deviations, perform additional checks for normality, use different Bayesian or robust statistics, or change the figures. The benefit of Markdown scripts with a GNU license stored on GitHub is that they can be forked (copied to a new repository), where researchers are free to remove, add, or change sections of the script to create their own ideal test. After some time, a number of such scripts may be created, allowing researchers to choose an analysis procedure that most closely matches their needs. Alternatively, researchers can post feature requests or report errors that can be incorporated in future versions of this script.

It is important that researchers attempt to draw the best possible statistical inferences from their data. As a science, we need to seriously consider the most efficient way to accomplish this. Time is scarce, and scientists need to master many skills in addition to statistics. I believe that some of the problems in adopting new statistical procedures discussed by Sharpe (2013), such as lack of time, lack of awareness, lack of education, and lack of easy-to-use software, can be overcome by scripts that combine traditional and more novel statistics, are easy to use, and provide a brief explanation of what is calculated while linking to the relevant literature. This approach might be a small step towards a better understanding of statistics for individual researchers, but a large step towards better reporting practices.




References

Baguley, T. (2012). Calculating and graphing within-subject confidence intervals for ANOVA. Behavior research methods, 44, 158-175.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.
Fox, J. & Weisberg, S. (2011). An R Companion to Applied Regression, Second edition. Sage, Thousand Oaks CA.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford: Oxford University Press, Clarendon Press.
Kelley, K. (2007). Confidence intervals for standardized effect sizes: Theory, application, and implementation. Journal of Statistical Software, 20, 1-24.
Kirby, K. N., & Gerlanc, D. (2013). BootES: An R package for bootstrap confidence intervals on effect sizes. Behavior Research Methods, 45, 905-927.
Kruschke, J. K., & Meredith, M. (2014). BEST: Bayesian Estimation Supersedes the t-test. R package version 0.2.2, URL: http://CRAN.R-project.org/package=BEST.
Lakens, D. (2015). The perfect t-test (version 0.1.0). Retrieved from https://github.com/Lakens/perfect-t-test. doi:10.5281/zenodo.17603
Loy, A., & Hofmann, H. (2014). HLMdiag: A Suite of Diagnostics for Hierarchical Linear Models in R. Journal of Statistical Software, 56, 1-28. URL: http://www.jstatsoft.org/v56/i05/.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361-365.
Micheaux, P., & Tran, V. (2012). PoweR. URL: http://www.biostatisticien.eu/PoweR/.
Morey, R. D., & Rouder, J. N. (2015). BayesFactor: Computation of Bayes Factors for Common Designs. R package version 0.9.11-1. URL: http://CRAN.R-project.org/package=BayesFactor.
Sharpe, D. (2013). Why the resistance to statistical innovations? Bridging the communication gap. Psychological Methods, 18, 572-582.
Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer New York. ISBN 978-0-387-98140-6, URL: http://had.co.nz/ggplot2/book.
Wilcox, R. R., Granger, D. A., Clark, F. (2013). Modern robust statistical methods: Basics with illustrations using psychobiological data. Universal Journal of Psychology, 1, 21-31.
Wilcox, R. R., & Schönbrodt, F. D. (2015). The WRS package for robust statistics in R (version 0.27.5). URL: https://github.com/nicebread/WRS.

Yap, B. W., & Sim, C. H. (2011). Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation, 81, 2141-2155.

To leave a comment for the author, please follow the link and comment on his blog: Daniel Lakens.


Interactive charts in R


(This article was first published on Benomics » R, and kindly contributed to R-bloggers)

I’m giving a talk tomorrow at the Edinburgh R usergroup (EdinbR) on how to get started building interactive charts in R. I’ll talk about rCharts as a great general entry point to quickly generating interactive charts, and also the newer htmlwidgets movement, allowing interactive charts to be more easily integrated with RMarkdown and Shiny. I also tried to throw in a decent amount of Edinburgh-related examples along the way.

Current slides are here:

Click through for HTML slide deck.

I’ve since spun out what started as a simple example for the talk into a live web app, viewable at blackspot.org.uk. Here I’m looking at Edinburgh open data from the council on vehicle collisions in the city. It’s still under development and will be my first real project in Shiny, but it has already started to come together quite nicely.


Blackspot Shiny web app. Code available on github. NB. The UI currently retains a lot of code borrowed from Joe Cheng’s beautiful SuperZip shiny example.

The other speaker for the session is Alastair Kerr (head of bioinformatics at the Wellcome Trust Centre for Cell Biology here in Edinburgh), and he’ll be giving a beginner’s guide to the Shiny web framework. All in all it should be a great meeting, if you’re nearby do come along!


To leave a comment for the author, please follow the link and comment on his blog: Benomics » R.


Teaching R course? Use analogsea to run your customized RStudio in Digital Ocean!


(This article was first published on Apply R, and kindly contributed to R-bloggers)
Two years ago I taught an introductory R/Shiny course here at The Jackson Lab. We all learnt a lot. Unfortunately not about Shiny itself, but rather about incompatibilities between its versions and trouble with installing it on some machines.

And it is not only my experience. If you look into the forums of Rafael Irizarry's MOOC courses, many questions are just about installation and incompatibilities of R packages. The solution has existed for a long time: run R in the cloud. However, customizing virtual machines (like Amazon EC2) used to be a nontrivial task.

In this post I will show how a few lines of R code can start a customized RStudio docklet in the cloud and email login credentials to course participants. So participants do not need to install R and the required packages, and it is guaranteed they all run exactly the same software. All they need is a decent web browser to access the RStudio server.

RStudio server login

Running RStudio in Digital Ocean with R/analogsea

So how complicated is it today to start your RStudio on clouds? It is (almost) a one-liner:
  1. If you do not have a Digital Ocean account, get one. You should receive a $10 promotional credit (= 1 regular machine running without interruption for 1 month):
    https://www.digitalocean.com/
    (full disclosure: if you create your account using the link above I might get an extra credit)
  2. Install the analogsea package from GitHub. Make sure to create a Digital Ocean personal access token and set the DO_PAT environment variable in R. Also create your personal SSH key and upload it to Digital Ocean.
  3. And now it is really easy:

    library(analogsea)
    # Sys.setenv(DO_PAT = "*****") set access token

    # start your machine in Digital Ocean
    d <- docklet_create(size = getOption("do_size", "512mb"))
    # run RStudio on machine 'd' (rocker/rstudio docker image)
    d %>% docklet_rstudio()
The last line should open your browser with the RStudio login page (user "rstudio", password "rstudio"). If not, use summary(d) to get the IP address of your machine and go to http://your_machine:8787

It will cost you ~$0.01 per hour ($5 per month, May 2015). When you are done, do not forget to stop your Digital Ocean machine (droplet_delete(d)). At the end, make sure that you have successfully killed all your machines, either by logging in to Digital Ocean or by calling droplets() in R.

Customized RStudio images

What if the default RStudio image is not good enough for you because you need your own package to be pre-installed? For example, your package may have many dependencies, like DOQTL, that take a long time to download (org.Hs.eg.db, org.Mm.eg.db, ...).

You can still use analogsea to run your Digital Ocean machines, but in advance you need to prepare your own customized Docker image. First create an account on Docker.com and get yourself introduced to Dockerfile syntax. Then link your Docker account to your GitHub as described here.

I had been afraid of that because my knowledge of Docker is somewhat limited. It was actually far easier than I expected: see a Dockerfile for RStudio with DOQTL pre-installed.


Also, see Dockerfile of  rocker/hadleyverse image with Hadley Wickham's packages preinstalled to get more inspiration.

Start a virtual machine, pull and run the customized RStudio image, email credentials

Finally, suppose you have created your customized Docker image (like simecek/doqtl-docker). For each participant of your course, you want to start a virtual machine, pull this image, run it, and email the IP (and credentials) to the participant.

The code below does just that. There are several ways to send emails in R; this program uses the sendmailR package. I split the code into several for-loops, so if something goes wrong there is a better chance to catch it.
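The original gist is not reproduced here, but a rough sketch of the idea looks something like the following. The docklet_pull()/docklet_run() calls and their arguments are simplified assumptions (check the help pages of your analogsea version), and the email addresses, image slot and SMTP server are made up:

library(analogsea)
library(sendmailR)

participants <- c("alice@example.org", "bob@example.org")   # made-up addresses
image        <- "simecek/doqtl-docker"                      # the customized image

# 1. start one machine per participant
machines <- lapply(participants, function(p)
  docklet_create(size = getOption("do_size", "1gb")))

# 2. pull and run the customized RStudio image on each machine
#    (arguments simplified; see ?docklet_pull and ?docklet_run)
for (d in machines) {
  docklet_pull(d, image)
  docklet_run(d, image)
}

# 3. email the IP address and login credentials to each participant
for (i in seq_along(participants)) {
  ip  <- machines[[i]]$networks$v4[[1]]$ip_address   # slot name may differ by analogsea version
  msg <- sprintf("Your RStudio server: http://%s:8787 (user/password: rstudio)", ip)
  sendmail(from    = "<instructor@example.org>",
           to      = sprintf("<%s>", participants[i]),
           subject = "RStudio server for the course",
           msg     = msg,
           control = list(smtpServer = "smtp.example.org"))  # made-up SMTP server
}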


To leave a comment for the author, please follow the link and comment on his blog: Apply R.


New Version of RStudio (v0.99) Available Now


(This article was first published on RStudio Blog, and kindly contributed to R-bloggers)

We’re pleased to announce that the final version of RStudio v0.99 is available for download now. Highlights of the release include:

  • A new data viewer with support for large datasets, filtering, searching, and sorting.
  • Complete overhaul of R code completion with many new features and capabilities.
  • The source editor now provides code diagnostics (errors, warnings, etc.) as you work.
  • User customizable code snippets for automating common editing tasks.
  • Tools for Rcpp: completion, diagnostics, code navigation, find usages, and automatic indentation.
  • Many additional source editor improvements including multiple cursors, tab re-ordering, and several new themes.
  • An enhanced Vim mode with visual block selection, macros, marks, and subset of : commands.

There are also lots of smaller improvements and bug fixes across the product. Check out the v0.99 release notes for details on all of the changes.

Data Viewer

We’ve completely overhauled the data viewer with many new capabilities including live update, sorting and filtering, full text searching, and no row limit on viewed datasets.

data-viewer

See the data viewer documentation for more details.

Code Completion

Previously RStudio only completed variables that already existed in the global environment. Now completion is done based on source code analysis, so it is provided even for objects that haven’t been fully evaluated:

completion-scopes

Completions are also provided for a wide variety of specialized contexts including dimension names in [ and [[:

completion-bracket

Code Diagnostics

We’ve added a new inline code diagnostics feature that highlights various issues in your R code as you edit.

For example, here we’re getting a diagnostic that notes that there is an extra parenthesis:

(screenshot: extra parenthesis diagnostic)

Here the diagnostic indicates that we’ve forgotten a comma within a shiny UI definition:

diagnostics-comma

A wide variety of diagnostics are supported, including optional diagnostics for code style issues (e.g. the inclusion of unnecessary whitespace). Diagnostics are also available for several other languages including C/C++, JavaScript, HTML, and CSS. See the code diagnostics documentation for additional details.

Code Snippets

Code snippets are text macros that are used for quickly inserting common snippets of code. For example, the fun snippet inserts an R function definition:

Insert Snippet

If you select the snippet from the completion list it will be inserted along with several text placeholders which you can fill in by typing and then pressing Tab to advance to the next placeholder:

(screenshot: filling in snippet placeholders)
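For reference, accepting the fun snippet inserts a skeleton roughly like the one below; the placeholder names shown are those of the default snippet, to the best of my recollection:

# what typing `fun` and accepting the snippet produces (approximately);
# `name` and `variables` are the placeholders you tab through and overwrite
name <- function(variables) {

}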

Other useful snippets include:

  • lib, req, and source for the library, require, and source functions
  • df and mat for defining data frames and matrices
  • if, el, and ei for conditional expressions
  • apply, lapply, sapply, etc. for the apply family of functions
  • sc, sm, and sg for defining S4 classes/methods.

See the code snippets documentation for additional details.

Try it Out

RStudio v0.99 is available for download now. We hope you enjoy the new release and as always please let us know how it’s working and what else we can do to make the product better.


To leave a comment for the author, please follow the link and comment on his blog: RStudio Blog.


Situational Baseball: Analyzing Runs Potential Statistics


(This article was first published on Revolutions, and kindly contributed to R-bloggers)

By Mark Malter

A few weeks ago, I wrote about my Baseball Stats R Shiny application, where I demonstrated how to calculate run expectancies based on the 24 possible bases/outs states for any plate appearance. In this article, I’ll explain how I expanded on that to calculate the probability of winning the game, based on the current score/inning/bases/outs state. While this is done on other websites, I have added some unique play-attempt features (steal attempt, sacrifice bunt attempt, and tag from third attempt) to show the probability of winning with and without the attempt, as well as the expected win probability given a user-determined success rate for the play. That way, a manager can know not only the expected runs based on a particular decision, but the actual probability of winning the game if the play is attempted.

After the user enters the score, inning, bases state, and outs, the code runs through a large number of simulated games using the expected runs to be scored over the remainder of the current half inning, as well as each succeeding half inning for the remainder of the game.

When there are runners on base and the user clicks on any of the ‘play attempt’ tabs, a new table is generated showing the new probabilities.  I allow for sacrifice bunts with less than two outs and a runner on first, second, or first and second.  The stolen base tab can be used with any number of outs and the possibility of stealing second, third, or both.  The tag from third tab will work as long as there is a runner on third and less than two outs prior to the catch.

I first got this idea after watching game seven of the 2014 World Series. Trailing 3-2 with two outs and nobody on base, Alex Gordon singled to center off of Madison Bumgarner and made it all the way to third base after a two base error.  As Gordon was approaching third base, the coach gave him the stop sign, as shortstop Brandon Crawford was taking the relay throw in short left field.  Had Gordon attempted to score on the play, he probably would have been out at the plate- game and series over.  However, by holding up at third base, the Royals are also still ‘probably’ going to lose since runners only score from third base with two outs about 26% of the time.

The calculator shows that the probability of winning a game down by one run with a runner on third and two outs in the bottom of the ninth (or an extra inning) is roughly 17%. If we click on the ‘Tag from Third Attempt’ tab (Gordon attempting to score would have been equivalent to tagging from third after a catch for the second out) and play with the ‘Base running success rate’ slider, we see that the break-even success rate is roughly 0.3. I don’t know the probability of Gordon beating the throw home, but if it was greater than 0.3 then making the attempt would have improved the Royals’ chances of winning the World Series. In fact, if the true success rate was as high as 0.5, the Royals’ win probability would have jumped by 11 percentage points to 28%.
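As a rough back-of-the-envelope check of those numbers (not the app's simulation code): the win probability after the run scores is not stated directly, so it is inferred here from the 28% figure quoted above.

p_hold <- 0.17        # win probability if the runner holds at third (from the calculator)

# If the runner is thrown out, the game is over (win probability 0); if he
# scores, the game is tied with some win probability w. The 28% win
# probability quoted at a 50% success rate implies w:
w <- 0.28 / 0.5       # ~0.56

win_attempt <- function(p) p * w   # expected win probability if the attempt is made

p_hold / w            # break-even success rate, ~0.30
win_attempt(0.5)      # ~0.28, an 11 percentage point improvement over holding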

Here is the UI code.  Here is the server code. And here is the Shiny App.


To leave a comment for the author, please follow the link and comment on his blog: Revolutions.


Live Earthquake Map with Shiny and Google Map API


(This article was first published on R tutorial for Spatial Statistics, and kindly contributed to R-bloggers)
In the post Exchange data between R and the Google Maps API using Shiny I presented a very simple way to allow communication between R and javascript using shiny.

This is a practical example in which that same system is used to create a useful tool for visualizing seismic events collected from the USGS in the Google Maps API, with R doing some basic data preparation. The procedure to complete this experiment is pretty much identical to the one presented in the post mentioned above, so I will not bother you with additional details.


The final map looks like this:


and it is accessible from this site: Earthquake

The colour of each marker depends on magnitude and is set in R: below 2 the marker is green, between 2 and 4 it is yellow, between 4 and 6 it is orange, and above 6 it is red.
I also set R to export other information about each event to the JSON file, which I then use to populate the infowindow of each marker.

The code for creating this map consists of two pieces: an index.html file (which needs to go in a folder named www) and the file server.R, both available below:

Server.r
 # server.R  
#Title: Earthquake Visualization in Shiny
#Copyright: Fabio Veronesi

library(sp)
library(rjson)
library(RJSONIO)


shinyServer(function(input, output) {

output$json <- reactive ({
if(length(input$Earth)>0){
if(input$Earth==1){
hour <- read.table("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv", sep = ",", header = T)
if(nrow(hour)>0){
lis <- list()
for(i in 1:nrow(hour)){

if(hour$mag[i]<=2){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_green.png"}
else if(hour$mag[i]>2&hour$mag[i]<=4){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_yellow.png"}
else if(hour$mag[i]>4&hour$mag[i]<=6){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_orange.png"}
else {icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_red.png"}

Date.hour <- substring(hour$time[i],1,10)
Time.hour <- substring(hour$time[i],12,23)

lis[[i]] <- list(i,hour$longitude[i],hour$latitude[i],icon,hour$place[i],hour$depth[i],hour$mag[i],Date.hour,Time.hour)
}


#This code creates the variable test directly in javascript for export the grid in the Google Maps API
#I have taken this part from:http://stackoverflow.com/questions/26719334/passing-json-data-to-a-javascript-object-with-shiny
paste('<script>test=',
RJSONIO::toJSON(lis),
';setAllMap();Cities_Markers();',
'</script>')
}
}

else if(input$Earth==4){
month <- read.table("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv", sep = ",", header = T)
if(nrow(month)>0){
lis <- list()
for(i in 1:nrow(month)){

if(month$mag[i]<=2){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_green.png"}
else if(month$mag[i]>2&month$mag[i]<=4){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_yellow.png"}
else if(month$mag[i]>4&month$mag[i]<=6){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_orange.png"}
else {icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_red.png"}

Date.month <- substring(month$time[i],1,10)
Time.month <- substring(month$time[i],12,23)

lis[[i]] <- list(i,month$longitude[i],month$latitude[i],icon,month$place[i],month$depth[i],month$mag[i],Date.month,Time.month)
}


#This code creates the variable test directly in javascript for export the grid in the Google Maps API
#I have taken this part from:http://stackoverflow.com/questions/26719334/passing-json-data-to-a-javascript-object-with-shiny
paste('<script>test=',
RJSONIO::toJSON(lis),
';setAllMap();Cities_Markers();',
'</script>')
}
}


else if(input$Earth==3){
week <- read.table("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv", sep = ",", header = T)
if(nrow(week)>0){
lis <- list()
for(i in 1:nrow(week)){

if(week$mag[i]<=2){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_green.png"}
else if(week$mag[i]>2&week$mag[i]<=4){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_yellow.png"}
else if(week$mag[i]>4&week$mag[i]<=6){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_orange.png"}
else {icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_red.png"}

Date.week <- substring(week$time[i],1,10)
Time.week <- substring(week$time[i],12,23)

lis[[i]] <- list(i,week$longitude[i],week$latitude[i],icon,week$place[i],week$depth[i],week$mag[i],Date.week,Time.week)
}


#This code creates the variable test directly in javascript for export the grid in the Google Maps API
#I have taken this part from:http://stackoverflow.com/questions/26719334/passing-json-data-to-a-javascript-object-with-shiny
paste('<script>test=',
RJSONIO::toJSON(lis),
';setAllMap();Cities_Markers();',
'</script>')
}
}




else {
day <- read.table("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv", sep = ",", header = T)
if(nrow(day)>0){
lis <- list()
for(i in 1:nrow(day)){

if(day$mag[i]<=2){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_green.png"}
else if(day$mag[i]>2&day$mag[i]<=4){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_yellow.png"}
else if(day$mag[i]>4&day$mag[i]<=6){icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_orange.png"}
else {icon="http://maps.gstatic.com/mapfiles/ridefinder-images/mm_20_red.png"}

Date.day <- substring(day$time[i],1,10)
Time.day <- substring(day$time[i],12,23)

lis[[i]] <- list(i,day$longitude[i],day$latitude[i],icon,day$place[i],day$depth[i],day$mag[i],Date.day,Time.day)
}


#This code creates the variable test directly in javascript for export the grid in the Google Maps API
#I have taken this part from:http://stackoverflow.com/questions/26719334/passing-json-data-to-a-javascript-object-with-shiny
paste('<script>test=',
RJSONIO::toJSON(lis),
';setAllMap();Cities_Markers();',
'</script>')
}
}
}
})
})



Index.html
 <!DOCTYPE html>  
<html>
<head>
<title>Earthquake Visualization in Shiny</title>

<!--METADATA-->
<meta name="author" content="Fabio Veronesi">
<meta name="copyright" content="©Fabio Veronesi">
<meta http-equiv="Content-Language" content="en-gb">
<meta charset="utf-8"/>

<style type="text/css">

html { height: 100% }
body { height: 100%; margin: 0; padding: 0 }
map-canvas { height: 100%; width:100% }

.btn {
background: #dde6d8;
background-image: -webkit-linear-gradient(top, #dde6d8, #859ead);
background-image: -moz-linear-gradient(top, #dde6d8, #859ead);
background-image: -ms-linear-gradient(top, #dde6d8, #859ead);
background-image: -o-linear-gradient(top, #dde6d8, #859ead);
background-image: linear-gradient(to bottom, #dde6d8, #859ead);
-webkit-border-radius: 7;
-moz-border-radius: 7;
border-radius: 7px;
font-family: Arial;
color: #000000;
font-size: 20px;
padding: 9px 20px 10px 20px;
text-decoration: none;
}

.btn:hover {
background: #f29f9f;
background-image: -webkit-linear-gradient(top, #f29f9f, #ab1111);
background-image: -moz-linear-gradient(top, #f29f9f, #ab1111);
background-image: -ms-linear-gradient(top, #f29f9f, #ab1111);
background-image: -o-linear-gradient(top, #f29f9f, #ab1111);
background-image: linear-gradient(to bottom, #f29f9f, #ab1111);
text-decoration: none;
}

</style>


<script type="text/javascript" src="http://google-maps-utility-library-v3.googlecode.com/svn/tags/markerclusterer/1.0/src/markerclusterer.js"></script>

<script src="https://maps.googleapis.com/maps/api/js?v=3.exp&signed_in=true&libraries=drawing"></script>

<script type="application/shiny-singletons"></script>
<script type="application/html-dependencies">json2[2014.02.04];jquery[1.11.0];shiny[0.11.1];ionrangeslider[2.0.2];bootstrap[3.3.1]</script>
<script src="shared/json2-min.js"></script>
<script src="shared/jquery.min.js"></script>
<link href="shared/shiny.css" rel="stylesheet" />
<script src="shared/shiny.min.js"></script>
<link href="shared/ionrangeslider/css/normalize.css" rel="stylesheet" />
<link href="shared/ionrangeslider/css/ion.rangeSlider.css" rel="stylesheet" />
<link href="shared/ionrangeslider/css/ion.rangeSlider.skinShiny.css" rel="stylesheet" />
<script src="shared/ionrangeslider/js/ion.rangeSlider.min.js"></script>
<link href="shared/bootstrap/css/bootstrap.min.css" rel="stylesheet" />
<script src="shared/bootstrap/js/bootstrap.min.js"></script>
<script src="shared/bootstrap/shim/html5shiv.min.js"></script>
<script src="shared/bootstrap/shim/respond.min.js"></script>


<script type="text/javascript">
var map = null;
var Gmarkers = [];


function Cities_Markers() {

var infowindow = new google.maps.InfoWindow({ maxWidth: 500,maxHeight:500 });

//Loop to add markers to the map based on the JSON exported from R, which is within the variable test
for (var i = 0; i < test.length; i++) {
var lat = test[i][2]
var lng = test[i][1]
var marker = new google.maps.Marker({
position: new google.maps.LatLng(lat, lng),
title: 'test',
map: map,
icon:test[i][3]
});

//This sets up the infowindow
google.maps.event.addListener(marker, 'click', (function(marker, i) {
return function() {
infowindow.setContent('<div id="content"><p><b>Location</b> = '+
test[i][4]+'<p>'+
'<b>Depth</b> = '+test[i][5]+'Km <p>'+
'<b>Magnitude</b> = '+test[i][6]+ '<p>'+
'<b>Date</b> = '+test[i][7]+'<p>'+
'<b>Time</b> = '+test[i][8]+'</div>');
infowindow.open(map, marker);
}
})(marker, i));
Gmarkers.push(marker);
};


};

//Function to remove all the markers from the map
function setAllMap() {
for (var i = 0; i < Gmarkers.length; i++) {
Gmarkers[i].setMap(null);
}
}

//Initialize the map
function initialize() {
var mapOptions = {
center: new google.maps.LatLng(31.6, 0),
zoom: 3,
mapTypeId: google.maps.MapTypeId.TERRAIN
};

map = new google.maps.Map(document.getElementById('map-canvas'),mapOptions);


}


google.maps.event.addDomListener(window, 'load', initialize);
</script>


</head>


<body>

<div id="json" class="shiny-html-output"></div>


<button type="button" class="btn" id="hour" onClick="Shiny.onInputChange('Earth', 1)" style="position:absolute;top:1%;left:1%;width:100px;z-index:999">Last Hour</button>
<button type="button" class="btn" id="day" onClick="Shiny.onInputChange('Earth', 2)" style="position:absolute;top:1%;left:10%;width:100px;z-index:999">Last Day</button>
<button type="button" class="btn" id="week" onClick="Shiny.onInputChange('Earth', 3)" style="position:absolute;top:1%;left:20%;width:100px;z-index:999">Last Week</button>
<button type="button" class="btn" id="month" onClick="Shiny.onInputChange('Earth', 4)" style="position:absolute;top:1%;left:30%;width:100px;z-index:999">Last Month</button>

<div id="map-canvas" style="top:0%;right:0%;width:100%;height:100%;z-index:1"></div>



</body>
</html>

To leave a comment for the author, please follow the link and comment on his blog: R tutorial for Spatial Statistics.


Comrades Marathon Medal Predictions


(This article was first published on Exegetic Analytics » R, and kindly contributed to R-bloggers)


With only a few days to go until race day, most Comrades Marathon athletes will be focusing on resting, getting enough sleep, hydrating, eating and giving a wide berth to anybody who looks even remotely ill.

They will probably also be thinking a lot about Sunday's race. What will the weather be like? Will it be cold at the start? (Unlikely since it's been so warm in Durban.) How will they feel on the day? Will they manage to find their seconds along the route?

For the more performance oriented among them (and, let's face it, that's most runners!), there will also be thoughts of what time they will do on the day and what medal they'll walk away with. I've considered ways for projecting finish times in a previous article. Today I'm going to focus on a somewhat simpler goal: making a Comrades Marathon medal prediction.

In the process I have put together a small application which will make medal predictions based on recent race times.

medal-prediction-interface

I'm not going to delve too deeply into the details, but if you really don't have the patience, feel free to skip forward to the results or click on the image above, which will take you to the application. If you have trouble accessing the application it's probable that you are sitting behind a firewall that is blocking it. Try again from home.

Raw Data

The data for this analysis were compiled from a variety of sources. I scraped the medal results off the Comrades Marathon Results Archive. Times for other distances were cobbled together from Two Oceans Marathon Results, RaceTec Results and the home pages of some well organised running clubs.

The distribution of the data is broken down below as a function of gender, Comrades Marathon medal and other distances for which I have data. For instance, I have data for 45 female runners who got a Bronze medal and for whom a 32 km race time was available.

medal-race-distance-count

Unfortunately the data are pretty sparse for Gold, Wally Hayward and Silver medalists, especially for females. I'll be collecting more data over the coming months and the coverage in these areas should improve. Athletes that are contenders for these medals should have a pretty good idea of what their likely prospects are anyway, so the model is not likely to be awfully interesting for them. This model is intended more for runners who are aiming at a Bill Rowan, Bronze or Vic Clapham medal.

Decision Tree

The first step in the modelling process was to build a decision tree. Primarily this was to check whether it was feasible to predict a medal class based on race times for other distances (I'm happy to say that it was!). The secondary motivation was to assess what the most important variables were. The resulting tree is plotted below. Open this plot in a new window so that you can zoom in on the details. As far as the labels on the tree are concerned, "min" stands for "minimum" time over the corresponding distance and times (labels on the branches) are given in decimal hours.

medal-ctree

The first thing to observe is that the most important predictor is the 56 km race time, which dominates the first few levels of the tree hierarchy. Of slightly lesser importance is the 42.2 km race time, followed by the 25 km race time. It's interesting to note that the 32 km and 10 km results do not feature at all in the tree, probably due to the relative scarcity of results over these distances in the data.

Some specific observations from the tree are:

  • Male runners who can do 56 km in less than 03:30 have around 20% chance of getting a Gold medal.
  • Female runners who can do 56 km in less than 04:06 have about 80% chance of getting a Gold medal.
  • Runners who can do 42.2 km in less than about 02:50 are very likely to get a Silver medal.
  • Somewhat more specifically, runners who do 56 km in less than 05:53 and 42.2 km in more than 04:49 are probably in line for a Vic Clapham.

Note that the first three observations above should be taken with a pinch of salt since, due to a lack of data, the model is not well trained for Gold, Wally Hayward and Silver medals.

You'd readily be forgiven for thinking that this decision tree is an awfully complex piece of apparatus for calculating something as simple as the colour of your medal.

Well, yes, it is. And I am going to make it simpler for you. But before I make it simpler, I am going to make it slightly more complicated.

A Forest of Decision Trees

Instead of just using a single decision tree, I built a Random Forest consisting of numerous trees, each of which was trained on a subset of the data. Unfortunately the resulting model is not as easy to visualise as a single decision tree, but the results are far more robust.
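The post doesn't include the model code, but a minimal sketch of the approach might look like the following. The data below are entirely made up (stand-ins for the scraped results), the real model uses more predictors such as gender and additional distances, and the package choice is an assumption:

library(randomForest)

# made-up stand-in for the scraped results: best recent race times in decimal
# hours and the Comrades medal earned
set.seed(42)
n <- 500
results <- data.frame(min.42.2 = runif(n, 2.5, 5.5),
                      min.56   = runif(n, 3.2, 7.0))
results$medal <- factor(ifelse(results$min.56 < 4.0, "Silver",
                        ifelse(results$min.56 < 5.9, "Bronze", "Vic Clapham")))

fit <- randomForest(medal ~ min.42.2 + min.56, data = results, ntree = 500)

# class probabilities for a runner with recent times of 03:10 (42.2 km) and 03:20 (56 km)
new_runner <- data.frame(min.42.2 = 3 + 10/60, min.56 = 3 + 20/60)
predict(fit, newdata = new_runner, type = "prob")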

Medal Prediction Application

To make this a little more accessible I bundled the model up in a Shiny application which I deployed here on Amazon Web Services. Give it a try. You'll need to enter the times you achieved over one or more race distances during the last few months. Note that these are race times, not training run times. The latter are not good predictors for your Comrades medal.

Let's have a quick look at some sample predictions. Suppose that you are a male athlete who has recent times of 00:45, 01:45, 04:00 and 05:00 for 10, 21.1, 42.2 and 56 km races respectively, then according to the model you have a 77% probability of getting a Bronze medal and around 11% chance of getting either a Bill Rowan or Vic Clapham medal. There's a small chance (less than 1%) that you might be in the running for a Silver medal.

medal-predictions-bog-standard

What about a male runner who recently ran 03:20 for 56 km? There is around 20% chance that he would get a Gold medal. Failing that he would most likely (60% chance) get a Silver.

medal-predictions-gold-male

If you happen to have race results for the last few years that I could incorporate into the model, please get in touch. I'm keen to collaborate on improving this tool.

The post Comrades Marathon Medal Predictions appeared first on Exegetic Analytics.

To leave a comment for the author, please follow the link and comment on his blog: Exegetic Analytics » R.


New shinyjs version: Useful tools for any Shiny app developer + easily call JavaScript functions as R code


(This article was first published on Dean Attali's R Blog, and kindly contributed to R-bloggers)

About a month ago I made an announcement about the initial release of shinyjs. After some feedback, a few feature requests, and numerous hours of work, I’m excited to say that a new version of shinyjs v0.0.6.2 was made available on CRAN this week. The package’s main objective is to make shiny app development better and easier by allowing you to perform many useful functions with simple R code that would normally require JavaScript coding. Some of the features include hiding/showing elements, enabling/disabling inputs, resetting an input to its original value, and many others.

Table of contents

Availability

shinyjs is available through both CRAN (install.packages("shinyjs")) and GitHub (devtools::install_github("daattali/shinyjs")). Use the GitHub version to get the latest version with the newest features.

Quick overview of new features

This post will only discuss new features in shinyjs. You can find out more about the package in the initial post or in the package README on GitHub. Remember that in order to use any function, you need to add a call to useShinyjs() in the shiny app’s UI.

Two major new features:

  • reset function allows inputs to be reset to their original value
  • extendShinyjs allows you to add your own JavaScript functions and easily call them from R as regular R code

Two major improvements:

  • Enabling and disabling of input widgets now works on all types of shiny inputs (many people asked how to disable a slider/select input/date range/etc, and shinyjs now handles all of them)
  • The toggle functions gained an additional condition argument, which can be used to show/hide or enable/disable an element based on a condition. For example, instead of writing code such as if (test) enable(id) else disable(id), you can simply write toggleState(id, test)

Three new features available on the GitHub version but not yet on CRAN:

  • hidden (used to initialize a shiny tag as hidden) can now accept any number of tags or a tagList rather than just a single tag
  • hide/show/toggle can be run on any JQuery selector, not only on a single ID, so that you can hide multiple elements simultaneously
  • hide/show/toggle have a new argument delay which can be used to perform the action later rather than immediately. This can be useful if you want to show a message and have it disappear after a few seconds

Two major new features

There were two major features that I wanted to include in the CRAN release.

reset - allows inputs to be reset to their original value

Being able to reset the value of an input has been a frequently asked question on StackOverflow and the shiny Google Group, but a general solution was never available. Now with shinyjs it’s possible and very easy: if you have an input with id name and you want to reset it to its original value, simply call reset("name"). It doesn’t matter what type of input it is - reset works with all shiny inputs.

The reset function only takes one argument, an HTML id, and resets all inputs inside of that element. This makes reset very flexible because you can either give it a single input widget to reset, or a form that contains many inputs and reset them all. Note that reset can only work on inputs that are generated from the app’s ui and it will not work for inputs generated dynamically using uiOutput/renderUI.
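For example, a minimal sketch (with made-up input ids, in the same spirit as the demo shown below):

library(shiny)
library(shinyjs)

runApp(shinyApp(
  ui = fluidPage(
    useShinyjs(),
    div(id = "form",
        textInput("name", "Name", "Dean"),
        sliderInput("num", "Number", min = 0, max = 100, value = 50)),
    actionButton("resetAll", "Reset")
  ),
  server = function(input, output, session) {
    observeEvent(input$resetAll, {
      reset("form")   # resets every input inside the div, whatever its type
    })
  }
))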

Here is a simple demo of reset in action

Reset demo

extendShinyjs - allows you to easily call your own JavaScript functions from R

The main idea behind shinyjs when I started working on it was to make it extremely easy to call JavaScript functions that I used commonly from R. Now whenever I want to add a new function to shinyjs (such as the reset function), all I have to do is write the JavaScript function, and the integration between shiny and JavaScript happens seamlessly thanks to shinyjs. My main goal after the initial release was to also allow anyone else to use the same smooth R –> JS workflow, so that anyone can add a JavaScript function and call it from R easily. With the extendShinyjs function, that is now possible.

Very simple example

Using extendShinyjs is very simple and makes defining and calling JavaScript functions painless. Here is a very basic example of using extendShinyjs to define a (fairly useless) function that changes the colour of the page.

library(shiny)
library(shinyjs)

jsCode <- "shinyjs.pageCol = function(params){$('body').css('background', params);}"

runApp(shinyApp(
  ui = fluidPage(
    useShinyjs(),
    extendShinyjs(text = jsCode),
    selectInput("col", "Colour:",
                c("white", "yellow", "red", "blue", "purple"))
  ),
  server = function(input,output,session) {
    observeEvent(input$col, {
      js$pageCol(input$col)
    })
  }
))

Running the code above produces this shiny app:

Extendshinyjs demo

See how easy that was? All I had to do was make the JavaScript function shinyjs.pageCol, pass the JavaScript code as an argument to extendShinyjs, and then I can call js$pageCol(). That’s the basic idea: any JavaScript function named shinyjs.foo will be available to call as js$foo(). You can either pass the JS code as a string to the text argument, or place the JS code in a separate JavaScript file and use the script argument to specify where the code can be found. Using a separate file is generally preferred over writing the code inline, but in these examples I will use the text argument to keep it simple.

Passing arguments from R to JavaScript

Any shinyjs function that is called will pass a single array-like parameter to its corresponding JavaScript function. If the function in R was called with unnamed arguments, then it will pass an Array of the arguments; if the R arguments are named then it will pass an Object with key-value pairs. For example, calling js$foo("bar", 5) in R will call shinyjs.foo(["bar", 5]) in JS, while calling js$foo(num = 5, id = "bar") in R will call shinyjs.foo({num : 5, id : "bar"}) in JS. This means the shinyjs.foo function needs to be able to deal with both types of parameters.

To assist in normalizing the parameters, shinyjs provides a shinyjs.getParams() function which serves two purposes. First of all, it ensures that all arguments are named (even if the R function was called without names). Secondly, it allows you to define default values for arguments. Here is an example of a JS function that changes the background colour of an element and uses shinyjs.getParams().

shinyjs.backgroundCol = function(params) {
  var defaultParams = {
    id : null,
    col : "red"
  };
  params = shinyjs.getParams(params, defaultParams);

  var el = $("#" + params.id);
  el.css("background-color", params.col);
}

Note the defaultParams that we defined and the call to shinyjs.getParams. It ensures that calling js$backgroundCol("test", "blue") and js$backgroundCol(id = "test", col = "blue") and js$backgroundCol(col = "blue", id = "test") are all equivalent, and that if the colour parameter is not provided then “red” will be the default. All the functions provided in shinyjs make use of shinyjs.getParams, and it is highly recommended to always use it in your functions as well. Notice that the order of the arguments in defaultParams in the JavaScript function matches the order of the arguments when calling the function in R with unnamed arguments. This means that js$backgroundCol("blue", "test") will not work because the arguments are unnamed and the JS function expects the id to come before the colour.

For completeness, here is the code for a shiny app that uses the above function (it’s not a very practical example, but it’s great for showing how to use extendShinyjs with parameters):

library(shiny)
library(shinyjs)

jsCode <- '
shinyjs.backgroundCol = function(params) {
  var defaultParams = {
    id : null,
    col : "red"
  };
  params = shinyjs.getParams(params, defaultParams);

  var el = $("#" + params.id);
  el.css("background-color", params.col);
}'

runApp(shinyApp(
  ui = fluidPage(
    useShinyjs(),
    extendShinyjs(text = jsCode),
    p(id = "name", "My name is Dean"),
    p(id = "sport", "I like soccer"),
    selectInput("col", "Colour:",
                c("white", "yellow", "red", "blue", "purple")),    
    textInput("selector", "Element", ""),
    actionButton("btn", "Go")
  ),
  server = function(input,output,session) {
    observeEvent(input$btn, {
      js$backgroundCol(input$selector, input$col)
    })
  }
))

And the resulting app:

Extendshinyjs params demo

Note that I chose to define the JS code as a string for illustration purposes, but in reality I would prefer to place the code in a separate file and use the script argument instead of text.

Two major improvements

Among the many small improvements made, there are two that will be the most useful.

Enabling/disabling works on all inputs

The initial release of shinyjs had a disable/enable function which worked on the major input types that I was commonly using, but not all. Several people noticed that various inputs could not be disabled, so I made sure to fix all of them in the next version. The reasons behind why not all inputs were easy to disable are very technical so I won’t go into them. Now calling disable(id) or enable(id) will work on any type of shiny input.

Use a condition in toggle functions

I’ve noticed that some users of shinyjs had to often write code such as if (test) enable(id) else disable(id). This seemed inefficient and verbose, especially since there was already a toggleState function that enables disabled elements and vice versa. The toggleState function now has a new condition parameter to address exactly this problem. The code above can now be rewritten as toggleState(id, test).

Similarly, code that previously used if (test) show(id) else hide(id) can now use toggle(id = id, condition = test), and code that was doing a similar thing with addClass/removeClass can use the toggleClass(id, class, condition) function.
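As a small server-side illustration (made-up input and element ids, to be placed inside a shiny server function with useShinyjs() in the UI):

# enable the submit button only once a name has been typed
observe({
  toggleState("submit", condition = nzchar(input$name))
})

# show the advanced-options panel only while the checkbox is ticked
observe({
  toggle(id = "advanced", condition = input$showAdvanced)
})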

Three new features available on the GitHub version but not yet on CRAN

Since submitting shinyjs to CRAN, there were a few more features added. They will go into the next CRAN submission in about a month, but for now they can be used if you download the GitHub version.

hidden now accepts multiple tags

The hidden function is the only shinyjs function that’s used in the UI rather than in the server. It’s used to initialize a tag as hidden, and you can later reveal it using show(tag). The initial release only allows single tags to be given to hidden, but now it can accept any number of tags or a tagList. For example, you can now add a button and some text to the UI and have them both hidden with this code: hidden(actionButton("btn", "Button"), p(id = "text", "text")). You can then call show("btn") or show("text") to unhide them.

Visibility functions can be run on any selector

Previously, the only way to tell the hide, show, and toggle functions what element to act on was to give them an ID. That becomes very limiting when you want to hide or show elements in batch, or even if you just want to show/hide an element without an ID. The visibility functions now have a new optional parameter selector that accepts any CSS-style selector. For example, to hide all hyperlinks on a page that have the class “hide-me-later” you can now call hide(selector = "a.hide-me-later"). This makes the visibility functions much more powerful.

Visibility functions can be delayed

In a shiny app that I’m currently developing for my graduate work there are many different “Update” buttons that the user can click on. After an update is successful, I wanted to show a “Done!” message that would disappear after a few seconds. Using shinyjs I was already able to show the message when I wanted to, but I needed an easy way to make it disappear later. So I added the delay parameter to show/hide/toggle, that tells the function to only act in x seconds instead of immediately. Now if I want to show a message and hide it after 5 seconds, I can call show("doneMsg"); hide(id = "doneMsg", delay = 5). It’s not a big deal, but it can be handy.

Feedback + suggestions

If you have any feedback on shinyjs, I’d love to hear about it! I really do hope that it’s as easy to use as possible and that many of you will find it useful. If you have any suggestions, please do open a GitHub issue or let me know in any other way.

To leave a comment for the author, please follow the link and comment on his blog: Dean Attali's R Blog.


Hosting Shiny on Amazon EC2


(This article was first published on Exegetic Analytics » R, and kindly contributed to R-bloggers)

I recently finished some work on a Shiny application which incorporated a Random Forest model. The model was stored in a RData file and loaded by server.R during initialisation. This worked fine when tested locally but when I tried to deploy the application on shinyapps.io I ran into a problem: evidently you can only upload server.R and ui.R files. Nothing else.

Bummer.

I looked around for alternatives and found that Amazon Elastic Compute Cloud (EC2) was very viable indeed. I just needed to get it suitably configured. A helpful article documented the process from an OSX perspective. This is the analogous Ubuntu view (which really only pertains to the last few steps of connecting via SSH and uploading your code).

Create an Account

The first step is to create an account at aws.amazon.com. After you've logged into your account you should see a console like the one below. Select the EC2 link under Compute.

AWS Management Console-002.bmp

Next, from the EC2 Dashboard select Launch Instance.

EC2 Management Console-003.bmp

Step 1: There is an extensive range of machine images to choose from, but we will select the Ubuntu Server.

EC2 Management Console-004.bmp

Step 2: Select the default option. The same applies for Steps 3, 4 and 5.

EC2 Management Console-005.bmp

Step 6: Choose the security settings shown below. SSH access should be restricted to your local machine alone. When you are done, select Review & Launch.

EC2 Management Console-007.bmp

Step 7: Create a new key pair. Download the key and store it somewhere safe! Now press Launch Instances.

EC2 Management Console-008.bmp

The launch status of your instance will then be confirmed.

EC2 Management Console-009.bmp

At any later time the status of your running instance(s) can be inspected from the EC2 dashboard.

EC2 Management Console-010.bmp

SSH Connection

Now in order to install R and Shiny we need to login to our instance via SSH. In the command below you would need to substitute the name of your key file and also the Public DNS of your instance as the host name (the latter is available from the EC2 Dashboard).

$ ssh -i AWS-key.pem ubuntu@ec2-52-24-93-52.us-west-2.compute.amazonaws.com

Installing R

Once you have the SSH connection up and running, execute the following on your remote instance:

sudo apt-get update
sudo apt-get install r-base
sudo apt-get install r-base-dev

Installing Shiny

To install the Shiny package, execute the following on your remote instance:

sudo su - -c "R -e "install.packages('shiny', repos = 'http://cran.rstudio.com/')""

During the installation a directory /srv/shiny-server/ will have been created, where your applications will be stored.

Installing and Testing your Applications

Transfer your applications across to the remote instance using sftp or scp, then move them to a location under /srv/shiny-server/. You should now be ready to roll. You access the Shiny server on port 3838, so assuming, for example, that your application resides in a sub-folder called medal-predictions, you would browse to http://ec2-52-24-93-52.us-west-2.compute.amazonaws.com:3838/medal-predictions/.

The post Hosting Shiny on Amazon EC2 appeared first on Exegetic Analytics.

To leave a comment for the author, please follow the link and comment on his blog: Exegetic Analytics » R.


Some Impressions from R Finance 2015


(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Joseph Rickert

The R/Finance 2015 Conference wrapped up last Saturday at UIC. It has been seven years already, but R/Finance still has the magic! - mostly very high quality presentations and the opportunity to interact and talk shop with some of the most accomplished R developers, financial modelers and even a few industry legends such as Emanuel Derman and Blair Hull.

Emanuel Derman led off with a provocative and extraordinary keynote talk. Derman began way out there, somewhere well beyond the left field wall, recounting the struggle of Johannes Kepler to formulate his three laws of planetary motion, and closed with some practical advice on how to go about the business of financial modeling. Along the way he shared some profound, original thinking in an attempt to provide a theoretical context for evaluating and understanding the limitations of financial models. His argument hinged on making and defending the distinction between theories and models. Theories, such as the physical theories of Kepler, Newton and Einstein, are ontological: they attempt to say something about how the world is. A theory attempts to provide "absolute knowledge of the world". A model, on the other hand, "tells you about what some aspect of the world is like". Theories can be wrong, but they are not the kinds of things you can interrogate with "why" questions.

Models work through analogies and similarities. They compare something we understand to something we don't. Spinoza's Theory of emotions is a theory because it attempts to explain human emotions axiomatically from first principles.

Spinoza_emotions

The Black Scholes equation, by contrast, is a model that tries to provide insight through the analogy with Brownian motion. As I understood it, the practical advice from all of this is to avoid the twin traps of attempting to axiomatize financial models as if they directly captured reality, and of believing that analyzing data, no matter how many terabytes you plow through, is a substitute for an educated intuition about how the world is.

The following list gives the remaining talks in alphabetical order by speaker; where a talk was built around a package, the package and its home (CRAN, GitHub, R-Forge or elsewhere) are noted in brackets.

  1. Rohit Arora: Inefficiency of Modified VaR and ES
  2. Kyle Balkissoon: A Framework for Integrating Portfolio-level Backtesting with Price and Quantity Information [PortfolioAnalytics]
  3. Mark Bennett: Gaussian Mixture Models for Extreme Events
  4. Oleg Bondarenko: High-Frequency Trading Invariants for Equity Index Futures
  5. Matt Brigida: Markov Regime-Switching (and some State Space) Models in Energy Markets [code for regime switching, GitHub]
  6. John Burkett: Portfolio Optimization: Price Predictability, Utility Functions, Computational Methods, and Applications [DEoptim, CRAN]
  7. Matthew Clegg: The partialAR Package for Modeling Time Series with both Permanent and Transient Components [partialAR, CRAN]
  8. Yuanchu Dang: Credit Default Swaps with R (with Zijie Zhu) [CDS, GitHub]
  9. Gergely Daroczi: Network analysis of the Hungarian interbank lending market
  10. Sanjiv Das: Efficient Rebalancing of Taxable Portfolios
  11. Sanjiv Das: Matrix Metrics: Network-Based Systemic Risk Scoring
  12. Emanuel Derman: Understanding the World
  13. Matthew Dixon: Risk Decomposition for Fund Managers
  14. Matt Dowle: Fast automatic indexing with data.table [data.table, CRAN]
  15. Dirk Eddelbuettel: Rblpapi: Connecting R to the data service that shall not be named [Rblpapi, GitHub]
  16. Markus Gesmann: Communicating risk - a perspective from an insurer
  17. Vincenzo Giordano: Quantifying the Risk and Price Impact of Energy Policy Events on Natural Gas Markets Using R (with Soumya Kalra)
  18. Chris Green: Detecting Multivariate Financial Data Outliers using Calibrated Robust Mahalanobis Distances [CerioliOutlierDetection, CRAN]
  19. Rohini Grover: The informational role of algorithmic traders in the option market
  20. Marius Hofert: Parallel and other simulations in R made easy: An end-to-end study [simsalapar, CRAN]
  21. Nicholas James: Efficient Multivariate Analysis of Change Points [ecp, CRAN]
  22. Kresimir Kalafatic: Financial network analysis using SWIFT and R
  23. Michael Kapler: Follow the Leader - the application of time-lag series analysis to discover leaders in S&P 500 [SIT, other]
  24. Ilya Kipnis: Flexible Asset Allocation With Stepwise Correlation Rank
  25. Rob Krzyzanowski: Building Better Credit Models through Deployable Analytics in R
  26. Bryan Lewis: More thoughts on the SVD and Finance
  27. Yujia Liu and Guy Yollin: Fundamental Factor Model DataBrowser using Tableau and R [factorAnalytics, R-Forge]
  28. Louis Marascio: An Outsider's Education in Quantitative Trading
  29. Doug Martin: Nonparametric vs Parametric Shortfall: What are the Differences?
  30. Alexander McNeil: R Tools for Understanding Credit Risk Modelling
  31. William Nicholson: Structured Regularization for Large Vector Autoregression [BigVAR, GitHub]
  32. Steven Pav: Portfolio Cramer-Rao Bounds (why bad things happen to good quants) [SharpeR, CRAN]
  33. Jerzy Pawlowski: Are High Frequency Traders Prudent and Temperate? [HighFreq, GitHub]
  34. Bernhard Pfaff: The sequel of cccp: Solving cone constrained convex programs [cccp, CRAN]
  35. Stephen Rush: Information Diffusion in Equity Markets
  36. Mark Seligman: The Arborist: a High-Performance Random Forest Implementation [Rborist, CRAN]
  37. Majeed Simaan: Global Minimum Variance Portfolio: a Horse Race of Volatilities
  38. Anthoney Tsou: Implementation of Quality Minus Junk [qmj, GitHub]
  39. Marjan Wauters: Characteristic-based equity portfolios: economic value and dynamic style allocation
  40. Hadley Wickham: Data ingest in R [readr, CRAN]
  41. Eric Zivot: Price Discovery Share - An Order Invariant Measure of Price Discovery with Application to Exchange-Traded Funds

I particularly enjoyed Sanjiv Das' talks on Efficient Rebalancing of Taxable Portfolios and Matrix Metrics: Network-Based Systemic Risk Scoring, both of which are approachable by non-specialists. Sanjiv became the first person to present two talks at an R/Finance conference, and thus also the first to win one of the best presentation prizes, with the judges unwilling to say which of his two presentations secured the award.

Bryan Lewis' talk: More thoughts on the SVD and Finance was also notable for its exposition. Listening to Bryan you can almost fool yourself into believing that you could develop a love for numerical analysis and willingly spend an inordinate amount of your time contemplating the stark elegance of matrix decompositions.

Alexander McNeil's talk: R Tools for Understanding Credit Risk Modeling was a concise and exceptionally coherent tutorial on the subject, an unusual format for a keynote talk, but something that I think will be valued by students when the slides for all of the presentations become available.

Going out on a limb a bit, I offer a few un-researched but strong impressions of the conference. This year, to a greater extent than I remember in previous years, talks were built around particular packages; talks 5, 7 and 8, for example. Also, it seemed that authors were more comfortable highlighting and sharing packages that are works in progress, residing not on CRAN but on GitHub, R-Forge and other platforms. This may reflect a larger trend in R culture.

This is the year that cointegration replaced correlation as the operative concept in many models. The quants are way out ahead of the statisticians and data scientists on this one. Follow the money!

Speaking of data scientists: if you are a Random Forests fan do check out Mark Seligman's Rborist package, a high-performance and extensible implementation of the Random Forests algorithm.

Network analysis also seemed to be an essential element of many presentations. Gergely Daróczi's Shiny app for his analysis of the Hungarian interbank lending network is a spectacular example of how interactive graphics can enhance an analysis.

Finally, I'll finish up with some suggested reading in preparation for studying the slides of the presentations when they become available.

Sanjiv Das: Efficient Rebalancing of Taxable Portfolios
Sanjiv Das: Matrix Metrics: Network-based Systematic Risk Scoring
Emanuel Derman: Models.Behaving.Badly
Jurgen A. Doornik and R.J. O'Brien: Numerically Stable Cointegration Analysis (A recommendation from Bryan Lewis)
Arthur Koestler: The Sleepwalkers (I am certain this is the book whose title Derman forgot.)
Alexander J. McNeil and Rudiger Frey: Quantitative Risk Management: Concepts, Techniques and Tools
Bernhard Pfaff: Analysis of Integrated and Cointegrated Time Series with R (Use R!)

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.


Why has R, despite quirks, been so successful?


(This article was first published on Revolutions, and kindly contributed to R-bloggers)

I was on a panel back in 2009 where Bo Cowgill said, "The best thing about R is that it was written by statisticians. The worst thing about R is that it was written by statisticians." R is undeniably quirky, especially to computer scientists, and yet it has attracted a huge following for a domain-specific language, with more than two million users worldwide.

So why has R become so successful, despite being outside the mainstream of programming languages? John Cook adeptly tackles that question in a 2013 lecture, "The R Language: The Good The Bad And The Ugly" (embedded below). His insight is that to understand a domain-specific language, you have to understand the domain, and statistical data analysis is a very different domain from systems programming.

I think R sometimes gets a bit of an unfair rap from its quirks, but in fact these design decisions — made in the interest of making R extensible rather than fast — have enabled some truly important innovations in statistical computing:

  • The fact that R has lazy evaluation allowed for the development of the formula syntax, so useful for statistical modeling of all kinds.
  • The fact that R supports missing values as a core data value allowed R to handle real-world, messy data sources without resorting to dangerous hacks (like using zeroes to represent missing data).
  • R's package system — a simple method of encapsulating user-contributed functions for R — enabled the CRAN system to flourish. The pass-by-value system and naming notation for function arguments also made it easy for R programmers to create R functions that could easily be used by others.
  • R's graphics system was designed to be extensible, which allowed the ggplot2 system to be built on top of the "grid" framework (and influencing the look of statistical graphics everywhere).
  • R is dynamically typed and allows functions to "reach outside" of scope, and everything is an object, including expressions in the R language itself. These language-level programming features allowed for the development of the reactive programming framework underlying Shiny.
  • The fact that every action in R is a function — including operators — allowed for the development of new syntax models, like the %>% pipe operator in magrittr.
  • R gives programmers the ability to control the REPL loop, which allowed for the development of IDEs like ESS and RStudio.
  • The "for" loops can be slow in R which ... well, I can't really think of an upside for that one, except that it encouraged the development of high-performance extension frameworks like Rcpp.

Some languages have some of these features, but I don't know of any language that has all of these features — probably with good reason. But there's no doubt that without these qualities, R would not have been able to advance the state of the art in statistical computing in so many ways, and attract such a loyal following in the process.

 

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.


In case you missed it: May 2015 roundup


(This article was first published on Revolutions, and kindly contributed to R-bloggers)

In case you missed them, here are some articles from May of particular interest to R users.

RStudio 0.99 released with improved autocomplete and data viewer features.

A tutorial on the new Naive Bayes classifier in the RevoScaleR package.

R is the most popular Predictive Analytics / Data Mining / Data Science software in the latest KDnuggets poll.

A Shiny application predicts the winner of baseball games mid-game using R.

A list of over 100 open data sources you can use with R.

Revolution R Open 3.2.0 now available, following RRO 8.0.3.

A review of talks at the Extremely Large Databases conference, featuring Stephen Wolfram and John Chambers.

My TechCrunch article on the impact of open source software on business features several R examples.

You can improve performance of R even further by using Revolution R Open with Intel Phi coprocessors.

New features in Revolution R Enterprise 7.4, now available.

The next release of SQL Server will run R in-database.

Create embeddable, interactive graphics in R with htmlwidgets.

Computerworld reviews R packages for data wrangling.

A tutorial on using data stored in the Azure cloud with R.

Using histograms as points in scatterplots, and other embedded plots in R.

A comparison of data frames, data.table, and dplyr with a random walks problem.

A video on using R for human resources optimization.

How to call R and Python from base SAS.

General interest stories (not related to R) in the past month included: a song written by an iPhone, a Facebook algorithm that tells when “like” becomes “love”, a map of light pollution and a machine-learning application that tells you how old you look.

As always, thanks for the comments and please send any suggestions to me at davidsmi@microsoft.com. Don't forget you can follow the blog using an RSS reader, via email using blogtrottr, or by following me on Twitter (I'm @revodavid). You can find roundups of previous months here.

To leave a comment for the author, please follow the link and comment on his blog: Revolutions.


R man (chester) in the North…


(This article was first published on Mango Solutions, and kindly contributed to R-bloggers)

By Andrew Vodden, Account Manager

 

Mango in Manchester

The Manchester R user group met last Tuesday in a temporary venue next to the Manchester Art Gallery in the city centre.

We had a full house and were delighted to welcome representatives from both Manchester Universities as well as a number of leading research institutions. The commercial sector was again well represented with attendees from several high profile online retail organisations as well as marketing and analytics consultancies.

The group enjoyed 3 excellent presentations:

Mango’s own Chris Campbell spoke about some interesting work we are doing with Activinsights on automating human behavioural classification using data feeds from wrist-worn accelerometer devices. Tom Liptrot, who does great work for one of Europe’s top cancer hospitals, gave a fascinating account of the development of his first Shiny app, Predictshine, and Graeme Hutcheson from Manchester University shared his experiences of how effectively R can be used to deal with missing data.

These presentations can be viewed here.

Our thanks to Chris, Tom and Graeme for their presentations. Details of the next meeting will be announced soon (to join our mailing list and receive news of all ManchesterR meetups, email us at rmanchester@mango-solutions.com).

To leave a comment for the author, please follow the link and comment on his blog: Mango Solutions.


Mimicking a Google Form with a Shiny app


(This article was first published on Dean Attali's R Blog, and kindly contributed to R-bloggers)

In this post we will walk through the steps required to build a shiny app that mimics a Google Form. It will allow users to submit responses to some input fields, save their data, and allow admins to view the submitted responses. Like many of my other posts, it may seem lengthy, but that’s only because I like to go into fine details to ensure everything is as foolproof and reproducible as possible.

Table of contents

Motivation

Last year I was fortunate enough to be a teaching assistant for STAT545 – a course at the University of British Columbia, taught by Jenny Bryan, that introduces R into the lives of student scientists. (It was especially special to me because just 12 months prior, that course taught me how to write my first line in R.) To facilitate communication with the students, we wanted to gather some basic information from them, such as their preferred name, email, and Twitter and GitHub handles. We didn’t want to simply have them all send us an email with this information, as we were trying to set an example of how to be techy and automated and modern :) Our first thought was to use a Google Form, but university policies didn’t quite allow for that because the data must be stored within Canadian borders, and with Google Forms it’d probably end up somewhere in the US. We also decided earlier that day that one of the course modules would be about shiny, so we figured we’d put our money where our mouths were, and attempt to collect student data via a shiny app. I was given the task of developing this shiny app, which was a great learning experience. You can view the original code for that app on GitHub or visit the app yourself to see it in action.

The idea of recording user-submitted form data can be applied to many different scenarios. Seeing how successful the previous app was for our course, we decided to also collect all peer reviews of assignments in a similar shiny app. I created an app with a template for a marking sheet, and every week students would use the app to submit reviews for other students’ work. This worked great for us – you can see the original code on GitHub or try the app out yourself.

Since developing those apps, I’ve become a better shiny developer and also wrote the shinyjs package to help with many user-experience tasks like hiding/disabling/resetting inputs. I’ve also seen multiple people asking how to do this kind of thing with shiny, so my hope is that this post will be useful for others who are also looking to create user-submitted forms with shiny.

Overview

The app we will build will be a form collecting data on a user’s R habits – their name, length of time using R, favourite R package, etc. You can see the result of this tutorial on my shiny server and the corresponding code on GitHub. It looks like this:

[Screenshot of the final app]

The main idea is simple: create a UI with some inputs that users need to fill out, add a submit button, and save the response. Sounds simple, and it is! In this tutorial each response will be saved to a .csv file along with the timestamp of submission. To see all submissions that were made, we simply read all csv files and join them together. There will also be an “admin panel” that will show admin users all previous responses and allow them to download this data. When using Shiny Server Pro or paid shinyapps.io accounts, you can add authentication/login to your apps, and decide which usernames have admin access. Since my app is hosted on a free shiny server that doesn’t support authentication, it’ll just assume that everyone is an admin. I also like to focus a lot (arguably too much) on user experience, so this post will also discuss many small tips & tricks that are optional but can be nice additions. Many of these use the shinyjs package, so instead of loading the package in the beginning, I’ll explicitly show when functions from shinyjs are used so that you know what functions are not core shiny.

Note about persistent storage

One major component of this app is storing the user-submitted data in a way that would allow it to be retrieved later. This is an important topic of its own, and in a few days I will write a detailed post about all the different storage options and how to use them. In this tutorial I will use the simplest approach for saving the data: every submission will be saved to its own .csv file.

NOTE: this method should only be used if you have your own shiny server or are running the app on your own machine, and should not be used if your app is hosted on shinyapps.io. Using the local filesystem in shinyapps.io is a bad idea because every time your app is launched it will be on a different machine, and it will not have access to files saved by other users who were running the app on a different machine. If using shinyapps.io, you will need to use remote storage, which will be discussed in my next post. You can get a bit more information about why shinyapps.io can’t be used for local storage in the shiny docs.

Build the basic UI (inputs)

I generally prefer to split shiny apps into a ui.R and server.R file (with an additional helpers.R or globals.R if necessary), but for simplicity, I’ll place all the app code together in this tutorial.

Create a new file named app.R and copy the following code into it to build the input elements.

shinyApp(
  ui = fluidPage(
    titlePanel("Mimicking a Google Form with a Shiny app"),
    div(
      id = "form",
      
      textInput("name", "Name", ""),
      textInput("favourite_pkg", "Favourite R package"),
      checkboxInput("used_shiny", "I've built a Shiny app in R before", FALSE),
      sliderInput("r_num_years", "Number of years using R", 0, 25, 2, ticks = FALSE),
      selectInput("os_type", "Operating system used most frequently",
                  c("",  "Windows", "Mac", "Linux")),
      actionButton("submit", "Submit", class = "btn-primary")
    )
  ),
  server = function(input, output, session) {
  }
)

Most of this code is simply setting up a shiny app and adding a few input fields and a button to a div element named form.

After saving this file, you should be able to run it either with shiny::runApp() or by clicking the “Run App” button in RStudio. The app simply shows the input fields and the submit button, but does nothing yet.

Define mandatory fields

We want everyone to at least tell us their name and favourite package, so let’s ensure the submit button is only enabled if both of those fields are filled out. We need to use shinyjs for that, so you need to add a call to shinyjs::useShinyjs() anywhere in the UI. In the global scope (above the definition of shinyApp, outside the UI and server code), define the mandatory fields:

fieldsMandatory <- c("name", "favourite_pkg")

Now we can use the toggleState function to enable/disable the submit button based on a condition. The condition is whether or not all mandatory fields have been filled. To calculate that, we can loop through the mandatory fields and check their values. Add the following code to the server portion of the app:

observe({
  # check if all mandatory fields have a value
  mandatoryFilled <-
    vapply(fieldsMandatory,
           function(x) {
             !is.null(input[[x]]) && input[[x]] != ""
           },
           logical(1))
  mandatoryFilled <- all(mandatoryFilled)
  
  # enable/disable the submit button
  shinyjs::toggleState(id = "submit", condition = mandatoryFilled)
})

Now try running the app again, and you’ll see the submit button is only enabled when these fields have a value.

Show which fields are mandatory in the UI

If you want to be extra fancy, you can add a red asterisk to the mandatory fields. Here’s a neat though possibly overcomplicated approach to do this: define a function that takes an input label and adds an asterisk to it (you can define it in the global scope):

labelMandatory <- function(label) {
  tagList(
    label,
    span("*", class = "mandatory_star")
  )
}

To use it, simply wrap the label argument of both mandatory input elements with labelMandatory. For example, textInput("name", labelMandatory("Name"), "").

To make the asterisk red, we need to add some CSS, so define the CSS in the global scope:

appCSS <- ".mandatory_star { color: red; }"

And add the CSS to the app by calling shinyjs::inlineCSS(appCSS) in the UI.

The complete code so far should look like this (it might be a good idea to just copy and paste this, to make sure you have the right code):

fieldsMandatory <- c("name", "favourite_pkg")

labelMandatory <- function(label) {
  tagList(
    label,
    span("*", class = "mandatory_star")
  )
}

appCSS <-
  ".mandatory_star { color: red; }"

shinyApp(
  ui = fluidPage(
    shinyjs::useShinyjs(),
    shinyjs::inlineCSS(appCSS),
    titlePanel("Mimicking a Google Form with a Shiny app"),
    
    div(
      id = "form",
      
      textInput("name", labelMandatory("Name"), ""),
      textInput("favourite_pkg", labelMandatory("Favourite R package")),
      checkboxInput("used_shiny", "I've built a Shiny app in R before", FALSE),
      sliderInput("r_num_years", "Number of years using R", 0, 25, 2, ticks = FALSE),
      selectInput("os_type", "Operating system used most frequently",
                  c("",  "Windows", "Mac", "Linux")),
      actionButton("submit", "Submit", class = "btn-primary")
    )
  ),
  server = function(input, output, session) {
    observe({
      mandatoryFilled <-
        vapply(fieldsMandatory,
               function(x) {
                 !is.null(input[[x]]) && input[[x]] != ""
               },
               logical(1))
      mandatoryFilled <- all(mandatoryFilled)
      
      shinyjs::toggleState(id = "submit", condition = mandatoryFilled)
    })    
  }
)

Save the response upon submission

The most important part of the app is to save the user’s response. First we need to define (a) what input fields we want to store and (b) what directory to use to store all the responses. I also like to add the submission timestamp to each submission, so I also want to define (c) a function that returns the current time as an integer. Let’s define these three things in the global scope:

fieldsAll <- c("name", "favourite_pkg", "used_shiny", "r_num_years", "os_type")
responsesDir <- file.path("responses")
epochTime <- function() {
  as.integer(Sys.time())
}

Make sure you create a responses directory so that the saved responses can go there.
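
One way to guarantee this (a small addition that is not part of the original code) is to create the directory in the global scope when the app starts:

# Create the responses directory on startup if it does not exist yet.
dir.create(responsesDir, showWarnings = FALSE, recursive = TRUE)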

Next we need to have a way to gather all the form data (plus the timestamp) into a format that can be saved as a csv. We can do this easily by looping over the input fields. Note that we need to transpose the data to get it into the right shape that we want (1 row = 1 observation = 1 user submission). Add the following reactive expression to the server:

formData <- reactive({
  data <- sapply(fieldsAll, function(x) input[[x]])
  data <- c(data, timestamp = epochTime())
  data <- t(data)
  data
})

The last part is to actually save the data. As I said earlier, in this post we will save the data to a local file, but in my next post I’ll show how to alter the following function in order to save to other sources. When saving the user responses locally to a file, there are two options: either save all responses to one file, or save each response as its own file. The first approach might sound like it makes more sense, but I wanted to avoid it for two reasons: first of all, it’s slower because in order to save (add a new row to the file), we’d need to first read the whole file to know where to add the new row. Secondly, this approach is not thread-safe, which means that if two people submit at the same time, one of their responses will get lost. So I opted to use the second solution – each submission is its own file. It might seem weird, but it works.

To ensure that we don’t lose any submissions, we need to make sure that no two files have the same name. It’s difficult to 100% guarantee that, but it’s easy enough to be almost sure that filenames are unique by adding some randomness to them. However, instead of having truly random characters in the filename, I went a slightly different way: I make the filename a concatenation of the current time and the md5 hash of the submission data. This way the only realistic way that two submissions will overwrite each other is if they happen at the same second and have the exact same data. Here is the function to save the response (add to the server):

saveData <- function(data) {
  fileName <- sprintf("%s_%s.csv",
                      humanTime(),
                      digest::digest(data))
  
  write.csv(x = data, file = file.path(responsesDir, fileName),
            row.names = FALSE, quote = TRUE)
}

# action to take when submit button is pressed
observeEvent(input$submit, {
  saveData(formData())
})

Notice that I used humanTime() instead of epochTime() because I wanted the filename to have a more human-friendly timestamp. You’ll need to define humanTime() as

humanTime <- function() format(Sys.time(), "%Y%m%d-%H%M%OS")

Now you should be able to run the app, enter input, save, and see a new file created for every submission. If you get an error when saving, make sure the responses directory exists and you have write permissions.

Note regarding file permissions

If you are running the app on a shiny server, it’s very important to understand user permissions. By default, all apps are run as the shiny user, and that user will probably not have write permission on folders you create. You should either add write permissions to shiny, or change the running user to yourself. See more information on how to do this in this post.

After submission show a “Thank you” message and let user submit again

Right now, after submitting a response, there is no feedback and the user will think nothing happened. Let’s add a “Thank you” message that will get shown, and add a button to allow the user to submit another response (if it makes sense for your app).

Add the “thank you” section to the UI after the form div (initialize it as hidden because we only want to show it after a submission):

div(id = "form", ...),
shinyjs::hidden(
  div(
    id = "thankyou_msg",
    h3("Thanks, your response was submitted successfully!"),
    actionLink("submit_another", "Submit another response")
  )
)  

And in the server, after saving the data we now want to reset the form, hide it, and show the thank you message:

# action to take when submit button is pressed
observeEvent(input$submit, {
  saveData(formData())
  shinyjs::reset("form")
  shinyjs::hide("form")
  shinyjs::show("thankyou_msg")
})

Note that this observer should replace the previous one we defined, since we have now added three more expressions to it.

We also need to add an observer to clicking on the “Submit another response” button that will do the opposite: hide the thank you message and show the form (add the following to the server):

observeEvent(input$submit_another, {
  shinyjs::show("form")
  shinyjs::hide("thankyou_msg")
})    

Now you should be able to submit multiple responses with a clear indication every time that it succeeded.

Better user feedback while submitting and on error

Right now there is no feedback to the user when their response is being saved and if it encounters an error, the app will crash. Let’s fix that! First we need to add a “Submitting…” progress message and an error message container to the UI – add them inside the form div, just after the submit button:

shinyjs::hidden(
  span(id = "submit_msg", "Submitting..."),
  div(id = "error",
      div(br(), tags$b("Error: "), span(id = "error_msg"))
  )
)

Now let’s hook up the logic. When the “submit” button is pressed, we want to: disable the button from being pressed again, show the “Submitting…” message, and hide any previous errors. We want to reverse these actions when saving the data is finished. If an error occurs while saving the data, we want to show the error message. All these sorts of actions are why shinyjs was created, and it will help us here. Change the observer of input$submit once again:

observeEvent(input$submit, {
  shinyjs::disable("submit")
  shinyjs::show("submit_msg")
  shinyjs::hide("error")
  
  tryCatch({
    saveData(formData())
    shinyjs::reset("form")
    shinyjs::hide("form")
    shinyjs::show("thankyou_msg")
  },
  error = function(err) {
    shinyjs::text("error_msg", err$message)
    shinyjs::show(id = "error", anim = TRUE, animType = "fade")
  },
  finally = {
    shinyjs::enable("submit")
    shinyjs::hide("submit_msg")
  })
})

Just as a small extra bonus, I like to make error messages red, so I added #error { color: red; } to the appCSS string that we defined in the beginning, so now appCSS is:

appCSS <-
  ".mandatory_star { color: red; }
   #error { color: red; }"

Now you have a fully functioning form shiny app! The only thing that’s missing so far is a way to view the responses directly in the app. Remember that all the responses are saved locally, so you can also just open the files manually or use any approach you want to open the files.

Add table that shows all previous responses

Note: this section is not visually identical to the app shown on my shiny server because in my app I placed the table to the right of the form, and the code given here will place the table above the form.

Now that we can submit responses smoothly, it’d be nice to also be able to view submitted responses in the app. First we need to add a dataTable placeholder to the UI (add it just before the form div, after the titlePanel):

DT::dataTableOutput("responsesTable"),

The main issue we need to solve in this section is how to retrieve all previous submissions. To do this, we’ll look at all the files in the responses directory, read each one into a data.frame separately, and then use dplyr::rbind_all to concatenate all the responses together. Note that this will only work if all the response files have exactly the same fields, so if you change your app to add new fields, you’ll probably need to either remove all previous submissions or make your own script to add a default value to the new field of all previous submissions.

Here’s our function that will retrieve all submissions and load them into a data.frame. You can define it in the global scope.

loadData <- function() {
  files <- list.files(file.path(responsesDir), full.names = TRUE)
  data <- lapply(files, read.csv, stringsAsFactors = FALSE)
  data <- dplyr::rbind_all(data)
  data
}

Now that we have this function, we just need to tell the dataTable in the UI to display that data. Add the following to the server:

output$responsesTable <- DT::renderDataTable(
  loadData(),
  rownames = FALSE,
  options = list(searching = FALSE, lengthChange = FALSE)
) 

Now when you run the app you should be able to see your previous submissions, assuming you followed the instructions without problems.

Add ability to download all responses

It would also be very handy to be able to download all the responses into a single file. Let’s add a download button to the UI, either just before or just after the dataTable:

downloadButton("downloadBtn", "Download responses"),

We already have a function for retrieving the data, so all we need to do is tell the download handler to use it. Add the following to the server:

output$downloadBtn <- downloadHandler(
  filename = function() { 
    sprintf("mimic-google-form_%s.csv", humanTime())
  },
  content = function(file) {
    write.csv(loadData(), file, row.names = FALSE)
  }
)

Almost done!

Restrict access to previous data to admins only

The only missing piece is that right now everyone will see all the responses, and you might want to restrict that access to admins only. This is only possible if you enable authentication, which is available in Shiny Server Pro and in the paid shinyapps.io accounts. Without authentication, everyone who goes to your app will be treated equally, but with authentication you can give different people different usernames and decide which users are considered admins.

The first thing we need to do is remove all the admin-only content from the UI and only generate it if the current user is an admin. Remove the dataTableOutput and the downloadButton from the UI, and instead add a dynamic UI element:

uiOutput("adminPanelContainer"),

We’ll re-define the dataTable and download button in the server, but only if the user is an admin. The following code ensures that for non-admins, nothing gets rendered in the admin panel, but admins can see the table and download button (add this to the server):

output$adminPanelContainer <- renderUI({
  if (!isAdmin()) return()
  
  wellPanel(
    h2("Previous responses (only visible to admins)"),
    downloadButton("downloadBtn", "Download responses"), br(), br(),
    DT::dataTableOutput("responsesTable")
  )
}) 

All that’s left is to decide if the user is an admin or not (note the isAdmin() call in the previous code chunk, we need to define that function). If authentication is enabled, then the logged in user’s name will be available to us in the session$user variable. If there is no authentication, it will be NULL. Let’s say John and Sally are the app developers so they should be the admins, we can define a list of admin usernames in the global scope:

adminUsers <- c("john", "sally")

Now that we know who are the potential admins, we can use this code (in the server) to determine if the current user is an admin:

isAdmin <- reactive({
  !is.null(session$user) && session$user %in% adminUsers
})  

This will ensure that only if “john” or “sally” are using the app, the admin panel will show up. For illustration purposes, since many of you don’t have authentication support, you can change the isAdmin to

isAdmin <- reactive({
  is.null(session$user) || session$user %in% adminUsers
})  

This will assume that when there is no authentication, everyone is an admin, but when authentication is enabled, it will look at the admin users list.

That’s it! You are now ready to create forms with shiny apps. You can see what the final app code looks like on GitHub (with a few minor modifications), or test it out on my shiny server.

To leave a comment for the author, please follow the link and comment on his blog: Dean Attali's R Blog.


Shiny App for the Wittgenstein Centre Population Projections


(This article was first published on Guy Abel » R, and kindly contributed to R-bloggers)

A few weeks ago a new version of the Wittgenstein Centre Data Explorer was launched. The data explorer is intended to disseminate the results of a recent global population projection exercise which uniquely incorporates level of education (as well as age and sex) and the scientific input of more than 500 population experts around the world. Included are the projected populations used in the 5th assessment report of the Intergovernmental Panel on Climate Change (IPCC).


Over the past year or so I have been working (on and off) with the data lab team to create a shiny app, on which the data explorer is based. Below are notes to summarise some of the lessons I learnt:

1. Large data

We had a pretty large amount of data to display (31 indicators based on up to 7 scenarios x 26 time periods x 223 geographical areas x 21 age groups x 2 genders x 7 education categories)… so somewhere over 8 million rows for some indicators. Further complexity was added by the fact that some indicators were by definition not available for some dimensions of the data, for example, population median age is not available by age group. The size and complexity meant that data manipulations were a big issue. Using read.csv to load the data didn’t really cut the mustard, taking over 2 minutes when running on the server. The fantastic saves package and ultra.fast=TRUE argument in the loads function came to the rescue, alongside some pre-formatting to avoid as much joining and reshaping of the data on the server as possible. This cut load times to a couple of seconds at most, and allowed the app to work with the indicator variables on the fly as demanded by the user selections. Once the data was in, the more than awesome dplyr functions finished the data manipulations jobs in style. I am sure there is some smarter way to get everything running a little bit quicker than it does now, but I am pretty happy with the present speed, given the initial waiting times.
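
As a very rough sketch of the kind of loading and filtering workflow described above (the file name, column names and values here are made up for illustration, and the exact loads() arguments should be checked against the saves package documentation):

library(saves)   # column-wise binary storage with a fast loads() function
library(dplyr)

# Illustrative only: read just the variables needed for the current selection
# from a store written earlier with saves().
pop <- loads(file = "data/pop.RDatas",
             variables = c("scenario", "year", "country", "age", "value"),
             to.data.frame = TRUE, ultra.fast = TRUE)

# dplyr then does the on-the-fly aggregation driven by the user's selections.
pop %>%
  filter(scenario == "SSP2", year == 2050) %>%
  group_by(country) %>%
  summarise(total = sum(value))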

2. googleVis and gvisMerge

It’s a demographic data explorer, which means population pyramids have to pop-up somewhere. We needed pyramids that illustrate population sizes by education level, on top of the standard age and sex breakdown. Static versions of the pyramids in the explorer have been used by my colleagues for a while to illustrate past and future populations. For the graphic explorer I created some interactive versions, for comparisons over time and between countries, and which also have some tool tip features. These took a little while to develop. I played with ggvis but couldn’t get my bar charts to go horizontal. I also took a look at some other functions for interactive pyramids but I couldn’t figure out a way to overlay the educational dimension. I found a solution by creating gender specific stacked bar charts from gvisBarChart in the googleVis package and then gvisMerge to bring them together in one plot. As with the data tables, they take a second or so render, so I added a withProgress bar to try and keep the user entertained.
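
A rough sketch of that approach with toy data (the data frames, column names and options below are illustrative assumptions, not the explorer's actual code):

library(googleVis)

# Toy data: population (in millions) by age group and education level,
# one data frame per sex; the real app uses the projection results instead.
males <- data.frame(age = c("0-19", "20-39", "40-59", "60+"),
                    no_education = c(3.0, 1.0, 0.6, 0.4),
                    primary      = c(4.0, 3.0, 2.0, 1.0),
                    secondary    = c(2.0, 4.0, 3.0, 1.1))
females <- males  # placeholder; in practice a separate data frame per sex

# One stacked (horizontal) bar chart per sex...
left  <- gvisBarChart(males, xvar = "age",
                      yvar = c("no_education", "primary", "secondary"),
                      options = list(isStacked = TRUE))
right <- gvisBarChart(females, xvar = "age",
                      yvar = c("no_education", "primary", "secondary"),
                      options = list(isStacked = TRUE))

# ...then gvisMerge() places the two halves side by side as one pyramid-style plot.
pyramid <- gvisMerge(left, right, horizontal = TRUE)
plot(pyramid)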

I could not figure out a way in R to convert the HTML code outputted by the gvisMerge function to a familiar file format for users to download. Instead I used a system call to the wkhtmltopdf program to return PDF or PNG files. By default, wkhtmltopdf was a bit hit and miss, especially with converting the more complex plots in the maps to PNG files. I found setting --enable-javascript --javascript-delay 2000 helped in many cases.
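
The conversion itself boils down to a system call along these lines (the file names are placeholders; the flags are the ones mentioned above):

# Assumes the merged chart has already been written out as pyramid.html;
# wkhtmltopdf then renders it, waiting for the JavaScript-drawn chart.
system("wkhtmltopdf --enable-javascript --javascript-delay 2000 pyramid.html pyramid.pdf")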

3. The shiny user community

I asked questions using the shiny tag on stackoverflow and the shiny google group a number of times. A big thank you to everyone who helped me out. Browsing through other questions and answers was also super helpful. I found this question on organising large shiny code particularly useful. Making small changes during the reviewing process became a lot easier once I broke the code up across multiple .R files with sensible names.

4. Navbar Pages

When I started building the shiny app I was using a single layout with a sidebar and tabbed pages to display data and graphics (using tabsetPanel()), adding extra tabs as we developed new features (data selection, an assumption data base, population pyramids, plots of population size, maps, FAQ’s, etc, etc.). As these grew, the switch to the new Navbar layout helped clean up the appearance and provide a better user experience, where you can switch between data, graphics and background information using the bar at the top of page.

5. Shading and link buttons

I added some shading and buttons to help navigate through the data selection and between different tabs. For the shading I used cssmatic.com to generate the colour of a fluidRow background. The code generated there was copied and pasted into a tags$style element for my defined row myRow1, as follows:

library(shiny)
runApp(list(
  ui = shinyUI(fluidPage(
    br(),
    fluidRow(
      class = "myRow1", 
      br(),
      selectInput('variable', 'Variable', names(iris))
    ),
    tags$style(".myRow1{background: rgba(212,228,239,1); 
                background: -moz-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: -webkit-gradient(left top, right top, color-stop(0%, rgba(212,228,239,1)), color-stop(100%, rgba(44,146,208,1)));
                background: -webkit-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: -o-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: -ms-linear-gradient(left, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                background: linear-gradient(to right, rgba(212,228,239,1) 0%, rgba(44,146,208,1) 100%);
                filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#d4e4ef', endColorstr='#2c92d0', GradientType=1 );
                border-radius: 10px 10px 10px 10px;
                -moz-border-radius: 10px 10px 10px 10px;
                -webkit-border-radius: 10px 10px 10px 10px;}")
  )),
  server = function(input, output) {
  }
))

I added some buttons to help novice users switch between tabs once they had selected or viewed their data. It was a little tougher to implement than the shading, and in the end I needed a little help. I used bootsnipp.com to add some icons and define the style of the navigation buttons (using the tags$style element again).
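
As an illustration of the navigation mechanics (a minimal standalone sketch with made-up tab and input names, without the bootsnipp styling, and not the explorer's actual code):

library(shiny)

ui <- navbarPage("Explorer", id = "nav",
  tabPanel("Data",
    p("Select your data here, then jump to the graphics."),
    actionButton("to_graphics", "View graphics")
  ),
  tabPanel("Graphics",
    p("Plots would go here.")
  )
)

server <- function(input, output, session) {
  # Switch to the Graphics tab when the button is clicked.
  observeEvent(input$to_graphics, {
    updateNavbarPage(session, "nav", selected = "Graphics")
  })
}

shinyApp(ui, server)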

That is about it for the moment. I might add a few more notes to this post as they occur to me… I would encourage anyone who is tempted to learn shiny to take the plunge. I did not know JavaScript or any other web languages before I started, and I still don’t… which is the great beauty of the shiny package. I started with the RStudio tutorials, which are fantastic. The R code did not get a whole lot more complex than what I learnt there, even though the shiny app is quite large in comparison to most others I have seen.

Any comments or suggestions for improving website are welcome. If anyone would like the code for the pyramids or something else not covered, let me know.

To leave a comment for the author, please follow the link and comment on his blog: Guy Abel » R.


Shiny Wool Skeins


(This article was first published on Ripples, and kindly contributed to R-bloggers)

Chaos is not a pit: chaos is a ladder (Littlefinger in Game of Thrones)

Some time ago I wrote this post to show how my colleague Vu Anh translated one of my experiments into Shiny, opening my eyes to an amazing new world. I am very proud to present to you the first Shiny experiment entirely written by me.

In this case I took inspiration from another previous experiment to draw a kind of wool skein. The shiny app creates a plot consisting of chords inside a circle. There are two kinds of chords:

  • Those which form a track because they are a set of glued chords; the number of tracks and the number of chords per track can be selected using the Number of track chords and Number of scrawls per track sliders of the app, respectively.
  • Those forming the background, randomly allocated inside the circle. The number of background chords can be chosen in the app as well.

There is also the possibility to change the colors of the chords. These are the main steps I followed to build this Shiny app:

  1. Write a simple R program
  2. Decide which variables to parametrize
  3. Open a new Shiny project in RStudio
  4. Analyze the sample UI.R and server.R files generated by default
  5. Adapt sample code to my particular code (some iterations are needed here)
  6. Deploy my app on the Shiny Apps (shinyapps.io) free server (a rough sketch of this step follows the list)
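
For step 6, deployment to the shinyapps.io server can be scripted from R with the rsconnect package; this sketch is added for illustration, and the account name, token, secret and app directory below are placeholders:

install.packages("rsconnect")

# One-time account setup: copy the real token and secret from the shinyapps.io dashboard.
rsconnect::setAccountInfo(name = "myaccount", token = "TOKEN", secret = "SECRET")

# Deploy the folder containing UI.R and server.R (placeholder path).
rsconnect::deployApp(appDir = "shiny-wool-skeins")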

Number 1 is the most difficult step, but it does not depend on Shiny: the rest of them are easier, especially if you have help, as I had from my colleague Jorge. I encourage you to try. This is a snapshot of the app:

[Screenshot of the Shiny Wool Skeins app]

You can play with the app here.

Some things I thought while developing this experiment:

  • Shiny gives you a lot with minimal effort
  • Shiny can be a very interesting tool to teach maths and programming to kids
  • I have to translate some other experiments to Shiny
  • I will try to use it for my job

Try Shiny: it is very entertaining. A typical Shiny project consists of two files, one to define the user interface (UI.R) and the other to define the back-end side (server.R).

This is the code of UI.R:

# This is the user-interface definition of a Shiny web application.
# You can find out more about building applications with Shiny here:
#
# http://shiny.rstudio.com
#

library(shiny)

shinyUI(fluidPage(

  # Application title
  titlePanel("Shiny Wool Skeins"),
  HTML("<p>This experiment is based on <a href="https://aschinchon.wordpress.com/2015/05/13/bertrand-or-the-importance-of-defining-problems-properly/">this previous one</a> I did some time ago. It is my second approach to the wonderful world of Shiny.</p>"),
  # Sidebar with a slider input for number of bins
  sidebarLayout(
    sidebarPanel(
      inputPanel(
        sliderInput("lin", label = "Number of track chords:",
                    min = 1, max = 20, value = 5, step = 1),
        sliderInput("rep", label = "Number of scrawls per track:",
                    min = 1, max = 50, value = 10, step = 1),
        sliderInput("nbc", label = "Number of background chords:",
                    min = 0, max = 2000, value = 500, step = 2),
        selectInput("col1", label = "Track colour:",
                    choices = colors(), selected = "darkmagenta"),
        selectInput("col2", label = "Background chords colour:",
                    choices = colors(), selected = "gold")
      )
      
    ),

    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("chordplot")
    )
  )
))

And this is the code of server.R:

# This is the server logic for a Shiny web application.
# You can find out more about building applications with Shiny here:
#
# http://shiny.rstudio.com
#
library(ggplot2)
library(magrittr)
library(grDevices)
library(shiny)

shinyServer(function(input, output) {

  df<-reactive({
    ini=runif(n=input$lin, min=0,max=2*pi)
    ini %>% 
      +runif(n=input$lin, min=pi/2,max=3*pi/2) %>% 
      cbind(ini, end=.) %>% 
      as.data.frame() -> Sub1
    Sub1=Sub1[rep(seq_len(nrow(Sub1)), input$rep),]
    Sub1 %>% apply(c(1, 2), jitter) %>% as.data.frame() -> Sub1
    Sub1=with(Sub1, data.frame(col=input$col1, x1=cos(ini), y1=sin(ini), x2=cos(end), y2=sin(end)))
    Sub2=runif(input$nbc, min = 0, max = 2*pi)
    Sub2=data.frame(x=cos(Sub2), y=sin(Sub2))
    Sub2=cbind(input$col2, Sub2[(1:(input$nbc/2)),], Sub2[(((input$nbc/2)+1):input$nbc),])
    colnames(Sub2)=c("col", "x1", "y1", "x2", "y2")
    rbind(Sub1, Sub2)
  })
  
  opts=theme(legend.position="none",
             panel.background = element_rect(fill="white"),
             panel.grid = element_blank(),
             axis.ticks=element_blank(),
             axis.title=element_blank(),
             axis.text =element_blank())
  
  output$chordplot<-renderPlot({
    p=ggplot(df())+geom_segment(aes(x=x1, y=y1, xend=x2, yend=y2), colour=df()$col, alpha=runif(nrow(df()), min=.1, max=.3), lwd=1)+opts;print(p)
  }, height = 600, width = 600 )
  

})

To leave a comment for the author, please follow the link and comment on his blog: Ripples.
