R on the ACCRE Cluster
R is a widely used statistical analysis environment and programming language. Many versions of R are available to use on the cluster. Users typically first develop code interactively on their laptop/desktop, and then run batch processing jobs on the ACCRE cluster through the SLURM job scheduler.
Versions of R on the ACCRE Cluster
R can be added to your environment using Lmod. We encourage users to use the most recent version installed. To see a list of installed versions simply type:
[bob@vmps11 ~]$ module spider R
To see details about how to load a specific version you can then run the same command but with version information included in the package name:
[bob@vmps11 ~]$ module spider R/3.3.3-X11-20160819
The output from this command will give you information about the dependencies that first need to be loaded in order to add R to your environment. For example:
[bob@vmps11 ~]$ module load GCC OpenMPI R
Here, we are loading R version 3.3.3 built with the GCC compiler and the OpenMPI library. We will periodically install new versions of R, at which point the default version of R will change, so you may want to hard-code the version of R into your module load command (e.g. module load GCC/5.4.0-2.26 OpenMPI/1.10.3 R/3.3.3-X11-20160819) to avoid picking up a new version of R when you don’t want it. Since the full command is a handful to type, you may wish to define a shortcut using the alias command if it’s not part of a SLURM script or bash script.
Our current R installation comes with a large number of popular scientific and high-performance computing packages preinstalled (e.g. ggplot2, snow, doParallel, foreach, Rmpi). Even more packages are available in the R Bioconductor bundle, which is also available via Lmod:
[bob@vmps11 ~]$ module load R-bundle-Bioconductor
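The alias shortcut suggested above for the pinned module load command could look like the following sketch. The alias name loadR is hypothetical, and the pinned versions are simply the ones quoted in the example command; substitute whatever module spider reports for you.

```shell
# Pinned module versions copied from the example command above; adjust as needed.
R_MODULES="GCC/5.4.0-2.26 OpenMPI/1.10.3 R/3.3.3-X11-20160819"

# "loadR" is a hypothetical alias name; pick any name you like.
# Adding these lines to ~/.bashrc makes the alias available in new interactive shells.
alias loadR="module load $R_MODULES"
```

Typing loadR at an interactive prompt then expands to the full module load command. Bash does not expand aliases in non-interactive scripts by default, so spell out the full command inside SLURM or bash scripts.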
Checking Installed Packages
One simple way to list the installed packages is to call library() from the R command prompt. For example:
[bob@vmps11 ~]$ R -e 'library()'
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library()
Packages in library ‘/gpfs22/easybuild/centos6/software/MPI/GCC/
5.4.0-2.26/OpenMPI/1.10.3/R-bundle-Bioconductor/
3.3-R-3.3.3’:
affy Methods for Affymetrix Oligonucleotide Arrays
affycoretools Functions useful for those doing repetitive
analyses with Affymetrix GeneChips
affyio Tools for parsing Affymetrix data files
AgiMicroRna Processing and Differential Expression Analysis
of Agilent microRNA chips
ALDEx2 Analysis of differential abundance taking
sample variation into account
annaffy Annotation tools for Affymetrix biological
metadata
annotate Annotation for microarrays
AnnotationDbi Annotation Database Interface
AnnotationForge Code for Building Annotation Database Packages
AnnotationHub Client to access AnnotationHub resources
baySeq Empirical Bayesian analysis of patterns of
differential expression in count data
Biobase Biobase: Base functions for Bioconductor
BiocGenerics S4 generic functions for Bioconductor
BiocInstaller Install/Update Bioconductor, CRAN, and github
Packages
BiocParallel Bioconductor facilities for parallel evaluation
biomaRt Interface to BioMart databases (e.g. Ensembl,
COSMIC ,Wormbase and Gramene)
biomformat An interface package for the BIOM file format
Biostrings String objects representing biological
sequences, and matching algorithms
biovizBase Basic graphic utilities for visualization of
genomic data.
BSgenome Infrastructure for Biostrings-based genome data
packages and support for efficient SNP
representation
BSgenome.Hsapiens.UCSC.hg19
Full genome sequences for Homo sapiens (UCSC
version hg19)
bumphunter Bump Hunter
.
.
.
Note that the above output has been truncated for brevity. If you were to run this command you would see additional information about installed packages: the path to the package, version, dependencies, license information, and a few other details. To load a package into an R session, simply type library("package_name"). For example, to load the parallel package, type:
library("parallel")
Installing New Packages
If you find that a particular package you need is missing from the R version you use, you will need to install the package yourself into your home directory. There are multiple ways to install R packages. Below is an example of how you would go about installing a package from the R command prompt. To begin, create a directory in your home directory to install these packages into. In this example, the packages will be installed into a directory at ~/R/rlib-3.3.3:
[bob@vmps11 ~]$ mkdir -p ~/R/rlib-3.3.3
Notice that we are including the R version in the name of this directory. In general, when switching to a new R version you should reinstall packages to be used with the new version of R. So you might have a ~/R/rlib-3.3.3 directory and later create a ~/R/rlib-3.4.0 directory when you switch over to the new version of R.
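The naming convention described above can be captured in a small shell sketch so that the directory name always follows the R version. The variable names R_VERSION, RLIB_BASE, and RLIB_DIR are illustrative assumptions, and the version is set by hand rather than detected.

```shell
# R_VERSION is set by hand here (an assumption); it should match the R module you load.
R_VERSION="3.3.3"

# Base directory for personal libraries (~/R in the example above).
RLIB_BASE="$HOME/R"
RLIB_DIR="$RLIB_BASE/rlib-$R_VERSION"

# -p creates missing parent directories and does nothing if the directory exists.
mkdir -p "$RLIB_DIR"
echo "Personal library for R $R_VERSION: $RLIB_DIR"
```

When you move to a newer R, changing R_VERSION alone yields a fresh, separate library directory.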
Now load R and start up an R session from the terminal. In this example we will install the Zelig package.
[bob@vmps11 ~]$ module load GCC OpenMPI R R-bundle-Bioconductor
[bob@vmps11 ~]$ R
.
.
.
> .libPaths("~/R/rlib-3.3.3")
> install.packages("Zelig")
Installing package into ‘/gpfs22/home/bob/R/rlib-3.3.3’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
CRAN mirror
1: 0-Cloud 2: Algeria
3: Argentina (La Plata) 4: Australia (Canberra)
5: Australia (Melbourne) 6: Austria
7: Belgium 8: Brazil (BA)
9: Brazil (PR) 10: Brazil (RJ)
11: Brazil (SP 1) 12: Brazil (SP 2)
13: Canada (BC) 14: Canada (NS)
15: Canada (ON) 16: Canada (QC 1)
17: Canada (QC 2) 18: Chile
19: China (Beijing 1) 20: China (Beijing 2)
21: China (Beijing 3) 22: China (Beijing 4)
23: China (Hefei) 24: China (Lanzhou)
25: China (Xiamen) 26: Colombia (Cali)
27: Czech Republic 28: Denmark
29: Ecuador 30: El Salvador
31: Estonia 32: France (Lyon 1)
33: France (Lyon 2) 34: France (Montpellier)
35: France (Paris 2) 36: France (Strasbourg)
37: Germany (Berlin) 38: Germany (Goettingen)
39: Germany (Frankfurt) 40: Germany (Münster)
41: Greece 42: Hungary
43: Iceland 44: India
45: Indonesia (Jakarta) 46: Iran
47: Ireland 48: Italy (Milano)
49: Italy (Padua) 50: Italy (Palermo)
51: Japan (Tokyo) 52: Japan (Yamagata)
53: Korea (Seoul 1) 54: Korea (Seoul 2)
55: Korea (Ulsan) 56: Lebanon
57: Mexico (Mexico City) 58: Mexico (Texcoco)
59: Netherlands (Amsterdam) 60: Netherlands (Utrecht)
61: New Zealand 62: Norway
63: Philippines 64: Poland
65: Portugal 66: Russia (Moscow 1)
67: Russia (Moscow 2) 68: Singapore
69: Slovakia 70: South Africa (Johannesburg)
71: Spain (A Coruña) 72: Spain (Madrid)
73: Sweden 74: Switzerland
75: Taiwan (Chungli) 76: Taiwan (Taipei)
77: Thailand 78: Turkey
79: UK (Bristol) 80: UK (Cambridge)
81: UK (Hampshire) 82: UK (London)
83: UK (London) 84: UK (St Andrews)
85: USA (CA 1) 86: USA (CA 2)
87: USA (IA) 88: USA (IN)
89: USA (KS) 90: USA (MD)
91: USA (MI 1) 92: USA (MI 2)
93: USA (MO) 94: USA (OH 1)
95: USA (OH 2) 96: USA (OR)
97: USA (PA 1) 98: USA (PA 2)
99: USA (TN) 100: USA (TX 1)
101: USA (WA 1) 102: USA (WA 2)
103: Venezuela 104: Vietnam
Here we are prompted for the repository we would like to download the package from. Let’s choose the Tennessee repository (option 99):
Selection: 99
also installing the dependencies ‘zoo’, ‘sandwich’
trying URL 'http://mirrors.nics.utk.edu/cran/src/contrib/zoo_1.7-12.tar.gz'
Content type 'application/x-gzip' length 839181 bytes (819 KB)
==================================================
downloaded 819 KB
trying URL 'http://mirrors.nics.utk.edu/cran/src/contrib/sandwich_2.3-3.tar.gz'
Content type 'application/x-gzip' length 466503 bytes (455 KB)
==================================================
downloaded 455 KB
trying URL 'http://mirrors.nics.utk.edu/cran/src/contrib/Zelig_4.2-1.tar.gz'
Content type 'application/x-gzip' length 3262531 bytes (3.1 MB)
==================================================
downloaded 3.1 MB
* installing *source* package ‘zoo’ ...
** package ‘zoo’ successfully unpacked and MD5 sums checked
** libs
icc -std=gnu99 -I/usr/local/R/3.2.0/x86_64/intel14/nonet/lib64/R/include
-DNDEBUG -I../inst/include -I/usr/local/include -fpic -O3 -msse3
-funroll-loops -funsigned-char -c coredata.c -o coredata.o
icc -std=gnu99 -I/usr/local/R/3.2.0/x86_64/intel14/nonet/lib64/R/include
-DNDEBUG -I../inst/include -I/usr/local/include -fpic -O3 -msse3
-funroll-loops -funsigned-char -c init.c -o init.o
icc -std=gnu99 -I/usr/local/R/3.2.0/x86_64/intel14/nonet/lib64/R/include
-DNDEBUG -I../inst/include -I/usr/local/include -fpic -O3 -msse3
-funroll-loops -funsigned-char -c lag.c -o lag.o
icc -std=gnu99 -shared -L/usr/local/lib64 -o zoo.so coredata.o init.o lag.o
installing to /gpfs22/home/frenchwr/R/rlib/zoo/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (zoo)
* installing *source* package ‘sandwich’ ...
** package ‘sandwich’ successfully unpacked and MD5 sums checked
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (sandwich)
* installing *source* package ‘Zelig’ ...
** package ‘Zelig’ successfully unpacked and MD5 sums checked
** R
** data
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (Zelig)
The downloaded source packages are in
‘/tmp/RtmpmGSgLS/downloaded_packages’
Notice that Zelig had a few dependencies (zoo and sandwich) that were also installed along the way. It appears that the installation was successful, so let’s exit the R session to check if the packages are now in our home directory:
> quit()
Save workspace image? [y/n/c]: n
[bob@vmps65 ~]$ ls ~/R/rlib-3.3.3/
sandwich Zelig zoo
There they are! Finally, let’s re-start R to make sure we can load the package we’ve installed:
[bob@vmps11 ~]$ R
.
.
.
> .libPaths("~/R/rlib-3.3.3")
> library("Zelig")
Loading required package: boot
Loading required package: MASS
Loading required package: sandwich
ZELIG (Versions 4.2-1, built: 2013-09-12)
+----------------------------------------------------------------+
| Please refer to http://gking.harvard.edu/zelig for full |
| documentation or help.zelig() for help with commands and |
| models support by Zelig. |
| |
| Zelig project citations: |
| Kosuke Imai, Gary King, and Olivia Lau. (2009). |
| ``Zelig: Everyone's Statistical Software,'' |
| http://gking.harvard.edu/zelig |
| and |
| Kosuke Imai, Gary King, and Olivia Lau. (2008). |
| ``Toward A Common Framework for Statistical Analysis |
| and Development,'' Journal of Computational and |
| Graphical Statistics, Vol. 17, No. 4 (December) |
| pp. 892-913. |
| |
| To cite individual Zelig models, please use the citation |
| format printed with each model run and in the documentation. |
+----------------------------------------------------------------+
Attaching package: ‘Zelig’
The following object is masked from ‘package:utils’:
cite
Zelig appears to load properly, confirming we have successfully installed the package. Note that we first needed to type:
> .libPaths("~/R/rlib-3.3.3")
in order to point R to the directory where our packages are installed. This command was also needed before installing the packages. Alternatively, you may add this line to the .Rprofile file in your home directory if you always want R to see these libraries. Note that these packages were installed for a specific version of R, so it’s unlikely that they will work with a different version. If you need information on installing a package from source code or from Bioconductor, refer to our FAQ page.
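As a minimal sketch of the startup-file approach, the following appends the .libPaths() call to the .Rprofile in your home directory, assuming the ~/R/rlib-3.3.3 directory from this example. The variable names RPROFILE and LINE are illustrative only.

```shell
# Startup file that R reads from the home directory when a session begins.
RPROFILE="$HOME/.Rprofile"

# The same call used in the interactive session above.
LINE='.libPaths("~/R/rlib-3.3.3")'

# Append the line only if it is not already there, so re-running is harmless.
touch "$RPROFILE"
grep -qxF "$LINE" "$RPROFILE" || echo "$LINE" >> "$RPROFILE"
```

Keep in mind that R prefers a .Rprofile in the current working directory over the one in your home directory, and that the line needs updating when you switch to a library built for another R version.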
Example Scripts
Running an R script within a SLURM job is generally straightforward. Unless you are attempting to run one of R’s multi-processing packages, you will want to request a single task, load the appropriate version of R from your SLURM script, and then run your script using the Rscript command. The --no-save flag passed to Rscript prevents R from saving the workspace, which in this example would be relatively large. The following example runs a simple R script that demonstrates the utility of writing vectorized R code:
[bob@vmps11 run1]$ ls
R.slurm vectorize.R
[bob@vmps11 run1]$ cat R.slurm
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --output=R_job_slurm.out
module load GCC OpenMPI R
Rscript --no-save vectorize.R
[bob@vmps11 run1]$ cat vectorize.R
n = 10^7
# populate with random nos
v=runif(n)
system.time({vv<-v*v; m<-mean(vv)}); m
system.time({for(i in 1:length(v)) { vv[i]<-v[i]*v[i] }; m<-mean(vv)}); m
Note that this example was taken from a Stack Overflow thread. We next submit the job with sbatch:
[bob@vmps11 run1]$ sbatch R.slurm
Submitted batch job 2271536
After waiting a few minutes:
[bob@vmps11 run1]$ ls
R_job_slurm.out R.slurm vectorize.R
[bob@vmps11 run1]$ cat R_job_slurm.out
user system elapsed
0.047 0.014 0.062
[1] 0.3333861
user system elapsed
20.158 0.058 20.253
[1] 0.3333861
The elapsed column indicates that the vectorized version of the code executed in 0.062 seconds while the non-vectorized version executed in 20.253 seconds. Both versions produced identical results (0.3333861). Moral of the story: use vectorized code in R scripts whenever possible!
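If you want the batch job to keep using the same R even after the default module changes, a variant of R.slurm with pinned versions might look like the sketch below. The file name R_pinned.slurm is hypothetical, the versions are the ones quoted earlier on this page, and the R_LIBS_USER line assumes a personal library at ~/R/rlib-3.3.3.

```shell
# Write a variant of R.slurm that pins module versions (file name is hypothetical).
cat > R_pinned.slurm <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --output=R_job_slurm.out

# Pin versions so a change of the default R module does not affect this job.
module load GCC/5.4.0-2.26 OpenMPI/1.10.3 R/3.3.3-X11-20160819

# Assumption: packages were installed into ~/R/rlib-3.3.3 as shown earlier.
# R_LIBS_USER is the environment variable R consults for a user library.
export R_LIBS_USER="$HOME/R/rlib-3.3.3"

Rscript --no-save vectorize.R
EOF

echo "Wrote R_pinned.slurm; submit it with: sbatch R_pinned.slurm"
```

Setting R_LIBS_USER in the job script is one alternative to calling .libPaths() inside every R script; either approach should make the personal library visible to Rscript.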
Contributing New Examples
In order to foster collaboration and develop local R expertise at Vanderbilt, we encourage users to submit examples of their own to ACCRE’s R GitHub repository. Instructions for doing this can be found on this page.
The R logo is © 2016 The R Foundation and licensed under CC-BY-SA 4.0.