Monthly Archives: October 2019

corto – the Correlation Tool

We developed corto (Correlation Tool), a simple R package that infers gene regulatory networks from gene expression data, using DPI (Data Processing Inequality) and bootstrapping to recover edges.
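DPI and bootstrapping are easier to picture with a toy version. The sketch below is a minimal illustration of the general idea only, not corto's code or its exact algorithm. It assumes absolute Pearson correlation as the edge weight and a hypothetical hard threshold of 0.5. dpi_prune() and bootstrap_edges() are made-up names for this example.

```r
library(utils)  # for combn()

# Hypothetical helper (not corto's API). Data Processing Inequality: in every
# fully connected triplet of genes, the weakest of the three edges is treated
# as indirect and removed. Weights are compared on the unpruned matrix.
dpi_prune <- function(w) {
  # w: symmetric matrix of edge weights (e.g. absolute correlations), 0 = no edge
  n <- nrow(w)
  drop <- matrix(FALSE, n, n)
  if (n >= 3) {
    for (trip in combn(n, 3, simplify = FALSE)) {
      pairs <- list(trip[c(1, 2)], trip[c(1, 3)], trip[c(2, 3)])
      weights <- vapply(pairs, function(p) w[p[1], p[2]], numeric(1))
      if (all(weights > 0)) {
        p <- pairs[[which.min(weights)]]
        drop[p[1], p[2]] <- TRUE
        drop[p[2], p[1]] <- TRUE
      }
    }
  }
  w[drop] <- 0
  w
}

# Hypothetical helper: resample the samples (columns) with replacement, rebuild
# and prune the network each time, and return how often every edge survives.
bootstrap_edges <- function(expr, nboot = 100, threshold = 0.5) {
  # expr: genes x samples matrix with gene names as rownames
  genes <- rownames(expr)
  freq <- matrix(0, nrow(expr), nrow(expr), dimnames = list(genes, genes))
  for (b in seq_len(nboot)) {
    cols <- sample(ncol(expr), replace = TRUE)
    r <- abs(cor(t(expr[, cols])))
    r[is.na(r)] <- 0          # a gene that is constant in a resample gives NA
    diag(r) <- 0
    r[r < threshold] <- 0     # assumed hard threshold on |r|
    freq <- freq + (dpi_prune(r) > 0)
  }
  freq / nboot
}

# Toy triangle: A-C is the weakest edge, so DPI removes it
w <- matrix(c(0,   0.9, 0.5,
              0.9, 0,   0.8,
              0.5, 0.8, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
dpi_prune(w)["A", "C"]   # 0

# Random data, only to show the call; real input would be an expression matrix
set.seed(1)
expr <- matrix(rnorm(5 * 30), nrow = 5, dimnames = list(paste0("g", 1:5), NULL))
freq <- bootstrap_edges(expr, nboot = 20)
```

Edges with a high survival frequency across bootstraps are the ones worth keeping; where to put that cut-off is a separate choice not shown here.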

Supplementary Material containing all gene networks generated during corto benchmarking:
https://www.dropbox.com/sh/qzl8vjeoa7mqxfp/AACEfLQpAzUz7rqqEHEjMrhQa?dl=0

CRAN stable package: https://cran.r-project.org/package=corto

GitHub development version: https://github.com/federicogiorgi/corto

Progress bars and parallelization in R

Since SNOW is being phased out, today I worked a bit on finding new ways to show a progress bar in R for jobs running in parallel. In this example, I run a simple function that calculates logarithms 10,000 times, using 2 threads, and monitor the progress of the 10,000 calculations.

Set up the parameters

Every parallel job below needs the same three parameters: the number of threads, the number of replicates (jobs) and the function to run:

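nthreads<-2 # number of parallel workers
nreps<-10000 # number of replicates (jobs)
# the job: base-2 and base-10 logarithms of i
funrep<-function(i){
res<-c(log2(i),log10(i))
return(res)
}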
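For reference, this is the behaviour every parallel version below tries to reproduce: a bar that advances in the console as each job finishes. The sketch is serial, reuses nreps and funrep from above (redefined so it runs on its own) and relies only on txtProgressBar() and setTxtProgressBar() from the utils package.

```r
library(utils)

nreps <- 10000
funrep <- function(i) c(log2(i), log10(i))

# Serial baseline: same job as the parallel versions, with a live progress bar
pb <- txtProgressBar(min = 0, max = nreps, style = 3)
output <- vector("list", nreps)
for (i in seq_len(nreps)) {
  output[[i]] <- funrep(i)
  setTxtProgressBar(pb, i)   # runs in this same R session, so the bar moves
}
close(pb)
output <- unlist(output)     # same layout as foreach(..., .combine = c)
```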

SNOW solution

This was my old solution with SNOW, but CRAN now flags every package that uses SNOW with a “superseded packages” warning, so we have to change it:

library(doSNOW)
cl<-makeCluster(nthreads)
registerDoSNOW(cl)
pb<-txtProgressBar(0,nreps,style=3)
# doSNOW calls this function in the main session as results come back
progress<-function(n){
setTxtProgressBar(pb,n)
}
opts<-list(progress=progress)
output<-foreach(i=icount(nreps),.combine=c,.options.snow=opts) %dopar% {
s<-funrep(i)
return(s)
}
close(pb)
stopCluster(cl)

Parallel solution (not working)

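Unfortunately, doParallel offers no progress option comparable to the .options.snow argument of doSNOW, and running it like this won’t work: the loop body, including setTxtProgressBar(), is evaluated on the worker processes rather than in the main R session, and the combine function is run only at the end: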

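library(doParallel)
cl<-makeCluster(nthreads)
registerDoParallel(cl)
pb<-txtProgressBar(0,nreps,style=3)
output<-foreach(i=icount(nreps),.combine=c) %dopar% {
s<-funrep(i)
# this call runs on a worker with a copy of pb,
# so the bar in the main console never moves
setTxtProgressBar(pb,i)
s
}
close(pb)
stopCluster(cl)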
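The worker side of the problem is easy to check. With a cluster backend, each task is evaluated in a separate R process. The sketch below uses only the base parallel package (parLapply() on the same kind of cluster that registerDoParallel() receives) and assumes a local cluster with 2 workers; it compares the process IDs of the tasks with that of the main session.

```r
library(parallel)

cl <- makeCluster(2)
# Each task reports the process ID of the R session that evaluated it
worker_pids <- unlist(parLapply(cl, 1:4, function(i) Sys.getpid()))
stopCluster(cl)

master_pid <- Sys.getpid()
master_pid %in% worker_pids   # FALSE: no task ran in this session
```

By default makeCluster() discards the workers' console output, so anything a task prints, progress bar updates included, never reaches the console you are watching.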

Another parallel solution

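After many tears, I finally found a solution that works. Instead of passing c() as the .combine function, I pass progcombine(), which returns a function that calls c() and also updates a progress bar. The combine function is evaluated in the main R session, so the bar is visible there. Luckily, it works on both Windows and Linux: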

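library(doParallel)
# progcombine() returns a combine function that concatenates results with c()
# and advances a progress bar in the main R session each time it is called
progcombine<-function(){
pb <- txtProgressBar(min=1, max=nreps-1,style=3)
count <- 0
function(...) {
# the first argument is the accumulated result, the rest are new results
count <<- count + length(list(...)) - 1
setTxtProgressBar(pb,count)
flush.console()
c(...)
}
}
cl <- makeCluster(nthreads)
registerDoParallel(cl)
output<-foreach(i = icount(nreps),.combine=progcombine()) %dopar% {
funrep(i)
}
stopCluster(cl)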
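Why max=nreps-1? When .init is not given, foreach takes the first result as the starting value and then calls the combine function once per remaining result, passing the accumulated value plus one new result, so length(list(...)) - 1 is 1 on every call. The small serial check below counts those calls for 10 tasks using %do%; counting_c() is a hypothetical wrapper around c() written only for this test.

```r
library(foreach)

calls <- 0
# Hypothetical combine function: behaves like c() but counts its invocations
counting_c <- function(...) {
  calls <<- calls + 1
  c(...)
}

out <- foreach(i = 1:10, .combine = counting_c) %do% i
calls   # 9, i.e. 10 results minus 1
```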

The working solution: pblapply

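library(pbapply)
cl<-parallel::makeCluster(nthreads)
# export nreps and load utils on the workers (kept from the original setup)
invisible(parallel::clusterExport(cl=cl,varlist=c("nreps")))
invisible(parallel::clusterEvalQ(cl=cl,library(utils)))
# pblapply() works like lapply() with a progress bar; cl= runs it on the cluster
result<-pblapply(cl=cl,
X=1:nreps,
FUN=funrep)
parallel::stopCluster(cl)
# result is a list of length nreps; unlist(result) matches the foreach output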
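pblapply() can report progress on a cluster because it does not hand all the tasks over in one go: it splits X into chunks and advances the bar as each chunk comes back. The sketch below reproduces that general idea with base R only. chunked_parlapply() and its nchunks argument are hypothetical, and this is not pbapply's actual code.

```r
library(parallel)
library(utils)

nthreads <- 2
nreps <- 10000
funrep <- function(i) c(log2(i), log10(i))

# Hypothetical helper: run FUN over X on cluster cl one chunk at a time,
# advancing a progress bar in the main session after each chunk
chunked_parlapply <- function(cl, X, FUN, nchunks = 20) {
  # cut() assigns each element to one of nchunks consecutive groups
  chunks <- split(X, cut(seq_along(X), nchunks, labels = FALSE))
  pb <- txtProgressBar(min = 0, max = length(chunks), style = 3)
  on.exit(close(pb))
  out <- vector("list", length(chunks))
  for (k in seq_along(chunks)) {
    out[[k]] <- parLapply(cl, chunks[[k]], FUN)
    setTxtProgressBar(pb, k)
  }
  do.call(c, out)   # flatten the per-chunk lists into one list
}

cl <- makeCluster(nthreads)
result <- chunked_parlapply(cl, 1:nreps, funrep)
stopCluster(cl)
```

The number of chunks is a trade-off: fewer chunks mean less communication overhead but a coarser bar, and each chunk waits for its slowest task before the next one starts.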