Category Archives: Science

Online Tools to fight the COVID-19 pandemic (updated 07-Sep-2020)

| Tool | Link | Main Institution | Nation | Architecture | Tags | Pros | Cons |
| JHU COVID-19 Dashboard |  | Johns Hopkins University | USA | Python | Dashboard, Interactive Map, Trend Assessment, Worldwide | Frequently Updated, Quick Assessment, Worldwide Analysis |  |
| DSCovR |  | University | USA | Shiny/R | Dashboard, Interactive Map, Trend Assessment | Comparative Region Analysis, Demographics Included | Slow to load, Focused on USA |
| WHO Dashboard | http://covid19.who.int | WHO | Worldwide | JavaScript | Dashboard, Interactive Map, Worldwide | Comparative Region Analysis, Easy to Use, Frequently Updated, Quick Assessment, Worldwide Analysis |  |
| Worldometers |  |  |  |  | Worldwide | Easy to Use, Frequently Updated, Quick Assessment, Worldwide Analysis |  |
| COVID-19 Scenarios | http://covid19-scenarios.org | University of Basel | Switzerland | JavaScript | Interactive Simulator, Worldwide | Demographics Included, High Number of Parameters | Non-trivial to tailor the simulation for specific regions |
| Harvard COVID-19 Simulator | http://covid19sim.org | Harvard Medical School | USA | R | Interactive Simulator | Frequently Updated | Focused on USA |
| CovidSIM | http://covidsim.eu | ExploSYS GmbH | Germany | JavaScript | Interactive Simulator | High Number of Parameters | Non-trivial to tailor the simulation for specific regions |
| COVID-19 Trajectory viewer |  | of Leipzig | Germany | Shiny/R | Interactive Simulator | Comparative Region Analysis |  |
| COVID-19 Exit Strategies |  | versus Corona initiative | Worldwide | Shiny/R | Interactive Simulator | Comparison of Several Exit Strategies | Tunable Parameters are Few |
| Greifswald COVID-19 Simulator |  | of Greifswald | Germany | Shiny/R | Interactive Simulator | Predict Effect of Social Contact Reduction | Focused on specific countries and German regions |
| COVID19-Tracker |  | Biomedical Research Institute | Spain | Shiny/R | Case Number Visualizer and Predictor | Frequently Updated | Focused on Spain |
| GISAID | http://gisaid.org | GISAID | Worldwide | CMS TYPO3 | Data Repository, Worldwide | Database Fully Downloadable, Frequently Updated, Precomputed Multiple Sequence Alignment |  |
| Nextstrain |  | of Basel | Switzerland | Python | Dashboard, Nucleotide Mutation Analysis, Phylogenesis, Worldwide | Frequently Updated, Simulation of Mutation Spread over Time, Worldwide | Difficult to zoom into specific regions of the interactive phylogenetic tree |
| Covidex |  | of Luján | Argentina | Shiny/R | Phylogenetic Categorization | Allows User-provided Data, Intuitive Tutorial | Works exclusively with User-provided Data |
| Coronapp |  | of Bologna | Italy | Shiny/R | Amino Acid Mutation Analysis, Nucleotide Mutation Analysis, Frequency of Mutations over Time | Allows User-provided Data, Nucleotide and Protein Mutations, Worldwide | Slow to load |
| COVID-19 Genotyping Tool | http://covidgenotyper.app | University of Toronto | Canada | Shiny/R | Phylogenetic Categorization via 2D clustering | Allows User-provided Data | Analysis is very slow, Maximum number of sequences is only 10 |
| Pangolin | http://pangolin.cog-uk.io | Centre for Genomic Pathogen Surveillance | United Kingdom | Python | Phylogenetic Categorization, Lineage Assigner | Allows User-provided Data, Intuitive Assignment of Lineage | Analysis is slow |
| SARS-CoV-2 Alignment Screen |  | College London | United Kingdom | Shiny/R | Nucleotide Mutation Analysis | Mutation Analysis can be Focused on specific Genomic Regions or Genes | Not frequently updated |
| CoV-GLUE |  | of Glasgow | United Kingdom | JavaScript | Amino Acid Mutation Analysis, Nucleotide Mutation Analysis, Spreadsheet | Mutation Analysis can be Focused on specific Genomic Regions or Genes, Mutations Categorized as Replacements/Insertions/Deletions |  |
| Coronavirus3D | http://coronavirus3d.org | University of California Riverside | USA | JavaScript | Amino Acid Mutation Analysis, 3D Structure | Projects Mutations onto Viral Protein Structures from PDB, Frequently Updated |  |
| CoVex |  | University of Munich | Germany | JavaScript | Interactome Visualizer | Identifies Known Drugs for Selected Target Proteins |  |
| VirHostNet 2.0 | http://virhostnet.prabi.fr | University of Lyon | France | Cytoscape web | Interactome Visualizer | Prediction of novel interactions on user-provided protein sequences | Analysis is slow |
| P-HIPSTer | http://phipster.org | Columbia University | USA | JavaScript | Interaction List | Prediction of novel interactions using sequence- and structure-based machine learning | Not focused on SARS-CoV-2 |
| COVID-19 Gene/Drug Set Library |  | School of Medicine Mount Sinai | USA | JavaScript | Curated Lists of Genes and Drugs | Lists can be Searched, New Sets can be Proposed | No link with external databases |
| canSAR |  | Cancer Therapeutics Unit | United Kingdom | JavaScript | Database of Clinical Trials, Drugs and Druggable Targets | Intuitive Visualization of Druggable Interactome, Drug Prediction |  |
| CORDITE | http://cordite.mathematik.uni-marburg.de | University of Marburg | Germany | JavaScript | Database of Clinical Trials, Drugs and Druggable Targets | Quick Search | Not Frequently Updated |
| COVID-19 Disease Map |  | of Luxemburg | Luxemburg | JavaScript | Database of Drugs and Pathways | Search for relevant interactions between viral proteins and human pathways | Interactome Labels are hard to read, Not Frequently Updated, No Examples provided, Not focused on SARS-CoV-2 |
| CoV-Hipathia |  | for Progress and Health | Spain | Web Components | Analysis of Druggable Pathways affected by Gene Expression Changes | Allows User-provided Data | Analysis is slow |
| Chemical Checker |  | for Research in Biomedicine | Spain | JavaScript | Database of Drugs | Drugs Ranked by Evidence Quality and Quantity, Frequently Updated |  |
| Clinical Trials |  |  | United States | JavaScript | Database of Clinical Trials | Frequently Updated, Fully Comprehensive | Not categorized by Drugs |

corto – the Correlation Tool

We developed corto (Correlation Tool), a simple R package that infers gene regulatory networks from gene expression data, using DPI (Data Processing Inequality) and bootstrapping to recover edges.
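To make the two ingredients concrete, here is a minimal base-R sketch of DPI pruning and of bootstrap edge counting on a correlation network. It is a generic illustration, not corto's implementation: the function names dpi_prune and bootstrap_edges, the fixed correlation threshold and the toy data are all hypothetical. DPI looks at every fully connected triplet of genes and treats its weakest edge as an indirect interaction; the bootstrap reruns the procedure on resampled samples and records how often each edge survives.

```r
# Minimal sketch of DPI pruning plus bootstrap edge counting (not corto's code).
# Assumed input: numeric matrix with genes in rows and samples in columns.
dpi_prune <- function(expr, threshold = 0.5) {
  w <- abs(cor(t(expr)))        # edge weight = absolute Pearson correlation
  w[is.na(w)] <- 0              # constant genes give NA correlations
  diag(w) <- 0
  w[w < threshold] <- 0         # hypothetical cut-off for candidate edges
  keep <- w > 0
  n <- nrow(w)
  if (n >= 3) {
    for (i in 1:(n - 2)) for (j in (i + 1):(n - 1)) for (k in (j + 1):n) {
      trio <- c(w[i, j], w[i, k], w[j, k])
      if (all(trio > 0)) {
        # DPI: in a fully connected triplet the weakest edge is treated as indirect
        weakest <- which.min(trio)
        if (weakest == 1) keep[i, j] <- keep[j, i] <- FALSE
        if (weakest == 2) keep[i, k] <- keep[k, i] <- FALSE
        if (weakest == 3) keep[j, k] <- keep[k, j] <- FALSE
      }
    }
  }
  w * keep                      # pruned weighted adjacency matrix
}

bootstrap_edges <- function(expr, nboot = 100, threshold = 0.5) {
  freq <- matrix(0, nrow(expr), nrow(expr),
                 dimnames = list(rownames(expr), rownames(expr)))
  for (b in seq_len(nboot)) {
    cols <- sample(ncol(expr), replace = TRUE)   # resample samples with replacement
    freq <- freq + (dpi_prune(expr[, cols, drop = FALSE], threshold) > 0)
  }
  freq / nboot                  # fraction of bootstraps in which each edge survives
}

# Toy chain g1 -> g2 -> g3: the indirect g1-g3 correlation should be pruned
set.seed(1)
x1 <- rnorm(500)
x2 <- x1 + rnorm(500, sd = 0.5)
x3 <- x2 + rnorm(500, sd = 0.5)
expr <- rbind(g1 = x1, g2 = x2, g3 = x3)
net <- dpi_prune(expr)
boot <- bootstrap_edges(expr, nboot = 20)
```

The triple loop is cubic in the number of genes, which is fine for a toy example but not for genome-scale data.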

Supplementary Material containing all gene networks generated during corto benchmarking:

CRAN stable package:

Github developmental version:

Progress bars and parallelization in R

Since SNOW is being superseded, today I worked on finding new ways to show a progress bar in R for jobs running in parallel. In this example, I run a simple function that calculates logarithms 10,000 times, using 2 threads, and monitor the progress of the 10,000 calculations.

Set up the parameters

The following are the three parameters needed for any parallel job: number of threads, number of replicates (jobs) and a function:
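The original code for this step is not shown, so here is a minimal sketch of that setup. The names nthreads and nreps are the ones used by the snippets in this post; funrep is a hypothetical name for the logarithm function.

```r
# Hypothetical reconstruction of the setup step
nthreads <- 2        # number of parallel threads (workers)
nreps <- 10000       # number of replicates (jobs)
funrep <- function(i) {
  log(i)             # the simple per-job calculation: natural logarithm of i
}
```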


SNOW solution

This was my old solution with SNOW, but CRAN now flags every package that uses SNOW with a “superseded packages” warning, so we have to change it:

# log(i) stands in for the per-iteration function, whose body is not shown here
output <- foreach(i = icount(nreps), .combine = c, .options.snow = opts) %dopar% { log(i) }

Parallel solution (not working)

Unfortunately, with the parallel package (through doParallel), foreach has no equivalent of the progress option passed via .options.snow, and running it like this won’t work, because the combine function is only run at the end:

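# Same loop without .options.snow: nothing updates a progress bar while the jobs run
output <- foreach(i = icount(nreps), .combine = c) %dopar% { log(i) }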

Another parallel solution

After many tears, I finally found a solution that could work. Instead of c(), I combine the results with a progcombine() function that calls c() and also updates a progress bar. Luckily, it works on both Windows and Linux:

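library(doParallel)  # also attaches foreach, iterators (for icount) and parallel
pb <- txtProgressBar(min = 1, max = nreps - 1, style = 3)
count <- 0
progcombine <- function(...) {
  count <<- count + length(list(...)) - 1  # advance by the number of newly merged results
  setTxtProgressBar(pb, count)
  c(...)
}
cl <- makeCluster(nthreads)
registerDoParallel(cl)
output <- foreach(i = icount(nreps), .combine = progcombine) %dopar% { log(i) }
close(pb); stopCluster(cl)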

The working solution: pblapply
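The code for this step is not shown, so here is a minimal sketch of how pblapply() from the pbapply package can run the same 10,000 logarithm jobs on 2 workers while drawing a progress bar. It assumes pbapply is installed; the variable names mirror the other snippets in this post.

```r
# Sketch assuming the pbapply package is installed: install.packages("pbapply")
library(parallel)
library(pbapply)

nthreads <- 2
nreps <- 10000

pboptions(type = "txt")             # plain text progress bar
cl <- makeCluster(nthreads)
# pblapply() behaves like lapply(); with cl set, the jobs are split across
# the cluster workers and the bar advances as batches of jobs complete
res <- pblapply(seq_len(nreps), function(i) log(i), cl = cl)
stopCluster(cl)
output <- unlist(res)
```

Passing a cluster object created by makeCluster() keeps the call portable between Windows and Linux; on Linux, pbapply should also accept an integer for cl to use forked processes instead.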


Combining P-values


We have come a long way from the original, simple p-value integration methods of Fisher and Stouffer. Hong Zhang, a talented grad student from the Worcester Polytechnic Institute, and his colleagues have developed a novel method, called TFisher, for dealing with p-value integration in a wide range of test scenarios.
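For reference, the two classic methods can be written in a few lines of base R. This sketch assumes independent, one-sided p-values and uses the unweighted form of Stouffer's method; the function names are hypothetical.

```r
# Fisher: X = -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the global null hypothesis
fisher_combine <- function(p) {
  x <- -2 * sum(log(p))
  pchisq(x, df = 2 * length(p), lower.tail = FALSE)
}

# Stouffer: turn each p-value into a z-score, sum them and rescale by
# sqrt(k), which gives a standard normal variable under the null
stouffer_combine <- function(p) {
  z <- sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p))
  pnorm(z, lower.tail = FALSE)
}

p <- c(0.01, 0.20, 0.35, 0.04)
fisher_combine(p)
stouffer_combine(p)
```

With a single p-value both functions return that p-value unchanged, which is a quick sanity check on the formulas.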

I quote from their abstract, available here:

For testing a group of hypotheses, tremendous p-value combination methods have been developed and widely applied since 1930’s. Some methods (e.g., the minimal p-value) are optimal for sparse signals, and some others (e.g., Fisher’s combination) are optimal for dense signals. To address a wide spectrum of signal patterns, this paper proposes a unifying family of statistics, called TFisher, with general p-value truncation and weighting schemes. Analytical calculations for the p-value and the statistical power of TFisher under general hypotheses are given. Optimal truncation and weighting parameters are studied based on Bahadur Efficiency (BE) and the proposed Asymptotic Power Efficiency (APE), which is superior to BE for studying the signal detection problem. A soft-thresholding scheme is shown to be optimal for signal detection in a large space of signal patterns. When prior information of signal pattern is unavailable, an omnibus test, oTFisher, can adapt to the given data. Simulations evidenced the accuracy of calculations and validated the theoretical properties. The TFisher tests were applied to analyzing a whole exome sequencing data of amyotrophic lateral sclerosis. Relevant tests and calculations have been implemented into an R package TFisher and published on the CRAN.
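To make the truncation and weighting idea concrete, here is a sketch of the TFisher statistic as I read it from the paper: W = sum over i of -2 log(p_i / tau2) * 1(p_i <= tau1). Only the statistic is computed; its p-value needs the null distribution derived by the authors, which is what the TFisher package provides. The name tfisher_stat is hypothetical and is not the package's API.

```r
# TFisher statistic: truncate at tau1, weight (rescale) by tau2
tfisher_stat <- function(p, tau1, tau2) {
  sel <- p <= tau1                    # truncation: keep only p-values <= tau1
  sum(-2 * log(p[sel] / tau2))        # weighting: kept p-values are divided by tau2
}

p <- c(0.001, 0.03, 0.2, 0.6, 0.9)
tfisher_stat(p, tau1 = 1, tau2 = 1)        # Fisher's combination statistic
tfisher_stat(p, tau1 = 0.05, tau2 = 1)     # truncated product (hard thresholding)
tfisher_stat(p, tau1 = 0.05, tau2 = 0.05)  # soft-thresholding: tau1 = tau2
```

Setting tau1 = tau2 = 1 recovers Fisher's statistic, which is why TFisher is described as a unifying family.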

The methods are implemented in R and available on CRAN:

RNASeq aligners

I would say the match now has four competitors:


  • Pros: the classic, the first universally used aligner, still widely adopted in pipelines all over the world; basically, people keep using it so that their new results are comparable to the old ones
  • Cons: slow (several CPU hours to align 10M reads to a human genome), limited to genomes of up to 4 Gbases (so no complex metatranscriptomics for it), and its own website recommends using HISAT2 instead


  • Pros: super, wicked fast, the standard used by ENCODE and the big RNASeq projects
  • Cons: uses a LOT of RAM, like really a lot (64GB for a human index)


  • Pros: fast and low RAM requirements. If you start from scratch, this is the aligner to pick
  • Cons: it’s still new, so many people don’t trust it yet


These are not strictly aligners, but rather transcript quantifiers. I put them together for simplicity, but they are different programs.

  • Pros: high speed and low RAM requirements. Ideal for quick RNA-Seq gene expression measurements
  • Cons: sadly, they cannot do de novo transcript detection. They produce estimated transcript-level counts rather than integer gene-level read counts, which are the expected input for many downstream analysis tools. However, some tools are starting to accept Salmon/Kallisto outputs (in R you can import transcript abundances with the tximport package)
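As a rough illustration of what importing transcript-level output involves, this base-R sketch sums transcript-level estimated counts into gene-level totals using a transcript-to-gene table. All numbers and names are hypothetical, and tximport itself does more than this (for example, it also returns transcript-length information for downstream tools).

```r
# Hypothetical transcript-level estimated counts (non-integer, as quantifiers report them)
tx_counts <- c(tx1 = 10.4, tx2 = 5.6, tx3 = 20.0, tx4 = 0.6)

# Hypothetical transcript-to-gene mapping table
tx2gene <- data.frame(tx   = c("tx1", "tx2", "tx3", "tx4"),
                      gene = c("geneA", "geneA", "geneB", "geneB"))

# Look up the gene of each transcript, then sum estimated counts per gene
gene_of_tx <- tx2gene$gene[match(names(tx_counts), tx2gene$tx)]
gene_counts <- tapply(tx_counts, gene_of_tx, sum)
gene_counts   # geneA = 16, geneB = 20.6
```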


Quantifying RNA-Seq Transcripts

About ten years ago, when RNA-Seq was young, we struggled to make sense of the huge quantity of data coming out of Next-Generation Sequencers. RNA-Seq pipelines were built on a simple scheme:

Reads -> Alignments -> Quantification

The most popular RNA-Seq alignment tool, TopHat (now TopHat2), was built on top of the Bowtie aligner to focus on transcribed genomic regions (the transcriptome), with the optional feature of aligning reads to the whole genome for de novo transcript discovery.
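A toy version of the last step of the Reads -> Alignments -> Quantification scheme can be sketched in base R. It assumes hypothetical gene intervals on a single chromosome and counts a read for a gene when its aligned start position falls inside that gene; real quantifiers also handle strands, exon structure, spliced reads and multi-mapping reads.

```r
# Hypothetical gene intervals on one chromosome
genes <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                    start = c(100, 500, 900),
                    end   = c(300, 700, 1200))

# Hypothetical aligned read start positions (the output of the alignment step)
read_starts <- c(120, 150, 290, 510, 650, 660, 1000, 50)

# Quantification: count reads whose start lies within each gene interval
counts <- vapply(seq_len(nrow(genes)), function(g) {
  sum(read_starts >= genes$start[g] & read_starts <= genes$end[g])
}, integer(1))
names(counts) <- genes$gene
counts   # geneA = 3, geneB = 3, geneC = 1 (the read at 50 falls outside all genes)
```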
