computationalgenomics


anaconda
Anaconda
Anaconda is a freemium open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing that aims to simplify package management and deployment.

https://www.continuum.io



ASCAT
ASCAT
A tool for accurate dissection of genome-wide allele-specific copy number in tumors.

https://www.crick.ac.uk/peter-van-loo/software/ASCAT



aspera-connect
Aspera Connect
High-performance transfer plug-in

http://downloads.asperasoft.com/connect2/



ATLAS
ATLAS
A data warehouse for integrative bioinformatics

https://sourceforge.net/projects/atlasdb



bcftools
BCFtools
Utilities for variant calling and manipulating VCFs and BCFs

https://samtools.github.io/bcftools/bcftools.html



beagle
Beagle
Beagle is a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection.

https://faculty.washington.edu/browning/beagle/beagle.html



bedtools
bedtools
A software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats.

https://github.com/arq5x/bedtools2



bismark
bismark
Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step.

https://github.com/FelixKrueger/Bismark



BisSNP
BisSNP
A bisulfite space genotyper & methylation caller

http://people.csail.mit.edu/dnaase/bissnp2011/



blast
BLAST
Basic Local Alignment Search Tool

https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download



boost
boost
Boost provides free portable peer-reviewed C++ libraries. The emphasis is on portable libraries which work well with the C++ Standard Library.

http://www.boost.org/



bowtie
Bowtie
An ultrafast, memory-efficient short read aligner.

http://bowtie-bio.sourceforge.net/index.shtml



bowtie2
Bowtie2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

http://bowtie-bio.sourceforge.net/bowtie2/index.shtml



breakdancer
BreakDancer
A Perl/C++ package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads.

http://gmt.genome.wustl.edu/packages/breakdancer/



butter
butter
A wrapper for bowtie to produce small RNA-seq alignments where multimapped small RNAs tend to be placed near regions of confidently high density.

https://github.com/MikeAxtell/butter



bvatools
bvatools
BVATools -- Bam and Variant Analysis Tools

https://bitbucket.org/mugqic/bvatools



bwa
BWA
A software package for mapping low-divergent sequences against a large reference genome, such as the human genome.

http://bio-bwa.sourceforge.net/



bwakit
Bwakit
Bwakit is a self-consistent installation-free package of scripts and precompiled binaries, providing an end-to-end solution to read mapping.

https://github.com/lh3/bwa/tree/master/bwakit



cd-hit
CD-HIT
CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.

http://weizhongli-lab.org/cd-hit/



cellranger
Cell Ranger
Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.

http://www.10xgenomics.com/



cufflinks
Cufflinks
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

http://cole-trapnell-lab.github.io/cufflinks/



duk
duk
Duk is a fast, accurate, and memory-efficient DNA sequence matching tool. It reports whether a query sequence partially or totally matches the given reference sequences, but not how the query matches a reference sequence. A common application is to group sequencing reads into small manageable chunks for downstream analysis when assessing the quality of a sequencing run, which includes contaminant removal (with contaminant sequences known), organelle genome separation, and assembly refinement.

http://duk.sourceforge.net/



ea-utils
ea-utils
Command-line tools for processing biological sequencing data: barcode demultiplexing, adapter trimming, etc.

https://code.google.com/p/ea-utils/



emboss
emboss
EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.

http://emboss.sourceforge.net/



EPACTS
EPACTS
A versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.

http://genome.sph.umich.edu/wiki/EPACTS



exonerate
Exonerate
A generic tool for pairwise sequence comparison.

http://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate



fastqc
FastQC
A quality control tool for high throughput sequence data.

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/



fasttree
FastTree
FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.

http://www.microbesonline.org/fasttree/



fastx
FASTX-Toolkit
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

http://hannonlab.cshl.edu/fastx_toolkit/



FLASH
FLASH
FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies.

http://ccb.jhu.edu/software/FLASH/



gcc
gcc
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,...). GCC was originally written as the compiler for the GNU operating system.

https://gcc.gnu.org/



gemini
GEMINI
Flexible framework for exploring genetic variation in the context of the wealth of genome annotations available for the human genome.

https://gemini.readthedocs.io/en/latest/



gemLibrary
GEM
The GEM library strives to be a true 'next-generation' tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task.

http://algorithms.cnag.cat/wiki/The_GEM_library



GenomeAnalysisTK
Genome Analysis Toolkit
Developed by the Data Science and Data Engineering group at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping.

https://software.broadinstitute.org/gatk/



ghostscript
Ghostscript
An interpreter for the PostScript language and for PDF.

http://ghostscript.com/



gnuplot
Gnuplot
Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms.

http://www.gnuplot.info/



hmmer
HMMER
HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

http://hmmer.org/



homer
HOMER
HOMER offers tools and methods for interpreting Next-gen-Seq experiments. In addition to Genome Browser/UCSC visualization support and peak finding [and motif finding of course], HOMER can help assemble data across multiple experiments and look at positional specific relationships between sequencing tags, motifs, and other features. You do not need to use the peak finding methods in this package to use motif finding.

http://homer.salk.edu/homer/ngs/



htslib
HTSlib
A C library for reading/writing high-throughput sequencing data

http://www.htslib.org/



igv
IGV
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.

http://software.broadinstitute.org/software/igv/



igvtools
igvtools
The igvtools utility provides a set of tools for pre-processing data files. File names must contain an accepted file extension, e.g. test-xyz.bam.

http://software.broadinstitute.org/software/igv/igvtools



java
Java
Java technology is the foundation of most networked applications and is used worldwide to develop and deliver mobile and embedded applications, games, web content and enterprise software.

http://download.java.net



jellyfish
JELLYFISH
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA.

http://www.cbcb.umd.edu/software/jellyfish/



kentUtils
kentUtils
UCSC command-line bioinformatic utilities, implemented by Jim Kent

https://github.com/ENCODE-DCC/kentUtils



kmergenie
KmerGenie
KmerGenie estimates the best k-mer length for genome de novo assembly.

http://kmergenie.bx.psu.edu/



KronaTools
KronaTools
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

https://sourceforge.net/p/krona/wiki/KronaTools/



lapack
LAPACK
LAPACK provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.

http://www.netlib.org/lapack/



longranger
Long Ranger
Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants.

http://www.10xgenomics.com/



MACS
MACS
Model-based Analysis of ChIP-Seq (MACS) for short-read sequencers such as the Genome Analyzer (Illumina / Solexa)

http://liulab.dfci.harvard.edu/MACS/



MACS2
MACS2
Novel algorithm, named Model-based Analysis of ChIP-Seq (MACS), for identifying transcription factor binding sites.

https://pypi.python.org/pypi/MACS2



mirdeep2
miRDeep2
miRDeep2 is a completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs. The tool reports known and hundreds of novel microRNAs with high accuracy in seven species representing the major animal clades. The low consumption of time and memory combined with user-friendly interactive graphic output makes miRDeep2 accessible for straightforward application in current research.

https://www.mdc-berlin.de/8551903/en/



mpich
mpich
MPICH is a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.

https://www.mpich.org/



mugqic_pipelines
mugqic_pipelines
MUGQIC pipelines consist of Python scripts which create a list of jobs running Bash commands. Those scripts support dependencies between jobs and a smart restart mechanism if some jobs fail during pipeline execution. Jobs can be submitted in different ways: by being sent to a PBS scheduler like Torque or by being run as a series of commands in batch through a Bash script.

https://bitbucket.org/mugqic/mugqic_pipelines



mugqic_R_packages
mugqic_R_packages
This library implements various -seq downstream analyses, as well as Nozzle-based reporting for mugqic_pipelines.

https://bitbucket.org/mugqic/rpackages/src/29b7650e3b38f2f1e16ffb529715268999ec3a14/gqSeqUtils/DESCRIPTION?at=master&fileviewer=file-view-default



mugqic_tools
mugqic_tools
Perl, Python, R, awk and sh scripts used in several bioinformatics pipelines of the MUGQIC pipelines.

https://bitbucket.org/mugqic/mugqic_tools



MUMmer
MUMmer
Ultra-fast alignment of large-scale DNA and protein sequences

http://mummer.sourceforge.net/



MUSCLE
MUSCLE
Program for creating multiple alignments of protein sequences.

http://www.drive5.com/muscle/downloads.htm



mutect
MuTect
Reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes

http://www.broadinstitute.org/cancer/cga/mutect



nextclip
NextClip
Tool for analysing reads from LMP libraries, generating a comprehensive quality report and extracting good quality trimmed and deduplicated reads

https://github.com/richardmleggett/nextclip/



OpenBLAS
OpenBLAS
Optimized BLAS library based on GotoBLAS2 1.13 BSD version

http://www.openblas.net/



pandoc
Pandoc
Universal document converter

http://pandoc.org/



parallel
parallel
Shell tool for executing jobs in parallel using one or more computers

http://www.gnu.org/software/parallel/



pbs-drmaa
pbs-drmaa
DRMAA for Torque/PBS Pro is an implementation of the Open Grid Forum DRMAA (Distributed Resource Management Application API) specification for submitting and controlling jobs on PBS systems

http://apps.man.poznan.pl/trac/pbs-drmaa



perl
perl
Feature-rich programming language

https://www.perl.org/



picard
Picard
Set of tools (in Java) for working with next generation sequencing data in the BAM format

https://sourceforge.net/projects/picard/



pigz
pigz
Replacement for gzip that exploits multiple processors and multiple cores when compressing data

http://zlib.net/pigz/



prinseq-lite
PRINSEQ-lite
Used to filter, reformat, or trim your genomic and metagenomic sequence data

http://prinseq.sourceforge.net/



prodigal
Prodigal
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee.

http://prodigal.ornl.gov/



python
Python
Programming language that lets you work quickly and integrate systems more effectively

https://www.python.org/



qualimap
qualimap
Qualimap is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data.

https://bitbucket.org/kokonech/qualimap/downloads



R_Bioconductor
R_Bioconductor
R with Bioconductor packages. Bioconductor provides open source tools for the analysis and comprehension of high-throughput genomic data, built on the R statistical programming language.

https://www.bioconductor.org/



ray
Ray
Parallel genome assemblies for parallel DNA sequencing

http://denovoassembler.sourceforge.net/



rnammer
RNAmmer
Predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences.

http://www.cbs.dtu.dk/services/RNAmmer/



rnaseqc
RNA-SeQC
Java program which computes a series of quality control metrics for RNA-seq data

https://www.broadinstitute.org/cancer/cga/rna-seqc



rsem
RSEM
Accurate quantification of gene and isoform expression from RNA-Seq data

http://deweylab.github.io/RSEM/



samtools
SAMtools
A suite of programs for interacting with high-throughput sequencing data.

http://www.htslib.org/



scalpel
Scalpel
Software package for detecting INDELs (INsertions and DELetions) mutations in a reference genome

http://scalpel.sourceforge.net/



ShortStack
ShortStack
Tool developed to process and analyze small RNA-seq data with respect to a reference genome, and output a comprehensive and informative annotation of all discovered small RNA genes

http://sites.psu.edu/axtell/?s=shortstack



signalp
SignalP
Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms

http://www.cbs.dtu.dk/services/SignalP/



smrtanalysis
SMRT-Analysis
PacBio secondary analysis through a graphical or command-line user interface.

http://www.pacb.com/products-and-services/analytical-software/smrt-analysis/



snap
SNAP
General purpose gene finding program suitable for both eukaryotic and prokaryotic genomes

http://korflab.ucdavis.edu/software.html



snpEff
SnpEff
Variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes

http://snpeff.sourceforge.net/



sphinx
Sphinx
Sphinx is a tool that makes it easy to create intelligent and beautiful documentation of Python projects

http://sphinx-doc.org/



SPAdes
SPAdes
SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.

http://cab.spbu.ru/software/spades/



SplAdder
SplAdder
Splicing Adder, a toolbox for alternative splicing analysis based on RNA-Seq alignment data. Briefly, the software takes a given annotation and RNA-Seq read alignments, transforms the annotation into a splicing graph representation, augments the splicing graph with additional information extracted from the read data, extracts alternative splicing events from the graph and quantifies the events.

http://www.raetschlab.org/suppl/spladder



star
STAR
Spliced Transcripts Alignment to a Reference. Based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure.

https://github.com/alexdobin/STAR



supernova
Supernova
Supernova is a software package for de novo assembly from Chromium Linked-Reads that are made from a single whole-genome library from an individual DNA source.

https://support.10xgenomics.com/



tabix
Tabix
Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file ( in.tab.bgz.tbi or in.tab.bgz.csi ) when region is absent from the command-line.

http://www.htslib.org/doc/tabix.html



tmhmm
TMHMM
Predicting Transmembrane Protein Topology with a Hidden Markov Model

http://www.cbs.dtu.dk/~krogh/TMHMM/



tools
tools
Perl, Python, R, awk and sh scripts used in several bioinformatics pipelines of the MUGQIC PIPELINES repo.

https://bitbucket.org/mugqic/mugqic_tools



tophat
TopHat
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

http://tophat.cbcb.umd.edu



TransDecoder
TransDecoder
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using TopHat and Cufflinks.

http://transdecoder.github.io



trimmomatic
Trimmomatic
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters are supplied on the command line.

http://www.usadellab.org/cms/?page=trimmomatic



trinity
Trinity
Trinity assembles transcript sequences from Illumina RNA-Seq data

https://github.com/trinityrnaseq/trinityrnaseq/wiki



trinotate
Trinotate
A comprehensive annotation suite for functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate makes use of a number of different well referenced methods for functional annotation including homology search to known sequence data (BLAST+/SwissProt), protein domain identification (HMMER/PFAM), protein signal peptide and transmembrane domain prediction (signalP/tmHMM), and leveraging various annotation databases (eggNOG/GO/Kegg databases).

https://trinotate.github.io/



ucsc
UCSC tools
UCSC genome browser 'kent' bioinformatic utilities

http://hgdownload.cse.ucsc.edu/admin/jksrc.zip



usearch
USEARCH
Ultra-fast search for high-identity top hit or hits from sequence files

http://drive5.com/usearch/



VarScan
VarScan
VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454 and similar instruments. It can be used to detect different types of variation: Germline variants, multi-sample variants, somatic mutations and somatic copy number alterations

http://varscan.sourceforge.net/



vcftools
VCFtools
A program package that can be used to perform the following operations on standard variant (VCF) files: filter out specific variants; compare files; summarize variants; convert to different file types; validate and merge files; create intersections and subsets of variants.

http://vcftools.sourceforge.net/



verifyBamID
VerifyBamID
Verifies whether the reads in a particular file match previously known genotypes for an individual (or group of individuals), and checks whether the reads are contaminated as a mixture of two samples. verifyBamID can detect sample contamination and swaps when external genotypes are available. When external genotypes are not available, verifyBamID still robustly detects sample swaps.

http://genome.sph.umich.edu/wiki/VerifyBamID



ViennaRNA
ViennaRNA
The ViennaRNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

https://www.tbi.univie.ac.at/RNA/



vsearch
VSEARCH
VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering and conversion.

https://github.com/torognes/vsearch



vt
vt
A tool set for short variant discovery in genetic sequence data.

http://genome.sph.umich.edu/wiki/vt



weblogo
WebLogo
A tool for creating sequence logos from biological sequence alignments. It can be run on the command line, as a standalone webserver, as a CGI webapp, or as a Python library.

http://weblogo.threeplusone.com/



wgs-assembler
Celera Assembler
A de novo whole-genome shotgun (WGS) DNA sequence assembler. It reconstructs long sequences of genomic DNA from fragmentary data produced by whole-genome shotgun sequencing

http://wgs-assembler.sourceforge.net