Thursday, September 26, 2013

Bioinformatics Journals and Tutorials

from http://cdwscience.blogspot.com/


Here are some of the journals that I check on a weekly basis.  I would strongly recommend subscribing to the relevant RSS feeds (using something like Google Reader).


Bioinformatics / Computational Biology:


Genomics:


Other Journals:
  • Nature Methods and Nature Biotechnology - not specific for bioinformatics articles, but many important programs / protocols are published here
  • PLOS ONE - general subject journal, but it has some good bioinformatics articles
  • PeerJ - similar to PLOS ONE, but uses a membership system (so you pay per author instead of per article)
  • Nature, Science, PNAS, etc.
Tutorials / Blogs:

  • OpenHelix - tutorials for popular programs; some free, some require subscription
    • OpenHelix Blog - covers tutorials and FAQs for common bioinformatics tools. I mostly read it for the Friday SNPpets (collection of popular weekly Twitter feeds)
  • Omixon Blog - Bioinformatics company that provides free tutorials for common tools
  • Core Genomics - "personal blog written by James Hadfield who runs a Genomics core facility Cambridge" - lots of interesting technical details about next-generation sequencing
  • MassGenomics - medical genomics blog by Dan Koboldt, a staff scientist at the Genome Institute at Washington University. Consistently great article reviews.
  • Genomes Unzipped - popular blog run by several genomics researchers. I would argue that it was made popular by Daniel MacArthur (who doesn't post there as often now), but there are still other contributors who keep the blog up to date.
  • Getting Genetics Done - a well-maintained blog written mostly by Stephen Turner (Bioinformatics Core director at University of Virginia). Focuses mostly on providing technical suggestions.
  • NIH Bioinformatics Support System - probably doesn't have a feed, but contains useful tutorials

Bioinformatics 101: Literature / Text Mining



Search Engines:

  • PubMed
    • popular, free tool provided by NCBI to search biomedical journal articles
    • includes links to connected NCBI resources (GEO, RefSeq, etc.)
  • Google Scholar
    • popular, free tool to search the scientific literature
    • provides citation information
    • allows authors to create their own bibliographies (which provide author-level citation metrics) 
Gene-Centric Information:
  • NCBI Gene
    • free tool curated by the NCBI
    • includes literature citations, Gene Ontology categories, alternative and official gene symbols, etc.
  • iHOP (Information Hyperlinked Over Proteins)
    • free text-mining program that predicts interactions between genes
  • PolySearch
    • free text-mining program that predicts interactions between genes, diseases, drugs, metabolites, SNPs, pathways, and/or tissues
  • IPA (Ingenuity Pathway Analysis)
    • commercial program curating information about genes, metabolites, etc.
    • most popular use is for functional enrichment analysis, but it can also be used as a general tool for searching the literature

Bioinformatics 101: General Coding Information



UNIX:



Perl:


Python:


R:


Other:

Short Read Aligners



General Purpose Aligners:


  • BWA
  • Bowtie
  • Novoalign
    • commercial software covering a variety of alignment needs (RNA-Seq, miRNA-Seq, DNA-Seq, BS-Seq, etc.)
    • some functionality is also available in the free version

RNA-Seq Aligners:


BS-Seq Aligners:

DNA Sequence Analysis


Genome Visualization Tools:

  • UCSC Genome Browser
    • popular, free genomic visualization tool for a wide variety of organisms
    • also serves as a database for genomic sequences and features
  • Integrative Genomics Viewer (IGV)
    • very efficient tool for visualizing almost any type of genomic data
    • open-source
  • GBrowse - open-source genome browser

Sequence Alignment:

  • BLAST - search for similar DNA sequences in GenBank
  • ClustalW - multiple sequence alignment
  • T-Coffee - multiple sequence alignment
  • Mauve - multi-species alignment and visualization tool to detect segments of conserved sequence

General DNA-Seq Tools:

  • samtools
    • popular, free tool to extract data from .SAM alignment files
    • Picard - Java-based toolkit with functionality similar to samtools
    • see Short Read Aligners for the alignment step that has to come first
  • Galaxy
    • open-source, cloud-based suite of popular sequence analysis tools (including deep sequencing analysis)
  • GATK
    • toolkit for analysis of next-generation sequencing data
    • previously open-source, but now requires a commercial license
  • CLC Bio Genomics Workbench
    • commercial software covering a wide variety of applications such as sequence alignment, SNP/DIP detection, de novo assembly, etc.
    • CLC Bio Genomics Workbench also has the functionality of CLC Bio Main Workbench for standard sequencing analysis (cloning, primer design, etc.)
      • both are commercial programs that require a purchased license
  • Nexus Copy Number
    • commercial software for analysis of copy number alterations
    • works for a variety of microarray platforms as well as for deep sequencing analysis
  • SeqAnswers Software List
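
To give a rough idea of the kind of data that samtools extracts from .SAM files, here is a minimal Python sketch that reads a few alignment records and counts the mapped reads. It relies only on the standard SAM layout (tab-separated fields, header lines starting with "@", and flag bit 0x4 marking an unmapped read). The example reads and the names SAM_TEXT, parse_sam, and count_mapped are made up for illustration; this is not how samtools itself is implemented, and real files should be handled with samtools or a dedicated library.

```python
# Illustrative only: a tiny, made-up SAM file held in a string.
SAM_TEXT = """@HD\tVN:1.6\tSO:coordinate
@SQ\tSN:chr1\tLN:1000
read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII
read2\t16\tchr1\t200\t30\t4M\t*\t0\t0\tTTGA\tIIII
read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII
"""


def parse_sam(text):
    """Return one dict per alignment line, skipping '@' header lines."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        records.append({
            "qname": fields[0],      # read name
            "flag": int(fields[1]),  # bitwise flag
            "rname": fields[2],      # reference sequence name
            "pos": int(fields[3]),   # 1-based leftmost position
            "mapq": int(fields[4]),  # mapping quality
        })
    return records


def count_mapped(records):
    """Count reads whose 'unmapped' flag bit (0x4) is not set."""
    return sum(1 for r in records if not (r["flag"] & 0x4))


if __name__ == "__main__":
    recs = parse_sam(SAM_TEXT)
    print(len(recs), "records,", count_mapped(recs), "mapped")
```

With samtools installed, the same kind of summary comes from the tool directly, and it also handles compressed .BAM files, which this sketch does not.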

Transcription Factor Motif Analysis:

  • TRANSFAC
    • database of transcription factor motifs
    • a subscription is required to access the most recent annotations, but older versions are freely available
    • A plug-in is available within CLC Bio (a commercial program for genomics analysis)
  • JASPAR
    • free database of transcription factor motif sequences
  • TFsitescan
    • free tool to search for transcription factor motifs
  • MEME Suite
    • tools for ab initio motif finding
  • rVista / VISTA Suite
    • tool for searching motifs conserved across closely related organisms
  • TESS
    • transcription factor search system
    • unfortunately, this tool now has to be run locally

Mutation Analysis:
  • VarScan
    • open-source variant calling tool
    • see Short Read Aligners for the alignment step that has to come first
    • usually also requires something like samtools to create the input file
  • SeattleSNPs Genome Variation Server
    • tool to filter candidate variants (based upon frequency, predicted function, etc.)
  • ANNOVAR (pronounced Anno-Var)
    • tool to filter candidate variants (based upon frequency, predicted function, etc.)
    • wANNOVAR is the web-based interface
  • GWAS Catalog
    • NHGRI database of SNP-based phenotypic / disease associations
  • Promethease
    • open-source tool for personalized genomic analysis
    • it is technically free to use, but you can pay $5 to get your report more quickly
    • uses annotations from SNPedia
  • Interpretome
    • Genome interpretation tool similar to Promethease
    • In my opinion, it has a nicer interface. However, it currently only works with raw data from 23andMe and Lumigenix.
  • SNPedia
    • crowd-sourced annotation of SNP associations
    • includes some publicly available genomes
ChIP-Seq Tools:


de novo Assembly Algorithms:


Other Tools:
  • Primer3 - PCR primer design
  • RepeatMasker - identifies repetitive elements within a DNA sequence
  • Webcutter - detects restriction enzyme sites in a DNA sequence
  • Translate - a tool that allows translation of nucleotide (DNA / RNA) sequence into a protein sequence
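
For readers curious what a translation tool like the one above does under the hood, here is a minimal Python sketch. It assumes the standard genetic code, reads only the first forward frame, and treats RNA input by converting U to T. The function name translate and the example sequences are made up for illustration; the web tool itself also reports the other reading frames.

```python
# Build the standard genetic code from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(nucleotides):
    """Translate DNA or RNA (frame 1 only); '*' marks a stop codon.

    Codons containing unexpected characters are reported as 'X', and
    trailing bases that do not fill a whole codon are ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(translate("AUGGCC"))
```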

Image Analysis


Microscopy Image Analysis / Visualization:

  • ImageJ - NIH image viewer and analysis tool
    • Fiji - Fiji Is Just ImageJ
      • ImageJ wrapper containing a number of plug-ins for advanced analysis
  • CellProfiler
  • CellProfiler Analyst
    • tool for high-throughput image analysis
  • LSM Image Viewer
    • free software to view .lsm images
    • more advanced software is commercially available

Genomic Databases


Genomic Annotations:

Systems Biology Databases:
  • Gene Ontology (GO)
    • Database of functional annotations for protein-coding genes
  • KEGG - Kyoto Encyclopedia of Genes and Genomes
    • primarily used as a pathway database
  • IntAct
    • database for protein-protein interactions
  • Reactome
  • Regulome Explorer - software to visualize integrative genomic data from the TCGA project
  • BioGRID - database of genetic and protein interactions
  • MINT - protein-protein interaction database
  • STRING - database for known and predicted protein-protein interactions
  • STITCH - database of drug-protein interactions

Microarray / Sequencing Databases:

  • GEO - microarray database
  • ArrayExpress - microarray database
  • SRA - sequencing archive; entries are often also indexed in GEO
  • BioGPS - similar to NCBI Gene, but also includes normal tissue expression levels (from microarray data)
  • TiGER - tissue-specific gene expression database
  • CellMiner - query NCI-60 cell line data
  • TCGA Data Portal - integrative genomic data for large cancer datasets

Genomic Variation Databases:


Disease-Centric Databases:

  • General
    • OMIM - Online Mendelian Inheritance in Man
      • database of human diseases
    • SIDER - EMBL side effect database
  • Cancer
    • TCGA - The Cancer Genome Atlas
      • includes microarray and sequencing data
    • Oncomine
      • database of gene expression and copy number data from patients
      • basic access is free, but license is required for premium access
    • caArray - NCI Cancer Database

Protein Databases:

Protein Analysis


Protein Domain / Structure / Homology Tools:



3D-Structure Viewers:

Mass Spectrometry:

  • PRIDE - mass-spectrometry sample database managed by EMBL-EBI
  • PeptideAtlas - database for mass spectrometry data - includes links to relevant publications
  • MaxQuant - popular tool for mapping proteomics spectra from mass spectrometry data
  • ProteinProphet - another popular tool for mapping proteomics spectra to proteins
  • DanteR - R implementation of the popular DAnTE algorithm for differential expression of mass spectrometry proteomics data
  • LabKey / CPAS - open-source LIMS + basic analysis pipeline
  • PIR - UniProt Protein Information Resource: includes links to databases and peptide mapping tools
Other:

  • STITCH - database of drug-protein interactions
  • PaxDb - database of protein expression across different tissues and organisms
  • MOPED - database of protein expression across different tissues and model organisms
  • HIPPIE - database of human protein-protein interactions, integrating data from several other databases

Gene Expression Analysis


Differential Expression Tools:

  • R - statistical programming language
    • most common statistical functions (t-test, ANOVA, etc.) are built in
    • Bioconductor - suite of R packages used for bioinformatic analysis
      • limma - most commonly used differential expression tool for microarray analysis
      • edgeR - R package for RNA-Seq differential expression analysis
      • DESeq - R package for RNA-Seq differential expression analysis
  • cuffdiff
    • differential expression package within cufflinks
    • cufflinks provides transcript abundance calculations
    • strictly speaking, the developers recommend using cuffdiff for differential expression, although it is relatively common to use edgeR, DESeq, etc. for differential expression after quantifying mRNA with cufflinks
  • Java TreeView
    • free tool for clustering microarray data
  • OCplus - R package for statistical power calculations (and differential expression) for microarray studies
  • Scotty - web-based tool for statistical power calculations for RNA-Seq data
  • Partek Genomics Suite
    • Commercial program that includes a number of workflows, such as microarray gene expression and RNA-Seq analysis
    • Includes statistics for differential expression analysis as well as tools for downstream functional analysis and upstream quality control assessment
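
As a small illustration of the simplest statistic mentioned above (the t-test that is built into R), here is a Python sketch of Welch's t statistic for one gene measured in two groups. The function name welch_t and the expression values are made up, and no p-value is computed. Dedicated packages such as limma, edgeR, and DESeq do considerably more than this, so treat it as a sketch of the idea and not a replacement for them.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples (unequal variances allowed).

    Each group needs at least two values, and the groups must not both
    have zero variance.
    """
    var_a = variance(group_a)  # sample variance (n - 1 denominator)
    var_b = variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


if __name__ == "__main__":
    # Hypothetical log2 expression values for one gene.
    treated = [5.0, 6.0, 7.0]
    control = [1.0, 2.0, 3.0]
    print(round(welch_t(treated, control), 3))
```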

Transcription Factor Motif Analysis:

  • IPA Upstream Regulator Analysis
    • Commercial tool that searches for enrichment of known targets for regulatory genes and molecules (such as transcription factors)
    • Can also detect if targets are consistent with activation or inhibition of the regulator
  • SCOPE
    • free tool that identifies upstream motifs enriched for gene lists
    • works on a wide variety of species, so it is useful for motif finding in less commonly studied organisms
  • Whole Genome rVISTA - calculate enrichment of transcription factor motifs predicted based upon evolutionary conservation
  • TRED (Transcriptional Regulatory Element Database) - database from CSHL for transcription factors.  Includes target gene lists for transcription factors in human, mouse, and rat
  • TRANSFAC - database of transcription factor motif sequences.  There are commercial and open-source versions of the database
  • JASPAR - open-source database of transcription factor motif sequences
General RNA-Seq Information:

Microarray Annotation Resources:
  • NetAffx
    • Affymetrix resource for probe design information
    • registration is free but required
  • GeneAnnot
    • an alternative resource for Affymetrix probe annotations

Pathway Analysis


Gene List Enrichment Tools (Requires Differential Expression Analysis):


Other Systems-Level Analysis Tools (No Upstream Filtering Necessary):

RNA Sequence Analysis


miRNA Resources:

  • miRBase
    • free database of miRNA sequences
  • TarBase
    • free database of experimentally validated miRNA targets
  • miRecords
    • database of miRNA-target interactions
  • IPA miRNA-target analysis
    • commercial database that includes free databases as well as a proprietary list of miRNA-target interactions found using text-mining of the literature
  • TargetScan
    • free tool to predict miRNA targets
  • sylArray
    • tool to predict miRNA targets from gene expression data.  Uses gene ranking, so it doesn't require mRNA differential expression (although you will need to check that the miRNA regulator is differentially expressed)
In general, I think you really need both miRNA expression and mRNA expression data to get reliable results when trying to identify miRNA-target interactions.

RNA Secondary Structure:


RNA Domain Homology:

  • Rfam
    • may be helpful in predicting function of a non-coding RNA of unknown function

de novo Assembly Algorithms (RNA-Seq):

  • Oases
  • Trans-ABySS
  • Trinity
  • eXpress - mRNA quantification tool that works with both de novo assembled transcripts and transcripts from direct genome alignment
RNA-Seq QC

  • FASTX-Toolkit - popular suite of tools to quantify and manipulate sequences in .fastq and .fasta files
  • samtools - popular suite of tools to quantify and reformat .sam/.bam files
  • Picard - Java-based toolkit with functionality similar to samtools; CollectRnaSeqMetrics can produce a coverage plot (normalized from start to end of transcript)
  • RSeQC - package to produce a variety of RNA-Seq QC figures
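
To show the sort of number these QC tools report, here is a minimal Python sketch that computes the mean base quality per read from FASTQ text. It assumes simple four-line records and Phred+33 quality encoding; older Illumina files that use a +64 offset would need a different offset value. The example reads and the names FASTQ_TEXT, parse_fastq, and mean_quality are made up for illustration.

```python
# Illustrative only: two made-up reads in four-line FASTQ format.
FASTQ_TEXT = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!II\n"


def parse_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the name


def mean_quality(qual, offset=33):
    """Mean Phred score of one quality string (Phred+33 by default)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


if __name__ == "__main__":
    for name, seq, qual in parse_fastq(FASTQ_TEXT):
        print(name, len(seq), mean_quality(qual))
```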


RNA-Seq Analysis

DNA methylation


Enrichment-Based Analysis Tools:


Bisulfite-Conversion Based Analysis Tools: