Track 8: Data Visualization and Exploration Tools

With the sharp increase in the volume and complexity of big data sets in research and drug discovery labs, data visualization is needed to express complex patterns clearly. It is more important than ever to develop data visualization and exploration tools alongside the rest of the analytics rather than as an afterthought. The Data Visualization & Exploration Tools track will address ways to develop, design, and implement visualization tools in genomics, drug discovery, clinical development, and translational research, and will present real-world case studies where these tools have been used successfully.

Final Agenda

Tuesday, April 16

W2. Data Visualization to Accelerate Biological Discovery

W9. Research Project Management

* Separate registration required.

Wednesday, April 17

11:00 Data-Driven Healthcare: Visual Analytics for Exploration and Prediction of Clinical Data

Adam Perer, PhD, Assistant Research Professor, School of Computer Science, Human-Computer Interaction Institute, Carnegie Mellon University

Healthcare institutions are now recording more electronic health data about patients than ever before. Many hope that if researchers tap into this real-world observational data, the collective experience of the healthcare system can be leveraged to unearth insights that improve the quality of care. My research focuses on building interactive visual systems that leverage machine learning so clinicians and researchers can derive such insights.

11:30 Interactive Concept Learning for Visual Exploration of Epigenetic Patterns

Fritz Lekschas, PhD Candidate, Hanspeter Pfister Lab, Computer Science, Harvard University

Epigenetic datasets contain rich sets of patterns, but searching for and exploring nonstandard patterns is often time-consuming, and visual feedback is needed to verify the results. I am going to present Peax, a new web-based tool for interactively training a classifier that learns your notion of interestingness and operates on deep learning-based unsupervised featurizations of the epigenetic datasets.
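As a rough illustration of the interactive training idea described in this abstract, the sketch below ranks unlabeled genomic regions from a handful of user labels and suggests which region to label next. It is not Peax's implementation: the nearest-centroid scoring, the function names, and the toy two-dimensional feature vectors (standing in for learned featurizations) are all assumptions made for illustration.

```python
from math import dist

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def score(feature, pos_centroid, neg_centroid):
    """Score in [0, 1]; above 0.5 means closer to the 'interesting' centroid."""
    d_pos = dist(feature, pos_centroid)
    d_neg = dist(feature, neg_centroid)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total

def rank_regions(features, labels):
    """features: {region_id: vector}; labels: {region_id: bool} from the user.

    Returns (unlabeled regions sorted from most to least interesting,
             the unlabeled region the classifier is least sure about).
    """
    pos = [features[r] for r, y in labels.items() if y]
    neg = [features[r] for r, y in labels.items() if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    pc, nc = centroid(pos), centroid(neg)
    scores = {r: score(v, pc, nc) for r, v in features.items() if r not in labels}
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Uncertainty sampling: ask the user about the score closest to 0.5.
    next_to_label = min(scores, key=lambda r: abs(scores[r] - 0.5), default=None)
    return ranked, next_to_label

# Toy data (hypothetical): region "b" was marked interesting, "a" was not.
features = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.1, 0.0],
            "d": [0.9, 1.0], "e": [0.5, 0.5]}
labels = {"a": False, "b": True}
ranked, ask_next = rank_regions(features, labels)
print(ranked, ask_next)  # ['d', 'e', 'c'] e
```

Each round of user feedback would add entries to `labels` and rerun `rank_regions`, so the ranking adapts to the user's notion of interestingness.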

1:55 Computational Steering of Interactive Exploratory Analysis of Genomics Data

Hector Corrada Bravo, PhD, Assistant Professor, Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park

Interactive visual analysis integrated with computational analyses has gained popularity in genomics. We have previously built interactive analysis systems that are efficient and effective for exploratory analyses of large datasets arranged over richly structured features. Here, we discuss the next generation of tools, in which tighter integration between visualization and computation is used to guide and steer data analysts' exploration based on the results of computations of interest.

2:25 MERmaid: A WebGL-Based Tool for Exploring Spatially Resolved Single-Cell Transcriptomics Data

Jean Fan, PhD, Postdoctoral Fellow, Chemistry and Chemical Biology, Harvard University

Recent advancements in highly multiplexed, spatially resolved single-cell gene expression measurements demand scalable computational tools to assist in data exploration and hypothesis generation. We present MERmaid, an open-source visualization tool built on WebGL that provides a rich interface for rapid exploration of spatially resolved transcriptomics data. We apply MERmaid to visualize cell-type heterogeneity in tissues as well as intracellular heterogeneity in mRNA localization in MERFISH data. MERmaid is available online at

4:00 FEATURED PRESENTATION: Expanding Access to Dynamic Clinical Biomarker Visualizations: Automation, Integration and Exploration of Data Lakes

Philip Ross, PhD, Head of Translational Bioinformatics Data Science, Translational Medicine, BMS

With biomarker samples from thousands of patients across multiple indications, how do we detect meaningful clinical biomarker results in a reasonable timeframe and at reasonable levels of effort? Dynamic visualizations with up-to-date data provide evolving insights. We are automating the integration of clinical and biomarker results in data lakes and leveraging dynamic visualizations to give the best possible access and exploration of emerging clinical biomarker data signals and trends.

4:30 Universal Spotfire Template (UniSpoT) for Clinical Biomarker Discovery

Sittichoke Saisanit, PhD, Principal Scientist, Data Science, Pharma Research and Early Development Informatics (pREDi), Roche Innovation Center New York

UniSpoT is the Roche pRED standardized visual analytics platform for clinical biomarker data. Enabled by the underlying BRAVE data process, it addresses the increasing and unmet business need for near real-time access to biomarker data, integrated with clinical data for exploratory analysis. It has been used for early clinical studies with open-label designs (e.g., phase 1b). Using UniSpoT, scientists can gain an earlier and better understanding of biology, generate hypotheses, and improve biomarker strategy and the quality of data collection.

10:40 Longitudinal and Context Visualization for Precision Oncology

Jeremy Goecks, PhD, Assistant Professor of Biomedical Engineering and Computational Biology, Oregon Health and Science University

The goal of precision oncology is to find effective treatments for each patient’s cancer based on its molecular profile. Visualization plays a key role in precision oncology, helping to understand and integrate longitudinal and complex data analyses and then communicate results to physicians, patients, and other stakeholders. We will discuss our work applying visualization to precision oncology and identify opportunities and challenges for visualization in the field going forward.

11:10 Sharing and Visualizing Cancer Genomics Datasets Using cBioPortal

Carlos Rios, PhD, Senior Research Investigator, Computational Genomics - Translational Medicine, Bristol-Myers Squibb

BMS has been using cBioPortal for visualizing cancer genomics datasets since early 2016, supported by The Hyve, an open-source bioinformatics company based in the Netherlands. The cBioPortal server runs on AWS, is tied to the company’s Active Directory for authentication, and uses Keycloak for authorization. Data can be loaded through a pipeline that takes input files from Amazon S3. For BMS, cBioPortal was extended with support for rich metadata and canvasXpress integration.

2:00 Creating Effective Visualizations – Design and Choreography for the Chaos of Data

Martin Krzywinski, Staff Scientist, Genome Sciences Centre, BC Cancer Research Centre

The process of design, which is a kind of choreography for the page, can be of great help in assembling individual data visualizations into a cohesive explanation across many levels of detail. In the same way that visualizations are a way to organize data, design is a way to organize visualizations. I will share with you my experiences in combining science, visualization and design to create explanations, promote engagement, inspire imagination and, where possible, provide visual support in the often vexing process of research.

2:30 Big Data to Insights Visually

Baohong Zhang, PhD, Director of Genome Informatics, Translational Biology, Biogen

This talk covers how to use the most advanced JavaScript visualization toolkits, such as D3.js, canvasXpress.js, and canvasDesigner.js, to empower everyday scientists to extract biological insights from ever-growing data sets.

3:00 CanvasXpress: An R-Library Data Visualization for Reproducible Research

Isaac M. Neuhaus, PhD, Director, Computational Genomics, BMS

CanvasXpress is a standalone JavaScript library used for visualization of genomics and non-genomics data sets. It has a user-friendly and unobtrusive interface to allow users to explore data sets and customize their visualizations. It also has a sophisticated mechanism to track all user interactions and modifications, which makes it ideal for use in Reproducible Research. More information can be found at
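To make the link between interaction tracking and reproducibility concrete, here is a minimal sketch of the general pattern: every user change is recorded as an event, and replaying the saved events on the starting configuration rebuilds the final view. This is not the CanvasXpress API; the `InteractionLog` class, the `replay` function, and the configuration keys are hypothetical names chosen for illustration.

```python
import json

class InteractionLog:
    """Hypothetical recorder: stores each user change as a key/value event."""

    def __init__(self):
        self.events = []

    def record(self, key, value):
        # Append in order so later changes override earlier ones on replay.
        self.events.append({"key": key, "value": value})

    def to_json(self):
        return json.dumps(self.events)

def replay(base_config, events_json):
    """Rebuild the final chart configuration from a base config and a saved log."""
    config = dict(base_config)  # copy, so the base config stays untouched
    for event in json.loads(events_json):
        config[event["key"]] = event["value"]
    return config

# Illustrative session: the keys and values below are assumptions.
log = InteractionLog()
log.record("graphType", "Scatter2D")
log.record("colorBy", "tissue")
log.record("graphType", "Boxplot")

base = {"graphType": "Bar", "title": "Expression"}
final = replay(base, log.to_json())
print(final)  # {'graphType': 'Boxplot', 'title': 'Expression', 'colorBy': 'tissue'}
```

Storing the serialized log next to the data means another analyst could regenerate the same figure without repeating the manual steps.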

3:30 Interactive Visualization of Person-Generated Health Data for Precision Health: Challenges & Possibilities

Arlene E. Chung, MD, MHA, MMCi, Associate Director of Health & Clinical Informatics, University of North Carolina School of Medicine; Lead Informatics Physician for Patient Engagement, UNC Health Care

While there is much interest in remote monitoring using person-generated health data (PGHD) from wearables and other data streams, transforming these data into meaningful and actionable insights for precision health is an open challenge as heterogeneity, missingness, and sparsity are inherent within these data. This presentation focuses on how interactive data visualization approaches could allow clinicians and patients to better understand the impact of lifestyle on symptoms and health outcomes.

