Center for Data Science & Informatics
At the Center for Data Science and Informatics (CDSI), we are working to advance the application of data science and informatics to improve biomedical research and healthcare at NUCATS and among our clinical partners. Our vision is to create an integrated healthcare and research environment in which all available data are optimally leveraged for knowledge discovery and improved health.
CDSI has been a major driver of NUCATS’ model of sustainable innovation: seeking out research barriers that can be overcome with informatics tools, developing new software and data platforms, testing their usability and impact, and then deploying these novel tools to the NUCATS community and to other institutions across the country. Visit the Resources section for a full overview of the services and software we offer.
About Data Science
The amount of data produced is exploding. It is estimated that 2.5 quintillion bytes (2.5 followed by 18 zeros) of data are created every day. The volume of data is growing so quickly that 90 percent of the world's data has been produced in the past two years. This explosion of data is also occurring in all areas of biomedical research. A single diploid human genome sequence contains roughly 6 billion base pairs, and a single research study may require analyzing the genome sequences of tens of thousands of patients. Processing and managing these huge data sets, including their capture, curation, storage, search, sharing, transfer and analysis, is at the forefront of modern science. New approaches will help to expand the impact of informatics technologies on health and disease.
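To give a sense of scale, the figures above can be turned into a rough storage estimate. The sketch below is only a back-of-envelope calculation under stated assumptions: it assumes each base is stored in 2 bits (enough to distinguish A, C, G and T), ignores quality scores, metadata and compression, and takes 10,000 patients as the low end of "tens of thousands." The function names are illustrative and do not refer to any CDSI tool.

```python
# Back-of-envelope storage estimate; every constant here is an assumption
# chosen for illustration, not a measured value.

BASE_PAIRS_PER_GENOME = 6_000_000_000  # roughly 6 billion, as quoted in the text
BITS_PER_BASE = 2                      # assumption: A, C, G, T encoded in 2 bits
PATIENTS = 10_000                      # assumption: low end of "tens of thousands"


def genome_bytes(base_pairs: int = BASE_PAIRS_PER_GENOME,
                 bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store one genome as a bare sequence of bases."""
    return base_pairs * bits_per_base // 8


def cohort_bytes(patients: int) -> int:
    """Bytes needed to store one bare genome sequence per patient."""
    return patients * genome_bytes()


if __name__ == "__main__":
    per_genome_gb = genome_bytes() / 1e9
    cohort_tb = cohort_bytes(PATIENTS) / 1e12
    print(f"One genome: about {per_genome_gb:.1f} GB")
    print(f"{PATIENTS:,} genomes: about {cohort_tb:.0f} TB")
```

Under these assumptions a single study already reaches the terabyte range before any raw sequencing reads, quality scores or clinical data are included, and those typically add considerably more.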
Classically, scientific progress has been anchored on two pillars: Theory and Experimentation. The Big Data revolution has since hit science as the sheer volume of scientific data increases exponentially. Advances in scientific computing technology, together with Big Data, have created a third pillar: Computation. Data Science brings together these three pillars to accelerate discoveries. The Harvard Business Review has declared data scientist the "sexiest job of the 21st century." This role brings together deep domain knowledge, a solid foundation in statistical and mathematical methods, advanced computation and visualization technology, and a desire to tackle "wicked problems."
Education & Training
CDSI offers formal coursework through a variety of degree programs at the master’s and PhD levels. We also offer weekly brown bag seminars that rotate among the following topics:
- Bioinformatics Journal Club (coordinated by Ramana Davuluri)
- Data Science and Informatics Series (coordinated by Daniel Fort)
- Feinberg School of Medicine/Enterprise Data Warehouse Analytics (coordinated by Daniel Schneider)
CDSI offers introductory and training sessions for many of the software tools we develop and host. Please see our calendar for upcoming training sessions. OpenHelix also provides a research portal for finding the most relevant genomics resources and training on those resources. Through Galter Library, we provide access to a suite of bioinformatics tutorials.
Please see our calendar at the bottom of this page for all upcoming events and trainings.
CDSI is advised by a steering committee composed of 20 experts drawn from across the institution and chaired by Justin Starren. Members of the steering committee are split among four working groups, reflecting the directions of CDSI's future growth:
- Rex Chisholm
- Leonidas Platanias
- Massimo Cristofanilli
- Beth McNally
- Ramana Davuluri
- Alfred George
- Ali Shilatifard
- William Lowe
- Andrea Dunaif
- Denise Scholtens