September 6th, 2016

The National Institutes of Health (NIH) Big Data to Knowledge (BD2K) program is pleased to announce The BD2K Guide to the Fundamentals of Data Science, a series of online lectures by experts from across the country covering a diverse range of data science topics. The course is an introductory overview that assumes no prior knowledge of data science.

The series starts Friday, September 9, and will run weekly throughout the year, on Fridays from 12:00 noon to 1:00 pm ET.

To join a lecture, please visit the BD2K Guide web page for the most up-to-date computer and mobile login information.

This is a joint effort of the BD2K Training Coordinating Center (TCC), the BD2K Centers Coordination Center (BD2KCCC), and the NIH Office of the Associate Director for Data Science. For up-to-date information about the series and to see archived presentations, go to this website.

Tentative Schedule

9/9/16 Introduction to big data and the data lifecycle (Mark Musen, Stanford)

9/16/16 SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health Sciences)

9/23/16 Finding and accessing datasets, Indexing and Identifiers (Lucila Ohno-Machado, UCSD)
9/30/16 Data curation and Version control (Pascale Gaudet, Swiss Institute of Bioinformatics)
10/7/16 Ontologies (Michel Dumontier, Stanford)
10/14/16 Metadata standards (Zachary Ives, Penn)

10/21/16 Provenance (Suzanne Sansone, Oxford)


11/4/16 Databases and data warehouses, Data: structures, types, integrations (Chaitan Baru, NSF)
11/11/16 No lecture (Veterans Day)
11/18/16 Social networking data (TBD)
12/2/16 Data wrangling, normalization, preprocessing (Joseph Picone, Temple)
12/9/16 Exploratory Data Analysis (Brian Caffo, Johns Hopkins)

12/16/16 Natural Language Processing (Noemie Elhadad, Columbia)

1/6/17 SECTION 3: COMPUTING OVERVIEW (Dates tentative)

1/13/17 Workflows/pipelines
1/20/17 Programming and software engineering; API; optimization
1/27/17 Cloud, Parallel, Distributed Computing, and HPC

2/3/17 Commons: lessons learned, current state


2/17/17 Smoothing, Unsupervised Learning/Clustering/Density Estimation
2/24/17 Supervised Learning/prediction/ML, dimensionality reduction
3/3/17 Algorithms, incl. Optimization
3/10/17 Multiple testing, False Discovery Rate
3/17/17 Data issues: Bias, Confounding, and Missing data
3/24/17 Causal inference
3/31/17 Data Visualization tools and communication

4/7/17 Modeling Synthesis


4/14/17 Open science
4/21/17 Data sharing (including social obstacles)
4/28/17 Ethical Issues
5/5/17 Extra considerations/limitations for clinical data
5/12/17 Reproducibility
5/19/17 SUMMARY and NIH context
Other Upcoming BD2K Opportunities 
  • PLOS Computational Biology Symposium, September 16, 9:30am – 4:00pm ET, on the NIH Bethesda campus. The agenda includes two discussion panels featuring PLOS Computational Biology editors from a range of fields. The morning panel will discuss the “Biggest Challenges and Greatest Opportunities in Computational Biology over the Next 10 Years”. The afternoon panel will discuss “How Computational Biology Will Affect Human Health”. Register here. For those unable to attend in person, the event will be webcast here.
  • The Advanced Computational Neuroscience Network (ACNN) and NSF present: Midwest Workshop on Big Neuroscience Data, Tools, Protocols & Services, September 20-21, Ann Arbor, MI. Register here.
  • Joint NSF/NIH Training Opportunity: NOT-EB-16-008 “Notice of NIH/BD2K Participation in the Joint NSF/NIH Initiative on Quantitative Approaches to Biomedical Big Data (QuBBD)”. Proposals due September 28, 2016. For more information, please see this website.
  • NIH Request for Information (RFI): NOT-OD-16-133 “Metrics to Assess Value of Biomedical Digital Repositories”. The NIH seeks feedback from a broad range of repository stakeholders, including researchers, data scientists and curators, repository managers, and standards or tool developers. Responses must be submitted by September 30, 2016.
