

This course is designed for students who seek to develop skills in statistical computing using the R programming language. STATA for statistical analysis will be introduced briefly. Students will learn to use R for data manipulation, report generation, data presentation, and data tabulation and summarization. Topics will include organization and documentation of data, input and export of data sets, methods of cleaning data, tabulation and graphing of data, programming capabilities, and an introduction to simulations and bootstrapping. Students will also be introduced to LaTeX, Markdown, and knitr for report writing. Fall. [2] (Beck and Fonnesbeck)

Principles of Modern Biostatistics is a foundational first course in graduate-level statistics designed to develop a richer understanding of one- and two-sample statistical methods and statistical philosophies. It explores the operational characteristics of frequently used statistical methods. Through simulation studies conducted in R and STATA, students will explore questions such as: What are the true coverage rates of commonly used confidence interval methods for proportions? What is the impact of sampling from various non-normal distributions on the true Type I Error rate for hypothesis testing methods? How do various testing methods compare in terms of power in a variety of settings? How do traditional hypothesis testing methods compare and contrast with methods in the Bayesian and Likelihoodist paradigms? This course is intended for graduate students in programs for biostatistics, biomedical informatics, and epidemiology, and for students in other programs who have a strong undergraduate-level background in statistics. Lab required [1]. Prerequisite: Calculus I. Fall. [3] (Greevy)
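To make the first simulation question above concrete, here is a minimal Python sketch (not course material; the function names are hypothetical) that estimates the true coverage of the nominal 95% Wald interval for a binomial proportion, both by Monte Carlo simulation and by exact enumeration over the binomial distribution.

```python
import math
import random
from statistics import NormalDist

# Two-sided 95% critical value from the standard normal (about 1.96).
Z = NormalDist().inv_cdf(0.975)


def wald_interval(x, n, z=Z):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wald_coverage_exact(p, n, z=Z):
    """Sum the binomial probabilities of every outcome x whose interval covers p."""
    coverage = 0.0
    for x in range(n + 1):
        lower, upper = wald_interval(x, n, z)
        if lower <= p <= upper:
            coverage += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return coverage


def wald_coverage_sim(p, n, reps=10_000, seed=0, z=Z):
    """Estimate coverage by simulating reps binomial(n, p) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # one binomial draw
        lower, upper = wald_interval(x, n, z)
        hits += lower <= p <= upper
    return hits / reps


for p, n in [(0.5, 100), (0.05, 20)]:
    print(p, n, round(wald_coverage_exact(p, n), 3), round(wald_coverage_sim(p, n), 3))
```

With p = 0.05 and n = 20 the exact coverage is roughly 0.64 rather than 0.95, largely because the outcome x = 0 (probability 0.95^20, about 0.36) produces the degenerate interval [0, 0].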

This is the second in a two-course series designed for students who seek to develop skills in modern biostatistical reasoning and data analysis. Students learn modern regression analysis and model-building techniques from an applied perspective. Theoretical principles will be demonstrated with real-world examples from biomedical studies. This course requires substantial statistical computing in the software packages STATA and/or R. The course covers regression modeling for continuous outcomes, including simple linear regression, multiple linear regression, and analysis of variance with one-way, two-way, multi-way, and analysis of covariance models. Data types to be modeled include continuous outcomes (classic regression models), binary outcomes (logistic models), ordinal outcomes (proportional odds models), count outcomes (Poisson/negative binomial models), and time-to-event outcomes (Kaplan-Meier curves, Cox proportional hazards modeling). Incorporated into the presentation of these models are subtopics such as regression diagnostics, nonparametric regression, splines, data reduction techniques, model validation, parametric bootstrapping, and methods for handling missing data. Lab required [1]. Prerequisite: BIOS 6311 or equivalent. Spring. [3] (Johnson)

This course covers the statistical aspects of study design, monitoring, and analysis. Emphasis is on studies of human subjects, i.e., clinical trials. Topics include: principles of measurement, selection of endpoints, bias, masking, randomization and balance, blocking, study designs, sample size projections, interim monitoring of accumulating results, flexible and adaptive designs, sequential analysis, analysis principles, data and safety monitoring boards (DSMB), and the ethics of animal and human subject experimentation. Spring [3] (Koyama)
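Sample size projections often start from the normal-approximation formula for a two-sided comparison of two means with equal allocation, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/Δ². The sketch below assumes a known common standard deviation σ and a target difference Δ; the function name is hypothetical, and real designs usually adjust this figure (for example, for dropout).

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided, two-sample test of means.

    Assumes equal allocation and a known common standard deviation sigma.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile giving the desired power
    n = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)                 # round up to whole subjects


print(n_per_group(delta=0.5, sigma=1.0))               # half a standard deviation, 80% power
print(n_per_group(delta=0.5, sigma=1.0, power=0.90))   # same effect, 90% power
```

For a standardized difference of 0.5 at α = 0.05, this gives 63 per group at 80% power and 85 per group at 90% power.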

This is the second in a two-course series designed to impart the fundamental framework of probability and statistical inference. Students learn the key tools of mathematical statistics (likelihood, estimating equations, information quantities, etc.), popular methods of inference (hypothesis testing, significance testing, confidence intervals), and the schools of inferential philosophy (Frequentist, Bayesian, and Likelihood) along with their associated controversies. Topics include: delta method, sufficiency, minimal sufficiency, ancillarity, completeness, conditionality principle, Fisher information, Cramér-Rao inequality, hypothesis testing (likelihood ratio tests, most powerful tests, optimality, Neyman-Pearson lemma, inversion of test statistics), Likelihood principle, Law of Likelihood, Bayesian posterior estimation, interval estimation (confidence intervals, support intervals, credible intervals), basic asymptotic and large-sample theory, maximum likelihood estimation, and re-sampling techniques (e.g., the bootstrap). Lab required [1]. Spring [3] (Blume)
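As a minimal sketch of the bootstrap listed under re-sampling techniques, the code below forms a percentile bootstrap interval for a median: resample the data with replacement, recompute the statistic, and read off empirical quantiles. The function name, the 2,000 resamples, and the example data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data) at level 1 - alpha."""
    rng = random.Random(seed)
    n = len(data)
    # Approximate sampling distribution of the statistic by resampling the empirical distribution.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lower = estimates[round(alpha / 2 * reps)]
    upper = estimates[round((1 - alpha / 2) * reps) - 1]
    return lower, upper


sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.3, 2.8, 6.1, 3.9, 4.0]  # made-up data
print(statistics.median(sample), bootstrap_percentile_ci(sample))
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to state.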

This course provides an introduction to methods for time-to-event data with censoring mechanisms. Topics include: ideas of censoring and truncation, nonparametric approaches (e.g., Kaplan-Meier, log-rank), semi-parametric approaches (e.g., Cox model, extended Cox model with time-dependent covariates), parametric approaches (e.g., Weibull, gamma), multivariate survival models (e.g., frailty models, marginal models), model diagnostics, and sample size calculation for time-to-event data. The focus is on fitting the models and on their relevance to biomedical applications. Lab required [1]. Fall [3] (Chen, Q.)
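The Kaplan-Meier estimator multiplies, over the distinct event times t, the conditional survival probabilities 1 − d_t/n_t, where d_t is the number of events at t and n_t is the number still at risk just before t. The sketch below assumes right-censored data coded as 1 = event, 0 = censored, and follows the usual convention that subjects censored at an event time are still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    events[i] is 1 if times[i] is an observed event and 0 if it is right-censored.
    Returns a list of (time, survival) pairs.
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) observed at time t.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at t leaves the risk set afterwards.
        at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

In the example, the subject censored at time 2 still counts in the risk set of 5 at that time, so the estimate steps from 5/6 to 4/6 there.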

The first part of the course presents the following elements of multivariable predictive modeling for a single response variable: using regression splines to relax linearity assumptions, perils of variable selection and over-fitting, where to spend degrees of freedom, shrinkage, imputation of missing data, data reduction, interaction surfaces, and measuring predictive accuracy. Then a default overall modeling strategy will be described. This is followed by methods for graphically understanding models (e.g., using nomograms) and using re-sampling to estimate a model’s likely performance on new data. Then the R rms package, which facilitates most steps of the modeling process, will be surveyed. Next, statistical methods related to longitudinal regression, binary logistic models, ordinal regression, and survival models will be covered. Along the way, various general features of maximum likelihood estimation and bootstrapping are explored. Comprehensive case studies will be presented: analysis of efficacy in a longitudinal randomized clinical trial using generalized least squares, modeling hemoglobin A1c from NHANES data, an exploration of the survival of Titanic passengers, flexible modeling of ordinal clinical outcomes, developing a survival time model for critically ill patients, and developing a Cox model in chronic disease. Students undertake a variety of in-depth analyses incorporating methods of reproducible research. Spring [3] (Harrell)
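Restricted cubic splines are one common way to relax linearity assumptions: the fitted function is a piecewise cubic between knots and is constrained to be linear beyond the outer knots. The sketch below builds basis columns with a standard truncated-power parameterization; it is an illustration, not the rms package's implementation (which may scale the terms differently), and the knot locations are arbitrary assumptions.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x for knots t_1 < ... < t_k (k >= 3).

    Returns k - 1 columns: x itself plus k - 2 nonlinear terms that are zero
    below t_1 and combine to a linear function above t_k.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def pos3(u):
        # Truncated cubic (u)_+^3.
        return max(u, 0.0) ** 3

    span = t[-1] - t[-2]
    columns = [x]
    for j in range(k - 2):
        # The last two terms cancel the cubic and quadratic parts beyond t_k.
        term = (
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / span
            + pos3(x - t[-1]) * (t[-2] - t[j]) / span
        )
        columns.append(term)
    return columns


knots = [0.0, 2.0, 5.0, 9.0]
for x in [-1.0, 1.0, 4.0, 10.0]:
    print(x, rcs_basis(x, knots))
```

A regression on these columns spends k − 1 degrees of freedom on the predictor, which connects to the question of where to spend degrees of freedom.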

Students are exposed to a theoretical framework for linear and generalized linear models. The first half of the semester covers linear models: multivariate normal theory, least squares estimation, limiting chi-square and F-distributions, sums of squares (partial, sequential) and expected sums of squares, weighted least squares, orthogonality, and Analysis of Variance (ANOVA). The second half of the semester focuses on generalized linear models: binomial, Poisson, and multinomial errors, introduction to categorical data analysis, conditional likelihoods, quasi-likelihoods, and model checking. Lab required [1]. Fall [3] (Kang)

Covers the classic repeated measures model, the general linear model for longitudinal data, and linear and generalized linear mixed effects models; for generalized linear models for longitudinal data, the course distinguishes between marginal and conditional models. Semi-parametric (generalized estimating equations) and parametric (generalized least squares and likelihood-based mixed effects models) estimation and inference are central to the course. Advanced topics include missing data techniques, causal inference, marginalized regression models, and study design considerations for longitudinal data. Lab required [1]. Spring [3] (Schildcrout)

Students are exposed to a variety of problems that arise in collaborative arrangements. The course’s goal is to develop the knowledge and skills necessary to interact successfully with research collaborators. The importance of developing communication, ethical, and professional skills to establish a successful collaboration will be emphasized. Students will role-play and develop projects with real investigators, present research and biostatistical topics, discuss collaborative situations that have gone awry, discover important concepts through case studies, and face real-life issues such as poor scientific formulation, lack of time and expectations, supervision and interview skills, career track choices, grantsmanship, and business negotiations. Course content will also make use of departmental clinics that run concurrently. Fall [3] (Davidson)

Second course of a year-long sequence in collaboration in statistical science. Students are exposed to a variety of statistical and methodological problems that can arise in collaborative arrangements. The course’s goal is to sharpen students’ skills in applying their statistical knowledge in real-world settings, while exposing them to the application of advanced statistical techniques in routine health science applications. The importance of understanding and learning the science underlying collaborations will be emphasized. Students will engage in consulting projects that will involve the use of a wide range of biostatistics methods from design to analysis. Prerequisite: 7351. Spring [3] (Liu)

This course provides a basic foundation in probability theory that includes probability spaces, set functions, sigma-algebras, random variables, expectation, Lp spaces, conditional expectation and projections, characteristic functions, modes of convergence, uniform integrability, classical limit theorems, random walks, martingales, Markov chains, and Brownian motion. Emphasis on measure theory is minimal. Concepts are illustrated in biomedical applications whenever possible. Fall [3] (Johnson)

This course provides a technically oriented survey of modern inferential tools and statistical learning. Topics include variable selection and regularization, basis expansions (e.g., splines), kernel smoothing, tree-based methods, supervised and unsupervised learning, neural networks, support vector machines, and ensemble methods. General techniques for inference will also be discussed, including bootstrap techniques, analytical approximations (e.g., the multivariate delta method), and exact methods. Lab required [1]. Spring. [3] (Shotwell)
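Kernel smoothing can be illustrated with the Nadaraya-Watson estimator, which predicts at x0 by a weighted average of the responses, with weights that decay with distance from x0. The Gaussian kernel, the bandwidth values, and the function name below are illustrative assumptions, not a prescription from the course.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted local average of ys at x0 using a Gaussian kernel.

    If x0 is many bandwidths away from every xs, all weights can underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


xs = [i / 10 for i in range(101)]
ys = [math.sin(x) for x in xs]
print(nadaraya_watson(5.0, xs, ys, bandwidth=0.3), math.sin(5.0))
```

The bandwidth controls the bias-variance trade-off: small values track the data closely, and large values average over a wide neighborhood.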

The second computational statistics course covers advanced computational and machine learning algorithms using the Python programming language. These include numerical optimization and integration, Markov Chain Monte Carlo (MCMC), expectation-maximization (EM) algorithms, Gaussian processes, Hamiltonian Monte Carlo, clustering, decision trees, and graphical models. Students will also be introduced to parallel and high-performance computing approaches. Prerequisite: BIOS 301 or permission of instructor. Fall [3] (Fonnesbeck)
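A minimal sketch of an EM (expectation-maximization) algorithm in Python, assuming a two-component univariate Gaussian mixture fitted to simulated data. The E-step computes each point's posterior probability of belonging to component 1, and the M-step applies weighted maximum likelihood updates. A fixed iteration count stands in for a proper convergence check, and the starting values are arbitrary.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_two_gaussians(data, mu=(0.0, 1.0), sd=(1.0, 1.0), weight=0.5, iters=100):
    """EM for a two-component Gaussian mixture; returns (weight, means, sds)."""
    mu1, mu2 = mu
    sd1, sd2 = sd
    n = len(data)
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        resp = []
        for x in data:
            a = weight * normal_pdf(x, mu1, sd1)
            b = (1 - weight) * normal_pdf(x, mu2, sd2)
            resp.append(a / (a + b))
        # M-step: weighted maximum likelihood estimates.
        r1 = sum(resp)
        r2 = n - r1
        mu1 = sum(r * x for r, x in zip(resp, data)) / r1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / r2
        sd1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / r1)
        sd2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / r2)
        weight = r1 / n
    return weight, (mu1, mu2), (sd1, sd2)


rng = random.Random(42)
# Simulated mixture: 30% from N(-2, 1) and 70% from N(3, 1).
data = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(700)]
print(em_two_gaussians(data, mu=(-1.0, 1.0)))
```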

Examines the foundations of statistical inference as viewed from Frequentist, Bayesian, and Likelihood approaches. Famous papers and controversies are discussed along with statistical theories of evidence and decision theory, and their historical significance. Spring. [3] (Blume)

This course covers the methodology and rationale for Bayesian methods and their applications. Statistical topics include the historical development of Bayesian methods, hierarchical models, Markov Chain Monte Carlo (MCMC) and related sampling methods, specification of priors, sensitivity analysis, and model checking and comparison. This course features applications of Bayesian methods to biomedical research. Prerequisites: BIOS 6301, BIOS 6312, BIOS 7330, BIOS 6341, BIOS 6342, and BIOS 7345, or equivalent; for non-biostatistics students, permission required. Fall [3] (Choi)
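As a small illustration of MCMC for Bayesian posterior estimation, the sketch below runs a random-walk Metropolis sampler for a binomial proportion under a uniform Beta(1, 1) prior. This example is chosen because the exact posterior, Beta(1 + x, 1 + n − x), is known, so the sampler can be checked against it. The data (7 successes in 20 trials), step size, and burn-in length are arbitrary assumptions.

```python
import math
import random


def log_posterior(p, successes, trials):
    """Unnormalized log posterior of a binomial proportion under a uniform Beta(1, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(successes, trials, n_samples=20_000, step=0.1, start=0.5, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal; returns all draws."""
    rng = random.Random(seed)
    p = start
    current = log_posterior(p, successes, trials)
    draws = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio); the proposal densities cancel.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            p, current = proposal, candidate
        draws.append(p)
    return draws


draws = metropolis(successes=7, trials=20)
kept = draws[1000:]  # discard burn-in
print(sum(kept) / len(kept), 8 / 22)  # MCMC estimate vs exact Beta(8, 14) posterior mean
```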

This course provides an introduction to causal inference methods for observational data and randomized studies. Topics include the Rubin causal model, directed acyclic graphs, propensity scores, inverse probability weighting, instrumental variables, causal mediation analysis, marginal structural models, g-computation, and sensitivity analyses to examine robustness to untestable assumptions. Students will learn the basic theory behind the methods and will apply them to biomedical data examples. Prerequisites: 6341, 6342, 7323, and 7346 or approval by the instructor. Spring. [3] (Shepherd)
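The sketch below illustrates inverse probability weighting on simulated data with a single binary confounder that raises both the chance of treatment and the outcome. The true treatment effect is 1, so the naive difference in means is biased, while the normalized (Hajek) IPW estimate should land near 1. The propensity scores are taken as known from the simulation, whereas in practice they would be estimated (for example, by logistic regression). All data-generating values are arbitrary assumptions.

```python
import random


def ipw_ate(treated, outcome, propensity):
    """Normalized (Hajek) IPW estimate of E[Y(1)] - E[Y(0)]."""
    w1 = [a / e for a, e in zip(treated, propensity)]
    w0 = [(1 - a) / (1 - e) for a, e in zip(treated, propensity)]
    mu1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mu1 - mu0


def simulate(n=20_000, seed=0):
    """Binary confounder x affects treatment and outcome; the true treatment effect is 1."""
    rng = random.Random(seed)
    treated, outcome, propensity = [], [], []
    for _ in range(n):
        x = rng.random() < 0.5
        e = 0.8 if x else 0.2  # true propensity score P(A = 1 | x)
        a = int(rng.random() < e)
        y = 1.0 * a + 2.0 * x + rng.gauss(0.0, 1.0)
        treated.append(a)
        outcome.append(y)
        propensity.append(e)
    return treated, outcome, propensity


a, y, e = simulate()
naive = (
    sum(yi for ai, yi in zip(a, y) if ai) / sum(a)
    - sum(yi for ai, yi in zip(a, y) if not ai) / (len(a) - sum(a))
)
print(round(naive, 2), round(ipw_ate(a, y, e), 2))
```

Under this data-generating process, the naive difference converges to 1 + 2 × (0.8 − 0.2) = 2.2 rather than 1.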

Credit hours for students engaging in Master’s thesis research. (Research Advisor)

Credit hours for students engaging in dissertation research prior to completing qualifying exams. (Research Advisor)

Credit hours for students engaging in dissertation research. (Dissertation Advisor)