Curriculum
Our courses are organized into three core sequences:
- Computation, which focuses on programming, data structures, computer systems, and methods.
- Data Analysis, which focuses on data exploration, analysis, prediction, inference, and algorithms.
- Practice, which aims to impart workplace skills, ethical standards, and awareness of data science to date.
The Vanderbilt Master of Science in Data Science is an in-person, 4-semester, 16-course (48 credits) program that includes the completion and presentation of a capstone project. Students will be trained in the three core sequences, gain practical experience, and sharpen workplace skills (teamwork, communication, leadership). Below is an example of course sequencing.
PLAN OF STUDY 2022 – 2023
Year 1
FALL
- DS 5220: AI Assisted Programming (Section 1: Beginners; Section 2: Advanced)
- DS 5620: Probability and Statistical Inference
- DS 5610: Exploratory Data Analysis
- DS 5320: Survey of Data Science Applications
SPRING
- DS 5640: Modeling and Machine Learning I
- DS 5340: Data Science Rights and Responsibilities
- DS 5380: Data Science Teamwork in Practice (Teams)
- DS 5420: Data Management Systems (Databases)
SUMMER
- DS 5700: Data Science Internship
*Students can sign up for 0 or 3 credits. With 0 credits, students pay only student fees. With 3 credits, students pay tuition and student fees but can take one fewer elective in their second year.
*Students are required to record their internship in 12Twenty.
Year 2
FALL
- DS 5440: Data Science Algorithms
- DS 5660: Modeling and Machine Learning II
- DS 5899: Transformers in Theory and Practice (Fall or Spring)
- Elective 1
SPRING
- DS 5460: Big Data Scaling
- DS 5999: Capstone Development
- Elective 2
- Elective 3
*Electives need to be approved by the Director of Graduate Studies.
Course Descriptions
Offered in two sections: Section 1 (Beginners) and Section 2 (Advanced). Students learn the foundations of effective software design and programming practice, how to program and evaluate a simulation, and how to apply modern resampling techniques in simulations in both R and Python. Students learn workflow tools (e.g., Jupyter Notebooks, LaTeX, knitr, and Markdown reports) and collaboration platforms with version control (e.g., GitHub). Reproducible methods for programming and data processing are emphasized.
This course explores the ethos, ethics, and obligations of the modern data scientist. Modern data security and privacy vulnerabilities, as well as solutions, for individual-level data and the institutions from which the data are derived will be discussed. The history, ethics, and standards for human experimentation are reviewed. The legal landscape concerning data ownership and privacy will be surveyed.
Students will work in teams and learn how to use the technology of teams to engage in real-world data science problems. Teams will apply their skills in a supervised environment where active learning is reinforced and learn to make practical decisions during a first end-to-end project. Students will gain practical experience with teamwork tools and commonly used data science technologies, learn how to participate in and support teams as the primary data curator and data analyst, and practice the soft skills needed to contribute successfully to team projects during a second project proposed by a partnering client.
This course covers database management systems, e.g., relational databases, data architecture, and security. Topics include entity-relationship models and relational theory; storage and access of data; complex SQL queries; and non-relational databases including NoSQL databases. Connections to Hadoop and MapReduce will be highlighted. Students are exposed to database architectures as time allows.
An applied and practical combination of discrete structures and computational algorithms that are relevant for data science applications and infrastructure. Topics include natural language processing, graph and network models, (stochastic) gradient descent, block coordinate descent, and (quasi-)Newton methods, along with an overview of more traditional topics such as sorting and searching, hashing, queues, trees, string processing, advanced data structures, recurrence relations, shortest paths, matching, and dynamic programming. The course will also cover streaming algorithms for computational statistics, e.g., Markov chain Monte Carlo, simulated annealing, and stability of numerical algorithms.
This course will address key challenges that arise when working with big data and parallel processing. Practical techniques for storing, retrieving, and scaling are discussed. Topics include high-performance computing, parallel processing, commercial cloud architectures, and mapping of data science algorithms onto scalable computing platforms.
This course will teach students how to explore, summarize, and graph data (big and small). Topics include principles of perception, how to display data, scatterplots, histograms, boxplots, bar charts, dynamite plots, proper data summaries, dimensionality reduction, multidimensional scaling, and unsupervised clustering algorithms, such as principal component analysis, k-means clustering, and nearest neighbor algorithms.
This course covers the fundamentals of probability theory and statistical inference. Topics in probability include random variables, distributions, expectations, moments, Jensen’s inequality, the law of large numbers, and the central limit theorem. Topics in inference include maximum likelihood; point estimation (Bayesian, frequentist, and likelihood versions); hypothesis and significance testing; and re-sampling techniques. Complex mathematical proofs will be illustrated with computational solutions.
This is the first course in a sequence exploring statistical modeling and machine learning techniques. Both courses emphasize unifying and advanced concepts, such as prediction and calibration, classification and discrimination, optimism and cross-validation, re-sampling methods for model assessment, the evaluation of modeling assumptions, and the bias-variance trade-off. This first course focuses on regression, generalized linear models, regularized regression, support-vector machines and kernel methods, and simple neural networks.
This is the second course in a sequence exploring statistical modeling and machine learning techniques. Both courses emphasize unifying and high-level concepts, such as prediction and calibration, classification and discrimination, optimism and cross-validation, re-sampling methods for model assessment, the evaluation of modeling assumptions, and the bias-variance trade-off. This second course covers nonparametric regression, neural networks (convolutional and recurrent), deep learning, reinforcement learning, long short-term memory models, hidden Markov models, and Bayesian networks.
This course explores recent research on the analysis of social networks and on models and algorithms that are used to abstract their properties and make predictions. Key topics covered in this course are: Graph models; Network centrality measurements; Computational methods of link prediction, clustering and classification on graphs, and network diffusion; Deep learning on graphs including network embedding and graph neural network models and their applications. Prerequisites: DS 5440 and 5660. [3]
Transformer models are finding wide application (NLP, audio analysis or “textless NLP”, computer vision, and more) and are achieving state-of-the-art performance across multiple tasks. In this course we will discuss the theoretical underpinnings of transformers, cover the skills and tools needed to use transformers, and gain hands-on experience. Students will be assigned two papers to present over the semester, complete self-guided training using the Huggingface.co training material, and apply a transformer-based model to solve a research problem. Offered in Spring and Fall. [3]
A structured environment in which students develop their capstone projects; get feedback from students, faculty, and industry mentors; learn how to construct a poster presentation; and practice oral presentations. Students will also learn how to set a timeline and work toward completion in a supervised environment.
ELECTIVE Course Descriptions
*Sample of DSI pre-approved electives. Other electives need to be approved by the Director of Graduate Studies.
This course is a supervised internship external to Vanderbilt. Students have an opportunity to apply concepts learned in the classroom to real-world settings in a supervised internship experience. The experience hones technical skills, fosters professional development, and enhances communication, critical-thinking, and teamwork skills. Students must present a one-page plan for their internship and generate at least one deliverable (talk, report, etc.) based on their internship experience. Students will need to identify a Data Science faculty mentor to monitor their progress and discuss their experience. Offered in Summer. [3]
Explores recent research on the analysis of social networks and on models and algorithms that are used to abstract their properties and make predictions. Key topics covered in this course are: Graph models; Network centrality measurements; Computational methods of link prediction, clustering and classification on graphs, and network diffusion; Deep learning on graphs including network embedding and graph neural network models and their applications. Prerequisites: DS 5440 and 5660. Offered in Spring. [3]
This course develops a foundational understanding of advanced applications of statistics and applies them to specific real-world scenarios. We will cover time series analysis and various forecasting methodologies (ARIMA, SARIMAX, Prophet), PCA/clustering applied to customer segmentation and survey data, econometric techniques (regression discontinuity, difference in differences, etc.), and A/B testing methodologies. We will also explore issues around missing data and outliers/anomalies that plague real-world data sets. While theoretical foundations will be covered, the focus of this class is on case studies, real-world data, computing, and application rather than mathematical derivations or proofs. Offered in Fall. [3]
This course will prepare students on current and emerging practices for handling unstructured data. Many modern data science applications are highly data intensive, require heavy read/write workloads, and are often unstructured in nature, which requires storage and processing beyond relational databases and management methodologies. NoSQL, or Not-Only SQL, databases are non-schema oriented and provide additional capabilities that support these types of applications. This course will introduce NoSQL systems such as BigTable (by Google), Dynamo (by Amazon), Apache Cassandra (used by Facebook), Apache HBase (used by Twitter), and other NoSQL systems such as MongoDB. Other topics covered will include an introduction to big data analytics such as Apache Hadoop and MapReduce. Offered in Fall. [3]
This course is a supervised practicum in data science at Vanderbilt or VUMC. Students have an opportunity to apply concepts learned in the classroom to real-world settings in a supervised lab experience. The experience hones technical skills, fosters professional development, and enhances communication, critical-thinking, and teamwork skills. Students must present a one-page plan for their practicum and generate at least one deliverable (talk, report, etc.) based on their experience. Students will need to identify a Data Science faculty mentor to monitor their progress and discuss their experience. Offered in Spring, Summer, and Fall. [3]
Every day, financial analysts are tasked with synthesizing and summarizing a huge number of documents or extracting important themes or metrics from those documents. Today, ESG-oriented strategies have become more mainstream. ESG stands for Environmental, Social, and Governance. Asset managers are looking for ways to assess ESG-related activities in their investment companies and monitor their progress toward their goals. Among many ESG-related documents, corporate and social responsibility (CSR) reports are used by companies to communicate their CSR efforts and their impact on the environment and community. Though a company is not required to publish a CSR report annually, more than 90% of the companies in the S&P 500 Index did so for 2019. Within AB, a common problem within the fixed income responsibility investment team is finding specific ESG metrics to answer particular questions. In this Special Topics course, a team of students will work with experts from AB to explore question-answering transformer models for extracting ESG-related answers from CSR reports for industry-specific data. [3]
Theoretical, mathematical, and simulation models of neurons, neural networks, or brain systems. Computational approaches to analyzing and understanding data such as neurophysiological, electrophysiological, or brain imaging. Demonstrations simulating neural models. No credit for students who have earned credit for 3270. Offered in Spring. [3]
Offered in Fall.
Mathematical and computational models of the cognitive processes underlying human memory. Attribute-based models, instance theories, neural network models, retrieved-context models, executive function and working memory models. Methods of fitting models to empirical data. Offered in Fall, alternate years. [3]
This course will focus on the key managerial questions in the health care industry, the unique institutional data that is available, and how to develop models to address these questions. Topics will include benchmarking financial, operational, and clinical performance at both the organizational and market levels. Students will be required to develop a basic familiarity with SAS programming. [2]
This course is designed to provide an overview of marketing research that yields consumer insights for use in effective marketing decision making. The course emphasizes two things that are very relevant for a marketing manager: 1) how to evaluate the design of research studies to assess whether the results are valid and meaningful, and 2) how to analyze and interpret market research data for marketing decision making. Toward this end, we will examine a variety of marketing research techniques, including focus groups, projective techniques, depth interviews, observation, ethnography, and survey design. This course will provide students with “hands-on” experience with these various marketing research techniques through case discussions and group projects. Mod 2, offered second half of Fall semester.
The broad objective of this course is to provide a fundamental understanding of the quantitative marketing research methods employed by well-managed firms. The course is aimed at the manager who is the ultimate user of the research and is thus responsible for determining the scope and direction of research conducted. In the course, we will cover different types of research designs, techniques of data collection, and data analysis. Emphasis will be on the interpretation and use of results rather than on mathematical derivations. The course focuses on helping managers recognize the role of systematic information gathering and analysis in making marketing decisions, in addition to developing an appreciation for the potential contributions and limitations of marketing research data. Mod 3, offered first half of Spring semester. [2]
Marketing decisions are primarily the purview of CEOs, CMOs, consultants, and marketing managers, but, increasingly, marketing has permeated throughout companies such that all managers must consider their customers. Marketing decisions are optimal when they are fact based, and marketing models are informed by both data and judgment. Models will be studied, created, and tested for all elements of marketing: clustering customers into segments, forecasting market sizes, customer relationship management database systems, diffusion rates for new products, advertising budgeting, pricing models, etc. Mod 4, offered second half of Spring semester. [2]
Uses data and quantitative methods to measure performance and make decisions to gain advantage in the competitive sports arena. The course builds on Managerial Statistics and Spreadsheets for Business Analytics. These two prerequisite courses expose students to powerful quantitative methods such as multiple regression, constrained optimization, and simulation. In this course, students gain further experience in applying quantitative methods to problems in sports. Typical questions in sports analytics include: How to predict future performance of players or teams? How much is a player on a team worth? How to rank players or teams? Which decision is more likely to lead to a win? We will cover several sports, but to be able to ‘go deep’ we will focus significantly on baseball and football. The course is for second-year MBA students only. Prerequisites: MGT 6381 and MGT 6574. Mod 2, offered second half of Fall semester. [2]
This class provides the framework for analyzing the various components needed to value real assets, as well as an introduction to the valuation of financial assets. Topics include the time value of money, capital budgeting, measuring risk in financial markets, market efficiency and an introduction to options. Mod 1, offered first half of Fall semester. [2]
This course focuses on providing students with a strong theoretical and applied understanding of the key tools used in equity valuation and stock selection. Approaches to valuation include dividend discount models, cash flow models, and valuation by multiples. Financial statement data are used in developing cash flow forecasts, and market data are used in estimating the cost of capital. The effects of firm financing policy, corporate taxes, and potential investment options are given special consideration. Applications include capital budgeting, the evaluation of potential mergers and acquisitions, and corporate restructuring. The objective of the course is to show how to manage companies to add value. Prerequisite: MGT 6331. Mod 1, offered first half of Fall semester; Mod 2, offered second half of Fall semester; and Mod 3, offered first half of Spring semester. [2]
Studies solutions to fundamental problems faced by individual and institutional investors. First, we cover a number of topics in fixed income markets including the different ways of computing bond yields, forecasts of interest rates using the yield curve, and duration and convexity as measures of bond risk. Second, we solve the asset allocation problem to determine an optimal portfolio mix. We review the relevant theory, use an advanced spreadsheet to find an answer, and discuss issues faced by portfolio managers. Third, we use two methods to value options, the Black-Scholes formula and the binomial tree, and show how investors can use options to customize their risk-reward profile. This course is equivalent to MGT 6404 so it is not available for MSF students. Prerequisite: MGT 6331. Mod 3, offered first half of Spring semester. Mod 1, offered in first half of Fall semester. [2]
Every company, regardless of size or industry, relies upon energy and creates carbon emissions in a variety of ways. Increasing demand for energy from emerging economies coupled with the concern over climate change is rapidly changing the nature of energy supply and demand. Companies are increasingly turning to energy conservation, energy efficiency and renewable energy to both create new business opportunities as well as reduce the risks of increasing costs or disruptions in supply. This course will focus on this critical sector of the economy and examine how leading businesses are acting to address this topic. Mod 2, offered second half of Fall semester. [2]
Social Enterprise & Entrepreneurship will explore the spectrum of activity in the growing social enterprise arena, where business models and entrepreneurial approaches are increasingly being used to directly address social and environmental issues. Topics will include nonprofit, hybrid, and for-profit social enterprise models, and the intersection of social entrepreneurship with capital formation issues, international development, technology & innovation, global health, cross-sector models, and microfinance as a case study in social enterprise & innovation. Course content will include a combination of instructor lecture, readings on focus areas, guest speakers representing the leading social entrepreneurs and social enterprises in the field, and a group project that will be integrated with the rest of the course curriculum. Mod 4, offered second half of Spring semester. [2]
This course builds upon the business process innovation concepts introduced in the introductory operations management course and examines material, information, and cash flows between firms within a supply chain. Topics include supply chain strategy, demand forecasting and inventory management methods for short and long lifecycle products, supply chain collaboration and coordination, and operational methods for managing supply chain risk. Prerequisite: MGT 6371. Mod 4, offered second half of Spring semester. [2]
In the U.S., the health care sector accounts for 17% of gross domestic product. Facing decreasing reimbursements and ever-increasing costs, coupled with pressure to deliver quality under pay-for-performance and bundled payment schemes, health care organizations are under unprecedented pressure to improve efficiency and quality. Consequently, health care organizations need to adopt well-proven operations management concepts to better manage their processes. In this course, we will analyze health care organizations using both qualitative and quantitative principles of operations management to address issues around patient flows, capacity and staff planning, and process failure and learning. The course is based on reading current articles, solving case studies, and hands-on, data-driven exercises. The final project involves students deploying operations management concepts to propose solutions to problems currently faced by a real hospital. The course builds on the core course in operations management and will benefit students interested in consulting, operations management, and/or health care. Prerequisite: MGT 6371. Mod 1, offered first half of Fall semester. [2]