Curriculum

Our courses are organized into three core sequences:

  1. Computation, which focuses on programming, data structures, computer systems, and methods.
  2. Data Analysis, which focuses on data exploration, analysis, prediction, inference, and algorithms.
  3. Practice, which aims to impart workplace skills, ethical standards, and awareness of data science to date.

The Vanderbilt Master of Science in Data Science is an in-person, 4-semester, 16-course (48 credits) program, which includes the completion and presentation of a capstone project. Students will be trained in the three core sequences, gain practical experience, and sharpen workplace skills (teamwork, communication, leadership). Below is an example of course sequencing.

PLAN OF STUDY 2023 – 2024

Year 1

FALL
DS 5220: Principles of Programming and Simulation
DS 5620: Probability and Statistical Inference
DS 5610: Exploratory Data Analysis
DS 5320: Survey of Data Science Applications
SPRING
DS 5640: Machine Learning
DS 5340: Data Science Rights and Responsibilities (Or Spring 2nd year)
DS 5380: Data Science Teamwork in Practice (Teams)
DS 5420: Data Management Systems (Databases)
SUMMER
DS 5700: Data Science Internship
*Students can sign up for 0-3 credits. With 0 credits, students pay only student fees. With 3 credits, students pay tuition and student fees but can take one fewer elective in their second year. *Students are required to record their internship in 12Twenty.

Year 2

FALL
DS 5440: Data Science Algorithms
DS 5660: Deep Learning
DS 5690: Generative AI Models in Theory and Practice
(Fall or Spring)
Elective 1
SPRING
DS 5460: Big Data Scaling
DS 5999: Capstone Development
Elective 2
Elective 3
*Electives and required course waivers need to be approved by the Director of Graduate Studies.

Course Descriptions

Students learn the foundations of effective software design and programming practice, how to program and evaluate a simulation, and how to apply modern resampling techniques in simulations in both R and Python. Students learn workflow solutions (e.g., Jupyter Notebooks, LaTeX, knitr, and Markdown reports) and collaboration platforms (e.g., GitHub and version control). Reproducible methods for programming and data processing are emphasized.

Offered in Fall. [3] 

This course introduces foundational data science terminology and conventions, and exposes students to a wide range of data science applications, e.g., in genomics, health care, informatics, astronomy/physics, neuroimaging, cyber-physical systems, business, and finance.

Offered in Fall. [3] 

This course explores the ethos, ethics, and obligations of the modern data scientist. Modern data security and privacy vulnerabilities, as well as solutions, for individual-level data and the institutions from which the data are derived will be discussed. The history, ethics, and standards for human experimentation are reviewed. The legal landscape concerning data ownership and privacy will be surveyed.

Offered in Spring. [3] 

Can be taken in either the 1st or 2nd year of the program. If taken in the 2nd year, you will need to sign up for an alternative course in the 1st year (e.g., DS 5690 or another elective).

Students will work in teams and learn how to use the technology of teams to engage in real-world data science problems. Teams will apply their skills in a supervised environment where active learning is reinforced, and will learn to make practical decisions during a first end-to-end project. Students will gain practical experience with teamwork tools and commonly used data science technologies, and learn how to participate in and support teams as the primary data curator and data analyst. During a second project proposed by a partnering client, they will practice the soft skills needed to contribute successfully to team projects.

Offered in Spring. [3] 

This course covers database management systems, e.g., relational databases, data architecture, and security. Topics include entity-relationship models and relational theory; storage and access of data; complex SQL queries; and non-relational databases including NoSQL databases. Connections to Hadoop and MapReduce will be highlighted. Students are exposed to database architectures as time allows.

Offered in Spring. [3] 

An applied and practical combination of discrete structures and computational algorithms that are relevant for data science applications and infrastructure. Topics include natural language processing, graph and network models, (stochastic) gradient descent, block coordinate descent, and (quasi-)Newton methods, along with an overview of more traditional topics such as sorting and searching, hashing, queues, trees, string processing, advanced data structures, recurrence relations, shortest paths, matching, and dynamic programming. The course will also cover streaming algorithms for computational statistics, e.g., Markov chain Monte Carlo, simulated annealing, and stability of numerical algorithms.

Offered in Fall. [3] 

This course will address key challenges that arise when working with big data and parallel processing. Practical techniques for storing, retrieving, and scaling are discussed. Topics include high-performance computing, parallel processing, commercial cloud architectures, and mapping of data science algorithms onto scalable computing platforms.

Offered in Spring. [3] 

This course will teach students how to explore, summarize, and graph data (big and small). Topics include principles of perception, how to display data, scatterplots, histograms, boxplots, bar charts, dynamite plots, proper data summaries, dimensionality reduction, multidimensional scaling, and unsupervised clustering algorithms, such as principal component analysis, k-means clustering, and nearest neighbor algorithms.

Offered in Fall. [3] 

This course covers the fundamentals of probability theory and statistical inference. Topics in probability include random variables, distributions, expectations, moments, Jensen’s inequality, the law of large numbers, and the central limit theorem. Topics in inference include maximum likelihood; point estimation (Bayesian, frequentist, and likelihood versions); hypothesis and significance testing; and resampling techniques. Complex mathematical proofs will be illustrated with computational solutions.

Offered in Fall. [3] 

This is the first course in a sequence exploring statistical modeling and machine learning techniques. Both courses emphasize unifying and advanced concepts, such as prediction and calibration, classification and discrimination, optimism and cross-validation, resampling methods for model assessment, the evaluation of modeling assumptions, and the bias-variance trade-off. This first course focuses on regression, generalized linear models, regularized regression, support-vector machines and kernel methods, and simple neural networks.

Offered in Fall. [3] 

This is the second course in a sequence exploring statistical modeling and machine learning techniques. Both courses emphasize unifying and high-level concepts, such as prediction and calibration, classification and discrimination, optimism and cross-validation, resampling methods for model assessment, the evaluation of modeling assumptions, and the bias-variance trade-off. This second course covers nonparametric regression, neural networks (convolutional and recurrent), deep learning, reinforcement learning, long short-term memory models, hidden Markov models, and Bayesian networks.

Offered in Spring. [3] 

Transformer models are finding wide application (NLP, audio analysis or “textless NLP”, computer vision, and more) and are achieving state-of-the-art performance across multiple tasks. In this course we will discuss the theoretical underpinnings of transformers, cover the skills and tools needed to use transformers, and gain hands-on experience. Students will present two assigned papers over the semester, complete self-guided training using the Huggingface.co training material, and apply a transformer-based model to solve a research problem.

Offered in Spring and Fall. [3] 

This course is a supervised internship external to Vanderbilt. Students have an opportunity to apply concepts learned in the classroom to real-world settings in a supervised internship experience. The experience hones technical skills, fosters professional development, and enhances communication, critical-thinking, and teamwork skills. Students must present a one-page plan for their internship and generate at least one deliverable (talk, report, etc.) based on their internship experience. Students will need to identify a Data Science faculty mentor to monitor their progress and discuss their experience.

Offered in Summer, Spring and Fall. [3] 

This course explores recent research on the analysis of social networks and on models and algorithms that are used to abstract their properties and make predictions. Key topics covered in this course are: Graph models; Network centrality measurements; Computational methods of link prediction, clustering and classification on graphs, and network diffusion; Deep learning on graphs including network embedding and graph neural network models and their applications.

Prerequisites: DS 5440 and 5660. [3]

A structured environment in which students develop their capstone projects; get feedback from students, faculty, and industry mentors; learn how to construct a poster presentation; and practice oral presentations. Students will also learn how to set a timeline and work toward completion in a supervised environment.

Offered in Spring and Fall. [0-3] 

ELECTIVE Course Descriptions

The following are some pre-approved electives that you may take to fulfill your elective credit requirements. Many of these courses have prerequisites or require instructor approval, so please contact the individual department before enrolling in your choices. You may also take your internship or participate in a research practicum for credit, and that will count toward your elective credits.

Data Science Electives

In this course, students will learn how to effectively communicate data insights to stakeholders through written, verbal, and visual means. The course integrates two key skills: communicating with data and data visualization. Students will explore different storytelling techniques and learn how to design effective visualizations that are both aesthetically pleasing and informative. By the end of the course, students will have the skills they need to become effective data storytellers and communicators, able to craft compelling narratives that convey complex data insights in a way that is both informative and engaging for a wide range of audiences.

Offered in Spring [3] 

This course is a supervised internship external to Vanderbilt. Students have an opportunity to apply concepts learned in the classroom to real-world settings in a supervised internship experience. The experience hones technical skills, fosters professional development, and enhances communication, critical-thinking, and teamwork skills. Students must present a one-page plan for their internship and generate at least one deliverable (talk, report, etc.) based on their internship experience. Students will need to identify a Data Science faculty mentor to monitor their progress and discuss their experience.

Offered in Summer. [3]

Explores recent research on the analysis of social networks and on models and algorithms that are used to abstract their properties and make predictions. Key topics covered in this course are: Graph models; Network centrality measurements; Computational methods of link prediction, clustering and classification on graphs, and network diffusion; Deep learning on graphs including network embedding and graph neural network models and their applications.

Offered in Spring. [3]

Prerequisites: DS 5440 and 5660

This course develops a foundational understanding of advanced applications of statistics and applies them to specific real-world scenarios. We will cover time series analysis and various forecasting methodologies (ARIMA, SARIMAX, Prophet), PCA/clustering applied to customer segmentation and survey data, econometric techniques (regression discontinuity, difference in differences, etc.), and A/B testing methodologies. We will also explore issues around missing data and outliers/anomalies that plague real-world data sets. While theoretical foundations will be covered, the focus of this class is on case studies, real-world data, computing, and application rather than mathematical derivations or proofs.

Offered in Fall. [3]

This course will introduce students to current and emerging practices for handling unstructured data. Many modern data science applications are highly data intensive, require heavy read/write workloads, and often involve data that are unstructured in nature, which requires storage and processing beyond relational databases and management methodologies. NoSQL, or Not-Only SQL, databases are non-schema oriented and provide additional capabilities that support these types of applications. This course will introduce NoSQL systems such as BigTable (by Google), Dynamo (by Amazon), Apache Cassandra (used by Facebook), Apache HBase (used by Twitter), and other NoSQL systems such as MongoDB. Other topics covered will include an introduction to big data analytics tools such as Apache Hadoop and MapReduce.

Offered in Fall. [3]

This course will focus on using computers to automatically analyze language data for linguistic features. The goal of the course is to provide students with the background and computing skills necessary to independently analyze and assess language features in naturally occurring text. At the lexical level, this will include assessments of words and morphemes for features related to complexity. At the syntactic level, analyses will examine part-of-speech tagging, dependency parsing, and phrasal and clausal complexity. At the cognitive level, this will involve assessing affective language and annotating texts for named entities. At the semantic level, cohesion, topicality, and similarity will be examined. Additionally, the course will cover topics including rule-based matching and a brief introduction to large language models. The course will be taught using the Python programming language and appropriate Python packages (e.g., Pandas, spaCy). The course will be practice-based, and the final project will require students to analyze language data of interest to their field of research, discuss the data from an analytical language perspective, and model outcomes.

Offered in Spring [3] 

This course is a supervised practicum in data science at Vanderbilt or VUMC. Students have an opportunity to apply concepts learned in the classroom to real-world settings in a supervised lab experience. The experience hones technical skills, fosters professional development, and enhances communication, critical-thinking, and teamwork skills. Students must present a one-page plan for their practicum and generate at least one deliverable (talk, report, etc.) based on their experience. Students will need to identify a Data Science faculty mentor to monitor their progress and discuss their experience.

Offered in Spring, Summer and Fall. [3]

This is a specialized, project-driven course that dives into the intersection of finance and natural language processing (NLP). It is designed for students who are passionate about applying cutting-edge machine learning techniques to real-world applications in finance. It first introduces foundational NLP concepts such as text processing, word representations, and model evaluation. Students will then engage in hands-on projects tailored to the unique needs of asset management. These projects, in collaboration with industry experts from AB, include harnessing the power of Large Language Models (LLMs) for investment prompts, mastering the art of financial document summarization, exploring classification challenges in finance, and delving into question answering and speech recognition in a financial context. Prerequisite: Completion of the Transformer course. Ideal for those seeking to elevate their financial prowess with AI-driven insights.

Offered in Spring [3]

 

The study of the evolution of customer behavior. Description: The Nissan Product Planning team is looking for highly talented students to assist them with developing a streamlined mechanism for creating target clusters and understanding how these clusters evolve over time (i.e., how different customer groups react to social, financial, technological, and political trends). This will help Nissan develop a great understanding of what different types of customers want at different time periods.

Offered in Spring [3]

 

 

External Electives

Theoretical, mathematical, and simulation models of neurons, neural networks, or brain systems. Computational approaches to analyzing and understanding neurophysiological, electrophysiological, or brain imaging data. Demonstrations simulating neural models. No credit for students who have earned credit for 3270.

Offered in Spring. [3] 

Mathematical and computational models of the cognitive processes underlying human memory. Attribute-based models, instance theories, neural network models, retrieved-context models, executive function and working memory models. Methods of fitting models to empirical data.

Offered in Fall, alternate years. [3]

This course will focus on the key managerial questions in the health care industry, the unique institutional data that is available, and how to develop models to address these questions. Topics will include benchmarking financial, operational, and clinical performance at both the organizational and market levels. Students will be required to develop a basic familiarity with SAS programming.

[2] 

This course is designed to provide an overview of marketing research that yields consumer insights for use in effective marketing decision making. The course emphasizes two things that are very relevant for a marketing manager: 1) how to evaluate the design of research studies to assess whether the results are valid and meaningful, and 2) how to analyze and interpret market research data for marketing decision making. Towards this end, we will examine a variety of marketing research techniques, including focus groups, projective techniques, depth interviews, observation, ethnography, and survey design. This course will provide students with “hands-on” experience with these various marketing research techniques, through case discussions and group projects.

Mod 2, offered second half of Fall semester.

The broad objective of this course is to provide a fundamental understanding of the quantitative marketing research methods employed by well-managed firms. The course is aimed at the manager who is the ultimate user of the research and is thus responsible for determining the scope and direction of research conducted. In the course, we will cover different types of research designs, techniques of data collection, and data analysis. Emphasis will be on the interpretation and use of results rather than on mathematical derivations. The course focuses on helping managers recognize the role of systematic information gathering and analysis in making marketing decisions, in addition to developing an appreciation for the potential contributions and limitations of marketing research data.

Mod 3, offered first half of Spring semester. [2]

Marketing decisions are primarily the purview of CEOs, CMOs, consultants, and marketing managers, but, increasingly, marketing has permeated throughout companies such that all managers must consider their customers. Marketing decisions are optimal when they are fact based, and marketing models are informed by both data and judgment. Models will be studied, created, and tested for all elements of marketing: clustering customers into segments, forecasting market sizes, customer relationship management database systems, diffusion rates for new products, advertising budgeting, pricing models, etc.

Mod 4, offered second half of Spring semester. [2] 

This class provides the framework for analyzing the various components needed to value real assets, as well as an introduction to the valuation of financial assets. Topics include the time value of money, capital budgeting, measuring risk in financial markets, market efficiency and an introduction to options.

Mod 1, offered first half of Fall semester. [2] 

This course focuses on providing students with a strong theoretical and applied understanding of the key tools used in equity valuation and stock selection. Approaches to valuation include dividend discount models, cash flow models, and valuation by multiples. Financial statement data are used in developing cash flow forecasts, and market data are used in estimating the cost of capital. The effects of firm financing policy, corporate taxes, and potential investment options are given special consideration. Applications include capital budgeting, the evaluation of potential mergers and acquisitions, and corporate restructuring. The objective of the course is to show how to manage companies to add value.

Mod 1, offered first half of Fall semester; Mod 2, offered second half of Fall semester; and Mod 3, offered first half of Spring semester. [2]

Pre-requisite: MGT 6331

Studies solutions to fundamental problems faced by individual and institutional investors. First, we cover a number of topics in fixed income markets including the different ways of computing bond yields, forecasts of interest rates using the yield curve, and duration and convexity as measures of bond risk. Second, we solve the asset allocation problem to determine an optimal portfolio mix. We review the relevant theory, use an advanced spreadsheet to find an answer, and discuss issues faced by portfolio managers. Third, we use two methods to value options, the Black-Scholes formula and the binomial tree, and show how investors can use options to customize their risk-reward profile. This course is equivalent to MGT 6404 so it is not available for MSF students.

Mod 3, offered first half of Spring semester. Mod 1, offered in first half of Fall semester. [2] 

Pre-requisite: MGT 6331

Every company, regardless of size or industry, relies upon energy and creates carbon emissions in a variety of ways. Increasing demand for energy from emerging economies, coupled with the concern over climate change, is rapidly changing the nature of energy supply and demand. Companies are increasingly turning to energy conservation, energy efficiency, and renewable energy both to create new business opportunities and to reduce the risks of increasing costs or disruptions in supply. This course will focus on this critical sector of the economy and examine how leading businesses are acting to address this topic.

Mod 2, offered second half of Fall semester. [2] 

Social Enterprise & Entrepreneurship will explore the spectrum of activity in the growing social enterprise arena, where business models and entrepreneurial approaches are increasingly being used to directly address social and environmental issues. Topics addressed will explore nonprofit, hybrid, and for-profit social enterprise models, and the intersection of social entrepreneurship with capital formation issues, international development, technology & innovation, global health, cross-sector models, and microfinance as a case study in social enterprise & innovation. Course content will include a combination of instructor lecture, readings on focus areas, guest speakers representing the leading social entrepreneurs and social enterprises in the field, and a group project that will be integrated with the other course curriculum.

Mod 4, offered second half of Spring semester. [2]

This course builds upon the business process innovation concepts introduced in the introductory operations management course and examines material, information, and cash flows between firms within a supply chain. Topics include supply chain strategy, demand forecasting and inventory management methods for short and long lifecycle products, supply chain collaboration and coordination, and operational methods for managing supply chain risk.

Mod 4, offered second half of Spring semester. [2] 

Prerequisite: MGT 6371.

In the U.S., the health care sector accounts for 17% of gross domestic product. Facing decreasing reimbursements and ever-increasing costs, and coupled with pressure to deliver quality under pay-for-performance and bundled payment schemes, health care organizations are under unprecedented pressure to improve efficiency and quality. Consequently, health care organizations need to adopt well-proven operations management concepts to better manage their processes. In this course, we will analyze health care organizations using both qualitative and quantitative principles of operations management to address issues around patient flows, capacity and staff planning, process failure, and learning. The course is based on reading current articles, solving case studies, and hands-on data-driven exercises. The final project involves students deploying operations management concepts to propose solutions to problems currently faced by a real hospital. The course builds on the core course in operations management, and will benefit students interested in consulting, operations management, and/or health care.

Mod 1, offered first half of Fall semester. [2] 

Prerequisite: MGT 6371.

This course is intended for students who are focused on big data analysis in the Python programming language, from large-scale epidemiologic datasets, electronic medical records, or next-generation sequence data. It will cover basic programming, including strings, arrays, dictionaries, conditional statements, data visualization, external data sources, and algorithms, with a focus on using programming to solve challenges within the students’ own research projects. At the end of the course, students will have an understanding of the foundation of programming in Python. They will understand the importance and use of regular expressions and efficient data search tools and will demonstrate proficiency in algorithms and data visualization. Evaluations will be based on a midterm exam, homework, a final project, and class participation. The proposed course is not for undergraduates or professional credit.

Offered in Spring [3] 

AM and FM modulation. Also, advanced topics in signal processing are treated.

Offered in Spring [3] 

No credit for students who have earned credit for EECE 4252. 

Computer Science Electives: CS classes are not recommended for anyone without a CS background

Principles and programming techniques of artificial intelligence. Strategies for searching, representation of knowledge and automatic deduction, learning, and adaptive systems. Survey of applications.

Offered in Fall [3] 

No credit for students who have earned credit for CS 4260. 

The nature of software. The object-oriented paradigm. Software life-cycle models. Requirements, specification, design, implementation, documentation, and testing of software. Object-oriented analysis and design. Software maintenance. 

Offered in Fall [3] 

No credit for students who have earned credit for CS 4278. 

Core concepts necessary to architect, build, test, and deploy complex web-based systems; analysis of key domain requirements in security, robustness, performance, and scalability. 

Offered in Fall [3] 

No credit for students who have earned credit for CS 4288. 

Project-based course building on core concepts necessary to architect, build, test, and deploy complex web-based systems. Students form teams, propose project ideas, architect their solutions, and build the initial minimum viable project for their application. In-class discussions focus on advanced topics in web-development.

Offered in Spring [3] 

No credit for students who have earned credit for CS 4289. 

Principles and practices of big data processing and analytics. Data storage, databases, and data modeling techniques; data processing and querying; data analytics and applications of machine learning using these systems.

Offered in Spring [3] 

Pre-requisite: CS 3251

Set manipulation techniques, divide-and-conquer methods, the greedy method, dynamic programming, algorithms on graphs, backtracking, branch-and-bound, lower bound theory, NP-hard and NP-complete problems, approximation algorithms.

Offered in Spring [3] 

Pre-requisite: CS 3250

Algorithms for dealing with special classes of graphs. Particular emphasis is given to subclasses of perfect graphs and graphs that can be stored in a small amount of space. Interval, chordal, permutation, comparability, and circular-arc graphs; graph decomposition.

[3] 

Pre-requisite: CS 6310 or Math 4710.

Design and analysis of parallel algorithms for sorting, searching, matrix processing, FFT, optimization, and other problems. Existing and proposed parallel architectures, including SIMD machines, MIMD machines, and VLSI systolic arrays.

[3] 

Pre-requisite: CS 6310 

Theoretical and algorithmic foundations of supervised learning, unsupervised learning, and reinforcement learning. Linear and nonlinear regression, kernel methods, support vector machines, neural networks and deep learning methods, instance-based methods, ensemble classifiers, clustering and dimensionality reduction, value and policy iteration. Explainable AI, ethics, and data privacy.

Offered in Spring [3] 

Discussion of state-of-the-art and current research issues in heuristic search, knowledge representation, deduction, and reasoning. Related application areas include: planning systems, qualitative reasoning, cognitive models of human memory, user modeling in ICAI, reasoning with uncertainty, knowledge-based system design, and language comprehension.

[3] 

Pre-requisite: CS 4260 or equivalent 

Theory and algorithms for designing systems that learn from data, including modern machine learning methods that take advantage of increased complexity to provide improved performance. Data types, data pre-processing, measures of similarity and dissimilarity. Supervised learning: decision trees, logistic regression, support vector machines, Bayesian methods, and neural networks; unsupervised learning: partitional, hierarchical, density-based, and graph clustering algorithms. Feature selection for classification and clustering. Evaluation methods. Reinforcement learning: Markov decision processes, dynamic programming, Monte Carlo methods, TD-learning.

Offered in Fall [3] 

Pre-requisite: CS 4262 or 5262 or 6360.

*course descriptions and numbers are tentative and subject to change*