MDS Vancouver

UBC’s Vancouver campus Master of Data Science program covers all stages of the value chain, with an emphasis on the skills required to apply meaning to data. Over 10 months, you will learn how to extract data for use in experiments, how to apply state-of-the-art techniques in data analysis, and how to present your findings effectively to domain experts.

Program Benefits

Highlights Across All MDS Programs:

  • 10-month, full-time, accelerated program offers a short-term commitment for long-term gain
  • Condensed one-credit courses allow for in-depth focus on a limited set of topics at one time
  • Capstone project gives students an opportunity to apply their skills
  • Real-world data sets are integrated in all courses to provide practical experience across a range of domains

Highlights Specific To Vancouver Campus Option:

  • Curriculum designed jointly by computer science and statistics experts with input from local industry
  • A coordinated approach blending computer science and statistics education in order to give students a broader skill set
  • Courses are taught by a core team of faculty dedicated to teaching MDS full-time and providing support to students during the program
  • A cosmopolitan city, a sprawling campus, and a cohort of up to 100 students offer an engaging, culturally enriched university experience
  • Strong connections with industry partners in public and private sectors, start-ups, and leading tech companies offer a wide range of networking/career opportunities

Curriculum

The program structure includes 24 one-credit courses offered in four-week segments. Courses are lab-oriented and delivered in-person with some blended online content.

After the six segments, students complete an eight-week, six-credit capstone project, applying their newly acquired knowledge while working alongside other students on real-life data sets.

Fall: September - December

Block 1 (4 weeks, 4 credits)

Programming for Data Science | DSCI 511

Practical introduction to Python programming with a focus on data science. Students will learn how to work with different data types (strings, datetimes, etc.) and how to group, reshape and manipulate data. In addition, students will strengthen their programming foundations by exploring control flow, functions, unit testing, and object-oriented programming. This course emphasizes the background necessary for future Python-based coursework at MDS.
Gittu George (Section 1), Elham Khoda (Section 1), Prajeet Bajpai (Section 2)

Computing Platforms for Data Science | DSCI 521

Essential computing platforms and tools that underpin effective data science work, including Unix-based operating systems, shells, IDEs, and the broader R and Python scientific software stack. The course emphasizes hands-on skills in installing, configuring, and troubleshooting these environments, navigating projects and file systems, and managing reproducible workflows through tools like version control (Git/GitHub), Jupyter, R Markdown, and literate programming environments. By the end of the course, students can confidently customize and maintain their development platforms, integrate and run R and Python libraries, apply literate programming practices, and work through common development challenges with robust problem-solving strategies. You will also leave with the beginnings of a personal website to start your data science journey.
Ilya Musabirov (Section 1), Daniel Chen (Section 2)

Programming for Data Manipulation | DSCI 523

Program design and data manipulation with R. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis.
Payman Nickchi (Section 1), Gittu George (Section 2)

Descriptive Statistics and Probability for Data Science | DSCI 551

Introduces students to the foundational probabilistic principles of statistical reasoning. The course develops core skills in probability and distributional thinking, covering discrete and continuous distributions, conditional and joint probabilities, independence, and the logic of frequentist statistical estimation through maximum likelihood. Students also gain practical experience with Monte Carlo simulation as a tool for understanding probabilistic systems and approximating distributions. Emphasis is placed on interpreting and applying these concepts in data science contexts, ensuring students are equipped to build intuition for uncertainty, variability, and model-based reasoning. By the end of the course, students establish the mathematical and computational foundation required for subsequent courses in statistical inference, regression, and causal modelling.
Andy Tai (Section 1), Alexi Rodríguez-Arelis (Section 2)

Block 2 (4 weeks, 4 credits)

Algorithms and Data Structures | DSCI 512

This course will sharpen your algorithmic thinking. You will learn how to analyze problems, choose the right data structures, and design efficient algorithms to solve real-world data science problems.
Elham Khoda (Section 1), Hedayat Zarkoob (Section 2)

Data Visualization I | DSCI 531

How to (and how not to) visualize data. Graphical grammars via ggplot2 in R and Altair in Python. Creating effective data visualizations for exploratory data analysis and for communicating insights to others.
Payman Nickchi (Section 1), Andy Tai (Section 2)

Statistical Inference and Computation I | DSCI 552

Statistical and probabilistic foundations of inference, focusing on the frequentist paradigm. Topics include estimation (point and interval), as well as hypothesis testing from both simulation-based and theoretical approaches. The course explores the connections and distinctions between key statistical concepts, emphasizing both theory and computational implementation.
Katie Burak (Section 1), Rodolfo Lourenzutti (Section 2)

Supervised Learning I | DSCI 571

Fundamental concepts and techniques of supervised machine learning, including data splitting, cross-validation, generalization, overfitting, the bias–variance trade-off, the golden rule, and data preprocessing. You will also learn popular machine learning algorithms such as decision trees, k-nearest neighbours, SVMs, naive Bayes, and linear models using the scikit-learn framework.
Prajeet Bajpai (Section 1), Varada Kolhatkar (Section 2)

Block 3 (4 weeks, 4 credits)

Databases and Data Retrieval | DSCI 513

Learn how to work with data stored in relational and NoSQL database systems. Learn SQL through Data Definition Language (DDL) to create schemas, define keys, and enforce integrity constraints, and Data Manipulation Language (DML) to query, join, and aggregate data. Get introduced to various NoSQL databases, focusing on document databases, specifically MongoDB, and querying with the MongoDB Query Language (MQL).
Hedayat Zarkoob (Section 1), Gittu George (Section 2)

Data Science Workflows | DSCI 522

Covers the full lifecycle of data analysis, integrating interactive and scripted methods to ensure reproducibility, clarity, and collaboration. Through hands-on practice, students will fluidly transition between REPL-driven exploration (e.g., RStudio, IPython) and automated scripting, producing dynamic, literate documents that blend narrative, code, data, results, and visuals via tools like Quarto, R Markdown, and Jupyter Notebooks. The course emphasizes effective project structure, including naming conventions and path and dependency management; function writing; and version control systems (e.g., Git with GitHub) to track and share work. To streamline and scale workflows, students will also learn how to use Make to automatically build reliable, reproducible data science pipelines.
Sky Sheng (Section 1), Daniel Chen (Section 2)

Regression I | DSCI 561

Linear models for predicting a quantitative response variable using multiple categorical and/or quantitative predictors. You will learn model assessment, prediction, variable selection techniques and logistic regression for classification tasks.
Katie Burak (Section 1), Rodolfo Lourenzutti (Section 2)

Feature and Model Selection | DSCI 573

Model analysis and improvement via evaluation metrics, loss functions, feature engineering, feature selection, regularization, ensemble techniques, and explainable ML strategies.
Elham Khoda (Section 1), Prajeet Bajpai (Section 2)

Winter: January - April

Block 4 (4 weeks, 4 credits)

Collaborative Software Development | DSCI 524

Advanced practices and tooling foundational to professional, trustworthy, and scalable data science software. Students create packages in both R and Python using collaborative Git and GitHub workflows (e.g., branching with git flow). The course covers building and testing robust code via unit testing (e.g., testthat, pytest), ensuring code quality with code coverage metrics, and implementing continuous integration and continuous deployment (CI/CD) workflows (including GitHub Actions with matrix configurations). Students will also learn to package and document their software, set up reproducible environments using conda, renv, and Docker, and automate deployment pipelines for data science.
Daniel Chen (Section 1), Ilya Musabirov (Section 2)

Communication and Argumentation | DSCI 542

Essential communication skills for data scientists, recognizing that professionals spend significant time presenting findings, explaining concepts, writing documentation, and collaborating with colleagues. Students master universal communication principles through focused practice in articulating and writing about technical concepts and analyses, learning to translate complex analytical findings into clear, compelling narratives that resonate with diverse stakeholders across various fields. Through hands-on practice with storytelling techniques, you'll create presentations and reports that drive decision-making, ensuring your technical expertise translates into organizational impact while maintaining ethical standards and transparent communication of limitations and uncertainties.
Katie Burak (Section 1), Andy Tai (Section 2)

Regression II | DSCI 562

Builds on the regression methods you learned in DSCI 561 to cover advanced modeling techniques used in data science. Students will learn to extend linear models to generalized linear models (GLMs) for count, categorical, and ordinal outcomes, apply mixed-effects models for grouped data, and analyze time-to-event data using survival analysis. Additional topics include local and quantile regression methods, as well as strategies for handling missing data through multiple imputation.
Payman Nickchi (Section 1), Alexi Rodríguez-Arelis (Section 2)

Supervised Learning II | DSCI 572

Dive into deep learning with Python and PyTorch, covering optimization, the fundamentals of neural networks, and convolutional neural networks. You will also explore advanced topics such as generative adversarial networks.
Prajeet Bajpai (Section 1), Varada Kolhatkar (Section 2)

Block 5 (4 weeks + 1 week break, 4 credits)

Data Visualization II | DSCI 532

Project course where each student team iteratively develops and deploys a dashboard for interactive data visualization, exploration, and communication based on a self-selected target audience and dataset.
Ilya Musabirov (Section 1), Daniel Chen (Section 2)

Statistical Inference and Computation II | DSCI 553

Introduces you to the Bayesian paradigm in statistics. Up to this point, you have primarily focused on conducting statistical analyses within the frequentist framework. This course offers an opportunity to learn how to use Bayesian reasoning in data modeling, apply Bayesian statistics to regression models, and compare and contrast Bayesian and frequentist methods while evaluating their relative strengths. You will also learn the basics of Markov Chain Monte Carlo (MCMC) for practical model estimation and inference.
Payman Nickchi (Section 1), Alexi Rodríguez-Arelis (Section 2)

Unsupervised Learning | DSCI 563

Uncovering underlying structure in data. You will learn clustering techniques, data representation methods such as dimensionality reduction and word embeddings, and explore applications including topic modeling and recommendation systems.
Hedayat Zarkoob (Section 1), Varada Kolhatkar (Section 2)

Spatial and Temporal Models | DSCI 574

Model fitting and prediction when data exhibit spatial and/or temporal dependence. Topics include ARIMA models, outlier and anomaly detection, and deep learning approaches for temporal data. You will learn how to account for correlation structures to improve forecasting and spatial modelling.
Katie Burak (Section 1), Prajeet Bajpai (Section 2)

Block 6 (4 weeks, 4 credits)

Web and Cloud Computing | DSCI 525

Go beyond the limits of your laptop and learn to tackle large-scale data science tasks. You will build and automate scalable data pipelines and access cloud resources via Web APIs, leverage cloud platforms for distributed machine learning with Spark, and deploy your models into production environments.
Gittu George (Section 1), Ilya Musabirov (Section 2)

Privacy, Ethics, and Security | DSCI 541

Focuses on the ethical considerations of data science. As future data scientists, you will encounter situations that require you to make decisions with potentially significant impacts on yourself and many others. The course invites you to reflect on these impacts through topics such as misinformation and disinformation, privacy, algorithmic bias, fairness, and more.
Hedayat Zarkoob (Sections 1 and 2)

Experimentation and Causal Inference | DSCI 554

Introduces statistical methods for design of experiments and making causal inferences from both experimental and observational data. Topics include multiple comparisons, confounding, randomization, blocking, and power analysis, with a focus on practical applications such as A/B testing. Students will also explore strategies for analyzing observational data, including stratification, regression modeling, sampling schemes, and matched case-control designs.
Payman Nickchi (Section 1), Alexi Rodríguez-Arelis (Section 2)

Advanced Machine Learning | DSCI 575

Explore advanced machine learning methods for natural language processing (NLP) applications, including Markov chains, hidden Markov models, recurrent neural networks, self-attention and transformer architectures, and the fundamentals of large language models.
Elham Khoda (Section 1), Varada Kolhatkar (Section 2)

Spring: May - June

Capstone Project (8-10 weeks, 6 credits)

Capstone Project | DSCI 591

A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a reproducible analysis pipeline, project report, presentation and possibly other products, such as a dashboard.
MDS Staff

Meet Tarini

The technical competency that Tarini developed during the MDS program helped her career progression. At MDS, she learned which questions are the right ones to ask and, most importantly, how to communicate her findings to a more general audience.