MDS Vancouver

UBC’s Vancouver campus Master of Data Science program covers all stages of the value chain, with an emphasis on the skills required to apply meaning to data. Over 10 months, you will learn how to extract data for use in experiments, how to apply state-of-the-art techniques in data analysis, and how to present your findings effectively to domain experts.

Program Benefits

Highlights Across All MDS Programs:

10-month, full-time, accelerated program offers a short-term commitment for long-term gain
Condensed one-credit courses allow for in-depth focus on a limited set of topics at one time
Capstone project gives students an opportunity to apply their skills
Real-world data sets are integrated in all courses to provide practical experience across a range of domains

Highlights Specific To Vancouver Campus Option:

Curriculum designed jointly by computer science and statistics experts, with input from local industry
A coordinated approach blending computer science and statistics education in order to give students a broader skill set
Courses are taught by a core team of faculty dedicated to teaching MDS full-time and providing support to students during the program
A cosmopolitan city, a sprawling campus, and a cohort of up to 100 students offer an engaging, culturally enriched university experience
Strong connections with industry partners in public and private sectors, start-ups, and leading tech companies offer a wide range of networking/career opportunities

Curriculum

The program structure includes 24 one-credit courses offered in four-week segments. Courses are lab-oriented and delivered in-person with some blended online content.

After the six segments, students complete an eight-week, six-credit capstone project, applying their newly acquired knowledge while working alongside other students on real-life data sets.

Fall: September - December

Block 1 (4 weeks, 4 credits)

Programming for Data Science | DSCI 511

Program design and data manipulation with Python. Overview of data structures, iteration, flow control, and program design relevant to data exploration and analysis. When and how to exploit pre-existing libraries.

Instructor(s):

Gittu George (Section 1), Elham Khoda (Section 1), Prajeet Bajpai (Section 2)

Computing Platforms for Data Science | DSCI 521

How to install, maintain, and use the data scientific software “stack”. The Unix shell, version control, and problem solving strategies. Literate programming documents.

Instructor(s):

Ilya Musabirov (Section 1), Daniel Chen (Section 2)

Programming for Data Manipulation | DSCI 523

Program design and data manipulation with R. Organizing, filtering, sorting, grouping, reformatting, converting, and cleaning data to prepare it for further analysis.

Instructor(s):

Payman Nickchi (Section 1), Gittu George (Section 2)

Descriptive Statistics and Probability for Data Science | DSCI 551

Fundamental concepts in probability including conditional, joint, and marginal distributions. Statistical view of data coming from a probability distribution.

Instructor(s):

Andy Tai (Section 1), Alexi Rodríguez-Arelis (Section 2)

Block 2 (4 weeks, 4 credits)

Algorithms and Data Structures | DSCI 512

How to choose and use appropriate algorithms and data structures to help solve data science problems. Key concepts such as recursion and algorithmic complexity (e.g., efficiency, scalability).

Instructor(s):

Elham Khoda (Section 1), Hedayat Zarkoob (Section 2)

Data Visualization I | DSCI 531

Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python.

Instructor(s):

Payman Nickchi (Section 1), Andy Tai (Section 2)

Statistical Inference and Computation I | DSCI 552

The statistical and probabilistic foundations of inference, developed jointly through mathematical derivations and simulation techniques. Important distributions and large sample results. Methods for dealing with the multiple testing problem. The frequentist paradigm.

Instructor(s):

Katie Burak (Section 1), Rodolfo Lourenzutti (Section 2)

Supervised Learning I | DSCI 571

Introduction to supervised machine learning. Basic machine learning concepts such as generalization error and overfitting. Various approaches such as K-NN, decision trees, linear classifiers.

Instructor(s):

Prajeet Bajpai (Section 1), Varada Kolhatkar (Section 2)

Block 3 (4 weeks, 4 credits)

Databases and Data Retrieval | DSCI 513

Learn how to work with data stored in relational and NoSQL database systems. Learn SQL through Data Definition Language (DDL) to create schemas, define keys, and enforce integrity constraints, and Data Manipulation Language (DML) to query, join, and aggregate data. Get introduced to various NoSQL databases, focusing on document databases, specifically MongoDB, and querying with the MongoDB Query Language (MQL).

Instructor(s):

Hedayat Zarkoob (Section 1), Gittu George (Section 2)

Data Science Workflows | DSCI 522

Interactive vs. scripted/unattended analyses and how to move fluidly between them. Reproducibility through automation and containerization.

Instructor(s):

Sky Sheng (Section 1), Daniel Chen (Section 2)

Regression I | DSCI 561

Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction.

Instructor(s):

Katie Burak (Section 1), Rodolfo Lourenzutti (Section 2)

Feature and Model Selection | DSCI 573

How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization.

Instructor(s):

Elham Khoda (Section 1), Prajeet Bajpai (Section 2)

Winter: January - April

Block 4 (4 weeks, 4 credits)

Collaborative Software Development | DSCI 524

How to exploit practices from collaborative software development techniques in data scientific workflows. Appropriate use of abstraction, the software life cycle, unit testing / continuous integration, and packaging for use by others.

Instructor(s):

Daniel Chen (Section 1), Ilya Musabirov (Section 2)

Communication and Argumentation | DSCI 542

How to interpret and present data science findings to a variety of audiences. Written and spoken presentation skills.

Instructor(s):

Katie Burak (Section 1), Andy Tai (Section 2)

Regression II | DSCI 562

Useful extensions to basic regression, e.g., generalized linear models, mixed effects, smoothing, robust regression, and techniques for dealing with missing data.

Instructor(s):

Payman Nickchi (Section 1), Alexi Rodríguez-Arelis (Section 2)

Supervised Learning II | DSCI 572

Introduction to numerical optimization (e.g., gradient descent). Neural networks and deep learning.

Instructor(s):

Prajeet Bajpai (Section 1), Varada Kolhatkar (Section 2)

Block 5 (4 weeks + 1 week break, 4 credits)

Data Visualization II | DSCI 532

How to make principled and effective choices with respect to marks, spatial arrangement, and colour. Analysis, design, and implementation of interactive figures. How to provide multiple views, deal with complexity, and make difficult decisions about data reduction.

Instructor(s):

Ilya Musabirov (Section 1), Daniel Chen (Section 2)

Statistical Inference and Computation II | DSCI 553

Bayesian reasoning for data science. How to formulate and implement inference using the prior-to-posterior paradigm.

Instructor(s):

Payman Nickchi (Section 1), Alexi Rodríguez-Arelis (Section 2)

Unsupervised Learning | DSCI 563

How to find groups and other structure in unlabeled, possibly high dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, model fitting via the EM algorithm.

Instructor(s):

Hedayat Zarkoob (Section 1), Varada Kolhatkar (Section 2)

Spatial and Temporal Models | DSCI 574

Model fitting and prediction in the presence of correlation due to temporal and/or spatial association. ARIMA models.

Instructor(s):

Katie Burak (Section 1), Prajeet Bajpai (Section 2)

Block 6 (4 weeks, 4 credits)

Web and Cloud Computing | DSCI 525

How to use the web as a platform for data collection, computation, and publishing. Accessing data via scraping and APIs. Using the cloud for tasks that are beyond the capability of your local computing resources.

Instructor(s):

Gittu George (Section 1), Ilya Musabirov (Section 2)

Privacy, Ethics, and Security | DSCI 541

The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.

Instructor(s):

Hedayat Zarkoob (Sections 1 and 2)

Experimentation and Causal Inference | DSCI 554

Statistical evidence from randomized experiments versus observational studies. Applications of randomization, e.g., A/B testing for website optimization. Methods for dealing with the multiple testing problem.

Instructor(s):

Payman Nickchi (Section 1), Alexi Rodríguez-Arelis (Section 2)

Advanced Machine Learning | DSCI 575

Advanced machine learning methods, with an undercurrent of natural language processing (NLP) applications. Bag of words, recommender systems, topic models, natural language as sequence data, Markov chains, and RNNs for text synthesis. An introduction to popular NLP libraries in Python.

Instructor(s):

Elham Khoda (Section 1), Varada Kolhatkar (Section 2)

Spring: May - June

Capstone Project (8-10 weeks, 6 credits)

Capstone Project | DSCI 591

A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions and design and execute a suitable analysis plan. The group will work collaboratively to produce a reproducible analysis pipeline, project report, presentation and possibly other products, such as a dashboard.

Instructor(s):

MDS Staff

Meet Tarini

The technical competency that Tarini developed during the MDS program helped her career progression. At MDS, she learned which questions are the right ones to ask and, most importantly, how to communicate her findings to a more general audience.
