High-Dimensional Data Analysis

Course name: High-Dimensional Data Analysis
Course number: DS-06
Area of instruction: Computer Science and Engineering

Administrative details

Faculty: Computer Science and Engineering
Year of instruction: 1st year of MSc
Semester of instruction: 1st semester
No. of Credits: 5 ECTS
Total workload on average: 180 hours overall
Frontal lecture hours: 2 hours per week.
Frontal tutorial hours: 0 hours per week.
Lab hours: 2 hours per week.
Individual lab hours: 2 hours per week.
Frequency: weekly throughout the semester.
Grading mode: letters: A, B, C, D.

Course outline

This course provides knowledge of data analysis and interpretation. It begins with the mathematical definition of distance and uses it to motivate the singular value decomposition (SVD) for dimension reduction and multi-dimensional scaling, along with its connection to principal component analysis. It also covers principal component analysis and factor analysis and demonstrates how these concepts are applied to data visualization and the analysis of high-throughput experimental data. In addition, the course gives a brief introduction to machine learning and applies it to high-throughput data. It presents the general idea behind clustering analysis, describes K-means and hierarchical clustering, demonstrates how they are used, and describes prediction algorithms such as k-nearest neighbors, along with the concepts of training sets, test sets, error rates and cross-validation. Students are required to participate in the laboratory practicum and solve practical tasks using hardware and a Python environment.
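As an illustration of the connection between the SVD and principal component analysis mentioned in the outline, here is a minimal NumPy sketch. It is not taken from the course materials: the function name `pca_via_svd`, the toy data and the choice of two components are assumptions made for illustration only.

```python
import numpy as np

def pca_via_svd(X, k):
    """Project the rows of X onto the top-k principal components via the SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                    # sample coordinates in the reduced space
    explained_var = s**2 / (X.shape[0] - 1)      # variances along each component
    return scores, Vt[:k], explained_var

# Toy data (hypothetical): 50 samples, 200 features, i.e. more features than samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))

scores, components, var = pca_via_svd(X, k=2)

# The squared singular values, scaled by (n - 1), should agree with the
# leading eigenvalues of the sample covariance matrix.
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
print(np.allclose(var[:10], cov_eigvals[:10]))
```

Working from the SVD of the centered data avoids forming the 200 x 200 covariance matrix explicitly, which is one reason it is a common choice when the number of features exceeds the number of samples.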

Expected learning outcomes

Apply different data analysis methods for dimension reduction and multi-dimensional scaling
Be able to select the best data analysis approach for a particular problem
Be familiar with principal component analysis and factor analysis and understand how these concepts are applied to data visualization and data analysis of high-throughput experimental data

Required background knowledge

A strong mathematical background in Calculus, Linear Algebra, Differential Equations, Statistics and Numerical Methods, as well as programming experience in Python and C/C++.

Prerequisites

Students will benefit from prior knowledge of the following topics in mathematics and programming:

CSE202 — Analytical Geometry and Linear Algebra I and CSE204 — Analytical Geometry and Linear Algebra II: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrix, stability of solution (decomposition), vector spaces, metric spaces, manifold, eigenvector and eigenvalue.
CSE206 — Probability And Statistics: probability, likelihood, probability density function, conditional probability, Bayesian rule, covariance matrix and properties.
CSE132 — Software Design with Python
Numerical Analysis: DFT, [stochastic] gradient.

Recommendations for students on how to succeed in the course

References:

Materials for self-preparation may include these videos:

3Blue1Brown playlist on Linear Algebra;
Fourier Transform, Gilbert Strang classic lectures;
This MIT course;
Basic Python-based course on maths, NumPy with the official quickstart guide.

Detailed topics covered in the course

Mathematical Distance
Dimension Reduction
Singular Value Decomposition and Principal Component Analysis
Multi-Dimensional Scaling Plots
Factor Analysis
Dealing with Batch Effects
Clustering
Heatmaps
Basic Machine Learning Concepts
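
The machine learning concepts named in the course outline (k-nearest neighbors, training sets, test sets, error rates and cross-validation) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the course's lab code: the helper names `knn_predict` and `cv_error_rate`, the synthetic two-class data and the values `k=3` and `n_folds=5` are all hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    # Euclidean distance from every test point to every training point
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]       # indices of the k closest points
    votes = y_train[nearest]                     # their labels
    # Majority vote; labels are assumed to be non-negative integers
    return np.array([np.bincount(v).argmax() for v in votes])

def cv_error_rate(X, y, k=3, n_folds=5, seed=0):
    """Estimate the misclassification rate with n_folds-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for i in range(n_folds):
        test = folds[i]                          # held-out test set for this fold
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        pred = knn_predict(X[train], y[train], X[test], k)
        errors.append(np.mean(pred != y[test]))  # error rate on the held-out fold
    return float(np.mean(errors))

# Synthetic data (hypothetical): two well-separated classes in 20 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 20)), rng.normal(3, 1, (40, 20))])
y = np.array([0] * 40 + [1] * 40)

err = cv_error_rate(X, y, k=3)
print(f"cross-validated error rate: {err:.3f}")
```

Each sample is used for testing exactly once and never appears in its own training set, which is what makes the averaged error rate an estimate of performance on unseen data.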

Textbook

T. Tony Cai, Xiaotong Shen, eds. (2011). High-dimensional data analysis. Frontiers of Statistics. Singapore: World Scientific
Christophe Giraud (2015). Introduction to High-Dimensional Statistics. Philadelphia: Chapman and Hall/CRC

Reference material

Peter Bühlmann and Sara van de Geer (2011). Statistics for high-dimensional data: methods, theory and applications. Heidelberg; New York: Springer
Slides will be provided during the course

Required computer resources

NA

Evaluation

Final Project (40%)
Assignments (40%)
Midterm Exam (20%)
