Difference between revisions of "MSc: High-Dimensional Data Analysis"

 
(16 intermediate revisions by 2 users not shown)
Line 35:
  == Prerequisites ==

- * Calculus I [https://eduwiki.innopolis.university/index.php/BSc:MathematicalAnalysisI CSE201 - Mathematical Analysis I]
+ * [https://eduwiki.innopolis.university/index.php/BSc:_Mathematical_Analysis_I CSE201 - Mathematical Analysis I]
- * Calculus II [https://eduwiki.innopolis.university/index.php/BSc:MathematicalAnalysisII CSE203 - Mathematical Analysis II]
+ * [https://eduwiki.innopolis.university/index.php/BSc:_Mathematical_Analysis_II CSE203 - Mathematical Analysis II]
- * Linear Algebra
- [https://eduwiki.innopolis.university/index.php/BSc:AnalyticGeometryAndLinearAlgebraI CSE204 - Analytic Geometry And Linear Algebra I]
- [https://eduwiki.innopolis.university/index.php/BSc:AnalyticGeometryAndLinearAlgebraII CSE204 - Analytic Geometry And Linear Algebra II]
- * Differential Equations [https://eduwiki.innopolis.university/index.php/BSc:DifferentialEquations CSE205 - Differential Equations]
+ * [https://eduwiki.innopolis.university/index.php/BSc:_Differential_Equations CSE205 - Differential Equations]
- * Numerical Methods []
+ * Numerical Methods
- * Statistics []
+ * [https://eduwiki.innopolis.university/index.php/MSc:_Advanced_Statistics CSE331 - Advanced Statistics]

  The course will benefit if students already know some topics of mathematics and programming.
  Mathematics:
- * Linear Algebra: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrix, stability of solution (decomposition), vector spaces, metric spaces, manifold, eigenvector and eigenvalue.
+ * [https://eduwiki.innopolis.university/index.php/BSc:_Analytic_Geometry_And_Linear_Algebra_I1 CSE202] — Analytical Geometry and Linear Algebra I and [https://eduwiki.innopolis.university/index.php/BSc:_Analytic_Geometry_And_Linear_Algebra_II CSE204] — Analytical Geometry and Linear Algebra II: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrix, stability of solution (decomposition), vector spaces, metric spaces, manifold, eigenvector and eigenvalue.
- * ProbStat: probability, likelihood, probability density function, conditional probability, Bayesian rule, covariance matrix and properties.
+ * [https://eduwiki.innopolis.university/index.php/BSc:_Probability_And_Statistics CSE206 — Probability And Statistics]: probability, likelihood, probability density function, conditional probability, Bayesian rule, covariance matrix and properties.
+ * CSE132 — Software Design with Python
  * Numerical Analysis: DFT, [stochastic] gradient.
- * Programming: python and numpy library.

+ == Recommendations for students on how to succeed in the course==
  References:
- * Linear Algebra
+ * [https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/ Linear Algebra]
- * Statistics for Applications
+ * [https://ocw.mit.edu/courses/18-650-statistics-for-applications-fall-2016/ Statistics for Applications]
- * Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
+ * [https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/ Matrix Methods in Data Analysis, Signal Processing, and Machine Learning]
- Materials for self-preparation may include these videos: 3blue1brown playlist on Linear Algebra. Fourier Transform, Gilbert Strang classic lectures; This MIT course; basic python-based course on maths, numpy with the official quickstart guide.
+ Materials for self-preparation may include these videos:
+ * [https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab 3blue1brown playlist on Linear Algebra].
+ * [https://www.youtube.com/watch?v=spUNpyF58BY Fourier Transform], [https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/resources/video-lectures/ Gilbert Strang classic lectures];
+ * [https://ocw.mit.edu/courses/6-042j-mathematics-for-computer-science-spring-2015/ This MIT course];
+ * [https://github.com/hsu-ai-course/mbp/tree/master/notebooks basic python-based course on maths], numpy with the official [https://numpy.org/doc/stable/user/quickstart.html quickstart guide].

  == Detailed topics covered in the course ==

Latest revision as of 10:42, 21 April 2022

High-Dimensional Data Analysis

  • Course name: High-Dimensional Data Analysis
  • Course number: DS-06
  • Area of instruction: Computer Science and Engineering

Administrative details

  • Faculty: Computer Science and Engineering
  • Year of instruction: 1st year of MSc
  • Semester of instruction: 1st semester
  • No. of Credits: 5 ECTS
  • Total workload on average: 180 hours overall
  • Frontal lecture hours: 2 hours per week.
  • Frontal tutorial hours: 0 hours per week.
  • Lab hours: 2 hours per week.
  • Individual lab hours: 2 hours per week.
  • Frequency: weekly throughout the semester.
  • Grading mode: letters: A, B, C, D.

Course outline

This course provides knowledge of data analysis and interpretation. It starts with the mathematical definition of distance and uses it to motivate the singular value decomposition (SVD) for dimension reduction and multi-dimensional scaling, including its connection to principal component analysis. It also covers principal component analysis and factor analysis and demonstrates how these concepts are applied to data visualization and to the analysis of high-throughput experimental data. Moreover, the course gives a brief introduction to machine learning and applies it to high-throughput data. It presents the general idea behind cluster analysis, describes k-means and hierarchical clustering, demonstrates how they are used, and describes prediction algorithms such as k-nearest neighbors along with the concepts of training sets, test sets, error rates and cross-validation. Students are required to participate in the laboratory practicum and solve practical tasks using hardware and a Python environment.
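As an illustration of the connection between the SVD and principal component analysis mentioned in the outline, the following is a minimal NumPy sketch, not course material. The function name `pca_via_svd` and the synthetic data are assumptions made for this example only.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project the rows of X onto the top principal components using the SVD.

    Hypothetical helper written for this illustration.
    """
    # Center each column so the SVD describes variation around the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: X_centered = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Rows of Vt are the principal directions; U * s gives the projected data.
    scores = U[:, :n_components] * s[:n_components]
    # Sample variance captured by each kept component.
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, Vt[:n_components], explained_variance


rng = np.random.default_rng(0)
# Synthetic data (an assumption): 200 samples in 50 dimensions that vary
# mostly along 2 latent directions, plus a small amount of noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

scores, components, explained_variance = pca_via_svd(X, n_components=2)
print(scores.shape)  # two coordinates per sample instead of fifty
```

In this sketch the squared singular values, divided by the number of samples minus one, coincide with the largest eigenvalues of the sample covariance matrix, which is one way to see why the SVD and PCA give the same reduced representation.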

Expected learning outcomes

  • Apply different data analysis methods for dimension reduction and multi-dimensional scaling
  • Be able to select the best data analysis approach for a particular problem
  • Be familiar with principal component analysis and factor analysis and understand how these concepts are applied to data visualization and data analysis of high-throughput experimental data

Required background knowledge

Strong mathematical background in Calculus, Linear Algebra, Differential Equations, Statistics and Numerical Methods as well as programming in Python and C/C++.

Prerequisites

Students will benefit from already knowing the following topics in mathematics and programming:

  • CSE202 — Analytical Geometry and Linear Algebra I and CSE204 — Analytical Geometry and Linear Algebra II: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrices, stability of solutions (decomposition), vector spaces, metric spaces, manifolds, eigenvectors and eigenvalues.
  • CSE206 — Probability And Statistics: probability, likelihood, probability density function, conditional probability, Bayes' rule, covariance matrix and its properties.
  • CSE132 — Software Design with Python
  • Numerical Analysis: DFT, [stochastic] gradient.

Recommendations for students on how to succeed in the course

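References:

  • Linear Algebra (https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/)
  • Statistics for Applications (https://ocw.mit.edu/courses/18-650-statistics-for-applications-fall-2016/)
  • Matrix Methods in Data Analysis, Signal Processing, and Machine Learning (https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/)

Materials for self-preparation may include these videos:

  • 3blue1brown playlist on Linear Algebra (https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
  • Fourier Transform (https://www.youtube.com/watch?v=spUNpyF58BY), Gilbert Strang classic lectures (https://ocw.mit.edu/courses/18-06-linear-algebra-spring-2010/resources/video-lectures/)
  • MIT 6.042J Mathematics for Computer Science (https://ocw.mit.edu/courses/6-042j-mathematics-for-computer-science-spring-2015/)
  • Basic Python-based course on maths (https://github.com/hsu-ai-course/mbp/tree/master/notebooks), NumPy with the official quickstart guide (https://numpy.org/doc/stable/user/quickstart.html)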

Detailed topics covered in the course

  • Mathematical Distance
  • Dimension Reduction
  • Singular Value Decomposition and Principal Component Analysis
  • Multi-Dimensional Scaling Plots
  • Factor Analysis
  • Dealing with Batch Effects
  • Clustering
  • Heatmaps
  • Basic Machine Learning Concepts
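
To make the distance and basic machine learning topics above concrete, here is a small NumPy sketch of k-nearest-neighbor classification, with its error rate estimated by cross-validation. It is an illustration under stated assumptions, not the course's reference implementation: the helper names `knn_predict` and `cross_val_error`, the choice of k = 5 with 5 folds, and the synthetic two-class data are all hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Classify each test point by majority vote among its k nearest neighbors."""
    # Euclidean distance from every test point to every training point.
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for each test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; labels are assumed to be non-negative integers.
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def cross_val_error(X, y, k, n_folds=5, seed=0):
    """Average misclassification rate over n_folds train/test splits."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        errors.append(np.mean(pred != y[test_idx]))
    return float(np.mean(errors))


rng = np.random.default_rng(1)
# Synthetic data (an assumption): two well-separated Gaussian classes in 20 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 20)),
               rng.normal(3.0, 1.0, size=(60, 20))])
y = np.array([0] * 60 + [1] * 60)

error_rate = cross_val_error(X, y, k=5)
print(f"cross-validated error rate: {error_rate:.3f}")
```

Each fold serves once as the test set while the remaining folds form the training set, so every sample is used for testing exactly once and the reported error rate is never computed on data the classifier was fitted to.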

Textbook

  • T. Tony Cai, Xiaotong Shen, eds. (2011). High-dimensional data analysis. Frontiers of Statistics. Singapore: World Scientific
  • Christophe Giraud (2015). Introduction to High-Dimensional Statistics. Philadelphia: Chapman and Hall/CRC

Reference material

  • Peter Bühlmann and Sara van de Geer (2011). Statistics for high-dimensional data: methods, theory and applications. Heidelberg; New York: Springer
  • Slides will be provided during the course

Required computer resources

NA

Evaluation

  • Final Project (40%)
  • Assignments (40%)
  • Midterm Exam (20%)