BSc: Introduction To Machine Learning
Introduction to Machine Learning
- Course name: Introduction to Machine Learning
- Course number: R-01
Course characteristics
Key concepts of the class
- Machine learning paradigms
- Machine learning approaches and algorithms
What is the purpose of this course?
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. The purpose of this course is therefore to provide students with an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments and using them to solve real-world data science problems.
Prerequisites
Students will benefit from prior familiarity with the following topics in mathematics and programming.
Maths:
- CSE202 — Analytical Geometry and Linear Algebra I
- CSE204 — Analytical Geometry and Linear Algebra II
- CSE201 — Mathematical Analysis I
- CSE203 — Mathematical Analysis II
- CSE206 — Probability And Statistics
Programming:
- CSE117 — Data Structures and Algorithms: python, numpy, basic object-oriented concepts, memory management.
For a more concrete identification of subtopics, please see chapters 2, 3, and 4 of (1), which list and describe the mathematical subtopics essential for machine learning students. In addition, students are strongly advised to gain a basic understanding of descriptive statistics and data distributions, statistical hypothesis testing, data sampling and resampling, and experimental design techniques.
(1) Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press.
Course Objectives Based on Bloom’s Taxonomy
What should a student remember at the end of the course?
By the end of the course, the students should be able to recognize and define
- Different learning paradigms
- A wide variety of learning approaches and algorithms
- Various learning settings
- Performance metrics
- Popular machine learning software tools
What should a student be able to understand at the end of the course?
By the end of the course, the students should be able to describe and explain (with examples)
- Difference between different learning paradigms
- Difference between classification and regression
- Concepts from learning theory (bias/variance tradeoff, large margins, etc.)
- Kernel methods
- Regularization
- Ensemble Learning
- Neural networks and deep learning
What should a student be able to apply at the end of the course?
By the end of the course, the students should be able to apply
- Classification approaches to solve supervised learning problems
- Clustering approaches to solve unsupervised learning problems
- Ensemble learning to improve a model’s performance
- Regularization to improve a model’s generalization
- Deep learning algorithms to solve real-world problems
Course evaluation
| | Proposed points | |
|---|---|---|
| Labs/seminar classes | 20 | 0 |
| Interim performance assessment | 30 | 40 |
| Exams | 50 | 60 |
If necessary, please indicate freely your course’s features in terms of students’ performance assessment: None
Grades range
| Grade | Proposed range |
|---|---|
| A. Excellent | 90-100 |
| B. Good | 75-89 |
| C. Satisfactory | 60-74 |
| D. Poor | 0-59 |
If necessary, please indicate freely your course’s grading features: The semester starts with the default range shown in the table above, but the boundaries may shift slightly (usually downward) depending on how the semester progresses.
Resources and reference material
- G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer, 2013.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2011.
- Tom M. Mitchell. Machine Learning. McGraw Hill, 1997.
- Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Course Sections
The main sections of the course and the approximate hour distribution between them are as follows:
| Section | Section Title | Teaching Hours |
|---|---|---|
| 1 | Supervised Learning | 24 |
| 2 | Decision Trees and Ensemble Learning | 8 |
| 3 | Unsupervised Learning | 8 |
| 4 | Deep Learning | 12 |
Section 1
Section title:
Supervised Learning
Topics covered in this section:
- Introduction to Machine Learning
- Derivatives and Cost Function
- Data Pre-processing
- Linear Regression
- Multiple Linear Regression
- Gradient Descent
- Polynomial Regression
- Bias-variance Tradeoff
- Difference between classification and regression
- Logistic Regression
- Naive Bayes
- KNN
- Confusion Matrix
- Performance Metrics
- Regularization
- Hyperplane Based Classification
- Perceptron Learning Algorithm
- Max-Margin Classification
- Support Vector Machines
- Slack Variables
- Lagrangian Support Vector Machines
- Kernel Trick
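Several of the topics above (cost function, linear regression, gradient descent) fit in a short sketch. The following is illustrative only, using synthetic data and hypothetical names, not material from the course itself:

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Append a column of ones so the intercept is learned as an extra weight
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

def mse_cost(w, Xb, y):
    """Mean squared error cost J(w) = (1/2n) * ||Xb w - y||^2."""
    r = Xb @ w - y
    return 0.5 * np.mean(r ** 2)

# Batch gradient descent on the MSE cost
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    grad = Xb.T @ (Xb @ w - y) / len(y)  # gradient of J(w)
    w -= lr * grad

# w is now close to the true parameters [3.0, 2.0]
```

The same loop structure carries over to logistic regression by swapping the cost and gradient.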
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- Is it true that in simple linear regression, the R² statistic and the squared correlation between X and Y are identical?
- What are the two assumptions that the linear regression model makes about the error terms?
- Fit a regression model to a given data problem, and support your choice of the model.
- In a list of given tasks, choose which are regression and which are classification tasks.
- In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net?
- What is the perceptron learning algorithm? Write the mathematical form of the minimization objective of Rosenblatt’s perceptron learning algorithm for a two-dimensional case.
- What is a max-margin classifier?
- Explain the role of slack variables in SVMs.
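The perceptron update asked about above can be sketched in a few lines. This is a rough illustration on synthetic, linearly separable data (the names and the margin filter are illustrative choices, not part of the course):

```python
import numpy as np

# Toy linearly separable data: label = sign(x1 + x2). Points very close to
# the boundary are removed so the algorithm converges quickly.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
X = X[np.abs(X[:, 0] + X[:, 1]) > 0.2]
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Rosenblatt's update: on a misclassified point, w <- w + y_i x_i, b <- b + y_i
w = np.zeros(2)
b = 0.0
for _ in range(100):                    # epochs
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:      # misclassified (or on the boundary)
            w += yi * xi
            b += yi
            mistakes += 1
    if mistakes == 0:                   # converged: all points separated
        break

accuracy = np.mean(np.sign(X @ w + b) == y)
```

On separable data the loop is guaranteed to terminate; SVMs replace this mistake-driven rule with an explicit max-margin objective.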
Typical questions for seminar classes (labs) within this section
- Implement various regression models to solve different regression problems.
- Describe the difference between different types of regression models, their pros and cons, etc.
- Implement various classification models to solve different classification problems.
- Describe the difference between logistic regression and naive Bayes.
- Implement the perceptron learning algorithm, SVMs, and their variants to solve different classification problems.
- Solve a given optimization problem using the Lagrange multiplier method.
Test questions for final assessment in this section
- What does it mean for the standard least squares coefficient estimates of linear regression to be scale equivariant?
- Given a fitted regression model to a dataset, interpret its coefficients.
- Explain which regression model would be a better fit to model the relationship between response and predictor in a given data.
- If the number of training examples goes to infinity, how will it affect the bias and variance of a classification model?
- Given a two-dimensional classification problem, determine whether a linear boundary can be estimated using logistic regression with regularization.
- Explain which classification model would be a better fit for a given classification problem.
- Consider the Leave-one-out-CV error of standard two-class SVM. Argue that under a given value of slack variable, a given mathematical statement is either correct or incorrect.
- How does the choice of slack variable affect the bias-variance tradeoff in SVM?
- Explain which Kernel would be a better fit to be used in SVM for a given data.
Section 2
Section title:
Decision Trees and Ensemble Methods
Topics covered in this section:
- Decision Trees
- Bagging
- Boosting
- Random Forest
- Adaboost
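Bagging and random forests both rest on the bootstrap, which can be demonstrated in a few lines. A minimal sketch (the sample size and names are illustrative):

```python
import numpy as np

# The bootstrap: draw n points with replacement from an n-point dataset.
# On average about 1 - 1/e ≈ 63.2% of the original points appear in each
# bootstrap sample; the rest are "out-of-bag" and can be used for validation.
rng = np.random.default_rng(0)
n = 10_000
idx = rng.integers(0, n, size=n)        # one bootstrap sample (indices)

unique_frac = len(np.unique(idx)) / n   # fraction of points drawn at least once
oob_frac = 1.0 - unique_frac            # out-of-bag fraction, ≈ 1/e ≈ 0.368
```

Bagging fits one tree per bootstrap sample and averages (or majority-votes) their predictions, which reduces variance relative to a single deep tree.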
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- What are pros and cons of decision trees over other classification models?
- Explain how tree-pruning works.
- What is the purpose of ensemble learning?
- What is a bootstrap, and what is its role in Ensemble learning?
Typical questions for seminar classes (labs) within this section
- Implement different variants of decision trees to solve different classification problems.
- Solve a given classification problem using an ensemble classifier.
- Implement Adaboost for a given problem.
Test questions for final assessment in this section
- When a decision tree is grown to full depth, how does it affect the tree’s bias and variance, and its response to noisy data?
- Argue if an ensemble model would be a better choice for a given classification problem or not.
- Given a particular iteration of boosting and other important information, calculate the weights of the Adaboost classifier.
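The AdaBoost weight calculation asked about above follows one standard formula; a small sketch with illustrative error values:

```python
import math

# AdaBoost assigns each weak learner a weight based on its weighted error:
#   alpha = 0.5 * ln((1 - err) / err)
# (one common formulation; the numbers below are illustrative)
def adaboost_alpha(err):
    return 0.5 * math.log((1.0 - err) / err)

# A learner barely better than chance gets a small weight...
a_weak = adaboost_alpha(0.45)
# ...while a strong learner gets a much larger one.
a_strong = adaboost_alpha(0.10)
```

Note that a learner with error exactly 0.5 (pure chance on the weighted sample) receives weight zero, and learners worse than chance receive negative weight.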
Section 3
Section title:
Unsupervised Learning
Topics covered in this section:
- K-means Clustering
- K-means++
- Hierarchical Clustering
- DBSCAN
- Mean-shift
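Lloyd's algorithm for K-means (the first topic above) alternates an assignment step and an update step. A minimal sketch on two synthetic blobs; the deterministic initialization from two given points is an illustrative choice (k-means++ would instead pick well-spread starting centers):

```python
import numpy as np

# Two well-separated synthetic clusters (illustrative data)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(-3, 0), scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=(3, 0), scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

def kmeans(X, k, init, iters=50):
    centers = init.astype(float).copy()
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

# Initialize with one point from each blob (first and last row)
centers, labels = kmeans(X, k=2, init=X[[0, -1]])
```

This loop implicitly minimizes the within-cluster sum of squared distances, which is exactly the objective asked about in the final-assessment questions below.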
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- Which implicit or explicit objective function does K-means implement?
- Explain the difference between k-means and k-means++.
- What is single-linkage and what are its pros and cons?
- Explain how DBSCAN works.
Typical questions for seminar classes (labs) within this section
- Implement different clustering algorithms to solve different clustering problems.
- Implement mean-shift for video tracking.
Test questions for final assessment in this section
- K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds? Which fitness function does it implicitly minimize?
- Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method?
- What are the characteristics of noise points in DBSCAN?
Section 4
Section title:
Deep Learning
Topics covered in this section:
- Artificial Neural Networks
- Back-propagation
- Convolutional Neural Networks
- Autoencoder
- Variational Autoencoder
- Generative Adversarial Networks
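Back-propagation, listed above, is just the chain rule applied layer by layer, and its correctness can be checked numerically. A sketch on a one-hidden-layer network with illustrative shapes and names (a standard gradient check, not course code):

```python
import numpy as np

# Tiny regression network: 3 inputs -> 4 tanh units -> 1 output
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features
y = rng.normal(size=(8, 1))          # regression targets

W1 = rng.normal(size=(3, 4)) * 0.5
W2 = rng.normal(size=(4, 1)) * 0.5

def forward(W1, W2):
    h = np.tanh(X @ W1)              # hidden activations
    out = h @ W2                     # network output
    loss = 0.5 * np.mean((out - y) ** 2)
    return h, out, loss

# Backward pass: chain rule applied by hand
h, out, loss = forward(W1, W2)
d_out = (out - y) / len(X)           # dL/d_out
gW2 = h.T @ d_out                    # dL/dW2
d_h = d_out @ W2.T * (1 - h ** 2)    # back through tanh (tanh' = 1 - tanh^2)
gW1 = X.T @ d_h                      # dL/dW1

# Numerical gradient for one entry of W1, to verify the analytic gradient
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
num = (forward(W1p, W2)[2] - forward(W1m, W2)[2]) / (2 * eps)
```

If the analytic and numerical gradients disagree, the backward pass has a bug; this check is a standard debugging step before training any network.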
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- What is a fully connected feed-forward ANN?
- Explain different hyperparameters of CNNs.
- Calculate KL-divergence between two probability distributions.
- What is a generative model and how is it different from a discriminative model?
Typical questions for seminar classes (labs) within this section
- Implement different types of ANNs to solve different classification problems.
- Calculate KL-divergence between two probability distributions.
- Implement different generative models for different problems.
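The KL-divergence calculation that recurs in this section's questions is a one-liner for discrete distributions. A sketch with illustrative example distributions:

```python
import numpy as np

# KL divergence between discrete distributions p and q (natural log):
#   KL(p || q) = sum_i p_i * log(p_i / q_i)
def kl_divergence(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)           # generally != kl_pq: KL is asymmetric
```

KL(p || p) is exactly zero, and the asymmetry is why the VAE objective fixes a particular direction (the approximate posterior against the prior).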
Test questions for final assessment in this section
- Explain what ReLU is, what its different variants are, and what their pros and cons are.
- Calculate the number of parameters to be learned during training in a CNN, given all important information.
- Explain how a VAE can be used as a generative model.
Exams and retake planning
Exam
Exams will be paper-based and will be conducted in the form of problem solving, where the problems will be similar to those mentioned above and will be based on the contents taught in lecture slides, lecture discussions (including whiteboard materials), lab materials, reading materials (including the textbooks), etc. Students will be given 1-3 hours to complete the exam.
Retake 1
The first retake will be conducted in the same form as the final exam. The passing threshold of the retake exam will be 5% higher than the passing threshold of the course.
Retake 2
The second retake will be conducted in the same form as the final exam. The passing threshold of the retake exam will be 5% higher than the passing threshold of the course.