Difference between revisions of "BSc: Introduction To Machine Learning"
R.sirgalina (talk | contribs) |
|||
(6 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | |||
= Introduction to Machine Learning = |
= Introduction to Machine Learning = |
||
+ | * '''Course name''': Introduction to Machine Learning |
||
+ | * '''Code discipline''': R-01 |
||
+ | * '''Subject area''': |
||
+ | == Short Description == |
||
− | * <span>'''Course name:'''</span> Introduction to Machine Learning |
||
+ | This course covers the following concepts: Machine learning paradigms; Machine Learning approaches, and algorithms. |
||
− | * <span>'''Course number:'''</span> R-01 |
||
− | == |
+ | == Prerequisites == |
− | === |
+ | === Prerequisite subjects === |
+ | * CSE202 — Analytical Geometry and Linear Algebra I |
||
+ | * CSE204 — Analytical Geometry and Linear Algebra II |
||
+ | * CSE201 — Mathematical Analysis I |
||
+ | * CSE203 — Mathematical Analysis II |
||
+ | * CSE206 — Probability And Statistics |
||
+ | * CSE117 — Data Structures and Algorithms: Python, NumPy, basic object-oriented concepts, memory management. |
||
+ | === Prerequisite topics === |
||
− | * Machine learning paradigms |
||
− | * Machine Learning approaches, and algorithms |
||
− | === What is the purpose of this course? === |
||
+ | == Course Topics == |
||
+ | {| class="wikitable" |
||
+ | |+ Course Sections and Topics |
||
+ | |- |
||
+ | ! Section !! Topics within the section |
||
+ | |- |
||
+ | | Supervised Learning || |
||
+ | # Introduction to Machine Learning |
||
+ | # Derivatives and Cost Function |
||
+ | # Data Pre-processing |
||
+ | # Linear Regression |
||
+ | # Multiple Linear Regression |
||
+ | # Gradient Descent |
||
+ | # Polynomial Regression |
||
+ | # Bias-variance Tradeoff |
||
+ | # Difference between classification and regression |
||
+ | # Logistic Regression |
||
+ | # Naive Bayes |
||
+ | # KNN |
||
+ | # Confusion Matrix |
||
+ | # Performance Metrics |
||
+ | # Regularization |
||
+ | # Hyperplane Based Classification |
||
+ | # Perceptron Learning Algorithm |
||
+ | # Max-Margin Classification |
||
+ | # Support Vector Machines |
||
+ | # Slack Variables |
||
+ | # Lagrangian Support Vector Machines |
||
+ | # Kernel Trick |
||
+ | |- |
||
+ | | Decision Trees and Ensemble Methods || |
||
+ | # Decision Trees |
||
+ | # Bagging |
||
+ | # Boosting |
||
+ | # Random Forest |
||
+ | # Adaboost |
||
+ | |- |
||
+ | | Unsupervised Learning || |
||
+ | # K-means Clustering |
||
+ | # K-means++ |
||
+ | # Hierarchical Clustering |
||
+ | # DBSCAN |
||
+ | # Mean-shift |
||
+ | |- |
||
+ | | Deep Learning || |
||
+ | # Artificial Neural Networks |
||
+ | # Back-propagation |
||
+ | # Convolutional Neural Networks |
||
+ | # Autoencoder |
||
+ | # Variational Autoencoder |
||
+ | # Generative Adversarial Networks |
||
+ | |} |
||
+ | == Intended Learning Outcomes (ILOs) == |
||
+ | |||
+ | === What is the main purpose of this course? === |
||
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. Therefore, the purpose of this course is to provide students with an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments, and using them to solve real-world data science problems. |
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. Therefore, the purpose of this course is to provide students with an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments, and using them to solve real-world data science problems. |
||
+ | === ILOs defined at three levels === |
||
− | == Prerequisites == |
||
− | The course will benefit if students already know some topics of mathematics and programming. |
||
− | * Programming: Python, numpy, basic object-oriented concepts, memory management, data structures, and algorithms. |
||
− | * Maths: A thorough understanding of mathematics (linear algebra, calculus, probability theory and statistics) is necessary to gain a solid understanding of the internal working of machine learning algorithms. For a more concrete identification of subtopics, please see chapters 2, 3 and 4 of (1), which has done an excellent job in listing and describing all important maths subtopics essential for machine learning students. In addition to that, students are strongly advised to gain a basic understanding of descriptive statistics and data distributions, statistical hypothesis testing, and data sampling, resampling, and experimental design techniques |
||
− | |||
− | [https://www.deeplearningbook.org/ (1) Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press]. |
||
− | |||
− | |||
− | == Course Objectives Based on Bloom’s Taxonomy == |
||
− | |||
− | === What should a student remember at the end of the course? === |
||
− | |||
− | By the end of the course, the students should be able to recognize and define |
||
+ | ==== Level 1: What concepts should a student know/remember/explain? ==== |
||
+ | By the end of the course, the students should be able to recognize and define: |
||
* Different learning paradigms |
* Different learning paradigms |
||
* A wide variety of learning approaches and algorithms |
* A wide variety of learning approaches and algorithms |
||
Line 35: | Line 88: | ||
* Popular machine learning software tools |
* Popular machine learning software tools |
||
− | === What should a student be able to |
+ | ==== Level 2: What basic practical skills should a student be able to perform? ==== |
+ | By the end of the course, the students should be able to describe and explain (with examples): |
||
− | |||
− | By the end of the course, the students should be able to describe and explain (with examples) |
||
− | |||
* Difference between different learning paradigms |
* Difference between different learning paradigms |
||
* Difference between classification and regression |
* Difference between classification and regression |
||
Line 47: | Line 98: | ||
* Neural or Deep Learning |
* Neural or Deep Learning |
||
− | === What should a student be able to apply |
+ | ==== Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios? ==== |
+ | By the end of the course, the students should be able to apply: |
||
− | |||
− | By the end of the course, the students should be able to apply |
||
− | |||
* Classification approaches to solve supervised learning problems |
* Classification approaches to solve supervised learning problems |
||
* Clustering approaches to solve unsupervised learning problems |
* Clustering approaches to solve unsupervised learning problems |
||
* Ensemble learning to improve a model’s performance |
* Ensemble learning to improve a model’s performance |
||
* Regularization to improve a model’s generalization |
* Regularization to improve a model’s generalization |
||
− | * Deep learning algorithms to solve real-world problems |
+ | * Deep learning algorithms to solve real-world problems |
+ | == Grading == |
||
− | === Course |
+ | === Course grading range === |
+ | {| class="wikitable" |
||
− | |||
− | + | |+ |
|
− | |+ Course grade breakdown |
||
− | ! |
||
− | ! |
||
− | !align="center"| '''Proposed points''' |
||
|- |
|- |
||
+ | ! Grade !! Range !! Description of performance |
||
− | | Labs/seminar classes |
||
− | | 20 |
||
− | |align="center"| 0 |
||
|- |
|- |
||
+ | | A. Excellent || 90-100 || - |
||
− | | Interim performance assessment |
||
− | | 30 |
||
− | |align="center"| 40 |
||
|- |
|- |
||
+ | | B. Good || 75-89 || - |
||
− | | Exams |
||
− | | |
+ | |- |
+ | | C. Satisfactory || 60-74 || - |
||
− | |align="center"| 60 |
||
+ | |- |
||
+ | | D. Poor || 0-59 || - |
||
|} |
|} |
||
+ | === Course activities and grading breakdown === |
||
− | If necessary, please indicate freely your course’s features in terms of students’ performance assessment: None |
||
+ | {| class="wikitable" |
||
− | |||
+ | |+ |
||
− | === Grades range === |
||
− | |||
− | {| |
||
− | |+ Course grading range |
||
− | ! |
||
− | ! |
||
− | !align="center"| '''Proposed range''' |
||
|- |
|- |
||
+ | ! Activity Type !! Percentage of the overall course grade |
||
− | | A. Excellent |
||
− | | 90-100 |
||
− | |align="center"| |
||
|- |
|- |
||
+ | | Labs/seminar classes || 0 |
||
− | | B. Good |
||
− | | 75-89 |
||
− | |align="center"| |
||
|- |
|- |
||
+ | | Interim performance assessment || 40 |
||
− | | C. Satisfactory |
||
− | | 60-74 |
||
− | |align="center"| |
||
|- |
|- |
||
− | | |
+ | | Exams || 60 |
− | | 0-59 |
||
− | |align="center"| |
||
|} |
|} |
||
+ | === Recommendations for students on how to succeed in the course === |
||
− | If necessary, please indicate freely your course’s grading features: The semester starts with the default range as proposed in the Table [[#tab:MLCourseGradingRange|[tab:MLCourseGradingRange]]], but it may change slightly (usually reduced) depending on how the semester progresses. |
||
− | === Resources and reference material === |
||
+ | == Resources, literature and reference materials == |
||
− | * T. Hastie, R. Tibshirani, D. Witten and G. James. ''<span>An Introduction to Statistical Learning. Springer 2013.</span>'' |
||
− | * T. Hastie, R. Tibshirani, and J. Friedman. ''<span>The Elements of Statistical Learning. Springer 2011.</span>'' |
||
− | * Tom M Mitchel. <span>''Machine Learning, McGraw Hill''</span> |
||
− | * Christopher M. Bishop. <span>''Pattern Recognition and Machine Learning, Springer''</span> |
||
− | == |
+ | === Open access resources === |
+ | * T. Hastie, R. Tibshirani, D. Witten and G. James. An Introduction to Statistical Learning. Springer 2013. |
||
+ | * T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer 2011. |
||
+ | * Tom M. Mitchell. Machine Learning, McGraw Hill |
||
+ | * Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer |
||
+ | === Closed access resources === |
||
− | The main sections of the course and approximate hour distribution between them is as follows: |
||
+ | |||
− | {| |
||
+ | === Software and tools used within the course === |
||
− | |+ Course Sections |
||
+ | |||
− | !align="center"| '''Section''' |
||
+ | = Teaching Methodology: Methods, techniques, & activities = |
||
− | ! '''Section Title''' |
||
+ | |||
− | !align="center"| '''Teaching Hours''' |
||
+ | == Activities and Teaching Methods == |
||
+ | {| class="wikitable" |
||
+ | |+ Activities within each section |
||
+ | |- |
||
+ | ! Learning Activities !! Section 1 !! Section 2 !! Section 3 !! Section 4 |
||
|- |
|- |
||
+ | | Development of individual parts of software product code || 1 || 1 || 1 || 1 |
||
− | |align="center"| 1 |
||
− | | Supervised Learning |
||
− | |align="center"| 24 |
||
|- |
|- |
||
+ | | Homework and group projects || 1 || 1 || 1 || 1 |
||
− | |align="center"| 2 |
||
− | | Decision Trees and Ensemble Learning |
||
− | |align="center"| 8 |
||
|- |
|- |
||
+ | | Midterm evaluation || 1 || 1 || 1 || 1 |
||
− | |align="center"| 3 |
||
− | | Unsupervised Learning |
||
− | |align="center"| 8 |
||
|- |
|- |
||
+ | | Testing (written or computer based) || 1 || 1 || 1 || 1 |
||
− | |align="center"| 4 |
||
+ | |- |
||
− | | Deep Learning |
||
+ | | Discussions || 1 || 1 || 1 || 1 |
||
− | |align="center"| 12 |
||
− | |} |
+ | |} |
+ | == Formative Assessment and Course Activities == |
||
− | === |
+ | === Ongoing performance assessment === |
− | === Section |
+ | ==== Section 1 ==== |
+ | {| class="wikitable" |
||
− | |||
+ | |+ |
||
− | Supervised Learning |
||
+ | |- |
||
− | |||
+ | ! Activity Type !! Content !! Is Graded? |
||
− | === Topics covered in this section: === |
||
+ | |- |
||
− | |||
+ | | Question || Is it true that in simple linear regression <math>{\textstyle R^{2}}</math> and the squared correlation between X and Y are identical? || 1 |
||
− | * Introduction to Machine Learning |
||
+ | |- |
||
− | * Derivatives and Cost Function |
||
+ | | Question || What are the two assumptions that the Linear regression model makes about the Error Terms? || 1 |
||
− | * Data Pre-processing |
||
+ | |- |
||
− | * Linear Regression |
||
+ | | Question || Fit a regression model to a given data problem, and support your choice of the model. || 1 |
||
− | * Multiple Linear Regression |
||
+ | |- |
||
− | * Gradient Descent |
||
+ | | Question || In a list of given tasks, choose which are regression and which are classification tasks. || 1 |
||
− | * Polynomial Regression |
||
+ | |- |
||
− | * Bias-varaince Tradeoff |
||
+ | | Question || In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net? || 1 |
||
− | * Difference between classification and regression |
||
+ | |- |
||
− | * Logistic Regression |
||
+ | | Question || Write the mathematical form of the minimization objective of Rosenblatt’s perceptron learning algorithm for a two-dimensional case. || 1 |
||
− | * Naive Bayes |
||
+ | |- |
||
− | * KNN |
||
+ | | Question || What is the perceptron learning algorithm? || 1 |
||
− | * Confusion Metrics |
||
+ | |- |
||
− | * Performance Metrics |
||
+ | | Question || Write the mathematical form of its minimization objective for a two-dimensional case. || 1 |
||
− | * Regularization |
||
+ | |- |
||
− | * Hyperplane Based Classification |
||
+ | | Question || What is a max-margin classifier? || 1 |
||
− | * Perceptron Learning Algorithm |
||
+ | |- |
||
− | * Max-Margin Classification |
||
+ | | Question || Explain the role of slack variables in SVM. || 1 |
||
− | * Support Vector Machines |
||
+ | |- |
||
− | * Slack Variables |
||
+ | | Question || How to implement various regression models to solve different regression problems? || 0 |
||
− | * Lagrangian Support Vector Machines |
||
+ | |- |
||
− | * Kernel Trick |
||
+ | | Question || Describe the difference between different types of regression models, their pros and cons, etc. || 0 |
||
− | |||
+ | |- |
||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
+ | | Question || Implement various classification models to solve different classification problems. || 0 |
||
− | |||
+ | |- |
||
− | <div class="tabular"> |
||
+ | | Question || Describe the difference between logistic regression and Naive Bayes. || 0 |
||
− | |||
+ | |- |
||
− | <span>|a|c|</span> & '''Yes/No'''<br /> |
||
+ | | Question || Implement the perceptron learning algorithm, SVMs, and their variants to solve different classification problems. || 0 |
||
− | Development of individual parts of software product code & 1<br /> |
||
+ | |- |
||
− | Homework and group projects & 1<br /> |
||
+ | | Question || Solve a given optimization problem using the Lagrange multiplier method. || 0 |
||
− | Midterm evaluation & 1<br /> |
||
+ | |} |
||
− | Testing (written or computer based) & 1<br /> |
||
+ | ==== Section 2 ==== |
||
− | Reports & 0<br /> |
||
+ | {| class="wikitable" |
||
− | Essays & 0<br /> |
||
+ | |+ |
||
− | Oral polls & 0<br /> |
||
+ | |- |
||
− | Discussions & 1<br /> |
||
+ | ! Activity Type !! Content !! Is Graded? |
||
− | |||
+ | |- |
||
− | |||
+ | | Question || What are the pros and cons of decision trees compared with other classification models? || 1 |
||
− | |||
+ | |- |
||
− | </div> |
||
+ | | Question || Explain how tree-pruning works. || 1 |
||
− | === Typical questions for ongoing performance evaluation within this section === |
||
+ | |- |
||
− | |||
+ | | Question || What is the purpose of ensemble learning? || 1 |
||
− | # Is it true that in simple linear regression <math display="inline">R^2</math> and the squared correlation between X and Y are identical? |
||
+ | |- |
||
− | # What are the two assumptions that the Linear regression model makes about the '''Error Terms'''? |
||
+ | | Question || What is a bootstrap, and what is its role in Ensemble learning? || 1 |
||
− | # Fit a regression model to a given data problem, and support your choice of the model. |
||
+ | |- |
||
− | # In a list of given tasks, choose which are regression and which are classification tasks. |
||
+ | | Question || Explain the role of slack variables in SVM. || 1 |
||
− | # In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net? |
||
+ | |- |
||
− | # Write the mathematical form of the minimization objective of Rosenblatt’s perceptron learning algorithm for a two-dimensional case. |
||
+ | | Question || Implement different variants of decision trees to solve different classification problems. || 0 |
||
− | # What is perceptron learning algorithm? |
||
+ | |- |
||
− | # Write the mathematical form of its minimization objective for a two-dimensional case. |
||
+ | | Question || Solve a given classification problem using an ensemble classifier. || 0 |
||
− | # What is a max-margin classifier? |
||
+ | |- |
||
− | # Explain the role of slack variable in SVM. |
||
+ | | Question || Implement Adaboost for a given problem. || 0 |
||
− | |||
+ | |} |
||
− | === Typical questions for seminar classes (labs) within this section === |
||
+ | ==== Section 3 ==== |
||
− | |||
+ | {| class="wikitable" |
||
− | # How to implement various regression models to solve different regression problems? |
||
+ | |+ |
||
− | # Describe the difference between different types of regression models, their pros and cons, etc. |
||
+ | |- |
||
− | # Implement various classification models to solve different classification problems. |
||
+ | ! Activity Type !! Content !! Is Graded? |
||
− | # Describe the difference between Logistic regression and naive bayes. |
||
+ | |- |
||
− | # Implement perceptron learning algorithm, SVMs, and its variants to solve different classification problems. |
||
+ | | Question || Which implicit or explicit objective function does K-means implement? || 1 |
||
− | # Solve a given optimization problem using the Lagrange multiplier method. |
||
+ | |- |
||
− | |||
+ | | Question || Explain the difference between k-means and k-means++. || 1 |
||
− | === Test questions for final assessment in this section === |
||
+ | |- |
||
− | |||
+ | | Question || What is single-linkage and what are its pros and cons? || 1 |
||
− | # What does it mean for the standard least squares coefficient estimates of linear regression to be ''scale equivariant''? |
||
+ | |- |
||
+ | | Question || Explain how DBSCAN works. || 1 |
||
+ | |- |
||
+ | | Question || Implement different clustering algorithms to solve different clustering problems. || 0 |
||
+ | |- |
||
+ | | Question || Implement Mean-shift for video tracking. || 0 |
||
+ | |} |
||
+ | ==== Section 4 ==== |
||
+ | {| class="wikitable" |
||
+ | |+ |
||
+ | |- |
||
+ | ! Activity Type !! Content !! Is Graded? |
||
+ | |- |
||
+ | | Question || What is a fully connected feed-forward ANN? || 1 |
||
+ | |- |
||
+ | | Question || Explain different hyperparameters of CNNs. || 1 |
||
+ | |- |
||
+ | | Question || Calculate KL-divergence between two probability distributions. || 1 |
||
+ | |- |
||
+ | | Question || What is a generative model and how is it different from a discriminative model? || 1 |
||
+ | |- |
||
+ | | Question || Implement different types of ANNs to solve different classification problems. || 0 |
||
+ | |- |
||
+ | | Question || Calculate KL-divergence between two probability distributions. || 0 |
||
+ | |- |
||
+ | | Question || Implement different generative models for different problems. || 0 |
||
+ | |} |
||
+ | === Final assessment === |
||
+ | '''Section 1''' |
||
+ | # What does it mean for the standard least squares coefficient estimates of linear regression to be scale equivariant? |
||
# Given a fitted regression model to a dataset, interpret its coefficients. |
# Given a fitted regression model to a dataset, interpret its coefficients. |
||
# Explain which regression model would be a better fit to model the relationship between response and predictor in a given data. |
# Explain which regression model would be a better fit to model the relationship between response and predictor in a given data. |
||
Line 222: | Line 282: | ||
# How does the choice of slack variable affect the bias-variance tradeoff in SVM? |
# How does the choice of slack variable affect the bias-variance tradeoff in SVM? |
||
# Explain which Kernel would be a better fit to be used in SVM for a given data. |
# Explain which Kernel would be a better fit to be used in SVM for a given data. |
||
+ | '''Section 2''' |
||
− | |||
− | === Section 2 === |
||
− | |||
− | === Section title: === |
||
− | |||
− | Decision Trees and Ensemble Methods |
||
− | |||
− | === Topics covered in this section: === |
||
− | |||
− | * Decision Trees |
||
− | * Bagging |
||
− | * Boosting |
||
− | * Random Forest |
||
− | * Adaboost |
||
− | |||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
− | |||
− | <div class="tabular"> |
||
− | |||
− | <span>|a|c|</span> & '''Yes/No'''<br /> |
||
− | Development of individual parts of software product code & 1<br /> |
||
− | Homework and group projects & 1<br /> |
||
− | Midterm evaluation & 1<br /> |
||
− | Testing (written or computer based) & 1<br /> |
||
− | Reports & 0<br /> |
||
− | Essays & 0<br /> |
||
− | Oral polls & 0<br /> |
||
− | Discussions & 1<br /> |
||
− | |||
− | |||
− | |||
− | </div> |
||
− | === Typical questions for ongoing performance evaluation within this section === |
||
− | |||
− | # What are pros and cons of decision trees over other classification models? |
||
− | # Explain how tree-pruning works. |
||
− | # What is the purpose of ensemble learning? |
||
− | # What is a bootstrap, and what is its role in Ensemble learning? |
||
− | # Explain the role of slack variable in SVM. |
||
− | |||
− | === Typical questions for seminar classes (labs) within this section === |
||
− | |||
− | # Implement different variants of decision trees to solve different classification problems. |
||
− | # Solve a given classification problem problem using an ensemble classifier. |
||
− | # Implement Adaboost for a given problem. |
||
− | |||
− | === Test questions for final assessment in this section === |
||
− | |||
# When a decision tree is grown to full depth, how does it affect tree’s bias and variance, and its response to noisy data? |
# When a decision tree is grown to full depth, how does it affect tree’s bias and variance, and its response to noisy data? |
||
# Argue if an ensemble model would be a better choice for a given classification problem or not. |
# Argue if an ensemble model would be a better choice for a given classification problem or not. |
||
# Given a particular iteration of boosting and other important information, calculate the weights of the Adaboost classifier. |
# Given a particular iteration of boosting and other important information, calculate the weights of the Adaboost classifier. |
||
+ | '''Section 3''' |
||
− | |||
− | === Section 3 === |
||
− | |||
− | === Section title: === |
||
− | |||
− | Unsupervised Learning |
||
− | |||
− | === Topics covered in this section: === |
||
− | |||
− | * K-means Clustering |
||
− | * K-means++ |
||
− | * Hierarchical Clustering |
||
− | * DBSCAN |
||
− | * Mean-shift |
||
− | |||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
− | |||
− | <div class="tabular"> |
||
− | |||
− | <span>|a|c|</span> & '''Yes/No'''<br /> |
||
− | Development of individual parts of software product code & 1<br /> |
||
− | Homework and group projects & 1<br /> |
||
− | Midterm evaluation & 1<br /> |
||
− | Testing (written or computer based) & 1<br /> |
||
− | Reports & 0<br /> |
||
− | Essays & 0<br /> |
||
− | Oral polls & 0<br /> |
||
− | Discussions & 1<br /> |
||
− | |||
− | |||
− | |||
− | </div> |
||
− | === Typical questions for ongoing performance evaluation within this section === |
||
− | |||
− | # Which implicit or explicit objective function does K-means implement? |
||
− | # Explain the difference between k-means and k-means++. |
||
− | # Whaat is single-linkage and what are its pros and cons? |
||
− | # Explain how DBSCAN works. |
||
− | |||
− | === Typical questions for seminar classes (labs) within this section === |
||
− | |||
− | # Implement different clustering algorithms to solve to solve different clustering problems. |
||
− | # Implement Mean-shift for video tracking |
||
− | |||
− | === Test questions for final assessment in this section === |
||
− | |||
# K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds? Which fitness function does it implicitly minimize? |
# K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds? Which fitness function does it implicitly minimize? |
||
# Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method? |
# Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method? |
||
# What are the characteristics of noise points in DBSCAN? |
# What are the characteristics of noise points in DBSCAN? |
||
+ | '''Section 4''' |
||
− | |||
− | === Section 4 === |
||
− | |||
− | === Section title: === |
||
− | |||
− | Deep Learning |
||
− | |||
− | === Topics covered in this section: === |
||
− | |||
− | * Artificial Neural Networks |
||
− | * Back-propagation |
||
− | * Convolutional Neural Networks |
||
− | * Autoencoder |
||
− | * Variatonal Autoencoder |
||
− | * Generative Adversairal Networks |
||
− | |||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
− | |||
− | <div class="tabular"> |
||
− | |||
− | <span>|a|c|</span> & '''Yes/No'''<br /> |
||
− | Development of individual parts of software product code & 1<br /> |
||
− | Homework and group projects & 1<br /> |
||
− | Midterm evaluation & 1<br /> |
||
− | Testing (written or computer based) & 1<br /> |
||
− | Reports & 0<br /> |
||
− | Essays & 0<br /> |
||
− | Oral polls & 0<br /> |
||
− | Discussions & 1<br /> |
||
− | |||
− | |||
− | |||
− | </div> |
||
− | === Typical questions for ongoing performance evaluation within this section === |
||
− | |||
− | # What is a fully connected feed-forward ANN? |
||
− | # Explain different hyperparameters of CNNs. |
||
− | # Calculate KL-divergence between two probability distributions. |
||
− | # What is a generative model and how is it different from a discriminative model? |
||
− | |||
− | === Typical questions for seminar classes (labs) within this section === |
||
− | |||
− | # Implement different types of ANNs to solve to solve different classification problems. |
||
− | # Calculate KL-divergence between two probability distributions. |
||
− | # Implement different generative models for different problems. |
||
− | |||
− | === Test questions for final assessment in this section === |
||
− | |||
# Explain what ReLU is, what its different variants are, and what their pros and cons are. |
# Explain what ReLU is, what its different variants are, and what their pros and cons are. |
||
# Calculate the number of parameters to be learned during training in a CNN, given all important information. |
# Calculate the number of parameters to be learned during training in a CNN, given all important information. |
||
# Explain how a VAE can be used as a generative model. |
# Explain how a VAE can be used as a generative model. |
||
− | == |
+ | === The retake exam === |
+ | '''Section 1''' |
||
− | |||
− | === Exam === |
||
− | |||
− | Exams will be paper-based and will be conducted in a form of problem solving, where the problems will be similar to those mentioned above and will based on the contents taught in lecture slides, lecture discussions (including white-board materials), lab materials, reading materials (including the text books), etc. Students will be given 1-3 hours to complete the exam. |
||
− | |||
− | === Retake 1 === |
||
+ | '''Section 2''' |
||
− | First retake will be conducted in the same form as the final exam. The weight of the retake exam will be 5% larger than the passing threshold of the course. |
||
+ | '''Section 3''' |
||
− | === Retake 2 === |
||
+ | '''Section 4''' |
||
− | Second retake will be conducted in the same form as the final exam. The weight of the retake exam will be 5% larger than the passing threshold of the course. |
Latest revision as of 12:57, 12 July 2022
Introduction to Machine Learning
- Course name: Introduction to Machine Learning
- Code discipline: R-01
- Subject area:
Short Description
This course covers the following concepts: Machine learning paradigms; Machine Learning approaches, and algorithms.
Prerequisites
Prerequisite subjects
- CSE202 — Analytical Geometry and Linear Algebra I
- CSE204 — Analytical Geometry and Linear Algebra II
- CSE201 — Mathematical Analysis I
- CSE203 — Mathematical Analysis II
- CSE206 — Probability And Statistics
- CSE117 — Data Structures and Algorithms: Python, NumPy, basic object-oriented concepts, memory management.
Prerequisite topics
Course Topics
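Section | Topics within the section |
---|---|
Supervised Learning | Introduction to Machine Learning; Derivatives and Cost Function; Data Pre-processing; Linear Regression; Multiple Linear Regression; Gradient Descent; Polynomial Regression; Bias-variance Tradeoff; Difference between classification and regression; Logistic Regression; Naive Bayes; KNN; Confusion Matrix; Performance Metrics; Regularization; Hyperplane Based Classification; Perceptron Learning Algorithm; Max-Margin Classification; Support Vector Machines; Slack Variables; Lagrangian Support Vector Machines; Kernel Trick |
Decision Trees and Ensemble Methods | Decision Trees; Bagging; Boosting; Random Forest; Adaboost |
Unsupervised Learning | K-means Clustering; K-means++; Hierarchical Clustering; DBSCAN; Mean-shift |
Deep Learning | Artificial Neural Networks; Back-propagation; Convolutional Neural Networks; Autoencoder; Variational Autoencoder; Generative Adversarial Networks |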
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. Therefore, the purpose of this course is to provide students with an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments, and using them to solve real-world data science problems.
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should be able to recognize and define:
- Different learning paradigms
- A wide variety of learning approaches and algorithms
- Various learning settings
- Performance metrics
- Popular machine learning software tools
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to describe and explain (with examples):
- Difference between different learning paradigms
- Difference between classification and regression
- Concept of learning theory (bias/variance tradeoffs and large margins etc.)
- Kernel methods
- Regularization
- Ensemble Learning
- Neural or Deep Learning
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to apply:
- Classification approaches to solve supervised learning problems
- Clustering approaches to solve unsupervised learning problems
- Ensemble learning to improve a model’s performance
- Regularization to improve a model’s generalization
- Deep learning algorithms to solve real-world problems
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 90-100 | - |
B. Good | 75-89 | - |
C. Satisfactory | 60-74 | - |
D. Poor | 0-59 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Labs/seminar classes | 0 |
Interim performance assessment | 40 |
Exams | 60 |
Recommendations for students on how to succeed in the course
Resources, literature and reference materials
Open access resources
- T. Hastie, R. Tibshirani, D. Witten and G. James. An Introduction to Statistical Learning. Springer 2013.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer 2011.
- Tom M. Mitchell. Machine Learning, McGraw Hill
- Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 | Section 2 | Section 3 | Section 4 |
---|---|---|---|---|
Development of individual parts of software product code | 1 | 1 | 1 | 1 |
Homework and group projects | 1 | 1 | 1 | 1 |
Midterm evaluation | 1 | 1 | 1 | 1 |
Testing (written or computer based) | 1 | 1 | 1 | 1 |
Discussions | 1 | 1 | 1 | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Is it true that in simple linear regression the R² statistic and the squared correlation between X and Y are identical? | 1 |
Question | What are the two assumptions that the linear regression model makes about the error terms? | 1 |
Question | Fit a regression model to a given data problem, and support your choice of the model. | 1 |
Question | In a list of given tasks, choose which are regression and which are classification tasks. | 1 |
Question | In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net? | 1 |
Question | Write the mathematical form of the minimization objective of Rosenblatt’s perceptron learning algorithm for a two-dimensional case. | 1 |
Question | What is the perceptron learning algorithm? | 1 |
Question | Write the mathematical form of its minimization objective for a two-dimensional case. | 1 |
Question | What is a max-margin classifier? | 1 |
Question | Explain the role of the slack variables in SVM. | 1 |
Question | How to implement various regression models to solve different regression problems? | 0 |
Question | Describe the difference between different types of regression models, their pros and cons, etc. | 0 |
Question | Implement various classification models to solve different classification problems. | 0 |
Question | Describe the difference between logistic regression and Naive Bayes. | 0 |
Question | Implement the perceptron learning algorithm, SVMs, and their variants to solve different classification problems. | 0 |
Question | Solve a given optimization problem using the Lagrange multiplier method. | 0 |
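As a study aid for the first Section 1 question, the following is a minimal sketch that checks numerically whether R² and the squared sample correlation agree in simple linear regression. The synthetic data, the seed, and all variable names are assumptions made for illustration only and are not part of the course materials.

```python
import random

# Synthetic data (assumed for illustration): y depends linearly on x plus noise.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 * xi + 1.0 + random.gauss(0.0, 0.5) for xi in x]


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(x), mean(y)

# Centered sums of squares and cross-products.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Ordinary least squares fit of y = b0 + b1 * x.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# R^2 = 1 - RSS / TSS, where TSS equals s_yy.
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1.0 - rss / s_yy

# Squared sample correlation between x and y.
corr_squared = s_xy ** 2 / (s_xx * s_yy)

print(r_squared, corr_squared)
```

The two printed values should agree up to floating-point error, which is consistent with the identity holding for a single predictor with an intercept.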
Section 2
Activity Type | Content | Is Graded? |
---|---|---|
Question | What are the pros and cons of decision trees compared with other classification models? | 1 |
Question | Explain how tree-pruning works. | 1 |
Question | What is the purpose of ensemble learning? | 1 |
Question | What is a bootstrap, and what is its role in Ensemble learning? | 1 |
Question | Explain the role of the slack variables in SVM. | 1 |
Question | Implement different variants of decision trees to solve different classification problems. | 0 |
Question | Solve a given classification problem using an ensemble classifier. | 0 |
Question | Implement AdaBoost for a given problem. | 0 |
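For the bootstrap question in Section 2, a minimal sketch of drawing bootstrap samples may help. It assumes the usual definition (sampling n items with replacement from a dataset of size n); the helper name `bootstrap_sample` and the toy data are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(42)   # fixed seed so the run is repeatable
data = list(range(10))    # toy dataset, assumed for illustration

# In bagging, each base learner would be trained on one such resample.
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for s in samples:
    print(sorted(s))
```

Because sampling is with replacement, each resample typically repeats some items and omits others, which is what gives the base learners of an ensemble their diversity.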
Section 3
Activity Type | Content | Is Graded? |
---|---|---|
Question | Which implicit or explicit objective function does K-means implement? | 1 |
Question | Explain the difference between k-means and k-means++. | 1 |
Question | What is single-linkage and what are its pros and cons? | 1 |
Question | Explain how DBSCAN works. | 1 |
Question | Implement different clustering algorithms to solve different clustering problems. | 0 |
Question | Implement Mean-shift for video tracking. | 0 |
Section 4
Activity Type | Content | Is Graded? |
---|---|---|
Question | What is a fully connected feed-forward ANN? | 1 |
Question | Explain different hyperparameters of CNNs. | 1 |
Question | Calculate KL-divergence between two probability distributions. | 1 |
Question | What is a generative model and how is it different from a discriminative model? | 1 |
Question | Implement different types of ANNs to solve different classification problems. | 0 |
Question | Calculate KL-divergence between two probability distributions. | 0 |
Question | Implement different generative models for different problems. | 0 |
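For the KL-divergence questions in Section 4, the calculation can be sketched as follows for two discrete distributions over the same outcomes. The function name `kl_divergence`, the use of natural logarithms (nats), and the example distributions are assumptions for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions given as probability lists."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # a term with p_i = 0 contributes 0 by convention
        if qi == 0.0:
            return math.inf   # P puts mass where Q puts none
        total += pi * math.log(pi / qi)
    return total


p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q))  # equals 0.5 * ln(4/3)
print(kl_divergence(q, p))  # generally differs: KL is not symmetric
```

Swapping the arguments changes the result, which is a useful reminder that KL-divergence is not a distance metric.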
Final assessment
Section 1
- What does it mean for the standard least squares coefficient estimates of linear regression to be scale equivariant?
- Given a fitted regression model to a dataset, interpret its coefficients.
- Explain which regression model would be a better fit to model the relationship between response and predictor in a given data.
- If the number of training examples goes to infinity, how will it affect the bias and variance of a classification model?
- Given a two-dimensional classification problem, determine whether a linear boundary can be estimated using logistic regression with regularization.
- Explain which classification model would be a better fit for a given classification problem.
- Consider the Leave-one-out-CV error of standard two-class SVM. Argue that under a given value of slack variable, a given mathematical statement is either correct or incorrect.
- How does the choice of slack variable affect the bias-variance tradeoff in SVM?
- Explain which Kernel would be a better fit to be used in SVM for a given data.
Section 2
- When a decision tree is grown to full depth, how does it affect the tree’s bias and variance, and its response to noisy data?
- Argue if an ensemble model would be a better choice for a given classification problem or not.
- Given a particular iteration of boosting and other important information, calculate the weights of the Adaboost classifier.
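The AdaBoost weight calculation in the last Section 2 item can be sketched as below. This assumes the common discrete AdaBoost formulation with labels in {-1, +1} and classifier weight alpha = 0.5 * ln((1 - err) / err); the course may use a different but equivalent convention. The function name `adaboost_round` and the toy data are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One boosting round: return (alpha, normalized updated sample weights).

    Assumes labels and predictions are in {-1, +1} and that the weighted
    error is strictly between 0 and 1.
    """
    total = sum(weights)
    # Weighted error of the current weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    # Weight of the weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Raise the weight of misclassified points, lower it for correct ones.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(updated)
    return alpha, [w / z for w in updated]


# Toy example: four equally weighted points, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [+1, -1, +1, -1]
y_pred = [+1, -1, +1, +1]

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha)        # 0.5 * ln(3), since the weighted error is 0.25
print(new_weights)  # the misclassified point now carries half the total weight
```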
Section 3
- K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds? Which fitness function does it implicitly minimize?
- Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method?
- What are the characteristics of noise points in DBSCAN?
Section 4
- Explain what ReLU is, what its different variants are, and what their pros and cons are.
- Calculate the number of parameters to be learned during training in a CNN, given all important information.
- Explain how a VAE can be used as a generative model.
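For the CNN parameter-count item in Section 4, a minimal sketch of the arithmetic is given below. It assumes standard 2D convolutions with one bias per output channel and a fully connected output layer; the helper names and the small example architecture are hypothetical and not taken from the course.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Learnable parameters of a standard 2D convolutional layer."""
    per_filter = kernel_h * kernel_w * in_channels + (1 if bias else 0)
    return per_filter * out_channels


def dense_params(in_features, out_features, bias=True):
    """Learnable parameters of a fully connected layer."""
    return (in_features + (1 if bias else 0)) * out_features


# Assumed toy architecture: two 3x3 conv layers, then a dense classifier
# applied to a flattened 32-channel 8x8 feature map.
conv1 = conv2d_params(3, 3, in_channels=3, out_channels=16)    # (27 + 1) * 16
conv2 = conv2d_params(3, 3, in_channels=16, out_channels=32)   # (144 + 1) * 32
fc = dense_params(32 * 8 * 8, 10)                              # (2048 + 1) * 10

total = conv1 + conv2 + fc
print(conv1, conv2, fc, total)
```

Pooling and activation layers such as ReLU add no learnable parameters, so they do not appear in the count; stride and padding affect the feature-map size (and hence the dense layer's input) but not the convolution's own parameter count.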
The retake exam
Section 1
Section 2
Section 3
Section 4