Difference between revisions of "MSc: Advanced Statistics"
R.sirgalina (talk | contribs) |
R.sirgalina (talk | contribs) |
||
Line 1: | Line 1: | ||
+ | |||
= Advanced Statistics = |
= Advanced Statistics = |
||
+ | * '''Course name''': Advanced Statistics |
||
+ | * '''Code discipline''': DS-03 |
||
+ | * '''Subject area''': |
||
+ | == Short Description == |
||
− | * <span>'''Course name:'''</span> Advanced Statistics |
||
+ | This course covers the following concepts: Statistical inference; Non parametric statistics; Test of statistical hypotheses; Simple linear regression and correlation analysis; Meta-Analysis. |
||
− | * <span>'''Course number:'''</span> DS-03 |
||
− | == |
+ | == Prerequisites == |
− | === Key concepts of the class === |
||
+ | === Prerequisite subjects === |
||
− | * Statistical inference |
||
+ | * CSE329 - Empirical Methods |
||
− | * Non parametric statistics |
||
− | * Test of statistical hypotheses |
||
− | * Simple linear regression and correlation analysis |
||
− | * Meta-Analysis |
||
+ | === Prerequisite topics === |
||
− | === What is the purpose of this course? === |
||
− | The main purpose of this course is to present the fundamentals of inferential statistics to the future software engineers and data scientists, on one side providing the scientific fundamentals of the disciplines, and on the other anchoring the theoretical concepts on practices coming from the world of software development and engineering. The course covers the statistical analysis of data with limited assumptions on the distribution, with reference to testing hypotheses, measuring correlations, building samples, and performing regressions. |
||
− | == |
+ | == Course Topics == |
+ | {| class="wikitable" |
||
− | This course will benefit from good English language skills. Also it could be great if you acquire basic statistical skills in the perspective of [https://en.wikipedia.org/wiki/Empirical_research Empirical research]: statistical hypothesis testing, dependent and independent variables, distributions, etc.. |
||
+ | |+ Course Sections and Topics |
||
− | * [https://eduwiki.innopolis.university/index.php/MSc:_Empirical_Methods CSE329 - Empirical Methods] |
||
+ | |- |
||
+ | ! Section !! Topics within the section |
||
+ | |- |
||
+ | | Sampling Distributions Associated with the Normal Population || |
||
+ | # Kolmogorov-Smirnov test |
||
+ | # Size of samples, Kolmogorov-Smirnov, Fisher exact |
||
+ | # Logistic regression |
||
+ | |} |
||
+ | == Intended Learning Outcomes (ILOs) == |
||
+ | === What is the main purpose of this course? === |
||
− | == Course Objectives Based on Bloom’s Taxonomy == |
||
+ | The main purpose of this course is to present the fundamentals of inferential statistics to the future software engineers and data scientists, on one side providing the scientific fundamentals of the disciplines, and on the other anchoring the theoretical concepts on practices coming from the world of software development and engineering. The course covers the statistical analysis of data with limited assumptions on the distribution, with reference to testing hypotheses, measuring correlations, building samples, and performing regressions. |
||
− | === |
+ | === ILOs defined at three levels === |
− | |||
− | By the end of the course, the students should be able to: |
||
+ | ==== Level 1: What concepts should a student know/remember/explain? ==== |
||
+ | By the end of the course, the students should be able to ... |
||
* Remember the fundamentals of inferential statistics |
* Remember the fundamentals of inferential statistics |
||
* Remember the specifics and purpose of different hypothesis tests |
* Remember the specifics and purpose of different hypothesis tests |
||
* Distinguish between parametric and non parametric tests |
* Distinguish between parametric and non parametric tests |
||
− | === What should a student be able to |
+ | ==== Level 2: What basic practical skills should a student be able to perform? ==== |
+ | By the end of the course, the students should be able to ... |
||
− | |||
− | By the end of the course, the students should be able to understand: |
||
− | |||
* the basic concepts of inferential statistics |
* the basic concepts of inferential statistics |
||
* the fundamental laws in statistics |
* the fundamental laws in statistics |
||
Line 40: | Line 47: | ||
* the hypotheses test procedure |
* the hypotheses test procedure |
||
− | === What should a student be able to apply |
+ | ==== Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios? ==== |
− | |||
By the end of the course, the students should be able to ... |
By the end of the course, the students should be able to ... |
||
− | |||
* To understand the problems related to analyse statistically data not distributed normally |
* To understand the problems related to analyse statistically data not distributed normally |
||
* To know the more recent computationally-intensive techniques that can help to describe samples and to infer properties of populations in absence of normality |
* To know the more recent computationally-intensive techniques that can help to describe samples and to infer properties of populations in absence of normality |
||
* To identify situations when the data is on nominal scales so alternative techniques should be use, and act accordingly. |
* To identify situations when the data is on nominal scales so alternative techniques should be use, and act accordingly. |
||
− | * To be able to run experiment to evaluate hypotheses for situation of scarce data, distributed non normally, on different kinds of scales. |
+ | * To be able to run experiment to evaluate hypotheses for situation of scarce data, distributed non normally, on different kinds of scales. |
+ | == Grading == |
||
− | |||
− | === Course evaluation (Standard) === |
||
+ | === Course grading range === |
||
− | {| style="border-spacing: 2px; border: 1px solid darkgray;" |
||
+ | {| class="wikitable" |
||
− | |+ Course grade breakdown |
||
+ | |+ |
||
− | ! |
||
− | ! |
||
− | !align="center"| '''Points''' |
||
|- |
|- |
||
+ | ! Grade !! Range !! Description of performance |
||
− | | Weekly quizzes |
||
− | | |
||
− | |align="center"| 10 |
||
|- |
|- |
||
+ | | A. Excellent || 95-100 || - |
||
− | | Midterm |
||
− | | |
||
− | |align="center"| 20 |
||
|- |
|- |
||
+ | | B. Good || 75-94 || - |
||
− | | Final oral exam |
||
− | | |
||
− | |align="center"| 35 |
||
|- |
|- |
||
+ | | C. Satisfactory || 55-74 || - |
||
− | | Final written exam |
||
− | | |
||
− | |align="center"| 30 |
||
|- |
|- |
||
+ | | D. Poor || 0-54 || - |
||
− | | Participation |
||
− | | |
||
− | |align="center"| 5 |
||
|} |
|} |
||
− | === Course |
+ | === Course activities and grading breakdown === |
+ | {| class="wikitable" |
||
− | |||
+ | |+ |
||
− | {| style="border-spacing: 2px; border: 1px solid darkgray;" |
||
− | |+ Course grade breakdown |
||
− | ! |
||
− | ! |
||
− | !align="center"| '''Points''' |
||
|- |
|- |
||
+ | ! Activity Type !! Percentage of the overall course grade |
||
− | | Weekly quizzes |
||
− | | |
||
− | |align="center"| 15 |
||
|- |
|- |
||
− | | Weekly |
+ | | Weekly quizzes || 10 |
− | | |
||
− | |align="center"| 15 |
||
|- |
|- |
||
+ | | Midterm || 20 |
||
− | | Mid of Semester Project Review |
||
− | | |
||
− | |align="center"| 20 |
||
|- |
|- |
||
− | | Final |
+ | | Final oral exam || 35 |
− | | |
||
− | |align="center"| 30 |
||
|- |
|- |
||
− | | Final |
+ | | Final written exam || 30 |
− | | |
+ | |- |
+ | | Participation || 5 |
||
− | |align="center"| 20 |
||
− | | |
+ | |- |
+ | | Weekly quizzes || 15 |
||
− | |||
− | === Grades range === |
||
− | |||
− | {| style="border-spacing: 2px; border: 1px solid darkgray;" |
||
− | |+ Course grading range |
||
− | ! |
||
− | ! |
||
− | !align="center"| '''Range''' |
||
|- |
|- |
||
+ | | Weekly Projects Review || 15 |
||
− | | A. Excellent |
||
− | | |
||
− | |align="center"| 95-100 |
||
|- |
|- |
||
+ | | Mid of Semester Project Review || 20 |
||
− | | B. Good |
||
− | | |
||
− | |align="center"| 75-94 |
||
|- |
|- |
||
+ | | Final Report || 30 |
||
− | | C. Satisfactory |
||
− | | |
||
− | |align="center"| 55-74 |
||
|- |
|- |
||
+ | | Final Presentation with Q&A || 20 |
||
− | | D. Poor |
||
− | | |
||
− | |align="center"| 0-54 |
||
|} |
|} |
||
+ | === Recommendations for students on how to succeed in the course === |
||
− | === Cooperation policy and quotations === |
||
− | We encourage vigorous discussion and cooperation in this class. You should feel free to discuss any aspects of the class with any classmates. However, we insist that any written material that is not specifically designated as a Team Deliverable be done by you alone. This includes answers to reading questions, individual reports associated with assignments, and labs. We also insist that if you include verbatim text from any source, you clearly indicate it using standard conventions of quotation or indentation and a note to indicate the source. |
||
+ | == Resources, literature and reference materials == |
||
− | === |
+ | === Open access resources === |
− | |||
* Wasserman L. (2006) All of Nonparametric Statistics. Springer |
* Wasserman L. (2006) All of Nonparametric Statistics. Springer |
||
* Randles, R.H. and Wolfe, D.A. (1991). Introduction to the Theory of Nonparametric Statistics. Melbourne: Robert Krieger. (Ch.1‐Ch.4) |
* Randles, R.H. and Wolfe, D.A. (1991). Introduction to the Theory of Nonparametric Statistics. Melbourne: Robert Krieger. (Ch.1‐Ch.4) |
||
Line 144: | Line 108: | ||
* Hollander, M. and Wolfe, D.A. (1999). Nonparametric Statistical Methods, 2nd ed. New York: John Wiley. |
* Hollander, M. and Wolfe, D.A. (1999). Nonparametric Statistical Methods, 2nd ed. New York: John Wiley. |
||
− | == |
+ | === Closed access resources === |
+ | |||
+ | === Software and tools used within the course === |
||
− | The main sections of the course and approximate hour distribution between them is as follows: |
||
+ | |||
+ | = Teaching Methodology: Methods, techniques, & activities = |
||
+ | == Activities and Teaching Methods == |
||
− | {| style="border-spacing: 2px; border: 1px solid darkgray;" |
||
+ | {| class="wikitable" |
||
− | |+ Course Sections |
||
+ | |+ Activities within each section |
||
− | !align="center"| '''Section''' |
||
− | ! '''Section Title''' |
||
− | !align="center"| '''Teaching Hours''' |
||
|- |
|- |
||
+ | ! Learning Activities !! Section 1 |
||
− | |align="center"| 1 |
||
− | | Sampling Distributions Associated with the Normal Population |
||
− | |align="center"| 15 |
||
|- |
|- |
||
+ | | Testing (written or computer based) || 1 |
||
− | |align="center"| 2 |
||
− | | Test of Statistical Hypotheses |
||
− | |align="center"| 30 |
||
|- |
|- |
||
+ | | Discussions || 1 |
||
− | |align="center"| 3 |
||
+ | |} |
||
− | | Simple Linear Regression and Correlation Analysis |
||
+ | == Formative Assessment and Course Activities == |
||
− | |align="center"| 15 |
||
− | |} |
||
− | |||
− | === Section 1 === |
||
− | |||
− | ==== Section title: ==== |
||
− | |||
− | Sampling Distributions Associated with the Normal Population |
||
− | |||
− | === Topics covered in this section: === |
||
− | |||
− | * Introduction to the course, toward inference |
||
− | * Student’s t-distribution |
||
− | * Bernoulli and binomial distribution |
||
− | * Chi-square distribution |
||
− | * Snedecor’s F-distribution |
||
− | |||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
+ | === Ongoing performance assessment === |
||
+ | ==== Section 1 ==== |
||
− | {| style="border-spacing: 2px; border: 1px solid darkgray;" |
||
+ | {| class="wikitable" |
||
− | !align="center"| '''Evaluation''' |
||
+ | |+ |
||
− | !align="center"| '''Yes/No''' |
||
|- |
|- |
||
+ | ! Activity Type !! Content !! Is Graded? |
||
− | | Development of individual parts of software product code |
||
− | |align="center"| 0 |
||
|- |
|- |
||
+ | | Question || Let X1,X2, ...,X10 be a random sample from a distribution whose probability density function is <math>{\textstyle f(x)=(1\quad if\;0<x<1}</math> , otherwise 0). Based on the observed values 0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24, test the hypothesis H0 : X UNIF(0, 1) against H1 : X UNIF(0, 1) at a significance level = 0.1. || 1 |
||
− | | Homework and group projects |
||
− | |align="center"| 0 |
||
|- |
|- |
||
+ | | Question || If X1,X2, ...,Xn is a random sample from a distribution with density function <math>{\textstyle f(x)=((1-\theta )x^{\theta }\;if\;0<x<1}</math> , otherwise 0), what is the maximum likelihood estimator of <math>{\textstyle \theta }</math> ? || 1 |
||
− | | Midterm evaluation |
||
− | |align="center"| 1 |
||
|- |
|- |
||
+ | | Question || Let X1,X2, ...,Xn be a random sample of size n from a distribution with a probability density function <math>{\textstyle f(x)=((1-\theta )x^{\theta }\;if\;0<x<1,}</math> otherwise 0), where <math>{\textstyle 0<\theta }</math> is a parameter. Using the maximum likelihood method find an estimator for the parameter <math>{\textstyle \theta }</math> . || 1 |
||
− | | Testing (written or computer based) |
||
− | |align="center"| 1 |
||
|- |
|- |
||
+ | | Question || Suppose you are told that the likelihood of <math>{\textstyle \theta }</math> at <math>{\textstyle \theta =2}</math> is given by 1/4. Is this the probability that <math>{\textstyle \theta =2}</math> ? Explain why or why not. || 1 |
||
− | | Reports |
||
− | |align="center"| 0 |
||
|- |
|- |
||
+ | | Question || If X1,X2, ...,Xn is a random sample from a distribution with density function<math>{\textstyle f(x)=({\frac {1}{\theta }}\;if\;0<x<1,}</math> otherwise 0), then what is the maximum likelihood estimator of <math>{\textstyle \theta }</math> ? || 0 |
||
− | | Essays |
||
− | |align="center"| 0 |
||
|- |
|- |
||
+ | | Question || Let X1,X2, ...,Xn be a random sample from a normal population with mean <math>{\textstyle \mu }</math> and variance <math>{\textstyle \sigma ^{2}}</math> . What are the maximum likelihood estimators of <math>{\textstyle \mu }</math> and <math>{\textstyle \sigma ^{2}}</math> ? || 0 |
||
− | | Oral polls |
||
− | |align="center"| 0 |
||
|- |
|- |
||
+ | | Question || Suppose that you have the following data points: 0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09; compute Dn to determine if they come from the uniform distribution [0,0.5]. || 0 |
||
− | | Discussions |
||
− | |align="center"| 1 |
||
− | |} |
||
− | |||
− | === Typical questions for ongoing performance evaluation within this section === |
||
− | |||
− | # Deduce the probability mass function <math display="inline">P(X \leq k</math> for a binomial distribution? |
||
− | # Let X1,...,Xk be ''k'' iid random variables distributed with a <math display="inline">\chi^2</math> distribution with n1,...nk degrees of freedom respectively.<br /> |
||
− | What is the distribution of Y=X1+...+Xk? Define it precisely and prove the answer formally? |
||
− | # List at least 3 random variables that “tend to follow” a t distribution? |
||
− | # If X has Chi square function with the 5 degrees of freedom, then what is the probability that X is between 1.145 and 12.83? |
||
− | # If X has a gamma distribution of (1,1), then what is the probability density function of the random variable 2X? |
||
− | |||
− | === Typical questions for seminar classes (labs) within this section === |
||
− | |||
− | # Define and provide examples of sample space, events and probability measure. |
||
− | # Write the formula for the coefficients of the simple linear regression. Explain the mathematical procedure you do to derive them and derive them. |
||
− | # Calculate the correlation between two functions and explain its meaning. |
||
− | # Calculate the Pearson coefficient for the given functions. |
||
− | # Deduce the MGF for normal distribution. |
||
− | # State and prove the Bonferroni inequality. |
||
− | |||
− | === Test questions for final assessment in this section === |
||
− | |||
− | |||
− | == Test of Statistical Hypotheses == |
||
− | |||
− | === Topics covered in this section: === |
||
− | |||
− | * Z-test |
||
− | * Student’s t-test |
||
− | * Chi-square test |
||
− | * Snedecor’s F-test |
||
− | |||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
− | |||
− | === Typical questions for ongoing performance evaluation within this section === |
||
− | |||
− | # Define the concept of power of a statistical test. |
||
− | # Define the purpose of the F Test, its hypotheses, and its structure. |
||
− | # Define the purpose of the t-Test, its hypotheses, and its structure. |
||
− | # Define the purpose of the Chi square Test, its hypotheses, and its structure. |
||
− | # Define the purpose of the Z Test, its hypotheses, and its structure. |
||
− | # Provide concrete numeric examples with explanation on why the power of a test depends on: |
||
− | ## the size of the data sets. |
||
− | ## the magnitude of the effect. |
||
− | ## the level of statistical significance. |
||
− | # Given a statistical test for which we have set a value <math display="inline">\alpha</math> we obtain a p: |
||
− | ## if we can reject H0 <math display="inline">(p < \alpha)</math>, what we typically say about H0 and H1. |
||
− | ## if we cannot reject H0 <math display="inline">(P \geq \alpha)</math>, what we can typically say about H0 and H1. |
||
− | ## when can we say that H0 holds? |
||
− | ## when can we say that H1 holds? |
||
− | |||
− | === Typical questions for seminar classes (labs) within this section === |
||
− | |||
− | # Provide a concrete example of a t test, detailing both H0 and H1. |
||
− | # Present the structure of the F test for the analysis of the variance. |
||
− | # Explain what are H0 and H1 in hypothesis testing. |
||
− | # Explain the role of the Bonferroni inequality in hypothesis testing. |
||
− | |||
− | === Test questions for final assessment in this section === |
||
− | |||
− | |||
− | == Simple Linear Regression and Correlation Analysis == |
||
− | |||
− | ==== Topics covered in this section: ==== |
||
− | |||
− | * Kolmogorov-Smirnov test |
||
− | * Size of samples, Kolmogorov-Smirnov, Fisher exact |
||
− | * Logistic regression |
||
− | |||
− | === What forms of evaluation were used to test students’ performance in this section? === |
||
− | |||
− | |||
− | {| style="border-spacing: 2px; border: 1px solid darkgray;" |
||
− | !align="center"| '''Evaluation''' |
||
− | !align="center"| '''Yes/No''' |
||
|- |
|- |
||
+ | | Question || The data on the heights of 12 infants are given below: 18.2, 21.4, 22.6, 17.4, 17.6, 16.7, 17.1, 21.4, 20.1, 17.9, 16.8, 23.1. Test the hypothesis that the data came from some normal population at a significance level = 0.1. || 0 |
||
− | | Development of individual parts of software product code |
||
+ | |} |
||
− | |align="center"| 0 |
||
+ | === Final assessment === |
||
− | |- |
||
+ | '''Section 1''' |
||
− | | Homework and group projects |
||
− | |align="center"| 0 |
||
− | |- |
||
− | | Midterm evaluation |
||
− | |align="center"| 0 |
||
− | |- |
||
− | | Testing (written or computer based) |
||
− | |align="center"| 1 |
||
− | |- |
||
− | | Reports |
||
− | |align="center"| 0 |
||
− | |- |
||
− | | Essays |
||
− | |align="center"| 0 |
||
− | |- |
||
− | | Oral polls |
||
− | |align="center"| 0 |
||
− | |- |
||
− | | Discussions |
||
− | |align="center"| 1 |
||
− | |} |
||
− | |||
− | === Typical questions for ongoing performance evaluation within this section === |
||
− | |||
− | # Let X1,X2, ...,X10 be a random sample from a distribution whose probability density function is <math display="inline">f(x) = (1 \quad if \;0 < x < 1</math>, otherwise 0). Based on the observed values 0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24, test the hypothesis H0 : X UNIF(0, 1) against H1 : X UNIF(0, 1) at a significance level = 0.1. |
||
− | # If X1,X2, ...,Xn is a random sample from a distribution with density function <math display="inline">f(x) = ((1-\theta)x^\theta \; if \; 0 < x < 1</math>, otherwise 0), what is the maximum likelihood estimator of <math display="inline">\theta</math>? |
||
− | # Let X1,X2, ...,Xn be a random sample of size n from a distribution with a probability density function <math display="inline">f(x) = ((1-\theta)x^\theta \; if \; 0 < x < 1,</math> otherwise 0), where <math display="inline">0 < \theta</math> is a parameter. Using the maximum likelihood method find an estimator for the parameter <math display="inline">\theta</math>. |
||
− | # Suppose you are told that the likelihood of <math display="inline">\theta</math> at <math display="inline">\theta=2</math> is given by 1/4. Is this the probability that <math display="inline">\theta=2</math>? Explain why or why not. |
||
− | |||
− | === Typical questions for seminar classes (labs) within this section === |
||
− | |||
− | # If X1,X2, ...,Xn is a random sample from a distribution with density function<math display="inline">f(x) = (\frac{1}{\theta} \;if \; 0 < x < 1,</math> otherwise 0), then what is the maximum likelihood estimator of <math display="inline">\theta</math>? |
||
− | # Let X1,X2, ...,Xn be a random sample from a normal population with mean <math display="inline">\mu</math> and variance <math display="inline">\sigma^2</math>. What are the maximum likelihood estimators of <math display="inline">\mu</math> and <math display="inline">\sigma^2</math>? |
||
− | # Suppose that you have the following data points: 0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09; compute Dn to determine if they come from the uniform distribution [0,0.5]. |
||
− | # The data on the heights of 12 infants are given below: 18.2, 21.4, 22.6, 17.4, 17.6, 16.7, 17.1, 21.4, 20.1, 17.9, 16.8, 23.1. Test the hypothesis that the data came from some normal population at a significance level = 0.1. |
||
− | |||
− | === Test questions for final assessment in the course === |
||
− | |||
# Providing full example of two sequences (in case of computational overhead, you can approximate at the first decimal digit). Compute their: |
# Providing full example of two sequences (in case of computational overhead, you can approximate at the first decimal digit). Compute their: |
||
− | + | Covariance. |
|
− | + | Pearson’s correlation coefficient. |
|
− | + | Spearman’s Rank Correlation Coefficient. |
|
− | + | Kendall’s tau Correlation coefficient. |
|
# What is an empirical distribution? |
# What is an empirical distribution? |
||
# Present, prove, and discuss the evaluation of the asymptotic confidence interval for the empirical distribution, detailing the role of the binomial. |
# Present, prove, and discuss the evaluation of the asymptotic confidence interval for the empirical distribution, detailing the role of the binomial. |
||
Line 339: | Line 164: | ||
# Discuss how we could proceed to compute the confidence interval of the Kendall Tau correlation coefficient of the population. |
# Discuss how we could proceed to compute the confidence interval of the Kendall Tau correlation coefficient of the population. |
||
# Suppose that you have the following datapoints: 0.4, 2, 0.6, 2.4, 2.2, 3.6, 3.8, 4; compute Dn to determine if they come from the uniform distribution [0,4]. |
# Suppose that you have the following datapoints: 0.4, 2, 0.6, 2.4, 2.2, 3.6, 3.8, 4; compute Dn to determine if they come from the uniform distribution [0,4]. |
||
− | # Prove that <math |
+ | # Prove that <math>{\textstyle {\tilde {F}}_{n}}</math> is a consistent and unbiased estimator of F. |
+ | |||
+ | === The retake exam === |
||
+ | '''Section 1''' |
Revision as of 11:34, 29 August 2022
Advanced Statistics
- Course name: Advanced Statistics
- Code discipline: DS-03
- Subject area:
Short Description
This course covers the following concepts: Statistical inference; Nonparametric statistics; Tests of statistical hypotheses; Simple linear regression and correlation analysis; Meta-Analysis.
Prerequisites
Prerequisite subjects
- CSE329 - Empirical Methods
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
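Sampling Distributions Associated with the Normal Population | 1. Kolmogorov-Smirnov test; 2. Size of samples, Kolmogorov-Smirnov, Fisher exact; 3. Logistic regression |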
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
The main purpose of this course is to present the fundamentals of inferential statistics to future software engineers and data scientists. On the one hand, it provides the scientific foundations of the discipline; on the other, it anchors the theoretical concepts in practices from the world of software development and engineering. The course covers the statistical analysis of data under limited assumptions about the distribution, with reference to testing hypotheses, measuring correlations, building samples, and performing regressions.
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should be able to ...
- Remember the fundamentals of inferential statistics
- Remember the specifics and purpose of different hypothesis tests
- Distinguish between parametric and nonparametric tests
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to understand:
- the basic concepts of inferential statistics
- the fundamental laws in statistics
- the concept of null and alternative hypotheses
- the hypotheses test procedure
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- Understand the problems that arise in the statistical analysis of data that are not normally distributed
- Know the more recent computationally intensive techniques that help to describe samples and to infer properties of populations in the absence of normality
- Identify situations in which the data are on nominal scales, so that alternative techniques should be used, and act accordingly
- Run experiments to evaluate hypotheses when data are scarce, not normally distributed, and measured on different kinds of scales
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 95-100 | - |
B. Good | 75-94 | - |
C. Satisfactory | 55-74 | - |
D. Poor | 0-54 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Weekly quizzes | 10 |
Midterm | 20 |
Final oral exam | 35 |
Final written exam | 30 |
Participation | 5 |
Weekly quizzes | 15 |
Weekly Projects Review | 15 |
Mid of Semester Project Review | 20 |
Final Report | 30 |
Final Presentation with Q&A | 20 |
Recommendations for students on how to succeed in the course
Resources, literature and reference materials
Open access resources
- Wasserman L. (2006) All of Nonparametric Statistics. Springer
- Randles, R.H. and Wolfe, D.A. (1991). Introduction to the Theory of Nonparametric Statistics. Melbourne: Robert Krieger. (Ch.1‐Ch.4)
- Hastie, T. Tibshirani, R. and Friedman, J. (2008) The Elements of Statistical Learning 2ed. Springer
- Hollander, M. and Wolfe, D.A. (1999). Nonparametric Statistical Methods, 2nd ed. New York: John Wiley.
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 |
---|---|
Testing (written or computer based) | 1 |
Discussions | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Let X1, X2, ..., X10 be a random sample from a distribution whose probability density function is <math>f(x) = 1</math> if <math>0 < x < 1</math>, and 0 otherwise. Based on the observed values 0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24, test the hypothesis H0: X ~ UNIF(0, 1) against H1: X ≁ UNIF(0, 1) at a significance level <math>\alpha = 0.1</math>. | 1 |
Question | If X1, X2, ..., Xn is a random sample from a distribution with density function <math>f(x) = (1-\theta)x^{\theta}</math> if <math>0 < x < 1</math>, and 0 otherwise, what is the maximum likelihood estimator of <math>\theta</math>? | 1 |
Question | Let X1, X2, ..., Xn be a random sample of size n from a distribution with probability density function <math>f(x) = (1-\theta)x^{\theta}</math> if <math>0 < x < 1</math>, and 0 otherwise, where <math>0 < \theta</math> is a parameter. Using the maximum likelihood method, find an estimator for the parameter <math>\theta</math>. | 1 |
Question | Suppose you are told that the likelihood of <math>\theta</math> at <math>\theta = 2</math> is given by 1/4. Is this the probability that <math>\theta = 2</math>? Explain why or why not. | 1 |
Question | If X1, X2, ..., Xn is a random sample from a distribution with density function <math>f(x) = \frac{1}{\theta}</math> if <math>0 < x < 1</math>, and 0 otherwise, then what is the maximum likelihood estimator of <math>\theta</math>? | 0 |
Question | Let X1, X2, ..., Xn be a random sample from a normal population with mean <math>\mu</math> and variance <math>\sigma^{2}</math>. What are the maximum likelihood estimators of <math>\mu</math> and <math>\sigma^{2}</math>? | 0 |
Question | Suppose that you have the following data points: 0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09; compute Dn to determine if they come from the uniform distribution [0,0.5]. | 0 |
Question | The data on the heights of 12 infants are given below: 18.2, 21.4, 22.6, 17.4, 17.6, 16.7, 17.1, 21.4, 20.1, 17.9, 16.8, 23.1. Test the hypothesis that the data came from some normal population at a significance level <math>\alpha = 0.1</math>. | 0 |
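The Dn question in the table above (eight data points tested against the uniform distribution on [0, 0.5]) can be checked numerically. The following is a minimal Python sketch and not part of the course materials. The helper names `ks_statistic` and `uniform_cdf` are illustrative assumptions, and the hypothesized distribution is assumed to be continuous.

```python
def ks_statistic(sample, cdf):
    """Return D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so the supremum is attained on one side of a jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(lo, hi):
    """Build the CDF of the uniform distribution on [lo, hi] (hypothetical helper)."""
    def cdf(x):
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return cdf


# Data points taken from the question above.
data = [0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09]
d_n = ks_statistic(data, uniform_cdf(0.0, 0.5))
print(round(d_n, 4))  # 0.365
```

The resulting value of Dn is then compared with the tabulated Kolmogorov-Smirnov critical value for n = 8 at the chosen significance level; the hypothesis of uniformity is rejected when Dn exceeds that value.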
Final assessment
Section 1
- Provide a full example of two sequences (if the computation becomes too heavy, you may round to the first decimal digit). Compute their:
  - Covariance.
  - Pearson’s correlation coefficient.
  - Spearman’s rank correlation coefficient.
  - Kendall’s tau correlation coefficient.
- What is an empirical distribution?
- Present, prove, and discuss the evaluation of the asymptotic confidence interval for the empirical distribution, detailing the role of the binomial.
- Prove, under the simplified hypotheses, the distribution-free property of Dn.
- Write the Shannon Theorem and discuss its implications.
- Discuss how we could proceed to compute the confidence interval of the Kendall Tau correlation coefficient of the population.
- Suppose that you have the following datapoints: 0.4, 2, 0.6, 2.4, 2.2, 3.6, 3.8, 4; compute Dn to determine if they come from the uniform distribution [0,4].
- Prove that <math>\tilde{F}_{n}</math> is a consistent and unbiased estimator of F.
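For the first question in the list above, the four measures can be computed by hand on short sequences and then cross-checked in code. The following is a minimal Python sketch under stated assumptions: the two sequences `x` and `y` are made-up illustrative data, the function names are hypothetical, covariance uses the sample (n - 1) denominator, and Kendall’s coefficient is the tau-a variant, which assumes no ties.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def ranks(v):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Illustrative data only, not taken from the course.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(covariance(x, y))   # 2.0
print(pearson(x, y))      # approximately 0.8
print(spearman(x, y))     # approximately 0.8
print(kendall_tau(x, y))  # 0.6
```

In this example Pearson and Spearman coincide because both sequences are permutations of 1..5 and therefore equal their own ranks. Kendall’s tau is lower because 2 of the 10 pairs are discordant, giving (8 - 2) / 10.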
The retake exam
Section 1