MSc: Advanced Statistics

From IU
Revision as of 11:34, 29 August 2022 by R.sirgalina (talk | contribs)

Advanced Statistics

  • Course name: Advanced Statistics
  • Code discipline: DS-03
  • Subject area:

Short Description

This course covers the following concepts: statistical inference; nonparametric statistics; tests of statistical hypotheses; simple linear regression and correlation analysis; meta-analysis.

Prerequisites

Prerequisite subjects

  • CSE329 - Empirical Methods

Prerequisite topics

Course Topics

Course Sections and Topics
Section | Topics within the section
Sampling Distributions Associated with the Normal Population
  1. Kolmogorov-Smirnov test
  2. Size of samples, Kolmogorov-Smirnov, Fisher exact
  3. Logistic regression

Intended Learning Outcomes (ILOs)

What is the main purpose of this course?

The main purpose of this course is to present the fundamentals of inferential statistics to future software engineers and data scientists, on the one hand providing the scientific foundations of the discipline, and on the other anchoring the theoretical concepts in practices from the world of software development and engineering. The course covers the statistical analysis of data under limited assumptions about the distribution, with reference to testing hypotheses, measuring correlations, building samples, and performing regressions.

ILOs defined at three levels

Level 1: What concepts should a student know/remember/explain?

By the end of the course, the students should be able to ...

  • Remember the fundamentals of inferential statistics
  • Remember the specifics and purpose of different hypothesis tests
  • Distinguish between parametric and nonparametric tests

Level 2: What basic practical skills should a student be able to perform?

By the end of the course, the students should be able to ...

  • Apply the basic concepts of inferential statistics
  • Apply the fundamental laws of statistics
  • Formulate null and alternative hypotheses
  • Carry out the hypothesis test procedure

Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?

By the end of the course, the students should be able to ...

  • Understand the problems that arise when statistically analysing data that are not normally distributed
  • Know the more recent computationally intensive techniques that help to describe samples and to infer properties of populations in the absence of normality
  • Identify situations in which the data are on nominal scales, so that alternative techniques should be used, and act accordingly
  • Run experiments to evaluate hypotheses when data are scarce, not normally distributed, and measured on different kinds of scales

Grading

Course grading range

Grade | Range | Description of performance
A. Excellent | 95-100 | -
B. Good | 75-94 | -
C. Satisfactory | 55-74 | -
D. Poor | 0-54 | -

Course activities and grading breakdown

Activity Type | Percentage of the overall course grade
Weekly quizzes | 10
Midterm | 20
Final oral exam | 35
Final written exam | 30
Participation | 5
Weekly quizzes | 15
Weekly Projects Review | 15
Mid of Semester Project Review | 20
Final Report | 30
Final Presentation with Q&A | 20

Recommendations for students on how to succeed in the course

Resources, literature and reference materials

Open access resources

  • Wasserman, L. (2006). All of Nonparametric Statistics. Springer.
  • Randles, R.H. and Wolfe, D.A. (1991). Introduction to the Theory of Nonparametric Statistics. Melbourne: Robert Krieger. (Ch. 1-Ch. 4)
  • Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning, 2nd ed. Springer.
  • Hollander, M. and Wolfe, D.A. (1999). Nonparametric Statistical Methods, 2nd ed. New York: John Wiley.

Closed access resources

Software and tools used within the course

Teaching Methodology: Methods, techniques, & activities

Activities and Teaching Methods

Activities within each section
Learning Activities | Section 1
Testing (written or computer based) | 1
Discussions | 1

Formative Assessment and Course Activities

Ongoing performance assessment

Section 1

Activity Type | Content | Is Graded?
Question | Let X1, X2, ..., X10 be a random sample from a distribution whose probability density function is , otherwise 0). Based on the observed values 0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24, test the hypothesis H0: X ~ UNIF(0, 1) against H1: X ≁ UNIF(0, 1) at a significance level α = 0.1. | 1
Question | If X1, X2, ..., Xn is a random sample from a distribution with density function , otherwise 0), what is the maximum likelihood estimator of  ? | 1
Question | Let X1, X2, ..., Xn be a random sample of size n from a distribution with a probability density function otherwise 0), where is a parameter. Using the maximum likelihood method, find an estimator for the parameter . | 1
Question | Suppose you are told that the likelihood of at is given by 1/4. Is this the probability that  ? Explain why or why not. | 1
Question | If X1, X2, ..., Xn is a random sample from a distribution with density function otherwise 0), then what is the maximum likelihood estimator of  ? | 0
Question | Let X1, X2, ..., Xn be a random sample from a normal population with mean μ and variance σ². What are the maximum likelihood estimators of μ and σ²? | 0
Question | Suppose that you have the following data points: 0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09; compute Dn to determine whether they come from the uniform distribution on [0, 0.5]. | 0
Question | The data on the heights of 12 infants are given below: 18.2, 21.4, 22.6, 17.4, 17.6, 16.7, 17.1, 21.4, 20.1, 17.9, 16.8, 23.1. Test the hypothesis that the data came from some normal population at a significance level α = 0.1. | 0
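
As an illustration of the Dn computation asked for in the question on the uniform distribution on [0, 0.5], the following is a minimal Python sketch; it is not part of the original course material. The function names ks_statistic and uniform_cdf are hypothetical, and the sketch assumes the usual definition of Dn as the largest distance between the empirical distribution function and the hypothesised CDF, checked just before and at each sorted observation.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic Dn for a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Distance measured at each observation (empirical CDF has jumped to (i+1)/n).
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Distance measured just before each observation (empirical CDF is still i/n).
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def uniform_cdf(x, a=0.0, b=0.5):
    """CDF of the uniform distribution on [a, b] (bounds taken from the question)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Data points quoted in the question above.
data = [0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09]
d_n = ks_statistic(data, uniform_cdf)
print(round(d_n, 3))
```

The resulting value would then be compared with the tabulated Kolmogorov-Smirnov critical value for n = 8 at the chosen significance level.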

Final assessment

Section 1

  1. Provide a full example of two sequences (if the computation becomes burdensome, you may round to the first decimal digit). Compute their:
     • Covariance
     • Pearson's correlation coefficient
     • Spearman's rank correlation coefficient
     • Kendall's tau correlation coefficient
  2. What is an empirical distribution?
  3. Present, prove, and discuss the evaluation of the asymptotic confidence interval for the empirical distribution, detailing the role of the binomial.
  4. Prove, under the simplified hypotheses, the distribution-free property of Dn.
  5. Write the Shannon Theorem and discuss its implications.
  6. Discuss how we could proceed to compute the confidence interval of the Kendall tau correlation coefficient of the population.
  7. Suppose that you have the following data points: 0.4, 2, 0.6, 2.4, 2.2, 3.6, 3.8, 4; compute Dn to determine whether they come from the uniform distribution on [0, 4].
  8. Prove that the empirical distribution function is a consistent and unbiased estimator of F.
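
The four association measures named in the first item can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the two sequences x and y are made up for the example, all function names are hypothetical, the covariance uses the sample (n - 1) denominator, and the rank-based measures assume there are no ties.

```python
from itertools import combinations
from math import sqrt


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson(x, y):
    """Pearson's correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs (no ties)."""
    score = 0
    for (a1, b1), (a2, b2) in combinations(zip(x, y), 2):
        product = (a1 - a2) * (b1 - b2)
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)


# Made-up sequences, used only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(covariance(x, y), pearson(x, y), spearman(x, y), kendall_tau(x, y))
```

For these two sequences, 8 of the 10 pairs are concordant and 2 are discordant, so Kendall's tau is (8 - 2) / 10 = 0.6, while the Pearson and Spearman coefficients are both 0.8.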

The retake exam

Section 1