<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://eduwiki.innopolis.university/index.php?action=history&amp;feed=atom&amp;title=BSc%3AStatisticalTechniquesForDataScience.previous_version</id>
	<title>BSc:StatisticalTechniquesForDataScience.previous version - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://eduwiki.innopolis.university/index.php?action=history&amp;feed=atom&amp;title=BSc%3AStatisticalTechniquesForDataScience.previous_version"/>
	<link rel="alternate" type="text/html" href="https://eduwiki.innopolis.university/index.php?title=BSc:StatisticalTechniquesForDataScience.previous_version&amp;action=history"/>
	<updated>2026-05-07T15:38:33Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.36.1</generator>
	<entry>
		<id>https://eduwiki.innopolis.university/index.php?title=BSc:StatisticalTechniquesForDataScience.previous_version&amp;diff=6730&amp;oldid=prev</id>
		<title>M.petrishchev: Created page with &quot;= Statistical Techniques for Data Science =  * &lt;span&gt;'''Course name:'''&lt;/span&gt; Statistical Techniques for Data Science * &lt;span&gt;'''Course number:'''&lt;/span&gt; BS-STDS  == Course c...&quot;</title>
		<link rel="alternate" type="text/html" href="https://eduwiki.innopolis.university/index.php?title=BSc:StatisticalTechniquesForDataScience.previous_version&amp;diff=6730&amp;oldid=prev"/>
		<updated>2022-06-15T13:51:42Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;= Statistical Techniques for Data Science =  * &amp;lt;span&amp;gt;&amp;#039;&amp;#039;&amp;#039;Course name:&amp;#039;&amp;#039;&amp;#039;&amp;lt;/span&amp;gt; Statistical Techniques for Data Science * &amp;lt;span&amp;gt;&amp;#039;&amp;#039;&amp;#039;Course number:&amp;#039;&amp;#039;&amp;#039;&amp;lt;/span&amp;gt; BS-STDS  == Course c...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Statistical Techniques for Data Science =&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;span&amp;gt;'''Course name:'''&amp;lt;/span&amp;gt; Statistical Techniques for Data Science&lt;br /&gt;
* &amp;lt;span&amp;gt;'''Course number:'''&amp;lt;/span&amp;gt; BS-STDS&lt;br /&gt;
&lt;br /&gt;
== Course characteristics ==&lt;br /&gt;
&lt;br /&gt;
=== Key concepts of the class ===&lt;br /&gt;
&lt;br /&gt;
* Statistical Hypothesis Testing&lt;br /&gt;
* Resampling&lt;br /&gt;
* Statistical ML&lt;br /&gt;
* MCMC&lt;br /&gt;
&lt;br /&gt;
=== What is the purpose of this course? ===&lt;br /&gt;
&lt;br /&gt;
The course covers statistical techniques that go beyond the standard introductory curriculum and apply in a wide range of contexts, including non-parametric statistics, simulation methods, and time series analysis.&lt;br /&gt;
&lt;br /&gt;
This course will provide an opportunity for participants to learn: random variables, elementary probability, and distributions; relevant probabilistic inequalities; random vectors, marginal and joint distributions; sequences of random variables and modes of convergence; Markov chains; processes in continuous time; univariate and multivariate simulation methods; non-parametric and parametric resampling methods.&lt;br /&gt;
&lt;br /&gt;
=== Course Objectives Based on Bloom’s Taxonomy ===&lt;br /&gt;
&lt;br /&gt;
=== - What should a student remember at the end of the course? ===&lt;br /&gt;
&lt;br /&gt;
By the end of the course, the students should be able to recognize and define:&lt;br /&gt;
&lt;br /&gt;
* Estimation methods: point estimates, maximum likelihood estimation (MLE)&lt;br /&gt;
* Confidence interval, p-value&lt;br /&gt;
* Estimation and Non-parametric Tests. Kolmogorov-Smirnov Test&lt;br /&gt;
* Sampling. Metropolis-Hastings. Markov Chains. MCMC&lt;br /&gt;
&lt;br /&gt;
=== - What should a student be able to understand at the end of the course? ===&lt;br /&gt;
&lt;br /&gt;
By the end of the course, the students should be able to describe and explain (with examples):&lt;br /&gt;
&lt;br /&gt;
* Statistical hypothesis testing, p-values, the power of a test, and sample size&lt;br /&gt;
* ANOVA and chi-square tests&lt;br /&gt;
* Smoothing methods&lt;br /&gt;
&lt;br /&gt;
=== - What should a student be able to apply at the end of the course? ===&lt;br /&gt;
&lt;br /&gt;
By the end of the course, the students should be able to apply:&lt;br /&gt;
&lt;br /&gt;
* Non-parametric tests, such as the Kolmogorov-Smirnov (KS) test&lt;br /&gt;
* Resampling methods (jackknife, bootstrap)&lt;br /&gt;
* Markov chain Monte Carlo (MCMC) methods&lt;br /&gt;
&lt;br /&gt;
=== Course evaluation ===&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ Course grade breakdown&lt;br /&gt;
! Component&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Default points'''&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Proposed points'''&lt;br /&gt;
|-&lt;br /&gt;
| Labs/seminar classes&lt;br /&gt;
| 20&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 40&lt;br /&gt;
|-&lt;br /&gt;
| Interim performance assessment&lt;br /&gt;
| 30&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 30&lt;br /&gt;
|-&lt;br /&gt;
| Exams&lt;br /&gt;
| 50&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 30&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Course-specific features of students’ performance assessment: none.&lt;br /&gt;
&lt;br /&gt;
=== Grades range ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;tab:ModelsCourseGradingRange&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ Course grading range&lt;br /&gt;
! Grade&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Default range'''&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Proposed range'''&lt;br /&gt;
|-&lt;br /&gt;
| A. Excellent&lt;br /&gt;
| 90-100&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 80-100&lt;br /&gt;
|-&lt;br /&gt;
| B. Good&lt;br /&gt;
| 75-89&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 65-79&lt;br /&gt;
|-&lt;br /&gt;
| C. Satisfactory&lt;br /&gt;
| 60-74&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 50-64&lt;br /&gt;
|-&lt;br /&gt;
| D. Poor&lt;br /&gt;
| 0-59&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 0-49&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
Course-specific grading features: the semester starts with the range proposed in Table [[#tab:ModelsCourseGradingRange|1]], but the thresholds may change slightly (usually downward) depending on how the semester progresses.&lt;br /&gt;
&lt;br /&gt;
=== Resources and reference material ===&lt;br /&gt;
&lt;br /&gt;
* K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012, 1067 p.&lt;br /&gt;
* C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006, 738 p.&lt;br /&gt;
* M. Ross. Introduction to Statistics. Prentice Hall, 1989.&lt;br /&gt;
* B. Efron, R. J. Tibshirani. An Introduction to the Bootstrap. Springer, 1993.&lt;br /&gt;
* G. Casella, R. L. Berger. Statistical Inference. Thomson Press, 2006.&lt;br /&gt;
* S. Højsgaard, D. Edwards, S. Lauritzen. Graphical Models with R. Springer, 2012.&lt;br /&gt;
* T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning, 2nd ed. Springer, 2008.&lt;br /&gt;
* S. M. Kay. Fundamentals of Statistical Signal Processing, Vol. 1: Estimation Theory. Prentice Hall, 1993.&lt;br /&gt;
&lt;br /&gt;
== Course Sections ==&lt;br /&gt;
&lt;br /&gt;
The main sections of the course and the approximate distribution of hours between them are as follows:&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ Course Sections&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Section'''&lt;br /&gt;
! '''Section Title'''&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Teaching Hours'''&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 1&lt;br /&gt;
| Parametric Statistics&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 42&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 2&lt;br /&gt;
| Non-parametric Statistics&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 24&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 3&lt;br /&gt;
| Sampling and Simulation&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| 24&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Section 1 ===&lt;br /&gt;
&lt;br /&gt;
=== Section title: ===&lt;br /&gt;
&lt;br /&gt;
Parametric Statistics&lt;br /&gt;
&lt;br /&gt;
=== Topics covered in this section: ===&lt;br /&gt;
&lt;br /&gt;
* Review of Probability Theory. Random variables. Density. Distributions. Expected value&lt;br /&gt;
* Exploring the Data Distributions. Multivariate distributions. Plots&lt;br /&gt;
* Data and Sampling Distributions. Standard error. CLT&lt;br /&gt;
* Experiment Design. Confidence intervals. Introduction to Hypothesis Testing&lt;br /&gt;
* A/B Testing, t-test, ANOVA, Chi-square tests&lt;br /&gt;
&lt;br /&gt;
=== What forms of evaluation were used to test students’ performance in this section? ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;tabular&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Form of evaluation&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Yes/No'''&lt;br /&gt;
|-&lt;br /&gt;
| Development of individual parts of software product code&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Homework and group projects&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Midterm evaluation&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Testing (written or computer based)&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Reports&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Essays&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Oral polls&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Discussions&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
=== Typical questions for ongoing performance evaluation within this section ===&lt;br /&gt;
&lt;br /&gt;
# What is the Central Limit Theorem?&lt;br /&gt;
# What is a statistic?&lt;br /&gt;
# What is a sampling distribution?&lt;br /&gt;
# What is the standard error?&lt;br /&gt;
# What are Type I and Type II errors?&lt;br /&gt;
# What is the t-statistic? What is a t-test?&lt;br /&gt;
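&lt;br /&gt;
As an illustration of the t-statistic, the sketch below computes Welch's two-sample t-statistic by hand (the difference of sample means divided by its estimated standard error) and checks it against SciPy's scipy.stats.ttest_ind with equal_var=False. The two samples are hypothetical NumPy draws, and the helper welch_t is an illustrative name, not part of the course materials.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np
from scipy import stats


def welch_t(x, y):
    """Welch's t-statistic: difference of sample means over its estimated standard error."""
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    return (x.mean() - y.mean()) / se


# Hypothetical measurements from two groups (e.g., variants A and B of an experiment).
rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=40)
b = rng.normal(loc=11.0, scale=2.5, size=35)

t_manual = welch_t(a, b)
res = stats.ttest_ind(a, b, equal_var=False)  # two-sided Welch t-test
print(f"t by hand = {t_manual:.4f}, t from SciPy = {res.statistic:.4f}, p-value = {res.pvalue:.4f}")
```
&lt;br /&gt;
A p-value below the chosen significance level leads to rejecting the null hypothesis of equal means; rejecting a true null hypothesis is a Type I error, and failing to reject a false one is a Type II error.&lt;br /&gt;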
&lt;br /&gt;
=== Typical questions for seminar classes (labs) within this section ===&lt;br /&gt;
&lt;br /&gt;
# Create a bimodal dataset whose mean is less than its median, and draw a histogram.&lt;br /&gt;
# Poisson distribution in practice: searching for patterns of palindromes in DNA.&lt;br /&gt;
# Experiments and A/B testing.&lt;br /&gt;
# A researcher claims that Democrats will win the next election. Of 4300 polled voters, 2200 said they would vote Democrat. State the null and alternative hypotheses and decide whether to reject the null hypothesis. Is there enough evidence at ''alpha'' = 0.05 to support the claim?&lt;br /&gt;
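&lt;br /&gt;
A worked version of the polling question, assuming H0: p = 0.5 against the one-sided alternative H1: p greater than 0.5 (the claim that Democrats win a majority) and the normal approximation to the binomial. The helper one_proportion_z_test is a hypothetical name used only for this sketch, which relies on the Python standard library alone.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def one_proportion_z_test(successes, n, p0):
    """One-sided z-test of H0: p = p0 against H1: p > p0 (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error of p_hat under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal Z
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(2200, 4300, 0.5)
alpha = 0.05
print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if alpha > p_value else "fail to reject H0")
```
&lt;br /&gt;
With these numbers p_hat is about 0.512, z is about 1.52, and the one-sided p-value is about 0.064. Since that exceeds ''alpha'' = 0.05, under these assumptions the null hypothesis is not rejected and the poll does not provide enough evidence for the claim.&lt;br /&gt;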
&lt;br /&gt;
=== Test questions for final assessment in this section ===&lt;br /&gt;
&lt;br /&gt;
# Prove Chebyshev's inequality.&lt;br /&gt;
# Prove Markov's inequality.&lt;br /&gt;
# What is ANOVA, and how does it differ from a chi-square test?&lt;br /&gt;
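&lt;br /&gt;
One standard route to both proofs, sketched under the assumption that X has a finite mean (and, for Chebyshev's inequality, a finite variance σ² greater than 0): Markov's inequality follows from a pointwise bound on a nonnegative variable, and Chebyshev's inequality follows by applying Markov's inequality to (X - μ)².&lt;br /&gt;
&lt;br /&gt;
```latex
% Markov's inequality: X \ge 0 almost surely, a > 0.
X \ge a \, \mathbf{1}\{X \ge a\}
\;\Longrightarrow\;
\mathbb{E}[X] \ge a \, \mathbb{E}\big[\mathbf{1}\{X \ge a\}\big] = a \Pr(X \ge a)
\;\Longrightarrow\;
\Pr(X \ge a) \le \frac{\mathbb{E}[X]}{a}.

% Chebyshev's inequality: \mu = E[X], \sigma^2 = Var(X) > 0, k > 0.
% Apply Markov's inequality to Y = (X - \mu)^2 \ge 0 with a = k^2 \sigma^2:
\Pr\big(|X - \mu| \ge k\sigma\big)
= \Pr\big((X - \mu)^2 \ge k^2\sigma^2\big)
\le \frac{\mathbb{E}\big[(X - \mu)^2\big]}{k^2\sigma^2}
= \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.
```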
&lt;br /&gt;
=== Section 2 ===&lt;br /&gt;
&lt;br /&gt;
=== Section title: ===&lt;br /&gt;
&lt;br /&gt;
Non-parametric Statistics&lt;br /&gt;
&lt;br /&gt;
=== Topics covered in this section: ===&lt;br /&gt;
&lt;br /&gt;
* Empirical CDF. Resampling. Jackknife and Bootstrap&lt;br /&gt;
* Density Estimation&lt;br /&gt;
* Estimation and Non-parametric Tests. KS Test&lt;br /&gt;
* Non-parametric Tests. Kruskal-Wallis Test. Multi-armed Bandits&lt;br /&gt;
&lt;br /&gt;
=== What forms of evaluation were used to test students’ performance in this section? ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;tabular&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Form of evaluation&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Yes/No'''&lt;br /&gt;
|-&lt;br /&gt;
| Development of individual parts of software product code&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Homework and group projects&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Midterm evaluation&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Testing (written or computer based)&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Reports&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Essays&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Oral polls&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Discussions&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
=== Typical questions for ongoing performance evaluation within this section ===&lt;br /&gt;
&lt;br /&gt;
# What is the empirical CDF?&lt;br /&gt;
# How is resampling applied? How do the jackknife and the bootstrap work?&lt;br /&gt;
# What is kernel density estimation?&lt;br /&gt;
# What is smoothing?&lt;br /&gt;
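&lt;br /&gt;
A minimal NumPy sketch of both resampling schemes, used here to estimate the standard error of a statistic. The data are a hypothetical exponential sample, and the helper names jackknife_se and bootstrap_se, the number of bootstrap replicates (2000), and the seeds are illustrative assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def jackknife_se(x, stat):
    """Jackknife standard error: recompute stat with each observation left out in turn."""
    x = np.asarray(x, dtype=float)
    n = x.size
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])  # leave-one-out replicates
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))


def bootstrap_se(x, stat, n_boot=2000, seed=0):
    """Bootstrap standard error: recompute stat on resamples of x drawn with replacement."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    reps = np.array([stat(rng.choice(x, size=x.size, replace=True)) for _ in range(n_boot)])
    return reps.std(ddof=1)


rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=50)  # hypothetical skewed sample

print("formula s / sqrt(n):   ", data.std(ddof=1) / np.sqrt(data.size))
print("jackknife SE of mean:  ", jackknife_se(data, np.mean))
print("bootstrap SE of mean:  ", bootstrap_se(data, np.mean))
print("bootstrap SE of median:", bootstrap_se(data, np.median))
```
&lt;br /&gt;
For the sample mean the jackknife estimate coincides exactly with the formula s/√n, which makes a convenient sanity check; the bootstrap estimate is close to it but random. The same helpers apply to statistics without a simple standard-error formula, such as the median.&lt;br /&gt;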
&lt;br /&gt;
=== Typical questions for seminar classes (labs) within this section ===&lt;br /&gt;
&lt;br /&gt;
# Implement Kernel Density Estimation.&lt;br /&gt;
# Apply the KS test.&lt;br /&gt;
# Apply the Kruskal-Wallis test.&lt;br /&gt;
# Implement multi-armed bandit algorithms.&lt;br /&gt;
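&lt;br /&gt;
For the kernel density estimation lab, a sketch of a Gaussian-kernel estimator: the estimate at a point x is (1/(nh)) times the sum over observations x_i of K((x - x_i)/h), with K the standard normal density and h the bandwidth. The default bandwidth, the normal reference rule h = 1.06 σ̂ n^(-1/5), is one common choice among several; the function name kde_gaussian and the bimodal sample are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def kde_gaussian(samples, grid, bandwidth=None):
    """Gaussian-kernel density estimate of samples, evaluated at each point of grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal reference rule of thumb: h = 1.06 * sd * n^(-1/5) (an assumed default).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Scaled distances between every grid point (rows) and every sample (columns).
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # standard normal density
    return kernel.sum(axis=1) / (n * bandwidth)


# Hypothetical bimodal sample: a 50/50 mixture of N(-2, 1) and N(2, 1).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])
grid = np.linspace(-8.0, 8.0, 1601)
density = kde_gaussian(data, grid)
dx = grid[1] - grid[0]
print("integral of the estimate:", density.sum() * dx)
```
&lt;br /&gt;
Checking that the estimate integrates to approximately 1 on a wide grid is a quick correctness test; SciPy's scipy.stats.gaussian_kde can serve as a reference implementation for comparison.&lt;br /&gt;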
&lt;br /&gt;
=== Test questions for final assessment in this section ===&lt;br /&gt;
&lt;br /&gt;
# What is the epsilon-greedy algorithm?&lt;br /&gt;
# Perform a one-sample KS test in Python with SciPy. Compare the KS test to visual approaches for checking normality assumptions.&lt;br /&gt;
# Plot the CDF and the ECDF to visualize the parametric and empirical cumulative distribution functions.&lt;br /&gt;
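&lt;br /&gt;
A sketch of the one-sample KS test with SciPy's scipy.stats.kstest against a standard normal, alongside a hand-computed ECDF so that the statistic D (the largest vertical gap between the ECDF and the hypothesised CDF) can be checked. The sample is hypothetical. If the normal's mean and standard deviation were instead estimated from the same sample, the standard KS p-value would no longer be exact (the Lilliefors variant addresses this).&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.sort(rng.normal(loc=0.0, scale=1.0, size=200))  # hypothetical sample, sorted

n = x.size
ecdf_hi = np.arange(1, n + 1) / n  # ECDF at each sorted observation, i / n
ecdf_lo = np.arange(0, n) / n      # ECDF just before each observation, (i - 1) / n
cdf = stats.norm.cdf(x)            # hypothesised N(0, 1) CDF at the same points

# KS statistic D: largest vertical gap between the ECDF and the reference CDF.
d_manual = max(np.max(ecdf_hi - cdf), np.max(cdf - ecdf_lo))

res = stats.kstest(x, "norm")  # one-sample, two-sided KS test against N(0, 1)
print(f"D by hand = {d_manual:.4f}, D from SciPy = {res.statistic:.4f}, p-value = {res.pvalue:.4f}")
```
&lt;br /&gt;
For the plotting question, x and ecdf_hi can be drawn as a step function (for example with Matplotlib's step(..., where="post")) next to the curve cdf, which makes the gap measured by D visible.&lt;br /&gt;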
&lt;br /&gt;
=== Section 3 ===&lt;br /&gt;
&lt;br /&gt;
=== Section title: ===&lt;br /&gt;
&lt;br /&gt;
Sampling and Simulation&lt;br /&gt;
&lt;br /&gt;
=== Topics covered in this section: ===&lt;br /&gt;
&lt;br /&gt;
* Sampling. Metropolis-Hastings&lt;br /&gt;
* Rejection Sampling. Gibbs Sampling&lt;br /&gt;
* Thompson Sampling. Upper Confidence Bound (UCB)&lt;br /&gt;
* Markov Chains. MCMC&lt;br /&gt;
* Time Series: Tools and Applications&lt;br /&gt;
&lt;br /&gt;
=== What forms of evaluation were used to test students’ performance in this section? ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;tabular&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
! Form of evaluation&lt;br /&gt;
!align=&amp;quot;center&amp;quot;| '''Yes/No'''&lt;br /&gt;
|-&lt;br /&gt;
| Development of individual parts of software product code&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Homework and group projects&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Midterm evaluation&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Testing (written or computer based)&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Reports&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Essays&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Oral polls&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| No&lt;br /&gt;
|-&lt;br /&gt;
| Discussions&lt;br /&gt;
|align=&amp;quot;center&amp;quot;| Yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
=== Typical questions for ongoing performance evaluation within this section ===&lt;br /&gt;
&lt;br /&gt;
# What is Thompson Sampling?&lt;br /&gt;
# What is the upper confidence bound (UCB) algorithm?&lt;br /&gt;
# What is a stationary distribution?&lt;br /&gt;
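&lt;br /&gt;
For stationary distributions and regular chains, a NumPy sketch that finds the smallest power n for which every entry of the n-step transition matrix is strictly positive, and computes the limiting vector w as the normalized left eigenvector of P for eigenvalue 1 (so that wP = w). The 3-state matrix P is a hypothetical example, and max_power is an arbitrary search cap.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np

# Hypothetical 3-state transition matrix; each row sums to 1.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])


def is_regular(P, max_power=50):
    """Return the smallest n such that all entries of P^n are strictly positive, else None."""
    Pn = np.eye(len(P))
    for n in range(1, max_power + 1):
        Pn = Pn @ P
        if np.all(Pn > 0):
            return n
    return None


def stationary_distribution(P):
    """Solve w P = w with sum(w) = 1 using the left eigenvector of P for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)  # right eigenvectors of P.T are left ones of P
    k = np.argmin(np.abs(eigvals - 1.0))
    w = np.real(eigvecs[:, k])
    return w / w.sum()  # normalize (also fixes an arbitrary sign)


print("regular with n =", is_regular(P))
w = stationary_distribution(P)
print("w =", w)
print("P^50 =", np.linalg.matrix_power(P, 50))
```
&lt;br /&gt;
For this P the chain is regular with n = 2 and w = (0.25, 0.5, 0.25); every row of a high power of P approaches w, which is what the limiting probability vector of a regular chain means.&lt;br /&gt;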
&lt;br /&gt;
=== Typical questions for seminar classes (labs) within this section ===&lt;br /&gt;
&lt;br /&gt;
# Given a density function, implement accept-reject sampling.&lt;br /&gt;
# Run Metropolis-Hastings and accept-reject sampling on the same f(x) with n = 1000, 10000, and 100000 samples. Compare the results.&lt;br /&gt;
# Apply Gibbs sampling.&lt;br /&gt;
# Apply tools for time series analysis and prediction.&lt;br /&gt;
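&lt;br /&gt;
A sketch comparing the two samplers on the same target, assuming f(x) = 30x(1 - x)⁴ on [0, 1] (the Beta(2, 5) density, whose mean is 2/7). Accept-reject uses a Uniform(0, 1) proposal and the envelope constant M = 2.5, which bounds f (its maximum is about 2.46 at x = 0.2); Metropolis-Hastings uses a Gaussian random-walk proposal. The target, M, the step size 0.2, the burn-in of 1000 and the seed are all illustrative choices.&lt;br /&gt;
&lt;br /&gt;
```python
import numpy as np


def f(x):
    """Target density 30 x (1 - x)^4 on [0, 1] (Beta(2, 5)); zero outside [0, 1]."""
    inside = (x >= 0.0) * (1.0 >= x)  # 1 inside the support, 0 outside
    return 30.0 * x * (1.0 - x) ** 4 * inside


def accept_reject(n, rng, M=2.5):
    """Accept-reject sampling with proposal g = Uniform(0, 1); M must bound f / g."""
    samples = np.empty(0)
    while n > samples.size:
        x = rng.uniform(0.0, 1.0, size=2 * n)  # candidate draws from g
        u = rng.uniform(0.0, 1.0, size=2 * n)
        keep = f(x) >= u * M  # accept with probability f(x) / (M g(x))
        samples = np.concatenate([samples, x[keep]])
    return samples[:n]


def metropolis_hastings(n, rng, step=0.2, x0=0.5, burn_in=1000):
    """Random-walk Metropolis-Hastings targeting f; returns n draws after burn-in."""
    total = n + burn_in
    noise = rng.normal(0.0, step, size=total)  # symmetric proposal increments
    u = rng.uniform(0.0, 1.0, size=total)
    chain = np.empty(total)
    x, fx = x0, f(x0)
    for i in range(total):
        prop = x + noise[i]
        f_prop = f(prop)
        # Symmetric proposal: accept with probability min(1, f(prop) / f(x)).
        if f_prop > u[i] * fx:
            x, fx = prop, f_prop
        chain[i] = x
    return chain[burn_in:]


rng = np.random.default_rng(0)
for n in (1000, 10000, 100000):
    ar = accept_reject(n, rng)
    mh = metropolis_hastings(n, rng)
    print(f"n={n:6d}  AR mean={ar.mean():.4f}  MH mean={mh.mean():.4f}  exact={2 / 7:.4f}")
```
&lt;br /&gt;
Accept-reject returns independent draws but discards proposals (the acceptance rate is about 1/M = 0.4 here), while Metropolis-Hastings needs only an unnormalized f but produces correlated draws, so its estimates typically converge more slowly for the same n.&lt;br /&gt;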
&lt;br /&gt;
=== Test questions for final assessment in this section ===&lt;br /&gt;
&lt;br /&gt;
# Consider a transition matrix of a Markov chain (MC).&lt;br /&gt;
#* Show that this is a regular MC (an MC is called regular if, for some integer n, all entries of the n-step transition matrix are strictly positive).&lt;br /&gt;
#* Find the limiting probability vector w.&lt;br /&gt;
# Compare Gibbs sampling to Metropolis-Hastings.&lt;/div&gt;</summary>
		<author><name>M.petrishchev</name></author>
	</entry>
</feed>