IU:TestPage
Revision as of 15:36, 9 November 2022 by R.sirgalina
Information Retrieval
- Course name: Information Retrieval
- Discipline code: XYZ
- Subject area: Data Science; Computer systems organization; Information systems; Real-time systems; Information retrieval; World Wide Web
Short Description
This course covers the following concepts: Indexing; Relevance; Ranking; Information retrieval; Query.
Prerequisites
Prerequisite subjects
- CSE204 — Analytic Geometry And Linear Algebra II: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrix, stability of solution (decomposition), vector spaces, metric spaces, manifold, eigenvector and eigenvalue.
- CSE113 — Philosophy I - (Discrete Math and Logic): graphs, trees, binary trees, balanced trees, metric (proximity) graphs, diameter, clique, path, shortest path.
- CSE206 — Probability And Statistics: probability, likelihood, conditional probability, Bayesian rule, stochastic matrix and properties. Analysis: DFT, [discrete] gradient.
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
Information retrieval basics |
|
Text processing and indexing |
|
Vector model and vector indexing |
|
Advanced topics. Media processing |
|
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should be able to ...
- Explain the essential parts of search engines and recommender systems,
- Describe quality metrics of information retrieval systems,
- Describe contemporary approaches to semantic data analysis,
- Explain indexing strategies.
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to ...
- Design a recommender system from scratch,
- Evaluate the quality of a particular information retrieval system,
- Apply the core ideas of system implementation and maintenance,
- Identify and fix problems in an information retrieval system.
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- Implement a proper index for an unstructured dataset,
- Plan quality measures for a new recommender service,
- Run initial data analysis and problem evaluation for a business task related to information retrieval.
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 84-100 | - |
B. Good | 72-83 | - |
C. Satisfactory | 60-71 | - |
D. Poor | 0-59 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Labs/seminar classes | 35 |
Interim performance assessment | 70 |
Exams | 0 |
Recommendations for students on how to succeed in the course
Resources, literature and reference materials
Open access resources
- Manning, Raghavan, Schütze, Introduction to Information Retrieval, 2008, Cambridge University Press
- Baeza-Yates, Ribeiro-Neto, Modern Information Retrieval, 2011, Addison-Wesley
- Büttcher, Clarke, Cormack, Information Retrieval: Implementing and Evaluating Search Engines, 2010, MIT Press
- Course repository on GitHub.
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 | Section 2 | Section 3 | Section 4 |
---|---|---|---|---|
Development of individual parts of software product code | 1 | 1 | 1 | 1 |
Homework and group projects | 1 | 1 | 1 | 1 |
Testing (written or computer based) | 1 | 1 | 1 | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Enumerate limitations for web crawling. | 1 |
Question | Propose a strategy for A/B testing. | 1 |
Question | Propose a recommender quality metric. | 1 |
Question | Implement the DCG metric. | 1 |
Question | Discuss a relevance metric. | 1 |
Question | Crawl a website while respecting robots.txt. | 1 |
Question | What is a typical IR system architecture? | 0 |
Question | Show how to parse a dynamic web page. | 0 |
Question | Provide a framework to accept/reject A/B testing results. | 0 |
Question | Compute DCG for an example query for a random search engine. | 0 |
Question | Implement a metric for a recommender system. | 0 |
Question | Implement pFound. | 0 |
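
As an illustration of the DCG tasks above, here is a minimal sketch in Python. It assumes the common formulation in which the relevance at rank i is divided by log2(i + 1). Some sources use the gain 2^rel - 1 instead, so check which variant the course expects. The function names `dcg` and `ndcg` are illustrative, not taken from the course materials.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain for graded relevances given in ranked order."""
    if k is not None:
        relevances = relevances[:k]
    # The 0-based position i is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, `dcg([3, 2, 3, 0, 1, 2])` scores a result list whose first document has relevance 3, the second 2, and so on. `ndcg` rescales that score to the range from 0 to 1, so that queries with different numbers of relevant documents can be compared.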
Section 2
Activity Type | Content | Is Graded? |
---|---|---|
Question | Build an inverted index for a text. | 1 |
Question | Tokenize a text. | 1 |
Question | Implement a simple spellchecker. | 1 |
Question | Implement wildcard search. | 1 |
Question | Build an inverted index for a set of web pages. | 0 |
Question | Build a distribution of stems/lexemes for a text. | 0 |
Question | Choose and implement a case-insensitive index for a given text collection. | 0 |
Question | Choose and implement a semantic vector-based index for a given text collection. | 0 |
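
The tokenization and inverted-index tasks above can be sketched as follows. This is a minimal example, assuming a regex tokenizer that lowercases the text (which also makes the index case-insensitive) and an in-memory dictionary of postings lists. The names `tokenize`, `build_inverted_index` and `search_and` are hypothetical helpers, not course code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_and(index, query):
    """Return ids of documents that contain every term of the query."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

A production index would usually add stemming or lemmatization, store term positions for phrase queries, and keep postings compressed on disk. None of that is shown here.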
Section 3
Activity Type | Content | Is Graded? |
---|---|---|
Question | Embed the text with an ML model. | 1 |
Question | Build a term-document matrix. | 1 |
Question | Build a semantic index for a dataset using Annoy. | 1 |
Question | Build a kd-tree index for a given dataset. | 1 |
Question | Why do kd-trees perform poorly in a 100-dimensional space? | 1 |
Question | What is the difference between a metric space and a vector space? | 1 |
Question | Choose and implement a persistent index for a given text collection. | 0 |
Question | Visualize a dataset for text classification. | 0 |
Question | Build an (H)NSW index for a dataset. | 0 |
Question | Compare HNSW to the Annoy index. | 0 |
Question | Which metric space index structures do you know? | 0 |
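
To make the kd-tree tasks above concrete, here is a small pure-Python sketch of building a kd-tree and running a nearest-neighbour query with squared Euclidean distance. It assumes points are tuples of equal length and splits at the median along axes in round-robin order. The names `Node`, `build_kdtree`, `sq_dist` and `nearest` are illustrative.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])


def build_kdtree(points, depth=0):
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best
```

The pruning test in `nearest` is what makes the search fast in low dimensions. In high-dimensional data the distance to a splitting plane is usually smaller than the distance to the best match found so far, so few branches are pruned and the search approaches a full scan.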
Section 4
Activity Type | Content | Is Graded? |
---|---|---|
Question | Extract semantic information from images. | 1 |
Question | Build an image hash. | 1 |
Question | Build a spectral representation of a song. | 1 |
Question | What is relevance feedback? | 1 |
Question | Build a "search by color" feature. | 0 |
Question | Extract scenes from video. | 0 |
Question | Write a voice-controlled search. | 0 |
Question | Implement semantic search within an unlabelled image dataset. | 0 |
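
One simple way to approach the image hash task above is an average hash, sketched here. The example assumes the image is already decoded into a 2D list of grayscale values with at least `hash_size` rows and columns. Loading and decoding image files is out of scope, and `average_hash` and `hamming` are hypothetical helper names.

```python
def average_hash(pixels, hash_size=8):
    """Average hash: one bit per grid cell, set when the cell is brighter than the mean."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0, r1 = r * h // hash_size, (r + 1) * h // hash_size
        for c in range(hash_size):
            c0, c1 = c * w // hash_size, (c + 1) * w // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two images are then treated as near-duplicates when the Hamming distance between their hashes is small. The threshold is application-specific and has to be tuned on real data.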
Final assessment
Section 1
- Implement a text crawler for a news site.
- What is SBS (side-by-side) and how is it used in search engines?
- Compare pFound with CTR and with DCG.
- Explain how A/B testing works.
- Describe the PageRank algorithm.
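For the PageRank question above, a compact power-iteration sketch may help. It assumes the link graph is given as a dictionary from a page to the list of pages it links to, uses the conventional damping factor of 0.85, and spreads the rank of pages without out-links uniformly over all pages. The function name and parameters are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a link graph {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is shared by all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank
```

The returned ranks sum to 1, so each value can be read as the probability that a random surfer is on that page.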
Section 2
- Explain how (and why) KD-trees work.
- What are the weaknesses of an inverted index?
- Compare different text vectorization approaches.
- Compare tolerant retrieval to spellchecking.
Section 3
- Compare an inverted index to HNSW in terms of speed and memory consumption.
- Choose the best index for a given dataset.
- Implement range search in KD-tree.
Section 4
- What are the approaches to image understanding?
- How can a video be clustered into scenes and shots?
- How does speech-to-text technology work?
The retake exam
Section 1
Section 2
Section 3
Section 4