
= Information Retrieval =

* '''Course name''': Information Retrieval
* '''Code discipline''': XYZ
* '''Subject area''': Data Science; Computer systems organization; Information systems; Real-time systems; Information retrieval; World Wide Web

== Short Description ==

== Prerequisites ==

=== Prerequisite subjects ===

* CSE204 — Analytic Geometry And Linear Algebra II: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrices, stability of solutions (decomposition), vector spaces, metric spaces, manifolds, eigenvectors and eigenvalues.
* CSE113 — Philosophy I (Discrete Math and Logic): graphs, trees, binary trees, balanced trees, metric (proximity) graphs, diameter, clique, path, shortest path.

=== Prerequisite topics ===

== Course Topics ==

{| class="wikitable"
|+ Course Sections and Topics
|-
! Section !! Topics within the section
|-
| Information retrieval basics ||
# Introduction to IR, major concepts.
# Crawling and the Web.
# Quality assessment.
|-
| Text processing and indexing ||
# Building an inverted index for text documents. Boolean retrieval model.
# Language, tokenization, stemming, searching, scoring.
# Spellchecking and wildcard search.
# Suggest and query expansion.
# Language modelling. Topic modelling.
|-
| Vector model and vector indexing ||
# Vector model.
# Machine learning for vector embeddings.
# Vector-based index structures.
|-
| Advanced topics. Media processing ||
# Image and video processing, understanding and indexing.
# Content-based image retrieval.
# Audio retrieval.
# Hum to search.
# Relevance feedback.
|}

== Intended Learning Outcomes (ILOs) ==

=== What is the main purpose of this course? ===

=== ILOs defined at three levels ===

==== Level 1: What concepts should a student know/remember/explain? ====

By the end of the course, the students should be able to ...

* Name the essential parts of a search engine and a recommender system,
* Describe the quality metrics of information retrieval systems,
* Explain contemporary approaches to semantic data analysis,
* List indexing strategies.

==== Level 2: What basic practical skills should a student be able to perform? ====

By the end of the course, the students should be able to ...

* Design a recommender system from scratch,
* Evaluate the quality of a particular information retrieval system,
* Implement and maintain the core of an information retrieval system,
* Identify and fix problems in an information retrieval system.

==== Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios? ====

By the end of the course, the students should be able to ...

* Implement a proper index for an unstructured dataset,
* Plan quality measures for a new recommender service,
* Run initial data analysis and problem evaluation for a business task related to information retrieval.

== Grading ==

=== Course grading range ===

{| class="wikitable"
|-
! Grade !! Range !! Description of performance
|-
| A. Excellent || 84-100 || -
|-
| B. Good || 72-83 || -
|-
| C. Satisfactory || 60-71 || -
|-
| D. Poor || 0-59 || -
|}

=== Course activities and grading breakdown ===

{| class="wikitable"
|-
! Activity Type !! Percentage of the overall course grade
|-
| Labs/seminar classes || 35
|-
| Interim performance assessment || 70
|-
| Exams || 0
|}

=== Recommendations for students on how to succeed in the course ===

== Resources, literature and reference materials ==

=== Open access resources ===

* Manning, C. D., Raghavan, P., & Schütze, H. ''An Introduction to Information Retrieval''. Cambridge University Press, 2008.
* Baeza-Yates, R., & Ribeiro-Neto, B. ''Modern Information Retrieval''. Addison-Wesley, 2011.
* Büttcher, S., Clarke, C. L. A., & Cormack, G. V. ''Information Retrieval: Implementing and Evaluating Search Engines''. MIT Press, 2010.

=== Closed access resources ===

=== Software and tools used within the course ===

= Teaching Methodology: Methods, techniques, & activities =

== Activities and Teaching Methods ==

{| class="wikitable"
|+ Activities within each section
|-
! Learning Activities !! Section 1 !! Section 2 !! Section 3 !! Section 4
|-
| Development of individual parts of software product code || 1 || 1 || 1 || 1
|-
| Homework and group projects || 1 || 1 || 1 || 1
|-
| Testing (written or computer based) || 1 || 1 || 1 || 1
|}

== Formative Assessment and Course Activities ==

=== Ongoing performance assessment ===

==== Section 1 ====

{| class="wikitable"
|-
! Activity Type !! Content !! Is Graded?
|-
| Question || Enumerate the limitations of web crawling. || 1
|-
| Question || Propose a strategy for A/B testing. || 1
|-
| Question || Propose a quality metric for a recommender system. || 1
|-
| Question || Implement the DCG metric. || 1
|-
| Question || Discuss a relevance metric. || 1
|-
| Question || Crawl a website with respect to robots.txt. || 1
|-
| Question || What is a typical IR system architecture? || 0
|-
| Question || Show how to parse a dynamic web page. || 0
|-
| Question || Provide a framework to accept/reject A/B testing results. || 0
|-
| Question || Compute DCG for an example query on an arbitrary search engine. || 0
|-
| Question || Implement a metric for a recommender system. || 0
|-
| Question || Implement pFound. || 0
|}
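The crawling exercises above (limitations of web crawling, respecting robots.txt) can be prototyped with the Python standard library alone. Below is a minimal sketch of a polite breadth-first crawler; the start URL and user-agent string are placeholders, and link extraction is left as an exercise.

<syntaxhighlight lang="python">
import time
import urllib.robotparser
import urllib.request
from urllib.parse import urljoin

START = "https://example.com/"   # placeholder start page
AGENT = "IRCourseBot"            # hypothetical user-agent string

# Parse robots.txt once for the host and reuse the parser.
robots = urllib.robotparser.RobotFileParser()
robots.set_url(urljoin(START, "/robots.txt"))
robots.read()

frontier, seen = [START], {START}
while frontier:
    url = frontier.pop(0)                 # breadth-first order
    if not robots.can_fetch(AGENT, url):
        continue                          # disallowed by robots.txt
    req = urllib.request.Request(url, headers={"User-Agent": AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        page = resp.read().decode("utf-8", errors="ignore")
    # ... extract links from `page` here and append unseen ones to `frontier`
    time.sleep(1.0)                       # politeness delay between requests
</syntaxhighlight>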

==== Section 2 ====

{| class="wikitable"
|-
! Activity Type !! Content !! Is Graded?
|-
| Question || Build an inverted index for a text. || 1
|-
| Question || Tokenize a text. || 1
|-
| Question || Implement a simple spellchecker. || 1
|-
| Question || Implement wildcard search. || 1
|-
| Question || Build an inverted index for a set of web pages. || 0
|-
| Question || Build a distribution of stems/lexemes for a text. || 0
|-
| Question || Choose and implement a case-insensitive index for a given text collection. || 0
|-
| Question || Choose and implement a semantic vector-based index for a given text collection. || 0
|}
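As a starting point for the first two exercises above, here is a minimal sketch of tokenization plus an inverted index with Boolean AND retrieval; the tokenizer is deliberately naive, and the three sample documents are made up.

<syntaxhighlight lang="python">
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # naive tokenizer: lowercase alphanumeric runs
    return re.findall(r"\w+", text.lower())

docs = {                                  # toy collection (made-up documents)
    1: "Information retrieval is the science of searching.",
    2: "An inverted index maps terms to documents.",
    3: "Search engines build inverted indexes for retrieval.",
}

# term -> set of document ids containing it (the inverted index)
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def boolean_and(query: str) -> set[int]:
    # intersect posting sets, smallest first for efficiency
    postings = sorted((index.get(t, set()) for t in tokenize(query)), key=len)
    return set.intersection(*postings) if postings else set()

print(boolean_and("inverted retrieval"))  # -> {3}
</syntaxhighlight>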

==== Section 3 ====

{| class="wikitable"
|-
! Activity Type !! Content !! Is Graded?
|-
| Question || Embed a text with an ML model. || 1
|-
| Question || Build a term-document matrix. || 1
|-
| Question || Build a semantic index for a dataset using Annoy. || 1
|-
| Question || Build a kd-tree index for a given dataset. || 1
|-
| Question || Why do kd-trees work badly in a 100-dimensional environment? || 1
|-
| Question || What is the difference between a metric space and a vector space? || 1
|-
| Question || Choose and implement a persistent index for a given text collection. || 0
|-
| Question || Visualize a dataset for text classification. || 0
|-
| Question || Build an (H)NSW index for a dataset. || 0
|-
| Question || Compare HNSW to an Annoy index. || 0
|-
| Question || What metric space index structures do you know? || 0
|}
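For the embedding and Annoy exercises above, the sketch below encodes sentences and indexes them for approximate nearest-neighbour search. It assumes the third-party sentence-transformers and annoy packages; the model name is just one common choice, not prescribed by the course.

<syntaxhighlight lang="python">
from annoy import AnnoyIndex
from sentence_transformers import SentenceTransformer

texts = ["how to bake bread", "kneading dough by hand", "fixing a flat tire"]

model = SentenceTransformer("all-MiniLM-L6-v2")  # one common encoder choice
vectors = model.encode(texts)                    # shape: (n_texts, dim)

dim = vectors.shape[1]
index = AnnoyIndex(dim, "angular")               # angular ~ cosine distance
for i, v in enumerate(vectors):
    index.add_item(i, v)
index.build(10)                                  # more trees: better recall, bigger index

query = model.encode(["bread baking tips"])[0]
for i in index.get_nns_by_vector(query, 2):      # two nearest neighbours
    print(texts[i])
</syntaxhighlight>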

==== Section 4 ====

{| class="wikitable"
|-
! Activity Type !! Content !! Is Graded?
|-
| Question || Extract semantic information from images. || 1
|-
| Question || Build an image hash. || 1
|-
| Question || Build a spectral representation of a song. || 1
|-
| Question || What is relevance feedback? || 1
|-
| Question || Build a "search by color" feature. || 0
|-
| Question || Extract scenes from a video. || 0
|-
| Question || Write a voice-controlled search. || 0
|-
| Question || Implement semantic search within an unlabelled image dataset. || 0
|}
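The image-hash exercise above can start from average hashing (aHash): downscale to 8x8 grayscale and threshold each pixel by the mean. A minimal sketch with Pillow follows; the file paths are placeholders, and production systems would more likely use pHash or a learned embedding.

<syntaxhighlight lang="python">
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    """64-bit aHash: downscale, grayscale, threshold each pixel by the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (p > mean)   # 1 bit per pixel, row-major
    return bits

def hamming(a: int, b: int) -> int:
    # near-duplicate images have a small Hamming distance between hashes
    return bin(a ^ b).count("1")

# h1 = average_hash("cat.jpg")            # placeholder paths
# h2 = average_hash("cat_resized.jpg")
# print(hamming(h1, h2))                  # small value => likely the same image
</syntaxhighlight>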

=== Final assessment ===

'''Section 1'''
# Implement a text crawler for a news site.
# What is SBS (side-by-side) evaluation and how is it used in search engines?
# Compare pFound with CTR and with DCG.
# Explain how A/B testing works.
# Describe the PageRank algorithm.
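Item 3 above compares ranking metrics, so a minimal sketch of DCG and pFound may help. The relevance values are made up, and pBreak = 0.15 is the commonly quoted default rather than a course-mandated constant.

<syntaxhighlight lang="python">
from math import log2

def dcg(relevances: list[float]) -> float:
    # DCG = sum_i rel_i / log2(i + 1), positions numbered from 1
    return sum(rel / log2(i + 1) for i, rel in enumerate(relevances, start=1))

def pfound(p_rel: list[float], p_break: float = 0.15) -> float:
    """pFound: the user scans results top-down, stopping on success or abandonment."""
    p_look, total = 1.0, 0.0
    for p in p_rel:
        total += p_look * p                    # found a relevant result here
        p_look *= (1 - p) * (1 - p_break)      # still looking at the next position
    return total

print(dcg([3, 2, 3, 0, 1]))                    # made-up graded relevance labels
print(pfound([0.9, 0.4, 0.7, 0.0]))            # made-up relevance probabilities
</syntaxhighlight>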

'''Section 2'''
# Explain how (and why) kd-trees work.
# What are the weak points of an inverted index?
# Compare different text vectorization approaches.
# Compare tolerant retrieval to spellchecking.
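For item 3 above, a small contrast of raw counts versus TF-IDF weighting, assuming scikit-learn is available; the corpus is a toy example.

<syntaxhighlight lang="python">
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [                       # toy corpus
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

counts = CountVectorizer().fit_transform(corpus)   # raw term counts
tfidf = TfidfVectorizer().fit_transform(corpus)    # counts reweighted by idf

# Words frequent everywhere ("the", "on") dominate raw counts but are
# downweighted by TF-IDF, which changes the similarity structure:
print(cosine_similarity(counts)[0])
print(cosine_similarity(tfidf)[0])
</syntaxhighlight>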

'''Section 3'''
# Compare an inverted index to HNSW in terms of speed and memory consumption.
# Choose the best index for a given dataset.
# Implement range search in a kd-tree.
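Item 3 above asks for range search in a kd-tree; here is a self-contained sketch with made-up points. Ties on the splitting coordinate go to the left subtree.

<syntaxhighlight lang="python">
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    if not points:
        return None
    axis = depth % len(points[0])              # cycle through dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                     # median split
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def range_search(node: Optional[Node], lo: Point, hi: Point, out: List[Point]) -> None:
    """Collect all points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if node is None:
        return
    p, a = node.point, node.axis
    if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
        out.append(p)
    if lo[a] <= p[a]:                          # box overlaps the left half-space
        range_search(node.left, lo, hi, out)
    if p[a] <= hi[a]:                          # box overlaps the right half-space
        range_search(node.right, lo, hi, out)

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits: List[Point] = []
range_search(tree, (3, 1), (8, 5), hits)
print(hits)                                    # points inside the query box
</syntaxhighlight>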

'''Section 4'''
# What are the approaches to image understanding?
# How can a video be clustered into scenes and shots?
# How does speech-to-text technology work?

=== The retake exam ===

'''Section 1'''

'''Section 2'''

'''Section 3'''

'''Section 4'''