====== Lectures ======

In alphabetical order.

^ Cross-Language Information Retrieval ^ Douglas W. Oard |
|In this session, I will explain what we know about how to build information retrieval systems that can be used to find information in languages that the user is not able to understand. I’ll start by explaining what we know about how to rank documents written in one language based on a query expressed in another language. To illustrate this, I will decompose the problem into three parts: (1) what to translate, (2) what translations are possible, and (3) how to use those translations in the ranking process. Along the way I will draw on evaluation results obtained using standard test collections to illustrate the effect of alternative design choices. I’ll then turn my attention to how users could make effective use of systems that rank documents that, by assumption, those users cannot understand without the help of translation technology. I’ll start this discussion by briefly reviewing the present state of the art in the design of machine translation systems. I will then focus on two types of user studies: (1) controlled quantitative studies using highly structured tasks, and (2) qualitative observational studies using a more natural range of tasks. I’ll conclude the session with some remarks about where the research frontier is today, and how future developments in related fields might help to create new opportunities.||
^ Crowdsourcing for Information Retrieval (Short Lecture) ^ Gareth Jones |
|Crowdsourcing is a form of human computation in which people undertake tasks that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system enlists a crowd of humans to help solve a problem. The availability of crowdsourcing services is now making human computation easily available to the research community.
There is currently significant interest in exploring the use of crowdsourcing services to support existing research activities in information and data processing technologies. Interest in crowdsourcing for information retrieval has largely focussed on the use of crowdsourcing for relevance assessment in the creation of test collections. However, it is also being explored for tasks such as the creation of search queries for test collections, and more generally to understand potential human information needs. This presentation will introduce the topic of crowdsourcing, including the issues of recruitment, management, and payment of crowdsource workers, and the design of tasks which can be assigned to workers to support information retrieval research.||
^ E-Discovery ^ Douglas W. Oard |
|Civil litigation relies on each side making relevant evidence available to the other, a process known in the USA as “discovery” and in many other places as “disclosure”. The explosive growth of information in digital form has led to an increasing focus on how search technology can best be applied to balance costs and responsiveness in what has come to be known as “e-discovery”. This is now a multi-billion dollar business, one in which new vendors enter the market frequently, usually with impressive claims about the efficacy of their products or services. Courts, attorneys, and companies are actively looking to understand what should constitute best practice, both in the design of search technology and in how that technology is employed. In this session I will provide an overview of the e-discovery process, and then I will use that background to motivate a discussion of which aspects of that process the TREC Legal Track sought to model and what we learned. I will spend much of the session describing two novel aspects of evaluation design: (1) recall-focused evaluation in large collections, and (2) modeling an interactive process for “responsive review” with fairly high fidelity.
I’ll conclude the session by describing some open research questions.||
^ Evaluation ^ Mark Sanderson |
|The lecture will cover the means used to evaluate IR systems. The talk will include descriptions of test collections and the evaluation measures used. This task has proven to be challenging, as it turns out that IR systems and users are harder to model and measure than was perhaps once thought. By the end of the presentation, attendees will understand methods of gathering feedback from users that have applications beyond IR.||
^ Introduction to Information Retrieval (Short Lecture) ^ Mark Sanderson |
|Information Retrieval (IR) is the study of finding relevant information in unstructured collections of material based on a poorly specified query. In this lecture, I will describe the early history of IR and the range of approaches used by search engines to retrieve relevant material. These approaches allow the modern search engines, with which we are all familiar, to be astonishingly accurate in locating what we seek. Here, I will introduce text processing, link analysis, retrieval models, and the extraction of data from interaction logs. By the end of the presentation, attendees will understand the key principles that underlie the algorithms of the major search engines.||
^ Modern Information Retrieval Models ^ Iadh Ounis |
|The development of effective information retrieval (IR) models has been at the forefront of IR research for the past three decades. Essentially, a retrieval model specifies how documents and queries should be represented and how relevance should be decided. IR models have undergone several major paradigm shifts over the history of the field, moving from simple but theoretically founded models, through sophisticated probabilistic and field-based models, to more complex machine-learned ranking models.
In this lecture, after briefly surveying classical information retrieval models such as the vector space model, the emphasis will be on the most recent and effective approaches such as language models, divergence from randomness models, and the learning-to-rank paradigm. Finally, going beyond the document relevance independence assumption inherent in most probabilistic models, the lecture will also cover recent developments in models for search results diversification.||
^ Multimedia Information Retrieval ^ Gareth Jones |
|Digital information is increasingly available in multimedia and multimodal forms incorporating various combinations of audio-visual media, including images, video, speech, music, and textual metadata. Information retrieval for multimedia content poses many challenges beyond those of text retrieval. These include the need to analyse the content so that the material can be indexed for searching, and the requirement to develop interactive user interfaces that enable effective navigation and audition of retrieved content. The presentation will introduce the key issues of multimedia information retrieval, including recognition and indexing of audio and visual content, exploitation of textual metadata in multimedia information retrieval, and the design of user interfaces for interaction in multimedia information retrieval.||
^ User-Oriented Factors in Information Retrieval ^ Kalervo Järvelin |
|Discussion of theories in IR tends to focus on mathematical models (e.g., ‘language models’), on the one hand, or on user-oriented models (e.g., ‘cognitive models’), on the other. The primary interest is often to explain the quality of the ranked search result list, measured in some way. Neither broad approach provides the predictive power that one would expect from theories of IR, even in predicting the quality of the ranked list.
The retrieval models of IR specify the interactions of query and document representations for result ranking, but not retrieval effectiveness. User-oriented models indicate factors affecting retrieval effectiveness, but their effects and interactions are very complex to study systematically. There is a dilemma between realism, control of factors, and sufficient data. The present lecture approaches the theory of IR from two directions: task-based IR modeling and evaluation, and simulation of IR interaction. We aim at interaction models that take the concurrent use of multiple information systems into account in a task context. Regarding simulation, we aim at theories that predict the effects of human searcher-oriented factors on interactive retrieval. Such factors include search strategies, search goals and cost constraints, scanning and assessment behavior, and relevance scoring. In simulation, it is possible to design and run experiments where a number of factors, such as the test collection and its topics, the interface characteristics, and the search engine with its underlying retrieval model, are held constant, while the user-oriented factors are systematically varied. This helps in developing social theories of Information Retrieval Interaction.||
^ Web Search: Information Retrieval in Practice ^ Fernando Diaz |
|Although core information retrieval techniques have existed since the 1960s, designing for web search introduces unique ranking signals and evaluation methodologies. In this lecture, we will study how classic information retrieval ranking methods are modified or augmented for this unique corpus. We will also study evaluation methodologies used in production web search engines.||
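The cross-language lecture decomposes CLIR ranking into what to translate, which translations to use, and how to use them when ranking. One common way to combine the pieces, query translation with weighted translation alternatives, can be sketched as follows. The lexicon, its probabilities, and the toy documents are illustrative assumptions, not material from the lecture.

```python
# Hypothetical bilingual lexicon: English query term -> {French term: P(f|e)}.
# Real CLIR systems derive such tables from dictionaries or parallel corpora.
translations = {
    "boat": {"bateau": 0.7, "navire": 0.3},
    "race": {"course": 0.8, "race": 0.2},
}

def clir_score(query_terms, doc_tf, doc_len):
    """Score a French document for an English query: each query term
    contributes the probability-weighted relative frequency of its
    possible translations in the document."""
    score = 0.0
    for term in query_terms:
        score += sum(prob * doc_tf.get(f, 0) / doc_len
                     for f, prob in translations.get(term, {}).items())
    return score

doc_a = {"bateau": 2, "course": 1}   # a document about a boat race
doc_b = {"navire": 1}                # a document that mentions a ship once
print(clir_score(["boat", "race"], doc_a, 10))  # approx. 0.22
print(clir_score(["boat", "race"], doc_b, 10))  # approx. 0.03
```

Spreading a query term's weight over several translations, rather than committing to one, is what makes the "which translations are possible" step matter for ranking quality.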
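The crowdsourcing lecture discusses using workers for relevance assessment. A standard precaution, not specific to the lecture, is to collect redundant judgements per (query, document) pair and aggregate them; a minimal sketch using majority voting, with a conservative tie-break that the lecture does not prescribe:

```python
from collections import Counter

def majority_label(labels):
    """Aggregate redundant worker judgements for one (query, document)
    pair by majority vote; ties resolve to 'not_relevant' as a
    conservative default (an assumption, not a fixed rule)."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:
        return "not_relevant"   # no clear majority
    return top_label

print(majority_label(["relevant", "relevant", "not_relevant"]))  # relevant
print(majority_label(["relevant", "not_relevant"]))              # not_relevant
```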
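The evaluation lecture covers test collections and evaluation measures. As a concrete instance of the kind of measure involved, here is average precision, one widely used test-collection measure (chosen here as an example; the lecture's exact set of measures is not specified):

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list.

    ranking: document ids in rank order, best first.
    relevant: set of ids judged relevant for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / len(relevant) if relevant else 0.0

# Relevant docs d1 and d3 retrieved at ranks 1 and 3:
# AP = (1/1 + 2/3) / 2, roughly 0.833
print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Averaging this value over a set of queries gives mean average precision (MAP), a common summary score in test-collection experiments.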
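The retrieval-models lecture starts from the classical vector space model before moving to modern approaches. A minimal sketch of that starting point, ranking documents by cosine similarity of tf-idf vectors (the toy corpus and query are invented for illustration):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a tf-idf vector per tokenised document, plus the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()}
               for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [["cheap", "flights", "to", "rome"],
        ["rome", "city", "guide"],
        ["cheap", "hotel", "deals"]]
vectors, idf = tfidf_vectors(docs)
query = {t: idf.get(t, 0.0) for t in ["cheap", "rome"]}
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, vectors[i]),
                 reverse=True)
print(ranking)  # document 0 matches both query terms, so it ranks first
```

Later models in the lecture (language models, divergence from randomness, learning to rank) replace this scoring function while keeping the same basic ranking setup.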