PhD Theses

Performance prediction and evaluation in Recommender Systems: an Information Retrieval perspective

Alejandro Bellogín, November 2012

Abstract

Personalised recommender systems aim to help users access and retrieve relevant items from large collections, by automatically identifying products or services of likely interest based on observed evidence of the users’ preferences. For many reasons, user preferences are difficult to guess, and recommender systems therefore vary considerably in how successfully they estimate the user’s tastes. In such a scenario, self-predicting the chances that a recommendation is accurate before actually submitting it to a user becomes an interesting capability from many perspectives. Performance prediction has been studied in the context of search engines in the Information Retrieval field, but there is little research on this problem in the recommendation domain. This thesis investigates the definition and formalisation of performance prediction methods for recommender systems. Specifically, we study adaptations of search performance predictors from the Information Retrieval field, and propose new predictors drawing from Information Theory and Social Graph Theory. We show the instantiation of information-theoretical performance prediction methods on both rating and access log data, and the application of social-based predictors to social network structures.

Recommendation performance prediction is a relevant problem per se, because of its potential application to many uses. We primarily evaluate the quality of the proposed solutions in terms of the correlation between the predicted and the observed performance on test data. Given that the evaluation of recommender systems is an open area to a significant extent, the thesis addresses the evaluation methodology as a part of the researched problem. We analyse how the variations in the evaluation procedure may alter the apparent behaviour of performance predictors, and we propose approaches to avoid misleading observations. In addition to the stand-alone assessment of the proposed predictors, we research the use of the predictive capability in the context of the dynamic adjustment of hybrid methods combining several recommenders. We research approaches where the combination leans towards the algorithm that is predicted to perform best in each case, aiming to enhance the performance of the resulting hybrid configuration. The thesis reports positive empirical evidence confirming both a significant predictive power for the proposed methods in different experiments, and consistent improvements in the performance of dynamic hybrid recommenders employing the proposed predictors.
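As an illustration of the two ideas in this abstract, the following is a minimal sketch, not the thesis's actual method. It assumes that predictor quality is measured with the Pearson correlation between per-user predictor values and observed performance, and that a dynamic hybrid weights two recommenders in proportion to their predicted performance. The function names, the weighting rule and the sample numbers are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # correlation is undefined for constant input; report 0
    return cov / (dev_x * dev_y)


def dynamic_hybrid_score(score_a, score_b, predicted_a, predicted_b):
    """Combine two recommender scores, leaning towards the recommender
    whose predicted performance is higher (hypothetical weighting rule)."""
    total = predicted_a + predicted_b
    weight_a = 0.5 if total == 0 else predicted_a / total
    return weight_a * score_a + (1 - weight_a) * score_b


# Made-up values: one predictor output and one observed metric per user.
predicted = [0.2, 0.5, 0.9, 0.4]
observed = [0.10, 0.30, 0.45, 0.20]
correlation = pearson(predicted, observed)

# Recommender A is predicted to perform better, so it gets more weight.
hybrid = dynamic_hybrid_score(4.0, 2.0, predicted_a=0.75, predicted_b=0.25)
```

A correlation close to 1 would indicate that the predictor ranks users by expected recommendation quality in much the same way as the observed metric does.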

Full text


Exploiting the Conceptual Space in Hybrid Recommender Systems: a Semantic-based Approach

Iván Cantador, November 2008

Abstract

The ever-increasing volume and complexity of information flowing into our daily lives challenge the limits of human processing capabilities in a wide array of information-seeking and e-commerce activities. In this context, users need help to cope with this wealth of information, in order to reach the most interesting products, while still getting novelty, surprise and relevance. Recommender systems suggest to users products or services they may be interested in, by taking into account or predicting their tastes, priorities or goals. For that purpose, user profiles or usage data are compared with some reference characteristics, which may belong to the information objects (content-based approach), or to other users in the same environment (collaborative filtering approach). Inspired by Information Retrieval and Machine Learning techniques, both approaches are based on statistical or heuristic models that attempt to capture the correlations between users and objects. Commercial applications such as the Amazon online store (www.amazon.com), Google News (news.google.com) and YouTube (www.youtube.com) are examples of significant success stories of recommendation techniques. However, several limitations of current recommender systems remain, such as the sparsity of user preference and item content feature spaces, the difficulty of recommending items to users who have declared few preferences, or the lack of flexibility to incorporate contextual factors into the recommendation methods.
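The distinction drawn above between the content-based and collaborative filtering approaches can be sketched as follows. This is only an illustrative assumption: both comparisons are reduced here to cosine similarity over sparse vectors, and the profiles, feature names and ratings are invented.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[key] * v[key] for key in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Content-based: compare a user profile with the features of an item.
user_profile = {"jazz": 1.0, "live": 0.5}
item_features = {"jazz": 1.0, "studio": 1.0}
content_score = cosine(user_profile, item_features)

# Collaborative filtering: compare a user's ratings with another user's.
alice_ratings = {"item1": 5.0, "item2": 3.0}
bob_ratings = {"item1": 4.0, "item3": 2.0}
user_similarity = cosine(alice_ratings, bob_ratings)
```

The sparsity limitation mentioned above shows up directly in this sketch: when two vectors share few or no keys, the similarity falls towards zero regardless of the users' real tastes.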

Some of these limitations can be related to a limited understanding and exploitation of the semantics underlying both user profiles and item descriptions. In this respect, an enhancement of the semantic knowledge, and its representation, describing interests and contents, is envisioned as a potential direction to deal with those limitations. This thesis explores the development of an ontology-based knowledge model to link the (explicit and implicit) meanings involved in user interests and resource contents. Upon this knowledge representation, several content-based and collaborative recommendation models are proposed and evaluated. The proposed model supports contextual techniques to extend the reach of recommendations and improve their accuracy. A refinement of the collaborative filtering space by semantic layers is proposed to find focused similarities, which enable further and more accurate recommendations.

Full text (pdf)


Personalized Information Retrieval in Context by Exploiting Semantic Knowledge and Implicit User Feedback

David Vallet Weadon, September 2008

Abstract

Personalization in information retrieval aims at improving the user’s experience by incorporating the user’s subjectivity into the retrieval methods and models. The exploitation of implicit user interests and preferences has been identified as an important direction to enhance current mainstream retrieval technologies and anticipate future limitations as worldwide content keeps growing and user expectations keep rising. Without requiring further effort from users, personalization aims to compensate for the limitations of user need representation formalisms (such as the dominant keyword-based or document-based ones) and to help handle the scale of search spaces and answer sets, under which a user query alone often does not give the system enough information to provide effective results. However, the general set of user interests that a retrieval system can learn over a period of time, and bring to bear in a specific retrieval session, can be fairly vast, diverse, and to a large extent unrelated to the particular search in progress. This means that even on the basis of correctly learned user preferences, the system could make wrong guesses or become intrusive. Rather than introducing all user preferences en bloc, an optimum search adaptation could be achieved if the personalization system were able to select only those preferences which are pertinent to the ongoing user actions. In other words, although personalization alone is a key aspect of modern retrieval systems, it is the application of context awareness to personalization that can really produce a step forward in future retrieval applications.

Context modeling has long been acknowledged as a key aspect in a wide variety of problem domains, among which Information Retrieval is a prominent one. In this work, we focus on the representation of live retrieval user contexts, based on implicit feedback techniques. The particular notion of context considered in this thesis is defined as the set of themes under which retrieval user activities occur within a unit of time. Our proposal of contextualized personalization is based on the semantic relation between the user profile and the user context. Only those preferences related to the current context should be used, disregarding those that are out of context. The use of semantic-driven representations of the domain of discourse, as a common, enriched representational ground for content meaning, user interests, and contextual conditions, is proposed as a key enabler of effective means for a) a rich user model representation, b) context acquisition at runtime and, most importantly, c) the discovery of semantic connections between the context and concepts of user interest, in order to filter out those preferences that are likely to be intrusive within the current course of user activities.
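The idea of using only the preferences related to the current context can be illustrated with a minimal sketch. It rests on strong simplifying assumptions that are not taken from the thesis: preferences are weighted concepts, the context is a plain set of concepts, and "semantically related" means being the same concept or a direct neighbour in a hypothetical relation map. All names and data below are invented for illustration.

```python
def contextualize(preferences, context, related):
    """Keep only the preferences whose concept is in the context or is
    directly related to a context concept (one-hop relation, assumed)."""
    active = {}
    for concept, weight in preferences.items():
        neighbourhood = related.get(concept, set()) | {concept}
        if neighbourhood & context:
            active[concept] = weight
    return active


# Hypothetical long-term user profile: concept -> preference weight.
preferences = {"jazz": 0.9, "football": 0.7, "guitar": 0.4}

# Hypothetical semantic relations between concepts.
related = {
    "jazz": {"music", "saxophone"},
    "guitar": {"music"},
    "football": {"sport"},
}

# Themes inferred from the user's recent activity in the session.
context = {"music"}

active = contextualize(preferences, context, related)
```

With this data, the preference for football is set aside during a music-related session, while the preferences for jazz and guitar remain active.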

Full text (pdf)
