Advanced Studies Diplomas


Performance prediction in recommender systems: application to the dynamic optimisation of aggregative methods

Alejandro Bellogín, July 2009

Abstract
Performance prediction has gained increasing attention in the Information Retrieval (IR) field since the middle of the past decade and has now become an established research topic in the field. Predicting the performance of an IR system, subsystem, module, function, or input enables an array of dynamic optimisation strategies that select at runtime the option predicted to work best in a particular situation, or adjust on the fly its participation as part of a larger system or a hybrid approach. The present work restates the problem in the subarea of Recommender Systems (RS), where it has barely been addressed so far. We research meaningful definitions of performance in the context of RS, and the elements to which it can sensibly apply. We take as a driving direction the application of performance prediction to achieve improvements in specific combination problems in the RS field. We formalise the notion of performance prediction in specific terms within this frame, and we investigate the potential adaptation of performance predictors defined in other areas of IR (mainly query performance in ad hoc retrieval), as well as the definition of new ones based on theories and tools from Information Theory. The proposed methods are tested empirically with positive results, finding four predictors which outperform standard algorithms at all sparsity levels, two of them showing significant correlation with performance measures.

Full text: PDF



An Ontology-Based Approach to Semantic Awareness in Information Retrieval

Miriam Fernández, July 2007

Abstract
Semantic search has been one of the motivations of the Semantic Web since it was envisioned. We propose a model for the exploitation of ontology-based knowledge bases to improve search over large document repositories. The approach includes an ontology-based scheme for the semi-automatic annotation of documents, and a retrieval system. The retrieval model is based on an adaptation of the classic vector-space model, including an annotation weighting algorithm, and a ranking algorithm. Semantic search is combined with keyword-based search to achieve tolerance to knowledge base incompleteness. The proposal is tested with sample experiments showing improvements with respect to keyword-based search, and providing directions for the continuation of the research.

Full text: PDF



Personalized Information Retrieval in Context Using Ontological Knowledge

David Vallet Weadon, June 2007

Abstract
Personalization in Information Retrieval (IR) aims at improving the user’s experience by incorporating user subjectivity into the retrieval process. The exploitation of implicit user interests and preferences has been identified as an important direction to overcome the potential stagnation of current mainstream retrieval technologies as worldwide content keeps growing and user expectations keep rising. The general set of user interests that a retrieval system can learn over a period of time, and bring to bear in a specific retrieval session, can be fairly vast, diverse, and to a large extent unrelated to a particular user search in progress. Rather than introducing all user preferences en bloc, an optimum search adaptation could be achieved if the personalization system were able to select only those preferences which are pertinent to the ongoing user actions. In other words, an optimal personalization of search results should take into account user interests in the context of the current search. Context modeling has long been acknowledged as a key aspect in a wide variety of problem domains, among which IR is a prominent one. In this work, we focus on the representation and exploitation of both the persistent and the live retrieval user context. We claim that, although personalization alone is a key aspect of modern retrieval systems, it is the conjunction of personalization and context awareness that can really produce a step forward in future retrieval applications. This work is based on the hypothesis that not all user preferences are relevant all the time, and only those that are semantically close to the current context should be used, disregarding those preferences that are out of context. The notion of context considered here is defined as the set of themes under which retrieval user activities occur within a unit of time.
The use of ontology-driven representations of the domain of discourse, as a common, enriched representational ground for content meaning, user interests, and contextual conditions, is proposed as a key enabler of effective means for a) a rich user model representation, b) context capture at runtime and c) the analysis of the semantic connections between the context and concepts of user interest, in order to filter out those preferences that are likely to be intrusive within the current course of user activities.

Full text: PDF



Semi-automatic Semantic-based Web Service Classification

Miguel Ángel Corella, June 2006

Abstract
With the expected growth of the number of Web services available on the WWW and in service repositories (e.g. UDDI registries), the need for mechanisms that enable the automatic organization and discovery of services becomes increasingly important. Service classification using standard or proprietary taxonomies is a common and simple facility in this context, complementary to more sophisticated service management retrieval techniques. Nevertheless, service classification in taxonomies as it is performed nowadays presents some issues (e.g. large classification taxonomies, distributed administration tasks, user knowledge, etc.) leading to service misclassification or even no service classification at all. I present here a semi-automatic mechanism that allows users to obtain a ranked list of the service categories in which their services best fit, based on a heuristic used to estimate the probability of a service belonging to a service category. In order to enable this estimation, semantic-based service descriptions are used instead of syntactic ones (WSDL), as the conceptual view of the service (i.e. which domain concepts are involved in a service) provides better accuracy than the interaction view (i.e. which messages, parts or datatypes are used by the service). The complete design, development and evaluation of this heuristic-based classification mechanism can be found in this work.

Full text: PDF



Agent-Person Interaction in Semantic Web Services - A proposal for a Mediating System

Mariano Rico, September 2004

Abstract
The future world depicted by [Tim Berners-Lee et al, 2001] in their famous Scientific American article features intelligent, or at least very clever, agents that can talk to each other and interact with the human user, providing amazing capabilities and incredible possibilities. Since then, a lot of work has been done, and now we have the tools to start creating such semantic agents. Especially remarkable are the WSMO and IRS initiatives, because they provide the first semantic agents with discovery and invocation capabilities. However, neither of them deals with user interaction. Both can talk to traditional Web Services, or even talk to each other, but neither of them can talk to a human user. I think this interaction is vital because in many cases the initial and final clients of these semantic agents are humans. Jacinta is another semantic agent, specialized in human-agent interaction and designed to fill this interaction gap.

Full text: PDF (in Spanish)



Semantic Web Services description - Requirements for automatic location, composition, invocation and interoperation

Rubén Lara, September 2003

Abstract
There is a need for integration of different systems (both data and functionality integration) not only within company boundaries but also across company boundaries in the case of business collaborations between enterprises. Current efforts to achieve integration inside a company (Enterprise Application Integration) or between different companies (eCommerce) are costly in both time and money. The integration strategies used so far are not flexible and scalable enough to make them fully usable and to provide dynamic collaborations between enterprises. Web Services represent an emerging technology to overcome the limitations of current approaches. Nevertheless, Web Services offer very limited support for automation, i.e. which Web Services are used as part of a business process, how they are invoked and how they interact have to be defined at design time. The result is a set of rigid Web Services that cannot reconfigure dynamically to adapt to changes without direct human intervention. Semantic Web Services have the potential of realizing open, dynamic and scalable integration. For this reason, different efforts in the area have arisen with the purpose of making this vision a reality. Nevertheless, these efforts are still far from providing a usable solution to overcome integration problems. A common limitation of these efforts is that they do not cover all the requirements to enable such automation of services. Therefore, defining a complete set of requirements for service automation is an essential challenge in order to improve, combine and evolve these emerging technologies to reach a usable solution for integration. The aim of this work is to identify the limitations of current approaches and to define a set of requirements on service descriptions to enable automatic location, combination, invocation and interoperation of services.

Full text: PDF



Finding Hubs for Personalized Web Search

Daniel Olmedilla, September 2003

Abstract
The web has evolved rapidly in recent years. It is growing very quickly, and the amount of information that the Internet provides to users is becoming unmanageable. Existing search engines are very useful, but users often do not find what they are searching for. A search engine performs the following steps: it receives a query, searches its index for relevant documents, ranks the documents found relevant, and shows the results. One problem of this approach is that current ranking algorithms produce global rankings, which means that two different users will always get the same results for the same query even though they have nothing in common. As a result, the user has to go through the slow and time-consuming process of filtering and selecting the pages he finds interesting from all of those returned by the search engine, sorted according to a global ranking. If the system knew about the user's interests, it could filter and personalize the results for him automatically instead of letting him do it manually and waste his time. In this work, information about the user is gathered by analysing his behaviour when he surfs the Internet and from pages he finds interesting (such as his bookmarks). With this information the system provides a personalised ranking which better matches the user's preferences and interests, reducing the time spent by the user in the process.

Full text: PDF