Search results

41 records were found.

We describe the design and operation of an information retrieval engine based on the vector model, intended to serve as a basis for experimentation in research tasks as well as a resource for teaching. The engine is nevertheless fully operational and can be used in real documentary environments. Because it is built on a relational database, it facilitates the observation and manipulation of its structures and intermediate results. It performs its fundamental operations through SQL statements, which makes its internal operations easy to modify and therefore easy to experiment with.
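As an illustration of how vector-model scoring can be expressed in SQL, the following is a minimal sketch using Python's built-in sqlite3 module. The table names (doc_weights, query_weights), the sample weights, and the choice of SQLite are assumptions made for this example and are not taken from the engine described above.

```python
import sqlite3

# Hypothetical schema: one row per (document, term) weight, one row per query term.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc_weights (doc_id INTEGER, term TEXT, weight REAL);
CREATE TABLE query_weights (term TEXT, weight REAL);
""")

# Document and query vectors are already normalised to unit length,
# so the dot product computed below equals the cosine similarity.
conn.executemany("INSERT INTO doc_weights VALUES (?, ?, ?)", [
    (1, "vector", 0.8), (1, "model", 0.6),
    (2, "relational", 0.6), (2, "model", 0.8),
])
conn.executemany("INSERT INTO query_weights VALUES (?, ?)", [
    ("vector", 0.6), ("model", 0.8),
])

# Join document terms with query terms and sum the weight products per document.
ranking = conn.execute("""
    SELECT d.doc_id, SUM(d.weight * q.weight) AS score
    FROM doc_weights AS d
    JOIN query_weights AS q ON d.term = q.term
    GROUP BY d.doc_id
    ORDER BY score DESC
""").fetchall()

for doc_id, score in ranking:
    print(doc_id, round(score, 2))
```

Because the intermediate tables are ordinary relations, they can be inspected or altered with further SQL statements, which is the kind of experimentation the abstract refers to.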
Interaction with the user in Information Retrieval Systems allows the formulation of more effective queries that produce better results. Relevance feedback is a process in which users identify relevant documents in an initial list of retrieved documents, and the system then creates a new query based on positive and negative examples drawn from those documents. The user interface is an important element in the process. The process usually followed in relevance feedback is described and experimental results are shown. These results make it possible to estimate the degree of improvement in retrieval achieved by such systems.
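One common way to build a new query from positive and negative examples is the Rocchio formula. The sketch below assumes that formulation, which the abstract does not name, and the default values of alpha, beta and gamma are illustrative only.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector as a dict of term -> weight.

    query and every document are dicts mapping terms to weights.
    alpha, beta and gamma are assumed defaults, not values from the paper.
    """
    new_query = defaultdict(float)
    # Keep the original query terms, scaled by alpha.
    for term, weight in query.items():
        new_query[term] += alpha * weight
    # Add the centroid of the documents judged relevant.
    for doc in relevant:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    # Subtract the centroid of the documents judged non-relevant.
    for doc in non_relevant:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight is not positive are dropped from the new query.
    return {term: w for term, w in new_query.items() if w > 0}


expanded = rocchio(
    {"retrieval": 1.0},
    relevant=[{"retrieval": 1.0, "feedback": 1.0}],
    non_relevant=[{"sports": 1.0}],
)
print(expanded)
```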
This article describes some of the activities of the REINA research group on Web information retrieval. These activities have focused on testing the retrieval performance that can be expected from the various informative elements present in web pages, besides the text that the user normally sees in the browser. Our aim was to assess the performance obtained when mixing or combining these elements. Terms from different elements can be combined in a single index by using term frequencies in the vector space model with a TFxIDF scheme. The BODY field is obviously the most powerful, but the text of the ANCHORs of the backlinks that the pages receive adds a considerable improvement in retrieval performance. The content of the META tags, in contrast, contributes little to the improvement in retrieval performance.
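A minimal sketch of combining several page elements in a single index follows. It assumes that term frequencies are simply summed across fields and that the weight is tf multiplied by log(N/df); the field names and sample pages are hypothetical, and the article may combine or weight fields differently.

```python
import math
from collections import Counter


def combined_tf(fields):
    """Sum term frequencies over all fields (e.g. body, anchor, meta) of one page."""
    tf = Counter()
    for tokens in fields.values():
        tf.update(tokens)
    return tf


def tfidf_index(pages):
    """Build a single TFxIDF index from pages given as {page_id: {field: tokens}}."""
    tfs = {page_id: combined_tf(fields) for page_id, fields in pages.items()}
    n_pages = len(tfs)
    # Document frequency: number of pages in which each term occurs.
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    return {
        page_id: {term: freq * math.log(n_pages / df[term]) for term, freq in tf.items()}
        for page_id, tf in tfs.items()
    }


pages = {
    "p1": {"body": ["web", "retrieval"], "anchor": ["retrieval"]},
    "p2": {"body": ["web", "metrics"], "anchor": []},
}
index = tfidf_index(pages)
print(index["p1"])
```

In this toy collection the anchor text raises the frequency of "retrieval" in p1, so its weight doubles, while "web" occurs in every page and receives a weight of zero.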
The objective of this communication is to review the evolution of the field of Web information retrieval over the last 10 years. With the introduction of the different cybermetrics techniques, the evolution of studies of the Web has been spectacular, and it is at the moment an inexhaustible field of study.
The exponential growth of the Web and the characteristics of its data (distributed, highly volatile, unstructured, redundant and highly heterogeneous) have introduced new problems in information retrieval processes. It is therefore necessary to open new avenues of research that allow us to obtain good levels of accuracy. Work based on exploiting the hypertext features of the Web is gaining great prominence. Cybermetrics provides many options for working with links, and many of its techniques may be useful in information retrieval processes on the Web.
This paper describes our work at the CLEF 2007 Robust Task. We participated in the monolingual (English, French and Portuguese) and bilingual (English to French) subtasks. At CLEF 2006 our research group obtained very good results in the robust task by applying local query expansion using windows of terms. This year we used the same expansion technique, but took into account several robustness criteria: MAP, GMAP, MMR, GS@10, P@10, number of failed topics, number of topics below 0.1 MAP, and number of topics with P@10=0. In the bilingual retrieval experiments, three machine translation programs were used to translate the topics. For the target language, the translations were merged before performing monolingual retrieval. We also applied the same local expansion technique. This year the results were disappointing. We think that the r...
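To show why robustness criteria such as GMAP are reported alongside MAP, the sketch below computes both from per-topic average precision values. The floor of 0.00001 applied before taking logarithms, the function names and the sample scores are assumptions made for this example, not details from the paper.

```python
import math


def mean_average_precision(ap_scores):
    """Arithmetic mean of the per-topic average precision values."""
    return sum(ap_scores) / len(ap_scores)


def geometric_map(ap_scores, floor=1e-5):
    """Geometric mean of per-topic average precision.

    Values below the (assumed) floor are raised to it so that log(0) is avoided.
    """
    logs = [math.log(max(ap, floor)) for ap in ap_scores]
    return math.exp(sum(logs) / len(logs))


def topics_below(ap_scores, threshold=0.1):
    """Number of topics whose average precision is below the threshold."""
    return sum(1 for ap in ap_scores if ap < threshold)


# Two hypothetical runs with the same MAP but different robustness.
steady = [0.5, 0.5]
uneven = [0.99, 0.01]
for name, run in (("steady", steady), ("uneven", uneven)):
    print(name, mean_average_precision(run), geometric_map(run), topics_below(run))
```

Both runs have the same MAP, but the geometric mean penalises the run that performs very poorly on one topic, which is the behaviour a robust task is meant to expose.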
Automatic categorisation can be understood as a learning process during which a programme recognises the characteristics that distinguish each category or class from others, i.e. those characteristics which the documents should have in order to belong to that category. As yet few experiments have been carried out with documents in Spanish. Here we show the possibilities of elaborating pattern vectors that include the characteristics of different classes or categories of documents, using techniques based on those applied to the expansion of queries by relevance; likewise, the results of applying these techniques to a collection of documents in Spanish are given. The same collection of documents was classified manually and the results of both procedures were compared.
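A minimal sketch of categorisation with pattern vectors follows. It assumes the simplest variant, in which each category's pattern vector is the centroid of its training documents and a new document is assigned to the category with the highest cosine similarity. The paper's technique is based on relevance feedback and may also use negative examples; the category names and weights below are hypothetical.

```python
import math
from collections import defaultdict


def pattern_vectors(labelled_docs):
    """Build one centroid vector per category from (label, {term: weight}) pairs."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, doc in labelled_docs:
        counts[label] += 1
        for term, weight in doc.items():
            sums[label][term] += weight
    return {
        label: {term: w / counts[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorise(doc, patterns):
    """Return the category whose pattern vector is most similar to the document."""
    return max(patterns, key=lambda label: cosine(doc, patterns[label]))


training = [
    ("sports", {"match": 1.0, "goal": 1.0}),
    ("economy", {"market": 1.0, "bank": 1.0}),
]
patterns = pattern_vectors(training)
print(categorise({"goal": 1.0}, patterns))
```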
We introduce the power laws formulated by Michalis Faloutsos, which allow us to characterize the Web through the analysis of its topology. Their most important characteristics are described, together with how to calculate some of the values of the most interesting functions.
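As an example of calculating such values, the exponent of a power law y = c * x^a can be estimated with a least-squares fit on log-log scale. This is a simple estimator assumed here for illustration, not necessarily the procedure used in the paper, and the sample data are synthetic.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + a * log(x); returns (a, c)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent.
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    exponent = sxy / sxx
    constant = math.exp(mean_y - exponent * mean_x)
    return exponent, constant


# Synthetic data: number of nodes with a given degree, following 100 * degree^-2.
degrees = [1, 2, 4, 8]
frequencies = [100.0, 25.0, 6.25, 1.5625]
exponent, constant = fit_power_law(degrees, frequencies)
print(exponent, constant)
```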
Automatic categorization can be viewed as a learning process during which a program captures the characteristics that distinguish each category or class from the others, i.e. those that documents must have in order to belong to that category. Few experiments have yet been carried out with documents in Spanish. This paper shows the possibilities of elaborating pattern vectors that collect the characteristics of different classes or categories of documents, using techniques based on those applied in the expansion of queries by relevance. It also describes an experiment in which these techniques were applied to categorize a collection of press releases in Spanish. The results are, overall, comparable to or even better than those obtained in similar experiments; for some categories, these results improve
This paper describes our work at the CLEF 2006 Robust task. This is an ad-hoc task that explores methods for stable retrieval by focusing on poorly performing topics. We carried out experiments for all subtasks: monolingual (EN, ES, FR and IT), bilingual (IT→ES) and multilingual (ES→[EN ES FR IT]) retrieval. For monolingual retrieval we focused our work on local query expansion, i.e. using only the information from retrieved documents. External corpora, such as the Web, were not used. Our document retrieval system is simple; it is based on the vector space model. Several local expansion techniques were applied to the training topics. The best improvement was achieved using association thesauri, which were constructed from co-occurrence relations within term windows rather than within complete documents. This technique is effective and can be easily i...
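The idea of an association thesaurus built from co-occurrences in term windows can be sketched as follows. The window size, the use of raw co-occurrence counts as association scores, and the function names are assumptions for this example; the paper may use a different association measure.

```python
from collections import Counter, defaultdict


def cooccurrence_in_windows(docs, window=5):
    """Count co-occurrences of term pairs that fall within the same window.

    docs is a list of token lists (for local expansion, the top retrieved
    documents). A window covers a term and the next window - 1 tokens.
    """
    counts = defaultdict(Counter)
    for tokens in docs:
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    counts[term][other] += 1
                    counts[other][term] += 1
    return counts


def expand_query(query_terms, counts, k=2):
    """Append the k terms most strongly associated with the query terms."""
    scores = Counter()
    for term in query_terms:
        for other, count in counts.get(term, {}).items():
            if other not in query_terms:
                scores[other] += count
    return list(query_terms) + [term for term, _ in scores.most_common(k)]


retrieved = [
    ["query", "expansion", "improves", "retrieval"],
    ["query", "expansion", "uses", "thesauri"],
]
thesaurus = cooccurrence_in_windows(retrieved, window=2)
print(expand_query(["query"], thesaurus, k=1))
```

With a window of 2 only adjacent terms are associated, so "expansion" is the only candidate for the query term "query"; a whole-document window would instead associate every pair of terms in each document.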