[EN] This work, framed within Information Retrieval (IR) technologies in the Web environment, focuses on the specific difficulties caused by sophisticated insertions of spam, which degrade the results returned by search engines.
[EN] A desirable property of learning algorithms is the ability to incorporate new data incrementally. Incremental algorithms have received attention in the last few years, particularly for Bayesian networks, owing to the difficulty of the task: in a Bayesian network, a single example can change the whole structure of the network. In this thesis we focus on the incremental induction of the Tree Augmented Naive Bayes (TAN) algorithm. An incremental version of TAN saves computing time and is better suited to data mining and concept drift. However, as is usual in Bayesian learning, TAN is restricted to discrete attributes. To complement the incremental TAN, we propose an incremental discretization algorithm, which is necessary to evaluate TAN in domains with continuous attributes. Discretization is a fundamental pre-processing step for some well-known algo...
[EN] This research is a quantitative study of the Web sites of hospitals and research centers in Castilla y León, based on cybermetrics. Measures and indicators make it possible to understand the trends in the construction and design of this type of Web site. In addition, the results are displayed visually, as network graphs. The research may be of interest for improving current Web sites, as well as for the construction and design of future ones.
This document describes the current state of digital libraries and surveys the different definitions proposed in recent years. Because both the fields of application and the nature of digital libraries vary widely, it proposes several classifications. It also covers other aspects of the construction and design of digital libraries, presenting some of the most interesting concepts from their application in distributed systems. Building a digital library is far from trivial, however, so some of the most significant problems are described. One of the most important fields of application for digital libraries is the Internet and, more specifically, Web content management. For this reason, a whole chapter is dedicated to...
[EN] Today most major universities have a service dedicated to publishing books and journals. Few universities conceive of research activity without a corresponding mechanism for disseminating knowledge and research results, which often takes the form of a publications office or service, or directly of a university press. To study Andalusian university publishing output, we used two statistical sources and two bibliographic sources.
We describe the design and operation of an information retrieval engine based on the vector model and intended to serve as a basis for experimentation in research tasks, as well as a teaching resource. The engine is nonetheless fully operational and can be used in real document management environments. Built on a relational database, it facilitates the observation and manipulation of internal structures and intermediate results, and it performs its fundamental operations through SQL statements, which makes its internal workings easy to modify and therefore to experiment with.
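As a rough illustration of how a vector-model engine can delegate its core operations to SQL, the sketch below uses Python's standard `sqlite3` module. It is not the engine described above: the table names (`postings`, `weights`), the helper functions `build_index` and `search`, the whitespace tokenizer, and the choice of SQLite are all assumptions made for this example. Query terms are given a weight of 1, so the score is the document's TF-IDF weight on the matching terms divided by the document vector length.

```python
import math
import sqlite3


def build_index(docs):
    """Build a toy TF-IDF index in an in-memory SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Register math helpers so the weighting can be written entirely in SQL.
    conn.create_function("py_log", 1, math.log)
    conn.create_function("py_sqrt", 1, math.sqrt)
    conn.execute("CREATE TABLE postings (doc_id INTEGER, term TEXT, tf INTEGER)")
    for doc_id, text in docs.items():
        counts = {}
        for term in text.lower().split():  # naive whitespace tokenizer
            counts[term] = counts.get(term, 0) + 1
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(doc_id, term, tf) for term, tf in counts.items()],
        )
    # Intermediate result kept as a table, so it can be inspected with plain SQL.
    conn.execute(
        """
        CREATE TABLE weights AS
        SELECT p.doc_id, p.term,
               p.tf * py_log(1.0 * (SELECT COUNT(DISTINCT doc_id) FROM postings) / d.df)
                   AS weight
        FROM postings p
        JOIN (SELECT term, COUNT(*) AS df FROM postings GROUP BY term) d
          ON p.term = d.term
        """
    )
    return conn


def search(conn, query):
    """Rank documents by length-normalized TF-IDF weight on the query terms."""
    terms = sorted(set(query.lower().split()))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT w.doc_id, SUM(w.weight) / n.norm AS score
        FROM weights w
        JOIN (SELECT doc_id, py_sqrt(SUM(weight * weight)) AS norm
              FROM weights GROUP BY doc_id) n
          ON w.doc_id = n.doc_id
        WHERE w.term IN ({placeholders}) AND n.norm > 0
        GROUP BY w.doc_id, n.norm
        ORDER BY score DESC
    """
    return conn.execute(sql, terms).fetchall()


docs = {
    1: "information retrieval engine",
    2: "relational database engine",
    3: "vector model retrieval",
}
results = search(build_index(docs), "relational database")
```

Keeping the weights in an ordinary table is what makes this style convenient for teaching: changing the weighting scheme means editing one SQL statement, and every intermediate value can be queried directly.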
Interaction with the user in Information Retrieval Systems allows the formulation of more effective queries that produce better results. Relevance feedback is a process in which users identify relevant documents in an initial list of retrieved documents, and the system then creates a new query based on positive and negative examples from those documents. The user interface is an important element in this process. We present the process usually followed in the relevance feedback technique, together with experimental results that allow us to estimate the degree of improvement in retrieval achieved by these systems.
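One common way to build the new query from positive and negative examples is the Rocchio formula; the abstract does not say which method was used, so the sketch below is only an illustration under that assumption. The function name `rocchio`, the sparse dictionary representation of vectors, and the default values of `alpha`, `beta`, and `gamma` are choices made for this example.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query vector (term -> weight) using Rocchio's formula.

    query, and each item of relevant / non_relevant, is a dict mapping term -> weight.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    new_query = {}
    for term in terms:
        # Average weight of the term over the judged documents (centroids).
        pos = sum(d.get(term, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        neg = (
            sum(d.get(term, 0.0) for d in non_relevant) / len(non_relevant)
            if non_relevant
            else 0.0
        )
        weight = alpha * query.get(term, 0.0) + beta * pos - gamma * neg
        if weight > 0:  # negative weights are usually dropped
            new_query[term] = weight
    return new_query


original = {"web": 1.0}
judged_relevant = [{"web": 1.0, "retrieval": 2.0}]
judged_non_relevant = [{"spam": 1.0}]
expanded = rocchio(original, judged_relevant, judged_non_relevant)
```

In this toy case the reformulated query gains the term "retrieval" from the relevant document, while "spam", which appears only in the non-relevant document, receives a negative weight and is discarded.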
This article describes some of the activities of the REINA research group on Web information retrieval. These activities have focused on testing the retrieval capability that can be expected from the various informative elements present in web pages, beyond the text that the user normally sees in the browser. Our aim was to test the performance obtained when mixing or combining these elements. Terms from different elements can be combined in a single index by using term frequency in the vector space model with a TFxIDF scheme. The BODY field is clearly the most powerful, but the text of the ANCHORs of the backlinks that pages receive adds a considerable improvement in retrieval performance. The content of the META tags, by contrast, contributes little to improving retrieval performance.
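A minimal sketch of combining terms from several page elements into one index through term frequency is shown below. It is an illustration only, not the group's implementation: the function names `combined_tf` and `tfidf_index`, the field names, the per-field multipliers, and the whitespace tokenizer are all assumptions. Each occurrence of a term adds the multiplier of its field to the page's term frequency, and the summed frequency is then weighted by IDF.

```python
import math
from collections import Counter


def combined_tf(page, field_weights):
    """Merge the terms of all fields of one page into a single frequency vector."""
    tf = Counter()
    for field, text in page.items():
        weight = field_weights.get(field, 0)  # fields without a weight are ignored
        for term in text.lower().split():     # naive whitespace tokenizer
            tf[term] += weight
    return tf


def tfidf_index(pages, field_weights):
    """Build one TFxIDF vector per page from the combined term frequencies."""
    tfs = {pid: combined_tf(page, field_weights) for pid, page in pages.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(term for term, freq in tf.items() if freq > 0)
    n = len(pages)
    return {
        pid: {t: f * math.log(n / df[t]) for t, f in tf.items() if f > 0}
        for pid, tf in tfs.items()
    }


pages = {
    "a": {"body": "digital library", "anchor": "library catalogue", "meta": "portal"},
    "b": {"body": "web retrieval", "anchor": "", "meta": "portal"},
}
# Hypothetical setting: BODY and backlink ANCHOR text count equally, META is ignored.
index = tfidf_index(pages, {"body": 1, "anchor": 1, "meta": 0})
```

Here "library" occurs once in the body and once in the anchor text of page "a", so its combined frequency is 2 and its weight is twice that of "digital". Raising or lowering a field's multiplier is one simple way to experiment with how much each element contributes.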
The objective of this paper is to review the evolution of the field of Web information retrieval over the last 10 years. With the adoption of different cybermetric techniques, Web studies have evolved spectacularly and currently constitute an inexhaustible field of study.
