URL, Summary and Percentage: Click Here for the Next 16,433 matches: Why a URL, Summary and Percentage representation is not enough

Thomas Tan

School of Computing Science, Middlesex University,

Bounds Green Rd,

London N11 2NQ, United Kingdom.

Thomas12@mdx.ac.uk

INTRODUCTION

Internet search services have provided a means for users to locate and access information on the World Wide Web. Even with the proliferation of such services, the limitations of current search engines become more apparent as the body of poorly organised information on the web increases.

TRADITIONAL INFORMATION RETRIEVAL RESEARCH

Traditional information retrieval research (Salton and McGill, 83; Witten, Moffat and Bell, 94) has largely concerned itself with improving the effectiveness of indexing and retrieval mechanisms in terms of processing speed, resource requirements, precision and recall.

Currently, most information retrieval systems, including web search engines, employ statistical and probabilistic techniques (Excite, 97; Autonomy, 97) developed from traditional approaches to determine the relevancy of a document to a user's query. These techniques represent the current state of the art in relevancy determination. However, in an age where there can be such a thing as too much information, the problem of serving only the desired information has become more significant than the perfunctory provision of data.

Current systems of retrieval, typified by those on the web, attempt to mitigate the inadequacies in precision by providing the user with a plethora of search features such as Boolean keyword matching and proximity searching. These features generally tend to increase the rate of recall instead of precision, which aggravates the problem by retrieving more documents and not necessarily more precise matches.
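The trade-off described above can be illustrated with a small worked example. The following is a minimal sketch only: the toy collection, the relevance judgements and the function names are hypothetical, introduced purely for illustration, and do not describe any particular search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    1: {"jaguar", "car", "engine"},
    2: {"jaguar", "cat", "habitat"},
    3: {"car", "engine", "repair"},
    4: {"cat", "food"},
    5: {"jaguar", "car", "dealer"},
}

# Assumed relevance judgements for a user interested in Jaguar cars.
RELEVANT = {1, 5}


def boolean_and(terms):
    """Documents containing every query term."""
    return {doc for doc, index in DOCS.items() if set(terms) <= index}


def boolean_or(terms):
    """Documents containing at least one query term."""
    return {doc for doc, index in DOCS.items() if set(terms) & index}


narrow = boolean_and(["jaguar", "car", "engine"])  # strict conjunction
broad = boolean_or(["jaguar", "car"])              # broadened disjunction

print(sorted(narrow), precision_recall(narrow, RELEVANT))
print(sorted(broad), precision_recall(broad, RELEVANT))
```

In this toy case the broadened query retrieves every relevant document, but half of what it returns is irrelevant, so the user is left with more results to sift through rather than better ones.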

INFORMATION EXPLORATION AND VISUALISATION INTERFACES

Current models of displaying retrieval results are limiting in that they convey the many perspectives of both collection and document content inadequately, in a sequential, hierarchical fashion. The human perception system is more adept at recognising highly visual, multidimensional content than at performing thought-intensive processes such as reading. With low-cost, high-quality displays increasingly becoming a standard component of today's computing environment, efforts to address information retrieval problems are increasingly being directed at enhancing user exploration of the information space and at the effective presentation and visualisation of retrieval results.

Examples of such efforts are the Bifocal Display (Spence, 97), which presents the context of the current focus of interest while providing a smooth transition between the focus and the context of an information space; the TileBars system (Hearst, 95), which provides an effective, simultaneous and compact visualisation of multidimensional relevance (statistical) data from returned document sets; the HyperSpace system (Beale, McNab and Witten, 96); and the Scatter/Gather system (Hearst and Pedersen, 96).

AIM OF RESEARCH

The aim of the research proposed here is to investigate and devise information retrieval techniques that will provide improved information retrieval performance through the effective presentation and visualisation of enhanced retrieval results. A critical survey of information visualisation techniques by the author is currently ongoing. It is also the author's belief that, in order to conceive of what enhanced retrieval results may be presented, it is important to understand the many properties of a text corpus. An essential second stage will be to identify what those properties are and how they can be used to better represent a document.

It is envisaged that this work will culminate in the prototyping, evaluation (through user studies) and development of information retrieval systems that will allow the user to make more informed judgements about the retrieved documents. This will be accomplished through effective document representations other than the ubiquitous URL, Summary and Percentage representation.

ACKNOWLEDGEMENTS

This work is supported by a postgraduate studentship from the School of Computing Science, Middlesex University.

REFERENCES

Autonomy (1997) 'Autonomy Agentware Technology White Paper', http://www.agentware.com/main/tech/whitepaper.htm, 27th March 1998.

Beale, R., McNab, R., Witten, I. (1996) 'Visualising Sequences of Queries: A New Tool For Information Retrieval', R.Beale@cs.bham.ac.uk, School of Computer Science, University of Birmingham, UK.

Excite (1997) 'Information Retrieval Technology and Intelligent Concept Extraction', http://www.excite.com/Info/tech.html, 27th March 1998.

Hearst, M. (1995) 'TileBars: Visualisation of Term Distribution Information in Full Text Information Access', in Proceedings of ACM CHI Conference, 1995.

Hearst, M., Pedersen, J. (1996) 'Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results', in Proceedings of the Nineteenth Annual International ACM SIGIR Conference, Zurich, June 1996.

Salton, G., McGill, M. (1983) Introduction to Modern Information Retrieval. New York: McGraw Hill.

Spence, R. (1997) 'The Acquisition of Insight', http://www.ee.ic.ac.uk/research/information/www/bobs/bobs.html, 21st April 1998.

Witten, I., Moffat, A., Bell, T. (1994) Managing Gigabytes, Van Nostrand Reinhold.
