A Novel Approach for Compound Document Matching

Springmann, Michael

Authors

Michael Springmann

Type

Bulletin

Date

2006/7

Appears in

Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL)

Publisher

IEEE Technical Committee on Digital Libraries

Abstract

Future digital libraries will contain not only pure text documents but, increasingly, massive amounts of compound documents comprising many multimedia objects, e.g., text, images, audio, and video. Existing document collections, e.g., all electronic health records of one clinic, can already form a digital library with millions of multimedia objects and a total storage volume of several terabytes. It is therefore important to provide effective and efficient retrieval for such collections. This paper proposes a novel approach to compound document matching that uses a filter-and-refinement algorithm for similarity-based retrieval within documents, which may consist of arbitrarily many objects of various media types. At the same time, the approach increases effectiveness by establishing only semantically meaningful matches, and it provides greater expressiveness in queries by restricting the number of allowed matches to a single query object.

Download

http://www.ieee-tcdl.org/Bulletin/v2n2/springmann/springmann.html

Staff members

Michael Springmann

Research Projects

DELOS Network of Excellence on Digital Libraries