Garofalakis, Minos ; Kumar, Amit (2003) Correlating XML data streams using tree-edit distance embeddings In: PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 9 - 11, 2003, San Diego California.
Full text not available from this repository.
Official URL: http://doi.org/10.1145/773153.773168
Related URL: http://dx.doi.org/10.1145/773153.773168
Abstract
We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing an upper bound of O(log2 n log* n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to: (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and, (2) approximate the result of tree-edit distance similarity joins over continuous XML document streams. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Source: | Copyright of this article belongs to Association for Computing Machinery. |
ID Code: | 123558 |
Deposited On: | 30 Sep 2021 10:45 |
Last Modified: | 30 Sep 2021 10:45 |
Repository Staff Only: item control page