iDiff: Informative Summarization of Differences in Multidimensional Aggregates.

Sarawagi, Sunita (2001) iDiff: Informative Summarization of Differences in Multidimensional Aggregates. Data Mining and Knowledge Discovery, 5 (4). pp. 255-276. ISSN 13845810

Full text not available from this repository.

Official URL: http://doi.org/10.1023/A:1011494927464

Related URL: http://dx.doi.org/10.1023/A:1011494927464

Abstract

Multidimensional OLAP products provide an excellent opportunity for integrating mining functionality because of their widespread acceptance as a decision support tool and their existing heavy reliance on manual, user-driven analysis. Most OLAP products are rather simplistic and rely heavily on the user's intuition to manually drive the discovery process. Such ad hoc user-driven exploration gets tedious and error-prone as data dimensionality and size increase. Our goal is to automate these manual discovery processes. In this paper we present an example of such automation through an iDiff operator that, in a single step, returns summarized reasons for drops or increases observed at an aggregated level. We formulate this as a problem of summarizing the difference between two multidimensional arrays of real numbers. We develop a general framework for such summarization and propose a specific formulation for the case of OLAP aggregates. We develop an information-theoretic formulation for expressing the reasons that is compact and easy to interpret. We design an efficient dynamic programming algorithm that requires only one pass of the data and uses a small amount of memory independent of the data size. This allows easy integration with existing OLAP products. Our prototype has been tested on the Microsoft OLAP server, DB2/UDB and Oracle 8i. Experiments using the OLAP benchmark demonstrate (1) scalability of our algorithm as the size and dimensionality of the cube increase and (2) feasibility of getting interactive answers with modest hardware resources.
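
To make the problem statement concrete, the following is a minimal sketch of the setting the abstract describes: two small multidimensional aggregates are compared, and the detail cells that account for most of the change at the aggregated level are listed. It is a naive ranking by absolute difference, not the paper's information-theoretic formulation or its dynamic programming algorithm. The dimension names (product, region), the sample figures, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# A cell is addressed by one value per dimension.
# Assumption: two hypothetical dimensions, (product, region).
Cell = Tuple[str, str]
Cube = Dict[Cell, float]


def total_change(before: Cube, after: Cube) -> float:
    """Change observed at the fully aggregated level."""
    return sum(after.values()) - sum(before.values())


def top_contributors(before: Cube, after: Cube, k: int) -> List[Tuple[Cell, float]]:
    """Return the k detail cells with the largest absolute difference.

    Naive stand-in for a summarization step: it ranks individual cells
    and does not group them into compact, higher-level explanations.
    """
    cells = set(before) | set(after)
    diffs = {c: after.get(c, 0.0) - before.get(c, 0.0) for c in cells}
    # Sort by descending magnitude, then by cell key for a stable order.
    ranked = sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return ranked[:k]


# Hypothetical revenue figures for two periods.
before: Cube = {
    ("laptop", "east"): 100.0,
    ("laptop", "west"): 120.0,
    ("phone", "east"): 80.0,
    ("phone", "west"): 90.0,
}
after: Cube = {
    ("laptop", "east"): 95.0,
    ("laptop", "west"): 60.0,
    ("phone", "east"): 85.0,
    ("phone", "west"): 90.0,
}

if __name__ == "__main__":
    print(total_change(before, after))
    print(top_contributors(before, after, 2))
```

In this toy data the aggregate drop is dominated by a single cell, which is the kind of reason a user would otherwise have to find by drilling down manually.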

Item Type: Article
Source: Copyright of this article belongs to Springer Nature Switzerland AG
Keywords: multidimensional databases; OLAP; OLAP-mining integration; difference mining; data summarization; advanced aggregates
ID Code: 128424
Deposited On: 20 Oct 2022 09:31
Last Modified: 20 Oct 2022 09:31
