SCAD: collective discovery of attribute values

Bakalov, Anton; Fuxman, Ariel; Talukdar, Partha Pratim; Chakrabarti, Soumen (2011) SCAD: collective discovery of attribute values. In: 20th international conference on World wide web.

Full text not available from this repository.

Official URL: http://doi.org/10.1145/1963405.1963469

Abstract

Search engines today offer a rich user experience that is no longer restricted to "ten blue links". For example, the query "Canon EOS Digital Camera" returns a photo of the camera and a list of suitable merchants and prices. Similar results are offered in other domains such as food, entertainment, and travel. All these experiences are fueled by the availability of structured data about the entities of interest. To obtain this structured data, it is necessary to solve the following problem: given a category of entities with its schema, and a set of Web pages that mention and describe entities belonging to the category, build a structured representation of each entity under the given schema. Specifically, the task is to collect numerical or discrete attributes of the entities. Most previous approaches regarded this as an information extraction problem on individual documents and made no special use of numerical attributes. In contrast, we present an end-to-end framework that leverages signals not only from the Web page context but also from a collective analysis of all the pages corresponding to an entity, and from constraints related to the actual values within the domain. Our current implementation uses a general and flexible Integer Linear Program (ILP) to integrate all these signals into holistic decisions over all attributes. There is one ILP per entity, and each is small enough to be solved in under 38 milliseconds in our experiments. We apply the new framework to a setting of significant practical importance: catalog expansion for commerce search engines, using data from Bing Shopping. Finally, we present experiments that validate the effectiveness of the framework and its superiority over local extraction.
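The abstract does not specify the ILP's variables, constraints, or weights. As a purely hypothetical sketch (not the authors' formulation), the Python below poses attribute-value selection for one entity as a tiny 0/1 program: one binary choice per candidate value, exactly one value per attribute, a domain-range constraint, and a rule that a single text mention cannot fill two attributes. The objective mixes a local (page-context) signal with a collective signal (the fraction of the entity's pages that mention the value). All candidate data, weights, attribute names, and the helpers `score` and `solve` are invented for illustration, and the program is solved by exhaustive enumeration rather than by an ILP solver.

```python
from itertools import product

# Hypothetical toy data for ONE entity (a camera); not from the paper.
# Each tuple is (value, mention_id, local_score, supporting_pages):
#   local_score      - confidence from the page context of that mention
#   supporting_pages - how many of the entity's pages mention this value
CANDIDATES = {
    "screen_inches": [("3.0", "m1", 0.8, 4), ("2.7", "m2", 0.5, 2),
                      ("30", "m4", 0.9, 1)],
    "optical_zoom": [("3.0", "m1", 0.6, 3), ("5.0", "m3", 0.4, 1)],
}
# Hypothetical domain constraint: plausible range of values per attribute.
DOMAIN_RANGE = {"screen_inches": (1.0, 10.0), "optical_zoom": (1.0, 100.0)}
TOTAL_PAGES = 5


def score(local, pages, total_pages, w_local=0.5, w_collective=0.5):
    """Objective coefficient: weighted sum of local and collective signals."""
    return w_local * local + w_collective * pages / total_pages


def solve(candidates, domain_range, total_pages):
    """Solve a tiny 0/1 program exactly by exhaustive enumeration.

    Variables x[a, v] in {0, 1}, with constraints:
      * sum over v of x[a, v] == 1 for every attribute a
      * x[a, v] == 0 when v lies outside domain_range[a]
      * each mention_id supports at most one selected value
    Objective: maximize the sum of score(...) over selected candidates.
    Returns (assignment, objective), or (None, -inf) if infeasible.
    """
    attrs = list(candidates)
    # Domain constraints fix out-of-range variables to 0 up front.
    feasible = {
        a: [c for c in candidates[a]
            if domain_range[a][0] <= float(c[0]) <= domain_range[a][1]]
        for a in attrs
    }
    best, best_value = None, float("-inf")
    # Each element of the product picks exactly one candidate per attribute.
    for choice in product(*(feasible[a] for a in attrs)):
        mentions = [c[1] for c in choice]
        if len(set(mentions)) < len(mentions):
            continue  # one text mention cannot fill two attributes
        value = sum(score(c[2], c[3], total_pages) for c in choice)
        if value > best_value:
            best = {a: c[0] for a, c in zip(attrs, choice)}
            best_value = value
    return best, best_value


if __name__ == "__main__":
    assignment, objective = solve(CANDIDATES, DOMAIN_RANGE, TOTAL_PAGES)
    print(assignment, round(objective, 2))
```

In this toy instance, a greedy per-attribute choice would pick mention `m1` for both attributes, which the mention constraint forbids, and without the domain range the implausible screen size "30" would win. Interactions like these are what a joint decision over all attributes can capture. Enumeration grows exponentially with the number of attributes, so a practical system would hand the same objective and constraints to an ILP solver.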

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to the Association for Computing Machinery
ID Code: 130939
Deposited On: 01 Dec 2022 10:08
Last Modified: 01 Dec 2022 10:08

Repository Staff Only: item control page