Project: Attribute Extraction from Product Descriptions

Retailers collect a growing amount of sales data containing detailed information about customers and their transactions. In contrast, the information about the products actually sold is often sparse: most retailers treat products as atomic entities with very few associated attributes (typically brand, size, or color). At the same time, retailers present their products to customers on web sites that describe each product in detail, including its physical attributes, but typically in natural language, which makes the descriptions difficult to use directly in many applications.

The task we tackle in this project requires a system that can process product descriptions, extract relevant attributes and values, and then form pairs by associating each value with the attribute it describes. The system we are building extracts attribute-value pairs from product descriptions with minimal human supervision. It is intended as an enabler for a variety of applications, including assortment planning, brand management, and catalog mapping. Our methodology combines machine learning techniques for labeled and unlabeled data (co-EM), natural language processing techniques, and an active learning feedback loop that interactively refines the inferred hypotheses.
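To make the pairing step concrete, here is a minimal Python sketch of forming attribute-value pairs from tokens that have already been labeled. It is only an illustration under stated assumptions, not the project's actual method: the label names (ATTR, VALUE, O), the helper functions, the sample phrase, and the simple "a value run immediately followed by an attribute run" heuristic are all hypothetical. The learning components (co-EM and active learning) that would produce the labels are not shown.

```python
from typing import List, Tuple

# Hypothetical token format: (word, label), with label in {"ATTR", "VALUE", "O"}.
Token = Tuple[str, str]


def group_runs(tokens: List[Token]) -> List[Token]:
    """Merge consecutive tokens that share a label into (phrase, label) runs."""
    runs: List[Token] = []
    for word, label in tokens:
        if runs and runs[-1][1] == label:
            runs[-1] = (runs[-1][0] + " " + word, label)
        else:
            runs.append((word, label))
    return runs


def pair_adjacent(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Pair each VALUE run with the ATTR run that immediately follows it.

    This adjacency rule is an assumed toy heuristic, chosen for brevity.
    Returns a list of (attribute, value) tuples.
    """
    runs = group_runs(tokens)
    pairs = []
    for (phrase, label), (next_phrase, next_label) in zip(runs, runs[1:]):
        if label == "VALUE" and next_label == "ATTR":
            pairs.append((next_phrase, phrase))
    return pairs


# Made-up example phrase: "stainless steel blade with rubber grip"
labeled = [
    ("stainless", "VALUE"), ("steel", "VALUE"), ("blade", "ATTR"),
    ("with", "O"),
    ("rubber", "VALUE"), ("grip", "ATTR"),
]
print(pair_adjacent(labeled))
```

For the made-up phrase above, the sketch pairs "blade" with "stainless steel" and "grip" with "rubber". A real system would need to handle values that follow their attribute, values with no explicit attribute, and labeling errors, none of which this heuristic covers.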

People:

• Andrew Fano
• Rayid Ghani
• Marko Krema
• Katharina Probst

Papers:

Semi-Supervised Learning to Extract Attribute-Value Pairs from Product Descriptions on the Web.
Katharina Probst, Rayid Ghani, Marko Krema, Andrew Fano, Yan Liu.
Web Mining Workshop, held at the European Conference on Machine Learning (ECML/PKDD 2006).

Text Mining for Product Attribute Extraction.
Rayid Ghani, Katharina Probst, Yan Liu, Marko Krema, Andrew Fano.
SIGKDD Explorations, Vol. 8, Issue 1, 2006.