Colour in Context
Research group Computer Vision Center
The success of the bag-of-words (BOW) framework depends heavily on the quality of the visual vocabulary. In this work we investigate visual vocabularies used to represent images whose local features are described by both shape and color. To extend BOW to multiple cues, two properties are especially important: cue binding and cue weighting. A visual vocabulary has the binding property when two independent cues that appear at the same location in an image remain coupled in the final image representation. Cue weighting means that the relevance of each cue can be adapted to the dataset. The importance of cue weighting is evident from the success of Multiple Kernel Learning (MKL) techniques, in which the weight of each cue is learned automatically.
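The two properties can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions, not code from this work: the function names `late_fusion` and `product_vocabulary` and the word indices are hypothetical. It compares a concatenation of per-cue histograms (cues can be weighted, but binding is lost) with a histogram over (shape, color) pairs (binding is kept).

```python
import numpy as np

def late_fusion(shape_ids, color_ids, n_shape, n_color, w_shape=1.0, w_color=1.0):
    """Concatenate per-cue histograms: cues can be weighted but are not bound."""
    h_shape = np.bincount(shape_ids, minlength=n_shape).astype(float)
    h_color = np.bincount(color_ids, minlength=n_color).astype(float)
    return np.concatenate([w_shape * h_shape, w_color * h_color])

def product_vocabulary(shape_ids, color_ids, n_shape, n_color):
    """Histogram over (shape, color) pairs: cues stay bound per location."""
    joint_ids = np.asarray(shape_ids) * n_color + np.asarray(color_ids)
    return np.bincount(joint_ids, minlength=n_shape * n_color).astype(float)

# Two toy images with two local features each (hypothetical word indices).
# Image A pairs (shape 0, color 0) and (shape 1, color 1).
# Image B pairs (shape 0, color 1) and (shape 1, color 0).
img_a = (np.array([0, 1]), np.array([0, 1]))
img_b = (np.array([0, 1]), np.array([1, 0]))

# Per-cue histograms are identical, so late fusion cannot tell A from B.
same_late = np.array_equal(late_fusion(*img_a, 2, 2), late_fusion(*img_b, 2, 2))
# The joint histogram keeps the pairing, so the two images differ.
same_joint = np.array_equal(product_vocabulary(*img_a, 2, 2),
                            product_vocabulary(*img_b, 2, 2))
```

The product vocabulary keeps binding, but it grows as the product of the two vocabulary sizes and offers no direct handle for weighting one cue against the other.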
We describe a novel technique for feature combination in the bag-of-words model of image classification. Our approach builds discriminative compound words from primitive cues learned independently from training images. Our main observation is that modeling joint-cue distributions independently is more statistically robust for typical classification problems than attempting to empirically estimate the dependent, joint-cue distribution directly. We use information-theoretic vocabulary compression to find discriminative combinations of cues, and the resulting vocabulary of portmanteau words is compact, has the cue binding property, and supports individual weighting of cues in the final image representation.
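The idea of building compound words from independently modeled cues can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: the function names, the toy distributions, and the simple KL-based reassignment loop (standing in for the information-theoretic compression step) are all hypothetical choices. Class posteriors for each (shape, color) pair are computed from per-cue distributions assumed independent given the class, and pairs with similar posteriors are then merged into a small number of compound words.

```python
import numpy as np

def class_posteriors(p_shape_given_class, p_color_given_class, p_class):
    """Return p(class | shape word, color word) and p(shape word, color word),
    assuming the two cues are independent given the class."""
    n_shape, n_class = p_shape_given_class.shape
    n_color = p_color_given_class.shape[0]
    # joint[s, c, k] = p(s | k) * p(c | k) * p(k)
    joint = (p_shape_given_class[:, None, :]
             * p_color_given_class[None, :, :]
             * p_class[None, None, :])
    joint = joint.reshape(n_shape * n_color, n_class)  # row index = s * n_color + c
    word_prior = joint.sum(axis=1)
    posterior = joint / np.maximum(word_prior[:, None], 1e-12)
    return posterior, word_prior

def compress(posterior, word_prior, n_clusters, n_iter=20, seed=0):
    """Group product words whose class posteriors are close in KL divergence.
    A simple k-means-style stand-in for information-theoretic compression."""
    rng = np.random.default_rng(seed)
    n_words, n_class = posterior.shape
    assign = rng.integers(n_clusters, size=n_words)
    eps = 1e-12
    for _ in range(n_iter):
        # Cluster centre = prior-weighted mean posterior of its members.
        centres = np.full((n_clusters, n_class), 1.0 / n_class)
        for j in range(n_clusters):
            mask = assign == j
            if word_prior[mask].sum() > 0:
                centres[j] = np.average(posterior[mask], axis=0,
                                        weights=word_prior[mask])
        # kl[i, j] = KL(posterior_i || centre_j)
        log_ratio = (np.log(posterior[:, None, :] + eps)
                     - np.log(centres[None, :, :] + eps))
        kl = (posterior[:, None, :] * log_ratio).sum(axis=2)
        new_assign = kl.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return assign

def compound_histogram(shape_ids, color_ids, n_color, assign, n_clusters):
    """Image representation: histogram over the compressed compound words."""
    product_ids = np.asarray(shape_ids) * n_color + np.asarray(color_ids)
    return np.bincount(assign[product_ids], minlength=n_clusters)

# Toy usage with made-up distributions: 3 shape words, 2 color words, 2 classes.
p_s = np.array([[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]])  # p(shape word | class)
p_c = np.array([[0.5, 0.5], [0.5, 0.5]])              # color uninformative here
p_k = np.array([0.5, 0.5])                            # class prior
posterior, word_prior = class_posteriors(p_s, p_c, p_k)
assign = compress(posterior, word_prior, n_clusters=3)
hist = compound_histogram([0, 2, 2], [1, 0, 1], 2, assign, 3)
```

Because color carries no class information in this toy setup, product words that share a shape word have identical posteriors and always land in the same compound word. Only the per-cue distributions are estimated, so the number of parameters grows with the sum of the two vocabulary sizes rather than their product.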
Code Available
Data Available
Literature
Fahad Shahbaz Khan, Joost van de Weijer, Andrew D. Bagdanov and Maria Vanrell. Portmanteau Vocabularies for Multi-Cue Image Representations. Proc. NIPS 2011, Granada, Spain, 2011. (Poster)