Search results

30 records were found.

We present a method that exploits an information-theoretic framework to extract audio features that are optimal with respect to the video features. A simple measure of mutual information between the resulting audio features and the video features makes it possible to detect the active speaker among several candidates. The results show that our method is able to exploit the speech information shared by the audio and video signals to recover their common source.
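The abstract does not specify which mutual-information estimator is used; a minimal sketch of the idea, assuming a simple joint-histogram estimator over 1-D feature sequences (the function names, bin count, and synthetic features below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Estimate mutual information (in bits) between two 1-D feature
    sequences via their joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                      # joint distribution
    px = pxy.sum(axis=1, keepdims=True)            # marginal of x
    py = pxy.sum(axis=0, keepdims=True)            # marginal of y
    nz = pxy > 0                                   # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def active_speaker(audio_feat, candidate_video_feats):
    """Pick the candidate whose video features share the most
    information with the audio features."""
    scores = [mutual_information(audio_feat, v) for v in candidate_video_feats]
    return int(np.argmax(scores)), scores
```

A candidate whose mouth-region features are driven by the same speech source as the audio will score a higher mutual information than a silent candidate, which is the selection rule the abstract describes.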
This work addresses the problem of detecting the speaker in audio-visual sequences by evaluating the synchrony between the audio and video signals. Prior to classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then defined within a hypothesis-testing framework so as to obtain confidence levels associated with the classifier outputs. Such an approach makes it possible to evaluate the efficiency of the whole classification process and, in particular, to assess the benefit of performing the feature extraction. As a result, it is shown that introducing a feature extraction step prior to classification increases the ability of the classifier to produce good relative instance scores.
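The abstract does not give the score distributions used in the hypothesis test; one common way to attach a confidence level to a synchrony score is a likelihood ratio between two hypotheses. The sketch below assumes Gaussian score distributions under each hypothesis, with purely illustrative parameter values:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def synchrony_confidence(score, h0=(0.05, 0.03), h1=(0.4, 0.1), prior1=0.5):
    """Posterior probability that audio and video are synchronous (H1),
    given a synchrony score. The (mean, std) pairs for H0 (asynchronous)
    and H1 (synchronous) are illustrative assumptions."""
    p0 = gaussian_pdf(score, *h0) * (1 - prior1)   # evidence for H0
    p1 = gaussian_pdf(score, *h1) * prior1         # evidence for H1
    return p1 / (p0 + p1)
```

Thresholding this posterior gives a classifier output together with a confidence level, which is the kind of quantity the hypothesis-testing framework in the abstract is designed to provide.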