My research activities focus on describing, structuring,
indexing, mining and linking multimedia contents. In this general
framework, I dedicate particular attention to multimodal
statistical modeling, speech recognition and spoken content
processing, unsupervised multimedia data mining and automatic link
authoring. My research activities build upon various scientific
foundations such as multimedia signal processing, indexing and
information retrieval, natural language processing, multimedia
data mining and statistical machine learning.
Recently, I have been conducting activities in:
- Multimodal video modeling:
  Integration of the audio, visual and possibly text modalities into
  - multimodal conditional random fields for video structuring
  - multimodal geolocalization of video clips using image,
    metadata and social data
  - learning the dependencies in Bayesian networks for event
    detection in videos
  - multistream hidden Markov models and segmental models for
- Spoken content analysis:
  Exploiting spoken content for multimedia content processing
  - new decoding paradigms and knowledge-driven speech
  - tight integration of ASR and NLP for robust spoken document
    analysis: topic segmentation, keyword extraction, etc.
  - transducer-based multilevel spoken content indexing
  - efficient pattern matching and representation for
    example-based spoken term detection
- Multimedia data mining: Discovering recurring motifs in multimedia
  streams in a totally unsupervised fashion
  - unsupervised structure inference from a homogeneous
    collection of data
  - indexing and hashing for large-scale, fast near-duplicate
    video motif discovery
  - cross-modal clustering for unsupervised discovery of
    audiovisually consistent repeated segments
  - efficient segmental pattern matching for word and
    near-duplicate audio discovery
I am also quite active in benchmark initiatives, with the
organization of the French spoken technology evaluations ESTER 2003,
2005, 2009 and the ETAPE 2012 follow-up, the Affect task at MediaEval
2011, 2012 and 2013 and the Spoken Web Search Task at
I participate in several initiatives that serve the
scientific community: president of the French-speaking Speech
Communication Association (AFCP), co-founder of the Speech and Language in Multimedia
(SLIM) special interest group of ISCA, co-founder and general
chair of the 1st Joint
IEEE and ISCA Workshop on Speech, Language and Audio in Multimedia
(SLAM), program coordination chair of Interspeech 2013.
Check out the texmix demo on
navigating broadcast news archives for an idea of what I do in
spoken content processing.
Recent participation in projects (contribution
to the project)
I am currently involved in the following projects
Over the last few years, I have participated in the following
- OpenSEM: an EIT ICT Labs open portal
for semantic access to videos (video
and spoken content analysis, navigation portal, program
committee of MediaEval
- Rev-TV: using virtual reality for television program edition (speech
recognition, lip sync)
- Attelage de Systèmes Hétérogènes (ASH): harnessing
heterogeneous speech recognition systems for collaborative
speech recognition (speech recognition, knowledge
la Parole (ETAPE): evaluation campaign on TV stream
transcription for the French language (on behalf of the AFCP)
multimodal search engines (audio event
detection, multimodal integration, video
Participation in the activities of the MUSCLE European Network of
- Rapsodis: improving speech recognition with syntax and
multimedia stream structuring (multimedia integration,
video structuring, speech transcription, transcribed text
- Pelops: Soccer video analysis and repurposing (sound
class detection, word spotting)
French spoken document rich transcription evaluation campaign (campaign
organization; BN rich transcription system development)
- Domus Videum: video abstracting and navigation (sound
class detection, multimedia integration)
Ph. D. students
Ongoing Ph. D. theses I am supervising:
- Bingqing Qu. Learning the structure of a TV show from a
collection of homogeneous occurrences (in collaboration with INA)
- Abir NCibi. On conditional random fields for video stream
- Ludivine Kuznik. Browsing news archives (in collaboration with
- Cédric Penet. Multimodal content based analysis for video on
demand (in collaboration with Technicolor)
- Stefan Ziegler. Landmark driven speech recognition
- Julien Fayolle. Information retrieval in TV streams
Past Ph. D. students:
- Camille Guinaudeau. Automatic structuring of TV
streams. Ph. D. thesis, INSA Rennes, December
2011 (in French).
- Armando Muscariello. Variability
patterns in audio data with template matching. Ph. D.
thesis, Université de Rennes 1, January 2011.
- Gwénolé Lecorvé. Unsupervised topic adaptation
for robust speech recognition. Ph. D. thesis,
INSA Rennes, November 2010 (in French, awarded best Ph. D.
thesis in speech communication in 2011 by AFCP, the French
regional branch of ISCA).
- Siwar Baghdadi. Sparse
events detection in videos with Bayesian networks. Ph.
D. thesis, Université de Rennes 1, February 2010 (in French).
- Wen Xuan Teng. Rapid speaker adaptation using a
variable subspace of reference models. Ph.
D. thesis, Université de Rennes 1, December 2008.
- Stéphane Huet. Morpho-syntactic knowledge and
topic adaptation to improve speech recognition.
Ph. D. thesis, Université de Rennes 1, December 2007.
- Manolis Delakis. Multimodal tennis video
structure analysis with segment models. Ph. D.
thesis, Université de Rennes 1, October 2006.
- Ewa Kijak. Multimodal sport video
structuring with stochastic models. Ph. D.
thesis, Université de Rennes 1, 2003 (in French).
More Ph. D. theses in which I have been or am involved (but not
supervising in any way):
- Romain Tavenard. Indexation de séquences de descripteurs pour
exploiter audio et vidéo.
- Xavier Naturel. Automatic structuring of TV
streams. Ph. D. thesis, Université de Rennes 1,
2007 (in French).
- Mathieu Ben. Robust approaches for automatic speaker
verification using normalization and hierarchical adaptation.
Ph. D. thesis, Université de Rennes 1, 2004 (in French).
I actively participated in the development of the following free
- SPro, a
speech signal processing toolkit
generic tools for audio segmentation
a large vocabulary decoder for speech recognition
These toolkits are the basis (with a little help from HTK) of the IRENE broadcast
news indexing platform, originally developed for the French ESTER
rich transcription evaluation campaign in collaboration with
François Yvon. Also check out my free ESTER resources page.
In the framework of the ASR/NLP working group I am co-leading, we
have developed several pieces of code related to spoken document
analysis. Among others, worth mentioning are:
- IRISA News Topic Segmenter: wrapper to topic-segmenter for the
segmentation of broadcast news
- kiwi: keyword extraction from transcripts
- fishnet: fish texts on the Internet related to a topic
characterized by a few keywords (as given by kiwi)
- match-maker: corpus based acquisition of semantic relations
(and a bunch of relations from a large newspaper corpus)
These toolkits are not freely distributed open-source software,
but we are nevertheless willing to share them. Feel free to contact me
should you be interested in any of them.
Selected recent publications
Check out my complete list of publications.
- Benjamin Lecouteux, Georges Linarès, Yannick Estève and
Guillaume Gravier. Dynamic
combination of automatic speech recognition systems by driven
decoding. To appear in IEEE Trans. on Audio,
Speech and Language Processing, 2013.
- Armando Muscariello, Guillaume Gravier and Frédéric Bimbot. Unsupervised motif acquisition in
speech via seeded discovery and template matching combination.
In IEEE Trans. on Audio, Speech and Language,
- Camille Guinaudeau, Guillaume Gravier, and Pascale Sébillot. Enhancing lexical cohesion measure with
confidence measures, semantic relations and language model
interpolation for multimedia spoken content topic
segmentation. Computer Speech and Language,
- Guillaume Gravier, Claire-Hélène Demarty, Siwar Baghdadi and
Patrick Gros. Classification-oriented structure learning in
Bayesian networks for multimodal event detection in videos. In
Multimedia Tools and Applications, 2012.
- Guillaume Gravier, Camille Guinaudeau, Gwénolé Lecorvé, and
Pascale Sébillot. Exploiting speech for automatic TV
delinearization: From streams to cross-media semantic
navigation. In Journal of Image and Video
Processing, 2011, 2011:689780.
- Mathieu Ben and Guillaume Gravier. Unsupervised mining of audiovisually
consistent segments in videos with application to structure
analysis. In IEEE Intl. Conf. on
Multimedia and Exhibition, 2011.
- Gwénolé Lecorvé, Guillaume Gravier, and Pascale Sébillot. Automatically finding semantically consistent
N-grams to add new words in LVCSR systems. In IEEE
Intl. Conf. on Acoustics, Speech and Signal Processing,
- Stéphane Huet, Guillaume Gravier, and Pascale Sébillot. Morpho-syntactic
post-processing of N-best lists for improved French
automatic speech recognition. Computer
Speech and Language, (24):663-684, 2010.
- Sylvain Galliano, Guillaume Gravier, and Laura Chaubard. The ESTER 2
evaluation campaign for the rich transcription of French radio
broadcasts. In Proc. Annual Intl. Speech
Communication Association Conference (Interspeech), pages
- Manolis Delakis, Guillaume Gravier, and Patrick Gros. Audiovisual
Integration with Segment Models for Tennis Video Parsing.
Computer Vision and Image Understanding, 111(2):142-154, August
I obtained a master's degree in Applied Mathematics at the Institut National des Sciences
Appliquées (INSA Rouen) in 1995 and worked on speech synthesis
at ELAN Informatique from 1996 to
1997. I received a Ph. D. in Signal and Image Processing (Toward
speech modeling with Markov random fields) at the École Nationale Supérieure des
Télécommunications (ENST Paris) in 2000. After a one-year
post-doctoral stay at Irisa, I worked in the Audio Visual Speech Technology
group at the IBM T. J. Watson Research Center from 2001 to 2002.
Since 2002, I have been a research fellow at the Centre National de la Recherche
Scientifique (CNRS), working at the Institut de Recherche en Informatique
et Systèmes Aléatoires (IRISA). I received the Habilitation à
Diriger des Recherches (HDR) in computer science from Université de Rennes 1
in 2009.
Guillaume Gravier, Irisa, Campus de
Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71