My research activities focus on describing, structuring,
indexing, mining and linking multimedia content. In this general
framework, I dedicate particular attention to multimodal statistical
modeling, speech recognition and spoken content processing,
unsupervised multimedia data mining and automatic link authoring. My
research activities build upon various scientific foundations such as
multimedia signal processing, indexing and information retrieval,
natural language processing, multimedia data mining and statistical
Recently, I have been conducting activities in:
- Multimodal video
modeling: integrating the audio, visual and possibly text
modalities into statistical models
analysis: Exploiting spoken content for multimedia content
- multimodal conditional random fields for video structuring
- multimodal geolocalization of video clips using image, metadata
and social data
- learning the dependencies in Bayesian networks for
event detection in videos
- multistream hidden Markov models and segmental models for video
Unsupervised multimedia mining:
in multimedia streams in a totally
- new decoding paradigms and knowledge-driven speech recognition
- tight integration of ASR and NLP for robust spoken document
analysis: topic segmentation, keyword extraction, etc.
- transducer based multilevel spoken content indexing
- efficient pattern matching and representation for example based
spoken term detection
I am also quite active in benchmark initiatives with the organization
of the French spoken technology evaluations ESTER 2003,
2005, 2009 and the ETAPE 2012 follow-up, the Affect task at MediaEval
2011, 2012 and 2013 and the Spoken Web Search Task at MediaEval
- unsupervised structure inference from a homogeneous collection
- indexing and hashing for large scale fast near duplicate video
- cross modal clustering for unsupervised discovery of
audiovisually consistent repeated segments
- efficient segmental pattern matching for word and near
duplicate audio discovery
I take part in several initiatives that organize and serve the
scientific community: president of the French-speaking Speech Communication
Association (AFCP), co-founder of the Speech and Language in Multimedia
(SLIM) special interest group of ISCA, co-founder and general chair
of the 1st Joint IEEE and
ISCA Workshop on Speech, Language and Audio in Multimedia (SLAM),
program coordination chair of Interspeech
Check the texmix demo on
broadcast news archives for an idea of what it is I do in spoken
Recent participation in
projects (contribution to the project)
I am currently involved in the following projects
Over the last few years, I have participated in the following projects
- OpenSEM: an EIT ICT Labs
open portal for semantic access to videos (video
and spoken content analysis, navigation portal, program committee of MediaEval 2011)
- Rev-TV: using virtual reality for television program edition (speech
- Attelage de Systèmes
Hétérogènes (ASH): harnessing heterogeneous speech
recognition systems for collaborative speech recognition (speech
Parole (ETAPE): evaluation campaign on TV stream transcription for
French language (on behalf of the AFCP)
detection, multimodal integration, video structure
Participation in the activities of the MUSCLE European Network of
- Rapsodis: improving speech recognition with syntax
integration, video structuring, speech transcription, transcribed text
- Pelops: Soccer video analysis and repurposing (sound
class detection, word spotting)
French spoken document rich transcription evaluation campaign (campaign
- Domus Videum: video abstracting and navigation (sound
class detection, multimedia integration)
Ph. D. students
Ongoing Ph. D. theses I am supervising:
- Bingqing Qu. Learning the structure of a TV show from a
collection of homogeneous occurrences (in collaboration with INA)
- Abir NCibi. On conditional random fields for video stream
- Ludivine Kuznik. Browsing news archives (in collaboration with
- Cédric Penet. Multimodal content based analysis for video
on demand (in collaboration with Technicolor)
- Stefan Ziegler. Landmark driven speech recognition
- Julien Fayolle. Information retrieval in TV streams
Past Ph. D. students:
Other Ph. D. theses in which I have been or am involved (but not
supervising in any way):
- Camille Guinaudeau. Automatic
structuring of TV streams. Ph. D.
thesis, INSA Rennes, December 2011 (in French).
- Armando Muscariello. Variability
repeating patterns in audio data with template matching. Ph. D.
Université de Rennes 1, January 2011.
- Gwénolé Lecorvé. Unsupervised
topic adaptation for robust speech recognition. Ph. D.
INSA Rennes, November 2010 (in French, awarded best
Ph. D. thesis in speech communication in 2011 by AFCP, the
French regional branch of ISCA).
- Siwar Baghdadi. Sparse events
detection in videos
with Bayesian networks. Ph. D. thesis, Université de
February 2010 (in French).
- Wen Xuan Teng. Rapid speaker
adaptation using a variable subspace of reference models. Ph.
Université de Rennes 1, December 2008.
- Stéphane Huet. Morpho-syntactic knowledge and topic
adaptation to improve speech recognition. Ph. D. thesis, Université
- Manolis Delakis. Multimodal
tennis video structure analysis with segment models. Ph. D.
thesis, Université de Rennes 1, October 2006.
- Ewa Kijak. Multimodal
sport video structuring with stochastic models. Ph. D.
Université de Rennes 1, 2003 (in French).
- Romain Tavenard. Indexation de séquences de descripteurs
pour exploiter audio et vidéo.
- Xavier Naturel. Automatic structuring of TV streams.
Rennes 1, 2007 (in French).
- Mathieu Ben. Robust approaches for automatic speaker verification
using normalization and hierarchical adaptation. Ph. D. thesis,
Université de Rennes 1, 2004 (in French).
I actively participated in the development of the following free
a speech signal processing toolkit
generic tools for audio segmentation
a large vocabulary decoder for speech recognition
These toolkits are the basis (with a little help from HTK) of the
IRENE broadcast news indexing platform,
originally developed for the French
ESTER rich transcription evaluation campaign in collaboration with François Yvon. Also
check out my free ESTER
In the framework of the ASR/NLP working group that I co-lead, we have
pieces of code related to spoken document analysis. Among others, worth
- IRISA News Topic Segmenter: wrapper to
topic-segmenter for the segmentation of broadcast news
- kiwi: keyword extraction from transcripts
- fishnet: fish texts on the Internet related to a
topic characterized by a few keywords (as given by kiwi)
- match-maker: corpus based acquisition of semantic
relations (and a bunch of relations from a large newspaper corpus)
These toolkits are not freely distributed open-source software, but
we are nevertheless willing to share them. Feel free to get in touch
should you be interested in any of them.
Selected recent publications
Check out my complete
list of publications.
- Benjamin Lecouteux, Georges Linarès, Yannick Estève
and Guillaume Gravier. Dynamic
combination of automatic speech recognition systems by driven decoding.
To appear in IEEE Trans. on Audio, Speech and Language Processing,
- Armando Muscariello, Guillaume Gravier and Frédéric
Bimbot. Unsupervised motif
acquisition in speech via seeded discovery and template matching
combination. In IEEE Trans. on Audio, Speech and Language,
- Camille Guinaudeau, Guillaume Gravier, and Pascale
Sébillot. Enhancing lexical cohesion measure with confidence
measures, semantic relations and language model interpolation for
multimedia spoken content topic segmentation. Computer Speech and
Language, 26(2):90-104, 2012.
- Guillaume Gravier, Claire-Hélène Demarty, Siwar
Baghdadi and Patrick Gros. Classification-oriented structure learning
in Bayesian networks for multimodal event detection in videos. In
Multimedia Tools and Applications, 2012
- Guillaume Gravier, Camille Guinaudeau, Gwénolé
Lecorvé, and Pascale Sébillot. Exploiting speech for automatic TV delinearization:
From streams to cross-media semantic navigation. In
Journal of Image and Video Processing, 2011, 2011:689780.
- Mathieu Ben and Guillaume Gravier. Unsupervised mining of audiovisually consistent
segments in videos with application to structure analysis.
In IEEE Intl. Conf. on Multimedia and Expo, 2011.
- Gwénolé Lecorvé, Guillaume Gravier, and
Pascale Sébillot. Automatically finding semantically consistent N-grams
to add new words in LVCSR systems. In IEEE Intl.
Conf. on Acoustics, Speech and Signal Processing, 2011.
- Stéphane Huet,
Guillaume Gravier, and Pascale Sébillot. Morpho-syntactic post-processing of
N-best lists for improved French automatic speech recognition.
Computer Speech and Language,
- Sylvain Galliano,
Guillaume Gravier, and Laura Chaubard. The ESTER
2 evaluation campaign for the rich transcription of French radio
In Proc. Annual Intl. Speech Communication Association
Gravier, and Patrick Gros. Audiovisual Integration with
Segment Models for Tennis Video Parsing. Computer Vision
and Image Understanding, 111(2):142-154, August 2008.
I obtained a master's degree in Applied Mathematics at the Institut National des Sciences
Appliquées (INSA Rouen) in 1995 and worked on speech synthesis at ELAN Informatique from 1996 to 1997. I
received a Ph. D. in Signal and Image Processing (Toward speech
modeling with Markov random fields) at the Ecole
Telecommunications (ENST Paris) in 2000.
After a one-year post-doctoral stay at Irisa, I worked in the Audio Visual Speech Technology
group at IBM T. J. Watson research center from 2001 to 2002. Since
2002, I have been a research fellow at the Centre
Scientifique (CNRS), working at the Institut
Recherche en Informatique et
Systèmes Aléatoires (IRISA). I received the
Habilitation à Diriger des Recherches (HDR) from
Université de Rennes 1, in computer science,
Gravier, Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71