Research topics
From a general point of view, my research activities are rooted in
the field of multimedia content analysis for structuring and indexing
purposes, with a shift from the information retrieval paradigm towards
navigation in structured collections. My work stands at the frontier
between speech processing and multimedia, with a strong background in
statistical modeling in these two areas. In particular, my interests
are in:
- Multimodal video modeling: The aim of this research is to devise
models that can integrate audio, visual and possibly textual
information and represent their relations (temporal synchronisation
models, correlations, etc.) for the analysis and structuring of videos
and for audiovisual ASR. Current activities include:
- learning the dependencies in Bayesian networks for
event detection in videos
- multimodal topic segmentation
- speech-driven structuring of TV streams
- Spoken content analysis: detecting and tracking audio events in
videos; speaker segmentation and tracking; speech recognition; topic
segmentation; spoken document indexing. I am currently interested in
the following topics:
- combining ASR and NLP for robust spoken document
analysis
- integrating knowledge (e.g., phonetic landmarks) in HMM-based ASR
- spoken term detection from audio queries
- Unsupervised motif discovery: discovering recurring motifs in
multimedia streams in a fully unsupervised fashion
- word and near-duplicate audio discovery
- unsupervised video mining for structure analysis
I am also active in benchmark initiatives, through the organization of
the French speech technology evaluations ESTER (2003, 2005, 2009) and
their ETAPE 2012 follow-up, and of the Affect task at MediaEval 2011
and 2012.
Check out the texmix demo on navigating broadcast news archives for an
idea of what I do!
Recent participation in projects (my contribution in parentheses)
I am currently involved in the following projects:
- OpenSEM: an EIT ICT Labs open portal for semantic access to videos
(video and spoken content analysis, navigation portal, program
committee of MediaEval 2011)
- Rev-TV: using virtual reality for television program edition (speech
recognition, lip sync)
- Attelage de Systèmes Hétérogènes (ASH): harnessing heterogeneous
speech recognition systems for collaborative speech recognition
(speech recognition, knowledge integration)
- Évaluations en Traitement Automatique de la Parole (ETAPE):
evaluation campaign on TV stream transcription for the French language
(on behalf of the AFCP)
- Quaero: multimodal search engines (audio event detection, multimodal
integration, video structure analysis)
Over the last few years, I have participated in the following projects:
- Rapsodis: improving speech recognition with syntax
and semantics
- Demi-Ton: multimedia stream structuring (multimedia integration,
video structuring, speech transcription, transcribed text analysis)
- Pelops: Soccer video analysis and repurposing (sound
class detection, word spotting)
- ESTER: French spoken document rich transcription evaluation campaign
(campaign organization; broadcast news rich transcription system
development)
- Domus Videum: video abstracting and navigation (sound
class detection, multimedia integration)
Participation in the activities of the MUSCLE European Network of
Excellence.
Ph. D. students
Ongoing Ph. D. theses I am supervising:
- Abir NCibi. On conditional random fields for video stream
structuring
- Ludivine Kuznik. Browsing news archives (in collaboration with INA)
- Cédric Penet. Multimodal content based analysis for video
on demand (in collaboration with Technicolor)
- Stefan Ziegler. Landmark driven speech recognition
- Julien Fayolle. Information retrieval in TV streams
Past Ph. D. students:
- Camille Guinaudeau. Automatic structuring of TV streams. Ph. D.
thesis, INSA Rennes, December 2011 (in French).
- Armando Muscariello. Variability tolerant discovery of arbitrary
repeating patterns in audio data with template matching. Ph. D.
thesis, Université de Rennes 1, January 2011.
- Gwénolé Lecorvé. Unsupervised topic adaptation for robust speech
recognition. Ph. D. thesis, INSA Rennes, November 2010 (in French;
awarded best Ph. D. thesis in speech communication in 2011 by AFCP,
the French regional branch of ISCA).
- Siwar Baghdadi. Sparse events detection in videos with Bayesian
networks. Ph. D. thesis, Université de Rennes 1, February 2010 (in
French).
- Wen Xuan Teng. Rapid speaker adaptation using a variable subspace of
reference models. Ph. D. thesis, Université de Rennes 1, December 2008.
- Stéphane Huet. Morpho-syntactic knowledge and topic adaptation to
improve speech recognition. Ph. D. thesis, Université de Rennes 1,
December 2007 (in French).
- Manolis Delakis. Multimodal tennis video structure analysis with
segment models. Ph. D. thesis, Université de Rennes 1, October 2006.
- Ewa Kijak. Multimodal sport video structuring with stochastic
models. Ph. D. thesis, Université de Rennes 1, 2003 (in French).
Other Ph. D. theses in which I have been or am involved (but without
any supervising role):
- Romain Tavenard. Indexation de séquences de descripteurs
pour exploiter audio et vidéo.
- Xavier Naturel. Automatic structuring of TV streams. Ph. D. thesis,
Université de Rennes 1, 2007 (in French).
- Mathieu Ben. Robust approaches for automatic speaker verification
using normalization and hierarchical adaptation. Ph. D. thesis,
Université de Rennes 1, 2004 (in French).
Software development
I am actively participating in the development of the following free
software toolkits:
- SPro, a speech signal processing toolkit
- AudioSeg, generic tools for audio segmentation
- Sirocco, a large vocabulary decoder for speech recognition
These toolkits are the base (with a little help from HTK) of the
IRENE broadcast news indexing platform, originally developed for the
French ESTER rich transcription evaluation campaign in collaboration
with François Yvon. Also check out my free ESTER resources page.
In the framework of the ASR/NLP working group I co-lead, we have
developed several pieces of code related to spoken document analysis.
Worth mentioning, among others, are:
- IRISA News Topic Segmenter: wrapper to
topic-segmenter for the segmentation of broadcast news
- kiwi: keyword extraction from transcripts
- fishnet: fish texts on the Internet related to a
topic characterized by a few keywords (as given by kiwi)
- match-maker: corpus based acquisition of semantic
relations (and a bunch of relations from a large newspaper corpus)
These toolkits are not freely distributed open-source software, but
we are nevertheless willing to share them. Feel free to contact me
should you be interested in any of them.
Selected recent publications
- Guillaume Gravier, Camille Guinaudeau, Gwénolé
Lecorvé, and Pascale Sébillot. Exploiting speech for automatic TV delinearization:
From streams to cross-media semantic navigation. In
Journal of Image and Video Processing, 2011, 2011:689780.
- Camille Guinaudeau, Guillaume Gravier, and Pascale
Sébillot. Enhancing lexical cohesion measure with confidence
measures, semantic relations and language model interpolation for
multimedia spoken content topic segmentation. Computer
Speech and Language, 26(2):90-104, 2012.
- Mathieu Ben and Guillaume Gravier. Unsupervised mining of audiovisually consistent
segments in videos with application to structure analysis.
In IEEE Intl. Conf. on Multimedia and Expo, 2011.
- Gwénolé Lecorvé, Guillaume Gravier, and
Pascale Sébillot. Automatically finding semantically consistent N-grams
to add new words in LVCSR systems. In IEEE Intl.
Conf. on Acoustics, Speech and Signal Processing, 2011.
- Stéphane Huet, Guillaume Gravier, and Pascale Sébillot.
Morpho-syntactic post-processing of N-best lists for improved French
automatic speech recognition. Computer Speech and Language,
24:663-684, 2010.
- Armando Muscariello, Guillaume Gravier, and Frédéric Bimbot. Audio
keyword extraction by unsupervised word discovery. In Conf. of the
Intl. Speech Communication Association (Interspeech), pages
2843-2846, 2009.
- Sylvain Galliano, Guillaume Gravier, and Laura Chaubard. The ESTER
2 evaluation campaign for the rich transcription of French radio
broadcasts. In Proc. Annual Intl. Speech Communication Association
Conference (Interspeech), pages 2583-2586, 2009.
- Manolis Delakis, Guillaume Gravier, and Patrick Gros. Audiovisual
Integration with Segment Models for Tennis Video Parsing. Computer
Vision and Image Understanding, 111(2):142-154, August 2008.
- Siwar Baghdadi, Guillaume Gravier, Claire-Hélène Demarty, and
Patrick Gros. Structure learning in Bayesian network based video
indexing. In IEEE Intl. Conf. on Multimedia and Expo, pages 667-680,
2008.
Check out my complete
list of publications.
Short bio
I obtained a master's degree in Applied Mathematics at the Institut
National des Sciences Appliquées (INSA Rouen) in 1995 and worked on
speech synthesis at ELAN Informatique from 1996 to 1997. I received a
Ph. D. in Signal and Image Processing (Toward speech modeling with
Markov random fields) at the École Nationale Supérieure des
Télécommunications (ENST Paris) in 2000.
After a one-year post-doctoral stay at IRISA, I was with the Audio
Visual Speech Technology group at the IBM T. J. Watson Research Center
from 2001 to 2002. Since 2002, I have been a research fellow at the
Centre National de la Recherche Scientifique (CNRS), working at the
Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA).
I received the Habilitation à Diriger des Recherches (HDR) in Computer
Science from Université de Rennes 1 in 2009.
Guillaume Gravier, IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71
firstname.secondname@irisa.fr