
Research topics

Broadly, my research activities are rooted in the field of multimedia content analysis for structuring and indexing purposes, with a shift from the information retrieval paradigm towards navigation in structured collections. My work stands at the frontier between speech processing and multimedia, with a strong background in statistical modeling in these two areas. In particular, my interests are in:

  1. Multimodal video modeling: The aim of this research is to devise models that can integrate the audio, visual and possibly textual information and represent their relations (temporal synchronisation model, correlations, etc.) for the analysis and structuring of videos and for audiovisual ASR. Current activities include:
  2. Spoken content analysis: detecting and tracking audio events in videos; speaker segmentation and tracking; speech recognition; topic segmentation; spoken document indexing. I am currently interested in the following topics:
  3. Unsupervised motif discovery: discovering recurring motifs in multimedia streams in a fully unsupervised fashion.
I am also quite active in benchmark initiatives, through the organization of the French speech technology evaluations ESTER (2003, 2005, 2009) and their ETAPE 2012 follow-up, and of the Affect task at MediaEval 2011 and 2012.

Check the texmix demo on navigating broadcast news archives for an idea of what it is I do!

Recent participation in projects (contribution to the project)

I am currently involved in the following projects:

Over the last few years, I have participated in the following projects:
Participation in the activities of the MUSCLE European Network of Excellence.

Ph. D. students

Ongoing Ph. D. theses I am supervising:

Past Ph. D. students:

Other Ph. D. theses in which I have been or am involved (without supervising in any way):

Software development

I am actively participating in the development of the following free software toolkits:

These toolkits are the basis (with a little help from HTK) of the IRENE broadcast news indexing platform, originally developed for the French ESTER rich transcription evaluation campaign in collaboration with François Yvon. Also check out my free ESTER resources page.

Within the ASR/NLP working group that I co-lead, we have developed several pieces of code related to spoken document analysis. Among others, the following are worth mentioning:

These toolkits are not freely distributed open-source software, but we are nevertheless willing to share them. Feel free to contact me should you be interested in any of them.

Selected recent publications

Check out my complete list of publications.

Short bio

I obtained a master's degree in Applied Mathematics at the Institut National des Sciences Appliquées (INSA Rouen) in 1995 and worked on speech synthesis at ELAN Informatique from 1996 to 1997. I received a Ph. D. in Signal and Image Processing ("Toward speech modeling with Markov random fields") at the Ecole Nationale Supérieure des Télécommunications (ENST Paris) in 2000. After a one-year post-doctoral stay at IRISA, I was with the Audio Visual Speech Technology group at the IBM T. J. Watson Research Center from 2001 to 2002. Since 2002, I have been a research fellow at the Centre National de la Recherche Scientifique (CNRS), working at the Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA). I received the Habilitation à Diriger des Recherches (HDR) in Computer Science from the Université de Rennes 1 in 2009.

Guillaume Gravier, Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel: +33 2 99 84 72 39 / Fax: +33 2 99 84 71 71
firstname.secondname@irisa.fr