Regularized Dynamic Time Warping Kernel (KDTW )

Regularized Dynamic Time Warping Kernel (KDTW)

P-F. Marteau, September 2014

Last updated: May 16, 2016

1 introduction
2 Definition
3 Algorithmic complexity
4 Interpretation
5 Simple Kernel PCA experiment with Euclidean distance, DTW and K_DTW
5.1 Datasets
5.2 Kernel PCA
6 Octave/Matlab code
6.1 K_DTW Octave/Matlab code
6.2 Octave/Matlab code for reproducing the previous experiment

1 introduction

We introduce in this short presentation a regularized version of the Dynamic Time Warping (DTW) distance, that we call K_DTW. K_DTW is in fact a similarity measure constructed from DTW with the property that K_DTW(.,.) is a positive definite kernel (homogeneous to an inner product in the so-called Reproducing Kernel Hilbert Space). Following earlier work by Cuturi & al. [1], namely the so-called Global Alignment kernel (GA-kernel), the derivation of K_DTW is detailed in Marteau & Gibet 2014 [4]. K_DTW is a convolution kernel as defined in [2]. After giving a recursive definition of K_DTW, we present hereinafter a small experiment to highlight the benefit of a positive definite time elastic kernel to process time series data. Some code is provided (as is) by the end of the presentation to make it easier to reproduce some of our results.

2 Definition

The recursive definition of K_DTW kernel as defined in [4] is as follows:

_DTW(X_p,Y _q) = K_DTW^xy(X_p,Y _q) + K_DTW^xx(X_p,Y _q)

where

K_DTW^xy(X_p,Y _q) = β ⋅ e^{-d_E²(x(p),y(q))∕σ}

∑

( xy
{ h (p - 1, q)KDT W (xyXp- 1,Yq)
h (p - 1, q - 1 )K DT W (Xp- 1,Yq -1)
( h (p,q - 1)KxDyT W (Xp,Yq -1)

and

K_DTW^xx(X_p,Y _q) = β⋅

∑

( xx -d2(x(p),y(p))∕σ
{ h (p - 1, q)KDT W (Xp- 1,Yq)e E 2
Δp,qh (p,q)KxxDTW (Xp- 1,Yq-1)e-dE(x(p),y(q))∕σ
( h (p,q - 1)Kxx (Xp,Yq -1)e-d2E(x(q),y(q))∕σ
DT W

(1)

where

Δ_p,q is the Kronecker’s symbol,
h is a symmetric binary non negative function, usually in {0, 1}, used to define a symmetric corridor around the main diagonal to limit the "time elasticity" of the kernel,
σ ∈ ℝ⁺ is a bandwidth parameter which weights the local contributions, i.e. the distances between locally aligned positions,
β ∈ [1∕3, 1] is a normalization factor. Given an order of magnitude for time series lengths, β and σ are dependent. Namely if σ tends to ∞ (is large comparatively to the variance of the time series samples), then β will have to tend to . If σ tends to 0 (is small comparatively to the variance of the time series samples) then β will have to tend to 1. By default, β = , which prevents the "infinity explosion" of the kernel.
d_E(.,.) is the Euclidean distance defined on ℝ^k (or any negative conditionally definite kernel define on ℝ^k).

3 Algorithmic complexity

The recursive definition given above shows that the algorithmic complexity of K_DTW is O(N²), where N is the length of the longest of the two time series given in entry, and when no corridor size is specified. This complexity drop down to O(C.N) when a symmetric corridor of size C is exploited.

4 Interpretation

The main idea leading to the construction of positive definite kernels from a given elastic distance used to match two time series is simply the following: instead of keeping only one of the best alignment paths, the new kernel will try to sum up the costs of all the existing sub-sequence alignment paths with some weighting factor that will favor good alignments while penalizing bad alignments. In addition, this weighting factor can be optimized. Thus, basically we replace the min (or max) operator by a summation operator and introduce a symmetric corridor function (the function h in the recursive equation above) to optionally limit the summation and the computational cost. Then we add a regularizing recursive term (K^xx) such that the proof of the positive definiteness property can be understood as a direct consequence of the Haussler’s convolution theorem [2]. Please see [4] for more details.

5 Simple Kernel PCA experiment with Euclidean distance, DTW and K_DTW

5.1 Datasets

For the first experiment we use the Cylinder(C) Bell(B) Funel(F) TRAIN data set as available at the UCR Time Series Classification-Clustering Datasets repository [3]. The dataset contains 30 time series which belong to the three categories (C, B, F) and take the forms presented in figure 1.

Figure 1: Cylinder (red), Bell (green) and Funnel (blue) function shapes

The second experiment concerns the Gun_Point TRAIN dataset that also belongs to the UCR time series dataset repository. It contains two classes and 50 time series. For more details please see [3].

5.2 Kernel PCA

For the two datasets, a KPCA [5] is performed respectively on the normalized exp(-d_E(.,.)²∕20) kernel matrix (left subfigure, where d_E stands for the Euclidean distance), the exp(-DTW(.,.)²∕20) kernel matrix (center subfigure) and the K_DTW(.,.) kernel matrix (right subfigure, with σ = 15 for the CBF dataset and σ = 1 for the Gun_Point dataset). The Gaussian like kernel is not required for K_DTW since it is somehow already in use into the computation of the local kernels (c.f. the definition above).

All the times series are then projected on the first three eigen vectors obtained by KPCA as presented on figures 2 and 3.

Figure 2: Euclidean Gaussian kernel (left), DTW Gaussian kernel (center), K_DTW kernel (right) for the CBF dataset

Figure 3: Euclidean Gaussian kernel (left), DTW Gaussian kernel (center), K_DTW kernel (right) for the Gun_Point dataset

Note that, although the DTW(.,.) matrix is not definite in general, exp(DTW(.,.)²∕20 matrix is definite on the CBF dataset but not on the Gun_Point dataset. This may explain the good clustering obtained by the Gaussian DTW kernel on CBF data set and its drawback on the Gun_Point dataset for which KPCA is not well funded. By contrast, these gram matrices are always positive definite for the Euclidean and K_DTW kernels. Figure 4 presents the normalized spectra of the gram matrices for these kernels to give some highlighting about the dimensionality reduction that can be achieved with each kernel while maintaining a good separability between the classes.

Figure 4: Normalized spectra of the gram matrices evaluated on CBF (left), on Gun_Point (right)

The experiment shows that, although The DTW Gaussian kernel works particularly well on the CBF dataset, K_DTW achieves a better natural clustering in average than the Euclidean or DTW Gaussian kernels. K_DTW seems to act as a compromise between Euclidean Distance and DTW distance substituting Gaussian kernel. Figure 5 below shows some time series belonging to different classes that are poorly separated by the K_DTW kernel. For the Gun_Point dataset the two presented instances seem very close. For the CBF dataset this is likely due to a loss of selectivity induced by the summation operator of K_DTW.

Figure 5: Analysis of some time series poorly separated by K_DTW for the CBF and Gun_Point datasets

6 Octave/Matlab code

6.1 K_DTW Octave/Matlab code

The Octave/Matlab code (without corridor) for evaluating the K_DTW similarity (pd kernel) between two time series is available here.

6.2 Octave/Matlab code for reproducing the previous experiment

The Octave/Matlab code (without corridor) used to evaluate the KPCA on the Euclidean distance, DTW and K_DTW kernel matrices is available here.

References

[1] M. Cuturi, J.-P. Vert, Ø. Birkenes, and Tomoko Matsui. A kernel for time series based on global alignments. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 2, pages II–413–II–416, April 2007.

[2] David Haussler. Convolution kernels on discrete structures. Technical Report UCS-CRL-99-10, University of California at Santa Cruz, Santa Cruz, CA, USA, 1999.

[3] E. J. Keogh, X. Xi, L. Wei, and C.A. Ratanamahatana. The UCR time series classification-clustering datasets, 2006. http://wwwcs.ucr.edu/ eamonn/time_series_data/.

[4] Pierre-François Marteau and Sylvie Gibet. On Recursive Edit Distance Kernels with Application to Time Series Classification. IEEE Transactions on Neural Networks and Learning Systems, pages 1–14, June 2014. 14 pages.

[5] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput., 10(5):1299–1319, July 1998.