This web page is a compagnon page for the article "NegPSpan: efficient extraction of negative sequential patterns with embedding constraints" published in the Data Mining Journal (DMKD). The web site contains:
Mining frequent sequential patterns consists in extracting recurrence behaviors, modeled as patterns, in a large dataset of sequences. Such pattern informs about was occurs in a sequence, ie what does happen. With frequent negative sequential patterns (NSP), behaviors are described by both events that happens and some of them that are absent. Few approaches have been proposed to mine such kind of behaviors but does not necessarily clearly state that extracted NSP sets differ. This article defines some syntaxes and semantics of negative patterns to position the proposed approach NegPSpan wrt to the state of the art approach eNSP. The NegPSpan approach extracts NSP using a PrefixSpan depth-first scheme and enabling maxgap constraints while eNSP can not (due to its semantics). Our experiments on synthetic and real datasets show that NegPSpan extracts meaningful NSP and it processes larger datasets thanks to significantly lower memory requirements and better computing times than eNSP.
NegPSpan is a complete algorithm and it uses an anti-monotonicity property of the support measure. It is usually stated that such properties does not hold for negative sequential patterns, but we identifies some semantics and partial orders for which it is possible to have such property that enables to reuse decades of research on frequent pattern mining, including the notion of closed-patterns that could be redefined for negative sequentials patterns.
This investigation of the semantics of negative sequential patterns raised interesting questions about the meaning of the negation in a sequential pattern and how it is intuitivelly understood. We are interested in understanding your understanding of the semantic of negative sequential patterns. You can fill the survey (5 questions, no need to be an expert in pattern mining):
The articles we pusblished on negative sequential patterns are the following (with video presentations of the articles):
The archive below contains source code in C++ of algorithms that are presented and evaluated in the article: 1) implementation of our algorithm NegPSpan et 2) implementation of eNSP of Dong et al. This source code requires a recent C++ compiler (compatible with C++11 norms). All details about compilation and execution of the code is given in the README file (we provide a Makefile).
Our experiments use synthetic datasets for which, we can control the main parameters. In the following, we provide the synthetiser itself but also the datasets that have been generated to draw curves of the article. All datasets are in the IBM format (one line provide the description of an itemset detailing, in that order, the sequence id, the itemset timestamp, the size of the itemset and the list of items (integers) in the itemset.
The script to reproduce the experiments of the DMKD article on synthetic data (and the Figures) are here benchmark.zip.
Classical real-life datasets for sequential pattern mining tasks can be found here. We simply changed their format to fit our software requirements.
Due to their intrusive nature, care pathway data we used for real experiments can not be provided.
This section is an experimental feature using the AllGo platform (based on Docker and REST technologies) to provide an access to a functional executable online, without any installation requirement. This platform is maintained by SED/Inria/IRISA. Note that we do not track the executions (eg their IP origin and so on) neither collect datasets you are evaluating with this tool.
This is an experimental tool: you may have trouble if several people use it at the same time. I f s/he is interested on evaluating our algorithm, your are strongly invited to download the source code on your own computer.
The purpose of this section is to enable readers to run example of our papers and even to test their own new example. The focus is not computation times. We are interested in enabling readers to better understand the semantic of negative patterns (depending on semantic choices and algorithms).
Usage details are given at the bottom of the page.
To make run the algorithm, the main steps are the following: