Simplified Regular Expressions



As an alternative to automata in Timbuk specifications Timbuk provides two formats for Tree Regular Expressions. The standard one uses the classical syntax of Regular Tree Expression of TATA.

Regexp


Ops append:2 rev:1 nil:0 cons:2 A:0 B:0 c:0
Vars X Y Z U Xs

TRS R1
append(nil,X) -> X
append(cons(X,Y), Z) -> cons(X, append(Y,Z))
rev(nil) -> nil
rev(cons(X,Y)) -> append(rev(Y), cons(X,nil))

Regexp A0
rev(cons(A,((cons(A,c) * c) . c ((cons(B,c) * c) . c nil))))| nil
The language defined by A0 is the set of terms rev(l) where l is any list of As and Bs where As are all before the Bs, and there is at least one A and one B. Note that the constant c used for language replacement with operators * c and . c has to be defined in the Ops section. See TATA for explanations about the semantics of this language.  For defining simple tree languages, this format is too complex.

Thus, we defined a specific format called Simplified Regular Expression (denoted by SRegexp). Unlike the standard format, this one is incomplete (some languages do not have simplified regular expression) but is very well adapted to define simple regular tree languages. For instance, the same language can be defined as follows.

SRegexp (simplified regular expressions)


Ops append:2 rev:1 nil:0 cons:2 A:0 B:0 c:0
Vars X Y Z U Xs

TRS R1
append(nil,X) -> X
append(cons(X,Y), Z) -> cons(X, append(Y,Z))
rev(nil) -> nil
rev(cons(X,Y)) -> append(rev(Y), cons(X,nil))

SRegexp A0
rev(cons(A,[cons(A,*|cons(B,[cons(B,*|nil)]))]))
Those regular expressions are defined using only 3 operators:
For instance the SRegexp cons(A,*|nil) defines the language {nil, cons(A,nil), cons(A,cons(A,nil)),...} i.e. lists of As. We can repeat the pattern cons(A, _) as long as possible and terminate by nil.  The SRegexp cons(A,cons(A,*|nil)) defines the language {nil, cons(A,cons(A,nil)), cons(A,cons(A,cons(A,cons(A,nil)))),...} i.e. lists of As of even length. The SRegexp cons(A,[cons(A,*|nil)]) defines the language {cons(A,nil), cons(A,cons(A,nil)),...} i.e. non empty lists of As. Here the brackets define the scope of the pattern to repeat which is only cons(A, _). Finally, the SRegexp rev(cons(A,[cons(A,*|cons(B,[cons(B,*|nil)]))])) thus defines , the language of terms rev(l) where l is any list of As and Bs where As are all before the Bs, and there is at least one A and one B. It is possible to iterate a pattern at several positions at the same time, but with no difference between the two iterations, this is why this language is incomplete. For instance, with operators Node:3 Leaf:0 zero:0 s:1 we can define the language of binary trees of natural numbers as follows:


SRegexp A1
Node([s(*|zero)],*|Leaf,*|Leaf)