Simplified Regular Expressions
As an alternative to automata in Timbuk
specifications Timbuk provides two formats for Tree Regular
Expressions. The standard one uses the classical syntax of Regular
Tree Expression of TATA.
Regexp
Ops append:2 rev:1 nil:0 cons:2 A:0 B:0 c:0
Vars X Y Z U Xs
TRS R1
append(nil,X) -> X
append(cons(X,Y), Z) -> cons(X, append(Y,Z))
rev(nil) -> nil
rev(cons(X,Y)) -> append(rev(Y), cons(X,nil))
Regexp A0
rev(cons(A,((cons(A,c) * c) . c ((cons(B,c) * c) . c nil))))| nil
The language defined by A0 is the set of terms rev(l) where l is any
list of As and Bs where As are all before the Bs, and there is at
least one A and one B. Note that the constant c used for
language replacement with operators * c and . c
has to be defined in the Ops section. See TATA for explanations
about the semantics of this language. For defining simple tree
languages, this format is too complex.
Thus, we defined a specific format called Simplified Regular
Expression (denoted by SRegexp). Unlike the standard format, this
one is incomplete (some languages do not have simplified regular
expression) but is very well adapted to define simple regular tree
languages. For instance, the same language can be defined as
follows.
SRegexp (simplified regular expressions)
Ops append:2 rev:1 nil:0 cons:2 A:0 B:0 c:0
Vars X Y Z U Xs
TRS R1
append(nil,X) -> X
append(cons(X,Y), Z) -> cons(X, append(Y,Z))
rev(nil) -> nil
rev(cons(X,Y)) -> append(rev(Y), cons(X,nil))
SRegexp A0
rev(cons(A,[cons(A,*|cons(B,[cons(B,*|nil)]))]))
Those regular expressions are defined using only 3 operators:
- | to build the union of two languages
- * to iterate a pattern
- The (optional) brackets [] to define the scope of
the embedded *
For instance the SRegexp cons(A,*|nil) defines the
language {nil, cons(A,nil), cons(A,cons(A,nil)),...} i.e.
lists of As. We can repeat the pattern cons(A, _) as long
as possible and terminate by nil. The SRegexp cons(A,cons(A,*|nil))
defines the language {nil, cons(A,cons(A,nil)),
cons(A,cons(A,cons(A,cons(A,nil)))),...} i.e. lists of As of
even length. The SRegexp cons(A,[cons(A,*|nil)])
defines the language {cons(A,nil), cons(A,cons(A,nil)),...}
i.e. non empty lists of As. Here the brackets define the scope of
the pattern to repeat which is only cons(A, _).
Finally, the SRegexp rev(cons(A,[cons(A,*|cons(B,[cons(B,*|nil)]))]))
thus defines , the language of terms rev(l) where l is any list
of As and Bs where As are all before the Bs, and there is at least
one A and one B. It is possible to iterate a pattern at several
positions at the same time, but with no difference between the two
iterations, this is why this language is incomplete. For instance,
with operators Node:3 Leaf:0 zero:0 s:1 we can define the
language of binary trees of natural numbers as follows:
SRegexp A1
Node([s(*|zero)],*|Leaf,*|Leaf)