Letting words flow: a brief introduction to piecewise-constant signals

A (finite piecewise-constant dense-time) signal over alphabet $\Sigma$ is a left-continuous function $f \colon (0, \delta] \to \Sigma$ with $\delta \in \R_{\geq 0}$ and having finitely many discontinuities. The duration $\delta$ of $f$ is denoted $|f|$ . Every signal can be summarized by a unique sequence $\mathrm{seq}(f) = (\sigma_1, \delta_1) \cdots (\sigma_k, \delta_k) \in (\Sigma \times \R_{>0})^*$ , where $\sigma_i \neq \sigma_{i+1}$ for all $i < k$ , which we write more compactly as $\sigma_1^{\delta_1} \cdots \sigma_k^{\delta_k}$ .

For example, $a^{1.5} b^1 a^{2.5}$ represents the signal $f \colon (0, 5] \to \Sigma$ given by

(0, 1.5] \mapsto a,\ (1.5, 2.5] \mapsto b,\ (2.5, 5] \mapsto a.

This signal can be visualized as follows:

Given a signal $f$ with $\mathrm{seq}(f) = \sigma_1^{\delta_1} \cdots \sigma_k^{\delta_k}$ , the $a$ -duration of $f$ is $|f|_a = \sum_{i : \sigma_i = a} \delta_i$ . Moreover, we define the number of pieces of $f$ as $\#f = k$ . For our previous example, we have $|f|_a = 4$ , $|f|_b = 1$ and $\# f = 3$ .
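The quantities above can be computed directly from $\mathrm{seq}(f)$. The following is a minimal Python sketch, assuming a signal is stored as a list of (letter, duration) pairs; the function names (`duration`, `letter_duration`, `num_pieces`, `value_at`) are illustrative choices and not part of the formalism.

```python
from fractions import Fraction

# seq(f) as a list of (letter, duration) pairs; durations are positive
# and adjacent letters differ. Fractions keep the arithmetic exact.
f = [("a", Fraction(3, 2)), ("b", Fraction(1)), ("a", Fraction(5, 2))]

def duration(seq):
    """|f|: total duration."""
    return sum((d for _, d in seq), Fraction(0))

def letter_duration(seq, letter):
    """|f|_a: total time spent on `letter`."""
    return sum((d for s, d in seq if s == letter), Fraction(0))

def num_pieces(seq):
    """#f: number of pieces of a reduced sequence."""
    return len(seq)

def value_at(seq, t):
    """f(t) for 0 < t <= |f|; pieces are left-open and right-closed."""
    if t <= 0:
        raise ValueError("t must lie in (0, |f|]")
    end = Fraction(0)
    for s, d in seq:
        end += d
        if t <= end:
            return s
    raise ValueError("t must lie in (0, |f|]")
```

Because pieces are right-closed, `value_at(f, Fraction(3, 2))` still falls in the first piece, which matches left-continuity.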

Concatenation

We write $\mathcal{S}(\Sigma)$ to denote the set of all signals, and we write $\varepsilon$ to denote the unique signal of duration $0$. Given $f, g \in \mathcal{S}(\Sigma)$, we write $f \cdot g$ to denote the signal $h \colon (0, |f| + |g|] \to \Sigma$ given by

h(\tau) = \begin{cases} f(\tau) & \text{if } \tau \leq |f|, \\ g(\tau - |f|) & \text{otherwise}. \end{cases}

Note that $\mathrm{seq}(f \cdot g)$ differs from $\mathrm{seq}(f) \cdot \mathrm{seq}(g)$ when the last piece of $f$ and the first piece of $g$ output the same letter. For example, if $\mathrm{seq}(f) = a^{1.5} b^1 a^{2.5}$ and $\mathrm{seq}(g) = a^{0.5} b^2$ , then $\mathrm{seq}(f \cdot g) = a^{1.5} b^1 a^3 b^2$ :

In general, we have

$|f \cdot g| = |f| + |g|$ ,
$|f \cdot g|_\sigma = |f|_\sigma + |g|_\sigma$ ,
$\#f + \#g - 1 \leq \#(f \cdot g) \leq \#f + \#g$ .
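On reduced sequences, concatenation only needs to inspect the boundary between the two operands. Here is a small sketch, assuming signals are lists of (letter, duration) pairs in reduced form; `concat` is a hypothetical helper name.

```python
from fractions import Fraction

def concat(f, g):
    """Return seq(f . g) given the reduced sequences seq(f) and seq(g)."""
    if f and g and f[-1][0] == g[0][0]:
        # The last piece of f and the first piece of g carry the same
        # letter, so they fuse into a single piece.
        merged = (f[-1][0], f[-1][1] + g[0][1])
        return f[:-1] + [merged] + g[1:]
    return f + g

f = [("a", Fraction(3, 2)), ("b", Fraction(1)), ("a", Fraction(5, 2))]
g = [("a", Fraction(1, 2)), ("b", Fraction(2))]
h = concat(f, g)
```

In this example the boundary pieces fuse, so the number of pieces of `h` attains the lower bound $\#f + \#g - 1$.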

Algebraic view and relation with words

The set of signals $\mathcal{S}(\Sigma)$ equipped with $\cdot$ forms a monoid with the empty signal $\varepsilon$ as the identity. We consider $\R_{\geq 0}$ as a monoid under addition with $0$ as the identity.

Let $\equiv$ be the relation over $(\Sigma \times \R_{\geq 0})^*$ defined by $u \equiv v$ iff $u$ and $v$ are equal after reduction using the rewriting rules $(\sigma, \delta) (\sigma, \delta') \to (\sigma, \delta + \delta')$ and $(\sigma, 0) \to \varepsilon$ . For example, $a^{0.5} a^1 b^{0.5} b^{0.5} a^1 \equiv a^{1.5} b^1 a^1$ :
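The two rewriting rules can be applied in a single left-to-right pass. The sketch below assumes sequences are lists of (letter, duration) pairs; `reduce_seq` and `equivalent` are illustrative names.

```python
from fractions import Fraction

def reduce_seq(u):
    """Normal form of u under (s, d)(s, d') -> (s, d + d') and (s, 0) -> epsilon."""
    out = []
    for s, d in u:
        if d == 0:
            continue  # rule (s, 0) -> epsilon
        if out and out[-1][0] == s:
            out[-1] = (s, out[-1][1] + d)  # merge equal adjacent letters
        else:
            out.append((s, d))
    return out

def equivalent(u, v):
    """u and v are equivalent iff their normal forms coincide."""
    return reduce_seq(u) == reduce_seq(v)

half = Fraction(1, 2)
u = [("a", half), ("a", 1), ("b", half), ("b", half), ("a", 1)]
v = [("a", Fraction(3, 2)), ("b", 1), ("a", 1)]
```

Dropping zero-duration pairs before comparing with the last kept letter is what lets a sequence such as $a^1 b^0 a^1$ collapse to $a^2$.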

Note that ${\equiv}$ is a congruence and that $\mathcal{S}(\Sigma) \cong (\Sigma \times \R_{\geq 0})^*/{\equiv}$ due to the isomorphism $f \mapsto [\mathrm{seq}(f)]$ . In more algebraic terms, the monoid $\mathcal{S}(\Sigma)$ can be seen as the free product $\R_{\geq 0} \ast \cdots \ast \R_{\geq 0}$ (with $|\Sigma|$ copies).

It is the case that $\Sigma^* \cong (\Sigma \times \N)^*/{\equiv}$ due to the isomorphism $\sigma_1 \cdots \sigma_n \mapsto [\sigma_1^1 \cdots \sigma_n^1]$ . Furthermore, concatenation and durations behave as expected for signals representing standard finite words. For example, consider the signal $f \in \mathcal{S}(\Sigma)$ with $\mathrm{seq}(f) = a^1 b^1 a^2$ . It corresponds to the word $w = abaa \in \Sigma^*$ . We have $|f| = |w| = 4$ , $|f|_a = |w|_a = 3$ , $|f|_b = |w|_b = 1$ , and $f(i) = w(i)$ for each $i \in [1..4]$ .
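The correspondence between words and signals with integer durations can be sketched with run-length encoding. This is an illustration only; `word_to_seq` and `seq_to_word` are hypothetical names.

```python
from itertools import groupby

def word_to_seq(w):
    """Reduced sequence of the signal that represents the word w."""
    return [(s, len(list(run))) for s, run in groupby(w)]

def seq_to_word(seq):
    """Inverse direction; durations are assumed to be positive integers."""
    return "".join(s * d for s, d in seq)

seq = word_to_seq("abaa")
```

Here `seq` corresponds to $a^1 b^1 a^2$, and mapping it back with `seq_to_word` recovers the original word.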

Yet another way to present $\Sigma^*$ is by considering words as finite discrete-time signals, e.g. $w = abaa$ can be seen as the signal $f \colon [1..4] \to \Sigma$ given by $[1..1] \mapsto a$, $[2..2] \mapsto b$ and $[3..4] \mapsto a$.

Note that $\Sigma^*$ is not isomorphic to $\mathcal{S}(\Sigma)$ , even if we used rational durations or changed the alphabet.

Proof.

For the sake of contradiction, suppose there exists an isomorphism $f \colon \mathcal{S}(\Sigma) \to \Gamma^*$, and let $a \in \Gamma$ be a letter. Let $f^{-1}(a) = \sigma_1^{\delta_1} \cdots \sigma_k^{\delta_k}$ be in reduced form, i.e. with $\sigma_i \neq \sigma_{i+1}$ and $\delta_i > 0$ for each $i$. We have

\begin{aligned} a &= f(f^{-1}(a)) \\ &= f(\sigma_1^{\delta_1} \cdots \sigma_k^{\delta_k}) \\ &= f(\sigma_1^{\delta_1}) \cdots f(\sigma_k^{\delta_k}). \end{aligned}

Thus, all terms but one are equal to $\varepsilon$ , i.e. $a = f(\sigma_i^{\delta_i})$ for some $i \in [1..k]$ . Note that

f(\sigma_i^{\delta_i/2}) \cdot f(\sigma_i^{\delta_i/2}) = f(\sigma_i^{\delta_i/2} \cdot \sigma_i^{\delta_i/2}) = f(\sigma_i^{\delta_i}) = a.

Consequently, $ww = a$ for some $w \in \Gamma^*$, which is impossible since $|ww|$ is even while $|a| = 1$.

Relation with timed words

Timed words are similar to signals and have been extensively studied by the formal verification community. They can also be seen as sequences from $(\Sigma \times \R_{\geq 0})^*$ , but where zero durations matter. For example, the timed word $a^1 b^0 a^0 a^{1.5} b^{0.5}$ can be interpreted as follows:

After one unit of time, event $aba$ occurred,
After another 1.5 units of time, event $a$ occurred,
After another 0.5 units of time, event $b$ occurred.

In that context, $a^1 b^0 a^0 a^{1.5} b^{0.5}$ is not equivalent to $a^{2.5} b^{0.5}$ . In fact, timed words are usually defined as sequences $(\sigma_1, \tau_1) \cdots (\sigma_k, \tau_k) \in (\Sigma \times \R_{\geq 0})^*$ where $\tau_1 \leq \cdots \leq \tau_k$ are timestamps rather than durations. In that setting, our previous example is written as $(a, 1) (b, 1) (a, 1) (a, 2.5) (b, 3)$ . For a more formal algebraic treatment of timed words and their relation with signals, see the work of Asarin, Caspi and Maler¹.
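Switching between the duration view and the timestamp view of a timed word is a running sum and its inverse. A minimal sketch, with illustrative function names `to_timestamps` and `to_durations`:

```python
from fractions import Fraction
from itertools import accumulate

def to_timestamps(timed_word):
    """(letter, duration) pairs -> (letter, timestamp) pairs."""
    letters = [s for s, _ in timed_word]
    stamps = accumulate(d for _, d in timed_word)
    return list(zip(letters, stamps))

def to_durations(stamped):
    """(letter, timestamp) pairs -> (letter, duration) pairs."""
    out, prev = [], 0
    for s, t in stamped:
        out.append((s, t - prev))
        prev = t
    return out

tw = [("a", Fraction(1)), ("b", Fraction(0)), ("a", Fraction(0)),
      ("a", Fraction(3, 2)), ("b", Fraction(1, 2))]
stamped = to_timestamps(tw)
```

Note that zero durations are kept as they are, since they are meaningful for timed words.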

Real-time regular languages

A subset $L \subseteq \mathcal{S}(\Sigma)$ is called a (signal) language. Let us introduce a counterpart to regular languages². A real-time regular expression is obtained from this grammar, where $\sigma$ ranges over $\Sigma$ , and $I$ ranges over rational-bounded intervals:

r ::= \emptyset \mid \varepsilon \mid \sigma_I \mid r + r \mid r \cdot r \mid r^*

For example, $a_{(0, \infty)} \cdot (b_{(0, 2]} \cdot a_{(0, 1]})^*$ denotes the language of signals of the form $a^\delta b^{\beta_1} a^{\alpha_1} \cdots b^{\beta_k} a^{\alpha_k}$ where $k \in \N$ , $\delta \in \R_{> 0}$ , $\alpha_i \in (0, 1]$ and $\beta_i \in (0, 2]$ . Formally, the semantics is defined inductively by

\begin{aligned} \llbracket \emptyset \rrbracket &= \emptyset, \\ \llbracket \varepsilon \rrbracket &= \{\varepsilon\}, \\ \llbracket \sigma_I \rrbracket &= \{(0, \delta] \mapsto \sigma : \delta \in I\}, \\ \llbracket r + r' \rrbracket &= \llbracket r \rrbracket \cup \llbracket r' \rrbracket, \\ \llbracket r \cdot r' \rrbracket &= \{w \cdot w' : w \in \llbracket r \rrbracket, w' \in \llbracket r' \rrbracket\}, \\ \llbracket r^* \rrbracket &= \llbracket r^0 \rrbracket \cup \llbracket r^1 \rrbracket \cup \llbracket r^2 \rrbracket \cup \cdots. \end{aligned}

A real-time regular language is a signal language $L \subseteq \mathcal{S}(\Sigma)$ such that $L = \llbracket r \rrbracket$ for some real-time regular expression $r$. As in the classical setting, we can further provide an automaton model for these languages.

A real-time automaton is a tuple $\mathcal{A} = (Q, \Sigma, \Delta, Q_0, F, \lambda, \iota)$ where

$Q$ is a finite set of states;
$\Sigma$ is a finite alphabet;
$\Delta \subseteq Q \times Q$ is the transition relation;
$Q_0, F \subseteq Q$ are the initial and final states;
$\lambda \colon Q \to \Sigma$ is the state labeling function;
$\iota$ is the time labeling function that maps each state to a rational-bounded interval.

An execution is a sequence $(q_1, \delta_1) \cdots (q_n, \delta_n)$ such that $\delta_i \in \iota(q_i)$ for all $i \in [1..n]$ , and $(q_i, q_{i+1}) \in \Delta$ for all $i \in [1..n-1]$ . If $q_1 \in Q_0$ and $q_n \in F$ , then we say that the execution accepts the following signal:

\lambda(q_1)^{\delta_1} \cdots \lambda(q_n)^{\delta_n}.

An execution is stuttering if $\lambda(q_i) = \lambda(q_{i+1})$ for some $i \in [1..n-1]$ . Note that an execution accepting a signal $w$ is non-stuttering iff $\# w = n$ .

For example, the language $\llbracket a_{(0, \infty)} \cdot (b_{(0, 2]} \cdot a_{(0, 1]})^* \rrbracket$ is accepted by this real-time automaton:

Real-time automata and real-time regular expressions have the same expressive power.
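To make executions concrete, here is a sketch that checks whether a sequence of (state, duration) pairs is an accepting execution and returns the signal it accepts. The three-state automaton below is an assumption: it is one possible automaton for $a_{(0, \infty)} \cdot (b_{(0, 2]} \cdot a_{(0, 1]})^*$ and may differ from the one the author had in mind. The `Interval` class and the state names are likewise hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float
    lo_closed: bool
    hi_closed: bool

    def contains(self, d):
        above = d > self.lo or (self.lo_closed and d == self.lo)
        below = d < self.hi or (self.hi_closed and d == self.hi)
        return above and below

# Hypothetical automaton for a_(0,inf) . (b_(0,2] . a_(0,1])^*
LABEL = {"q0": "a", "q1": "b", "q2": "a"}
IOTA = {
    "q0": Interval(0, math.inf, False, False),
    "q1": Interval(0, 2, False, True),
    "q2": Interval(0, 1, False, True),
}
DELTA = {("q0", "q1"), ("q1", "q2"), ("q2", "q1")}
INITIAL = {"q0"}
FINAL = {"q0", "q2"}

def is_execution(run):
    """Each duration lies in iota(q_i) and consecutive states are linked by Delta."""
    ok_time = all(IOTA[q].contains(d) for q, d in run)
    ok_step = all((p, q) in DELTA for (p, _), (q, _) in zip(run, run[1:]))
    return ok_time and ok_step

def accepted_signal(run):
    """Signal accepted by `run`, or None if `run` is not an accepting execution."""
    if not run or not is_execution(run):
        return None
    if run[0][0] not in INITIAL or run[-1][0] not in FINAL:
        return None
    return [(LABEL[q], d) for q, d in run]
```

For instance, the run `[("q0", 3.5), ("q1", 2), ("q2", 0.5)]` accepts $a^{3.5} b^2 a^{0.5}$, whereas spending 2.5 time units in `q1` violates $\iota$.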

Stuttering and determinism

We say that an automaton is stuttering-free if

$0 \in \iota(q_\varepsilon)$ for at most one state $q_\varepsilon$ , in which case we have $q_\varepsilon \in Q_0$ , $\iota(q_\varepsilon) = [0, 0]$ and $q_\varepsilon$ is not connected to any other state;
$\lambda(p) \neq \lambda(q)$ for each $(p, q) \in \Delta$ .

We say that an automaton is state-deterministic if the following holds for all states $q \neq r$ :

$q, r \in Q_0$ implies $\lambda(q) \neq \lambda(r)$ or $\iota(q) \cap \iota(r) = \emptyset$ ;
$(p, q), (p, r) \in \Delta$ implies $\lambda(q) \neq \lambda(r)$ or $\iota(q) \cap \iota(r) = \emptyset$.

An automaton is deterministic if it is both stuttering-free and state-deterministic. For example, the automaton depicted above is deterministic.
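Both conditions are syntactic and can be checked mechanically. The sketch below assumes $\iota$ maps each state to a single interval, represented by a hypothetical `Interval` class, and that an automaton is given by plain sets and dictionaries; none of these names come from the text. State-determinism is read here as a condition on pairs of initial states and on pairs of successors of a common state.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float
    lo_closed: bool
    hi_closed: bool

    def contains(self, d):
        above = d > self.lo or (self.lo_closed and d == self.lo)
        below = d < self.hi or (self.hi_closed and d == self.hi)
        return above and below

def disjoint(i, j):
    """True iff two (nonempty) intervals have an empty intersection."""
    lo, hi = max(i.lo, j.lo), min(i.hi, j.hi)
    if lo != hi:
        return lo > hi
    return not (i.contains(lo) and j.contains(lo))

def stuttering_free(states, delta, initial, label, iota):
    zero = [q for q in states if iota[q].contains(0)]
    if len(zero) > 1:
        return False
    for q in zero:
        isolated = all(q not in edge for edge in delta)
        if q not in initial or iota[q] != Interval(0, 0, True, True) or not isolated:
            return False
    return all(label[p] != label[q] for p, q in delta)

def state_deterministic(states, delta, initial, label, iota):
    def clash(q, r):
        return label[q] == label[r] and not disjoint(iota[q], iota[r])
    successors = {p: [q for (s, q) in delta if s == p] for p in states}
    groups = [list(initial)] + list(successors.values())
    return not any(clash(q, r) for g in groups for q in g for r in g if q != r)

def deterministic(*automaton):
    return stuttering_free(*automaton) and state_deterministic(*automaton)

# One-state automaton for (a_[1,1])^*: the self-loop stutters.
star = ({"q"}, {("q", "q")}, {"q"}, {"q": "a"}, {"q": Interval(1, 1, True, True)})
```

The automaton `star` is state-deterministic but not stuttering-free, which illustrates why the two conditions are stated separately.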

If we see (discrete) words as signals with integer durations, then any standard regular language can be accepted by a real-time automaton. For example, the language $\{\varepsilon, a, aa, \ldots\}$ can be seen as the signal language $L_\N = \{(0, n] \mapsto a : n \in \N\}$ which equals $\llbracket (a_{[1, 1]})^* \rrbracket$. However, determinism does not suffice.

The language $L_\N$ is not accepted by any deterministic real-time automaton.

Augmented automata

A Kleene constraint is a duration constraint obtained by combining rational-bounded intervals with union, Minkowski sum, star and complementation. For example, the subset of $\R_{\geq 0}$ associated with the Kleene constraint $(0, 1] \cup ([2, 4) + [5, 5]^*)$ is

(0, 1] \cup [2, 4) \cup [7, 9) \cup [12, 14) \cup \cdots.
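Membership in this particular constraint can be tested by peeling off multiples of 5, since $[5, 5]^* = \{0, 5, 10, \ldots\}$. The following sketch is specific to this example and is not a general procedure for Kleene constraints; `in_example_constraint` is a hypothetical name.

```python
from fractions import Fraction

def in_example_constraint(d):
    """Is d in (0, 1] u ([2, 4) + [5, 5]^*)?"""
    if 0 < d <= 1:
        return True
    k = 0
    # d lies in [2, 4) + {5k} iff 2 <= d - 5k < 4 for some k >= 0.
    while d - 5 * k >= 2:
        if d - 5 * k < 4:
            return True
        k += 1
    return False
```

The loop terminates because `d - 5 * k` decreases strictly with each iteration.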

An augmented real-time automaton is a real-time automaton where $\iota(q)$ is a Kleene constraint rather than a single interval. For example, it is now trivial to provide a deterministic automaton for $L_{\N}$ :

In fact, augmented real-time automata allow us to retrieve good properties of classical automata.

Augmented real-time automata are closed under complementation.

The emptiness and universality problems are decidable for real-time regular languages.

The following models have the same expressiveness: real-time automata, augmented real-time automata, and deterministic augmented real-time automata.

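For every real-time regular language $L \subseteq \mathcal{S}(\Sigma)$, there exists a threshold $n \in \N$ such that every $w \in L$ with $\# w \geq n$ can be written as $w = xyz$ where $\# y > 1$ and $x y^i z \in L$ for all $i \in \N$.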

From the above pumping lemma, we can show, e.g., that the language $L_{(0, 1]} = \{w \in \mathcal{S}(\{a, b\}) : 0 < |w| \leq 1\}$ is not real-time regular. Indeed, for the sake of contradiction, suppose $L_{(0, 1]}$ is real-time regular. Let $n \in \N$ be the threshold given by the pumping lemma. Let $w = (a^{1/(2n)} b^{1/(2n)})^n$. We have $|w| = n \cdot (1/(2n) + 1/(2n)) = 1$ and hence $w \in L_{(0, 1]}$. Moreover, $\# w = 2n$, and hence we can obtain $w = xyz$ from the pumping lemma. Consequently, we have $xy^2z \in L_{(0, 1]}$. This is a contradiction: since $\# y > 1$, we have $|y| > 0$, and so $|xy^2z| = |w| + |y| > 1$.

Timed regular languages

A shortcoming of real-time regular expressions is that they can only locally constrain the duration of letters. Thus, let us enrich them. A generalized timed regular expression is an expression derived from the following grammar, where $\sigma$ ranges over $\Sigma$ , and $I$ ranges over integer-bounded intervals:

r ::= \emptyset \mid \varepsilon \mid \underline{\sigma} \mid \langle r \rangle_I \mid r \cdot r \mid r^* \mid (r \lor r) \mid (r \land r)

The semantics of generalized timed regular expressions is defined inductively by

\begin{aligned} \llbracket \emptyset \rrbracket &= \emptyset, \\ \llbracket \varepsilon \rrbracket &= \{\varepsilon\}, \\ \llbracket \underline{\sigma} \rrbracket &= \{(0, \delta] \mapsto \sigma : \delta \in \R_{> 0}\}, \\ \llbracket \langle r \rangle_I \rrbracket &= \{w \in \llbracket r \rrbracket : |w| \in I\}, \\ \llbracket r \cdot r' \rrbracket &= \{w \cdot w' : w \in \llbracket r \rrbracket, w' \in \llbracket r' \rrbracket\}, \\ \llbracket r^* \rrbracket &= \llbracket r^0 \rrbracket \cup \llbracket r^1 \rrbracket \cup \llbracket r^2 \rrbracket \cup \cdots, \\ \llbracket r \lor r' \rrbracket &= \llbracket r \rrbracket \cup \llbracket r' \rrbracket, \\ \llbracket r \land r' \rrbracket &= \llbracket r \rrbracket \cap \llbracket r' \rrbracket. \end{aligned}

When an expression has no conjunction, we say that it is a timed regular expression. For example, the timed regular expression $\langle \underline{a}\rangle_{(0, 2]} (\langle \underline{b} \cdot \underline{c} \rangle_{(0, 8]})^*$ describes the language of signals of the form $a^\alpha b^{\beta_1} c^{\gamma_1} \cdots b^{\beta_k} c^{\gamma_k}$ where $0 < \alpha \leq 2$ , $k \in \N$ and $0 < \beta_i + \gamma_i \leq 8$ for all $i \in [1..k]$ . The language $L_{(0, 1]}$ seen earlier can be expressed by the timed regular expression $\langle(\underline{a} \lor \underline{b})^*\rangle_{(0, 1]}$ .

Timed regular expressions are less expressive than generalized timed regular expressions. Indeed, it is known that $L = \{a^\delta b^{1 - \delta} c^\delta : \delta \in (0, 1)\}$ cannot be described by a timed regular expression³. Yet, it is described by the generalized timed regular expression $r = (\langle \underline{a} \cdot \underline{b}\rangle_{[1, 1]} \cdot \underline{c}) \land (\underline{a} \cdot \langle \underline{b} \cdot \underline{c}\rangle_{[1, 1]})$ .

Proof.

$\Rightarrow$ ) Let $w = a^\delta b^{1 - \delta} c^\delta$ where $\delta \in (0, 1)$ . We have $|a^\delta b^{1-\delta}| = \delta + (1 - \delta) = 1$ and $|b^{1-\delta} c^\delta| = (1 - \delta) + \delta = 1$ . Therefore, $w \in \llbracket r \rrbracket$ .

$\Leftarrow$ ) Let $w \in \llbracket r \rrbracket$ . By definition, there exist $x, y, z, x', y', z' \in \R_{>0}$ such that $w = a^x b^y c^z$ , $x + y = 1$ , $w = a^{x'} b^{y'} c^{z'}$ and $y' + z' = 1$ . Since all letters are distinct, we must have $x = x'$ , $y = y'$ and $z = z'$ . Thus, $1 - x = y = y' = 1 - z' = 1 - z$ and hence $x = z$ . This means that $w = a^x b^{1 - x} c^x \in L$ .

Timed automata

Timed automata are a well-established formalism for describing languages of timed words. Let us explain, through an example, how they can also describe signal languages. Let us reconsider the timed regular expression $r = \langle \underline{a}\rangle_{(0, 2]} (\langle \underline{b} \cdot \underline{c} \rangle_{(0, 8]})^*$ . The language of $r$ is described by this timed automaton:

We start in the initial state $q_0$ at time $0$ . From there, time elapses at the same rate in clocks $x$ and $y$ . A transition labeled by “ $\varphi; \texttt{op}$ ” can be taken if constraint $\varphi$ holds in the current clock valuation, upon which $\texttt{op}$ is executed. Formally, $\varphi$ is a Boolean combination of integer-bounded interval constraints over clocks, and the operation resets a (possibly empty) subset of the clocks.

Consider an execution

(q_0, x_0, y_0) \to^{\delta_0} (q_1, x_1, y_1) \to^{\delta_1} \cdots \to^{\delta_{n-1}} (q_n, x_n, y_n),

where $\delta_i$ is the time elapsed in $q_i$ , and $(x_i, y_i)$ is the value of the two clocks. Let $\lambda \colon Q \to \Sigma$ label states. If $x_0 = y_0 = 0$ and state $q_n$ is accepting, then the automaton accepts the signal $f \in \mathcal{S}(\Sigma)$ such that

f = \lambda(q_0)^{\delta_0} \cdots \lambda(q_{n-1})^{\delta_{n-1}}.
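The execution semantics can be simulated step by step. The sketch below uses a hypothetical two-clock automaton for $r = \langle \underline{a}\rangle_{(0, 2]} (\langle \underline{b} \cdot \underline{c} \rangle_{(0, 8]})^*$; it is an assumption made for illustration and need not coincide with the automaton the author depicts. States `q0`, `q1`, `q2` are labeled $a$, $b$, $c$, and `qf` is an accepting sink in which no time is spent.

```python
from fractions import Fraction

LABEL = {"q0": "a", "q1": "b", "q2": "c"}
ACCEPTING = {"qf"}

def leave_a(v):
    return 0 < v["x"] <= 2            # a lasted between 0 (excl.) and 2

def leave_b(v):
    return v["y"] > 0                 # b lasted a positive amount of time

def leave_c(v):
    return v["x"] > 0 and v["y"] <= 8  # c is positive, b + c is at most 8

# (source, target) -> (guard, clocks reset when the transition fires)
TRANSITIONS = {
    ("q0", "q1"): (leave_a, {"y"}),
    ("q0", "qf"): (leave_a, set()),
    ("q1", "q2"): (leave_b, {"x"}),
    ("q2", "q1"): (leave_c, {"y"}),
    ("q2", "qf"): (leave_c, set()),
}

def accepted_signal(steps, last_state):
    """steps = [(q_0, d_0), ..., (q_{n-1}, d_{n-1})]; last_state = q_n.

    Returns the accepted signal as (letter, duration) pairs, or None.
    """
    clocks = {"x": Fraction(0), "y": Fraction(0)}
    states = [q for q, _ in steps] + [last_state]
    for (q, delay), target in zip(steps, states[1:]):
        clocks = {c: v + delay for c, v in clocks.items()}  # time elapses
        if (q, target) not in TRANSITIONS:
            return None
        guard, resets = TRANSITIONS[(q, target)]
        if not guard(clocks):
            return None
        for c in resets:
            clocks[c] = Fraction(0)
    if states[0] != "q0" or last_state not in ACCEPTING:
        return None
    return [(LABEL[q], d) for q, d in steps]
```

In this sketch, clock `y` measures the duration of the current $b \cdot c$ block, while clock `x` is reused to measure first the duration of $a$ and then the duration of each $c$.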

Recall the classical translation from standard regular expressions to non-deterministic finite automata. A similar inductive translation holds in the timed setting. For example, for $\langle r \rangle_I$ , we construct a timed automaton for $r$ ; add a new clock $z$ ; and add the constraint “ $z \in I$ ” to each transition leading to an accepting state.

Any generalized timed regular language is accepted by a timed automaton.

It can further be shown that (i) the signal language of a one-clock timed automaton is expressible by a timed regular expression; and (ii) the signal language $L$ of a timed automaton can be written as the renaming of the intersection of the languages of finitely many one-clock timed automata. By “renaming”, we mean that $L \subseteq \mathcal{S}(\Sigma)$ equals $h(L')$, where $L' \subseteq \mathcal{S}(\Sigma')$ and $h \colon \Sigma' \to \Sigma$ is a letter-to-letter morphism. Altogether, we obtain the following characterization for signal languages:

A signal language is generalized timed regular iff it is the renaming of the language of a timed automaton.

The same theorem holds for timed words. Herrmann proved that renaming is necessary in that setting, and claimed that the proof can be adapted to signals⁴.

Relation with real-time regular languages

Real-time regular languages form a strict subset of languages recognized by one-clock timed automata. Indeed, a real-time automaton can be seen as a one-clock timed automaton where the single clock is reset at each transition. In particular, the language $L_{(0, 1]} = \{w \in \mathcal{S}(\{a, b\}) : 0 < |w| \leq 1\}$, which is not real-time regular, can be recognized by a one-clock timed automaton:

¹ See Section 2 of Eugene Asarin, Paul Caspi and Oded Maler. Timed regular expressions. Journal of the ACM, vol. 49, no. 2, 2002.
² We follow quite closely the presentation of Section 3.1 of Catalin Dima. An algebraic theory of real-time formal languages. Université Joseph-Fourier – Grenoble I, 2001.
³ See Section 7 of Eugene Asarin, Paul Caspi and Oded Maler. A Kleene Theorem for Timed Automata. Proc. 12th Annual IEEE Symposium on Logic in Computer Science (LICS), 1997.
⁴ Philippe Herrmann. Renaming Is Necessary in Timed Regular Expressions. Proc. 19th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 1999.