Important formal proofs
Here are proofs that ensure that IDeSyDe (or similar DSI tools) have as much correct behavior as possible; it is also a place to give formal performance guarantees when possible.
Bounds for non-exploration procedures
Make sure to read and understand the key DSI concepts and definitions before proceeding.
We discuss the asymptotic bounds for each non-exploration procedure regarding “computational time”. We present the bounds as theorems and discuss further how they are directly bounded by the inputs of each procedure.
Theorem 1. For a set of explorers \(\mathcal{E}\) and a set of identified design models \(\mathcal{X}\), let \(\kappa_{\mathcal{E}}\) be the worst complexity of evaluating the exploration properties for any possible bid between \(\mathcal{E}\) and \(\mathcal{X}\). Then, the bidding has complexity bound \(\Theta(|\mathcal{E}||\mathcal{X}|\kappa_{\mathcal{E}})\).
Proof. Since the computation of \(\mathcal{B}\) requires the product composition between explorers and decision models, The least amount of exploration properties computed is \(|\mathcal{E}||\mathcal{X}|\). Although each estimation may have its own complexity, we know by definition that they are upper bounded by \(\kappa_{\mathcal{E}}\), proving that the bidding procedure has the complexity bound \(\mathcal{O}(|\mathcal{E}||\mathcal{X}|\kappa_{\mathcal{E}})\).
However, since there is no early termination, as all products must be computed for \(\mathcal{B}\), the upper bound is also its lower bound, proving the theorem. QED
Theorem 2. For a set of design models \(\mathcal{M}\), a set of explored decision models \(\mathcal{Y}\) and set of reverse identification rules \(R_{\mathit{rev}}\), let \(\kappa_{R_{\mathit{rev}}}\) be the worst complexity of evaluating any reverse identification rule in \(R_{\mathit{rev}}\) with \(\mathcal{M}\) and \(\mathcal{Y}\). Then, the identification procedure has complexity bound \(\Theta(|R_{\mathit{rev}}|\kappa_{R_{\mathit{rev}}})\).
Proof. The reverse identification procedure requires only computing \(\mathcal{M}_{\rho}'\), which is determined by applying _once_all \(r \in R_{\mathit{rev}}\) to \(\mathcal{Y}\). Therefore, the amount of reverse identification rules evaluated is \(|R_{\mathit{rev}}|\). Since \(\kappa_{R_{\mathit{rev}}}\) is the upper bound of complexity for these rules by definition, we have that \(\mathcal{M}_{\rho}' = \rho(\mathcal{Y}, \mathcal{M})\) has the complexity bound \(\mathcal{O}(|R_{\mathit{rev}}|\kappa_{R_{\mathit{rev}}})\).
However, since there is no early termination, all reverse identification rules must be evaluated; thus, the upper and lower bound are equal, proving the theorem. QED
Theorem 3. For a set of design models \(\mathcal{M}\) and set of identification rules \(R_{\mathit{ident}}\), let \(\kappa_{R_{\mathit{ident}}}\) be the worst complexity of evaluating any identification rule in \(R_{\mathit{ident}}\) with \(\mathcal{M}\) and any set of decision models \(\mathcal{X}_i\). Then, the identification procedure has complexity bounds \(\Omega(|R_{\mathit{ident}}|\kappa_{R_{\mathit{ident}}})\) and \(\mathcal{O}(|\mathit{elems}(\mathcal{M})||R_{\mathit{ident}}|\kappa_{R_{\mathit{ident}}})\)
Proof. Without loss of generality, consider that the identification procedure is at step \(i\), with a set of identified decision models \(\mathcal{X}_i\). Computing \(\mathcal{X}_{i+1}\) requires evaluating \(|R_{\mathit{ident}}|\) identification rules with \(\mathcal{M}\) and \(\mathcal{X}_i\). Although every identification rule has its own complexity, which might depend on both \(\mathcal{M}\) and \(\mathcal{X}_i\), we know by definition that they are upper bounded by \(\kappa_{R_{\mathit{ident}}}\). Thus, the step from \(i\) to \(i+1\) has a complexity of \(|R_{\mathit{ident}}|\kappa_{R_{\mathit{ident}}}\).
After computation of \(\mathcal{X}_{i+1}\), two conditions are possible: either:
- \(\mathit{part}(\mathcal{X}_{i+1}) = \mathit{part}(\mathcal{X}_{i})\) or
- \(\mathit{part}(\mathcal{X}_{i+1}) \neq \mathit{part}(\mathcal{X}_{i})\).
If \(\mathit{part}(\mathcal{X}_{i+1}) = \mathit{part}(\mathcal{X}_{i})\), then the identification procedure has arrived at a fix-point and terminates. If \(i\) is the first step of the identification procedure, it has been completed with one step, which proves that the complexity lower bound is \(|R_{\mathit{ident}}|\kappa_{R_{\mathit{ident}}}\).
Otherwise, if \(\mathit{part}(\mathcal{X}_{i+1}) \neq \mathit{part}(\mathcal{X}_{i})\), there is at least one element \(x \in \mathit{part}(\mathcal{X}_{i+1})\) which \(x \notin \mathit{part}(\mathcal{X}_{i})\). By the monotonicity of identification rules, we know that \(x \in \mathit{elems}(\mathcal{M})\) by definition. The second condition can happen at most \(|\mathit{elems}(\mathcal{M})|\) times during the procedure; otherwise, we would need that \(x \in \mathit{part}(\mathcal{X}_{i+1})\) and \(x \notin \mathit{elems}(\mathcal{M})\), which is impossible. Therefore, the complexity of the identification procedure is upper bounded by \(|\mathit{elems}(\mathcal{M})||R_{\mathit{ident}}|\kappa_{R_{\mathit{ident}}}\). QED
The three theorems explicitly show the complexity bounds for each procedure but also introduce a constant \(\kappa\) for each. These were necessary analytical devices to carry out the proofs, but they may be refined further within the DSI scope.
First, exploration properties are estimates of the actual DSE activity; they should not be computationally expensive algorithms. In other words, \(\kappa_{\mathcal{E}}\) is ideally a number dominated by the initial number of elements in the design models. Specifically, we argue that correct estimation techniques used in the bidding procedures will have a complexity \(\kappa_{\mathcal{E}} = \mathcal{O}(|\mathit{elems}(\mathcal{M})|^a)\) where \(a\) is a fixed positive integer.
Second, because reverse identification rules are conceptually producing explored design model that enriches the input design models, we argue that correct reverse identification rules cannot arbitrarily extrapolate the number of initial elements. That is to say, like estimation techniques, reverse identification rules are dominated by the initial number of elements in the design models: \(\kappa_{R_{\mathit{rev}}} = \mathcal{O}(|\mathit{elems}(\mathcal{M})|^b)\) where b is a fixed positive integer.
The same logic applies to identification rules: they should not be computationally expensive algorithms; rather, they ideally prepare the parameters and associated functions for expensive algorithms. This does not forbid identification rules akin to “pre-processing” as long as they are acceptably inexpensive, i.e. polynomial in the number of elements of the input models. Mathematically, this means that \(\kappa_{R_{\mathit{ident}}} = \mathcal{O}(|\mathit{elems}(\mathcal{M})|^c)\) where \(c\) is a fixed positive integer.
After this discussion, we can enunciate three corollaries of the presented bound theorems with the assumptions of the foregoing discussion.
Corollary 4. For a set of explorers \(\mathcal{E}\) and a set of identified design models \(\mathcal{X}\), the bidding has complexity bound \(\Theta(|\mathcal{E}||\mathcal{X}||\mathit{elems}(\mathcal{M})|^a)\) with \(a \in \mathbb{N}\).
Corollary 5. For a set of design models \(\mathcal{M}\), a set of explored decision models \(\mathcal{Y}\) and set of reverse identification rules \(R_{\mathit{rev}}\), the identification procedure has complexity bound \(\Theta(|R_{\mathit{rev}}||\mathit{elems}(\mathcal{M})|^b)\) with \(b \in \mathbb{N}\).
Corollary 6. For a set of design models \(\mathcal{M}\) and set of identification rules \(R_{\mathit{ident}}\), the identification procedure has complexity bounds \(\Omega(|R_{\mathit{ident}}||\mathit{elems}(\mathcal{M})|^c)\) and \(\mathcal{O}(|R_{\mathit{ident}}||\mathit{elems}(\mathcal{M})|^{c+1})\) where \(c \in \mathbb{N}\).
We note again that these corollaries are based on the DSI assumption that the procedures are acceptably efficient. For instance, if an I-module is added to a flow run to be composed with others which contain one expensive identification rule, then Corollary 6 no longer holds.
To motive the discussion in practice, all the identification rules across the four modules in IDeSyDe have a \(c=2\). This value originates from the fact that few identification rules look for “neighboring” patterns of elements in the design model, which might require quadratic time in the number of elements. For the others, IDeSyDe has \(a=b=1\), since the reverse identification rules and exploration estimations are rather direct, as DSI promotes.