
Refinements for performance of extracted code #4

Open
palmskog opened this issue Mar 28, 2019 · 14 comments

@palmskog (Member) commented Mar 28, 2019

This is an ongoing effort to document how the executable parts of reglang could be refined so that extracted code performs better in practice. Such refinement could be done in several ways, most directly in the style of CoqEAL.

@palmskog (Member Author) commented Mar 28, 2019

Reachability in the epsilon NFA (ENFA) construction

The set of reachable ENFA states is defined using the connect predicate as:

Definition eps_reach (p : N) := [set q | connect (enfa_trans None) p q].

As highlighted by @pi8027, this results in #|N| calls to connect in practice, each of which is a depth-first search of the relation. One way to make eps_reach more efficient is to directly use dfs from fingraph:

Definition eps_reach (p : N) := dfs (rgraph (enfa_trans None)) #|N| [::] p.

This results in the function returning a sequence rather than a set, but this can be accounted for.
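For reference, here is a minimal sketch of the intended correspondence, stated over an arbitrary finite type and relation rather than inside reglang's ENFA section; the definition and lemma names below are hypothetical, and the proof is expected to follow from fingraph's dfsP and connectP (exact lemma names may differ between MathComp versions), so it is only stated.

From mathcomp Require Import all_ssreflect.

Section DfsVsConnect.
Variables (T : finType) (e : rel T).

(* seq-valued reachability via MathComp's dfs; for reglang, instantiate
   e with enfa_trans None to recover the refined eps_reach. *)
Definition reach_seq (p : T) : seq T := dfs (rgraph e) #|T| [::] p.

(* Intended agreement with the set-based definition. *)
Lemma reach_seqE (p q : T) :
  (q \in reach_seq p) = (q \in [set x | connect e p x]).
Proof. Abort.

End DfsVsConnect.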

@palmskog (Member Author) commented Mar 28, 2019

Union of ENFA reachable states

The ENFA-to-NFA construction takes the union of all reachable states using \bigcup:

nfa_s := \bigcup_(p in enfa_s N) (eps_reach p)

Unless the sets of states reachable via DFS from the different start states are completely disjoint, this will result in states being explored many times. A more efficient approach is to do the work of eps_reach incrementally and tail-recursively, visiting only unexplored states:

nfa_s := foldl (dfs (rgraph (enfa_trans None)) #|N|) [::] (enum (enfa_s N))
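A sketch of the intended membership-wise correspondence with the \bigcup definition, stated generically (the lemma name is hypothetical; the actual reglang instantiation would use N, enfa_s N, and enfa_trans None):

From mathcomp Require Import all_ssreflect.

Section FoldlDfs.
Variables (T : finType) (e : rel T) (S : {set T}).

(* The foldl threads the visited list through successive dfs calls, so a
   state reached from an earlier start state is never explored again. *)
Lemma foldl_dfs_cover :
  foldl (dfs (rgraph e) #|T|) [::] (enum S)
    =i \bigcup_(p in S) [set q | connect e p q].
Proof. Abort.

End FoldlDfs.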

@palmskog (Member Author) commented Mar 28, 2019

Datatype for transition relation of NFAs and ENFAs

The transition relations of NFAs and ENFAs are currently represented as boolean-valued functions. To play to the strengths of the MathComp dfs definition, they could instead be represented using neighbor lists, i.e., as functions from states to sequences of states (e.g., nfa_state -> seq nfa_state). This would avoid the costly conversion via rgraph when using dfs.
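A rough sketch of what such a representation could look like (the record and field names are hypothetical and deliberately do not match reglang's actual nfa definition):

From mathcomp Require Import all_ssreflect.

Section NfaNeighborList.
Variable char : finType.

(* Transitions stored as neighbor lists: for each state and input symbol,
   the successor states are available directly as a sequence, so dfs can
   be used without first converting a boolean relation via rgraph. *)
Record nfa_nl : Type := NfaNL {
  nfa_nl_state :> finType;
  nfa_nl_s : {set nfa_nl_state};
  nfa_nl_fin : {set nfa_nl_state};
  nfa_nl_trans : nfa_nl_state -> char -> seq nfa_nl_state
}.

End NfaNeighborList.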

@palmskog (Member Author) commented Mar 28, 2019

Datatype for NFA and ENFA states

The key sets of states for NFAs and ENFAs, the set of all automaton states and the set of accepting states, are currently represented as finite sets over a finType. Finite sets don't have fast membership lookup in practice, so sets could instead be represented by some more efficient (logarithmic-time) data structure, such as red-black trees.
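For illustration only, Coq's standard library already provides red-black-tree sets via MSetRBT; a minimal sketch, assuming states are extracted to natural numbers (an assumption about the extraction setup, not something reglang currently does):

From Coq Require Import MSetRBT OrdersEx.

(* Red-black-tree sets of nat-encoded states; membership is a logarithmic
   tree lookup instead of a linear scan. *)
Module StateSet := MSetRBT.Make Nat_as_OT.

Example mem_demo :
  StateSet.mem 2 (StateSet.add 2 (StateSet.add 5 StateSet.empty)) = true.
Proof. reflexivity. Qed.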

@pi8027 commented Mar 28, 2019

One can bridge finsets and machine integers to have fast membership lookup in extracted code: https://hal.inria.fr/hal-01251943v1 https://github.com/artart78/coq-bitset https://github.com/ejgallego/ssrbit

@palmskog (Member Author) commented Mar 28, 2019

@pi8027 as far as I'm aware, those libraries don't work well with recent versions of Coq. The "right thing" would now probably be to reimplement a similar library using the recently merged primitive integers.
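With primitive integers, a bitset makes membership a single bit test; a purely illustrative sketch, assuming the Uint63 interface of recent Coq versions and at most 63 states so that a set of states fits in one machine word:

From Coq Require Import Uint63.

(* b encodes a set of states as a bit vector; state i is a member iff
   bit i of b is set.  Illustrative only. *)
Definition bit_mem (b i : int) : bool :=
  negb (eqb (land b (lsl 1%uint63 i)) 0%uint63).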

@palmskog (Member Author) commented Mar 28, 2019

Alternative algorithms for regexp matching

The literature contains many algorithms for regular expression matching. One algorithm, based on closures, is explored in a classic paper by Harper; it has been formalized standalone by Sozeau using Equations and could be reimplemented in the reglang setting. Another formalization, of an algorithm based on residuals, could also be adapted.
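To make the residual-based option concrete, here is a minimal derivative-style matcher sketch over a local copy of a regexp type mirroring reglang's constructors; the functions has_eps, der, and dmatch are hypothetical names, not part of reglang, and no correctness proof is attempted here.

From mathcomp Require Import all_ssreflect.

Section Derivatives.
Variable char : eqType.

Inductive regexp :=
| Void | Eps | Atom of char
| Star of regexp | Plus of regexp & regexp | Conc of regexp & regexp.

(* Does the language of e contain the empty word? *)
Fixpoint has_eps (e : regexp) : bool :=
  match e with
  | Void => false
  | Eps => true
  | Atom _ => false
  | Star _ => true
  | Plus e1 e2 => has_eps e1 || has_eps e2
  | Conc e1 e2 => has_eps e1 && has_eps e2
  end.

(* Derivative of e with respect to the symbol a (a residual of e). *)
Fixpoint der (a : char) (e : regexp) : regexp :=
  match e with
  | Void | Eps => Void
  | Atom b => if b == a then Eps else Void
  | Star e1 => Conc (der a e1) (Star e1)
  | Plus e1 e2 => Plus (der a e1) (der a e2)
  | Conc e1 e2 =>
      if has_eps e1 then Plus (Conc (der a e1) e2) (der a e2)
      else Conc (der a e1) e2
  end.

(* A word matches e iff the iterated derivative accepts the empty word. *)
Definition dmatch (e : regexp) (w : seq char) : bool :=
  has_eps (foldl (fun e' a => der a e') e w).

End Derivatives.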

@chdoc (Member) commented Apr 1, 2019

I think we should discuss at some point what the purpose of these refinements would be. I would prefer the main definitions to remain "uncompromisingly proof oriented". I'm not opposed to having a second representation that is effective. But one should be clear about which parts of the (correctness) proofs one expects to carry over and what one expects to duplicate.

I'd also say that this question is orthogonal to the question of using different algorithms, because then one is again faced with the decision of whether to verify it in an abstract/declarative setting or directly on the effective representation.

So what are you doing or planning to do with the extracted code?

@palmskog (Member Author) commented Apr 1, 2019

At least in the CoqEAL refinement approach, the original definitions are left entirely unchanged, and would probably not live in the same repository as the refined ones. All lemmas from the proof-oriented side should be transferable to the refined instances in some way.

One interesting application is to compare the performance of verified code against unverified code in some common domains related to regular languages (regexp matching, DFA minimization, etc.).
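To illustrate what transferring lemmas could look like, here is a toy refinement relation in the spirit of CoqEAL, but not using CoqEAL's actual refines/param machinery; all names below are hypothetical.

From mathcomp Require Import all_ssreflect.

Section RefinementSketch.
Variable T : finType.

(* Proof-oriented finsets refined by duplicate-free sequences. *)
Definition Rset (A : {set T}) (s : seq T) : Prop :=
  uniq s /\ forall x, (x \in A) = (x \in s).

(* Membership queries agree across the refinement... *)
Lemma Rset_mem A s x : Rset A s -> (x \in A) = (x \in s).
Proof. by case=> _ ->. Qed.

(* ...and cardinality transfers to the length of the sequence (statement
   only; the proof would go through eq_card and card_uniqP). *)
Lemma Rset_card A s : Rset A s -> #|A| = size s.
Proof. Abort.

End RefinementSketch.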

@chdoc (Member) commented Apr 1, 2019

Well, the current matching "algorithm" is really just an executable specification, with the star operation originally from Coquand and Siles. Likewise, minimization is performed by forming a quotient modulo the coarsest Myhill-Nerode relation. These are unlikely to yield reasonable performance, even on effective data structures. After all, our focus was on verifying the translations between the various representations of regular languages.

That being said, the translation from regular expressions to NFAs is (provably) linear. So while there are better algorithms that have already been formalized, this one is not entirely unreasonable. But then, there is currently no (reasonable) algorithm for the word problem for NFAs either.

@palmskog (Member Author) commented Apr 1, 2019

So the conclusion in my mind would be that certain functions can be "directly" refined without much algorithmic change (regexp to NFA translation), while other functions are simply used as (executable) specifications when applying a different algorithm. In the latter case, the main benefit is that one ensures that the specification of the alternative algorithm is correct, since it's connected to a larger theory; one might also reuse datatype representations from the proof-oriented development.

@palmskog (Member Author) commented Apr 3, 2019

The unified framework for regexps in Isabelle/HOL contains a collection of procedures that could be relevant (including the one by Asperti referenced above, although it is probably easier to use the Matita formalization).

@pi8027 commented Apr 5, 2022

FTR, coq/coq#15901 might be a good application of this feature request, although it is about computational reflection rather than extraction.

@palmskog (Member Author) commented Apr 5, 2022

@pi8027 if you want performant automata encodings in proofs by reflection, the encodings in ATBR might be the state of the art in Coq. We already have this issue on performance benchmarking.
