Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Update KDA partial and directional diagram generation algorithm #22

Closed
2 tasks done
nawtrey opened this issue Mar 26, 2021 · 4 comments · Fixed by #116
Closed
2 tasks done

ENH: Update KDA partial and directional diagram generation algorithm #22

nawtrey opened this issue Mar 26, 2021 · 4 comments · Fixed by #116
Labels
enhancement New feature or request performance Performance-improving changes

Comments

@nawtrey
Copy link
Collaborator

nawtrey commented Mar 26, 2021

Problem

The primary limitation for graph complexity is the space complexity of functions like kda.diagrams.generate_partial_diagrams() and kda.diagrams.generate_directional_partial_diagrams(). This is inherently related to issue #2, where this issue is focused on one part of that issue.

This problem is 2 fold:

  1. the algorithm for generating partial diagrams is inefficient
  2. the functions store more information than they need to

While 2. does contribute to the issue, the main issue is 1., especially for simpler input graphs. At the moment, the partial diagrams are generated by finding every possible combination of N-1 edges then discarding subgraphs that don't fit the definition of a partial diagram. Complete input graphs contain N(N-1)/2 unique edges, and thus have N^(N-2) partial diagrams (after doing the appropriate combinatorics). These yield approximately N! possibilities (using Stirling's approximation), so generating all possibilities is probably appropriate. But for simpler input graphs the number of partial diagrams is much less numerous so generating every possibility is likely not necessary.

Solution(s)

Before I list the possible solutions, I have to mention that in graph theory a "partial diagram" is known as a "spanning tree", so the solutions below will be using the graph theory language.

There are many solutions to this problem, many of which have formulas for calculating the space complexity of the algorithm. At the moment it isn't obvious which method is the best method, so I'm just going to dump links below.

  • This is the first useful article I found that gives a clear way to quantify the number of partial diagrams for a given graph using an adjacency matrix. They also discuss a host of different methods (9 I think) for generating the edge lists for all possible spanning trees.

I was going to list more, but the other articles/papers I have are all references in the above paper so for now I will leave it here.

Tasks

  • Implement better algo for generate_partial_diagrams
  • Implement better algo for generate_directional_diagrams
@nawtrey nawtrey added the enhancement New feature or request label Mar 26, 2021
@nawtrey
Copy link
Collaborator Author

nawtrey commented Mar 29, 2021

Here are some links where people have discussed algorithms for collecting all spanning trees:

  • This one discusses the possibility of using built-in functions for finding all spanning trees by using an adjacency/connectivity matrix along with a built-in minimum spanning tree (MST) function
  • This one has code for doing this manually, but since they don't cite any papers I'm not sure if the space complexity would be better or worse than the current algorithm.

@nawtrey
Copy link
Collaborator Author

nawtrey commented Jul 19, 2024

A few more papers to look at for ideas:

  • Characterizing the Relationship between Steady State and Response Using Analytical Expressions...: link
  • The linear framework: using graph theory to reveal the algebra and thermodynamics...: link
  • Reference 70 in the "Linear Framework..." paper: link

@nawtrey nawtrey changed the title Update KDA Partial/Directional Partial diagram generation method ENH: Update KDA partial and directional diagram generation algorithm Aug 8, 2024
@nawtrey nawtrey added the performance Performance-improving changes label Aug 10, 2024
nawtrey added a commit that referenced this issue Aug 17, 2024
* Changes directional diagram algorithm 
to use depth-first-search (i.e. `nx.dfs_tree`)
to create the directional diagrams. Also
uses the `nx.DiGraph.reverse` method to build
the directional diagram with reversed edges
and changes directional diagram return type
for `return_edges=False` to a `nx.DiGraph`.

* Removes private functions 
`diagrams._collect_sources` and
`diagrams.get_directional_path_edges`

* Move array-based edge flipping code into
the `return_edges=True` code path

* Addresses the directional diagram 
algo improvement portion of #22
@nawtrey
Copy link
Collaborator Author

nawtrey commented Aug 17, 2024

With 61a1fcb merged I've checked off the directional diagram task.

@nawtrey
Copy link
Collaborator Author

nawtrey commented Aug 17, 2024

Upon further consideration I don't know if generate_partial_diagrams is really a rate-limiting step for KDA right now.

For example, using a maximally-connected 7-state diagram, the current algorithm generated the 16807 partial diagrams in roughly 1 second:

[70.83%] ··· ...alDiagrams.peakmem_generate_partial_diagrams                ok
[70.83%] ··· ============== ======= =======
             --               return_edges
             -------------- ---------------
                 graph        True   False
             ============== ======= =======
                3-state      80.3M   80.3M
              Hill-5-state   80.4M   80.4M
              Hill-8-state   80.6M   80.5M
              EmrE-8-state   80.6M   82.1M
              Max-7-state    81.3M    138M
             ============== ======= =======

[75.00%] ··· ...rtialDiagrams.time_generate_partial_diagrams                ok
[75.00%] ··· ============== ============= ============
             --                    return_edges
             -------------- --------------------------
                 graph           True        False
             ============== ============= ============
                3-state        225±9μs      215±3μs
              Hill-5-state     459±10μs     435±20μs
              Hill-8-state   2.69±0.02ms   2.94±0.2ms
              EmrE-8-state     16.7±1ms    15.3±0.1ms
              Max-7-state     1.07±0.05s    960±10ms
             ============== ============= ============

Comparatively, the new directional diagrams algo requires ~13 seconds to generate both the partial diagrams and directional diagrams:

[54.17%] ··· ...agrams.peakmem_generate_directional_diagrams                ok
[54.17%] ··· ============== ======= =======
             --               return_edges
             -------------- ---------------
                 graph        True   False
             ============== ======= =======
                3-state      80.4M   80.5M
              Hill-5-state   80.7M   80.9M
              Hill-8-state   80.9M   83.5M
              EmrE-8-state   82.5M    102M
              Max-7-state     152M    818M
             ============== ======= =======

[58.33%] ··· ...lDiagrams.time_generate_directional_diagrams                ok
[58.33%] ··· ============== ============ ============
             --                    return_edges
             -------------- -------------------------
                 graph          True        False
             ============== ============ ============
                3-state      623±100μs    791±100μs
              Hill-5-state   2.82±0.2ms   4.14±0.4ms
              Hill-8-state   23.0±0.4ms    46.7±2ms
              EmrE-8-state    179±8ms      328±20ms
              Max-7-state    8.28±0.04s   12.7±0.07s
             ============== ============ ============

So while it may be good to improve the partial diagram algorithm for other reasons, performance is probably not a main concern. If I can find any way to improve partial diagram algorithm performance I will probably close this issue.

nawtrey added a commit that referenced this issue Aug 20, 2024
* Improves performance of
`diagrams.generate_partial_diagrams`
by efficiently filtering out invalid
edges that do not span all nodes in
the kinetic diagram. For small diagrams
the performance benefit is small, but
for more complex diagrams (e.g., the
8-state model of EmrE), there is a
statistically significant performance
increase.

* Fixes #22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Performance-improving changes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant