[RFC] More band-aid fixes to Colvars #700

Open
wants to merge 5 commits into
base: reusable-cvcs


HanatoK
Member

@HanatoK HanatoK commented Jun 29, 2024

Well, I believe that no one would really like this PR. This PR uses a lot of hacks and workarounds in the backend to achieve reusable CVCs in SMP with as few limitations as possible (although I am not a big fan of distributing CVCs over threads), including:

  1. Build a toy AST and determine the parallelization scheme from the depth of each node. At first glance, colvardeps looks like an AST, but after playing with it I feel it is not a real one: it only checks the dependencies between features, and there is no true AST. It would be better if Colvars could be redesigned with a true AST and a dependency checker for it. The dependency checker should not own the AST;
  2. Bypass the colvar class and take the cvc objects out to build the AST. I think that the colvar class should be completely removed;
  3. I don't know why the smp_lock and smp_unlock in colvarproxy_namd are implemented as creating and destroying locks, so I have changed them;
  4. Implement the chain rule in a dirty manner (see colvar::cvc::modify_children_cvcs_atom_gradients and propagate_colvar_force). When calling calc_gradients and apply_force of a CVC consisting of sub-CVCs, it now propagates the gradients and forces to all its sub-CVCs;
  5. To avoid a race condition when propagating the atom gradients of reused CVCs, I have to use smp_lock. However, this lock is very coarse-grained, so I expect an additional performance penalty. I thought there would be a lock tied to each atom group but found none.
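
For item 1, here is a minimal sketch of how parallel levels could be derived from a tree in which nodes are reused by several parents. The node struct and the height and levels helpers are hypothetical names, not Colvars code. The sketch assumes the graph has no cycles and uses the height above the leaves as the level, which may differ from how this PR defines depth:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a CVC node; not the Colvars class.
struct node {
  std::vector<node *> children;  // sub-components this node depends on
};

// Height above the leaves: 0 for a leaf, otherwise one more than the
// tallest child. Memoized, so a node reused by several parents is
// processed only once.
std::size_t height(node *n, std::map<node *, std::size_t> &cache) {
  auto it = cache.find(n);
  if (it != cache.end()) return it->second;
  std::size_t h = 0;
  for (node *c : n->children) h = std::max(h, height(c, cache) + 1);
  cache[n] = h;
  return h;
}

// Group all nodes reachable from the roots by height. A node only depends
// on nodes of strictly lower height, so the nodes within one level are
// independent of each other and could be evaluated in parallel, level 0 first.
std::vector<std::vector<node *>> levels(const std::vector<node *> &roots) {
  std::map<node *, std::size_t> cache;
  for (node *r : roots) height(r, cache);
  std::size_t max_h = 0;
  for (auto const &kv : cache) max_h = std::max(max_h, kv.second);
  std::vector<std::vector<node *>> result(cache.empty() ? 0 : max_h + 1);
  for (auto const &kv : cache) result[kv.second].push_back(kv.first);
  return result;
}
```

With two leaves shared by two intermediate nodes (as in a pair of path CVs "s" and "z"), the shared leaves land in level 0 exactly once each.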

In summary, I think that Colvars should be fundamentally changed to achieve better support of reusable components and parallelization.

This PR tries to solve #232, extends #644 and finishes:

  • Reusing the computation of the individual "nodes" in a pair of path CVs ("s" and "z").

@HanatoK
Member Author

HanatoK commented Jun 30, 2024

After some thought, I feel there is no way to make explicit gradients work correctly with reusable components, so I will disable them in this PR.

This does not work as Colvars was not designed with automatic
differentiation in mind.
@HanatoK
Member Author

HanatoK commented Jul 1, 2024

The problem of calc_gradients()

Colvars was not designed with automatic differentiation (AD) in mind. At first glance, it seems to perform forward AD, because for each CVC calc_gradients() is executed just after calc_value. However, the colvar class, which is supposed to be a function of CVCs, does not have explicit gradients with respect to the atoms. Conversely, colvar::communicate_forces computes the gradients with respect to CVCs on the fly, which looks like a backward AD implementation. Furthermore, the colvarvalue class has no gradient field, unlike many other implementations such as torch.Tensor and PLMD::Value, and the CVC class does not store the gradients either. All these factors make either forward or backward AD using calc_gradients() difficult for CVCs composed of sub-CVCs. The apply_force() code path is less broken, as it looks consistent with backward AD and with the backward propagation in PLUMED.
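
To illustrate what a value type with its own gradient field makes possible, here is a minimal backward-AD sketch. Everything in it (ad_node, forward_product, forward_sum, backward) is a hypothetical name chosen for illustration and is not how Colvars, PyTorch or PLUMED implement it:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical value type carrying its own gradient accumulator.
struct ad_node {
  double value = 0.0;
  double grad = 0.0;              // d(output)/d(this), filled by backward()
  std::vector<ad_node *> inputs;  // sub-components this node was built from
  std::vector<double> partials;   // d(this)/d(inputs[i]), recorded in forward
};

// Forward pass for c = a * b, recording the local partial derivatives.
void forward_product(ad_node &c, ad_node &a, ad_node &b) {
  c.value = a.value * b.value;
  c.inputs = {&a, &b};
  c.partials = {b.value, a.value};
}

// Forward pass for c = a + b.
void forward_sum(ad_node &c, ad_node &a, ad_node &b) {
  c.value = a.value + b.value;
  c.inputs = {&a, &b};
  c.partials = {1.0, 1.0};
}

// Backward step: push this node's accumulated gradient to its inputs using
// the chain rule. Call it on parents before children, so that a reused input
// has received all contributions before it propagates further.
void backward(ad_node &n) {
  for (std::size_t i = 0; i < n.inputs.size(); ++i)
    n.inputs[i]->grad += n.grad * n.partials[i];
}
```

The += in backward() is what makes a reused component work: an input used by two parents simply sums both contributions.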

The problem of colvardeps

From the perspective of a compiler or interpreter, when the CVC object is constructed in colvar::cvc::cvc, Colvars is still in the syntax-analysis stage: it is parsing the config file and trying to build a tree of colvardeps, and syntax analysis is not finished until cvc::init has run. The weird part of the design is that init_dependencies is called after the CVC constructor and before cvc::init, and checks the dependencies. Feature dependencies are generally a semantic matter, so Colvars interleaves syntax analysis with semantic analysis, which, in my opinion, is a bad design.

@HanatoK HanatoK changed the title More band-aid fixes to Colvars [RFC] More band-aid fixes to Colvars Jul 1, 2024
@HanatoK HanatoK marked this pull request as ready for review July 1, 2024 19:25
AST

Giacomo said that variables_active_smp is used to make the colvar
object appear in the loop multiple times for the original
parallelization scheme. As I understand it, this means that I can use
variables_active directly to build the AST instead of the duplicated
items in variables_active_smp.
if (it_find == cvc_info_map.end()) {
cvc_info_map.insert({parent, cvc_info{
// TODO: Here calc_cvc_values calls cvcs[i]->is_enabled() , which is is_enabled(int f = f_cv_active)
// I know both f_cv_active and f_cvc_active are 0 but are they the same option??
Member

The fact that f_cv_active and f_cvc_active have the same numerical value has no consequence: they should never be used in the same context, because each is only meaningful in a colvar or a cvc object, respectively. The colvardeps data of these two classes are non-overlapping. The relationship between them is a vertical (parent-child) dependency.

If we were to merge the two levels, then these two features would merge.

Member Author

The issue is that in

if (!cvcs[i]->is_enabled()) continue;
is_enabled() is called without an argument, so I think it resolves to
inline bool is_enabled(int f = f_cv_active) const {

which checks f_cv_active instead of f_cvc_active. My code follows what the original calc_cvc_values does, and I am not sure whether I need to do it the same way.

Member

You're right, sorry. That is somewhat sloppy writing that I didn't remember well. It does rely on the "active" property being number 0.

// NOTE that all feature enums should start with f_*_active

But a class inheriting from colvardeps could also override is_enabled() and change this convention.
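
As a side note on why the default argument is fragile, here is a small standalone sketch. The classes and the checked_feature function are hypothetical and only mimic the names in this thread. In C++, the default argument of a virtual call is taken from the static type of the expression, while the function body is chosen by the dynamic type:

```cpp
// Hypothetical feature enums; both "active" features are number 0.
enum { f_cv_active = 0, f_cv_other = 1 };
enum { f_cvc_active = 0, f_cvc_gradient = 1 };

struct deps_base {
  virtual ~deps_base() = default;
  // Returns the feature index that would be checked.
  virtual int checked_feature(int f = f_cv_active) const { return f; }
};

struct cvc_like : deps_base {
  // The overrider declares a different default on purpose, to show
  // which default is actually used at each call site.
  int checked_feature(int f = f_cvc_gradient) const override { return f; }
};

// Static type is deps_base: the base-class default (0) is used, even
// though the derived overrider runs.
int via_base(const deps_base &d) { return d.checked_feature(); }

// Static type is cvc_like: the derived default (1) is used.
int via_derived(const cvc_like &d) { return d.checked_feature(); }
```

So a call through a base-class pointer or reference silently picks the base default, which only gives the intended result as long as every "active" feature is number 0.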

Member

How much effort do you think it would be to remove the default argument for this virtual function? Remember that this is one of the "issues" that clang-tidy was complaining about.

Member

That would be very little effort. I'm happy to do that if that helps in any way.

it != cv->variables_active_smp()->end(); ++it) {
// TODO: Bad design! What will happen if CVC a is in a "colvar" block
// that does not support total_force_calc, but is then reused in
// another block that requires total_force_calc even if it supports Jacobian itself???
Member

I don't see the problem here. Assuming a cvc can have several parents: the colvar that does require a total force can enable it in its children cvcs, even if other parents don't require (or support) it.

Member Author

You are right. I thought that if a colvar does not require the total force, it would disable the corresponding feature of its children, but it seems the code does not check the dependency that way.

Member

It's the other way: disabled by default, and enabled on request - then disabled again if the refcount falls to zero.
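
The refcount behavior can be sketched as follows. This is a simplified illustration with a hypothetical feature_refs class; it is not the colvardeps implementation and ignores dependency resolution entirely:

```cpp
#include <cassert>
#include <map>

// Hypothetical reference-counted feature table for one child object.
class feature_refs {
  std::map<int, int> refcount_;  // feature id -> number of requesters

public:
  // A parent requests the feature.
  void enable(int f) { ++refcount_[f]; }

  // A parent no longer needs the feature; it is disabled again only
  // when the last requester has released it.
  void release(int f) {
    auto it = refcount_.find(f);
    assert(it != refcount_.end() && it->second > 0);
    if (--(it->second) == 0) refcount_.erase(it);
  }

  // Disabled by default: a feature nobody requested has no entry.
  bool is_enabled(int f) const { return refcount_.count(f) > 0; }
};
```

A parent that never asks for the feature does not touch the count, so it cannot disable it for another parent that did ask.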

lambda_fn, NULL, CKLOOP_NONE, NULL);
}
cvm::decrease_depth();
return cvm::get_error();
Member Author

Parallelization and checking the error

When I wrote and tested new code, I found that NAMD may not exit cleanly (some other threads were still running) if one thread exited. I think the implementation of colvarproxy_namd::error, or at least its use in the parallel region, needs to be revised to use a std::atomic<bool> to set an error "bit" instead of calling NAMD_die directly. Then, after all threads have finished and been joined, we can use cvm::get_error() to check the error bit and exit if there is an error.
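
A minimal sketch of the pattern I have in mind, using plain std::thread rather than the NAMD/Charm++ machinery. The run_all function and the task type are hypothetical and only stand in for the parallel region:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Runs every task on its own thread. A failing task (one that returns
// false) only sets the shared error flag; no thread terminates the process.
// Returns true if any task failed, so the caller can decide to exit once
// all threads have been joined.
bool run_all(const std::vector<std::function<bool()>> &tasks) {
  std::atomic<bool> error_bit{false};
  std::vector<std::thread> workers;
  workers.reserve(tasks.size());
  for (const auto &task : tasks) {
    workers.emplace_back([&error_bit, &task]() {
      if (!task()) error_bit.store(true);
    });
  }
  // Wait for every thread before looking at the flag.
  for (auto &w : workers) w.join();
  return error_bit.load();
}
```

The important part is that the flag is only inspected after the join loop, so the decision to abort is taken by a single thread when no worker is still running.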

@jhenin
Member

jhenin commented Jul 4, 2024

The problem of colvardeps

From the perspective of a compiler or interpreter, when the CVC object is constructed in colvar::cvc::cvc, Colvars is still in the syntax-analysis stage: it is parsing the config file and trying to build a tree of colvardeps, and syntax analysis is not finished until cvc::init has run. The weird part of the design is that init_dependencies is called after the CVC constructor and before cvc::init, and checks the dependencies. Feature dependencies are generally a semantic matter, so Colvars interleaves syntax analysis with semantic analysis, which, in my opinion, is a bad design.

There seems to be a misunderstanding here. To clarify, the spirit of init_dependencies is not that it "checks dependencies" - rather, it initializes the static dependency tree between features. The proper dependency checking happens with calls to enable(), which happen either during the semantic analysis of the input, or throughout the run in case of dynamic dependencies.

@HanatoK
Member Author

HanatoK commented Jul 4, 2024

Thanks for the clarifications. You are right that init_dependencies only declares the dependencies. However, I think the problem remains that enable is called, and tries to check the dependencies, while the children CVCs are still being initialized, so for a CVC composed of sub-CVCs I cannot do the check here:

// TODO: I don't know why I cannot check this
// if (is_enabled(f_cvc_gradient))
sub_cv->enable(f_cvc_gradient);

Also, I cannot use add_child in colvardeps to declare that a sub-CVC is a child of another CVC, as add_child also does dependency checking. I think it is a bad design to check dependencies while calling the initialization function. Calling init means that Colvars is still parsing options and the abstract syntax tree is not completely built, whereas dependency checking is a semantic step that should happen after the AST is complete.

In my opinion, the following structure should be separated from colvardeps,

colvars/src/colvardeps.h

Lines 155 to 162 in ff35f9c

/// pointers to objects this object depends on
/// list should be maintained by any code that modifies the object
/// this could be secured by making lists of colvars / cvcs / atom groups private and modified through accessor functions
std::vector<colvardeps *> children;
/// pointers to objects that depend on this object
/// the size of this array is in effect a reference counter
std::vector<colvardeps *> parents;

to form the AST, and colvardeps should only be a feature-dependency checker, and not own the AST. Adding new children to the AST should not trigger the checker. In other words, it is better that colvardeps acts somewhat like an LLVM pass.
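
Roughly, the separation could look like the sketch below. The ast_node struct, its provides/needs sets and the check_features function are all hypothetical and much simpler than the real feature tree; the point is only that building the tree and checking it are two distinct steps:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical tree node: holds structure only and runs no checks.
struct ast_node {
  std::string name;
  std::set<std::string> provides;  // features this node can supply
  std::set<std::string> needs;     // features it needs from each child
  std::vector<ast_node *> children;

  // Adding a child never triggers the checker.
  void add_child(ast_node *c) { children.push_back(c); }
};

// Separate pass, meant to run once parsing has finished: it walks the
// finished tree and reports every child that lacks a needed feature.
void check_features(const ast_node &n, std::vector<std::string> &errors) {
  for (const ast_node *c : n.children) {
    for (const auto &f : n.needs) {
      if (c->provides.count(f) == 0)
        errors.push_back(n.name + " needs " + f + " from " + c->name);
    }
    check_features(*c, errors);
  }
}
```

In this form a child can be attached before its own options are parsed, and the checker only sees the tree once it is complete.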

@jhenin
Member

jhenin commented Jul 10, 2024

Thanks @HanatoK , I now understand your point better and I fully agree!

To move the AST out of colvardeps and allow same-level dependencies, we need to deal with the fact that children objects of a CVC can be CVCs or atom groups, so those should be described by different parts of the feature tree.

@jhenin
Member

jhenin commented Aug 26, 2024

@HanatoK - this is a large and disruptive change set. I think there are items in there that we can agree on. Having an AST alongside the colvardeps class is one.

Your initial point 2 was: Bypass the colvar class and take the cvc objects out to build the AST. I think that the colvar class should be completely removed;
The colvar class is the main workhorse at the moment, but I suppose you mean merging the cvc and colvar classes into one. I generally agree with that idea, although that will give a large class, where only a small subset of features will be used by most instances.

3 participants