Extending structures (bonds, atom charges, etc) #426

Open
merkys opened this issue Oct 18, 2022 · 16 comments

@merkys
Member

merkys commented Oct 18, 2022

The OPTIMADE specification v1.0.1 defines a structure as a set of sites occupied by mixtures of atoms, with each atom described by its chemical type, mass and occupancy (its proportion in the mixture). Means for expressing disorder are also in place, defined much as in the CIF standard.

I wonder whether there would be interest in adding more chemical attributes to OPTIMADE structures, such as:

Some of these attributes can be derived algorithmically (connectivity, lone pairs), but the derivation algorithms are often based on heuristics and sometimes fail to arrive at the "correct" result. Thus, if these details are available on the provider's side, it would be nice to have them communicated in OPTIMADE attributes.

@JPBergsma
Contributor

I think charges would be a useful property to add.
Some atomistic simulations use the charge of an atom to calculate the interatomic potentials.
In rare cases, the atomistic charges may also be the only way to distinguish chemical structures from one another. (e.g. Ions in cages can be stabilized in unusual oxidation states.)

Some formats like PDB also allow you to specify the connections between the atoms. I (ab)used this feature in the past for visualizing some of my coarse-grained data, so I think this could be a useful feature as well.

A little over a year ago, I talked with some people from Materials Cloud about which properties they would like to see standardized.
These properties are mostly the results of calculations on the structures.
They mentioned properties like:

  • magnetic moments,
  • spin configuration,
  • band structure (metallic, insulator, semiconductor)
  • conductivity
  • phonon spectra
  • the Fermi energy

For some of these, I do not really know what they are, so I can't really tell how useful these would be.
They also mentioned space groups but these have already been added in PR#405.

Some other properties that I thought could be useful to add (mostly for use within trajectories, but some are useful for structures too) are:

| Field | Description |
| --- | --- |
| Temperature_set | The temperature to which the thermostat was set. |
| Temperature_measured | The measured temperature. |
| Velocities | The velocities of the atoms/particles. |
| Forces | The force that is exerted on a particle. |
| B factors | Also known as the Debye–Waller factor. |
| Constraint Force | The force required to maintain a reaction coordinate. |
| Time | In the case of a trajectory, the time belonging to a particular frame. |
| Remarks | A field for extra information about this specific entry that does not fit in any of the other fields. |
| Various Energies | Fields for the components of the energy, such as kinetic energy, potential energy, total energy and electronic kinetic energy. |
| Enthalpy of formation | The enthalpy of formation for the compound in the structure. |

@merkys
Member Author

merkys commented Feb 13, 2023

I am mostly interested in chemical connectivity. However, I would expect the definition of a chemical bond and its types to be quite involved. Could we adopt an existing convention? CML, for instance, defines integer-numbered bond types for orders 1 to 3 (no 4), aromatic, unknown and other. To this list I would add order 4 and zero-order bonds. Anything else?

I saw @eimrek's addition to the OPTIMADE paper manuscript about a database of covalent organic networks, so it would be interesting to hear their opinion. Also pinging @BobHanson and @vaitkus for comments.

@JPBergsma
Contributor

I think it would be more informative to allow non-integer bond orders than to have just a value of 0.
The fact that the number is not an integer indicates that it is a non-classical bond.
In the article you link to, they suggest using 0 for these cases, but this may also be for backward compatibility.
An option would be to allow extra bond properties that store more information about a bond. In that case, we could make a dictionary for each bond and store the extra information there if needed.
There are also more complex cases, like three-centre two-electron bonds; perhaps we should also consider how to handle those.

Some d-block metal dimers can have a bond order as high as 6, so I think we should allow the bond order to reach that value.
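The ideas in this comment (non-integer orders, an upper limit of 6, a per-bond dictionary, multi-centre bonds) can be sketched as a small record type. This is only an illustration: the class `Bond`, its field names and the constant `MAX_BOND_ORDER` are hypothetical and come from neither the OPTIMADE specification nor CML.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

MAX_BOND_ORDER = 6.0  # assumption: upper bound suggested in the comment above


@dataclass(frozen=True)
class Bond:
    """Hypothetical bond record; none of these names are OPTIMADE fields."""

    sites: tuple[int, ...]         # indices of the participating sites (two or more)
    order: Optional[float] = None  # None means "not specified", not zero
    extra: dict[str, Any] = field(default_factory=dict)  # free-form per-bond data

    def __post_init__(self) -> None:
        if len(self.sites) < 2:
            raise ValueError("a bond needs at least two sites")
        if self.order is not None and not 0 <= self.order <= MAX_BOND_ORDER:
            raise ValueError(f"bond order {self.order} is outside [0, {MAX_BOND_ORDER}]")

    @property
    def has_integer_order(self) -> bool:
        """True if an order is given and it is a whole number."""
        return self.order is not None and float(self.order).is_integer()


fractional = Bond(sites=(0, 1), order=1.5)                    # non-classical bond
sextuple = Bond(sites=(0, 1), order=6)                        # highest allowed order
three_centre = Bond(sites=(0, 1, 2), extra={"electrons": 2})  # three-centre two-electron bond
```

Leaving `order` unset rather than using 0 keeps "unknown" distinct from a zero-order bond.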

@BobHanson

BobHanson commented Feb 13, 2023 via email

@vaitkus
Contributor

vaitkus commented Feb 13, 2023

I think it might be quite difficult to agree on a single bonding model that covers every situation so we could start with something simple and then extend it in the future as needed. Some general thoughts on the model:

  • It would be nice to be able to specify the bonding without explicitly assigning the bond type/order (e.g. only provide the connectivity graph). I guess this could be achieved by using the CML unknown bond type or something similar.
  • Maybe aromaticity should be a separate property of a bond rather than a bond type? This might be used to convey that certain bonds are aromatic, but described using the Kekulé notation. Furthermore, the OpenChemLib library actually differentiates between aromatic bonds that can resonate (e.g. in benzene) and the ones that have a more or less fixed bond order (e.g. in thiophene). Thus it is quite reasonable under some circumstances to define a bond as both being aromatic and having a specific bond order.
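A minimal sketch of both points, assuming each bond is a dictionary: the keys "order" and "aromatic" are illustrative only, and zero-based site indices are an assumption. It stores a benzene ring in Kekulé form with a separate aromaticity flag, then reduces it to a bare connectivity graph.

```python
# Hypothetical encoding: "order" and "aromatic" are illustrative keys, not spec fields.
# Benzene carbon ring on sites 0..5 (zero-based indexing is an assumption).
KEKULE_ORDERS = [2, 1, 2, 1, 2, 1]

ring = [
    {"sites": [i, (i + 1) % 6], "order": KEKULE_ORDERS[i], "aromatic": True}
    for i in range(6)
]


def connectivity_only(bonds):
    """Drop every key except "sites", leaving a bare connectivity graph."""
    return [{"sites": list(bond["sites"])} for bond in bonds]


def aromatic_bonds(bonds):
    """Return the bonds flagged aromatic, whatever their stated order is."""
    return [bond for bond in bonds if bond.get("aromatic", False)]


graph = connectivity_only(ring)
```

Because the flag is independent of the order, a consumer that only understands integer orders can still read the Kekulé structure, while one that cares about aromaticity does not have to perceive it again.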

@BobHanson

BobHanson commented Feb 13, 2023 via email

@eimrek
Member

eimrek commented Feb 13, 2023

hi all.

@merkys, our covalent organic framework databases don't contain bond orders, and there is currently no intention to add them, as far as I'm aware. @ltalirz @yakutovicha correct me if I'm wrong.

Regarding atomic charges: there are multiple methods to calculate them, e.g. Mulliken, Hirshfeld, Bader, ESP-derived, ... (https://mattermodeling.stackexchange.com/questions/1439/what-are-the-types-of-charge-analysis). Would the database provider simply decide which charges to provide? Still, it would be good to have information about the method of calculation.
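One way to keep the method next to the values, as a rough sketch: the "charges" object and its "method" and "values" keys are hypothetical, and the numbers are invented for illustration. Only `species_at_sites` is an existing OPTIMADE property name.

```python
# The "charges" object and its keys are hypothetical; the thread does not define them.
structure = {
    "species_at_sites": ["Na", "Cl"],
    "charges": {
        "method": "Bader",        # which charge analysis produced the values
        "values": [0.87, -0.87],  # invented numbers, one per site, in units of e
    },
}


def check_charges(structure):
    """Check that charges name a method and give one value per site.

    Returns the net charge of the structure.
    """
    charges = structure["charges"]
    if not charges.get("method"):
        raise ValueError("charges must state the calculation method")
    if len(charges["values"]) != len(structure["species_at_sites"]):
        raise ValueError("expected one charge per site")
    return sum(charges["values"])


net_charge = check_charges(structure)
```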

Regarding bond orders, there's a similar argument: there are multiple ways to calculate bond orders, and they can give different results. Additionally, one thing to keep in mind is how to represent non-Kekulé molecules, e.g. triangulene, as well as unpaired electrons and radical sites in general.

@merkys
Member Author

merkys commented Feb 17, 2023

Thanks all for interesting responses. I agree that choosing the right representation for bond type/order will require a lot of thought. Thus I find @vaitkus's suggestion really appealing:

  • It would be nice to be able to specify the bonding without explicitly assigning the bond type/order (e.g. only provide the connectivity graph). I guess this could be achieved by using the CML unknown bond type or something similar.

Separating aromaticity from bond type/order is also a good suggestion.

How about starting from this:

"bonds": [ { "sites": [ 1, 2 ] } ]
  • sites would be the single REQUIRED property, giving a list of the sites participating in a bond. As @JPBergsma noted, the sites list could contain more than two sites.
  • The JSON object describing a single bond could later be extended with properties giving type/order, aromaticity and so on.
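A consumer of such a list might check it along these lines. This is a sketch under stated assumptions: the function name `parse_bonds` is made up, and the site indices are treated as zero-based, which the proposal above does not specify.

```python
import json


def parse_bonds(text, n_sites):
    """Parse a JSON object holding a "bonds" list and check each bond.

    Only "sites" is required; any other key is kept untouched so that
    later extensions (type, order, aromaticity) pass through unchanged.
    Site indices are assumed to be zero-based, which is not specified
    in the proposal.
    """
    bonds = json.loads(text)["bonds"]
    for bond in bonds:
        sites = bond.get("sites")
        if not isinstance(sites, list) or len(sites) < 2:
            raise ValueError(f"bond needs a 'sites' list of two or more: {bond!r}")
        if any(type(i) is not int or not 0 <= i < n_sites for i in sites):
            raise ValueError(f"invalid site index in {bond!r}")
    return bonds


doc = '{"bonds": [{"sites": [1, 2]}, {"sites": [0, 1, 3], "note": "three-centre"}]}'
bonds = parse_bonds(doc, n_sites=4)
```

Unknown keys such as "note" are deliberately not rejected, so a client written against the minimal format keeps working when providers start sending more detail.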

I believe @eimrek's suggestion about specifying calculation methods should be promoted to a more general level, as other properties could benefit from such metadata as well.

@BobHanson

BobHanson commented Feb 17, 2023 via email

@merkys
Member Author

merkys commented Feb 17, 2023

> I would prefer a more succinct format. Why duplicate "site" a zillion times? Maybe just array of arrays.

I understand the advantages of a more succinct representation, but I tried to stay consistent with the other OPTIMADE properties, which use explicit keys. Moreover, the suggested plain-list representation would not allow for bonds between more than two atoms. A placeholder value of 0 might be perceived by some as a zero-order bond. It is better to avoid placeholders altogether: if no "type" (or similar) property is given in a bond object, nothing beyond some sort of connectivity should be assumed.

@ml-evs
Member

ml-evs commented Feb 17, 2023

It might be nice if this design could also capture generic "connectivity" and serve, e.g., a list of the sites within some cutoff of another site under PBCs. Pre-computed neighbour lists can really help accelerate some applications and could allow some kind of local-environment/oxidation-state searching expressed via correlated list queries (though this might require species data to be added to each bond, which may not be favourable), e.g., "give me all structures that contain SiO4 tetrahedra".

It would then be up to the database to decide this "calculation method" still, e.g., what distance cutoff to use (constant, sum of ionic/vdw radii etc)
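As an illustration of such a pre-computed neighbour list, here is a brute-force sketch with a constant distance cutoff under periodic boundary conditions. The function name and the output layout are assumptions, and the cell in the example uses an illustrative lattice parameter rather than a measured one.

```python
import itertools
import math


def neighbour_list(lattice, frac_coords, cutoff):
    """Return (i, j, image, distance) for all site pairs closer than cutoff.

    lattice: three lattice vectors as rows, Cartesian, same unit as cutoff.
    frac_coords: fractional coordinates of each site.
    Only the 27 cells with image offsets in {-1, 0, 1} are searched, so the
    cutoff must be small enough that no neighbour lies further away than
    an adjacent cell.
    """
    def to_cartesian(frac):
        return [sum(frac[k] * lattice[k][axis] for k in range(3)) for axis in range(3)]

    pairs = []
    images = list(itertools.product((-1, 0, 1), repeat=3))
    for i, fi in enumerate(frac_coords):
        ri = to_cartesian(fi)
        for j, fj in enumerate(frac_coords):
            for image in images:
                if i == j and image == (0, 0, 0):
                    continue  # skip a site paired with itself in the home cell
                rj = to_cartesian([fj[k] + image[k] for k in range(3)])
                distance = math.dist(ri, rj)
                if distance < cutoff:
                    pairs.append((i, j, image, distance))
    return pairs


# CsCl-type cubic cell with an illustrative lattice parameter of 4.0.
cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
sites = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
pairs = neighbour_list(cell, sites, cutoff=3.5)
```

Each pair records the periodic image of the second site, which a plain list of site indices cannot express. A provider could swap the constant cutoff for a sum of ionic or van der Waals radii, as mentioned above.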

@BobHanson

BobHanson commented Feb 17, 2023 via email

@merkys
Member Author

merkys commented Feb 17, 2023

@BobHanson

> Yes, sorry, I was on my phone and, ah, still in bed... Meant to follow that with: "That said, the more use of associative arrays, the more easily extended this will be." Q: What else do we have that references sites like this?

OPTIMADE has assemblies to describe disorder, and they use a similar level of verbosity.

@merkys
Member Author

merkys commented Mar 3, 2023

This might be slightly off-topic, but how does one get atom bonding out of QM calculations? Can the existence of bonds and their types be detected objectively via QM, or would one need some heuristic (e.g., a distance-based criterion) to derive them? Pinging @gmrigna.

@eimrek
Member

eimrek commented Mar 3, 2023

> This might be slightly off-topic, but how does one get atom bonding out of QM calculations? Can existence of bonds/their types be objectively detected via QM, or would one need some heuristic (i.e., distance-based criterion) to derive them? Pinging @gmrigna.

Here's a small overview of QM bond order methods: https://mattermodeling.stackexchange.com/questions/901/what-are-the-types-of-bond-orders/1508

Most of these (or at least the popular ones, Wiberg, Mayer and Laplacian, which I also have some experience with) are fully determined based on the electronic structure (so, the density/density matrix/occupied molecular orbitals/or derived orbitals) and the atom-atom distance is not "explicitly" used.
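To make that concrete, here is a toy sketch of the Wiberg index for a closed-shell system: the sum of squared density-matrix elements between the basis functions of two atoms, with the density matrix expressed in an orthonormal basis. The function name and the hand-built H2 density matrix are illustrative; real codes obtain the density matrix from an electronic-structure calculation.

```python
def wiberg_index(density, basis_atoms, a, b):
    """Wiberg bond index between atoms a and b (closed-shell case).

    density: density matrix P in an orthonormal basis (list of rows).
    basis_atoms: basis_atoms[mu] is the atom on which basis function mu sits.
    The index is the sum of P[mu][nu]**2 over mu on atom a and nu on atom b.
    """
    return sum(
        density[mu][nu] ** 2
        for mu, atom_mu in enumerate(basis_atoms) if atom_mu == a
        for nu, atom_nu in enumerate(basis_atoms) if atom_nu == b
    )


# H2 in a minimal orthonormalised basis: one doubly occupied bonding orbital
# with coefficients (1/sqrt(2), 1/sqrt(2)), so P = 2 * c c^T.
c = [2 ** -0.5, 2 ** -0.5]
P = [[2 * ci * cj for cj in c] for ci in c]
h2_bond_order = wiberg_index(P, basis_atoms=[0, 1], a=0, b=1)
```

For this toy case the index comes out at 1 up to floating-point rounding, i.e. a single bond, and the interatomic distance never enters the formula.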

@merkys
Member Author

merkys commented Jun 7, 2023

Suggestions for queryable properties:

  • distance between Voronoi neighbours
  • coordination number per site
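If a bond list were available, a per-site coordination number could be derived from it as sketched below. The bond format (a list of objects with a "sites" key, zero-based indices) is the hypothetical one discussed earlier in this thread, not an OPTIMADE property, and Voronoi neighbours are not covered here.

```python
from collections import Counter


def coordination_numbers(bonds, n_sites):
    """Count, for each site, how many bond partners it has.

    bonds: list of {"sites": [...]} objects (hypothetical format, zero-based).
    A site in a multi-centre bond counts every other site in that bond,
    and partners reached through several bonds are counted each time.
    """
    counts = Counter()
    for bond in bonds:
        for site in bond["sites"]:
            counts[site] += len(bond["sites"]) - 1
    return [counts[i] for i in range(n_sites)]


# Methane: carbon at site 0, hydrogens at sites 1 to 4.
methane = [{"sites": [0, h]} for h in range(1, 5)]
print(coordination_numbers(methane, n_sites=5))  # prints [4, 1, 1, 1, 1]
```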
