Atoms, Bonds and Molecules

The basic objects in the CDK are the IAtom, IBond and IAtomContainer [Q27061829]. The name of the latter is somewhat misleading, as it contains not just IAtoms but also IBonds. The primary use of the model is the graph-based representation of molecules, where atoms are the nodes and bonds are the edges between them [Q37988904].

Before we start, it is important to note that CDK 2.0 has an important convention around object properties: when a property is unset, the object’s field is set to null. This introduces potential sources of NullPointerExceptions, but also allows us to distinguish between, for example, a zero and an unset formal charge. In the former case, the formal charge value is set and has a zero value; in the latter case, the field has a null value, indicating the formal charge is currently unknown.
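The difference between an unset and a zero value can be illustrated with a minimal plain-Java sketch. It does not use the CDK itself: the `SketchAtom` class and its `describeCharge()` helper are hypothetical names introduced only for this illustration, and the sketch merely assumes that the property is stored as a boxed `Integer`.

```java
// Plain-Java illustration of the "null means unset" convention.
// SketchAtom is a hypothetical stand-in, not a CDK class.
class NullConventionDemo {

    static class SketchAtom {
        // A boxed Integer can be null (unset); a primitive int cannot.
        private Integer formalCharge;

        public Integer getFormalCharge() {
            return formalCharge;
        }

        public void setFormalCharge(Integer formalCharge) {
            this.formalCharge = formalCharge;
        }

        // Null-safe description that keeps "unset" and "zero" apart.
        public String describeCharge() {
            return formalCharge == null ? "unset" : "charge " + formalCharge;
        }
    }

    public static void main(String[] args) {
        SketchAtom atom = new SketchAtom();
        System.out.println(atom.describeCharge()); // prints "unset"
        atom.setFormalCharge(0);
        System.out.println(atom.describeCharge()); // prints "charge 0"
    }
}
```

Checking for null before unboxing, as `describeCharge()` does, is what avoids the NullPointerException mentioned above.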

Atoms

The CDK interface IAtom is the underlying data model of atoms. Creating a new atom is fairly easy. For example, we can create an atom of element type carbon, as defined by the element’s atomic number that we pass as a parameter to the constructor:

CreateAtom3

For this we can also use the atomic number from the IElement class:

CreateAtom4

An atom can also be constructed by passing in the symbol, but this is marginally less efficient:

CreateAtom1

Alternatively, we can also construct a new carbon atom, by passing a carbon IElement, conveniently provided by the Elements class:

CreateAtom2

A CDK atom has many properties, many of them inherited from the IElement, IIsotope and IAtomType interfaces. Figure atomInheritance shows the interface inheritance specified by the CDK data model.

These constructors will set the atomic number of the atom:

CreateAtom2

![](images/atomInheritance.png)

IElement

The most common properties of IElements are their symbol and atomic number. Because IAtom extends IElement, CDK atoms also have these properties. Therefore, we can set these properties for atoms manually too:

ElementProperties

Of course, we can use the matching get methods to recover the properties:

ElementGetProperties

which outputs:

ElementGetProperties

IIsotope

The IIsotope information consists of the mass number, exact mass and natural abundance:

IsotopeProperties

Here too, the complementary get methods are available:

IsotopeGetProperties

giving:

IsotopeGetProperties

Appendix isotopes lists all isotopes defined in the CDK with a natural abundance of more than 0.1.

IAtomType

Atom types are an important concept in cheminformatics. They describe some basic facts about that particular atom in some particular configuration. These properties are used in many cheminformatics algorithms, including adding hydrogens to hydrogen-depleted chemical graphs (see Section implicithydrogens) and force fields. Chapter atomtype provides much more detail on the atom type infrastructure in the CDK library, and, for example, details how atom types can be perceived, and how atom type information is set for atoms.

The IAtomType interface contains fields that relate to atom types. These properties include formal charge, neighbor count, maximum bond order and atom type name:

AtomTypeProperties

Coordinates

The IAtom interface supports three types of coordinates: 2D coordinates, used for diagrams; 3D coordinates, used for geometries; and fractional coordinates, relative to a crystal unit cell. These properties are set with the respective methods:

AtomCoordinates

The latter coordinates define the locations of the atoms with respect to (or inside) the crystal structure’s unit cell. Section 5.2 explains the full crystal structure functionality.

Bonds

The IBond interface of the CDK represents an interaction between two or more IAtoms, and extends the IElectronContainer interface. While the most common application in the CDK originates from graph theory [Q37988904], it is not restricted to that. That said, many algorithms implemented in the CDK expect a graph-theory-based model, where each bond connects exactly two atoms.

For example, to create ethanol we write:

Ethanol

The CDK has a few bond orders, which we can list with this Groovy code:

BondOrders

which outputs:

BondOrders

As you might notice, there is no AROMATIC bond order defined. This is deliberate: the CDK allows single-double bond order patterns to be defined alongside aromaticity information. For example, a Kekulé structure of benzene with bonds marked as aromatic can be constructed with:

AromaticBond

Electron counts

Bond orders, as we have seen earlier, are commonly used in the CDK to indicate the electronic properties of a bond. At the same time, each bond consists of a number of electrons. For example, in a single (sigma) bond, two electrons are involved. In a double bond (one sigma and one pi bond), four electrons are involved, and in a triple bond, six electrons are involved. We can report on the electron counts for the various orders with this code:

ElectronCounts

showing us the default implementation:

ElectronCounts
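The relation between bond order and electron count described in this section can be sketched in plain Java. This is an illustration only and does not call the CDK: `SketchOrder` is a hypothetical enum limited to the three orders named in the text, and the rule of two electrons per unit of bond order is taken from the paragraph above.

```java
// Plain-Java illustration: two electrons per unit of bond order.
// SketchOrder is a hypothetical enum, not the CDK's own bond order type.
class ElectronCountSketch {

    enum SketchOrder {
        SINGLE(1), DOUBLE(2), TRIPLE(3);

        private final int numeric;

        SketchOrder(int numeric) {
            this.numeric = numeric;
        }

        // Each unit of bond order corresponds to one shared electron pair.
        int electronCount() {
            return 2 * numeric;
        }
    }

    public static void main(String[] args) {
        // Prints "SINGLE 2", "DOUBLE 4" and "TRIPLE 6", one per line.
        for (SketchOrder order : SketchOrder.values()) {
            System.out.println(order + " " + order.electronCount());
        }
    }
}
```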

Bond stereochemistry

The IBond.setStereo() method is discussed in Section stereo:bond.

Molecules

We already saw in the previous pieces of code how the CDK can be used to create molecules, and while the above is, strictly speaking, enough to find all atoms in the molecule starting with only one of the atoms in the molecule, it often is more convenient to store all atoms and bonds in a container.

The CDK has one container: the IAtomContainer. It is a general container that holds atoms and bonds, and can contain both unconnected and fully connected structures. The latter case implies that it holds a single molecule, in which all atoms are connected to each other via one or more covalent bonds.

Adding atoms and bonds is done by the methods addAtom(IAtom) and addBond(IBond):

AtomContainerAddAtomsAndBonds

The addBond() method has an alternative which takes three parameters: the index of the first atom, the index of the second atom, and the bond order. Note that atom indices follow programmers' habits and start at 0, as you can observe in the previous example too. This shortens the previous version a bit:

AtomContainerAddAtomsAndBonds2

Iterating over atoms and bonds

The IAtomContainer comes with convenience methods to iterate over atoms and bonds. Both methods use the Iterable interface, and for atoms we do:

CountHydrogens

which returns

CountHydrogens

And for bonds the equivalent:

CountDoubleBonds

giving

CountDoubleBonds

Neighboring atoms and bonds

It is quite common to want to see which atoms are connected to one particular atom. For example, you may wish to count how many bonds surround a particular atom. Or, you may want to list all atoms that are bound to this atom. The IAtomContainer interface provides methods for these use cases. But it should be stressed that these methods only take explicit hydrogens into account (see the next section).

Let's consider ethanol again, given in Script script:Ethanol, and count the number of neighbors for each atom:

NeighborCount

which lists for the three heavy atoms:

NeighborCount

Similarly, we can also list all connected atoms:

ConnectedAtoms

which outputs:

ConnectedAtoms

We can do the same thing for connected bonds:

ConnectedBonds

which outputs:

ConnectedBonds
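What such neighbor lookups compute can be shown with a plain-Java sketch that does not use the CDK. It assumes ethanol is stored as its three heavy atoms only (a C-C-O skeleton without explicit hydrogens), and the `NeighborSketch` class and its `neighbors()` helper are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of neighbor lookup in a molecular graph.
// NeighborSketch is a hypothetical helper, not part of the CDK.
class NeighborSketch {

    // Returns the indices of all atoms bonded to the given atom.
    // Each bond is a pair of 0-based atom indices.
    static List<Integer> neighbors(int atom, int[][] bonds) {
        List<Integer> result = new ArrayList<>();
        for (int[] bond : bonds) {
            if (bond[0] == atom) {
                result.add(bond[1]);
            } else if (bond[1] == atom) {
                result.add(bond[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed heavy-atom skeleton of ethanol: C(0)-C(1)-O(2).
        String[] symbols = {"C", "C", "O"};
        int[][] bonds = {{0, 1}, {1, 2}};
        for (int i = 0; i < symbols.length; i++) {
            System.out.println(symbols[i] + " has "
                + neighbors(i, bonds).size() + " neighbors");
        }
    }
}
```

Because only explicit vertices appear in the bond list, hydrogens that are not stored as separate atoms do not show up in these counts.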

Molecular Formula

Getting the molecular formula of a molecule and returning it as a String are both done with the MolecularFormulaManipulator class:

MFGeneration

giving:

MFGeneration

Implicit and Explicit Hydrogens

The CDK has two concepts for hydrogens: implicit hydrogens and explicit hydrogens. Explicit hydrogens are hydrogens that are separate vertices on the chemical graph. Implicit hydrogens, however, are not, and are attributes of existing vertices.

![](images/generated/MethaneImplicit.png)
![](images/generated/MethaneExplicit.png)

For example, if we represent methane as a chemical graph, we can define either a hydrogen-depleted chemical graph with a single carbon atom and zero bonds, or a graph with one carbon and four hydrogen atoms, and four bonds connecting the hydrogens to the central carbon. In the latter case, the hydrogens are explicit, while in the former case we can add those four hydrogens as implicit hydrogens on this carbon.

The first option in CDK code looks like:

HydrogenDepletedGraph

while the alternative looks like:

HydrogenExplicitGraph

Section missinghydrogens describes how hydrogens can be added programmatically.
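The two methane representations can be compared with a plain-Java sketch. It does not use the CDK: the `Node` class, its `implicitHydrogens` field and the helper methods are hypothetical names chosen for this illustration, and bonds are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of implicit versus explicit hydrogens.
// Node and the helper methods are hypothetical, not CDK classes.
class HydrogenSketch {

    static class Node {
        final String symbol;
        final int implicitHydrogens; // hydrogens stored as an attribute

        Node(String symbol, int implicitHydrogens) {
            this.symbol = symbol;
            this.implicitHydrogens = implicitHydrogens;
        }
    }

    // Hydrogen-depleted graph: one carbon vertex carrying four implicit hydrogens.
    static List<Node> implicitMethane() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("C", 4));
        return nodes;
    }

    // Explicit graph: one carbon vertex plus four hydrogen vertices.
    static List<Node> explicitMethane() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("C", 0));
        for (int i = 0; i < 4; i++) {
            nodes.add(new Node("H", 0));
        }
        return nodes;
    }

    // Total hydrogens = hydrogen vertices + implicit counts on all vertices.
    static int totalHydrogens(List<Node> nodes) {
        int total = 0;
        for (Node node : nodes) {
            if (node.symbol.equals("H")) {
                total++;
            }
            total += node.implicitHydrogens;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(implicitMethane().size() + " vertex, "
            + totalHydrogens(implicitMethane()) + " hydrogens");
        System.out.println(explicitMethane().size() + " vertices, "
            + totalHydrogens(explicitMethane()) + " hydrogens");
    }
}
```

Both graphs describe the same four hydrogens; they differ only in whether those hydrogens are vertices of their own.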

Chemical Objects

Another interface that must be introduced is the IChemObject, as it plays a key role in the CDK data model. Almost all interfaces used in the data model inherit from this interface. The IChemObject interface provides a bit of basic functionality, including support for object identifiers, properties, and flags.

For example, identifiers are set and retrieved with the setID() and getID() methods:

ChemObjectIdentifiers

If you have more than one identifier, or other properties you would like to associate with objects, you can use the setProperty() and getProperty() methods:

ChemObjectProperties

For example, we can use this approach to assign labels to atoms, such as in this example from substructure searching (see Chapter substructure):

AtomLabels

The CDKConstants class provides a few constants for common properties:

CDKConstantsProperties

outputting:

CDKConstantsProperties

A third characteristic of the IChemObject interface is the concept of flags. Flags are used in the CDK to indicate, for example, if an atom or bond is aromatic (see Script script:AromaticBond) or if an atom is part of a ring:

RingBond

The next section talks about the CDK data class for rings.

Rings

One important aspect of molecules is rings, partly because rings can show interesting chemical phenomena. For example, if the number of π electrons is right, then the ring will be aromatic, as we commonly observe in phenyl rings, such as in benzene. But cheminformatics has many other areas where one would like to know about those rings. For example, 2D coordinate generation (see Section layout) requires knowing what the rings in a molecule are.

![](images/rings.png)

Section spanningtree explains what functionality the CDK has to determine whether a bond takes part in a ring system. Here, we just introduce the IRing interface, which extends the more general IAtomContainer as shown in Figure ring. Practically, there is not much to say about the IRing interface. One method it adds returns the size of the ring:

RingExample

But this should be, by definition, the same as both the number of atoms and the number of bonds:

RingExample
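Why these numbers coincide can be seen from a plain-Java sketch of a simple cycle. It does not use the CDK: `RingSketch` and `ringBonds()` are hypothetical names introduced for this illustration, and the sketch assumes a single ring in which each atom is bonded to the next and the last atom closes back to the first.

```java
// Plain-Java illustration: a simple ring of n atoms has exactly n bonds.
// RingSketch is a hypothetical helper, not a CDK class.
class RingSketch {

    // Builds the bonds of a simple cycle over atoms 0..atomCount-1,
    // each bond being a pair of 0-based atom indices.
    static int[][] ringBonds(int atomCount) {
        int[][] bonds = new int[atomCount][2];
        for (int i = 0; i < atomCount; i++) {
            bonds[i][0] = i;
            bonds[i][1] = (i + 1) % atomCount; // the last atom closes the ring
        }
        return bonds;
    }

    public static void main(String[] args) {
        int atomCount = 6; // for example, the carbon skeleton of benzene
        int[][] bonds = ringBonds(atomCount);
        System.out.println("atoms: " + atomCount + ", bonds: " + bonds.length);
    }
}
```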

An overview of three algorithms to find rings in atom containers is provided in Section ringsearch. Additionally, you may also be interested in ring sets, explained in Section reactionandringsets.

References