Data Structure

[Graphics:../Images/index_gr_2.gif]
[Graphics:../Images/index_gr_3.gif]

Syntax

Molecular Graphs

The valence bond model of a chemical structure can be represented by a vertex-colored (atomic species), edge-weighted (bond types) graph, the so-called molecular graph [Bal76].

[Graphics:../Images/index_gr_4.gif]

The molecular graph of an acyclic molecule can be represented by a rooted tree, and thus also by a normal expression because both are hierarchical [Nac98, Nac00]. The figure above shows the complete transformation in steps from the chemical graph of camphor to the normal expression.

Molecular Expressions

The head of each subexpression corresponds to a bond in the molecule, and the atomic elements (in the Mathematica sense) are the atoms.

The general form for a molecular substructure is

[Graphics:../Images/index_gr_5.gif]

where the heads [Graphics:../Images/index_gr_6.gif] are selected from the set {Single, Double, Triple, Dative, Molecule}, the last of which is reserved as the head of the whole molecule, and the symbols [Graphics:../Images/index_gr_7.gif] are drawn from the set {H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, Br, I, Hg, NO, PO, PS, SO, SO2, AsO}, the last six of which are superatoms.

The set [Graphics:../Images/index_gr_8.gif] associated with a given [Graphics:../Images/index_gr_9.gif] defines its valence state.
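As an illustration of this form, the following sketch writes ethanol as a molecular expression and inspects it with standard Mathematica functions. The variable name ethanol and the choice of the methyl carbon as the root are assumptions made for this example; other rootings of the same molecule are equally valid.

```mathematica
(* Note: Molecule is a built-in symbol from version 12.0 on; this sketch
   assumes the older, purely symbolic usage described in the text. *)

(* Ethanol, CH3-CH2-OH, rooted at the methyl carbon (an arbitrary choice). *)
ethanol = Molecule[C, Single[H], Single[H], Single[H],
    Single[C, Single[H], Single[H],
        Single[O, Single[H]]]];

(* The head of the whole expression is Molecule. *)
Head[ethanol]                        (* Molecule *)

(* The first argument at each level is the atom at that level. *)
First[ethanol]                       (* C *)

(* The bond heads used below the root. *)
Union[Cases[ethanol, b_[__] :> b, {1, Infinity}]]    (* {Single} *)

(* The atoms are the leaves of the expression. *)
Sort[Tally[Level[ethanol, {-1}]]]    (* {{C, 2}, {H, 6}, {O, 1}} *)
```

Every ligand of an atom, hydrogens included, appears as its own branch, so the atom counts can be read directly off the leaves.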

Rings

The pairs of atoms that are bound by a ring-closure bond (atoms 4, 7, and 11 in camphor, above) have Single[R[i]] as subexpressions.

Each ring closure has a unique i.
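A minimal sketch of a ring closure, taking cyclopropane as an assumed example (it is not one of the structures shown in the figures): the ring is cut at one bond, and both atoms of the cut bond carry Single[R[1]].

```mathematica
(* Note: Molecule is a built-in symbol from version 12.0 on; this sketch
   assumes the older, purely symbolic usage described in the text. *)

(* Cyclopropane: the bond between the first and the third carbon is cut,
   and each of these two carbons carries the ring-closure label R[1]. *)
cyclopropane = Molecule[C, Single[H], Single[H], Single[R[1]],
    Single[C, Single[H], Single[H],
        Single[C, Single[H], Single[H], Single[R[1]]]]];

(* Each ring-closure index should occur exactly twice. *)
Tally[Cases[cyclopropane, R[i_] :> i, Infinity]]    (* {{1, 2}} *)
```

A bicyclic structure such as camphor needs two such cuts, and therefore two distinct indices.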

Superatoms

Superatoms represent polyatomic substructures bound together with dative bonds, as in nitro, phosphate, sulfinyl, and sulfonyl moieties, for example.

[Graphics:../Images/index_gr_10.gif]

[Graphics:../Images/index_gr_11.gif]

[Graphics:../Images/index_gr_12.gif]

[Graphics:../Images/index_gr_13.gif]

Superatoms have fully expanded equivalents:

superatom [Graphics:../Images/index_gr_14.gif]
[Graphics:../Images/index_gr_15.gif] [Graphics:../Images/index_gr_16.gif]
[Graphics:../Images/index_gr_17.gif] [Graphics:../Images/index_gr_18.gif]
[Graphics:../Images/index_gr_19.gif] [Graphics:../Images/index_gr_20.gif]
[Graphics:../Images/index_gr_21.gif] [Graphics:../Images/index_gr_22.gif]
[Graphics:../Images/index_gr_23.gif] [Graphics:../Images/index_gr_24.gif]
[Graphics:../Images/index_gr_25.gif] [Graphics:../Images/index_gr_26.gif]

The use of the superatoms NO, PO, PS, and AsO allows N, P, and As to have a single valence of 3 instead of two valences (3 and 5), and the use of the superatoms SO and SO2 allows S to have a single valence of 2 instead of three valences (2, 4, and 6).

Users can define their own superatoms:

superatom [Graphics:../Images/index_gr_27.gif] [Graphics:../Images/index_gr_28.gif]
Me [Graphics:../Images/index_gr_29.gif] [Graphics:../Images/index_gr_30.gif]
Et [Graphics:../Images/index_gr_31.gif] [Graphics:../Images/index_gr_32.gif]
COOH [Graphics:../Images/index_gr_33.gif] [Graphics:../Images/index_gr_34.gif]
Ph [Graphics:../Images/index_gr_35.gif] [Graphics:../Images/index_gr_36.gif]
Ala [Graphics:../Images/index_gr_37.gif] [Graphics:../Images/index_gr_38.gif]
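One way to picture a user-defined superatom is as a replacement rule that rewrites the abbreviation into its expanded subexpression. The sketch below assumes that Me stands for a methyl group, expanded as a carbon carrying three hydrogens; the rule name expandMe is hypothetical and is not taken from the package described here.

```mathematica
(* Hypothetical rule: expand the methyl superatom Me wherever it appears
   as the atom of a subexpression. The bond head b is left unchanged. *)
expandMe = b_[Me, rest___] :> b[C, Single[H], Single[H], Single[H], rest];

(* An ethyl branch written with one methyl superatom, then expanded. *)
ethyl = Single[C, Single[H], Single[H], Single[Me]];
ethyl /. expandMe
(* Single[C, Single[H], Single[H], Single[C, Single[H], Single[H], Single[H]]] *)
```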

Examples

[Graphics:../Images/index_gr_39.gif]
[Graphics:../Images/index_gr_40.gif]
[Graphics:../Images/index_gr_41.gif]
[Graphics:../Images/index_gr_42.gif]
[Graphics:../Images/index_gr_43.gif]
[Graphics:../Images/index_gr_44.gif]
[Graphics:../Images/index_gr_45.gif]

[Graphics:../Images/index_gr_46.gif]

This example is deceptively simple because the two bonds of its valence state are not equivalent. One could easily obtain a non-head-to-tail sequence of amino acids if the ligands appear in the wrong order!

[Graphics:../Images/index_gr_47.gif]
[Graphics:../Images/index_gr_48.gif]
[Graphics:../Images/index_gr_49.gif]

[Graphics:../Images/index_gr_50.gif]

What we can't do (yet)

This encoding method allows us to handle nearly all the usual organic chemical structures encountered in medicinal chemistry. The only structures that cannot be represented are those that require the specification of formal atomic charges, as in quaternary ammonium (1), azide (2), and isocyanate (3), and those that make use of d orbitals, as in phosphorus pentafluoride (4) and sulfur hexafluoride (5).

[Graphics:../Images/index_gr_51.gif]

Semantics

Valence

We can assign a valence to each atom type (H->1, C->4, N->3, SO2->2, R[_]->1, etc.) and to each bond type (Single->1, Double->2, Triple->3, Dative->0, and Molecule->0). The following condition must then be satisfied at each level of the molecular expression [Graphics:../Images/index_gr_52.gif]:

[Graphics:../Images/index_gr_53.gif]

For the subexpression pattern bond_[atom_, branches___], Equation 1 becomes:

Valence[atom]==Plus@@Valence[{bond,branches}]

When explicit dative bonds are present, the following test checks that they do not exceed the number of available lone pairs of electrons:

LonePairCount[atom]>=Count[{branches},Dative[_]]
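
The two tests above can be tried out on single levels of a molecular expression. The sketch below is only one possible way to set this up: the lowercase names valence, lonePairs, and levelValidQ are hypothetical stand-ins for Valence and LonePairCount, the value O->2 and the lone-pair counts are assumed from ordinary valence rules, and only a few atom types are tabulated.

```mathematica
(* Assumed valence table; Listable lets valence thread over a list. *)
SetAttributes[valence, Listable];
valence[H] = 1; valence[C] = 4; valence[N] = 3; valence[O] = 2;
valence[SO2] = 2; valence[R[_]] = 1;
valence[Single] = 1; valence[Double] = 2; valence[Triple] = 3;
valence[Dative] = 0; valence[Molecule] = 0;

(* A branch contributes the valence of its bond head. *)
valence[bond_[_, ___]] := valence[bond]

(* Assumed lone-pair counts for the atoms used below. *)
lonePairs[H] = 0; lonePairs[C] = 0; lonePairs[N] = 1; lonePairs[O] = 2;

(* Both conditions for one level of the expression. *)
levelValidQ[bond_[atom_, branches___]] :=
    valence[atom] == Plus @@ valence[{bond, branches}] &&
    lonePairs[atom] >= Count[{branches}, Dative[_]]

levelValidQ[Single[O, Single[H]]]               (* True:  2 == 1 + 1 *)
levelValidQ[Single[C, Single[H], Single[H]]]    (* False: 4 != 1 + 1 + 1 *)
levelValidQ[Single[N, Double[O], Dative[O]]]    (* True:  3 == 1 + 2 + 0 *)
```

The check is local: it looks only at one atom, the bond that attaches it to its parent, and the bonds to its branches.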

Multigraphs and Pseudographs

The inappropriate use of ring closures (R[i]) can lead to the generation of a multigraph or a pseudograph. Chemically, these correspond to a 2-membered or a 1-membered ring, respectively, neither of which is valid.

[Graphics:../Images/index_gr_54.gif]

[Graphics:../Images/index_gr_55.gif]

[Graphics:../Images/index_gr_56.gif] [Graphics:../Images/index_gr_57.gif]

Freedom from 2-membered rings can be recognized by the predicate twoRingFreeQ:

twoRingFreeQ[mol_Molecule] :=
    FreeQ[mol,
        _[_, ___, Single[R[x_]], ___,
            _[_, ___, Single[R[x_]], ___], ___] |
        _[_, ___, _[_, ___, Single[R[x_]], ___],
            ___, Single[R[x_]], ___]] &&
    FreeQ[Cases[mol, _[_, ___, Single[R[x_]], ___,
            Single[R[y_]], ___], {0, Infinity}],
        {___, _[_, ___, Single[R[x_]], ___, Single[R[y_]], ___], ___,
            _[_, ___, Single[R[x_]], ___, Single[R[y_]], ___], ___} |
        {___, _[_, ___, Single[R[x_]], ___, Single[R[y_]], ___], ___,
            _[_, ___, Single[R[y_]], ___, Single[R[x_]], ___], ___}]

Freedom from 1-membered rings can be recognized by the predicate oneRingFreeQ:

oneRingFreeQ[mol_Molecule] :=
    FreeQ[mol, _[_, ___, Single[R[x_]], ___, Single[R[x_]], ___]]
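
As a usage sketch, oneRingFreeQ can be applied to a valid ring and to a deliberately malformed expression. The two test molecules are assumptions chosen for illustration; the definition is repeated so that the block is self-contained.

```mathematica
(* Note: Molecule is a built-in symbol from version 12.0 on; this sketch
   assumes the older, purely symbolic usage described in the text. *)
oneRingFreeQ[mol_Molecule] :=
    FreeQ[mol, _[_, ___, Single[R[x_]], ___, Single[R[x_]], ___]]

(* Cyclopropane: the two R[1] labels sit on different carbons. *)
ring = Molecule[C, Single[H], Single[H], Single[R[1]],
    Single[C, Single[H], Single[H],
        Single[C, Single[H], Single[H], Single[R[1]]]]];
oneRingFreeQ[ring]     (* True *)

(* Pseudograph: one carbon closes ring 1 onto itself. *)
loop = Molecule[C, Single[H], Single[H], Single[R[1]], Single[R[1]]];
oneRingFreeQ[loop]     (* False *)
```

The malformed expression satisfies the valence condition, which is why a separate structural test is needed.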

Nonsense

Inappropriate symbols can be recognized by the failure of the predicates chemicalAtomQ and chemicalBondQ:

chemicalAtomQ[atom_] := MemberQ[$ValidAtoms, atom] ||
    MatchQ[atom, R[_Integer?Positive]]
chemicalBondQ[bond_] := MemberQ[$ValidBonds, bond]

The presence of unmatched ring closures can be recognized by the failure of the predicate matchedRingClosuresQ:

matchedRingClosuresQ[mol_Molecule] :=
    MatchQ[Split[Sort[Cases[mol, _[R[_]], {-3}]]], {{_, _}...}]
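
The predicates above depend on $ValidAtoms and $ValidBonds, which are not shown in the text. The sketch below assumes they hold the atom and bond symbols listed under Molecular Expressions, with Molecule left out of $ValidBonds because it is reserved for the root; both choices are assumptions. The predicate definitions are repeated so that the block is self-contained.

```mathematica
(* Note: Molecule is a built-in symbol from version 12.0 on; this sketch
   assumes the older, purely symbolic usage described in the text. *)

(* Assumed contents of the two lists. *)
$ValidAtoms = {H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, Br, I, Hg,
    NO, PO, PS, SO, SO2, AsO};
$ValidBonds = {Single, Double, Triple, Dative};

chemicalAtomQ[atom_] := MemberQ[$ValidAtoms, atom] ||
    MatchQ[atom, R[_Integer?Positive]]
chemicalBondQ[bond_] := MemberQ[$ValidBonds, bond]
matchedRingClosuresQ[mol_Molecule] :=
    MatchQ[Split[Sort[Cases[mol, _[R[_]], {-3}]]], {{_, _}...}]

chemicalAtomQ /@ {C, SO2, R[1], R[0], Xx}   (* {True, True, True, False, False} *)
chemicalBondQ /@ {Double, Quadruple}        (* {True, False} *)

(* A lone R[1] has no partner, so the closure is unmatched. *)
matchedRingClosuresQ[
    Molecule[C, Single[H], Single[H], Single[H], Single[R[1]]]]   (* False *)

(* An acyclic molecule has no ring closures to match. *)
matchedRingClosuresQ[
    Molecule[C, Single[H], Single[H], Single[H], Single[H]]]      (* True *)
```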

Total validity

MoleculeQ[mol:Molecule[atom_, branches___]]:=
    chemicalAtomQ[atom] &&
    (And @@ MoleculeQ /@ Hold[branches]) &&
    TrueQ[Valence[atom] ==
        Plus @@ Valence[{Molecule, branches}]] &&
    TrueQ[LonePairCount[atom] >=
        Count[{branches}, Dative[_]]] &&
    matchedRingClosuresQ[mol] &&
    oneRingFreeQ[mol] &&
    twoRingFreeQ[mol]
MoleculeQ[bond_[atom_, branches___]]:=
    chemicalBondQ[bond] &&
    chemicalAtomQ[atom] &&
    (And @@ MoleculeQ /@ Hold[branches]) &&
    TrueQ[Valence[atom] ==
        Plus @@ Valence[{bond, branches}]] &&
    TrueQ[LonePairCount[atom] >=
        Count[{branches}, Dative[_]]]
MoleculeQ[x___] /; Length[{x}]==1 ||
    Message[MoleculeQ::argx, MoleculeQ, Length[{x}]] :=
    False

Examples

[Graphics:../Images/index_gr_58.gif]
[Graphics:../Images/index_gr_59.gif]
[Graphics:../Images/index_gr_60.gif]
[Graphics:../Images/index_gr_61.gif]
[Graphics:../Images/index_gr_62.gif]
[Graphics:../Images/index_gr_63.gif]
[Graphics:../Images/index_gr_64.gif]
[Graphics:../Images/index_gr_65.gif]
[Graphics:../Images/index_gr_66.gif]
[Graphics:../Images/index_gr_67.gif]
[Graphics:../Images/index_gr_68.gif]
[Graphics:../Images/index_gr_69.gif]
[Graphics:../Images/index_gr_70.gif]
[Graphics:../Images/index_gr_71.gif]
[Graphics:../Images/index_gr_72.gif]
[Graphics:../Images/index_gr_73.gif]
[Graphics:../Images/index_gr_74.gif]
[Graphics:../Images/index_gr_75.gif]
[Graphics:../Images/index_gr_76.gif]
[Graphics:../Images/index_gr_77.gif]
[Graphics:../Images/index_gr_78.gif]
[Graphics:../Images/index_gr_79.gif]
[Graphics:../Images/index_gr_80.gif]

Stereochemistry

3-Dimensional

The tetrahedral geometry at a carbon atom allows it to exist as two non-superimposable, mirror-image isomers (enantiomers) when it bears four constitutionally different ligands.

[Graphics:../Images/index_gr_81.gif]

[Graphics:../Images/index_gr_82.gif]

Of the 24 different ways to write the molecular expression, 12 represent the form on the left and 12 the form on the right:

[Graphics:../Images/index_gr_83.gif]

[Graphics:../Images/index_gr_84.gif]

Simple rules govern the ordering of the elements of the molecular expression or subexpression.

[Graphics:../Images/index_gr_85.gif]

[Graphics:../Images/index_gr_86.gif] [Graphics:../Images/index_gr_87.gif]
[Graphics:../Images/index_gr_88.gif]

The other 11 equivalent representations of each enantiomer can be obtained by applying the permutations {1,2,4,5,3} or {1,3,4,2,5}, or combinations of them.

[Graphics:../Images/index_gr_89.gif]
[Graphics:../Images/index_gr_90.gif]

The enantiomers can be interconverted by applying the permutation {1,2,3,5,4}.

[Graphics:../Images/index_gr_91.gif]
[Graphics:../Images/index_gr_92.gif]

Note that the signature of every configuration-preserving permutation is positive, while that of every configuration-inverting permutation is negative.

[Graphics:../Images/index_gr_93.gif]
[Graphics:../Images/index_gr_94.gif]

The permutations that preserve the configuration of the subexpression [Graphics:../Images/index_gr_95.gif] are {1,2,3,4}, {1,3,4,2}, and {1,4,2,3}.
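
The signatures quoted above can be checked with the built-in function Signature, and a permutation can be applied to an expression with Part. In this sketch the names keep1, keep2, flip, even5, even4, and sub are hypothetical, and the halogenated carbon is an assumed example rather than one of the structures in the figures.

```mathematica
(* The two configuration-preserving generators and the inverting
   permutation quoted in the text. Position 1 is the central atom. *)
keep1 = {1, 2, 4, 5, 3}; keep2 = {1, 3, 4, 2, 5}; flip = {1, 2, 3, 5, 4};
Signature /@ {keep1, keep2, flip}     (* {1, 1, -1} *)

(* Even permutations of five positions that leave the atom in place. *)
even5 = Select[Permutations[Range[5]],
    First[#] == 1 && Signature[#] == 1 &];
Length[even5]                         (* 12 *)

(* The same selection for a subexpression with three branches. *)
even4 = Select[Permutations[Range[4]],
    First[#] == 1 && Signature[#] == 1 &]
(* {{1, 2, 3, 4}, {1, 3, 4, 2}, {1, 4, 2, 3}} *)

(* Applying a permutation reorders the branches and keeps the head. *)
sub = Single[C, Single[F], Single[Cl], Single[Br]];
sub[[{1, 3, 4, 2}]]
(* Single[C, Single[Cl], Single[Br], Single[F]] *)
```

The count of 12 agrees with the statement that each enantiomer has 12 equivalent representations, and the three even permutations of four positions are the ones listed for the subexpression.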

2-Dimensional

Stereoisomerism also exists for planar arrangements of atoms, such as those around double bonds.

[Graphics:../Images/index_gr_96.gif]

[Graphics:../Images/index_gr_97.gif] [Graphics:../Images/index_gr_98.gif]
[Graphics:../Images/index_gr_99.gif]

Again, arranging the ligands in a particular order by following certain rules can lead unambiguously to the structure on the left or the one on the right.

Other Representations

[Graphics:../Images/index_gr_100.gif]

[Graphics:../Images/index_gr_101.gif]

SMILES

SMILES = Simplified Molecular Input Line Entry System [Wei88]

[Graphics:../Images/index_gr_102.gif]
[Graphics:../Images/index_gr_103.gif]
[Graphics:../Images/index_gr_104.gif]
[Graphics:../Images/index_gr_105.gif]

Adjacency Lists

[Graphics:../Images/index_gr_106.gif]
[Graphics:../Images/index_gr_107.gif]
[Graphics:../Images/index_gr_108.gif]
[Graphics:../Images/index_gr_109.gif]
[Graphics:../Images/index_gr_110.gif]
[Graphics:../Images/index_gr_111.gif]
[Graphics:../Images/index_gr_112.gif]
[Graphics:../Images/index_gr_113.gif]
[Graphics:../Images/index_gr_114.gif]
[Graphics:../Images/index_gr_115.gif]