This repository was archived by the owner on Jul 10, 2024. It is now read-only.

E/Z perception on tautomers #5

@tylerperyea


In certain cases, unspecified E/Z information is encoded as known (or known E/Z information is lost) based on tautomer generation.

Example 1

Consider the following two structures, which have the same SMILES but are drawn differently (molfiles at the bottom).

Compare:

CN1C(=O)/C(=N\NC(N)=S)C2=CC=CC=C12

cistrans2

2ATKPHXN6-63AZUWLKFU-6U9JBVHA63M-6UM6PRK6J3GX

vs

CN1C(=O)/C(=N\NC(N)=S)C2=CC=CC=C12

cistrans

2ATKPHXN6-63AZUWLKFU-6U9JBVHA63M-6UMV4H3F7CST

Notice that while the SMILES representations are exactly the same, the structures still get different hashes depending on their initial coordinates. This happens because the canonical tautomer has its E/Z bond in a different location than the one drawn above:

CN1C(=O)/C(=N\NC(N)=S)C2=CC=CC=C12

cistrans3

After selecting the preferred tautomer, E/Z is apparently recalculated based on the original atom coordinates. As a result, two apparently identical structures end up with different hashes.
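To make the coordinate dependence concrete, here is a minimal sketch that reads cis/trans arrangements directly off the 2D coordinates of the two molfiles posted at the end of this issue. It is not the hashing code itself: the helper names (`side`, `cis_trans`) are made up for illustration, and the choice of bonds to inspect is my assumption. The issue does not state which bond the canonical tautomer actually makes double.

```python
# Atom coordinates (x, y) copied from the two molfiles in this issue,
# keyed by molfile atom number. Only atoms along the chain
# C3-C5=N6-N7-C8-N9 are needed here.
DRAWING_1 = {
    3: (2.0684, 3.0827), 5: (2.4752, 2.1691), 6: (3.4534, 1.9612),
    7: (4.1225, 2.7044), 8: (5.1006, 2.4964), 9: (5.4096, 1.5454),
}
DRAWING_2 = {
    3: (9.8381, 3.0827), 5: (10.2449, 2.1691), 6: (11.2231, 1.9612),
    7: (11.8922, 2.7044), 8: (11.4775, 3.6036), 9: (12.1079, 4.5454),
}


def side(coords, a, b, p):
    """Return +1, -1 or 0: which side of the line a->b the atom p lies on."""
    ax, ay = coords[a]
    bx, by = coords[b]
    px, py = coords[p]
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross > 0) - (cross < 0)


def cis_trans(coords, sub_a, a, b, sub_b):
    """Arrangement of sub_a (on atom a) and sub_b (on atom b) about bond a-b."""
    sa = side(coords, a, b, sub_a)
    sb = side(coords, a, b, sub_b)
    if sa == 0 or sb == 0:
        return "undefined"  # a substituent is collinear with the bond
    return "cis" if sa == sb else "trans"


# (substituent, atom, atom, substituent) for three bonds along the chain.
# Only C5=N6 is drawn as a double bond; the other two are single bonds
# that a tautomer could plausibly turn into double bonds (an assumption).
BONDS = {
    "C5=N6": (3, 5, 6, 7),
    "N6-N7": (5, 6, 7, 8),
    "N7-C8": (6, 7, 8, 9),
}

for name, quad in BONDS.items():
    print(name, cis_trans(DRAWING_1, *quad), cis_trans(DRAWING_2, *quad))
```

With these coordinates, the drawn C5=N6 double bond comes out the same in both drawings, while the arrangements about N6-N7 and N7-C8 differ. If stereo is re-perceived from the original coordinates after one of those single bonds becomes double, the two drawings would naturally diverge.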

The resolution to this problem isn't trivial, and it is more a shortcoming of valence bond theory than of the encoding in general. This will require a bit of research, and an expert should be consulted. My intuition is that a cis/trans designation should be allowed if (and only if) both bonded atoms remain in an sp2 hybridized state across all tautomers (so that the atoms and their substituents remain coplanar).

If this is accurate, there is an unfortunate corollary: the preferred tautomer in the above example is either wrong, or should capture cis/trans information about the exocyclic bond, even though it is not explicitly a double bond.
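A rough sketch of that rule, to show what the corollary looks like in practice. Everything here is an assumption for illustration: the two alternative tautomers are written by hand (hydrogen moved from N7 to O4, and from N7 to S10) and are not the output of the tautomer generator, and "sp2" is crudely approximated as "the atom has a double bond in that tautomer". Atom numbers follow the molfiles below.

```python
def bonds(*triples):
    """Build a {frozenset({a, b}): order} map from (a, b, order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}


# Bond orders as drawn in the molfiles (only the relevant chain).
DRAWN = bonds((3, 4, 2), (3, 5, 1), (5, 6, 2), (6, 7, 1),
              (7, 8, 1), (8, 9, 1), (8, 10, 2), (5, 11, 1))

# Hypothetical azo/enol form: H moved from N7 to O4.
AZO_ENOL = bonds((3, 4, 1), (3, 5, 2), (5, 6, 1), (6, 7, 2),
                 (7, 8, 1), (8, 9, 1), (8, 10, 2), (5, 11, 1))

# Hypothetical thiol form: H moved from N7 to S10.
THIOL = bonds((3, 4, 2), (3, 5, 1), (5, 6, 2), (6, 7, 1),
              (7, 8, 2), (8, 9, 1), (8, 10, 1), (5, 11, 1))

TAUTOMERS = [DRAWN, AZO_ENOL, THIOL]


def has_double_bond(tautomer, atom):
    """Crude stand-in for 'atom is sp2' in this tautomer."""
    return any(order == 2 and atom in pair for pair, order in tautomer.items())


def stereo_allowed(a, b, tautomers):
    """True if both atoms keep a double bond in every listed tautomer."""
    return all(has_double_bond(t, a) and has_double_bond(t, b)
               for t in tautomers)


for a, b in [(5, 6), (6, 7), (7, 8)]:
    print(f"{a}-{b}:", stereo_allowed(a, b, TAUTOMERS))
```

Under these assumptions the exocyclic C5-N6 bond passes the test even though it is a single bond in the azo/enol form, which is exactly the corollary: its cis/trans information would need to be kept regardless of which tautomer is preferred. The N6-N7 and N7-C8 bonds fail because N7 has no double bond in the drawn form; a real implementation would need a better planarity criterion than this one.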

The molfiles for the above structures are posted here for convenience:


  Ketcher 12191320432D 1   1.00000     0.00000     0

 16 17  0     0  0            999 V2000
    0.4048    3.7213    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0739    2.9781    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.0684    3.0827    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5684    3.9487    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.4752    2.1691    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.4534    1.9612    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.1225    2.7044    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.1006    2.4964    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.4096    1.5454    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.7697    3.2396    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    1.7321    1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7321    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8660    2.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0     0  0
  2  3  1  0     0  0
  3  4  2  0     0  0
  3  5  1  0     0  0
  5  6  2  0     0  0
  6  7  1  0     0  0
  7  8  1  0     0  0
  8  9  1  0     0  0
  8 10  2  0     0  0
  5 11  1  0     0  0
 11 12  2  0     0  0
 12 13  1  0     0  0
 13 14  2  0     0  0
 14 15  1  0     0  0
 15 16  2  0     0  0
 16  2  1  0     0  0
 16 11  1  0     0  0
M  END

  Ketcher 12191320472D 1   1.00000     0.00000     0

 16 17  0     0  0            999 V2000
    8.1745    3.7213    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.8436    2.9781    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    9.8381    3.0827    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3381    3.9487    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   10.2449    2.1691    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.2231    1.9612    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.8922    2.7044    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.4775    3.6036    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.1079    4.5454    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.7180    3.7039    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    9.5018    1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6357    2.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.7697    1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    7.7697    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6357    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.5018    0.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0     0  0
  2  3  1  0     0  0
  3  4  2  0     0  0
  3  5  1  0     0  0
  5  6  2  0     0  0
  6  7  1  0     0  0
  7  8  1  0     0  0
  8  9  1  0     0  0
  8 10  2  0     0  0
  5 11  1  0     0  0
 11 12  2  0     0  0
 12  2  1  0     0  0
 12 13  1  0     0  0
 13 14  2  0     0  0
 14 15  1  0     0  0
 15 16  2  0     0  0
 16 11  1  0     0  0
M  END
