Skip to content

QM MM Tutorial

Susanne Sauer edited this page May 4, 2020 · 5 revisions

In this tutorial you will learn how to perform a QM/MM calculation in CAST with a protein structure you downloaded from the Protein Data Bank.

Content

  1. External programs you need
  2. Creation of tinkerstructure
    1. Clean up pdb and convert to xyz (Pymol)
    2. Convert xyz to tinker (CAST)
  3. Choosing the QM region
  4. The real calculations
    1. First calculation
    2. Electrostatic vs. mechanical embedding
    3. Subtractive QM/MM
    4. Three-layer
  5. Conclusion

External programs you need

Creation of tinkerstructure

The first thing we have to do is to prepare a tinkerstructure (for explanation of this format see here) that contains the protein system we want to investigate. The Protein Data Bank contains a lot of protein structures which can be downloaded. Here we will use the structure 2p7u which contains the enzyme rhodesain bound to an inhibitor.

Clean up pdb and convert to xyz (Pymol)

The preferred program to process pdb files is pymol. So now we open pymol and first get our structure. This is done by typing fetch 2p7u into the command line inside this program. Now the protein-inhibitor system appears in the viewer. In order to better investigate the system we can display the aminoacid sequence by typing set seq_view, 1 . There are a lot of aminoacids, displayed by their one-letter-code, the symbol "D1R" which is the inhibitor and a lot of water molecules, displayed by "O". We want to take a closer look at residue S88 which is a Serin. So we click on this letter in the sequence, then type hide all to hide everything and then display the selected residue as sticks by typing show sticks, sele . We see that the serin residue looks strange. It seems to be there twice. This is called alternate locations and means that in the crystal where the structure was taken from the residue had one position and in others it had the other position. As CAST can't deal with that we remove all locations except the first (for the whole system) by typing remove not (alt ''+A), followed by alter all, alt=''. We see that the serin residue looks "normal" now. The second thing we have to do before we can use the structure in CAST is to add hydrogen atoms which are not present in most X-ray structures. For some reason it is not possible to do that straight ahead but we have to save the structure without alternate locations and open it again. We do that by typing:

save 2p7u.pdb
delete all
load 2p7u.pdb

Now we can add hydrogen atoms by h_add. We immediately see how the single oxygen atoms are converted into complete water molecules.

For most structures we could now save our molecule as .xyz file and open it by CAST. However in this structure there is a small mistake in the structure. We take a closer look at the C-terminal Glycin. We see that the C-terminus doesn't end in a carboxyl group but in an aldehyde. To correct this we have to convert the hydrogen atom into an OH group. We do this by using the "Builder" which is opened by clicking on this button in the upper right corner of Pymol. In the window that appears we click on the button "O". Pymol tells us to pick atoms to replace with Oxygen so we click on the aldehyde H. According to our wishes it immediately gets converted into an OH group. So now we save our structure as a xyz-file by typing save 2p7u.xyz and as a pdb-file with save 2p7u.pdb Then we can close pymol.

Convert xyz to tinker (CAST)

Now we will use CAST to convert this xyz-file to tinker format. So we open the CAST inputfile "CAST.txt". In it we have do choose the following options:

name                   2p7u.xyz              # name of our inputfile
outname                output                # name of the outputfile (can be anything)
inputtype              XYZ                   # input filetype: XYZ
xyz_atomtypes          1                     # automatically create OPLSAA atomtypes
task                   WRITE_TINKER          # write a tinker stucture
interface              DFTB                  # can be anything except forcefields

Now we run CAST. In the folder where we run it we need the inputfile and the structurefile. CAST now creates a tinkerfile "output.arc" which can be opened py Pymol. We see our molecule here. But what is more interesting is the terminal output of CAST which tells us for which atoms no forcefield atomtypes could be created. It says:

No atomtypes found for (to be copied into pymol):
select sele, id 339+340+341+342+343+344+345+346+347+348+3087+3088+3089+3090+3091+3092+3093+3094+3095+3096+3097+3098+3099+3100+3101+3102+3103+3104+3105+3106+3107+3108+3109+3110+3111+3112+3113+3114+3115+3116+3117+3118+3119+3120+3121+3122+3123+3124+3125+3126+3127+3128+3129+3130+3131+3132+3133+3134+3135+3136+3137+3138+3139+3140+3141+3142+3143+3144+3145+3146+3147+3148+3149+3150+3151+3152+3153+3154+3155+3156+3157+3158+3159+3160+3161+3162+3163+3164+3165+3166

We do what we are told here so we open pymol again and open the pdb-file we have saved before by load 2p7u.pdb. Then we copy the stuff above into the pymol command line, then do hide all and show sticks, sele. In the viewer we see that the atoms that are not recognized are the inhibitor and the cystein residue to which the inhibitor is bound. So what we have to do now is to assign atomtypes to these atoms. This is a very annoying task where you have open the .arc file in a text editor, look at every atom that is not assigned yet (the first one here is 339) and determine the atomtype by comparing it to the description of the atomtypes in the forcefield parameter file. Our parameterfile is called "oplsaa_mod2.prm" and you can download it from here. It is an extension of the file "oplsaa.prm" that you can download from the tinker homepage. To see which index belongs to which atom we can do label sele, ID to show the atom indices as labels in pymol. If they are wrong maybe pymol has changed the order of the atoms which you can prevent by set retain_oder, 0. So in our example atom 339 is a backbone nitrogen atom. In the parameter file you find "Amide -CO-NHR" as description for atomtype 180 which seems to be suitable here so you replace the 0 in the file "output.arc" by a 180. The same we have to do for all the other not-recognized atoms. To make it easier for you the resulting atom types are shown here:

   339  N     -1.747000    1.908000    9.073000   180   330   340   348
   340  C     -1.798000    1.114000    7.844000   165   339   341   343   345
   341  C     -1.972000    2.012000    6.595000   177   340   342   349
   342  O     -1.332000    1.765000    5.554000   178   341
   343  C     -2.949000    0.108000    7.940000   156   340   344   346   347
   344  S     -4.562000    0.878000    8.259000   144   343  3121
   345  H     -0.852365    0.582917    7.735252    85   340
   346  H     -2.733424   -0.544419    8.786154    85   343
   347  H     -3.007901   -0.440157    6.999705    85   343
   348  H     -2.329688    1.638838    9.852824   183   339
  ...
  3087  C     -6.347000   -2.406000   12.971000    90  3088  3089  3131
  3088  C     -5.097000   -1.976000   12.455000    90  3087  3090  3094
  3089  C     -6.647000   -2.151000   14.324000    90  3087  3091  3128
  3090  C     -4.157000   -1.364000   13.278000    90  3088  3092  3129
  3091  C     -5.709000   -1.499000   15.159000    90  3089  3092  3130
  3092  C     -4.463000   -1.116000   14.660000    90  3090  3091  3132
  3093  N     -9.278000    3.944000    8.013000   180  3102  3103  3133
  3094  S     -4.835000   -2.299000   10.934000   434  3088  3100  3101  3122
  3095  N     -7.134000    1.467000    9.526000   180  3111  3112  3134
  3096  O    -10.829000    2.974000    9.301000   178  3102
  3097  N    -11.030000    5.210000    8.817000   181  3102  3124  3126
  3098  O     -6.847000    3.703000    9.206000   178  3111
  3099  N    -11.602000    7.717000    9.856000   770  3123  3125  3127
  3100  O     -3.500000   -2.490000   10.564000   435  3094
  3101  O     -5.549000   -3.532000   10.721000   435  3094
  3102  C    -10.396000    4.020000    8.745000   177  3093  3096  3097
  3103  C     -8.593000    2.648000    7.976000   171  3093  3104  3111  3135
  3104  C     -7.950000    2.459000    6.575000    81  3103  3105  3136  3137
  3105  C     -8.871000    2.619000    5.395000    90  3104  3106  3110
  3106  C    -10.197000    2.237000    5.529000    90  3105  3107  3138
  3107  C    -11.056000    2.363000    4.423000    90  3106  3108  3139
  3108  C    -10.525000    2.854000    3.235000    90  3107  3109  3140
  3109  C     -9.216000    3.229000    3.123000    90  3108  3110  3141
  3110  C     -8.344000    3.082000    4.219000    90  3105  3109  3142
  3111  C     -7.445000    2.639000    8.957000   177  3095  3098  3103
  3112  C     -6.002000    1.319000   10.483000   171  3095  3113  3121  3143
  3113  C     -6.546000    0.977000   11.887000    81  3112  3114  3144  3145
  3114  C     -7.249000    2.153000   12.572000    81  3113  3115  3146  3147
  3115  C     -7.579000    1.822000   13.996000    90  3114  3116  3120
  3116  C     -8.535000    0.853000   14.277000    90  3115  3117  3148
  3117  C     -8.802000    0.508000   15.609000    90  3116  3118  3149
  3118  C     -8.138000    1.187000   16.632999    90  3117  3119  3150
  3119  C     -7.220000    2.179000   16.348000    90  3118  3120  3151
  3120  C     -6.942000    2.534000   15.028000    90  3115  3119  3152
  3121  C     -4.976000    0.231000    9.871000   149   344  3112  3122  3153
  3122  C     -5.592000   -1.166000    9.865000    81  3094  3121  3154  3155
  3123  C    -11.962000    9.034000   10.351000   733  3099  3156  3157  3158
  3124  C    -10.570000    6.377000    8.057000   736  3097  3125  3159  3160
  3125  C    -10.508000    7.641000    8.909000   736  3099  3124  3161  3162
  3126  C    -12.299000    5.387000    9.530000   736  3097  3127  3163  3164
  3127  C    -12.254000    6.566000   10.473000   736  3099  3126  3165  3166
  3128  H     -7.610201   -2.458744   14.730986    91  3089
  3129  H     -3.187489   -1.072335   12.874167    91  3090
  3130  H     -5.961298   -1.294353   16.199465    91  3091
  3131  H     -7.062234   -2.924221   12.332260    91  3087
  3132  H     -3.731841   -0.635221   15.309891    91  3092
  3133  H     -8.922134    4.743480    7.508726   183  3093
  3134  H     -7.687429    0.653530    9.297783   183  3095
  3135  H     -9.320177    1.870370    8.209679    85  3103
  3136  H     -7.614879    1.422224    6.545166    85  3104
  3137  H     -7.148853    3.191208    6.474326    85  3104
  3138  H    -10.567554    1.845370    6.476320    91  3106
  3139  H    -12.107267    2.084047    4.494559    91  3107
  3140  H    -11.174785    2.942572    2.364348    91  3108
  3141  H     -8.846310    3.642287    2.184585    91  3109
  3142  H     -7.285611    3.328384    4.134093    91  3110
  3143  H     -5.445266    2.246964   10.613503    85  3112
  3144  H     -5.689773    0.710201   12.506511    85  3113
  3145  H     -7.257367    0.156625   11.791911    85  3113
  3146  H     -8.176520    2.362567   12.039188    85  3114
  3147  H     -6.593601    3.023665   12.549780    85  3114
  3148  H     -9.074257    0.363943   13.465751    91  3116
  3149  H     -9.518294   -0.279705   15.842545    91  3117
  3150  H     -8.347087    0.930565   17.671568    91  3118
  3151  H     -6.707439    2.689765   17.163168    91  3119
  3152  H     -6.247972    3.343511   14.801908    91  3120
  3153  H     -4.071056    0.095607   10.463323    85  3121
  3154  H     -6.616967   -1.050255   10.217344    85  3122
  3155  H     -5.528267   -1.560928    8.851062    85  3122
  3156  H    -12.950261    8.991778   10.808884    85  3123
  3157  H    -11.231688    9.355893   11.093382    85  3123
  3158  H    -11.975821    9.742730    9.522984    85  3123
  3159  H    -11.286585    6.553201    7.254779    85  3124
  3160  H     -9.574313    6.169541    7.664994    85  3124
  3161  H     -9.576827    7.617278    9.475086    85  3125
  3162  H    -10.550103    8.510090    8.252486    85  3125
  3163  H    -12.483510    4.489136   10.119827    85  3126
  3164  H    -13.094086    5.547143    8.801787    85  3126
  3165  H    -13.277838    6.844875   10.722168    85  3127
  3166  H    -11.702019    6.283206   11.369350    85  3127

After inserting the missing atomtypes we can test if everything works by just performing a singlepoint calculation with CAST and the OPLSAA forcefield. These are the options we have to change:

name                   output.arc           # name of our inputfile
inputtype              TINKER               # input filetype: tinker
task                   SP                   # do a singlepoint calculation
interface              OPLSAA               # use OPLSAA forcefield
paramfile              oplsaa_mod2.prm      # file with the forcefield parameters

Now we run CAST. Besides the inputfile and the structurefile we now also need the forcefield parameter file which fits with the atomtypes that we have assigned in the step before. As a result we should get a total energy of -3873.511217990343 kcal/mol.

Now we have created a structure file that we can use for any kinds of CAST calculations. In the next steps we will define a QM region and then perform a QM/MM calculation.

Choosing the QM region

For doing a QM/MM calculation the next thing we have to do is choosing region which should be described by our QM program. This region will be given to CAST as a list of atom indices which are separated by commas. So we should now create this list. The easiest thing to do this is again the use of pymol. In order to do that we first convert our tinkerstructure back to a pdb file which can be better handled by pymol. (Of course we could also use the original pdb file but this is an approach which you can also use if your original structure is no pdb.) Converting tinkerstructures to pdb can be very easily done by CAST. The only option we have to change is task which we set to WRITE_PDB. Then we run CAST.

We get a structure "output.pdb" which we can open with pymol. What is very important is that pymol somtimes changes the order of the atoms. To avoid this the first thing we do after launching pymol is typing set retain_order, 1 into the command line. Then we display the sequence by set seq_view, 1 . If we take a closer look at the sequence we see 2 unkown residues XXX: The first is displayed between residues 24 and 25, the second at the end of the protein chain at field 215. Interestingly the first XXX also has the residue number 215, so in reality it is only one residue. If we select those residues and show them we see that the first part is the cystein where the inhibitor is bound to and the second is the inhibitor itself. The explanation of this is that CAST recognizes the cystein together with the bound inhibitor as one modified aminoacid that it doesn't know. So it puts them into one residue with the name XXX. But as the cystein atoms are between the atoms of S24 and W25 in the structure and we told pymol not to change the order of the atoms they are displayed in two different places in the sequence.

So now we want to choose the atoms of our QM region. The QM region should be the one you see in the picture below. So all atoms are part either of the inhibitor, the bound cystein and the aminoacid histidine 161. So we select these 3 residues in the sequence and show only them by typing hide all and show sticks, sele. So now we want to select just certain atoms in this residues. For this we have to switch the selection mode by clicking on Mouse -> Selection Mode -> Atoms. Then we click on all the atoms that are part of our QM region. At the end that should be 43 atoms. Then we can display the atom indices by typing x = ','.join(str(x[1]) for x in cmd.index('sele')) , followed by print(x). The output should be:

343,344,346,347,2271,2272,2273,2274,2275,2276,2278,2279,2281,2282,2283,3087,3088,3089,3090,3091,3092,3094,3095,3098,3100,3101,3111,3112,3113,3121,3122,3128,3129,3130,3131,3132,3134,3143,3144,3145,3153,3154,3155

These are the atom indices that define the QM region of our QM/MM calculation. We copy this into a textfile and save it until we need it in the next step.

The real calculations

First calculation

Now we finally really want to start the QM/MM calculation. So we open the inputfile "CAST.txt" again and set a few options:

task                  SP               # do a singlepoint calculation
interface             QMMM_A           # use additive QM/MM interface
paramfile             oplsaa_mod2.prm  # file with the forcefield parameters for MM

#################### QM/MM OPTIONS ##########################

QMMMqmatoms            343,344,346,347,2271,2272,2273,2274,2275,2276,2278,2279,2281,2282,2283,3087,3088,3089,3090,3091,3092,3094,3095,3098,3100,3101,3111,3112,3113,3121,3122,3128,3129,3130,3131,3132,3134,3143,3144,3145,3153,3154,3155     # these is the atom list from above
QMMMmminterface       OPLSAA         # interface for MM part
QMMMqminterface       DFTB           # interface for QM part (DFTB because it is fast) 
QMMMwriteqmintofile   1              # we want to see our QM region as seperate structure

QMMMlinkatomtype      85, 85, 221, 85

QMMMcenter            0              # central atom of QM region, 0 = default       

QMMMzerocharge_bonds  3              # default option for electrostatic embedding

#################### OPTIONS FOR DFTB+ ##########################
DFTB+path             /apps/dftbplus/bin/dftb+    # path to dftb+
DFTB+skfiles          /apps/dftbplus/mio-1-1/     # path to slater koster files
DFTB+charge           0                           # total charge of the QM system

Something where we have to pay attention is the option "QMMMlinkatomtype". When you look at the system we have chosen as QM region you see that there are 4 bonds that are cut in order to get it. This system will now be put in a QM program (or better semiempirical program, if you want to be very exact) and QM programs don't like loose bonds. So all the loose bonds are saturated by hydrogen atoms. With the option "QMMMlinkatomtype" you have to give the forcefield atomtype for those hydrogen atoms in the order of the QM atom they are bound to. So to determine them take your former view in pymol and show labels (label sele, ID) So you see you have the following QM atoms which are bound to a MM atom:

  • index 343: CH2 group, hydrogen gets type 85 ("Alkane H-C")
  • index 2271: CH2 group, hydrogen gets type 85 ("Alkane H-C")
  • index 3111: carbonyle group, hydrogen gets type 221 ("Aldehyde/Formamide H-C=O")
  • index 3113: CH2 group, hydrogen gets type 85 ("Alkane H-C")

Of course the paths to DFTB+ and to the slater-koster have to be adjusted to your system. Download links are https://www.dftbplus.org/download/ for the program itself and https://www.dftb.org/parameters/download/ for the slater-koster files.

Now we run CAST. The result is:

QM-atoms: 43
MM-atoms: 3630
Potentials
           QM             MM            VDW         BONDED           COUL           TOTAL

     -36357.9        -3959.9          -40.3           -0.8            0.0        -40272.9
Total energy: -40272.9 kcal/mol

What we see here is first the energy of only the QM system (which is also written into the file "qm_system.arc" so we can look if we did everything right when defining the QM region), then the energy of only the MM system and then the interactions between those two systems. Those are discriminated into van der Waals interactions and bonded interactions. The coulomb interactions are zero as we used electrostatic embedding so that the coulomb interactions are included into the energy of the coulomb system. As we used an additive scheme here the total energy is just the sum of all the partial energies.

Electrostatic vs. mechanical embedding

In the first calculation we used electrostatic embedding which means that electrostatic interactions between QM and MM system are taken into account by including the charge parameters of the MM atoms into the QM calculation as external charges. As in the case of bonded interactions some of the MM atoms are too close to the QM system and the corresponding interactions are already included in the bonded terms those atoms are not taken into account when creating the external charges. QMMMzerocharge_bonds 3 means that MM atoms are excluded if they are 3 bonds or less between them and the next QM atom. In the same manner you can choose 1 or 2 here then atoms are only excluded of they are directly bound to a QM atom (1) or seperated by either 1 or 2 bonds (2).

We can also use mechanical embedding, i. e. calculating the electrostatic interaction between QM and MM system as pairwise coulomb interactions of the MM charges. In order to do this we this we run the same calculation again but set the option QMMMzerocharge_bonds to 0 for activating mechanical embedding. Most of the partial energies are the same but now the coulomb interaction has a value (6.3) and the value of the QM energy has changed to -36315.9 so the total energy now is -40224.6 kcal/mol.

Subtractive QM/MM

There is also a subtractive QM/MM scheme implemented into CAST. To choose this you have to set the interface to QMMM_S instead of QMMM_A. All the other options are the same for additive and subtractive scheme. If you run the subtractive scheme with the same options as above you get:

QM-atoms: 43
MM-atoms: 3630
Potentials
          MM_big                MM_small                      QM                   TOTAL

         -3873.5                   -17.5                -36357.9                -40213.7
Total energy: -40213.7 kcal/mol

Here the first value is the energy of the total system, calculated by MM method, the next is the energy of the QM system, calculated by the MM method and QM is the energy of the QM system, calculated with the QM method. In the case the total energy is calculated by:

TOTAL = MM_big - MM_small + QM

This approach has a few advantages over the additive scheme. As we calculated the QM system twice and include one of the calculations with a positive and one with a negative sign some errors are cancelled out. Furthermore as only "complete" systems are calculated, no individual interactions as in the additive scheme it is not necessary to use a forcefield as "MM method" but you can also combine tho QM or semiempirical interfaces (just change the option MMinterface to some QM program).

Furthermore it is also possible to combine more than two different methods within the subtractive scheme. In CAST there is a THREE_LAYER interface implemented which can combine three different energy interfaces which is what we want to do in the next step.

Three-layer

In order to perform a three-layer calculation we first set the interface option in the CAST.txt file to THREE_LAYER. Three-layer means we have a region which we want to describe very precisely, in our case we will use the program Psi4 for this. This region is called the small system and the method for it "QM method". Around this region we have another layer, called intermediate layer that should be described by an intermediate method (in our case DFTB, in CAST it is generally called SE for semiempirical). The rest of the molecule is still described by a forcefield.

So we go to the QM/MM options in the CAST inputfile. By QMMMqmatoms we define which atoms we want to include in the inner system, i. e. calculate with Psi4. We take the same atoms as before. The QMmminterface remains OPLSAA. For the QM, i. e. the innermost interface, we choose PSI4. Now we have the choose which atoms belong to our intermediate layer which are defined by QMMMseatoms. We want to apply this method on the benzyle group of the inhibitor which is cut off by the QM system. For getting the atom indices we can again use pymol and get: 3114,3115,3116,3117,3118,3119,3120,3146,3147,3148,3149,3150,3151,3152

As QMMMseinterface we choose DFTB. QMMMsmall_charges is an option for electrostatic embedding the three-layer interface. It is recommended to take either 1 or 2 here, we take 1 . We also have to give an option for the central atom of the small system which is called QMMMsmall_center. As this still doesn't matter we again give 0 which is the default option.

Something else we have to change now are the link atom types. They have to be given for the intermediate system, i. e. for a system consisting of all QM and SE atoms. If you compare this system to the former QM system you notice that the atom with index 3113 doesn't need a link atom any more as the benzyle group is not cut off anymore. So this option is:

QMMMlinkatomtype      85, 85, 221

Furthermore we have to set a few option for the Psi4 interface (of course the path again depends on the location of Psi4 on your machine):

PSI4-path             /apps/psi4/psi4       # path to Psi4
PSI4-basis            6-31G                 # basisset     
PSI4-method           HF                    # calculation method
PSI4-spin             1                     # spin multiplicity (of small system)
PSI4-charge           0                     # charge of small system

PSI4-memory           4GB                   # resources for PSI4
PSI4-threads          4

Now we run CAST. The result is:

QM-atoms: 43
SE-atoms: 14
MM-atoms: 3616
Potentials
      MM_big      MM_middle      SE_middle       SE_small             QM          TOTAL

     -3873.5           39.8       -45287.9       -36359.1     -1130415.1      -1143257.2
Total energy: -1143257.2 kcal/mol

So first the total system is calculated by the MM method (OPLSAA). This number is identical to the one before. Then the middle or intermediate system is also calculated by the MM method. This system consists of all QM and SE atoms, i. e. in our case it contains 43 QM + 14 SE + 3 Link-Atoms = 60 atoms. This system is written into the tinkerstructure "intermediate_system.arc". Then the same system is calculated by the "SE method" (DFTB). Then the small system is calculated by the same method. This system only consists of the QM atoms and can be viewed in the file "small_system.arc". Last but not least this system is calculated by the "QM method" (Psi4). The total energy is then calculated by:

TOTAL = MM_big - MM_middle + SE_middle - SE_small + QM

Conclusion

You have now seen how you can prepare a structure from the Protein Data Bank for calculations with CAST. Furthermore you have learned how to perform QM/MM and three-layer calculations with CAST by the examples of singlepoint (SP) calculations. Of course you can use those energy interfaces also together with the other tasks of CAST like optimizations or MD simulations. Have fun with it!

Date: 29 Aug 2019 (name of interfaces changed on 04 May 2020)

Clone this wiki locally