Computational Chemistry Workflow

A computational chemistry workshop for beginners.

Eric Van Dornshuld https://dornshuld.chemistry.msstate.edu (Mississippi State University)https://chemistry.msstate.edu
2022-03-05

Computational chemistry is a powerful tool to model physical systems and make useful predictions. This workshop is written for the absolute beginner and will guide the user to running their first computational jobs.

In this workshop we will


Setup

To be able to perform computations at MCSR, you will need to obtain an account as well as download and install a variety of software.

Obtain MCSR Account

The Mississippi Center for Supercomputing Research (MCSR) is housed at the University of Mississippi. Any student in the Mississippi educational system can gain a free account to access and use these computers for research.

Request an account by visiting http://cypress.mcsr.olemiss.edu/resdb2/user_info.html and filling out the form. Use your official university email address. Your account information will be emailed to you along with a temporary password. You will be required to change this temporary password on your first login.

PuTTY

Download and install PuTTY for your operating system. This program is an SSH client that will be used to connect to the supercomputers at MCSR.

AMPAC 10 AGUI

The Department of Chemistry has a license to use AGUI, a software package that allows the user to build, view, and modify molecular structures and computational results.

You will receive an email with the download information as well as the license key file. Please follow the directions carefully. In short, you will (1) download and install the software and (2) copy the key file to the appropriate directory as stated in the email.

The manual for this program is located under the “Help” dropdown menu and clicking “Agui Help.” The keyboard shortcut is F1.

Notepad++

This program is not required but highly recommended. It is an enhanced variation of Microsoft’s Notepad program and offers useful features. Download it at https://notepad-plus-plus.org/downloads/.

WinSCP

This program allows you to easily move files between your computer and the supercomputers. Install it here: https://winscp.net/eng/index.php


Required Workshop Materials

When attending the workshop, you must bring

  1. Laptop with WiFi capability with all software already installed
  2. Your MSCR account information
  3. An external mouse

Bring an AC adapter with your laptop as the workshop will take a couple of hours.


Connect to MCSR

Open PuTTY and connect to MCSR. Under the “Host Name” box, input

hpcwoods.olemiss.edu

If you are met with a PuTTY Security Alert prompt, click Accept. Enter your username and press ENTER followed by your password. You will now be logged in. Your current “position” in the system is your HOME directory and is the default every time you log in. The PATH to this directory is ~/. No matter where you are at in a directory structure, you can always return to home by using the cd (change directory) command

cd ~/

Computer Structure

The computer named hpcwoods is the gateway into the entire supercomputer system. This node is not used to run computational jobs. It serves as the head node between all the other clusters. It shares the same filesystem that every supercomputer cluster uses.

There are various supercomputer clusters, each with a name and with nodes of similar computer hardware (processors, RAM, disk space, etc.). We will primarily being using the sequoia cluster to run our jobs.

sequoia

Log into the sequoia cluster by typing

ssh sequoia.mcsr.olemiss.edu

and using the appropriate temporary password provided. You will be required to change this password upon logging in. Once again, you will be placed in your HOME directory every time you log in.

The sequoia cluster contains many nodes that are designed to run computational jobs. “Computational jobs” are computing tasks that are performed using any of the pre-installed software or locally served programs.

Queue

Given that the cluster serves a public audience, a queuing system manages the incoming jobs. Jobs with lower computational expense are usually prioritized over those demanding a lot of resources and time.

To view the current queue of jobs, type

qstat

A list of jobs (R - running; Q - queued) will appear. You are presented with the following information:

Property Description
Job id Unique identifier for the job
Name User provided name for the job
User The username of the person who submitted the job
Time The amount of time the job has been running (h:min:s)
S The status of the job (R - running; Q - queued; E - error)
Queue The queue that the job was served to

Job ID

You can view the details of a job. Choose any job in the list and note the number in the Job ID. You can view the details of the job by typing

qstat -f <job id number>

where <job id number> is the number you chose from the list.

Login with WinSCP

Open WinSCP and create a connection similar to what you did in PuTTY. You will need the domain name for hpcwoods, username, and password.


Setup chem Directory

It is critical that we maintain a consistent, well-organized filesystem for our projects. While it may not be immediately realized, it will become clear later down the road as your project(s) grow in size and complexity (think hundreds or thousands of files).

First Law of Directorydynamics: Everything will exist in the chem directory.

Create a new folder called chem in your HOME directory. First, ensure that you are in the HOME directory by typing

cd ~/

then create the new folder using mkdir and pressing ENTER.

mkdir chem

To list the folders and files in the current directory, type

ls

You should see your brand new chem folder in the list.

Now we will change into that directory using cd.

cd chem

You can see the absolute path to your current location anytime by using the “print working directory” command pwd.

pwd

The absolute path begins at the ROOT directory /.

To move backwards one directory, type

cd ..

Type pwd once again and see how the path has changed. Return to the chem directory.

Setup Project Directory

Second Law of Directorydynamics: Each project will exist in its own directory.

When juggling multiple computational projects, it is critical to keep each one in their own directory. All files and folders related to that project will exist in that project folder.

Let us create a folder for our workshop project. Since we are going to characterize the energy difference between staggered and eclipsed ethane, let’s call this directory ethane. Create the folder now.

mkdir ethane

Move into the ethane directory.

cd ethane


Create LOT Folder

This next rule can go one of two ways.

Third Law of Directorydynamics: Each level of theory (LOT) will exist in its own directory OR each molecule/compound will exist in its own directory.

Example:

Option 1: Level of theory first
~/chem/project/level-of-theory-1/cpd01
~/chem/project/level-of-theory-1/cpd02
~/chem/project/level-of-theory-1/cpd03

~/chem/project/level-of-theory-2/cpd01
~/chem/project/level-of-theory-2/cpd02
~/chem/project/level-of-theory-2/cpd03

or

Option 2: Compound first
~/chem/project/cpd01/level-of-theory-1
~/chem/project/cpd01/level-of-theory-2
~/chem/project/cpd01/level-of-theory-3

~/chem/project/cpd02/level-of-theory-1
~/chem/project/cpd02/level-of-theory-2
~/chem/project/cpd02/level-of-theory-3

There are merits to either structure (though some prefer one more than the other).

For our workshop, we will go with Option 1. Note that “level-of-theory” can be further divided into “method” and “basis” folders. Consider this example:

Level of theory broken down by method and basis
~/chem/project/method-1/basis-1/cpd01
~/chem/project/method-1/basis-2/cpd01
~/chem/project/method-1/basis-3/cpd01
~/chem/project/method-1/basis-1/cpd02
~/chem/project/method-1/basis-2/cpd02
~/chem/project/method-1/basis-3/cpd02

~/chem/project/method-2/basis-1/cpd01
~/chem/project/method-2/basis-2/cpd01
~/chem/project/method-2/basis-3/cpd01
~/chem/project/method-2/basis-1/cpd02
~/chem/project/method-2/basis-2/cpd02
~/chem/project/method-2/basis-3/cpd02


All that aside, let us build our directories. First, we must choose some methods and basis sets (tabulated below).

Method Basis Set
HF aDZ
B3LYP aTZ
MP2 aQZ

Note that I have listed three methods and three basis sets. We will be running every combination of methods-to-basis sets. Therefore, this will be a total of 9 levels of theory to run on each of the two molecules (for a total 18 jobs).

Create the following directories:

~/chem/ethane/hf/aDZ/staggered
~/chem/ethane/hf/aDZ/eclipsed
~/chem/ethane/hf/aTZ/staggered
~/chem/ethane/hf/aTZ/eclipsed
~/chem/ethane/hf/aQZ/staggered
~/chem/ethane/hf/aQZ/eclipsed

~/chem/ethane/b3lyp/aDZ/staggered
~/chem/ethane/b3lyp/aDZ/eclipsed
~/chem/ethane/b3lyp/aTZ/staggered
~/chem/ethane/b3lyp/aTZ/eclipsed
~/chem/ethane/b3lyp/aQZ/staggered
~/chem/ethane/b3lyp/aQZ/eclipsed

~/chem/ethane/mp2/aDZ/staggered
~/chem/ethane/mp2/aDZ/eclipsed
~/chem/ethane/mp2/aTZ/staggered
~/chem/ethane/mp2/aTZ/eclipsed
~/chem/ethane/mp2/aQZ/staggered
~/chem/ethane/mp2/aQZ/eclipsed


Build a molecule

Before we can begin running any computations, we must build a molecule (or system) that we wish to model. We will use AGUI to do this. Open AGUI now.

You will see two windows appear. The main window contains all the menus and buttons. The second window (with a purple background) is the builder window. This is where we build the molecule visually.

Notice in the main window the 3D image of methane. If you were to click in the builder window, a methane would appear. You can change the molecule/fragment/atom by using the “Element Fragment,” “Ring Fragment,” “R-Group Fragment,” “Biological Fragment,” or “Custom Fragment” buttons in the toolbar. These same selections appear in the “Builder” dropdown menu.

For this workshop, we will build the eclipsed and staggered conformations of ethane in Cartesian coordinates (XYZ coordinates) and are provided below.

Staggered Ethane, D3d
C         0.000000000000      0.000000000000      0.770391000000
H         0.000000000000      1.013293467191      1.172487000000
H        -0.877537884076     -0.506646733595      1.172487000000
H         0.877537884076     -0.506646733595      1.172487000000
C         0.000000000000      0.000000000000     -0.770391000000
H         0.877537884076      0.506646733595     -1.172487000000
H         0.000000000000     -1.013293467191     -1.172487000000
H        -0.877537884076      0.506646733595     -1.172487000000
Eclipsed Ethane, D3h
C         0.000000000000      0.000000000000      0.770000000000
H        -0.510345941849      0.883945100720      1.169020000000
H        -0.510345941849     -0.883945100720      1.169020000000
H         1.020691883699      0.000000000000      1.169020000000
C         0.000000000000      0.000000000000     -0.770000000000
H        -0.510345941849     -0.883945100720     -1.169020000000
H        -0.510345941849      0.883945100720     -1.169020000000
H         1.020691883699      0.000000000000     -1.169020000000

To build the molecule, be sure to have the “Carbon Tetrahedral” molecule selected (it is the default molecule when AGUI loads) and click somewhere in the builder window. You can rotate the view in the builder window by holding down the left-mouse button and dragging. Holding down the right-mouse button allows you to zoom in and out.

Next, left-click on one of the hydrogen atoms in the builder window. This will add a methyl group at that location and replace the hydrogen atom with a carbon atom. We now have an ethane molecule!

We can do better though. Currently we have a staggered ethane molecule. The point group should be D3d. Let us enforce this symmetry on the structure. Go to Tools → Point Group. Be sure the “Enable Point Group Symmetry” box is checked. In the “Constrain to subgroup” dropdown menu, select D3d and click “Ok.” You should now see the point group indicated in the bottom right of the window.

Now we need to grab the XYZ coordinates for this molecule. Go to “Calculate” → “Gaussian Calculation Setup” and click the “Preview” tab. You should see the 8 atoms listed with their corresponding coordinates. Highlight these lines and copy them. Open up Notepad++ (or just Notepad) and paste the coordinates. Leave Notepad open. We will come back to these.

Next, we will build eclipsed ethane. We can simply do a modification to the staggered ethane we have already built! All we need to do is manually set a dihedral angle. However, we must first remove the symmetry constraint of D3d. Go to “Tools” → “Point Group” and change the “Constrain to subgroup” selection to something of lower symmetry (e.g. D3, C1, etc.). Click “Ok.”

Next, go to “Builder” → “Modify Dihedral.” In the builder window left-click a hydrogen atom, the adjoining carbon atom, the next adjoining carbon atom, and finally an adjoining hydrogen atom. In the pop-up window, set the angle to be 0. You can type the value in the box. Click “Ok.”

Now our molecule looks like an eclipsed ethane! Before we grab the coordinates, let us set the point group once again but this time to D3v. Do this now.

Once the symmetry is set, obtain the XYZ coordinates once again and paste these in an open Notepad window.


Create Input File

Let us create our first input file at the HF/aDZ level of theory for the staggered ethane molecule. Navigate to the appropriate directory.

cd ~/chem/ethane/hf/adz/staggered

We will use vim as our text editor. Open vim using and provide the filename 1.inp. We will use this name for all of our input files.

vim 1.inp

We are now in vim and we see a blank document. vim has two main modes: C is command mode and I is insert mode. When loading vim, we are in command mode. Command mode is the mode to use commands such as Write and Quit. To begin typing, we must enter insert mode. To do so, tap the i key on the keyboard.

We are going to create an input file for Gaussian 09, the quantum software package that we will be using. Our input file should look like the following:

%mem=3800MB
%chk=1.chk
%nprocs=4
#P HF/aug-cc-pVDZ opt freq

staggered ethane

0,1
C         0.000000000000      0.000000000000      0.770000000000
H        -1.020691883699      0.000000000000      1.169020000000
H         0.510345941849     -0.883945100720      1.169020000000
H         0.510345941849      0.883945100720      1.169020000000
C         0.000000000000      0.000000000000     -0.770000000000
H         1.020691883699      0.000000000000     -1.169020000000
H        -0.510345941849     -0.883945100720     -1.169020000000
H        -0.510345941849      0.883945100720     -1.169020000000

You can copy and paste this into vim. Depending on your system setup, you can paste into vim by using the right-mouse button or the middle-mouse button. Be sure that the file is terminated with a blank line. You can hit ENTER on the keyboard.

Now we are ready to save the file and quit the program. To do this, enter command mode by hitting the ESC key on the keyboard a couple of times. Now type the following command and hit ENTER.

:wq

This tells vim to write (e.g. save) and quit. If you wanted to quit vim without saving anything, type :q! to force-quit.

You can see your new file in the folder by typing

ls


PBS File

We will need a submit script (called a PBS file) to submit our job to the queue. I happen to have one available that you can copy over! Type the following command.

cp ~r1315/g09.pbs .

This command copies (cp) the g09.pbs file from my HOME directory to your current location (.).

Confirm that the file is present.

ls

Open the PBS file by typing

vim g09.pbs

Notice at the very top the following lines.

#PBS -l nodes=1:ppn=4
#PBS -l mem=4000mb
#PBS -W umask=022

We must make the PBS file reflect what we are telling Gaussian 09 to use resource-wise. Our input file calls for 3800MB of ram and 4 processors (nprocs=4). Our PBS file is already configured correctly. Note that your PBS file should always call a little more memory than what your input file requests.


Submit the Job

We are now ready to submit the job to the queue. To do so, type the following command and hit ENTER.

qsub g09.pbs

You can check on the status of your job by typing

qstat

or, to filter out jobs submitted by you, type

qstat -u <username>

where is your username!


Evaluate the Output File

Open the output file (called 1.out) with vim. Ensure the job finished. Jump to the very bottom of the file by typing SHIFT+G on the keyboard. You should see the typical “quote” printed by Gaussian and a message saying “Normal termination.” If you see it, that’s great! The job finished.

We should now verify a couple things about the job to make sure it did what we wanted! Close the file with :q.

Check the Point Group

Remember how we enforced a point group on our geometry? We should make sure that the structure optimized with the proper point group.

Let’s check an eclipsed geometry. Type the following and hit ENTER

grep "Full point group" 1.out | tail -1

This command finds the string “Full point group” in the output file and prints the very last string that is found. If it shows “D3H,” it means that the final step of the optimization had the geometry in the correct symmetry!

Check your staggered geometries and ensure that the point group was D3D.

Check the Stationary Point

We need to make sure the geometry optimized to an appropriate stationary point on the potential energy surface. A staggered ethane molecule should be a true minumum with zero imaginary modes of vibration whereas the eclipsed geometry should be a transition state with exactly one imaginary mode of vibration. Since we ran a “frequency” job using the freq keyword, we can check this.

Run the following command

grep Frequencies 1.out

You will see three columns of frequencies ordered from smallest to largest. An imaginary mode is printed as a “negative number” by Gaussian. For a staggered geometry, all the frequencies should be a positive number whereas for our eclipsed geometry, we should see exactly 1 imaginary frequency (or 1 “negative” frequency).

Grab the Final Energy

If everything checks out up to this point, we can now grab the energy of the structure. For any HF or DFT job, we can use the following command.

grep "SCF Done:" 1.out | tail -1

For MP2 jobs, use

grep EUMP2 1.out | tail -1 | awk '{print $4, $NF}

The MP2 energy is given as scientific notation.

You will see an energy printed to the screen. The energy is an absolute energy and is given in units of Hartrees. We should compile these energies into an Excel file. Be sure to note the directory path, structure, and level of theory.

To get the directory path, type the following in the terminal and hit ENTER.

pwd

Your Excel file should have the form

<PATH> <STRUCTURE> <METHOD> <BASIS> <NUM IMG MODES> <ENERGY>

References

Citation

For attribution, please cite this work as

Dornshuld (2022, March 5). Computational Chemistry Workflow. Retrieved from https://dornshuld.chemistry.msstate.edu/notes/comp-chem/first-workshop/index.html

BibTeX citation

@misc{dornshuld2022computational,
  author = {Dornshuld, Eric Van},
  title = {Computational Chemistry Workflow},
  url = {https://dornshuld.chemistry.msstate.edu/notes/comp-chem/first-workshop/index.html},
  year = {2022}
}