
Predicting Liquid Chromatography Retention Times using Computational Chemistry Data Analytics

Liquid chromatography (LC) is a commonly used analytical technique for determining the individual constituents of a sample.  LC columns separate mixtures based on each component’s affinity for a polar solvent as well as its interaction with the solid phase.  For instance, the polarity of each component governs how strongly it interacts with the mobile solvent, which carries some components through the column faster than others.  Each component is thus separated from the mixture at a different rate, or retention time (RT), and the RTs are plotted on a chromatogram.  See Figure 1 for an example chromatogram, obtained by applying US EPA Method 537[1] to an aqueous sample fortified with a mixture of perfluorinated alkyl acids.  Further analysis of the chromatogram together with mass spectrometry results provides the identity and concentration of the eluted compounds when compared to known calibration standards.

Figure 1. A chromatogram for reagent water fortified with different perfluorinated compounds, obtained using Method 537.

Data analytics allow chemists, materials scientists, and engineers to find an analytic model, derived by statistical regression, which best correlates chemical properties with descriptors representative of molecular structure.  Statistically robust models ultimately lead to predictive models, capable of making accurate and reliable predictions of the behavior of new compounds.  Considering Figure 1, can we predict the RT for a perfluorinated alkyl acid species based on its molecular structure? Predicting a compound’s RT could help us identify its potential presence in a sample even when known calibration standards are not available or if low-resolution mass spectrometry cannot confirm it.  This technique may also be applicable to liquid chromatography/high-resolution mass spectrometry (LC/HRMS) where structural isomers are isobaric, yet their retention times differ due to the molecular interactions within the column.

Development of such a model starts with data and structure collection for known compounds.  Table 1 shows the LC RTs[1] for linear versions of perfluorobutanesulfonic acid (PFBS), perfluorohexanesulfonic acid (PFHxS), and perfluorooctanesulfonic acid (PFOS) along with their molecular structures.

Table 1. Analyte Retention Times (RT) and Corresponding Structure

Formula | Analyte | Peak # in Figure 1 | RT (min) | Structure[2]

Polarity is a static physical property of a molecule.  Polarizability is a measure of how easily a molecule’s electron cloud can be distorted (polarized) by an external electric field (e.g., the dipoles of a polar solvent).  Polarizability is a relatively simple molecular property to predict using computational chemistry techniques.[3]  So, can we use data analytics to find a correlation between the predicted polarizabilities of the perfluorinated sulfonic acids in Table 1 and their RTs?

The relationship between the predicted polarizability of each compound in Table 1 and its RT is shown in Figure 2.  As expected, an obvious relationship exists.  In fact, for this case, the trendline analysis tools in Excel® yield a quadratic function that fits the data very well.

RT (min) = -0.0427*(polarizability)^2 + 6.0336*(polarizability) - 191.5442       (1)
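A trendline fit of this kind can be sketched in a few lines of Python.  This is a minimal illustration, not the Excel® procedure used for the article.  With exactly three data points, a least-squares quadratic passes through every point, so the sketch computes that quadratic directly with Newton divided differences.  The function name `fit_quadratic_3pts` and the polarizability values in `demo_x` are assumptions made for illustration.  The RT values are generated from equation (1) itself, so no measured data is implied.

```python
def fit_quadratic_3pts(xs, ys):
    """Return (a, b, c) of y = a*x**2 + b*x + c through exactly three points."""
    if len(xs) != 3 or len(ys) != 3:
        raise ValueError("exactly three (x, y) pairs are required")
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    if len({x0, x1, x2}) != 3:
        raise ValueError("x values must be distinct")
    # First-order divided differences (slopes between neighboring points)
    f01 = (y1 - y0) / (x1 - x0)
    f12 = (y2 - y1) / (x2 - x1)
    # The second-order divided difference is the quadratic coefficient
    a = (f12 - f01) / (x2 - x0)
    # Expand y0 + f01*(x - x0) + a*(x - x0)*(x - x1) into a*x^2 + b*x + c
    b = f01 - a * (x0 + x1)
    c = y0 - f01 * x0 + a * x0 * x1
    return a, b, c


# Coefficients of equation (1), copied from the text
EQ1 = (-0.0427, 6.0336, -191.5442)


def eq1(polarizability):
    a, b, c = EQ1
    return a * polarizability**2 + b * polarizability + c


# HYPOTHETICAL polarizabilities, chosen only to exercise the fit.
# They are not the computed values behind Table 1.
demo_x = [55.0, 62.0, 69.0]
demo_y = [eq1(p) for p in demo_x]

coeffs = fit_quadratic_3pts(demo_x, demo_y)
print(coeffs)
```

Because three points determine a quadratic exactly, the fit recovers the coefficients of equation (1) to within floating-point rounding.  A data set with more than three points would call for a general least-squares routine instead.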

Using equation (1), the predicted RTs for the perfluorinated compounds in Table 1 can be calculated and compared to the experimentally determined values.  Consistent with the equation’s squared correlation coefficient (R²) of 1.0, the predicted RTs have a mean absolute percentage error of 0.40%.[4]
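The comparison described above can be sketched as follows.  The coefficients come from equation (1).  The function names `predict_rt` and `mape` are illustrative, and the numbers in the usage lines are made-up inputs, not the measured or computed values from Table 1.

```python
def predict_rt(polarizability):
    """Equation (1): predicted retention time in minutes."""
    return -0.0427 * polarizability**2 + 6.0336 * polarizability - 191.5442


def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Made-up values, used only to show the call pattern
example_rt = predict_rt(60.0)
example_error = mape([10.0, 20.0], [10.1, 19.9])
print(f"{example_rt:.2f} min, MAPE {example_error:.2f}%")
```

With the real inputs, `measured` would hold the RTs from Table 1 and `predicted` the values returned by `predict_rt` for the corresponding computed polarizabilities.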

Because the model fits the data so well, we can use it to predict the RT of linear perfluorodecanesulfonic acid (PFDS).  Once the polarizability of PFDS is computed, applying equation (1) yields an RT of 21.1 min.  Unfortunately, looking at Figure 1, the predicted RT for PFDS falls in a well-populated section of the chromatogram.  Notably, perfluorodecanoic acid (PFDA) at 20.5 min (peak # 11), N-ethyl perfluorooctanesulfonamidoacetic acid (NEtFOSAA) at 21.3 min (peak # 15), and N-methyl perfluorooctanesulfonamidoacetic acid (NMeFOSAA) at 22.0 min (peak # 13) elute in this region.  Mass spectrometry helps differentiate the peaks.  Even so, PFDA’s experimental RT of 20.5 min gives some confidence in the prediction of 21.1 min for PFDS.
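One way to check how crowded the chromatogram is around a predicted RT is to list the known peaks that fall within a chosen window.  The sketch below uses the three peaks and the 21.1 min prediction quoted above.  The function name `peaks_near` and the window width are assumptions for illustration, not part of Method 537.

```python
# (analyte, peak number in Figure 1, RT in minutes), as quoted in the text
KNOWN_PEAKS = [
    ("PFDA", 11, 20.5),
    ("NEtFOSAA", 15, 21.3),
    ("NMeFOSAA", 13, 22.0),
]

PREDICTED_PFDS_RT = 21.1  # minutes, from equation (1) as reported in the text


def peaks_near(predicted_rt, peaks, window_min):
    """Return peaks within +/- window_min of predicted_rt, closest first."""
    hits = [
        (name, peak_no, rt, rt - predicted_rt)
        for name, peak_no, rt in peaks
        if abs(rt - predicted_rt) <= window_min
    ]
    return sorted(hits, key=lambda hit: abs(hit[3]))


# window_min = 1.0 is an arbitrary illustrative choice
for name, peak_no, rt, offset in peaks_near(PREDICTED_PFDS_RT, KNOWN_PEAKS, 1.0):
    print(f"{name} (peak # {peak_no}): {rt:.1f} min, offset {offset:+.1f} min")
```

A narrower window shortens the list, which gives a quick sense of which neighboring analytes would need to be resolved by mass spectrometry.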

This example illustrates the powerful idea that a mathematical relationship between a material’s molecular descriptor (polarizability) and its chemical characteristic (RT) can be used to forecast unknown chemical behaviors from calculated properties.

Figure 2. Measured LC RTs versus predicted polarizabilities of the perfluorinated sulfonic acids in Table 1.

Computational chemistry techniques also let us search structural isomer space systematically and generate predictions for an entire library of isomers.  For example, small peaks are observed in the chromatogram in Figure 1 at ~14 min that are attributed to PFHxS isomers.  These isomers are simple branched species based on the parent compound (e.g., perfluoro-1-methyl-pentanesulfonic acid [1-PFHxS]).  Calculating the polarizabilities of these isomers enables us to use equation (1) to predict the corresponding RTs (see Table 2).

Linear PFHxS elutes at 14.5 min (see Table 1).  Considering the RTs in Table 2, we note that all the isomers are predicted to elute around 14.3 min, before PFHxS and in agreement with the chromatogram shown in Figure 1.  It is also interesting that each isomer has a unique predicted RT, which suggests the elution order to expect (i.e., 3-PFHxS before 1-PFHxS before 4-PFHxS before 2-PFHxS).  This information provides additional detail about the makeup of the small isomer peak.  Similar calculations can be performed for PFOS (linear PFOS elutes at 18.8 min; the isomers between 18.5 and 18.6 min).[5]  Since branched standards are not available for a number of PFASs, predictions of their RTs would be helpful in confirming their presence.
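The ordering step can be sketched as below: apply equation (1) to each isomer’s computed polarizability, sort by predicted RT, and compare against the 14.5 min RT of the linear parent.  The isomer labels and polarizability values here are placeholders chosen only to make the example run.  They are not the values behind Table 2 and do not reproduce its elution order.

```python
LINEAR_PFHXS_RT_MIN = 14.5  # measured RT of linear PFHxS, from the text


def predict_rt(polarizability):
    """Equation (1): predicted retention time in minutes."""
    return -0.0427 * polarizability**2 + 6.0336 * polarizability - 191.5442


def rank_isomers(polarizabilities):
    """Return (name, predicted RT) pairs sorted from earliest to latest eluting."""
    predicted = {name: predict_rt(p) for name, p in polarizabilities.items()}
    return sorted(predicted.items(), key=lambda item: item[1])


# PLACEHOLDER inputs: generic labels and invented polarizabilities
placeholder_isomers = {"isomer_A": 57.6, "isomer_B": 57.4, "isomer_C": 57.5}

ranking = rank_isomers(placeholder_isomers)
for name, rt in ranking:
    position = "before" if rt < LINEAR_PFHXS_RT_MIN else "at or after"
    print(f"{name}: {rt:.2f} min ({position} linear PFHxS)")
```

Substituting the computed polarizabilities of the branched PFHxS species for the placeholders would give the predicted order of the isomers.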

Table 2. Predicted Analyte Retention Times (RT) and Corresponding Structure

Formula | Analyte | RT (min) | Structure[2]

Thus, correlation models derived from statistical regression of experimental and computational data help interpret experimental results.  Robust mathematical models can be reliably applied to predict molecular properties or the behavior of new materials for a wide variety of applications, for example, toxicity and thermodynamic reaction energies.  These properties are useful for fate and transport modeling as well.  Technical projects that develop and apply modeling methods in conjunction with experiments reach practical answers faster and more efficiently than projects that rely on experiments alone.

For more information or to speak with a computational science professional, please contact Senior Advisor Joseph T. Golab, Ph.D.

[1] US EPA Method 537 – Determination of Selected Perfluorinated Alkyl Acids in Drinking Water by Solid Phase Extraction and Liquid Chromatography/Tandem Mass Spectrometry

[2] Carbon – grey, Hydrogen – white, Fluorine – aqua, Oxygen – red, Sulfur – yellow

[3] The calculated results presented in this discussion are obtained from SPARTAN’16 v2 (

[4] Normally, more than three data points are necessary to build a robust data analytics model.  This discussion is for illustration purposes.

[5] Contact for more details.