Crystal structure prediction

Crystal structure prediction

Predicting the structure of a three-dimensional molecular crystal is an ongoing challenge, particularly for organic materials with several known polymorphs.   This is a problem that impacts fields ranging broadly from pharmaceuticals to organic semiconductors to high-energy materials, to name just a few.  The properties of these crystals can be depend sensitively on the particular polymorph into which the molecules arrange.  This problem can sometimes have unanticipated effects, particularly when the existence of polymorphs is undetected at the time a particular crystalline product is put on the market.  A dramatic example is the drug Ritonavir, an HIV protease inhibitor used in AIDS therapies, which was introduced in 1996.  At the time it became publicly available, only one crystal structure was known, and this structure was water soluble and, therefore, bioavailable.  It was soon discovered, however, that large lots of the drug, left alone on the shelf, were failing the dissolution test, causing them to lose their efficacy.   The reason for this appeared to be the spontaneous transformation of the crystals to a hitherto unknown and more stable polymorph caused by a small seed of this new structure in the crystalline samples.  The two structures are now known as form I and form II, form II being the more stable of the two.  This transformation resulted in a recall of the drug from the market in 1998, and it did not appear again, in a new formulation that resisted this transformation, until 2002.  It turns out that drugs as common as ranitidine hydrochloride (Zantac®) are polymorphic, with two forms both being bioavailable, which is why generic formulations of the drugs are available.  Aspirin is also another common drug that possesses polymorphs, and it is thought that more polymorphs are yet to be discovered.

Discovery of polymorphs experimentally is time consuming, difficult, and in many cases, expensive, and for this reason, computational methods can potentially play a major role in searching out and ranking polymorphs of molecular crystals.   The importance of crystal structure prediction is clearly recognized in the theoretical and computational chemistry communities through blind structure prediction competitions such as that run by the Cambridge Crystal Databank Centre (CCDC).  Recently, the CCDC ran the sixth such competition, which we entered for the first time. In this competition, a set of target molecules is issued to the competitors who are only given the chemical diagrams of these molecules.  They are given one year to predict the crystal structures of these molecules with no knowledge of what these crystal structures are.   Using a combination of quantum chemistry, statistical mechanics, crystal packing, force-field generation, and high-performance computing techniques, we predicted the crystal structure of one of the targets (Target XXII).   The general process is illustrated in the accompanying graphic.

The protocol consists of the following steps:

  1. Optimize the geometry of the target in the gas phase using quantum chemistry.
  2. Use the structure obtained in 1 to generate a large database of pair configurations of the molecule for use in generating a pairwise force field.
  3. Use the database generated in 2 to perform symmetry-adapted perturbation theory calculations based on density function theory in order to create a molecule-specific pairwise force field.
  4. Use the force field generated in step 3 to create and optimize a set of candidate structures by packing the molecules into different crystals according to the symmetry operations of a pre-selected set of space groups.
  5. Cluster the structures generated in 4 to create a set of unique structures and then select all those that have energies less than or equal to 10 kJ/mol of the lowest energy structure.
  6. Take the structures generated in step 5 and run fully flexible cell isothermal-isobaric molecular dynamics simulations at the experimental pressure and temperature in order to eliminate false minima on the potential energy surface.
  7. Of those structure in step 6 that survive, perform an average over the molecular dynamics trajectory in order to obtain a properly ensemble-averaged structure.
  8. Check the free-energy stability of the structures generated in step 7 by subjecting them to free-energy based enhanced sampling techniques. The structures that survive this step are then submitted as the prediction to the competition.

The results are published in the report of the 6th blind structure prediction test, Acta Cryst. B72, 439 (2016). Of the 21 groups that attempted to predict the crystal structure of this molecule, 12 successfully predicted it, and only 7 groups ranked it within their top 5 candidates. Our ranking of the structure was 4 on the list, however, the RMSD20 value we obtained at 150 K was 0.14 Å, which is the closest to the experimental structure of all the submitted structure.

Free-energy based crystal structure prediction: The above protocol is able to yield candidate structures that obey the symmetry rules of a pre-selected set of space groups. However, it cannot find rare space groups, and it does not yield a free-energy ranking directly. For this reason, we are working to develop free-energy based crystal structure prediction techniques that deliver both structures and fully anharmonic free energies directly. We are currently pursuing numerous avenues in this direction, many of which are only in the initial stages of development, one of these in which we have had some success stories targets a set of collective variables (CVs) for enhanced conformational sampling. Collective variables are thermodynamic variables or functions of the atomic coordinates that are able to distinguish different structures. The free energy is then naturally expressed as a function of these. The accompanying figure sketches out some possible collective variables that are useful for crystals.

The variables illustrated in the figure describe the overall shape of the supercell (and consequently, the unit cell), the way atoms and molecules pack in the crystal, and molecular orientation.  For flexible molecules, these could be supplemented with variables that describe the internal conformational changes.   Once selected, sampling of the CVs is achieved using a temperature-accelerated adiabatic free-energy dynamics approach [see Yu and Tuckerman Phys. Rev. Lett. 107, 015701 (2011) and Yu et al. J.Chem. Phys. 140, 214109 (2014)].

As examples of this approach, we have studied benzene and naphthalene crystals at high pressures using a standard force field.  For both of these systems, only the cell-matrix was used as the target CVs.  An example trajectory for benzene at 2 GPa is shown in the accompanying figure.

We see a number of structural transitions on the free energy landscape on the left.  On the right, a plot of the population of different structures in a series of trajectories and the accompanying free energy (green line) is shown. This is shown together with the lattice energy (red line), which shows the importance of free energy in ranking different structures.  Interestingly, we see a structure arising in the simulation that cannot be characterized by a particular space group.   This is a mixed structure, something that cannot be found when using standard crystal packing techniques.  We see that this mixed structure is a relatively low free energy structure, a fact that appears to resolve a controversy regarding the most likely crystal structure of benzene at a pressure of 2 GPa.

Further simulations with a slightly different force field yielded new high-pressure versions of forms I and V (called I’ and V’) of benzene, and the relative free energies of these forms as a function of pressure was studied [see Schneider et al. Acta Cryst. B72, 542 (2016)].   In the accompanying figure, a trajectory showing the transition between these structures (finally ending in benzene II) and their associated free energy as a function of pressure.

For naphthalene crystal, the appearance of additional small peaks in the Raman spectrum at 3050 cm-1 at pressures of 0.2 GPa and persisting through ~ 5 GPa have suggesting the possibility of a new high-pressure polymorph. However, attempts to find high-pressure polymorphs of naphthalene have, thus far, been unsuccessful, apart from one report by Block et al. Science (1970). Simulations using our approach show that, at high pressure, the stable experimental structure starts to develop mixed domains in which localized regions of the stable P21/c packing motif appear rotated relative to the rest of the crystalline lattice, as illustrated in the accompanying figure. Structures of this type appear to be as stable as the experimental structure at a wide range of pressures investigated. Mixed domain structures of this type could lead to the small observed changes in the Raman spectrum.

Microscopic mechanism of equilibrium homogeneous melting of an atomic solid:  An important class of problems to which the adiabatic free energy dynamics approach can be applied is to the melting transition,  an example of a first-order phase transition.  Using a combination of the diagonal elements of the cell matrix and Steinhardt order parameters Q4 and Q6, we investigated the free energy landscape of the melting of a sample of solid copper using an embedded-atom potential model [see Samanta et al. Science 346, 729 (2014)].   Recall that classical nucleation theory would describe this process as the growth of a liquid nucleus in the solid which expands from the inside out until a critical nucleus size is reached, at which point the transition from solid to liquid is a downhill process in free energy.   This is expressed via the free energy relation as a function of the radius r of the nucleus:

where Δμ < 0 is the chemical potential difference between the two phases at the melting point, and γs is the surface tension of the liquid.   This relation expresses the free energy balance favoring the formation of a nucleus of volume (4/3)πr3 and the cost to form the interface between a liquid surface of area 4πr2 with the surrounding solid. 

The accompanying figure shows the free energy surface, projected onto the Q4-volume plane.  This figure shows that the free energy surface is rich in detail.  In particular, it shows several basins describing the pure solid (no defects), a basin in which a low density of small defects are formed at the melting point, a basin in which the defects migrate to form defect clusters, the formation of a liquid embryo, and finally, a deep basin describing the pure liquid.  Snapshots of the types of defects that can form, specifically vacancies and interstitials, their ability to form the small clusters, and their occasional tendency to annihilate, are also shown with the free energy surface.  When the free energy surface is rendered into a one-dimensional profile using the string method, we then see where along the melting pathway these basins and local barriers occur.  Overall, the picture of a low density of highly mobile defects in the solid as driving the formation of liquid embryos and ultimately causing the melting transition stands in sharp contrast to the picture provided by classical nucleation theory.