A primer on computational modelling for chemical R&D today
Nicholas B. Tito
21 March 2023
Highlights in this article
Physics- and chemistry-based modelling is becoming more accurate, more versatile, cheaper, and more accessible for chemical/material R&D. But it's not replacing wet-lab R&D any time soon.
Molecular-scale modelling, capturing the behaviour of thousands to millions of molecules, has high predictive potential for this industry sector. Available methodologies strike a good balance between input demand, scientific accuracy, and compute cost.
Specialized expertise is needed to build, choose, and/or deploy a computational tool for a product pipeline. Consultancy helps here.
In-house modelling teams are valuable for long-term continuous integration of computational tech, and building custom solutions tailored to your specific R&D needs.
Tools with user experiences designed for non-experts (in modelling) have the best chance of broad adoption, as they most easily integrate into your workflow as "another tool in the toolbox".
Today, this kind of modelling fits most naturally in the “Research” part of R&D. It’s harder to deploy in the “Development” part.
Introduction
Computational modelling is the use of computers to simulate the outcome of real-world things within a virtual world or sandbox. "You" (as the builder) choose the ingredients, properties, rules, and/or laws of the virtual world, and the computer then carries out the simulation to reveal the consequences of your choices. The simulation can run unattended, or it can be interactive to any degree, like a video game (think SimCity, Age of Empires, RollerCoaster Tycoon, etc.).
In this article, we focus on the realm of physics- and chemistry-based modelling, wherein models are built to contain fundamental features, laws, and interactions found in nature. These models are powerful not only for their predictive capabilities, but for offering deep physical and chemical insight into why the model outcome occurs.
The academic world is full of predictive modelling tools for polymers, liquids, formulations, surfactants, and dispersions. It is a bleeding-edge area of research, particularly as compute power grows rapidly.
Harnessing computational modelling for commercial R&D is much more challenging. It requires technical expertise, careful interpretation of output, and selective use in order to benefit these industries in a measurable way. Modern materials and formulations are exceptionally complex at the chemical level, and that complexity carries over into modelling them.
How are models designed and chosen?
In physics- and chemistry-based modelling, compute power is not yet sufficient to model entire systems at the macro-scale all the way from “first principles” (e.g. the atomistic / quantum scale). Thus, computational modelling is done by bringing together a choice of chemical and physical laws, interactions, forces, conditions, etc. in a way that gives useful predictions, while still being computationally feasible and timely for the purpose at hand.
Designing or selecting a model also requires identifying the length and time scale that contains the “most important” physical/chemical processes for the functionality of the system.
Typical length scales of relevance, and associated computational methods, in chemical / material R&D are:
Macro
Millimeters to meters. Our scale of existence: beakers with mixtures on a lab bench, fluid motion through laboratory / industrial apparatus, centrifugation, nozzle flow / deposition.
Typical Methods: computational fluid dynamics (CFD), classical mechanics / dynamics, finite-element modelling (FEM)
Meso
Micrometer. The “intermediate” realm connecting molecular-scale to macro scale: multi-fluid phase separation, droplets / morphology, emulsification (by surfactants), colloids / suspensions, flocculation, particle jamming
Typical Methods: lattice Boltzmann model (LBM), smoothed particle hydrodynamics (SPH), field-theoretic modelling
Molecular (Micro)
Nanometers to micrometers. The molecular realm: solvation, self-assembly, interface formation, surfactancy, nanoparticles / nanocomposites, macromolecules / polymer crosslinking & entanglement, glassification / phase behaviour.
Typical Methods: molecular dynamics (MD), Monte Carlo (MC), statistical thermodynamics, coarse-grained field models
Atomistic
Angstrom. Where atoms interact via quantum mechanics / dynamics: chemical bonding, ionic interactions, hydrogen bonding, dipole/van der Waals interactions, adsorption / solvation, solids / alloy atomistic interactions.
Typical Methods: molecular mechanics (MM), density functional theory (DFT), quantum calculations (e.g. Hartree-Fock)
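To make the molecular-scale methods above concrete, here is a toy sketch of a Metropolis Monte Carlo simulation: a handful of Lennard-Jones particles in a periodic one-dimensional box. All numbers (particle count, box size, temperature, step size) are arbitrary reduced-unit choices for illustration only; a real study would use a dedicated simulation package, not a script like this.

```python
import math
import random

# Toy Metropolis Monte Carlo for a few Lennard-Jones particles in a
# periodic 1D box. A minimal illustration of molecular-scale methods,
# not a production simulation. All units are reduced (epsilon = sigma = 1).

random.seed(0)

N = 8       # number of particles (illustrative)
L = 10.0    # box length
BETA = 1.0  # 1 / (kB * T)
STEP = 0.3  # maximum trial displacement


def lj(r):
    """Lennard-Jones pair energy in reduced units."""
    r6 = (1.0 / r) ** 6
    return 4.0 * (r6 * r6 - r6)


def pair_energy(pos, i):
    """Energy of particle i with all others, minimum-image convention."""
    e = 0.0
    for j in range(N):
        if j == i:
            continue
        dr = pos[i] - pos[j]
        dr -= L * round(dr / L)  # periodic boundary (minimum image)
        e += lj(abs(dr))
    return e


def run(n_sweeps=2000):
    pos = [L * i / N for i in range(N)]  # start on a regular lattice
    accepted = 0
    trials = 0
    for _ in range(n_sweeps):
        for i in range(N):
            trials += 1
            old_e = pair_energy(pos, i)
            old_x = pos[i]
            pos[i] = (old_x + random.uniform(-STEP, STEP)) % L
            new_e = pair_energy(pos, i)
            # Metropolis criterion: always accept downhill moves; accept
            # uphill moves with probability exp(-beta * dE).
            if new_e > old_e and random.random() >= math.exp(-BETA * (new_e - old_e)):
                pos[i] = old_x  # reject: restore old position
            else:
                accepted += 1
    return pos, accepted / trials


positions, acceptance = run(500)
print(f"acceptance ratio: {acceptance:.2f}")
```

Even a toy like this shows the trade-off discussed above: the rules are simple pairwise physics, but collective behaviour (clustering, phase behaviour) emerges from the sampling rather than being put in by hand.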
In any of these scales, custom or combined models can also be created as needed that fall outside of the standard approaches. These models can be broadly divided into analytical (“back-of-the-envelope”) models, and numerical models. Examples include polymer scaling theories, chemical kinetics / reaction rate models, structure/property constitutive models (for e.g. elasticity), and mechanical models.
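The analytical-versus-numerical divide can be shown with the simplest possible chemical kinetics example: a first-order reaction A → B. The "back-of-the-envelope" model is the closed-form solution c(t) = c0·exp(-kt); the numerical model integrates the same rate equation step by step. The rate constant and initial concentration below are illustrative values, not data for any real reaction.

```python
import math

# First-order reaction A -> B, solved two ways: analytically
# ("back-of-the-envelope") and numerically (explicit Euler).
# k and c0 are illustrative values.

k = 0.5   # rate constant (1/s), assumed
c0 = 1.0  # initial concentration of A (mol/L), assumed


def analytical(t):
    """Closed-form solution: c(t) = c0 * exp(-k t)."""
    return c0 * math.exp(-k * t)


def numerical(t_end, dt=1e-4):
    """Explicit Euler integration of dc/dt = -k c."""
    c = c0
    t = 0.0
    while t < t_end:
        c += dt * (-k * c)
        t += dt
    return c


print(analytical(2.0))  # ~0.3679
print(numerical(2.0))   # close to the analytical value
```

For this toy problem the analytical route wins outright; numerical models earn their keep when the rate equations are coupled, nonlinear, or have no closed-form solution.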
The predictive power of a computational model is highest when it requires only minimal, basic chemical input while yielding extensive, non-obvious, and useful output. This is often achieved by modelling at a small length scale, so as to capture as much of the interaction complexity between small-scale building blocks as possible, and then using mathematical machinery (statistical thermodynamics) to predict macroscopic properties (e.g. shear viscosity, melting / boiling points, solubility, glass transition temperature Tg, interfacial tension).
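A minimal sketch of that statistical-thermodynamic machinery: given a set of microscopic energy levels (hypothetical values here, in reduced units of kB·T), the Boltzmann distribution converts them into a macroscopic observable, in this case the average energy.

```python
import math

# Statistical thermodynamics in miniature: the Boltzmann distribution
# turns microscopic energy levels into a macroscopic average.
# The energy levels are hypothetical, in reduced units of kB*T.


def boltzmann_average(energies, beta):
    """Average energy <E> = sum_i E_i exp(-beta E_i) / Z."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # partition function
    return sum(e * w for e, w in zip(energies, weights)) / z


levels = [0.0, 1.0, 2.0]  # hypothetical energy levels
print(boltzmann_average(levels, beta=1.0))
```

The same averaging idea, scaled up to millions of molecular configurations, is what lets a molecular-scale simulation report quantities like viscosity or interfacial tension.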
Modelling at the molecular scale in the table above is so common in academia and industry because it offers high predictive power, while still allowing for computational tricks that keep compute time reasonable. In modern formulations and materials, the molecular scale is also often where salient emergent behaviour is happening: where molecular building blocks are coming together to form larger-scale structures, themselves being the primary entities that dictate the properties and functionality of the formulation.
The title of this section was “How are models designed and chosen?” The answer is: by an expert. Specialized knowledge and experience in the realm of computational modelling is required to properly
identify the length and time scale(s) of necessary focus for the system under study;
choose which chemical and physical processes to capture in the model, so as to balance faithfulness of the model representation (output accuracy/realism) with compute time and cost;
choose which modelling method is best suited for a particular application, in order to have the best chance of successful use.
How are models commercialized? Do you need an in-house team?
There are a growing number of scientific software companies that provide out-of-the-box platforms for computational modelling. These range from intensive simulation solutions utilising cloud or on-site compute resources, to apps that work right on your desktop/laptop.
Traditionally, while these platforms have powerful user interfaces, they still require intermediate or deep expertise in the underlying modelling technology in order to use them. The learning curve is often steep.
Today, there are inspiring new efforts to broaden adoption of this technology, by fostering user experiences tailored to "general" (non-expert) users. Some of the ways that this is being done are:
Designing user interfaces that only require standard ambient or ingredient (chemical) input. For example: pressure, temperature, pH, etc., and for ingredients, things like molar mass, mass density, viscosity (if a liquid), Hansen solubility parameters, or a SMILES code. From this input alone, the platform must autonomously derive all model-specific parameters (e.g. interaction potentials, bond force constants, etc.) and numerics/physics (choice of models/representations, numerical stability criteria, time-step choices, etc.). This is challenging but feasible.
Placing the platform fully on the web, so that no installation or configuration is required on a local computer.
Outsourcing to cloud providers (e.g. Amazon Web Services, Google Cloud, Oracle) for on-demand compute resources, without requiring any user expertise or intervention on how to utilize in-house compute resources (if they're even available).
Providing scientific consultancy services along with the platform, to aid in integrating the platform within a specific R&D program.
Developing computational strategies that link length and time scales together at the modelling level, to yield "multi-scale" modelling solutions (removing the need to choose these explicitly).
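The Hansen solubility parameters mentioned in the first point above are a good example of simple ingredient-level input: three tabulated numbers per substance yield a quick compatibility estimate via the standard Hansen distance formula, Ra² = 4(δD1-δD2)² + (δP1-δP2)² + (δH1-δH2)². The solvent values below are approximate literature figures used purely for illustration; a real screening tool would rely on a validated parameter set.

```python
import math

# Hansen solubility distance Ra: a quick compatibility estimate from
# three tabulated parameters per substance (dispersion, polar,
# hydrogen-bonding), each in MPa^0.5. Solvent values are approximate
# literature figures, for illustration only.


def hansen_distance(hsp_a, hsp_b):
    """Ra from the standard Hansen formula:
    Ra^2 = 4(dD1-dD2)^2 + (dP1-dP2)^2 + (dH1-dH2)^2
    """
    dd = hsp_a[0] - hsp_b[0]
    dp = hsp_a[1] - hsp_b[1]
    dh = hsp_a[2] - hsp_b[2]
    return math.sqrt(4.0 * dd * dd + dp * dp + dh * dh)


ethanol = (15.8, 8.8, 19.4)   # approximate literature values
acetone = (15.5, 10.4, 7.0)
water = (15.5, 16.0, 42.3)

# Smaller Ra suggests more similar solvation behaviour.
print(hansen_distance(ethanol, acetone))
print(hansen_distance(ethanol, water))
```

A platform aimed at non-experts can hide even this much arithmetic: the user supplies or selects an ingredient, and the tool looks up or estimates the parameters behind the scenes.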
Nevertheless, much of computational modelling in industrial R&D is still carried out by dedicated in-house teams. These teams, whether small or large, can leverage commercial, open-source, and home-brewed computational approaches in a way that tailors to the need at hand. The team can also evolve in step with the product pipelines in the company.
Where does computational modelling sit in R&D?
The potential for computational modelling is ever-evolving, based on progress in three areas:
Research: new method development, faster methods, better capturing of physical and chemical interactions, hybrid/pure AI approaches
Speed and cost of compute power: CPU/GPU architectures, parallelization & interconnect, chip process size, clock speed, memory size, thermal efficiency
Adoption in chemical and materials industries: mindset change, commercial value, correspondence between application area and suitability of existing computational methods, scalability
Among these, Adoption has tended to be the slowest so far. Why?
It is challenging to alter an R&D process, which already seems to work, by introducing new tools and technologies. R&D team members habituated to laboratory protocol are hesitant to embrace new methods, particularly ones that fall well outside their comfort zone.
Computational modelling also hasn’t gained a firm reputation yet. It “looks like an academic tool” (to practicing chemists), requires a different swath of expertise compared to hands-on chemistry/engineering, and in many cases still doesn’t give guidance that benefits commercial product development (i.e. it’s not “accurate enough”, “fast enough”, “easy enough”, “cheap enough”, etc.).
This is changing, due to progress in Research and Compute Speed/Cost. Computational modelling is becoming:
more accurate: larger and more detailed calculations can be done more rapidly and cheaply on current-day compute hardware.
more versatile, with more application potential: broader expanse of methods, introduction of / blending with AI methods
(sometimes) cheaper and faster than an equivalent experimental protocol to obtain information
more accessible: more companies offering commercial platforms that simplify the use of this technology
However, we are not yet at the point where computational modelling can replace the laboratory. Far from it, in fact. A model is just that: a prototype, a simplified representation of the real system. The model is another tool in the toolbox, alongside and augmenting existing laboratory methods.
Today, it remains an ongoing challenge to determine where computational modelling has the best chance of success within a product R&D pipeline. The breakdown below provides some clues.
A typical application-development sector splits product work into Research and Development phases. Both phases have “pain points” that inevitably arise for each product: questions that must be solved in order to make the product functional, robust, commercially viable, affordable, and safe.
In the Research phase, work to address a pain point is exploratory, qualitative, and insight-driven. Computational modelling comes in very naturally in this phase via “use cases”. For example, modelling can help systematically explore an enormous range of parameters for a formulation, providing early guidance on where in that parameter space experimental efforts can be initially directed. This is a form of prototyping which also yields physical understanding of the system, e.g. at the molecular scale. The model illuminates how your chemical design leads to a desired functionality or behaviour in your formulation/material.
Pain points in the Development phase tend to require more targeted, quantitative evidence to address. Work is focused on optimization, and driven by clear data. Modelling fits in here more like traditional laboratory instrumentation, by providing answers to “problem statements”. The demands placed on the model are thus much greater than in the Research phase, because the model must have a high degree of quantitative accuracy. To achieve this accuracy, the model must be tailored more carefully to the specific product and application.