Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights

Pankajakshan, Praveen; Sanyal, Suchismita; de Noord, Onno E.; Bhattacharya, Indranil; Bhattacharyya, Arnab; Waghmare, Umesh (2017). Machine Learning and Statistical Analysis for Materials Science: Stability and Transferability of Fingerprint Descriptors and Chemical Insights. Chemistry of Materials, 29 (10), pp. 4190-4201. ISSN 0897-4756

Full text not available from this repository.

Official URL: http://doi.org/10.1021/acs.chemmater.6b04229

Abstract

In the paradigm of virtual high-throughput screening for materials, we have developed a semiautomated workflow or "recipe" that helps a materials scientist start from a raw data set of materials with their properties and descriptors, build predictive models, and gain insight into the governing mechanism. We demonstrate our recipe, which employs machine learning tools and statistical analysis, through a case study that identifies descriptors relevant to catalysts for CO2 electroreduction, starting from a published database of 298 catalyst alloys. At the heart of our methodology lies the Bootstrapped Projected Gradient Descent (BoPGD) algorithm, which has significant advantages over commonly used machine learning (ML) and statistical analysis (SA) tools such as the regression coefficient shrinkage-based method (LASSO) or artificial neural networks: (a) it selects descriptors with greater stability and transferability, with the goal of understanding the chemical mechanism rather than merely fitting data, and (b) while being effective for smaller data sets such as the test case, it employs clustering of descriptors to scale far more efficiently, in terms of computational speed, to large descriptor sets. In addition to identifying the descriptors that parametrize the d-band model of catalysts for CO2 reduction, we predict the work function to be an essential and relevant descriptor. Based on this result, we propose a modification of the d-band model that includes the chemical effect of the work function, and show that the resulting predictive model gives the binding energy of CO to the catalyst fairly accurately. Since our scheme is general and particularly efficient in reducing a large set of descriptors to a minimal one, we expect it to be a versatile tool for obtaining chemical insights into complex phenomena and for developing predictive models for the design of materials.
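The abstract names the ingredients of BoPGD (bootstrapping and projected gradient descent) but not its details, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes that the projection is hard thresholding onto k-sparse coefficient vectors, that descriptor stability is measured as the fraction of bootstrap resamples in which a descriptor is selected, and that the data are synthetic; the descriptor-clustering step is omitted. All function names and parameter values here are hypothetical.

```python
import numpy as np

def hard_threshold(w, k):
    """Project w onto vectors with at most k nonzero entries (assumed projection)."""
    out = np.zeros_like(w)
    if k > 0:
        keep = np.argsort(np.abs(w))[-k:]  # indices of the k largest magnitudes
        out[keep] = w[keep]
    return out

def projected_gradient_descent(X, y, k, n_iter=200):
    """Sparse least squares: gradient step, then projection onto k-sparse vectors."""
    n, p = X.shape
    # Step size 1/L, with L the largest eigenvalue of X^T X / n.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = hard_threshold(w - step * grad, k)
    return w

def bootstrap_selection_frequency(X, y, k, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each descriptor is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        w = projected_gradient_descent(X[idx], y[idx], k)
        counts += (w != 0)
    return counts / n_boot

# Synthetic stand-in data: 298 samples, 20 descriptors, 3 of them relevant.
rng = np.random.default_rng(42)
X = rng.standard_normal((298, 20))
true_w = np.zeros(20)
true_w[[0, 3, 7]] = [3.0, -2.0, 1.5]
y = X @ true_w + 0.1 * rng.standard_normal(298)

freq = bootstrap_selection_frequency(X, y, k=3)
```

Descriptors whose selection frequency stays close to 1 across resamples are the stable candidates under this sketch; a descriptor that appears only in a few resamples is more likely an artifact of a particular sample.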

Item Type: Article
Source: Copyright of this article belongs to American Chemical Society.
ID Code: 135819
Deposited On: 18 Aug 2023 11:41
Last Modified: 18 Aug 2023 11:41
