02.02 npj: Reliable and explainable machine-learning methods accelerate materials discovery

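In applied materials informatics, building machine-learning solutions that are both reliable and explainable remains a challenge, and this study takes a first step toward addressing it. The team, co-led by Bhavya Kailkhura and T. Yong-Jin Han of Lawrence Livermore National Laboratory (USA), makes two main contributions. First, they identify pitfalls in the training, testing, and uncertainty-quantification steps of existing materials informatics pipelines when models are trained on underrepresented and imbalanced data. Their findings raise serious concerns about the reliability of those pipelines. Second, to overcome these challenges, they propose a general-purpose, explainable, and reliable machine-learning method for learning from underrepresented and imbalanced data.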



The paper was recently published in npj Computational Materials 5: 108 (2019). The English title and abstract are given below; the PDF is freely available at https://www.nature.com/articles/s41524-019-0248-2.

Reliable and explainable machine-learning methods for accelerated material discovery

Bhavya Kailkhura, Brian Gallagher, Sookyung Kim, Anna Hiszpanski & T. Yong-Jin Han

Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: 1) predicting properties of crystalline compounds and 2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.
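The abstract's point that standard quality measures can mislead on imbalanced data can be illustrated with a toy example. The sketch below is not the paper's evaluation metric or pipeline; the labels, the class ratio, and the helper functions are all hypothetical and chosen only for illustration. It compares plain accuracy with per-class recall for a trivial classifier that always predicts the majority class.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Recall computed separately for each class that appears in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Hypothetical toy data (an assumption, not taken from the paper):
# 95 "unstable" compounds and only 5 "stable" ones.
y_true = ["unstable"] * 95 + ["stable"] * 5

# A trivial model that always predicts the majority class.
y_pred = ["unstable"] * 100

overall = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)

print(f"accuracy          = {overall:.2f}")
print(f"recall (unstable) = {recalls['unstable']:.2f}")
print(f"recall (stable)   = {recalls['stable']:.2f}")
print(f"balanced accuracy = {balanced:.2f}")
```

In this toy case the overall accuracy looks high even though the model never finds a single minority-class ("stable") example, which is the kind of misleading conclusion the abstract warns about. Reporting per-class figures is one common way to expose this; the paper proposes its own evaluation metric and trust score, which this sketch does not attempt to reproduce.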
