Abstract

Objective: To build naïve Bayesian prediction models for PI3K inhibitors that can be used to predict the activity of new PI3K inhibitors and to perform virtual drug screening. Methods: A total of 6 175 inhibitors and non-inhibitors of PI3Kα, PI3Kβ, PI3Kγ, and PI3Kδ, covering multiple scaffolds, were first collected. Using the naïve Bayesian machine learning method, 112 classification models were then built from molecular fingerprints. Results: The optimal models were identified through a systematic comparison of the prediction results of all established models. The accuracy (Q) of the best models for PI3Kα, PI3Kβ, PI3Kγ, and PI3Kδ on the corresponding test sets was 0.774, 0.804, 0.816, and 0.673, respectively. Moreover, the AUC of the best model for each target on its test set was higher than 0.82. Conclusion: These optimal models can be used to predict and virtually screen new PI3K inhibitors and to construct target-focused enrichment libraries for PI3K. In addition, the favorable and unfavorable fragments identified by the best Bayesian classifiers will be helpful for lead optimization and the design of new PI3K inhibitors.
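
To make the workflow in the Methods and Results concrete, the following is a minimal sketch of a naïve Bayesian classifier over binary fingerprint bits, together with the two reported metrics (accuracy Q and AUC). It assumes a Bernoulli naïve Bayes with Laplace smoothing, which may differ from the exact Bayesian variant used in the study. The function names and the four-bit toy fingerprints are hypothetical and are not taken from the data set described above.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary fingerprint vectors.

    Assumes both classes (1 = inhibitor, 0 = non-inhibitor) occur in y.
    """
    n_bits = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        # Laplace-smoothed probability that each bit is set within this class
        p_bit = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_bits)]
        model[label] = (math.log(len(rows) / len(X)), p_bit)
    return model


def score(model, x):
    """Log-odds that fingerprint x belongs to class 1 rather than class 0."""
    def log_joint(label):
        log_prior, p_bit = model[label]
        return log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(x, p_bit))
    return log_joint(1) - log_joint(0)


def accuracy(y_true, scores, threshold=0.0):
    """Fraction of compounds classified correctly at the given log-odds cutoff."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: each row is a 4-bit fingerprint, label 1 = inhibitor.
X_train = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],
           [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 1, 1]]
y_test = [1, 0, 1, 0]

model = train_bernoulli_nb(X_train, y_train)
test_scores = [score(model, x) for x in X_test]
acc = accuracy(y_test, test_scores)
roc_auc = auc(y_test, test_scores)
print("Q =", acc, "AUC =", roc_auc)
```

In a sketch like this, bits that are much more likely to be set in the inhibitor class than in the non-inhibitor class play the role of favorable fragments, and bits with the opposite pattern play the role of unfavorable ones.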