Abstract

We introduce an entropy-based methodology, iterative Shannon entropy (ISE), to quantify the information contained in molecular descriptors and compound selectivity data sets while taking the spread of the data directly into account. The method can be applied to determine the information content of any data distribution that depends on the range of values it covers. To explore alternative binning schemes for the entropy calculation, we analyzed the information content of molecular descriptors. Using this entropic measure, we profiled 153 compound selectivity data sets covering combinations of 68 target proteins from 10 target families. The ISE measure is designed to assign high information content to compound data sets that span a wide range of selectivity values and multiple selectivity relationships, and hence correspond to more than one biological phenotype. We identify target families with high average entropy scores; for members of these families, active compounds display highly differentiated selectivity profiles.
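The abstract does not spell out the iterative ISE procedure, so the sketch below shows only the standard building block it extends: Shannon entropy of values histogrammed into bins. It is a minimal illustration under stated assumptions, not the authors' method. The function name `binned_shannon_entropy` and its parameters `lo`, `hi`, and `n_bins` are hypothetical. Bins are assumed to be equal-width over a fixed, caller-supplied value range, so that a narrowly spread data set falls into few bins and scores low, while a data set spread across the range scores high.

```python
import math
from collections import Counter


def binned_shannon_entropy(values, lo, hi, n_bins):
    """Shannon entropy (in bits) of `values` binned into `n_bins`
    equal-width bins over the fixed range [lo, hi].

    Hypothetical helper for illustration only; this is plain binned
    Shannon entropy, not the iterative ISE procedure.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    width = (hi - lo) / n_bins
    # Map each value to a bin index; values at or beyond the range
    # edges are clamped into the first or last bin.
    counts = Counter(
        min(max(int((v - lo) / width), 0), n_bins - 1) for v in values
    )
    n = len(values)
    # H = -sum_i p_i * log2(p_i) over the occupied bins.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Same bin scheme, different spread of the data:
spread = binned_shannon_entropy([0.5, 1.5, 2.5, 3.5], 0.0, 4.0, 4)  # one value per bin
narrow = binned_shannon_entropy([0.1, 0.2, 0.3, 0.4], 0.0, 4.0, 4)  # all in one bin
print(spread, narrow)
```

Because the bin edges come from a fixed range rather than from each data set's own minimum and maximum, two data sets with the same shape but different spread receive different scores, which is the property the abstract emphasizes. The number and placement of bins strongly affect the result, which is why the choice of binning scheme matters for any entropy calculation of this kind.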

Full text