Fig. 2 | Biology Direct

Fig. 2

From: Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

Phase I training and test results of our three submitted classifiers. Using the training data, we evaluated and attempted to optimize 7 classification algorithms as well as a soft-voting-based classifier. Based on this analysis, we selected three approaches: soft voting (Ensemble), a Logistic Regression classifier (logReg), and a Random Forests classifier (RF). After evaluating these predictions, the CAMDA Challenge organizers provided class labels for the test set. These graphs illustrate the performance of the classifiers on the training and test sets during Phase I. (a) In some cases, the classifiers outperformed baseline accuracy (red lines), which reflects the predictive performance when classifying all cell lines as the majority class. However, the classifiers performed only marginally better, and sometimes worse, than the baseline. (b, c) Sensitivity increased and specificity decreased for the test-set predictions relative to the training-set predictions; this reflects different levels of class imbalance between the training and test sets. (d) On the training set, the Matthews Correlation Coefficient (MCC) was sometimes better than expected by random chance, but it was always worse on the test set.
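The four quantities plotted in the panels (accuracy against a majority-class baseline, sensitivity, specificity, and MCC) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code; the function names and the toy labels are hypothetical and do not come from the CAMDA data.

```python
from collections import Counter
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and MCC from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: fraction of true positives that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of true negatives that were recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is conventionally set to 0 when the denominator is 0.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by predicting the most common class for every sample."""
    return max(Counter(y_true).values()) / len(y_true)


# Toy, made-up labels (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

scores = binary_metrics(y_true, y_pred)
baseline = majority_baseline_accuracy(y_true)
all_majority = binary_metrics(y_true, [0] * len(y_true))
```

On these toy labels the classifier's accuracy (0.8) exceeds the majority-class baseline (0.7). Predicting the majority class for every sample reaches the baseline accuracy but has an MCC of 0 under the convention above, which is why MCC is a useful complement to accuracy when classes are imbalanced.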
