Correlation analysis
The first step is to study the correlation between the chemblScore and the average predicted cytotoxScore for each protein. The figure below shows a typical dashboard designed to find the strongest and most trustworthy correlations. The top-left plot shows the fitted slope against R squared (R²). Points where both the slope and R² exceed 0.5 are marked, and the marked points then appear in the bottom-left plot, which shows the number of events in the database against the number of unique molecules involved in those events. Marking the proteins there with more than 4 events from at least 2 different molecules populates the plot on the right, which shows the correlations for the marked points, trellised by protein name. The CombinedPredictedAndActualCytotoxIndex combines predictions with actual cytotoxScores: each predicted value is replaced by the actual one when it exists.
Proportion of active predictions.
The countRatio view was created so that all target cytotoxicity records can be explored visually. The count ratio is the proportion of active (cytotoxic) events, predicted and actual, out of the total number of events for a target. The plot below shows the count ratio against the total number of events for every target in the database.
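As a minimal sketch of the countRatio definition, assuming each event has already been reduced to a hypothetical `(target, is_active)` pair, where `is_active` is True for a cytotoxic event whether predicted or actual, the ratio is simply the number of active events divided by the total per target:

```python
from collections import Counter


def count_ratios(events):
    """Map each target to (total_events, countRatio).

    `events` is an iterable of (target, is_active) pairs (hypothetical layout);
    is_active is True for a cytotoxic event, predicted or actual.
    """
    totals = Counter()
    actives = Counter()
    for target, is_active in events:
        totals[target] += 1
        if is_active:
            actives[target] += 1
    # A Counter returns 0 for missing keys, so targets with no active events get 0.0.
    return {t: (n, actives[t] / n) for t, n in totals.items()}


demo_events = [("T1", True), ("T1", True), ("T1", False), ("T2", False)]
print(count_ratios(demo_events))  # {'T1': (3, 0.6666666666666666), 'T2': (1, 0.0)}
```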
The same plot, with points marked where countRatio > 0.5 and there are at least two independent events.
The bar charts below show some of the targets most likely to be cytotoxic, sorted either by countRatio or by chemblScore.
The table at the bottom of the page contains the records marked in the chart above.
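The marking and sorting steps can be sketched as a filter followed by a sort. This is a minimal sketch under stated assumptions: `TargetSummary` and `rank_targets` are hypothetical names, and "at least two independent events" is read as at least two events in total. The first four rows are copied from the table below; the last row is invented to show a target that fails the filter.

```python
from typing import NamedTuple


class TargetSummary(NamedTuple):
    name: str
    avg_chembl_score: float
    total_events: int
    count_ratio: float


def rank_targets(targets, sort_by="count_ratio", min_ratio=0.5, min_events=2, top_n=10):
    """Keep targets with countRatio > min_ratio and at least min_events events,
    then return the top_n sorted by `sort_by` ("count_ratio" or "avg_chembl_score")."""
    kept = [t for t in targets
            if t.count_ratio > min_ratio and t.total_events >= min_events]
    return sorted(kept, key=lambda t: getattr(t, sort_by), reverse=True)[:top_n]


targets = [
    TargetSummary("DNA topoisomerase 4", 5.0, 120, 0.66666666666667),
    TargetSummary("Serine/threonine-protein kinase WEE1", 6.1748098243565, 945, 0.54179894179894),
    TargetSummary("Voltage-gated potassium channel subunit Kv1.1", 7.3675969648835, 111, 0.54954954954955),
    TargetSummary("Peptidyl-prolyl cis-trans isomerase FKBP5", 4.8986639239995, 122, 0.72131147540984),
    TargetSummary("Hypothetical target", 6.5, 1, 1.0),  # invented row: only one event
]

for t in rank_targets(targets, top_n=2):
    print(t.name, t.count_ratio)
for t in rank_targets(targets, sort_by="avg_chembl_score", top_n=2):
    print(t.name, t.avg_chembl_score)
```

Sorting by countRatio favours targets where most recorded events are cytotoxic, while sorting by chemblScore favours targets with the highest average activity in ChEMBL, which is why the two bar charts can rank the same targets differently.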
Potentially toxic protein targets.
proteinNameC | AvgChemblScore | AvCyotoxScorePrediction | countOfTotalScores | countRatio |
---|---|---|---|---|
DNA topoisomerase 4 | 5 | 4.7492385956882 | 120 | 0.66666666666667 |
Serine/threonine-protein kinase WEE1 | 6.1748098243565 | 5.0706852926913 | 945 | 0.54179894179894 |
Tyrosine-protein kinase receptor Tie-1 | 5.5540945924737 | 4.9816539434483 | 147 | 0.54421768707483 |
Epithelial discoidin domain-containing receptor 1 | 6.1506030766354 | 5.0617301533049 | 253 | 0.55731225296443 |
Tubulin alpha-1 chain | 5.4750672928932 | 5.1959153629845 | 43 | 0.62790697674419 |
Mitogen-activated protein kinase 7 | 5.3281037474288 | 5.0894734159103 | 190 | 0.54736842105263 |
Receptor-interacting serine/threonine-protein kinase 1 | 5.8951242371124 | 5.2838401346276 | 253 | 0.63241106719368 |
Voltage-gated potassium channel subunit Kv1.1 | 7.3675969648835 | 4.9878035856884 | 111 | 0.54954954954955 |
Peptidyl-prolyl cis-trans isomerase FKBP5 | 4.8986639239995 | 5.2997354054325 | 122 | 0.72131147540984 |
Splicing factor 3B subunit 3 | 5.4830530998707 | 5.3343556617786 | 67 | 0.53731343283582 |
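Since countRatio is the number of active events divided by the total, multiplying it by countOfTotalScores recovers the absolute number of active events per target. The short sketch below does this for the rows above (the helper name `active_events` is hypothetical); every product rounds to a whole number with negligible residue, which is consistent with that definition.

```python
# (proteinNameC, countOfTotalScores, countRatio) copied from the table above.
rows = [
    ("DNA topoisomerase 4", 120, 0.66666666666667),
    ("Serine/threonine-protein kinase WEE1", 945, 0.54179894179894),
    ("Tyrosine-protein kinase receptor Tie-1", 147, 0.54421768707483),
    ("Epithelial discoidin domain-containing receptor 1", 253, 0.55731225296443),
    ("Tubulin alpha-1 chain", 43, 0.62790697674419),
    ("Mitogen-activated protein kinase 7", 190, 0.54736842105263),
    ("Receptor-interacting serine/threonine-protein kinase 1", 253, 0.63241106719368),
    ("Voltage-gated potassium channel subunit Kv1.1", 111, 0.54954954954955),
    ("Peptidyl-prolyl cis-trans isomerase FKBP5", 122, 0.72131147540984),
    ("Splicing factor 3B subunit 3", 67, 0.53731343283582),
]


def active_events(total, ratio):
    """Number of active events implied by countRatio = active / total."""
    return round(total * ratio)


for name, total, ratio in rows:
    print(f"{name}: {active_events(total, ratio)} of {total} events active")
```

For example, DNA topoisomerase 4 has 80 active events out of 120, and WEE1 has 512 out of 945.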