BrCa: Results Summary (Target Selection Process)

Target Selection Process

This section focuses on the molecular targets on which the molecules active against BrCa models in the ChEMBL database have a significant impact. We approach the question in two ways: empirically (i.e., from the impact recorded in ChEMBL) and predictively, using predictions from machine learning algorithms for target identification. The two sets of results are then compared to determine whether the predictive analysis over the whole dataset adds any novelty to the empirical annotations. Targets involved in killing particular BrCa cell lines are also determined by hierarchical clustering.

BRCA2 protein. By Filip em – self created from PDB entry with KiNG tool http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1n0w, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=3237307

Global Identification Process

The target identification process starts with the compounds screened in breast cancer phenotypic assays. The results of these compounds in molecular target-based assays are retrieved, and those with an active ChEMBL score are selected and pivoted to yield one row of raw data per target. Each row contains the ratio between the number of events for that target in which the compound is also active in BrCa assays and the total number of events for that target (countRatio). In other words, countRatio is the proportion of times that molecules with a positive hit on a particular protein are also active in BrCa assays: 1 if always, 0 if never. In the chart below, this parameter is plotted against the total number of events per target (count(chemblActivityScore)) as a measure of confidence. Average potencies per target in non-BrCa molecular assays (Avg(chemblActivityScore)) and in BrCa assays (Avg(BrCaScore)) are depicted as color and size gradients, respectively. As an example, targets with countRatio >0.65 and total events >1 are marked in the chart. That is, these are targets where >65% of molecular events with a positive chemblActivityScore also have active events in BrCa phenotypic assays; this criterion identifies 61 putative BrCa targets.
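The pivot and the countRatio filter described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the toy events, the target names and the function names `count_ratio_by_target` and `select_targets` are all hypothetical, and each event is reduced to a target plus a flag saying whether the compound was also active in BrCa assays.

```python
from collections import defaultdict

# Hypothetical toy events: (target, compound_is_also_active_in_BrCa).
# Only events with an active ChEMBL score are assumed to be listed here.
events = [
    ("TargetA", True),
    ("TargetA", True),
    ("TargetA", False),
    ("TargetB", True),
    ("TargetC", False),
    ("TargetC", False),
]

def count_ratio_by_target(events):
    """Pivot events by target and compute the event count and countRatio."""
    totals = defaultdict(int)
    brca_hits = defaultdict(int)
    for target, active_in_brca in events:
        totals[target] += 1
        if active_in_brca:
            brca_hits[target] += 1
    return {
        target: {
            "count": totals[target],
            "countRatio": brca_hits[target] / totals[target],
        }
        for target in totals
    }

def select_targets(stats, min_ratio=0.65, min_events=1):
    """Keep targets with countRatio > min_ratio and more than min_events events."""
    return sorted(
        target
        for target, row in stats.items()
        if row["countRatio"] > min_ratio and row["count"] > min_events
    )

stats = count_ratio_by_target(events)
selected = select_targets(stats)
print(selected)
```

In this toy data, TargetB has a countRatio of 1 but only a single event, so it is filtered out by the event-count condition; this is the role the total count plays as a measure of confidence.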

 

This selection is carried out on a dataset of 170k events in molecular assays. However, since predictions are available from the BrCa activity prediction section, they can be used to look for potential targets among the 13M events that make up the entire database content. In this case, the random forest regression output is used as an indicator of BrCaScore. Using predictions on this larger target dataset increases the chances of identifying unforeseen targets.
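As a rough sketch of this idea, the same per-target pivot can be run with a predicted score standing in for the measured BrCaScore. Everything specific here is an assumption made for illustration: the dictionary `predicted_brca_score` stands in for the random forest regression output, the compound and target names are invented, and the cutoff `ACTIVE_CUTOFF` is hypothetical, since the value used to call a predicted score active is not stated.

```python
from collections import defaultdict

# Hypothetical cutoff at or above which a predicted BrCaScore counts as active.
# The value actually used in the analysis is not stated in the text.
ACTIVE_CUTOFF = 5.0

# Toy molecular-assay events: (target, compound_id). Names are illustrative.
events = [
    ("TargetA", "c1"),
    ("TargetA", "c2"),
    ("TargetB", "c3"),
    ("TargetB", "c1"),
]

# Stand-in for the random forest regression output, keyed by compound.
predicted_brca_score = {"c1": 6.2, "c2": 4.1, "c3": 7.0}

def predicted_count_ratio(events, scores, cutoff=ACTIVE_CUTOFF):
    """Pivot by target using predicted scores in place of measured BrCa scores."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    score_sum = defaultdict(float)
    for target, compound in events:
        score = scores[compound]
        totals[target] += 1
        score_sum[target] += score
        if score >= cutoff:
            hits[target] += 1
    return {
        target: {
            "count": totals[target],
            "countRatio": hits[target] / totals[target],
            "avgPredictedScore": score_sum[target] / totals[target],
        }
        for target in totals
    }

predicted_stats = predicted_count_ratio(events, predicted_brca_score)
for target, row in sorted(predicted_stats.items()):
    print(target, row)
```

Because every compound with a prediction contributes, targets whose compounds were never tested in BrCa assays can still receive a countRatio.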

 

It appears that some of the targets identified from the actual data are also present in the plot from predicted data, but few of the targets from the prediction are in the BrCa compound dataset. The results are plotted below in bar charts, sorted by potency.

From actual data…

 

From predicted data…

 

A small dashboard in which the targets marked on the left chart (actual) are displayed on the right one (prediction). Although most of the targets are also identified in the right chart, there is an evident shift to lower countRatio values.

In this second dashboard, the targets identified from the predicted BrCa score are on the left, and marking them displays them in the right plot. Only the two most frequent appear in the plot of actual targets. Should we trust the predictions?

Breast cancer is one of the research subjects with the largest amount of published information, so let’s see in the table below how the predicted targets with an average score >6 fare in the literature.
 

We can see that most of the predicted targets have a literature reference that relates them to breast cancer biomarkers or therapies. Those without such a link belong to species other than human, namely E. coli, Pseudomonas, HIV or Plasmodium. The final target selection will therefore include proteins selected by their actual and predicted activity scores. The plot below shows the selection criterion for targets (>0.5 count ratio & > event of activity, predicted or actual).

 

Here, the corresponding bar charts for the average potencies and countRatios with the best values:

 

 

 

And here, the table with the corresponding values:

Targets identified from predicted and actual BrCa scores.

All targets for which more than 50% of the events with a positive ChEMBL score inhibit BrCa cell line growth, in either real or virtual experiments.

 

 

 

Target Selection for Specific Cell Lines

So far we have been using the average BrCa scores from experiments carried out with 8 BrCa cell lines, but it is well known that genotypic and phenotypic differences among them translate into differences in pharmacology. This may be relevant for the selective treatment of the tumors that these cell lines represent, and the procedure is applicable to clinical databases with extensive and up-to-date tumor treatment outcomes.

For this purpose, let’s do a simple classification of the tumors via hierarchical clustering, performed on the global results of the BrCa compounds in the whole ChEMBL database after removing all results related to tumor or phenotypic (non-molecular-target) assays. The chart below shows the results of the clustering with a focus on three cell lines (MCF7, MDA-MB-435, and MDA-MB-231). The cell-line-specific clusters are areas where the activity on one cell line is greatest (red) and the activity on the rest is minimal (yellow). Experiments with specific areas of activity are marked in red on the chart.
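To make the clustering step concrete, here is a minimal agglomerative clustering sketch in plain Python. It is only an illustration under assumptions: the text does not say which distance or linkage was used, so Euclidean distance with average linkage is assumed, and the experiment names and activity profiles are invented toy values. Each cluster is then labelled with the cell line that has the highest mean activity in it.

```python
import math

CELL_LINES = ["MCF7", "MDA-MB-435", "MDA-MB-231"]

# Hypothetical activity profiles: one score per cell line, in CELL_LINES order.
profiles = {
    "exp1": [8.0, 4.0, 4.2],
    "exp2": [7.8, 4.1, 3.9],
    "exp3": [4.0, 8.1, 4.0],
    "exp4": [4.2, 7.9, 4.1],
    "exp5": [4.1, 4.0, 8.0],
}

def euclidean(a, b):
    """Euclidean distance between two activity profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(data[i], data[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerative(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in sorted(data)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def dominant_cell_line(cluster, data):
    """Return the cell line with the highest mean activity in a cluster."""
    means = [
        sum(data[name][k] for name in cluster) / len(cluster)
        for k in range(len(CELL_LINES))
    ]
    return CELL_LINES[means.index(max(means))]

clusters = agglomerative(profiles, n_clusters=3)
labelled = {dominant_cell_line(c, profiles): sorted(c) for c in clusters}
print(labelled)
```

On real data the number of clusters would be chosen by inspecting the dendrogram or heat map rather than fixed in advance, and a cluster would count as cell-line specific only if activity on the other lines is low, as described above.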

 

Once the results are collected, we process the cell-line-specific results in a way similar to the global target identification procedure. Results are pivoted by target, and the average potency in the specific cell line is calculated alongside the number of experimental data points (countChemblScore). This is compared with the average BrCaScore of the same experimental events, calculated from all BrCa cell lines. To facilitate selection, viewing and interpretation, a selectivity index between the specific cell line and the averageBrCaScore is added to the plots.
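The per-target pivot with a selectivity index might look like the sketch below. The exact definition of the index is not given in the text; this example assumes it is the difference between the average potency in the specific cell line and the average BrCaScore, which is a natural choice if the scores are on a logarithmic scale. The toy events, the target names and the function name `pivot_selectivity` are hypothetical.

```python
from collections import defaultdict

# Toy events: (target, score in the specific cell line, average BrCaScore
# of the same event over all BrCa cell lines). Values are illustrative only.
events = [
    ("TargetA", 7.5, 6.0),
    ("TargetA", 8.5, 6.0),
    ("TargetB", 5.0, 5.5),
]

def pivot_selectivity(events):
    """Pivot by target and add an assumed difference-based selectivity index."""
    grouped = defaultdict(list)
    for target, cell_score, brca_score in events:
        grouped[target].append((cell_score, brca_score))
    table = {}
    for target, rows in grouped.items():
        n = len(rows)
        avg_cell = sum(r[0] for r in rows) / n
        avg_brca = sum(r[1] for r in rows) / n
        table[target] = {
            "countChemblScore": n,
            "avgCellLineScore": avg_cell,
            "avgBrCaScore": avg_brca,
            # Assumption: index = cell line average minus all-lines average.
            "selectivityIndex": avg_cell - avg_brca,
        }
    return table

table = pivot_selectivity(events)
for target, row in sorted(table.items()):
    print(target, row)
```

With a difference-based index, a positive value means the target's compounds are on average more potent in the specific cell line than across all BrCa cell lines.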

The charts below show the dashboards used for selection. On the left, the selectivity index is compared to the number of events in the DB. Compounds with the highest indexes are then marked, which plots them in the right chart; this chart compares the potency in each particular cell line to the BrCaAverageScore, with the y=x, y=x+1 and y=x-1 lines drawn for reference.

 

 

And the corresponding tables containing the selected targets:

 

 

This kind of analysis is expanded in the BrCa pathways analysis section.