Methodology (target selection)

Target selection.

Target selection is a three-stage process. First, the master data frame is pivoted by target, using proteinNameC as the target descriptor. The resulting target-pivoted data frame contains the following fields:

proteinNameC: Name of the protein target
target_description: Type of target
proteinType: Type of protein
GiProteinNumber: GI (GenInfo) identifier of the protein, used to join with BioSystems
Avg(chemblActivityScoreC) for N: Average potency recorded across all ChEMBL assays for compounds predicted inactive by our prediction model
Avg(chemblActivityScoreC) for Y: Average potency recorded across all ChEMBL assays for compounds predicted active by our prediction model
Count(chemblActivityScoreC) for N: Number of records in which the compound was inactive against this ChEMBL protein target
Count(chemblActivityScoreC) for Y: Number of records in which the compound was active against this ChEMBL protein target
targetClass: Protein family of the proteinNameC protein
proteinClassDescription: Class to which the proteinNameC protein belongs
organismC: Organism to which the proteinNameC protein belongs
AvgChemblActivityScoreC: Average potency across all ChEMBL assays for the proteinNameC protein
CountChemblActivityScoreC: Number of records for the proteinNameC protein in the ChEMBL database
CountRatio: Number of records in which the protein target was active divided by the total number of records for that target
PotencyRatio: Average potency of the active records for the proteinNameC protein relative to all records of the same protein
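The numeric fields above amount to a group-by over the master data frame. The following is a minimal, dependency-free Python sketch, not the original implementation. It assumes each master record is a dict holding proteinNameC, chemblActivityScoreC and a hypothetical `predicted` flag with values 'Y'/'N', and it reads PotencyRatio as the active average potency divided by the overall average potency, which is an interpretation of the description. Descriptive fields (target_description, organismC, etc.) are left out. With pandas, `DataFrame.pivot_table` would perform the same aggregation.

```python
from collections import defaultdict
from statistics import mean


def pivot_by_target(records):
    """Aggregate master-frame records into one row per proteinNameC.

    Assumes each record has 'proteinNameC', 'chemblActivityScoreC' and a
    hypothetical 'predicted' flag equal to 'Y' (active) or 'N' (inactive).
    """
    scores = defaultdict(lambda: {"Y": [], "N": []})
    for rec in records:
        scores[rec["proteinNameC"]][rec["predicted"]].append(rec["chemblActivityScoreC"])

    table = {}
    for target, by_flag in scores.items():
        active, inactive = by_flag["Y"], by_flag["N"]
        all_scores = active + inactive
        avg_all = mean(all_scores)
        avg_active = mean(active) if active else None
        table[target] = {
            "Avg(chemblActivityScoreC) for N": mean(inactive) if inactive else None,
            "Avg(chemblActivityScoreC) for Y": avg_active,
            "Count(chemblActivityScoreC) for N": len(inactive),
            "Count(chemblActivityScoreC) for Y": len(active),
            "AvgChemblActivityScoreC": avg_all,
            "CountChemblActivityScoreC": len(all_scores),
            # Active records divided by all records of the target
            "CountRatio": len(active) / len(all_scores),
            # Assumption: active average potency divided by overall average potency
            "PotencyRatio": (avg_active / avg_all
                             if avg_active is not None and avg_all else None),
        }
    return table


# Toy usage with made-up records
master = [
    {"proteinNameC": "P1", "predicted": "Y", "chemblActivityScoreC": 6.0},
    {"proteinNameC": "P1", "predicted": "Y", "chemblActivityScoreC": 8.0},
    {"proteinNameC": "P1", "predicted": "N", "chemblActivityScoreC": 4.0},
    {"proteinNameC": "P2", "predicted": "N", "chemblActivityScoreC": 5.0},
]
pivot = pivot_by_target(master)
```

Here P1 ends up with a CountRatio of 2/3, while P2, having no active records, gets a CountRatio of 0 and no PotencyRatio.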

In a custom scatter view of this table, each dot corresponds to an individual target.

At this point, the most impactful targets can be selected by picking the top-right zone of the graph, where the targets that combine the highest activity count ratio (actives vs. total) with the highest number of occurrences lie.
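In the workflow described here the top-right zone is picked visually. As an assumption, the same selection can be approximated programmatically with two cutoffs, one on CountRatio and one on CountChemblActivityScoreC. The function name and the threshold values below are hypothetical and would have to be read off the plot.

```python
def select_top_right(pivot, min_ratio, min_count):
    """Return targets in the top-right zone (hypothetical cutoffs).

    `pivot` maps a target name to a row holding 'CountRatio' and
    'CountChemblActivityScoreC'. Results are ordered by CountRatio,
    then by record count, both descending.
    """
    hits = [
        (name, row["CountRatio"], row["CountChemblActivityScoreC"])
        for name, row in pivot.items()
        if row["CountRatio"] >= min_ratio
        and row["CountChemblActivityScoreC"] >= min_count
    ]
    hits.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return [name for name, _, _ in hits]


# Toy pivoted table with made-up values
pivot = {
    "P1": {"CountRatio": 0.80, "CountChemblActivityScoreC": 120},
    "P2": {"CountRatio": 0.90, "CountChemblActivityScoreC": 15},
    "P3": {"CountRatio": 0.65, "CountChemblActivityScoreC": 300},
    "P4": {"CountRatio": 0.20, "CountChemblActivityScoreC": 500},
}
selected = select_top_right(pivot, min_ratio=0.6, min_count=100)
```

P2 has a high ratio but too few records, and P4 has many records but a low ratio, so only P1 and P3 fall in the top-right zone.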

In the second step, once the key parameters have been converted to factors, the master file is processed with the Apriori algorithm (R package arules). Redundant rules are removed and the remaining rules are sorted by lift for further visualization in R or Spotfire. Once the general rules are displayed, the targets they contain are isolated and matched against the pivoted target table. Alternatively, if another database variable, such as assay type or assay name, is of interest for the analysis, it can be isolated in the same way, although it should then be matched against its own table pivoted by that variable (assay type or assay name).
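The step above relies on arules' apriori in R. To illustrate the technique only, the Python sketch below implements a small level-wise Apriori over factor-style items labelled `column=value` (the labelling arules uses when a factor data frame is coerced to transactions). It keeps rules with a single item on the right-hand side, scores them with lift = confidence(X ⇒ y) / support(y), drops a rule as redundant when a more general rule (a proper subset of its left-hand side, same right-hand side) has equal or higher confidence, sorts by lift, and finally extracts the proteinNameC items for matching against a pivoted target table. Column names such as `assayType` and `predicted`, the support and confidence thresholds, and all function names are hypothetical.

```python
from itertools import combinations


def to_transactions(rows, columns):
    # One transaction per row; each factor value becomes a "column=value" item
    return [{f"{c}={row[c]}" for c in columns} for row in rows]


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support}."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(itemset):
        return sum(itemset <= t for t in tsets) / n

    singles = {frozenset([i]) for t in tsets for i in t}
    level = {s: support(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


def rules_by_lift(frequent, min_confidence):
    """Non-redundant rules (lhs, rhs, support, confidence, lift), lift descending."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            lhs = itemset - {y}
            conf = supp / frequent[lhs]  # every subset of a frequent set is frequent
            if conf >= min_confidence:
                rules.append((lhs, y, supp, conf, conf / frequent[frozenset([y])]))
    conf_of = {(lhs, y): conf for lhs, y, _, conf, _ in rules}

    def redundant(lhs, y, conf):
        # A more general rule with the same consequent is at least as confident
        return any(conf_of.get((frozenset(sub), y), -1) >= conf
                   for r in range(1, len(lhs))
                   for sub in combinations(lhs, r))

    kept = [r for r in rules if not redundant(r[0], r[1], r[3])]
    return sorted(kept, key=lambda r: r[4], reverse=True)


def targets_in_rules(rules, prefix="proteinNameC="):
    # Isolate the values of one variable (here the target) from all rules
    found = set()
    for lhs, y, *_ in rules:
        for item in set(lhs) | {y}:
            if item.startswith(prefix):
                found.add(item[len(prefix):])
    return found


# Toy master file with made-up factor values
rows = [
    {"proteinNameC": "P1", "assayType": "B", "predicted": "Y"},
    {"proteinNameC": "P1", "assayType": "B", "predicted": "Y"},
    {"proteinNameC": "P1", "assayType": "F", "predicted": "Y"},
    {"proteinNameC": "P2", "assayType": "F", "predicted": "N"},
    {"proteinNameC": "P2", "assayType": "B", "predicted": "N"},
]
transactions = to_transactions(rows, ["proteinNameC", "assayType", "predicted"])
frequent = apriori(transactions, min_support=0.3)
rules = rules_by_lift(frequent, min_confidence=0.8)
targets = targets_in_rules(rules)
pivot_targets = {"P1": {"CountRatio": 1.0}, "P3": {"CountRatio": 0.5}}
matched = sorted(t for t in targets if t in pivot_targets)
```

Passing a different prefix, for example `assayType=`, isolates assay types instead; those values would then be matched against a table pivoted by assay type rather than by target.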