Methodology (target selection)

Target selection.

This is a 3 stage process. 1st, the master data frame is pivoted by target using proteinNameC as target descriptor. The target pivoted data frame contains the following records:

proteinNameC Name of the protein target
target_description Type of target
proteinType Type of protein
GiProteinNumber Gi code for protein number. To enable join with biosystems
Avg(chemblActivityScoreC) for N Average potency recorded for all assays in ChemBl for compounds predicted inactive by our prediction model
Avg(chemblActivityScoreC) for Y Average potency recorded for all assays in ChemBl for compounds predicted active by our prediction model
Count(chemblActivityScoreC) for N Number of times that the compound has been inactive upon each ChemBl protein target
Count(chemblActivityScoreC) for Y Number of times that the compound has been active upon each ChemBl protein target
targetClass Protein family of proteinNameC protein
proteinClassDescription Class to which protein NameC protein belongs
organismC Organism to which protein NameC protein belongs
AvgChemblActivityScoreC Average potency for all ChemBl assays
CountChemblActivityScoreC Number of records for a particular proteinNameC protein in ChemBl database
CountRatio Number of times in which the protein target has been active divided by the total number of records of a particular target
PotencyRatio Average potency for a particular proteinNameC active records divided by total records of the sample protein

A custom view offers this aspect, each dot corresponding to an individual target.

At this moment, the most impactful targets can be selected picking the top right zone of the graph, where the targets with the highest activity count ratio (actives vs total) with the highest number of occurrences lie.

In a second step, the master file is processed through the apriori (arules) algorithm once the key parameters are converted to factors. Redundancies are removed and rules are sorted by lift for further visualization in R or SpotFire. When general rules are displayed, targets are isolated for matching with the pivoted targets table. Alternatively, if other DB variable is of interest for the analysis, as assay type or assay name, this can be isolated as well, although this should match with its own pivoted (by assay type or name) table.