What if we wanted to build a focused, ad hoc, inexpensive compound collection for a particular asset, whether a molecular target or a phenotypic assay? We could take the activities recorded against the target of our choice, look at what those molecules have done in the rest of the assays in the database, and then use neural networks, decision trees, random forests or many other machine learning tools to build a model. Molecules that have never been tested against our target can then be passed through that model to predict their activity. All the models incorporate several performance measures, such as accuracy and precision, as well as estimates of the goodness of the model and of true and false positive rates. The page contains several examples of predictive analysis applied to different experimental cases taken from the downloadable ChEMBL data.
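As a minimal sketch of this idea, assume each molecule is described by a hypothetical dictionary of binary outcomes (1 active, 0 inactive) in the other assays, and that its label is its recorded activity on our target. The code swaps in a simple k-nearest-neighbour classifier (not one of the models named above) that predicts activity on the target from the most similar assay profiles, and computes the accuracy, precision, true positive rate and false positive rate mentioned above. All function names and data are illustrative.

```python
from collections import Counter

# Hypothetical layout: assay_id -> 1 (active) or 0 (inactive) for one molecule.
Profile = dict[str, int]


def profile_similarity(a: Profile, b: Profile) -> float:
    """Fraction of shared assays on which two molecules agree (0.0 if none are shared)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(1 for assay in shared if a[assay] == b[assay]) / len(shared)


def knn_predict(query: Profile, train: list[tuple[Profile, int]], k: int = 3) -> int:
    """Majority vote of the k labelled molecules whose assay profiles are most similar.

    Each training label is the molecule's recorded activity on the target (1/0).
    """
    ranked = sorted(train, key=lambda item: profile_similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, true positive rate and false positive rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "tpr": ratio(tp, tp + fn),  # sensitivity / recall
        "fpr": ratio(fp, fp + tn),
    }


if __name__ == "__main__":
    train = [
        ({"A1": 1, "A2": 1, "A3": 0}, 1),
        ({"A1": 1, "A2": 0, "A3": 0}, 1),
        ({"A1": 0, "A2": 0, "A3": 1}, 0),
        ({"A1": 0, "A2": 1, "A3": 1}, 0),
    ]
    print(knn_predict({"A1": 1, "A2": 1}, train))  # prints 1
```

An odd k avoids ties between the two classes; in a real study the metrics would be computed on held-out molecules rather than on the training set.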

Phenotypic screens give us a good indication of the efficacy our molecules may ultimately achieve. But the systems behind phenotypes, i.e. cells and tissues, are composed of networks of mutually interacting biomolecules, and it is often hard to identify the key molecular target responsible for the observed efficacy. Biologists and chemists have developed a number of systems to determine this, but they often run into capacity limits and unforeseen interactions that make the process slow and the outcome incomplete. In silico analysis of the overall activity of molecules assayed in a particular phenotypic assay may point to proteins or genes that were not expected. We will fill this section with the examples below.
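One common way to carry out this kind of in silico analysis is a target enrichment test; the sketch below is an illustration of that technique, not necessarily the procedure used in the examples on this page. Assuming we know which molecules were tested in the phenotypic assay, which of them were active there, and (hypothetically) which molecules are recorded as active on each target, a one-sided hypergeometric test asks whether a target's actives are over-represented among the phenotypic actives.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_targets(
    tested: set[str],
    phenotypic_actives: set[str],
    target_actives: dict[str, set[str]],
) -> list[tuple[str, int, float]]:
    """Rank targets by how strongly their actives are enriched among phenotypic actives.

    Returns (target, overlapping actives, one-sided p-value), smallest p-value first.
    No multiple-testing correction is applied.
    """
    N = len(tested)
    n = len(phenotypic_actives & tested)
    results = []
    for target, molecules in target_actives.items():
        on_target = molecules & tested           # tested molecules active on this target
        K = len(on_target)
        k = len(on_target & phenotypic_actives)  # ...that were also active in the phenotypic assay
        results.append((target, k, hypergeom_sf(k, N, K, n)))
    return sorted(results, key=lambda r: r[2])
```

With many targets the p-values should be corrected for multiple testing (for example with Benjamini-Hochberg) before a protein is proposed as the one behind the phenotype.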

We can use pathway information to assess the best targets. The databases described in DBs have incorporated some essential biosystem tables from the NCBI and Reactome repositories, which summarize how proteins interact in networks across different systems, tissues and species. These tables contribute more than 180M records that can be used to confirm target identification or to find new proteins that belong within an already identified pathway.
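A minimal sketch of how pathway membership could be used here, assuming the pathway tables have been reduced to a hypothetical mapping from pathway name to the set of its proteins: pathways are ranked by Jaccard overlap with the candidate targets, and the remaining members of each overlapping pathway are returned as proteins worth examining. The identifiers below are placeholders, not real NCBI or Reactome records.

```python
def pathway_overlap(
    candidates: set[str], pathways: dict[str, set[str]]
) -> list[tuple[str, float, set[str]]]:
    """Rank pathways by Jaccard overlap with the candidate targets.

    For each pathway sharing at least one candidate, also return its other members,
    i.e. proteins that could be examined as additional targets within that pathway.
    """
    ranked = []
    for name, members in pathways.items():
        shared = candidates & members
        if not shared:
            continue  # pathway unrelated to the candidates
        jaccard = len(shared) / len(candidates | members)
        ranked.append((name, jaccard, members - candidates))
    return sorted(ranked, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pathway memberships, not real NCBI/Reactome records.
    pathways = {
        "PW_1": {"PROT_A", "PROT_B", "PROT_C", "PROT_D"},
        "PW_2": {"PROT_E", "PROT_F"},
        "PW_3": {"PROT_A", "PROT_G"},
    }
    for name, score, extra in pathway_overlap({"PROT_A", "PROT_B"}, pathways):
        print(name, round(score, 2), sorted(extra))
```

Jaccard overlap is only one possible score; an enrichment test would penalize very large pathways more explicitly.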

Drug Repositioning


From Wikipedia, the free encyclopedia
Drug repositioning (also known as drug repurposing, re-profiling, re-tasking or therapeutic switching) is the application of known drugs and compounds to treat new indications (i.e., new diseases).[1]

A significant advantage of drug repositioning over traditional drug development is that, since the repositioned drug has already passed a significant number of toxicity and other tests, its safety is known and the risk of failure for reasons of adverse toxicology is reduced. More than 90% of drugs fail during development,[2] and this is the most significant reason for the high costs of pharmaceutical R&D. In addition, repurposed drugs can bypass much of the early cost and time needed to bring a drug to market, significantly shortening the transition from bench research to bedside treatment. On the other hand, drug repositioning faces some challenges of its own: the intellectual property issues surrounding the original drug may be complex, and from a commercial point of view it may not always make sense to take such a drug to market.

Drug repositioning has been growing in importance in the last few years as an increasing number of drug development and pharmaceutical companies see their drug pipelines drying up and realize that many previously promising technologies have failed to deliver ‘as advertised’. Computational approaches based on virtual screening of comprehensive libraries of approved and other human use compounds against large numbers of protein targets simultaneously have been developed to enhance the efficiency and success rates of drug repositioning, particularly in terms of high-throughput shotgun repurposing.[3][4][5]
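The last step of such a shotgun screen, once every approved compound has been scored against every target, can be sketched as a simple filter. Assuming a hypothetical score matrix (drug to target to predicted activity) and a set of already known drug-target pairs, the code lists the new pairs above a chosen threshold, best first; the threshold and all names are illustrative.

```python
def novel_pairs(
    scores: dict[str, dict[str, float]],
    known: set[tuple[str, str]],
    threshold: float,
) -> list[tuple[str, str, float]]:
    """List (drug, target, score) pairs scoring at or above threshold that are not already known.

    `scores` holds predicted activity for every drug against every screened target.
    """
    hits = [
        (drug, target, score)
        for drug, row in scores.items()
        for target, score in row.items()
        if score >= threshold and (drug, target) not in known
    ]
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

Each surviving pair is a repositioning hypothesis, still to be confirmed experimentally.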

NCBI, ChEMBL, UniProt and Open Targets: freely downloadable data, reconfigured and joined into ad hoc databases optimized for machine learning


Starting from an appropriate data frame containing chemistry, biology and assay information, built by SQL queries against the selected ML database, we can develop prediction and identification procedures. This section summarizes the procedures and algorithms employed. Whatever route we take to identify pathways or targets of interest, or putative bioactive molecules, we can make the results converge so that each route confirms the others' predictions.
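As a sketch of the querying step, the code below uses Python's built-in sqlite3 module against a hypothetical, simplified schema (tables `activities` and `assays`, with column names loosely modelled on ChEMBL's); the actual ad hoc ML databases may be organised differently. It turns activity records into a wide molecule-by-assay table of binary outcomes, using an assumed pChEMBL cut-off of 6.0. With pandas, `pandas.read_sql` followed by `DataFrame.pivot_table` would give an equivalent data frame.

```python
import sqlite3

# Hypothetical, simplified schema loosely modelled on ChEMBL column names;
# the actual ad hoc ML databases may be organised differently.
QUERY = """
    SELECT a.molregno, a.assay_id, a.pchembl_value
    FROM activities AS a
    JOIN assays AS s ON s.assay_id = a.assay_id
    WHERE a.pchembl_value IS NOT NULL AND s.assay_type = ?
"""


def build_activity_table(
    conn: sqlite3.Connection, assay_type: str = "B", min_pchembl: float = 6.0
) -> dict[str, dict[str, int]]:
    """Wide table: molecule -> {assay_id: 1 if pchembl_value >= min_pchembl else 0}."""
    table: dict[str, dict[str, int]] = {}
    for molregno, assay_id, pchembl in conn.execute(QUERY, (assay_type,)):
        # If a molecule has several measurements in one assay, the last row read wins.
        table.setdefault(molregno, {})[assay_id] = int(pchembl >= min_pchembl)
    return table


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assays (assay_id TEXT PRIMARY KEY, assay_type TEXT);
        CREATE TABLE activities (molregno TEXT, assay_id TEXT, pchembl_value REAL);
        INSERT INTO assays VALUES ('A1', 'B'), ('A2', 'F'), ('A3', 'B');
        INSERT INTO activities VALUES
            ('m1', 'A1', 7.2), ('m1', 'A2', 5.1),
            ('m2', 'A1', 4.8), ('m2', 'A3', 6.5),
            ('m3', 'A2', NULL);
        """
    )
    return conn


if __name__ == "__main__":
    print(build_activity_table(demo_connection()))
```

Passing the assay type as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query safe and reusable.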

The schema on the left illustrates how the methodology described in the links below supports integrative predictive modelling.
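One simple way to make several routes converge is rank aggregation. The sketch below uses a Borda count, chosen here for illustration and not necessarily the method used in the linked pages: each route (for example a model-based, a phenotypic and a pathway-based route, all hypothetical) contributes a ranked list of targets, an item in position i of a list of length L earns L - i points, and items ranked highly by several routes rise to the top.

```python
from collections import defaultdict


def borda_consensus(rankings: list[list[str]]) -> list[tuple[str, int]]:
    """Aggregate ranked lists: an item at position i in a list of length L scores L - i points.

    Ties in the total score are broken alphabetically so the output is deterministic.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        length = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += length - position
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    routes = [
        ["T1", "T2", "T3"],  # hypothetical ranking from one route
        ["T2", "T1"],        # hypothetical ranking from a second route
        ["T2", "T4"],        # hypothetical ranking from a third route
    ]
    print(borda_consensus(routes))
```

Items missing from a route simply receive no points from it, so targets supported by several independent routes are favoured over those found by only one.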