SMN2: Prediction of Activity

SMN2

By Emw (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons

The SMN2 gene is part of a 500 kb inverted duplication on chromosome 5q13. This duplicated region contains at least four genes and repetitive elements, which make it prone to rearrangements and deletions. The repetitiveness and complexity of the sequence have also made it difficult to determine the organization of this genomic region. The telomeric (SMN1) and centromeric (SMN2) copies of this gene are nearly identical and encode the same protein. The critical sequence difference between the two genes is a single nucleotide in exon 7, which is thought to lie in an exon splice enhancer. Because of this nucleotide substitution, around 80-90% of SMN2 transcripts encode a truncated, unstable protein with no biological function (Δ7SMN), and only 10-20% encode the full-length protein (fl-SMN).

Note that the nine exons of both the telomeric and centromeric copies are designated historically as exon 1, 2a, 2b, and 3-8. It is thought that gene conversion events may involve the two genes, leading to varying copy numbers of each gene.[6]

 

Clinical significance

While mutations in the telomeric copy are associated with spinal muscular atrophy, mutations in this gene, the centromeric copy, do not lead to disease. This gene may be a modifier of disease caused by mutation in the telomeric copy. Description taken from Wikipedia.

Source: Wikimedia, from open.osmosis.org

  • Spinal muscular atrophy (SMA) is caused by insufficient levels of the survival motor neuron protein SMN. The SMN locus on chromosome 5q13 contains two inverted copies of SMN, called SMN1 and SMN2, which are 99% identical at the amino acid level. SMN1 produces a fully functional protein, whereas SMN2 skips exon 7 90% of the time. Skipping of exon 7 produces a non-functional protein product. The 10% of SMN2 product that includes exon 7 is fully functional. In SMA, the disease is caused by mutations in the SMN1 locus. Because only 10% of the SMN2 product is of the fully functional form, it is not sufficient to overcome the deficiency produced by the loss of the SMN1 product. A therapy that either increases the amount of SMN2 product made or increases the inclusion of exon 7 has been proposed for the treatment of SMA.
  • The SMN2 assay was designed to identify small molecules that can increase the amount of functional SMN2 product. A luciferase reporter gene is appended after the native SMN2 gene, such that inclusion of exon 7 in the expressed product places the luciferase sequence in frame, thus generating functional luciferase enzyme.

Assay description by NIH Chemical Genomics Center [NCGC]. Assay Submitter (PI): Elliot Androphy. Go to ChEMBL assay page for details. 

 

 

Table summary of the existing SMN2 assays in the ChEMBL DB. Of these, only the qHTS Assay for Enhancers of SMN2 Splice Variant Expression, with more than 30k records, has been used to annotate the activity of molecules on SMN2. The remaining assays were excluded from annotation and from the construction of predictive models.

| ChEMBL Assay ID | Assay Source | Assay Type | Assay Organism | Activity Count | Description | Reference |
|---|---|---|---|---|---|---|
| CHEMBL1613842 | PubChem BioAssays | F | Homo sapiens | 33647 | PUBCHEM_BIOASSAY: qHTS Assay for Enhancers of SMN2 Splice Variant Expression. (Class of assay: confirmatory) | CHEMBL1201862 |
| CHEMBL1826381 | Scientific Literature | B | | 133 | Activation of SMN expressed in HEK293 cells assessed as concentration required to reach 50% of maximum luciferase signal by SMN2-promotor driven lu… | J. Med. Chem., (2011) 54:18:6215 |
| CHEMBL944092 | Scientific Literature | F | Mus musculus | 114 | Activation of SMN2 promoter in mouse NSC34 cells assessed as induction of beta-lactamase activity | J. Med. Chem., (2008) 51:3:449 |
| CHEMBL1738018 | PubChem BioAssays | F | Homo sapiens | 109 | PUBCHEM_BIOASSAY: Confirmation Concentration-Response Assay for Enhancers of SMN2 Splice Variant Expression for Further Probe SAR. (Class of assay:… | CHEMBL1201862 |
| CHEMBL1826380 | Scientific Literature | B | | 98 | Activation of SMN expressed in HEK293 cells assessed as rate of induction by SMN2-promotor driven luciferase reporter gene assay relative to basal … | J. Med. Chem., (2011) 54:18:6215 |
| CHEMBL1738105 | PubChem BioAssays | F | Photinus pyralis | 94 | PUBCHEM_BIOASSAY: Counterscreen Assay for Enhancers of SMN2 Splice Variant Expression: Interaction with Luciferase Reporter for Further Probe SAR. … | CHEMBL1201862 |
| CHEMBL944093 | Scientific Literature | F | Mus musculus | 90 | Activation of SMN2 promoter in mouse NSC34 cells assessed as induction of beta-lactamase activity relative to nonactivated cells | J. Med. Chem., (2008) 51:3:449 |
| CHEMBL1737944 | PubChem BioAssays | F | Homo sapiens | 61 | PUBCHEM_BIOASSAY: Counterscreen Assay for Enhancers of SMN2 Splice Variant Expression: Modulation of SMN1 Expression for Further Probe SAR. (Class … | CHEMBL1201862 |
| CHEMBL1614260 | PubChem BioAssays | F | Homo sapiens | 34 | PUBCHEM_BIOASSAY: Confirmation Concentration-Response Assay for Enhancers of SMN2 Splice Variant Expression. (Class of assay: confirmatory) [Relate… | CHEMBL1201862 |
| CHEMBL1614101 | PubChem BioAssays | B | Photinus pyralis | 26 | PUBCHEM_BIOASSAY: Counterscreen Assay for Enhancers of SMN2 Splice Variant Expression: Interaction with Luciferase Reporter. In this assay, we util… | CHEMBL1201862 |
| CHEMBL1613932 | PubChem BioAssays | F | Homo sapiens | 20 | PUBCHEM_BIOASSAY: Confirmation Concentration-Response Assay for Enhancers of SMN2 Splice Variant Expression for Probe SAR. (Class of assay: confirm… | CHEMBL1201862 |
| CHEMBL1614136 | PubChem BioAssays | F | Photinus pyralis | 19 | PUBCHEM_BIOASSAY: Counterscreen Assay for Enhancers of SMN2 Splice Variant Expression: Interaction with Luciferase Reporter for Probe SAR. (Class o… | CHEMBL1201862 |
| CHEMBL1614348 | PubChem BioAssays | F | Homo sapiens | 7 | PUBCHEM_BIOASSAY: Counterscreen Assay for Enhancers of SMN2 Splice Variant Expression: Modulation of SMN1 Expression for Probe SAR. (Class of assay… | CHEMBL1201862 |
| CHEMBL945043 | Scientific Literature | F | Homo sapiens | 3 | Increase in number of cells with Cajal bodies containing SMN2 protein in human fibroblasts from spinal muscular atrophy patient at 100 nM after 5 d… | J. Med. Chem., (2008) 51:3:449 |
| CHEMBL945042 | Scientific Literature | F | Homo sapiens | 3 | Increase in number of Cajal bodies containing SMN2 protein in human fibroblasts from spinal muscular atrophy patient at 100 nM after 5 days relativ… | J. Med. Chem., (2008) 51:3:449 |
| CHEMBL945041 | Scientific Literature | F | Homo sapiens | 3 | Increase in SMN2 mRNA expression in human fibroblasts from spinal muscular atrophy patient | J. Med. Chem., (2008) 51:3:449 |
| CHEMBL1826384 | Scientific Literature | B | | 2 | Activation of SMN expressed in HEK293 cells by SMN2-promotor driven luciferase reporter gene assay | J. Med. Chem., (2011) 54:18:6215 |
| CHEMBL944099 | Scientific Literature | F | Mus musculus | 1 | Activation of SMN2 promoter in human NSC34 cells assessed as induction of beta-lactamase activity at 30 uM | J. Med. Chem., (2008) 51:3:449 |
| CHEMBL945040 | Scientific Literature | F | Mus musculus | 1 | Increase in SMN2 mRNA expression in mouse NSC34 cells at 500 nM relative to control | J. Med. Chem., (2008) 51:3:449 |

There are about 20 assays based on SMN2 constructs in the ChEMBL DB. From these we have chosen the reporter assay described in the data description section. The assay contains 30k records, which is enough data to apply ML methodology. Beyond that, it is a black-box assay: expression of functional SMN2 protein can be elicited by a plethora of cellular mechanisms, which can be investigated further to identify the protein targets involved. Below are the assay descriptions of the most relevant SMN2 assays and a distribution histogram of the activities recorded for the 30k-molecule screen.
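The choice of the largest assay can be sketched in a few lines of Python. This is only an illustration: the four counts are copied from the assay table above, while the function name `pick_largest_assay` is made up for this example and is not part of any ChEMBL tooling.

```python
# Activity counts copied from the assay table (ChEMBL assay ID -> number of records).
assay_counts = {
    "CHEMBL1613842": 33647,
    "CHEMBL1826381": 133,
    "CHEMBL944092": 114,
    "CHEMBL1738018": 109,
}


def pick_largest_assay(counts):
    """Return the (assay_id, n_records) pair with the most records."""
    return max(counts.items(), key=lambda item: item[1])


selected_id, selected_n = pick_largest_assay(assay_counts)
# Share of the records in this small subset that belong to the selected assay.
share = selected_n / sum(assay_counts.values())
print(selected_id, selected_n, f"{share:.1%}")
```

Even against the next three assays alone, the qHTS assay accounts for nearly all of the records, which is why it is the only one with enough data for model building.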

SMN2 assays in the ChEMBL DB ranked by the number of compounds tested. The SMN1/SMN2 reporter assay, the most represented, was selected for the machine learning protocols.

Activity distribution histogram of the 32k compounds assayed in the SMN2 reporter assay.

The 30k molecules used in the SMN2 reporter screen have also been assayed in many other experimental models in the ChEMBL DB, producing almost 1M records in more than 50k assays. We can therefore use the activity of these compounds in all these assays to build ML models that predict the activity of molecules that have never been tested in an SMN assay. The charts below show some examples of these assays and the global distribution of activities in their respective records.
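As a rough sketch of how such records could become model inputs, the snippet below pivots long-format activity records into one feature row per molecule. The record layout, the identifiers, and the decision to average repeated measurements are all assumptions made for illustration; the source does not describe how the actual feature table was built.

```python
from collections import defaultdict

# Hypothetical long-format records: (molecule_id, assay_id, activity_score).
records = [
    ("mol_A", "assay_1", 6.2),
    ("mol_A", "assay_2", 5.0),
    ("mol_B", "assay_1", 7.1),
    ("mol_B", "assay_1", 6.9),  # repeated measurement, averaged below
    ("mol_C", "assay_2", 4.4),
]


def build_feature_matrix(records):
    """Pivot records into one row per molecule and one column per assay.

    Repeated measurements are averaged; untested molecule/assay pairs are None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for mol, assay, value in records:
        sums[(mol, assay)] += value
        counts[(mol, assay)] += 1
    molecules = sorted({mol for mol, _, _ in records})
    assays = sorted({assay for _, assay, _ in records})
    matrix = {
        mol: [
            sums[(mol, a)] / counts[(mol, a)] if (mol, a) in counts else None
            for a in assays
        ]
        for mol in molecules
    }
    return assays, matrix


assays, matrix = build_feature_matrix(records)
for mol, row in matrix.items():
    print(mol, dict(zip(assays, row)))
```

With almost 1M records spread over more than 50k assays, most cells of such a matrix would be empty, so the handling of missing values is a real modelling decision rather than a detail.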

Bar chart of the frequency of the different assays that have records for the molecules used in the SMN2 reporter screen. The inset zooms in on the most relevant assay targets.

Activity distribution histogram of the 30k SMN2 screen molecules in all non-SMN2 assays recorded in the ChEMBL DB.

And, of course, before looking for relations or making predictions, why not look at the relationship between activity in SMN2 and activity in the other assays these molecules take part in? The picture shows the correlation between SMN2 and other targets for the cases with R² > 0.5, slope > 0.4 and N > 10.
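The three thresholds can be applied with a plain least-squares fit, sketched below. Which variable sits on which axis is not stated in the text, so taking the SMN2 score as x and the other target's score as y is an assumption, as are the function names.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, r_squared, n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # No variance on one axis: no meaningful correlation.
        return 0.0, 0.0, n
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared, n


def passes_filter(xs, ys, min_r2=0.5, min_slope=0.4, min_n=10):
    """Apply the R2 > 0.5, slope > 0.4 and N > 10 criteria quoted in the text."""
    if len(xs) != len(ys) or len(xs) <= min_n:
        return False
    slope, r_squared, _ = linear_fit(xs, ys)
    return r_squared > min_r2 and slope > min_slope


# Hypothetical paired scores: x = SMN2 score (assumed), y = score on another target.
smn2_scores = [float(i) for i in range(12)]
other_scores = [0.5 * x + 1.0 for x in smn2_scores]
print(passes_filter(smn2_scores, other_scores))
```

The N > 10 condition matters as much as the other two: with only a handful of shared compounds, a high R² is easy to obtain by chance.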

Evaluation of predictive models.

For model evaluation, the data were split into validation and test sets. The validation set was labelled for classification (binary and multiple) and for regression, and was used to generate the model (that is, it serves as the training set). The test set is then run through the model, and the predictions are compared with the previously hidden activity columns. The accuracy of the predictions is then assessed with the classical machine learning evaluation parameters shown in the charts below.
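A minimal sketch of this split-and-hide procedure follows. The 30% test fraction, the fixed seed and the record layout are assumptions for illustration; the text does not state the proportions actually used.

```python
import random


def split_records(records, test_fraction=0.3, seed=42):
    """Shuffle the records and split them into (model_set, test_set).

    test_fraction is an assumed value, not one taken from the article.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def hide_labels(test_set, label_key="active"):
    """Separate the activity column from the test rows before prediction."""
    features = [
        {key: value for key, value in row.items() if key != label_key}
        for row in test_set
    ]
    hidden = [row[label_key] for row in test_set]
    return features, hidden


# Hypothetical records with a binary activity label.
records = [{"id": i, "active": i % 2 == 0} for i in range(10)]
model_set, test_set = split_records(records)
test_features, hidden_labels = hide_labels(test_set)
print(len(model_set), len(test_set))
```

The hidden labels are only brought back after prediction, when they are compared with the model output to compute the evaluation parameters.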

The validation sets are passed through decision tree (Rpart: classification and regression), random forest (classification and regression), and Naive Bayes (binary and multiple classification) algorithms. Each algorithm is tested against different combinations of variables: all possible variables, a wide combination of biological and chemical variables, only biological, only chemical, and a minimal combination of both. For each combination of classification algorithm and variable set, a confusion matrix and a bar diagram of the most meaningful statistical quality descriptors are charted below. For regression, a correlation plot is shown instead.
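For the binary case, a confusion matrix and a set of quality descriptors can be derived as sketched below. The text does not name the descriptors that were charted, so the four used here (accuracy, sensitivity, specificity, precision) are an assumed, commonly used selection.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def qc_descriptors(tp, tn, fp, fn):
    """Compute common quality descriptors from the confusion counts."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on actives
        "specificity": ratio(tn, tn + fp),  # recall on inactives
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical hidden labels and model predictions (1 = active, 0 = inactive).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn, qc_descriptors(tp, tn, fp, fn))
```

In a screen where actives are rare, accuracy alone can look good for a model that predicts everything inactive, which is why sensitivity and precision are worth reading alongside it.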

 

Decision Trees Classification (confusion matrix)

 

Decision Trees (QC descriptors)

 

 

 

Random Forests Classification (confusion matrix)

 

Random Forests Classification (QC descriptors)

 

 

 

Naive Bayes Classification (confusion matrix)

 

Naive Bayes (QC descriptors)

 

All algorithms compared in their respective best condition tested.

And the visual confusion matrices of the three binary classification algorithms, compared in their best conditions…

Best fitted Rpart algorithm

Best fitted RF algorithm

Best fitted NB algorithm

When a classification algorithm reaches acceptable quality, it is submitted to a 10-iteration cross-validation to assess the robustness of its predictions and the stability of its QC descriptors.
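One way to read "10 iterations" is as 10-fold cross-validation, and the sketch below takes that reading as an assumption. The `fit_predict` callback and the majority-class baseline are placeholders for illustration, not the Rpart or random forest models used in the article.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into k folds (needs n_items >= k)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(labels, fit_predict, k=10, seed=0):
    """Return the mean and standard deviation of accuracy across k folds.

    fit_predict(train_idx, test_idx) must return one prediction per test index.
    """
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        predictions = fit_predict(train_idx, held_out)
        correct = sum(1 for i, p in zip(held_out, predictions) if labels[i] == p)
        accuracies.append(correct / len(held_out))
    return mean(accuracies), stdev(accuracies)


# Hypothetical labels: one compound in four is active.
labels = [i % 4 == 0 for i in range(100)]


def majority_baseline(train_idx, test_idx):
    """Placeholder model: always predict the majority class of the training fold."""
    n_active = sum(labels[i] for i in train_idx)
    majority = n_active * 2 > len(train_idx)
    return [majority] * len(test_idx)


mean_acc, sd_acc = cross_validate(labels, majority_baseline)
print(round(mean_acc, 3), round(sd_acc, 3))
```

A small spread of the descriptor across folds is what "stability" means here; a model whose accuracy swings widely between folds is not robust even if its average looks fine.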

Rpart cross validation

Random Forest cross validation

Score prediction by regression. Although decision tree prediction gives a poor estimate of potencies, random forest regression correlates highly with the actual ChemblScore data.

Rpart regression

Random forest regression

Based on these quality indicators, compound selection will be carried out using a dashboard that contains the classification predictions from Rpart and random forests, as well as the random forest regression. The Naive Bayes output is included in the dashboard only as an optional end filter.

Compound selection. Description of dashboard uses.

Let’s look at a typical multimethod compound selection dashboard. It is a panel where predictions from different models are compared interactively to allow quality control of the selection. As most compounds have multiple experimental events recorded in the database, and there is a prediction for each event, we need to calculate an index of activity for each compound. For regression, this is the average SMN2 score (ActualPlusPredictedSMN2Score), which uses actual values when they exist. For classification, it is the ratio between the number of times a compound has been predicted active and the total number of events (countRatio). In this example we will use binomial random forest classification.
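The two per-compound indices can be sketched as follows. The event layout is hypothetical, and falling back to the predicted score event by event when no actual value exists is one possible reading of "uses actual values when they exist", not a confirmed description of the original calculation.

```python
from collections import defaultdict

# Hypothetical events: one dict per experimental event of a compound.
events = [
    {"molregno": 1, "actual": 8.0, "predicted": 7.0, "predicted_active": True},
    {"molregno": 1, "actual": None, "predicted": 7.5, "predicted_active": True},
    {"molregno": 1, "actual": None, "predicted": 6.5, "predicted_active": False},
    {"molregno": 2, "actual": None, "predicted": 5.0, "predicted_active": False},
]


def compound_indices(events):
    """Aggregate per-event predictions into per-compound indices."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["molregno"]].append(event)

    result = {}
    for molregno, evs in grouped.items():
        # Assumed rule: prefer the actual score, else use the predicted one.
        scores = [
            e["actual"] if e["actual"] is not None else e["predicted"] for e in evs
        ]
        n_active = sum(1 for e in evs if e["predicted_active"])
        result[molregno] = {
            "n_events": len(evs),
            "ActualPlusPredictedSMN2Score": sum(scores) / len(scores),
            "countRatio": n_active / len(evs),
        }
    return result


indices = compound_indices(events)
for molregno, values in indices.items():
    print(molregno, values)
```

Keeping the event count next to the two indices is useful, because a countRatio of 1 from a single event is much weaker evidence than the same ratio from dozens of events.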

The left chart in the dashboard below shows the ActualPlusPredictedSMN2Score for each unique molecule-target pair against the number of times that pair has a result in the DB. Marking the most active pairs (either actual or predicted) with a convenient number of occurrences in this chart causes the marked compounds to appear in the top right plot, which compares the average SMN2 potency (actual and predicted) with the number of records for each molecule. Marking the desired compounds in this plot brings up the bottom right chart with the random forest classification countRatio.

 

Now it’s just a matter of making the selection at your convenience. The table here contains the 143 most potent predicted SMN2 actives with the highest count ratios, shown as the more intensely colored dots in the dashboard.

molregno | mw_freebase | Count(chemblActivityScoreC) | Avg(predictionRFRegression) | RFBinomialCountRatio | canonical_smiles.x
109099304.3709.09691001300810.97142857142857Oc1ccc(cc1O)C(=O)\C(=C\c2ccc3[nH]ccc3c2)\C#N
821766399.31169.04575749056070.9375CCCC[C@@]1(Cc2cc(OCC(=O)O)c(Cl)c(Cl)c2C1=O)C3CCCC3
423602352.41138.82215609687451COc1cccc(c1)C2=Nn3c(SC2)nnc3c4ccccc4OC
423625352.4138.72277769968061COc1ccc(cc1)C2=Nn3c(SC2)nnc3c4cccc(OC)c4
423624352.4118.71037140546931COc1ccc(cc1)C2=Nn3c(SC2)nnc3c4ccccc4OC
423581352.4118.71037140546931COc1ccccc1C2=Nn3c(SC2)nnc3c4ccccc4OC
783909352.33198.491Fc1ccccc1C(=O)N2CCN(CC2)c3cccc(c3)C(F)(F)F
912276448.52378.47837837837841CCOc1ccc2nc(O)c(cc2c1)C(N3CCC(C)CC3)c4nnnn4Cc5occc5
423582352.4178.4137543018561COc1cccc(c1)c2nnc3SCC(=Nn23)c4ccccc4OC
423584352.4128.39359476169491COc1ccc(cc1)c2nnc3SCC(=Nn23)c4ccccc4OC
692092311.3418.36406600130081CC(C)[C@@]1(C)N=C(NC1=O)c2nc3ccccc3cc2C(=O)O
762709507.6398.350.8974358974359CCOC(=O)c1c2CCCCc2sc1NC(=O)c3c(N)n(CCCOC)c4nc5ccccc5nc34
1159205457.5728.0651OC(=O)CCNc1sc2CCCCc2c1Cc3nnc(S)n3NC(=O)c4ccccc4
1418094304.4108.050.9[Br-].C[N+]1(C)[C@@H]2CC[C@H]1C[C@H](C2)OC(=O)C(CO)c3ccccc3
661674304.468.051C[N+]1(C)[C@@H]2CC[C@H]1C[C@H](C2)OC(=O)C(CO)c3ccccc3
1273757320.3448.051Fc1cccc(NS(=O)(=O)c2ccc3NC(=O)CCc3c2)c1
984482207.25238.02608695652170.95652173913044Sc1nnc(c2cocc2)n1CC=C
807723493.53118.01593939393940.90909090909091CCCN1C(=O)NC(=O)C(=C1N)C(=O)CSC2=Nc3ccccc3C(=O)N2c4ccc(OC)cc4
807129430.46168.0050.9375N=C1N(CC2CCCO2)C3=C(C=C1C(=O)NCc4cccnc4)C(=O)N5C=CC=CC5=N3
1218332302.321081CCOC(=O)COc1ccc2C3=C(CCCC3)C(=O)Oc2c1
359668417.4817.97751Cc1ccc(C(=O)NC(CCS)C(=O)N[C@H](Cc2ccccc2)C(=O)O)c(O)n1
992556403.47217.95059523809521CC1CCN(CC1)C(C(=O)NC2CCCCC2)c3cc4OCOc4cc3[N+](=O)[O-]
150696233.1947.946251C[C@@](O)(C(=O)Nc1ccccc1)C(F)(F)F
116405233.1947.946251CC(O)(C(=O)Nc1ccccc1)C(F)(F)F
1160938233.3117.946251CC(C)(C)NC(=O)C(O)\C=C\c1ccccc1
252654192.2447.911Cc1nc(cs1)c2c[nH]c(C=O)c2
1702332211.6557.91C\C(=N/NC(=O)N)\c1ccccc1Cl
834050492.5727.86441666666671CCN(C(=O)CN1CCN(CC1)c2ccccc2OC)C3=C(N)N(Cc4ccccc4)C(=O)NC3=O
1104771368.4537.83751Cc1ccc(NC(=O)C(=O)NCCN2CCN(CC2)S(=O)(=O)C)cc1
938752320.3477.83392857142861Fc1ccccc1NS(=O)(=O)c2ccc3NC(=O)CCc3c2
255016285.26587.79181034482761OC(=O)Cn1cnc2c(O)nc(Nc3ccccc3)nc12
984429305.3677.75714285714291CC(=O)N(CC1CCCO1)c2nnc(s2)c3cnccn3
1009982473.52297.74991379310341COc1ccc(cc1)C2=CN(C(=O)N2CC(=O)Nc3ccc(OC)cc3OC)c4ccc(C)cc4
888665366.26127.74635416666671Cc1ccc2SCC(NC(=O)c3ccc(Cl)c(Cl)c3)C(=O)c2c1
1312048201.25377.73751c1ccc(cc1)c2nc3sccn3n2
1538251201.2527.73751c1csc(c1)c2ccc3nncn3c2
1839634252.3157.7156025032521Cl.N[C@H]1C[C@H]1c2cccc(NC(=O)c3ccccc3)c2
1672353488.66157.68751SCCC(=O)OCC(COC(=O)CCS)(COC(=O)CCS)COC(=O)CCS
869149389.4737.67083333333331COc1ccc(CN(C2CCS(=O)(=O)C2)C(=O)c3cccc(OC)c3)cc1
1302993246.26547.65171005274011COc1cc2ccc(CC(=O)O)cc2cc1OC
1698463333.3947.651Cc1nn(Cc2ccccc2)c(C)c1\C=N\NC(=O)c3ccncc3
405235304.37207.64251CCCNc1nnc(CN2C(=O)Oc3ccc(C)cc23)s1
548008267.1237.6356251Cl.Brc1ccc2OC(Cc2c1)C3=NCCN3
1110569277.727.631COc1ccc(Nc2cc(Cl)ccc2C(=O)O)cc1
865534312.43267.59251Cc1cccc(C)c1NC(=S)NC(=O)CCc2ccccc2
1962360402.427.5751OC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)c2cccc3C(=O)c4ccccc4Nc23
183470284.2827.563751[Na+].[O-]S(=O)(=O)C(F)(F)c1cccc(c1)c2ccccc2
1273631323.3527.561Cn1ccnc1NC(=O)c2ccc3cc4C(=O)NCCCn4c3c2
922240288.34137.51153846153851CCC(=O)c1cn(Cc2ccccc2C#N)c3ccccc13
69225178.2147.51Nc1nc(ns1)c2cccnc2
69259178.2147.51Nc1nc(ns1)c2ccncc2
985902516.63167.51COc1ccc(CNC(=O)[C@H](C)[C@H]2C[C@]2(C)[C@H](NC(=O)OCc3ccccc3)c4ccccc4)cc1OC
130479239.2937.49251CCCCN1C(=O)c2ccccc2S1(=O)=O
1208689355.4137.47811032037791COc1ccccc1c2nnc(SC(C)C(=O)O)n2c3ccccc3
746541302.35427.46607142857140.95238095238095COc1ccccc1NC(=S)NC(=O)\C=C\c2occc2
23297444.5427.46251CC[C@H](C)[C@H](NC(=O)[C@@H](S)c1ccccc1)C(=O)N[C@@H](Cc2ccc(O)cc2)C(=O)O
23927444.5427.46251CC(C)[C@H](NC(=O)[C@@H](S)Cc1ccccc1)C(=O)N[C@@H](Cc2ccc(O)cc2)C(=O)O
748748399.4677.43751CCc1ccc(NC(=O)c2ccc(NS(=O)(=O)c3c(C)onc3C)cc2)cc1
807870498.55197.42763157894741CCOc1cc(ccc1OS(=O)(=O)c2ccc(C)cc2)C(c3c(C)n[nH]c3O)c4c(C)n[nH]c4O
1207664269.2627.4251Cc1ccc(Nc2ncc3c(O)nnc(O)c3n2)cc1
31016255.2427.4242275032521NCCc1c[nH]c(n1)c2ccccc2C(F)(F)F
886994274.36137.41346153846151N#Cc1ccccc1OCCOCCNC2CCCC2
1402809238.2427.41O.NC(=O)OCC(COC(=O)N)c1ccccc1
57376262.2737.381251CC(C)(C)C(=O)OCn1nnnc1c2cnccn2
623319339.3727.3656251COc1ccc(cc1)N2C=Nc3c(sc4nc(C)nc(N)c34)C2=O
1109330285.2927.36251COc1ccc(cc1)C(=O)NC(C(=O)O)c2ccccc2
744335498.5357.36251COc1ccc(cc1)c2nc3NC(=C(C(c4ccc(OC)c(OC)c4)n3n2)C(=O)Nc5cccnc5)C
503123218.3357.36251CC(=CCCC(C)(O)c1ccc(C)cc1)C
775141305.36157.36251CN(C(=O)C1CCCO1)c2nnc(s2)c3cnc(C)cn3
969153250.2847.3593751COC(=O)CSc1nnnn1c2ccccc2
1320560235.06557.35751OCC1COc2cc(Cl)c(Cl)cc2O1
1073814292.3337.3551COc1c2CCC(C)(C)Oc2c(C(=O)C)c(OC)c1C=O
796832339.43117.3528030303030.90909090909091CC(C)c1nnc(NC(=O)CS(=O)(=O)Cc2ccccc2)s1
769821322.2957.351Fc1ccc(cc1F)C(=O)CSc2oc(nn2)c3occc3
898090339.3977.34142857142861COc1ccccc1NC(=O)N2C[C@@H]3C[C@@H](C2)C4=CC=CC(=O)N4C3
359615497.5227.341OC(=O)[C@@H](Cc1ccccc1)NC(=O)C(CCS)NC(=O)c2oc(cc2)c3ccccc3[N+](=O)[O-]
360050430.5227.33751Cc1cc(cc(C)c1O)C(=O)NC(CCS)C(=O)N[C@H](Cc2ccccc2)C(=O)O
868579311.3447.32751Cc1c2C=NN(CC(=O)O)C(=O)c2c(C)n1Cc3ccccc3
1238753255.3447.3156251CN1CCN(CC1)S(=O)(=O)c2ccc(N)cc2
1244843347.2327.31251CN(CC(=O)NC1CC1)S(=O)(=O)c2ccc(Br)cc2
825043394.42137.29615384615390.92307692307692COCCN1C(C(=C(O)C1=O)C(=O)c2ccc(OCC=C)cc2)c3cccnc3
851047349.43127.286250.91666666666667O=C1N(CCCN2CCOCC2)C(=Nc3ccccc13)c4ccccc4
827328249.3577.27071428571431Cc1cc(C)c(C)c(OCCN2CCOCC2)c1
297984423.3927.2651COc1cc(cc(\C=C\C(=O)O)c1OC)S(=O)(=O)Nc2ccc(C(=O)O)c(O)c2
736479255.74187.25486111111111Cc1cc(OCCN2CCOCC2)ccc1Cl
280445243.2227.251Oc1nc(O)c(C#N)c(NCc2cccnc2)n1
887078483.5627.251CCN(CC)CCN1C(C(=C(O)C1=O)C(=O)c2c(C)[nH]c(C(=O)OC)c2C)c3ccc(OC)cc3
688354229.6727.251Clc1cccc(c1)c2cnc3ncccn23
122441496.5367.21251OC(=O)CCNC(=O)[C@H](CC1CCCCC1)NC(=O)[C@H](CCCc2ccccc2)CP(=O)(O)O
552739257.1237.2106251Cl.Clc1ccc2CC(Oc2c1Cl)C3=NCCN3
737321340.46137.20480769230771CCC(=O)N(Cc1cc2c(C)ccc(C)c2nc1O)C3CCCCC3
1099633334.4277.20107142857141Cc1csc(NC(=O)CSc2nccn2Cc3occc3)n1
187199444.5747.21OC1(CNCc2ccccc2)CCN(CCCc3c[nH]c4ccc(cc34)n5cnnc5)CC1
14961370.4737.19583333333331CCN1CCCC1CNC(=O)c2cc(ccc2OC)N(C)S(=O)(=O)N
132747227.6947.193751CCC(C)(O)C(=O)Nc1ccccc1Cl
290126349.3837.18751NC(=O)CNC(=O)COCC1CCCN1C(=O)OCc2ccccc2
153953584.6627.1751Cc1cc(C)nc(O[C@H](C(=O)O)[C@]2(NCC(=O)N(Cc3ccc(cc3)c4ccccc4)c5ccccc25)c6ccccc6)n1
861885368.45117.17272727272731CCOC(=O)[C@H]1CCC[C@H](C1)NS(=O)(=O)c2ccc(NC(=O)C)cc2
360264355.4747.17251O=C(C1CN(C2CCCCC2)C(=O)C1)N3CCN(CC3)c4ccccc4
804660352.41147.16607142857140.92857142857143COc1ccc(cc1)N(C(C)C(=O)NCc2occc2)S(=O)(=O)C
580465454.9527.16251Cn1nccc1c2cc(NC(=O)Nc3ccc(Cl)cc3)ccc2OCCN4CCNCC4
1345337462.8927.151Clc1cc(Nc2ncncc2c3occ(CNC(=O)C=C)n3)ccc1OCc4ccccn4
1698917289.31147.151COC(=O)\C=C\1/SC(=NC1=O)N\N=C\c2ccccc2
881856500.5547.151CCc1nnc(NC(=O)CSc2nnc(Cc3cc(O)nc(O)n3)n2c4ccc(OC)cc4)s1
751364500.5557.151CCOc1ccc(cc1)n2c(Cc3cc(O)nc(O)n3)nnc2SCC(=O)Nc4nnc(C)s4
466426408.41117.14090909090911COc1ccc(cc1)N2N=C(c3nc4ccccc4[nH]3)c5nc6ccccc6n5C2=O
1786409412.3727.1351CN(C(=O)c1cn2c(cnc2cn1)c3ccc(cc3)C(F)(F)F)c4ccc(C)nn4
789115288.3187.13314814814821CC(=O)OC(C)(C)C1Cc2cc3C=CC(=O)Oc3cc2O1
261971311.3827.1251Oc1c(cnc2ccccc12)C(=O)NC3CCN4CCCC3C4
895463311.36107.12151CN(C)S(=O)(=O)c1ccc2c(c1)nc(CCC(=O)O)n2C
396730352.75197.11604463579811Fc1ccc(Cl)cc1c2nc(Nc3ccncc3)c4nccnc4n2
1017627997.39137.11538461538460.92307692307692CC(C)(C)CC(C)(C)c1ccc(OCOCCOCCO)c(Cc2cc(cc(Cc3cc(ccc3OCOCCOCCO)C(C)(C)CC(C)(C)C)c2OCOCCOCCO)C(C)(C)CC(C)(C)C)c1
1003526309.1457.1138751Cc1onc(NC(=O)CSCC(=O)O)c1Br
763492320.3827.1091COc1ccc(cc1OC)C(=O)NCC(=O)N2CCC(C)CC2
2003920443.4627.10251CC1(C)OCC(=N[C@](C)(c2cc(N[C@H]3CCc4cc(cnc34)C#N)ccc2F)C1(F)F)N
1534460416.5127.11CCCCNC(=O)C[C@@H]1C[C@]2(CCCCC=C2N(Cc3occc3)C1=O)C(=O)OC
160137150.1337.09751Cc1ccc2nc(O)oc2n1
1225532355.4157.0949853203781COc1ccccc1c2nnc(SCC(=O)O)n2c3ccc(C)cc3
867806307.2667.091COC(=O)c1cnc2N(C)C(=O)N(C)C(=O)c2c1C(=O)OC
379867336.4727.076251C[C@@H]1C[C@@H](O)[C@H]2[C@@](C)(CO)CCC[C@]2(C)[C@@]1(O)CCc3cocc3
449995468.5927.07251COc1cc2nc(nc(NC3CCCCNC3=O)c2cc1OC)N4CCC(CC4)N5CCCC5
2009201405.427.07251OC(=O)C[C@@H](Cc1ccc(cc1)c2ccccc2)NC(=O)c3cc(ncn3)C(=O)O
951444230.2657.0551Cc1nn(Cc2ccccc2)c(C)c1C(=O)O
145945312.3237.05416666666671OC(=O)C1CCN(CC1)c2ccc3[nH]c4nc(O)nc4cc3c2
815015357.3557.05251CS(=O)(=O)C1=C(O)C(=O)N(CCCC(=O)O)C1c2ccccc2F
830234396.48497.05102040816330.93877551020408C\C(=C/c1ccccc1)\C=C\2/SC(=S)N(NC(=O)c3cccc(O)c3)C2=O
97880531.6977.051COc1ccc(CNC(=O)C(N2CCN(CC(O)COc3c(C)cccc3C)CC2)c4ccc(C)cc4)cc1
206152260.25537.051CC(=O)Nc1ccc(Nc2cc(O)nc(O)n2)cc1
854753213.347.051CCCC(C)C(=O)Nc1nnc(C)s1
915974236.6567.0496137516261OC(=O)c1ccc(Cn2cc(Cl)cn2)cc1
1224007311.3847.04751CCOC(=O)CC1N(CCNC1=O)C(=O)NC2CCCCC2
917463246.357.04251C[C@@H](C(=O)O)c1ccc(C[C@@H]2CCCC2=O)cc1
959645434.3877.03421428571431Fc1cccc(c1)N2C(=O)[C@H]3NN=C([C@H]3C2=O)C(=O)CCN4C(=O)c5ccccc5C4=O
912837301.3467.02083333333331Cc1ccc(OCC(=O)NCC(=O)n2nc(C)cc2C)cc1
950421263.477.01559523809521Cc1ccccc1CSCC(=O)NC2CCCC2
903207291.34117.0150.90909090909091CC1CCCC(C)N1C(=O)COC(=O)\C=C\c2occc2
1563165253.2527.01251Nc1ccc(cc1)C2=CC(=O)c3c(O)cccc3O2
1328495281.31577.0117275032521OC[C@@]1(OC[C@@H]1c2ccccc2)n3nc4ccccc4n3
1209849311.34371Cn1nccc1c2coc3c(cccc23)C(=O)N4CCOCC4
942098349.446.96251Cc1ccc(cc1C(=O)c2nccn2C)S(=O)(=O)N3CCOCC3
1834654252.6326.9617275032521OC(=O)c1cc(ncn1)c2ccc(F)c(Cl)c2
832689283.3586.92718751CC(=O)Nc1nnc(SCc2cccc(F)c2)s1
410704483.5446.9051CNC(=O)Cc1ccccc1CNC(=O)c2nc(N3CCCCS3(=O)=O)c4cccnc4c2O
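The selection behind a table like the one above was made interactively in the dashboard, but an equivalent filter-and-rank step can be sketched in code. The thresholds, the field names and the four example compounds below are hypothetical and do not reproduce the actual selection criteria.

```python
def select_compounds(compounds, min_ratio=0.85, min_score=6.9, top_n=143):
    """Keep compounds above both thresholds and rank them by average score.

    min_ratio and min_score are illustrative values, not the ones used in the article.
    """
    eligible = [
        c
        for c in compounds
        if c["countRatio"] >= min_ratio and c["avg_score"] >= min_score
    ]
    ranked = sorted(
        eligible, key=lambda c: (c["avg_score"], c["countRatio"]), reverse=True
    )
    return ranked[:top_n]


# Hypothetical per-compound summaries.
compounds = [
    {"molregno": 101, "avg_score": 9.1, "countRatio": 0.97},
    {"molregno": 102, "avg_score": 8.4, "countRatio": 1.0},
    {"molregno": 103, "avg_score": 8.9, "countRatio": 0.40},  # rarely predicted active
    {"molregno": 104, "avg_score": 5.2, "countRatio": 1.0},   # too weak
]

selection = select_compounds(compounds)
print([c["molregno"] for c in selection])
```

Requiring both a high regression score and a high classification count ratio means a compound has to be supported by two models before it enters the list.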

And here are some of the most potent structures: