# Analysis of training and seed bias in small molecules generated with a conditional graph-based variational autoencoder - Insights for practical AI-driven molecule generation

@article{Kang2021AnalysisOT, title={Analysis of training and seed bias in small molecules generated with a conditional graph-based variational autoencoder - Insights for practical AI-driven molecule generation}, author={Seung-gu Kang and Joseph A Morrone and Jeffrey K. Weber and Wendy D. Cornell}, journal={ArXiv}, year={2021}, volume={abs/2107.08987} }

The application of deep learning to generative molecule design has shown early promise for accelerating lead series development. However, questions remain concerning how factors like training, dataset, and seed bias impact the technology’s utility to medicinal and computational chemists. In this work, we analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE). Leveraging a massive, labeled dataset corresponding to the…

#### References

PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning

- Medicine
- iScience
- 2021

A hybrid variational autoencoder is constructed that tailors molecules to target-specific transcriptomic profiles, using an anticancer drug sensitivity prediction model (PaccMann) as the reward function; the generated molecules frequently exhibit the highest structural similarity to compounds with known efficacy against these cancer types.

Constrained Graph Variational Autoencoders for Molecule Design

- Computer Science, Mathematics
- NeurIPS
- 2018

A variational autoencoder model in which both encoder and decoder are graph-structured is proposed, and it is shown that by using appropriate shaping of the latent space, this model allows us to design molecules that are (locally) optimal in desired properties.

Multi-objective de novo drug design with conditional graph generative model

- Computer Science, Mathematics
- Journal of Cheminformatics
- 2018

A new de novo molecular design framework is proposed based on a type of sequential graph generators that do not use atom level recurrent units, which is much more tuned for molecule generation and has been scaled up to cover significantly larger molecules in the ChEMBL database.

Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders

- Computer Science, Mathematics
- NeurIPS
- 2018

A regularization framework for variational autoencoders is proposed that focuses on the matrix representation of graphs and formulates penalty terms that regularize the output distribution of the decoder to encourage the satisfaction of validity constraints.

Junction Tree Variational Autoencoder for Molecular Graph Generation

- Computer Science, Mathematics
- ICML
- 2018

The junction tree variational autoencoder generates molecular graphs in two phases, first generating a tree-structured scaffold over chemical substructures and then combining them into a molecule with a graph message passing network, which allows molecules to be expanded incrementally while maintaining chemical validity at every step.

Low Data Drug Discovery with One-Shot Learning

- Computer Science, Mathematics
- ACS Central Science
- 2017

This work demonstrates how one-shot learning can be used to significantly lower the amount of data required to make meaningful predictions in drug discovery applications and introduces a new architecture, the iterative refinement long short-term memory, that significantly improves learning of meaningful distance metrics over small molecules.

GuacaMol: Benchmarking Models for De Novo Molecular Design

- Computer Science, Medicine
- Journal of Chemical Information and Modeling
- 2019

This work proposes an evaluation framework, GuacaMol, based on a suite of standardized benchmarks, to standardize the assessment of both classical and neural models for de novo molecular design, and describes a variety of single and multiobjective optimization tasks.

Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

- Mathematics, Medicine
- Journal of Chemical Information and Modeling
- 2017

A convolutional neural network is employed for the embedding task of learning an expressive molecular representation by treating molecules as undirected graphs with attributed nodes and edges, an approach that preserves molecule-level spatial information and significantly enhances model performance.

GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders

- Computer Science, Mathematics
- ICANN
- 2018

This work proposes to sidestep hurdles associated with linearization of discrete structures by having a decoder directly output, all at once, a probabilistic fully-connected graph of a predefined maximum size; the method is formulated as a variational autoencoder.

Randomized SMILES strings improve the quality of molecular generative models

- Medicine, Computer Science
- Journal of Cheminformatics
- 2019

An extensive benchmark on models trained with subsets of GDB-13 of different sizes, with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations shows that models that use LSTM cells trained with 1 million randomized SMILES are able to generalize to larger chemical spaces than the other approaches and represent the target chemical space more accurately.