The machines are learning to develop new targeted drug therapies


From anti-cancer agents to antibiotics, high-throughput screening is a powerful method of drug discovery. Researchers identify a target protein and test tens, if not hundreds, of thousands of chemicals for their ability to inhibit its function in the lab. Then they whittle the chemical pool down to a handful of “hits” that serve as chemical precursors to safe, effective drugs.

But these screens are cost-, time- and labor-intensive. What if there were a way to screen as many molecules, or even more, at a lower cost but just as accurately?

Enter the machines, and virtual screening.

“We are trying to use computational techniques to go from millions of compounds to a few potent hits, and do that faster, cheaper and better,” says Scott Wildman, a chemist with the University of Wisconsin Carbone Cancer Center’s Drug Development Core (DDC).

In this video, a docking program takes small molecules — the colored pieces — and tries to fit them into the target protein. The better the fit, the better the score. (source: UW DDC staff)

Dozens of “docking” programs have been developed in which researchers input their target of choice. The program then virtually tries to fit, or dock, a molecule into the target, like a piece of a three-dimensional puzzle. If the program can dock the molecule easily, it scores high; if the pieces don’t fit, it scores low. The researchers then choose a cut-off score separating presumed “hits” from “misses,” leaving a more manageable number of molecules to confirm in a real-life experiment.
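The cut-off step described above can be sketched in a few lines. This is a minimal illustration, not code from the study; the molecule names, scores, and cutoff value are all hypothetical.

```python
# Hypothetical docking scores for four molecules (higher = better fit).
scores = {
    "mol_A": 9.2,
    "mol_B": 4.1,
    "mol_C": 7.8,
    "mol_D": 2.5,
}

# The researchers choose a cut-off; molecules at or above it are
# presumed hits and move on to wet-lab confirmation.
CUTOFF = 7.0

hits = [name for name, s in scores.items() if s >= CUTOFF]
misses = [name for name, s in scores.items() if s < CUTOFF]

print(hits)    # presumed hits, kept for real-life experiments
print(misses)  # presumed misses, removed from consideration
```

With these example values, `mol_A` and `mol_C` clear the cutoff while the other two are discarded.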

Graphs of docking score vs. number of molecules screened are shown. The hit and miss curves overlap significantly when only one docking program is used, but with machine learning the two curves barely overlap.

When one docking program is used, there is substantial overlap, shown in purple, between hits and misses. Using the average score from multiple programs reduces that overlap. In this study, Ericksen et al. used machine learning algorithms to separate hits from misses even more accurately, meaning most true “hits” go on to wet-lab screening while the majority of misses are removed from consideration.

Many research groups rely on just one docking program because of the cost of computing power, even though a single program can struggle to separate the hits from the misses. If researchers have access to, as Wildman says, “a ridiculous number of hours of computing time,” they can separate hits and misses more accurately by running multiple docking programs and taking the consensus score. Luckily, Wildman and his DDC colleagues have just that – and more – through their collaboration with UW’s Center for High Throughput Computing (CHTC).
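The simplest consensus score is just an average across programs. The sketch below assumes each docking program produces a comparable score for the same molecule; the values and molecule names are hypothetical, and real consensus schemes often normalize or rank scores before combining them.

```python
# Hypothetical scores for two molecules from three docking programs.
per_program_scores = {
    "mol_A": [8.9, 9.4, 7.6],  # all three programs agree this fits well
    "mol_B": [3.0, 8.8, 2.2],  # one program disagrees with the others
}

# Consensus score: the mean across programs, so one outlier program
# has less influence on the final ranking.
consensus = {
    mol: sum(vals) / len(vals)
    for mol, vals in per_program_scores.items()
}

print(consensus)
```

Averaging dampens the effect of the one program that scored `mol_B` highly, which is the intuition behind running multiple programs when the computing time is available.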

In their new study, published in the Journal of Chemical Information and Modeling, DDC and CHTC researchers added a machine learning component on top of running multiple docking programs to improve virtual screening.

First, they took a known dataset of 21 target proteins and tens of thousands of small molecules and ran it through eight common docking programs. Next, they applied machine learning algorithms to combine the scores and better separate hits from misses, so that even if one docking program performs poorly, the other programs can still provide reliable scores. Because they already knew which molecules were hits and which were misses for each target, they could train the machine learning algorithms on what good and bad scores look like.

“The machine learning algorithms are finding relationships between the programs. For example, let’s say that when program #1 and #6 both give good scores, then that molecule is always a hit,” says Spencer Ericksen, a DDC scientist and lead author of the study. “It up-weights and down-weights scores accordingly, and knows how to sort through the data to identify good from bad. That’s the beauty of machine learning.”

The researchers applied two machine learning approaches to obtain their final, trained models. First, they trained a model using data from 20 of the known targets and confirmed its scoring performance on the remaining target, leaving the 21st “test” target out of training entirely to avoid bias. They also built a second model in collaboration with UW statistics professor Michael Newton (also Associate Director of the Center for Predictive Computational Phenotyping, which provided funding support). Both types of models sorted hits from misses better than single docking programs across a wide variety of target protein classes.
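The train-on-20, test-on-1 scheme above is a leave-one-target-out loop. The sketch below shows only the structure of that loop; the `train_model` function is a hypothetical stand-in for the actual machine learning algorithms used in the study, and the target names are placeholders.

```python
# 21 placeholder target names standing in for the known dataset.
targets = [f"target_{i}" for i in range(1, 22)]

def train_model(training_targets):
    # Hypothetical stand-in: a real implementation would fit a machine
    # learning model on the docking scores and known hit/miss labels
    # of these 20 targets.
    return {"trained_on": list(training_targets)}

results = {}
for held_out in targets:
    # The held-out "test" target contributes nothing to training,
    # which is what keeps the evaluation unbiased.
    training = [t for t in targets if t != held_out]
    model = train_model(training)
    results[held_out] = len(model["trained_on"])

print(all(n == 20 for n in results.values()))  # every model saw 20 targets
```

Each of the 21 rounds produces a model that has never seen its test target, so the reported performance reflects how the approach would fare on a genuinely new protein.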

“The approaches we took are well known in the math and statistics fields but have not been applied to computational chemistry problems before now,” Wildman, the study’s senior author, says. “Working with collaborators from across campus provided access to mathematical techniques and computing resources not typically available for virtual screening.”

Because the weighting of scores during the training step depends on the docking programs rather than the targets, the consensus docking approach requires only that the target protein’s structure be known. This means their models are now available to screen new protein targets for hits.

“The cool thing here is that because we trained the models on known targets, we can now apply them to new target proteins implicated in human diseases,” says Ericksen.

Wildman, Ericksen and colleagues are now hoping to collaborate with groups across campus looking to discover new drugs.