Date: 2022-02-03

Researchers have developed a learning system that can make use of large amounts of unlabeled data that other models cannot exploit.

Researchers from the College of Engineering at Carnegie Mellon University used a large volume of unlabeled molecules to build machine learning models that make better predictions than other models.

The researchers created MolCLR, a self-supervised learning framework built on graph neural networks (GNNs).

MolCLR significantly improves the performance of machine learning models by learning from approximately 10 million unlabeled molecules.

To see the difference between labeled and unlabeled data, consider pictures of dogs and cats. In one set, each image is labeled with the animal's species. In another set, the images carry no labels.

For humans, the difference between the two animals is obvious. For a machine learning model, it is not: without labels, the model has no direct signal telling it which image shows which animal, so unlabeled data cannot be used for training in the usual supervised way. Apply this analogy to millions of unlabeled molecules, which could take humans decades to label by hand, and it becomes clear that the problem needs a different approach.

The research team taught the MolCLR framework to use unlabeled data by contrasting positive and negative pairs of augmented molecular graphs. Graphs derived from the same molecule form a positive pair, while graphs derived from different molecules form a negative pair. As a result, the learned representations of similar molecules stay close to each other, while those of different molecules are pushed far apart.
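The pairing idea can be illustrated with a small sketch. The article does not say which loss function MolCLR uses, so the code below assumes a temperature-scaled cross-entropy over cosine similarities, a common choice in contrastive learning. The function name `contrastive_loss`, the `temperature` value, and the random vectors standing in for GNN outputs are all hypothetical and chosen only for illustration.

```python
import numpy as np

def contrastive_loss(z_a, z_b, temperature=0.1):
    """Contrastive loss over a batch of paired embeddings (illustrative sketch).

    z_a[i] and z_b[i] stand for embeddings of two graphs derived from
    molecule i (a positive pair). Every other combination in the batch
    is treated as a negative pair.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # shape (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit length, so dot product = cosine
    sim = (z @ z.T) / temperature                     # pairwise similarities, shape (2n, 2n)
    np.fill_diagonal(sim, -np.inf)                    # a graph is never paired with itself

    # Row i's positive partner sits at i + n (and row i + n's partner at i).
    partner = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # The loss is low when each graph is most similar to its positive partner.
    return -log_prob[np.arange(2 * n), partner].mean()

# Random vectors stand in for GNN outputs (an assumption for this demo).
rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 32))
views_b_aligned = views_a + 0.01 * rng.normal(size=(8, 32))  # nearly identical views
views_b_random = rng.normal(size=(8, 32))                    # unrelated views

loss_aligned = contrastive_loss(views_a, views_b_aligned)
loss_random = contrastive_loss(views_a, views_b_random)
print(loss_aligned < loss_random)
```

Minimizing a loss of this kind pulls the two views of each molecule together and pushes views of different molecules apart, which is the behavior described above. In a real system the embeddings would come from a graph neural network applied to the augmented molecular graphs rather than from random numbers.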

In tests, the model outperformed other machine learning models and was able to identify which chemicals pose the most serious threat to human health.