MIT researchers have developed a new algorithm based on panoramic photography to merge diverse cell datasets into a single source.
While traditional methods tend to cluster cells based on nonbiological patterns, accidentally merge dissimilar cells, and do not scale well to large datasets, the new method, inspired by panoramic photography, merges diverse datasets into a single source effectively. The team was able to merge more than 100,000 cells from 26 different datasets.
Single-cell datasets profile the gene expression of individual human cells, such as muscle cells and immune cells, giving insight into human health and into treating disease.
Cell datasets are produced by a range of labs and technologies. They contain diverse cell types, and combining them into a single pool of data could open up new research possibilities. However, doing so effectively and efficiently is difficult.
Using panoramic photography as inspiration
Researchers from MIT have described an algorithm called “Scanorama” that can efficiently merge more than 20 datasets of vastly differing cell types into a single, larger “panorama.”
Scanorama finds and merges shared cell types between datasets, which MIT describes as similar to combining overlapping pixels when stitching images into a panorama.
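To illustrate the “overlap” idea, one common way to anchor two single-cell datasets is to match cells that are each other's nearest neighbor across the datasets. The minimal NumPy sketch below shows that matching step only; it is an illustration of the concept, not Scanorama's actual implementation, and the toy data is invented for the example:

```python
import numpy as np

def mutual_nearest_neighbors(a, b):
    """Find pairs (i, j) where cell a[i] and cell b[j] are each other's
    nearest neighbor -- a rough stand-in for the 'overlapping pixels'
    that anchor two panels of a panorama."""
    # Pairwise Euclidean distances between every cell in a and in b.
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = dists.argmin(axis=1)  # nearest b-cell for each a-cell
    b_to_a = dists.argmin(axis=0)  # nearest a-cell for each b-cell
    # Keep only mutual matches: these mark cell types shared by both datasets.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Two toy "datasets": rows are cells, columns are gene-expression values.
a = np.array([[0.0, 0.0], [5.0, 5.0]])
b = np.array([[0.1, 0.1], [9.0, 9.0]])
matches = mutual_nearest_neighbors(a, b)  # [(0, 0), (1, 1)]
```

Cells that find no mutual match stay unaligned, which is consistent with Hie's point below that datasets without shared biology should remain separate.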
Brian Hie, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and a researcher in the Computation and Biology group, said: “Traditional methods force cells to align, regardless of what the cell types are. They create a blob with no structure, and you lose all interesting biological differences. You can give Scanorama datasets that shouldn’t align together, and the algorithm will separate the datasets according to biological differences.”
Hie added: “Even if you need to sketch, integrate, and reapply that information to the full datasets, it was still an order of magnitude faster than combining entire datasets.”
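The “sketch, integrate, and reapply” workflow Hie mentions can be shown schematically: learn from a cheap random subsample, then apply the result to every cell. In the sketch below the “integration” step is a placeholder mean-shift correction invented for illustration; the real integration is far more involved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Full dataset: 100,000 "cells", each with 10 gene-expression values.
full = rng.normal(size=(100_000, 10))

# Step 1: sketch -- keep only a small random subsample of cells.
sketch_idx = rng.choice(full.shape[0], size=1_000, replace=False)
sketch = full[sketch_idx]

# Step 2: integrate on the sketch only. As a stand-in, compute a
# simple mean-shift "correction" from the cheap subsample.
correction = sketch.mean(axis=0)

# Step 3: reapply what was learned on the sketch to the full dataset.
integrated = full - correction
```

Because steps 1 and 2 touch only 1% of the cells here, the expensive work scales with the sketch size rather than the full dataset, which is the source of the speedup Hie describes.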