FiRE: Finding a needle in the haystack.

Sindhuri Upadrasta

Many biologically relevant cells occur at such low frequencies that detecting and analysing them in tissues containing millions of other cells is a problem for researchers. A study from IIT Delhi describes a new algorithmic approach that detects rare cells much faster than the techniques currently in use.

Cells that exist in low numbers compared to the surrounding cells are called rare cells. These include circulating tumor cells, cancer stem cells and antigen-specific T-cells. Despite their low abundance, these cells play important physiological roles, ranging from initiating cancer metastasis to fighting infectious agents. In a recent study led by Debarka Sengupta and Jayadeva from the Indian Institute of Technology (IIT), Delhi, researchers have come up with a computational algorithm called FiRE (Finder of Rare Entities), which can identify these rare cells in a matter of seconds.

To identify rare cells, one must study the activity, or expression, of thousands of genes in a large number of cells. While new techniques like single cell transcriptomics can measure gene expression in thousands of individual cells, making sense of this data can be a very tedious process. Existing algorithms identify rare cells by grouping cells based on similarities in their gene expression. This approach, called “clustering”, depends on a number of sensitive parameters, which makes these algorithms time-consuming as well as memory-intensive.

FiRE bypasses the clustering step and instead assigns each cell a “hash” code, which acts as its index within the large dataset. Cells of major cell types tend to share their code with many more cells than cells of minor cell types do. This allows FiRE to use these hash codes, along with a few other estimations, to generate a “rareness” score for each cell that indicates whether it belongs to a rare cell type.
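The idea can be illustrated with a toy sketch. This is not the published FiRE implementation: the article does not say how the hash codes or the score are computed, so the hashing scheme (random thresholds on randomly chosen genes), the score (average negative log of the fraction of cells sharing a code) and every name below are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def rareness_scores(cells, n_tables=10, bits_per_code=4, seed=0):
    """Toy rareness score (hypothetical, not the FiRE code).

    Each "table" hashes every cell to a short binary code by comparing a few
    randomly chosen genes against random thresholds. A cell that shares its
    code with few other cells gets a high score.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(cells), len(cells[0])
    # Per-gene value range, used to draw thresholds that can split the data.
    lo = [min(c[g] for c in cells) for g in range(n_genes)]
    hi = [max(c[g] for c in cells) for g in range(n_genes)]

    scores = [0.0] * n_cells
    for _ in range(n_tables):
        genes = rng.sample(range(n_genes), bits_per_code)
        thresholds = [rng.uniform(lo[g], hi[g]) for g in genes]
        # One hash code per cell: a tuple of True/False bits.
        codes = [tuple(c[g] > t for g, t in zip(genes, thresholds)) for c in cells]
        bucket_size = Counter(codes)
        for i, code in enumerate(codes):
            # Small bucket -> small fraction -> large contribution.
            scores[i] += -math.log(bucket_size[code] / n_cells)
    return [s / n_tables for s in scores]


# Made-up data: two abundant cell types and one rare type, six "genes" each.
major_a = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0] for _ in range(100)]
major_b = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0] for _ in range(50)]
rare = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(2)]
cells = major_a + major_b + rare

scores = rareness_scores(cells)
top2 = sorted(range(len(cells)), key=lambda i: scores[i], reverse=True)[:2]
print("indices of the two highest-scoring cells:", top2)
```

In this made-up dataset the two rare cells (the last two rows) receive the highest scores, because they never share a code with the abundant types. No clustering step is involved: each table needs only one pass over the cells and a count per code, which is the property the article credits for the method's speed and low memory use.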

The researchers applied FiRE to various publicly available single cell expression datasets and identified a known rare cell population called megakaryocytes, which make up only ~0.3% of the total cell population. FiRE was also able to discover rare cell types in a number of real and simulated data sets. Compared to other algorithms, FiRE correctly detected a larger number of the rare cells annotated in the datasets.

Another advantage of the FiRE algorithm is that it can identify minor cell populations even when provided with a minimal set of differentially expressed genes (~20). The algorithm also generates results at exceptionally high speed and uses very little computational memory. “The authors made ingenious use of big data techniques to estimate rareness of entities,” says Sanghamitra Bandyopadhyay, Professor, Machine Intelligence Unit and Director of Indian Statistical Institute, Kolkata.

“We went through existing mouse brain data set and predicted a truly rare sub-type of the pituitary gland related tissue called pars tuberalis,” says Sengupta, referring to an experiment where the researchers used FiRE to generate compelling evidence of a new rare cell type in the mouse brain. Bandyopadhyay referred to this as one of the most significant results from this study.

FiRE could identify not only the various types of cells in a tissue but also their sub-types. This aspect could be useful for revealing intra-tumor differences and detecting small clonal populations of drug-resistant cells.

“Despite tremendous promise, the absence of a scalable rare cell detection method had stagnated the field. The existing algorithms struggle to scale for large scale single-cell data. FiRE breaks this long-standing barrier by ingenious use of big data techniques for the discovery of rare cells. FiRE will surely pave the way for the discovery of numerous rare cell lineages,” says Bandyopadhyay.
