Genes are assigned to bins according to their average log-level of expression; then, within each bin, the dispersion (variance over mean of the log-levels) is z-scored. Genes whose dispersion z-score is not sufficiently high are excluded from the dataset.
This method replicates FindVariableFeatures from the Seurat package.
zscore_threshold | a numeric value indicating the z-scored dispersion threshold above which the gene names are returned. Defaults to 0. |
---|---|
num_bins | an integer value indicating the number of bins within which the z-scores are calculated. Defaults to 20. |
data_status | a character string specifying whether the gene expression levels used for the calculation are raw ("Raw"), normalized ("Normalized") or imputed ("Smoothed"). Defaults to "Raw". |
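
The binning and z-scoring described above can be illustrated with a minimal, language-neutral sketch in plain Python. It is not the package's implementation: the function name `select_high_dispersion_genes`, the dict-of-lists input, the equal-width bins, the use of the sample variance and sample standard deviation, and the handling of genes with a non-positive mean are all assumptions made for illustration. Only `zscore_threshold` and `num_bins` mirror the arguments documented here.

```python
from statistics import mean, stdev, variance


def select_high_dispersion_genes(log_expr, zscore_threshold=0.0, num_bins=20):
    """Return names of genes whose within-bin dispersion z-score exceeds the threshold.

    log_expr: dict mapping a gene name to its log-expression levels across cells
    (hypothetical input format, at least two cells per gene).
    """
    # Per-gene average log-level and dispersion (variance over mean).
    stats = {}
    for gene, levels in log_expr.items():
        avg = mean(levels)
        if avg <= 0:
            continue  # assumption: skip genes whose dispersion is undefined
        stats[gene] = (avg, variance(levels) / avg)
    if not stats:
        return []

    # Assumption: equal-width bins spanning the range of average log-levels.
    lo = min(avg for avg, _ in stats.values())
    hi = max(avg for avg, _ in stats.values())
    width = (hi - lo) / num_bins or 1.0  # guard against a zero-width range
    bins = {}
    for gene, (avg, disp) in stats.items():
        idx = min(int((avg - lo) / width), num_bins - 1)
        bins.setdefault(idx, []).append((gene, disp))

    # Z-score the dispersions within each bin and keep genes above the threshold.
    selected = []
    for members in bins.values():
        disps = [disp for _, disp in members]
        mu = mean(disps)
        sd = stdev(disps) if len(disps) > 1 else 0.0
        for gene, disp in members:
            z = (disp - mu) / sd if sd > 0 else 0.0
            if z > zscore_threshold:
                selected.append(gene)
    return sorted(selected)


# Toy data: three genes with the same mean but different variability.
toy = {"a": [1, 1, 1, 1], "b": [0, 2, 0, 2], "c": [0.5, 1.5, 0.5, 1.5]}
print(select_high_dispersion_genes(toy, zscore_threshold=0, num_bins=1))  # ['b']
```

Because the z-score is computed within each bin, a gene is compared only with genes of similar average expression, so highly expressed genes do not dominate the selection simply because their variance is larger.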