Title: | Deconvolution for LINCS L1000 Data |
---|---|
Description: | LINCS L1000 is a high-throughput technology that allows gene expression to be measured in a large number of assays. However, to fit the measurements of ~1000 genes into the ~500 color channels of LINCS L1000, every two landmark genes are designed to share a single channel. Thus, a deconvolution step is required to infer the expression values of each gene. Any errors in this step can propagate to the downstream analyses. We present l1kdeconv, an R package for peak calling on LINCS L1000 data, based on a new outlier detection method and an aggregate Gaussian mixture model. By removing outliers and borrowing information among similar samples, l1kdeconv shows more stable and better performance than methods commonly used in LINCS L1000 data deconvolution. |
Authors: | Zhao Li[aut], Peng Yu[aut, cre] |
Maintainer: | Zhao Li <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.0 |
Built: | 2025-02-26 03:45:44 UTC |
Source: | https://github.com/cran/l1kdeconv |
Get the Cluster Ranges in a Vector of 1D Coordinates
getclusterranges(x, gap)
x | a numeric vector |
gap | the gap size used to recognize data-free gaps |
x <- c(1:3, 11:13)
getclusterranges(x, 3)
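The idea of splitting 1D coordinates at data-free gaps can be sketched in base R. This is a minimal illustration, not the package's implementation: `cluster_ranges_sketch` is a hypothetical helper, and both the strict `>` comparison and the `left`/`right` return format are assumptions.

```r
# Hypothetical helper: report the range of every run of points that is
# separated from its neighbours by more than `gap`.
cluster_ranges_sketch <- function(x, gap) {
  stopifnot(length(x) > 0)
  xs <- sort(x)
  # A new cluster starts after every pair of neighbours more than `gap` apart.
  breaks <- which(diff(xs) > gap)
  starts <- c(1, breaks + 1)
  ends <- c(breaks, length(xs))
  data.frame(left = xs[starts], right = xs[ends])
}

x <- c(1:3, 11:13)
ranges <- cluster_ranges_sketch(x, 3)
print(ranges)
```

With the input above, the only gap wider than 3 lies between 3 and 11, so the sketch reports two ranges.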
Plot the Fit Results of a 2-Component Gaussian Mixture Model
gmmplot(x, mu1, mu2, sigma, lambda, nbins = 15, xlim)
x | a numeric vector |
mu1 | the mean of the 1st cluster |
mu2 | the mean of the 2nd cluster |
sigma | the common variance of both clusters |
lambda | the proportion parameter |
nbins | the number of bins per cluster (6*sigma) |
xlim | the limits of the x axis |
set.seed(0)
x <- list(c(rnorm(150, mean = 0), rnorm(50, mean = 10)))
fit_res <- multigmmsamedistribu(x)
with(
  as.list(fit_res$par_conv),
  gmmplot(
    x[[1]],
    mu1 = mu1, mu2 = mu2, sigma = sigma, lambda = lambda,
    xlim = range(unlist(x))
  )
)
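The curve implied by `mu1`, `mu2`, `sigma` and `lambda` can be written down directly. The sketch below is an illustration under assumptions, not the package's plotting code: `gmm_density_sketch` is a hypothetical helper, `lambda` is taken to weight the first cluster, and `sigma` is passed to `dnorm` as a standard deviation even though the argument table calls it a variance.

```r
# Density of a two-component Gaussian mixture with a shared sigma.
# Assumption: lambda weights the first component, (1 - lambda) the second.
gmm_density_sketch <- function(x, mu1, mu2, sigma, lambda) {
  lambda * dnorm(x, mean = mu1, sd = sigma) +
    (1 - lambda) * dnorm(x, mean = mu2, sd = sigma)
}

# Evaluate the curve on a fine grid that covers both peaks generously.
step <- 0.01
grid <- seq(-10, 20, by = step)
dens <- gmm_density_sketch(grid, mu1 = 0, mu2 = 10, sigma = 1, lambda = 0.75)

# Sanity check: a density should integrate to (approximately) one.
total <- sum(dens) * step
print(total)
```

To compare the curve with data, one option is `hist(x, freq = FALSE)` followed by `lines(grid, dens)`.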
The optimization slows down dramatically when too much data is fitted at once.
multigmmmanydata(x, grp_size = 3, lambda_lower = 0.1, lambda_upper = 1 - lambda_lower, sigma_lower = 0.01, debug = F)
x | a list of numeric vectors |
grp_size | the usual size of each group |
lambda_lower | the lower bound of lambda |
lambda_upper | the upper bound of lambda |
sigma_lower | the lower bound of sigma |
debug | enable the debug mode |
set.seed(0)
x1 <- c(rnorm(150, mean = 0), rnorm(50, mean = 10))
x2 <- c(rnorm(150, mean = 20), rnorm(50, mean = 40))
x3 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x4 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x5 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x6 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x <- list(x1, x2, x3, x4, x5, x6)
multigmmmanydata(x)
Plot the Fit Results of the Aggregate 2-Component Gaussian Mixture Model
multigmmplot(x, fit_res, nbins = 15)
x | a list of numeric vectors |
fit_res | the fit result of the aggregate Gaussian mixture model (AGMM) |
nbins | the number of bins per cluster |
params <- list(
  c(mu1 = 0, mu2 = 10, sd = 1),
  c(mu1 = 10, mu2 = 20, sd = 1)
)
set.seed(0)
x <- lapply(params, function(v) {
  c(
    rnorm(100, mean = v[["mu1"]], sd = v[["sd"]]),
    rnorm(50, mean = v[["mu2"]], sd = v[["sd"]])
  )
})
multigmmplot(x, multigmmsamedistribu(x))
Fit Multiple 2-Component Gaussian Mixture Models of the Same Distribution with a Fixed Proportion
multigmmsamedistribu(x, lambda_lower = 0.1, lambda_upper = 1 - lambda_lower, sigma_lower = 0.01, debug = F)
x | a list of numeric vectors |
lambda_lower | the lower bound of lambda |
lambda_upper | the upper bound of lambda |
sigma_lower | the lower bound of sigma |
debug | enable the debug mode |
set.seed(0)
x1 <- c(rnorm(150, mean = 0), rnorm(50, mean = 10))
x2 <- c(rnorm(150, mean = 20), rnorm(50, mean = 40))
x3 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x <- list(x1, x2, x3)
multigmmsamedistribu(x)
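Fitting several vectors that each have their own two peaks but share one spread and one mixing proportion can be illustrated with a small expectation-maximization (EM) loop. This is a sketch under stated assumptions, not the package's fitting routine: EM is swapped in for simplicity, `fit_multi_gmm_em_sketch` and `n_iter` are hypothetical, `lambda` is taken to be the weight of the first peak, and the bounds are applied by simple clamping.

```r
# EM sketch: per-vector means (mu1, mu2), one shared sigma, one shared lambda.
# Assumptions (not from the package): EM instead of a general optimizer,
# lambda weights the first peak, bounds are enforced by clamping.
fit_multi_gmm_em_sketch <- function(x, lambda_lower = 0.1,
                                    lambda_upper = 1 - lambda_lower,
                                    sigma_lower = 0.01, n_iter = 200) {
  idx <- seq_along(x)
  n_total <- sum(lengths(x))
  # Crude start: peaks at the extremes of each vector, wide sigma, equal weights.
  mu1 <- vapply(x, min, numeric(1))
  mu2 <- vapply(x, max, numeric(1))
  sigma <- mean(vapply(x, sd, numeric(1)))
  lambda <- 0.5
  for (iter in seq_len(n_iter)) {
    # E-step: probability that each point belongs to the first peak.
    resp <- lapply(idx, function(i) {
      d1 <- lambda * dnorm(x[[i]], mu1[i], sigma)
      d2 <- (1 - lambda) * dnorm(x[[i]], mu2[i], sigma)
      d1 / (d1 + d2)
    })
    # M-step: per-vector means, then the shared sigma and lambda.
    mu1 <- vapply(idx, function(i) weighted.mean(x[[i]], resp[[i]]), numeric(1))
    mu2 <- vapply(idx, function(i) weighted.mean(x[[i]], 1 - resp[[i]]), numeric(1))
    ss <- sum(vapply(idx, function(i) {
      sum(resp[[i]] * (x[[i]] - mu1[i])^2 +
            (1 - resp[[i]]) * (x[[i]] - mu2[i])^2)
    }, numeric(1)))
    sigma <- max(sqrt(ss / n_total), sigma_lower)
    lambda <- min(max(sum(unlist(resp)) / n_total, lambda_lower), lambda_upper)
  }
  list(mu1 = mu1, mu2 = mu2, sigma = sigma, lambda = lambda)
}

set.seed(0)
x <- list(
  c(rnorm(150, mean = 0), rnorm(50, mean = 10)),
  c(rnorm(150, mean = 20), rnorm(50, mean = 40)),
  c(rnorm(150, mean = 30), rnorm(50, mean = 60))
)
fit <- fit_multi_gmm_em_sketch(x)
str(fit)
```

Sharing `sigma` and `lambda` across vectors is what lets sparse or noisy vectors borrow information from the others: every vector contributes to the same two estimates, while the peak locations stay vector-specific.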
The Sum of Log-Likelihoods of the 1D Multi Same-Distribution Gaussian Mixture Model
multigmmsamedistribulik(x)
x | a list of numeric vectors |
set.seed(0)
x1 <- c(rnorm(100, mean = 0), rnorm(100, mean = 1))
x <- list(x1)
multigmmsamedistribulik(x)(c(0.5, 1, 0.5, 1))
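The example calls the result of `multigmmsamedistribulik(x)` as a function of a parameter vector. A closure with the same shape can be sketched as follows. The helper name `multi_gmm_loglik_sketch` and the parameter layout `c(mu1_1..mu1_k, mu2_1..mu2_k, sigma, lambda)` are assumptions made for this illustration and may not match the package's ordering.

```r
# Returns a function of the parameter vector that sums the mixture
# log-likelihood over all vectors in `x`.
# Assumed layout: c(mu1_1..mu1_k, mu2_1..mu2_k, sigma, lambda).
multi_gmm_loglik_sketch <- function(x) {
  k <- length(x)
  function(par) {
    mu1 <- par[seq_len(k)]
    mu2 <- par[k + seq_len(k)]
    sigma <- par[2 * k + 1]
    lambda <- par[2 * k + 2]
    per_vector <- vapply(seq_len(k), function(i) {
      dens <- lambda * dnorm(x[[i]], mu1[i], sigma) +
        (1 - lambda) * dnorm(x[[i]], mu2[i], sigma)
      sum(log(dens))
    }, numeric(1))
    sum(per_vector)
  }
}

set.seed(0)
x <- list(
  c(rnorm(150, mean = 0), rnorm(50, mean = 10)),
  c(rnorm(150, mean = 20), rnorm(50, mean = 40))
)
loglik <- multi_gmm_loglik_sketch(x)
ll_true <- loglik(c(0, 20, 10, 40, 1, 0.75)) # parameters used to simulate x
ll_off <- loglik(c(2, 22, 8, 38, 1, 0.75))   # same model with shifted means
print(c(ll_true, ll_off))
```

Returning a closure keeps the data fixed while an optimizer varies only the parameter vector.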
The optimization slows down dramatically when too much data is fitted at once.
multigmmsamedistribumulti(x, lambda_lower = 0.1, lambda_upper = 1 - lambda_lower, sigma_lower = 0.01, debug = F)
x | a list of numeric vectors |
lambda_lower | the lower bound of lambda |
lambda_upper | the upper bound of lambda |
sigma_lower | the lower bound of sigma |
debug | enable the debug mode |
set.seed(0)
x1 <- c(rnorm(150, mean = 0), rnorm(50, mean = 10))
x2 <- c(rnorm(150, mean = 20), rnorm(50, mean = 40))
x3 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x4 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x5 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x6 <- c(rnorm(150, mean = 30), rnorm(50, mean = 60))
x <- list(x1, x2, x3, x4, x5, x6)
multigmmmanydata(x)
Remove the Outliers in a Vector of 1D Coordinates
rmoutlier1d(x, dy_thr = dnorm(4), clustersize_thr = 3, gapsize = 10)
x | a numeric vector |
dy_thr | the threshold for dy |
clustersize_thr | the threshold for cluster size |
gapsize | the threshold of points used to recognize a data-free gap |
x <- c(1, 10:30, 50)
par(mfrow = c(2, 1))
plot(density(x))
plot(density(rmoutlier1d(x)))
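One simple way to think about 1D outlier removal is to drop points that sit in tiny, isolated clusters. The sketch below is a simplified stand-in and not the package's algorithm, which also takes a density-based threshold (`dy_thr`). The helper `drop_small_clusters_sketch` and its `gap` and `min_size` parameters are hypothetical.

```r
# Hypothetical helper: drop points that belong to small, isolated clusters.
# `gap` and `min_size` are illustrative and are not the package's arguments.
drop_small_clusters_sketch <- function(x, gap, min_size = 3) {
  if (length(x) == 0) return(x)
  xs <- sort(x)
  # The cluster label increases by one after every gap wider than `gap`.
  cluster_id <- cumsum(c(TRUE, diff(xs) > gap))
  sizes <- table(cluster_id)
  keep_ids <- as.integer(names(sizes)[sizes >= min_size])
  xs[cluster_id %in% keep_ids]
}

x <- c(1, 10:30, 50)
cleaned <- drop_small_clusters_sketch(x, gap = 5)
print(cleaned)
```

In this input the points 1 and 50 each form a cluster of size one, so only the run from 10 to 30 is kept.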
Split a list with size n into groups with at least m elements
splitgrp(n, m)
n | an integer indicating the total length |
m | the min group size |
splitgrp(1, 2)
splitgrp(2, 2)
splitgrp(3, 2)
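Splitting `n` items into contiguous groups of at least `m` elements can be sketched in base R as follows. This is an illustration only: `split_groups_sketch` is a hypothetical helper, and its return format (a list of index vectors) and its handling of `n < m` (a single short group) are assumptions that may differ from `splitgrp`.

```r
# Hypothetical helper: split indices 1..n into contiguous groups that each
# hold at least m elements (a single, shorter group when n < m).
split_groups_sketch <- function(n, m) {
  stopifnot(n >= 1, m >= 1)
  # Use as many groups as possible while keeping every group at size >= m.
  k <- max(1, n %/% m)
  # Map index i to group ceiling(i * k / n), giving near-equal contiguous groups.
  split(seq_len(n), ceiling(seq_len(n) * k / n))
}

groups <- split_groups_sketch(7, 3)
print(groups)
```

Any remainder is absorbed by making some groups slightly larger than `m` instead of creating an undersized final group.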