
王诗翔 / sigminer-doc

03-quantify-sample-exposure.Rmd 5.37 KB
王诗翔 committed on 2020-10-08 18:04: Update doc site

# Signature Fit: Sample Signature Exposure Quantification and Analysis {#sigfit}

Besides the *de novo* signature discovery shown in previous chapters, another common task is to quantify how a set of reference signatures (taken from a known database such as COSMIC, or from a *de novo* discovery step) contributes to (fits) a sample. This is the purpose of `sig_fit()`.

`sig_fit()` offers multiple methods to compute the exposure of predefined signatures from the spectrum of one or more samples. Run `?sig_fit` for more details.

To show how this function works, we use the sample with the maximum mutation count as example data.

```{r}
# The middle of this chunk was lost in extraction (only `i` and `% t()` survive);
# the two lines below are reconstructed from the surrounding text.
i <- which.max(apply(mt_tally$nmf_matrix, 1, sum))
example_mat <- mt_tally$nmf_matrix[i, , drop = FALSE] %>% t()
```

```{r}
head(example_mat)
```

## Fit Signatures from Reference Databases

For SBS signatures, users may want to use reference signatures from the COSMIC database directly.

```{r}
sig_fit(example_mat, sig_index = 1:30)
```

> By default, the COSMIC v2 signature database with 30 reference signatures is used (i.e. `sig_db = "legacy"`). Set `sig_db = "SBS"` for the COSMIC v3 signature database.

That's it!

You can set `type = "relative"` to get relative exposures.

```{r}
sig_fit(example_mat, sig_index = 1:30, type = "relative")
```

For multiple samples, you can return a `data.table`, which is easier to integrate with other information in R.

```{r}
sig_fit(t(mt_tally$nmf_matrix[1:5, ]), sig_index = 1:30, return_class = "data.table", rel_threshold = 0.05)
```

When you fit multiple signatures, we recommend setting the `rel_threshold` option, which sets the exposure of a signature to `0` if its relative exposure in a sample is less than `rel_threshold`.

## Fit Custom Signatures

We have already determined the SBS signatures earlier. Here we pass them to the `sig` option.

```{r}
sig_fit(example_mat, sig = mt_sig2)
```

## Performance Comparison

Now that we can use `sig_fit()` to get optimal exposures, we can compare the RSS between the **raw matrix** and the **reconstructed matrix** obtained by either NMF or `sig_fit()`, i.e.

$$
RSS = \sum(\hat H - H)^2
$$

where $\hat H$ is the reconstructed matrix and $H$ is the raw matrix.

```{r}
## Exposure obtained from NMF
sum((apply(mt_sig2$Signature, 2, function(x) x/sum(x)) %*% mt_sig2$Exposure - t(mt_tally$nmf_matrix))^2)
```

```{r}
## Exposure optimized by sig_fit
# Normalize each signature to sum to 1, then reconstruct the matrix from the fitted exposures
H_estimate = apply(mt_sig2$Signature, 2, function(x) x/sum(x)) %*% sig_fit(t(mt_tally$nmf_matrix), sig = mt_sig2)
# Replace NaN entries with 0 before comparing
H_estimate = apply(H_estimate, 2, function(x) ifelse(is.nan(x), 0, x))
H_real = t(mt_tally$nmf_matrix)
sum((H_estimate - H_real)^2)
```

## Estimate Exposure Stability by Bootstrap

This feature is based on `sig_fit()`: it resamples the original input and runs `sig_fit()` multiple times to estimate the exposure. At least 100 bootstrap replicates are recommended; here I use only 10 for illustration.

```{r}
bt_result # the rest of this chunk, and the text that followed it, are missing from the source
```

> Note: set `sig_db` to 'legacy' (the default) in the `sig_fit()` family of functions.

```{r}
DT::datatable(db2[, c(1:3, 4)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```

```{r}
DT::datatable(db2[, c(1:3, 5)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```

```{r}
DT::datatable(db2[, c(1:3, 6)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```
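
To make the idea of fitting concrete, here is a minimal base-R sketch. It is not how `sig_fit()` is implemented; it only illustrates the underlying problem: with the signature matrix held fixed, find non-negative exposures that minimize the RSS between the reconstructed and the observed spectrum. The toy signature matrix `W`, the count vector `v`, and the helper `fit_exposure()` (non-negative least squares by multiplicative updates) are all hypothetical names made up for this illustration.

```r
# Toy signature matrix (hypothetical): 4 mutation channels x 2 signatures,
# each column sums to 1
W <- cbind(
  S1 = c(0.7, 0.1, 0.1, 0.1),
  S2 = c(0.1, 0.1, 0.1, 0.7)
)

# Observed mutation counts of one sample over the 4 channels (made-up numbers)
v <- c(76, 14, 16, 44)

# Non-negative least squares via multiplicative updates, signatures held fixed.
# Assumption: this is an illustrative solver, not one of sig_fit()'s methods.
fit_exposure <- function(W, v, n_iter = 1000, eps = 1e-12) {
  h <- rep(sum(v) / ncol(W), ncol(W))  # start from equal exposures
  Wtv <- as.vector(crossprod(W, v))    # t(W) %*% v
  WtW <- crossprod(W)                  # t(W) %*% W
  for (iter in seq_len(n_iter)) {
    # The update keeps every exposure non-negative at each step
    h <- h * Wtv / (as.vector(WtW %*% h) + eps)
  }
  h
}

h_hat <- fit_exposure(W, v)

# Reconstructed spectrum and its residual sum of squares
v_hat <- as.vector(W %*% h_hat)
rss <- sum((v_hat - v)^2)

h_hat                # absolute exposures
h_hat / sum(h_hat)   # relative exposures
rss
```

Dividing the absolute exposures by their sum, as in the second-to-last line, gives the kind of relative exposure that `type = "relative"` refers to in the text.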

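
The bootstrap idea from the section on exposure stability can be sketched in base R. This sketch assumes multinomial resampling of the observed counts, which is one common way to resample a mutation catalog; the text above does not say which scheme sigminer uses. The toy inputs `W` and `v` and the helper `fit_exposure()` are hypothetical and stand in for a real signature matrix, a real sample, and `sig_fit()`.

```r
set.seed(42)

# Toy inputs (hypothetical): 4 channels x 2 signatures, one sample
W <- cbind(
  S1 = c(0.7, 0.1, 0.1, 0.1),
  S2 = c(0.1, 0.1, 0.1, 0.7)
)
v <- c(76, 14, 16, 44)

# Stand-in fitter: non-negative least squares by multiplicative updates
fit_exposure <- function(W, v, n_iter = 1000, eps = 1e-12) {
  h <- rep(sum(v) / ncol(W), ncol(W))
  Wtv <- as.vector(crossprod(W, v))
  WtW <- crossprod(W)
  for (iter in seq_len(n_iter)) {
    h <- h * Wtv / (as.vector(WtW %*% h) + eps)
  }
  h
}

n_boot <- 100  # number of bootstrap replicates

# Each replicate redraws the same total number of mutations, with channel
# probabilities proportional to the observed counts, and refits the exposures
boot_expo <- replicate(n_boot, {
  v_star <- as.vector(rmultinom(1, size = sum(v), prob = v / sum(v)))
  fit_exposure(W, v_star)
})
rownames(boot_expo) <- colnames(W)  # rows: signatures, columns: replicates

# Spread of each signature's exposure across replicates
apply(boot_expo, 1, quantile, probs = c(0.025, 0.5, 0.975))
```

A narrow interval suggests that the exposure estimate is stable under resampling, while an interval that reaches zero suggests that the signature's contribution to this sample is uncertain.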