# Signature Fit: Sample Signature Exposure Quantification and Analysis {#sigfit}
Besides *de novo* signature discovery shown in previous chapters, another common task is that
you have gotten some reference signatures (either from known database like COSMIC or *de novo* discovery step), you want to know how these signatures contribute (fit) in a sample. That's the target of `sig_fit()`.
`sig_fit()` uses multiple methods to compute exposure of pre-defined signatures from the spectrum of a (can be more) sample. Use `?sig_fit` see more detail.
To show how this function works, we use a sample with maximum mutation counts as example data.
```{r}
i % t()
```
```{r}
head(example_mat)
```
## Fit Signatures from reference databases
For SBS signatures, users may want to directly use reference signatures from COSMIC database.
```{r}
sig_fit(example_mat, sig_index = 1:30)
```
> At default, COSMIC v2 signature database with 30 reference signatures is used (i.e. `sig_db = "legacy"`). Set `sig_db = "SBS"` for COSMIC v3 signature database.
That's it!
You can set `type = "relative"` for getting relative exposure.
```{r}
sig_fit(example_mat, sig_index = 1:30, type = "relative")
```
For multiple samples, you can return a `data.table`, it can be easier to integrate with other information in R.
```{r}
sig_fit(t(mt_tally$nmf_matrix[1:5, ]), sig_index = 1:30, return_class = "data.table", rel_threshold = 0.05)
```
When you set multiple signatures, we recommend setting `rel_threshold` option, which will set exposure of a signature to `0` if its relative exposure in a sample less than the `rel_threshold`.
## Fit Custom Signatures
We have already determined the SBS signatures before. Here we can set them to `sig` option.
```{r}
sig_fit(example_mat, sig = mt_sig2)
```
## Performance Comparison
Now that we can use `sig_fit` for getting optimal exposures, we can compare the RSS between **raw matrix** and the **reconstructed matrix** either by NMF and `sig_fit()`.
i.e.
$$
RSS = \sum(\hat H - H)^2
$$
```{r}
## Exposure got from NMF
sum((apply(mt_sig2$Signature, 2, function(x) x/sum(x)) %*% mt_sig2$Exposure - t(mt_tally$nmf_matrix))^2)
```
```{r}
## Exposure optimized by sig_fit
H_estimate = apply(mt_sig2$Signature, 2, function(x) x/sum(x)) %*% sig_fit(t(mt_tally$nmf_matrix), sig = mt_sig2)
H_estimate = apply(H_estimate, 2, function(x) ifelse(is.nan(x), 0, x))
H_real = t(mt_tally$nmf_matrix)
sum((H_estimate - H_real)^2)
```
## Estimate Exposure Stability by Bootstrap
This feature is based on `sig_fit()`, it uses the resampling data of original input and runs `sig_fit()` multiple times to estimate the exposure. Bootstrap replicates >= 100 is recommended, here I just use 10 times for illustration.
```{r}
bt_result Note, set `sig_db` to 'legacy' (the default) in `sig_fit()` family functions.
```{r}
DT::datatable(db2[, c(1:3, 4)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```
```{r}
DT::datatable(db2[, c(1:3, 5)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```
```{r}
DT::datatable(db2[, c(1:3, 6)], caption = "Data source: Alexandrov et al. https://www.nature.com/articles/s41586-020-1943-3")
```