Confidence Intervals for Species Delimitations Methods
Source:R/confidence_intervals.R
confidence_intervals.Rd
These functions compute confidence intervals for various species delimitation methods, including GMYC, bGMYC, Local Minima, and mPTP.
Usage
gmyc_ci(tr, posterior, method = "single", interval = c(0, 5))
bgmyc_ci(
tr,
posterior,
ppcutoff = 0.05,
mcmc,
burnin,
thinning,
py1 = 0,
py2 = 2,
pc1 = 0,
pc2 = 2,
t1 = 2,
t2 = 51,
scale = c(20, 10, 5),
start = c(1, 0.5, 50)
)
locmin_ci(dna, block = 1, reps = 100, threshold = 0.01, haps = NULL, ...)
mptp_ci(
infile,
bootstraps,
exe = NULL,
outfolder = NULL,
method = c("multi", "single"),
minbrlen = 1e-04,
webserver = NULL
)
Arguments
- tr
An ultrametric, dichotomous tree object in ape format.
- posterior
Trees from posterior. An object of class multiphylo.
- method
Method of analysis, either "single" for single-threshold version or "multiple" for multiple-threshold version.
- interval
Upper and lower limit of estimation of scaling parameters, e.g. c(0,10)
- ppcutoff
Posterior probability threshold for clustering samples into species partitions. See bgmyc.point for details. Default to 0.05.
- mcmc
number of samples to take from the Markov Chain
- burnin
the number of samples to discard as burn-in
- thinning
the interval at which samples are retained from the Markov Chain
- py1
governs the prior on the Yule (speciation) rate change parameter. using the default prior distribution, this is the lower bound of a uniform distribution. this can be the most influential prior of the three. rate change is parameterized as n^py where n is the number of lineages in a waiting interval (see Pons et al. 2006). if there are 50 sequences in an analysis and the Yule rate change parameter is 2, this allows for a potential 50-fold increase in speciation rate. this unrealistic parameter value can cause the threshold between Yule and Coalescent process to be difficult to distinguish. are more reasonable upper bound for the prior would probably be less than 1.5 (a potential 7-fold increase). Or you could modify the prior function to use a different distribution entirely.
- py2
governs the prior on the Yule rate change parameter. using the default prior distribution, this is the upper bound of a uniform distribution.
- pc1
governs the prior on the coalescent rate change parameter. using the default prior distribution, this is the lower bound of a uniform distribution. rate change is parameterized as (n(n-1))^pc where n is the number of lineages in a waiting interval (see Pons et al. 2006). In principle pc can be interpreted as change in effective population size (pc<1 decline, pc>1 growth) but because identical haplotypes must be excluded from this analysis an accurate biological interpretation is not possible.
- pc2
governs the prior on the coalescent rate change parameter. using the default prior distribution, this is the upper bound of a uniform distribution.
- t1
governs the prior on the threshold parameter. the lower bound of a uniform distribution. the bounds of this uniform distribution should not be below 1 or greater than the number of unique haplotypes in the analysis.
- t2
governs the prior on the threshold parameter. the upper bound of a uniform distribution
- scale
a vector of scale parameters governing the proposal distributions for the markov chain. the first to are the Yule and coalescent rate change parameters. increasing them makes the proposals more conservative. the third is the threshold parameter. increasing it makes the proposals more liberal.
- start
a vector of starting parameters in the same order as the scale parameters, py, pc, t. t may need to be set so that it is not impossible given the dataset.
- dna
an object of class DNAbin.
- block
integer. Number of columns to be resampled together. Default to 1.
- reps
Number of bootstrap replicates. Default to 100.
- threshold
Distance cutoff for clustering. Default of 0.01. See localMinima for details.
- haps
Optional. A vector of haplotypes to keep into the tbl_df.
- ...
Further arguments to be passed to dist.dna.
- infile
Path to tree file in Newick format. Should be dichotomous and rooted.
- bootstraps
Bootstrap trees. An object of class multiphylo.
- exe
Path to an mPTP executable.
- outfolder
Path to output folder. Default to NULL. If not specified, a temporary location is used.
- minbrlen
Numeric. Branch lengths smaller or equal to the value provided are ignored from computations. Default to 0.0001. Use min_brlenfor fine tuning.
- webserver
A .txt file containing mPTP results obtained from a webserver. Default to NULL.
Value
A vector containing the number of species partitions in tr
, dna
or infile
followed by
the number of partitions in posterior
, reps
or bootstraps
.
Details
Both gmyc_ci
and bgmyc_ci
can take a very long time to proccess, depending on how many
posterior trees are provided. As an alternative, these analyses can be sped up significantly
by running in parallel using plan.
Examples
# \donttest{
# gmyc confidence intervals
# compute values using multisession mode
{
future::plan("multisession")
gmyc_res <- gmyc_ci(ape::as.phylo(geophagus_beast), geophagus_posterior)
# reset future parameters
future::plan("sequential")
}
# plot distribution
plot(density(gmyc_res))
# tabulate
tibble::tibble(
method = "gmyc",
point_estimate = gmyc_res[1],
CI_95 = as.integer(quantile(gmyc_res[-1], probs = c(0.025, 0.975))) |>
stringr::str_flatten(collapse = "-"),
CI_mean = as.integer(mean(gmyc_res[-1])),
CI_median = as.integer(stats::median(gmyc_res[-1]))
)
#> # A tibble: 1 × 5
#> method point_estimate CI_95 CI_mean CI_median
#> <chr> <int> <chr> <int> <int>
#> 1 gmyc 21 3-43 25 23
# }