Confidence Intervals for Species Delimitations Methods — confidence

These functions compute confidence intervals for various species delimitation methods, including GMYC, bGMYC, Local Minima, and mPTP.

Usage

gmyc_ci(tr, posterior, method = "single", interval = c(0, 5))

bgmyc_ci(
  tr,
  posterior,
  ppcutoff = 0.05,
  mcmc,
  burnin,
  thinning,
  py1 = 0,
  py2 = 2,
  pc1 = 0,
  pc2 = 2,
  t1 = 2,
  t2 = 51,
  scale = c(20, 10, 5),
  start = c(1, 0.5, 50)
)

locmin_ci(dna, block = 1, reps = 100, threshold = 0.01, haps = NULL, ...)

mptp_ci(
  infile,
  bootstraps,
  exe = NULL,
  outfolder = NULL,
  method = c("multi", "single"),
  minbrlen = 1e-04,
  webserver = NULL
)

Arguments

tr: An ultrametric, dichotomous tree object in ape format.
posterior: Trees from posterior. An object of class multiphylo.
method: Method of analysis, either "single" for single-threshold version or "multiple" for multiple-threshold version.
interval: Upper and lower limit of estimation of scaling parameters, e.g. c(0,10)
ppcutoff: Posterior probability threshold for clustering samples into species partitions. See bgmyc.point for details. Default to 0.05.
mcmc: number of samples to take from the Markov Chain
burnin: the number of samples to discard as burn-in
thinning: the interval at which samples are retained from the Markov Chain
py1: governs the prior on the Yule (speciation) rate change parameter. using the default prior distribution, this is the lower bound of a uniform distribution. this can be the most influential prior of the three. rate change is parameterized as n^py where n is the number of lineages in a waiting interval (see Pons et al. 2006). if there are 50 sequences in an analysis and the Yule rate change parameter is 2, this allows for a potential 50-fold increase in speciation rate. this unrealistic parameter value can cause the threshold between Yule and Coalescent process to be difficult to distinguish. are more reasonable upper bound for the prior would probably be less than 1.5 (a potential 7-fold increase). Or you could modify the prior function to use a different distribution entirely.
py2: governs the prior on the Yule rate change parameter. using the default prior distribution, this is the upper bound of a uniform distribution.
pc1: governs the prior on the coalescent rate change parameter. using the default prior distribution, this is the lower bound of a uniform distribution. rate change is parameterized as (n(n-1))^pc where n is the number of lineages in a waiting interval (see Pons et al. 2006). In principle pc can be interpreted as change in effective population size (pc<1 decline, pc>1 growth) but because identical haplotypes must be excluded from this analysis an accurate biological interpretation is not possible.
pc2: governs the prior on the coalescent rate change parameter. using the default prior distribution, this is the upper bound of a uniform distribution.
t1: governs the prior on the threshold parameter. the lower bound of a uniform distribution. the bounds of this uniform distribution should not be below 1 or greater than the number of unique haplotypes in the analysis.
t2: governs the prior on the threshold parameter. the upper bound of a uniform distribution
scale: a vector of scale parameters governing the proposal distributions for the markov chain. the first to are the Yule and coalescent rate change parameters. increasing them makes the proposals more conservative. the third is the threshold parameter. increasing it makes the proposals more liberal.
start: a vector of starting parameters in the same order as the scale parameters, py, pc, t. t may need to be set so that it is not impossible given the dataset.
dna: an object of class DNAbin.
block: integer. Number of columns to be resampled together. Default to 1.
reps: Number of bootstrap replicates. Default to 100.
threshold: Distance cutoff for clustering. Default of 0.01. See localMinima for details.
haps: Optional. A vector of haplotypes to keep into the tbl_df.
...: Further arguments to be passed to dist.dna.
infile: Path to tree file in Newick format. Should be dichotomous and rooted.
bootstraps: Bootstrap trees. An object of class multiphylo.
exe: Path to an mPTP executable.
outfolder: Path to output folder. Default to NULL. If not specified, a temporary location is used.
minbrlen: Numeric. Branch lengths smaller or equal to the value provided are ignored from computations. Default to 0.0001. Use min_brlenfor fine tuning.
webserver: A .txt file containing mPTP results obtained from a webserver. Default to NULL.

Value

A vector containing the number of species partitions in tr, dna or infile followed by the number of partitions in posterior, reps or bootstraps.

Details

Both gmyc_ci and bgmyc_ci can take a very long time to proccess, depending on how many posterior trees are provided. As an alternative, these analyses can be sped up significantly by running in parallel using plan.

Author

Pedro S. Bittencourt, Rupert A. Collins.

Examples

# \donttest{

# gmyc confidence intervals

# compute values using multisession mode
{
  future::plan("multisession")

  gmyc_res <- gmyc_ci(ape::as.phylo(geophagus_beast), geophagus_posterior)

  # reset future parameters
  future::plan("sequential")
}

# plot distribution
plot(density(gmyc_res))


# tabulate
tibble::tibble(
  method = "gmyc",
  point_estimate = gmyc_res[1],
  CI_95 = as.integer(quantile(gmyc_res[-1], probs = c(0.025, 0.975))) |>
    stringr::str_flatten(collapse = "-"),
  CI_mean = as.integer(mean(gmyc_res[-1])),
  CI_median = as.integer(stats::median(gmyc_res[-1]))
)
#> # A tibble: 1 × 5
#>   method point_estimate CI_95 CI_mean CI_median
#>   <chr>           <int> <chr>   <int>     <int>
#> 1 gmyc               21 3-43       25        23
# }