Estimate ancestry coefficients for empirical sites

This function uses a leave-one-out (jackknife) approach to test how combinations of the num_tested and popmod parameters influence the ability of the popmaps algorithm to recover known ancestry coefficients at empirical sites.

Usage

jackknife(
  input_raster = "",
  input_locs = "",
  surface = "G",
  empirical_pt_dist = 5,
  num_sites = 10,
  num_tested_vec = c(2, 3, 4, 5, 6, 7, 8),
  popmod_vec = c(-0.001, -0.01, -0.05, -0.1, -0.15),
  dist_prob_func = function(popmod_temp, distance) {
    exp(popmod_temp * distance)
  }
)
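
To get a feel for the defaults in the call above, the following minimal sketch builds the grid of num_tested and popmod combinations and evaluates the default dist_prob_func over a few distances. The names param_grid, distances, and decay are introduced here for illustration only, the distance values are hypothetical, and how jackknife() iterates over combinations internally is an assumption based on the argument descriptions.

```r
# Default vectors copied from the usage above
num_tested_vec <- c(2, 3, 4, 5, 6, 7, 8)
popmod_vec <- c(-0.001, -0.01, -0.05, -0.1, -0.15)

# Default distance-decay function copied from the usage above
dist_prob_func <- function(popmod_temp, distance) {
  exp(popmod_temp * distance)
}

# Every num_tested x popmod pair (assumed to mirror "all pairwise combinations")
param_grid <- expand.grid(num_tested = num_tested_vec, popmod = popmod_vec)
nrow(param_grid)  # 7 * 5 = 35 combinations

# Hypothetical distances; their units are an assumption, not taken from the docs
distances <- c(0, 10, 50, 100)

# Rows = popmod values, columns = distances
decay <- t(sapply(popmod_vec, dist_prob_func, distance = distances))
dimnames(decay) <- list(popmod = popmod_vec, distance = distances)
round(decay, 3)
```

With the default function, more negative popmod values make the contribution of a site fall off faster with distance, so the first row of decay stays close to 1 while the last row drops toward 0.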

Arguments

input_raster: A RasterLayer object defining the geographic extent of the spatial interpolation. If surface = 'C', the cell values are used to calculate the distances passed to dist_prob_func. See the example data hija_raster.
input_locs: An R object (rows = total # empirical sites, columns = total # genetic axes + 3) with column 1: site name; column 2: decimal longitude; column 3: decimal latitude; column 4…column x: ancestry coefficients for genetic axis 1…genetic axis x. Function depends on this precise format – see example data hija_struc.
surface: A string (either 'G' or 'C') that determines how input_raster is used to calculate ancestry coefficients. If 'G', the spatial attributes of input_raster will be used to calculate geographic (i.e., Euclidean) distances between empirical sites and inference sites. If 'C', least-cost distances are calculated using the values contained in the raster cells.
empirical_pt_dist: An integer representing the minimum distance that must separate all of the num_tested sites from one another. If num_tested = 3, the two empirical sites closest to the cell are determined first. If the two sites are closer together than this value, the closest will be kept and the second discarded. The next closest site will then be selected and compared to the first; this process repeats until the two sites are farther apart than empirical_pt_dist. Subsequently, the third empirical site will be selected using the same process. This rarefaction procedure may reduce the high spatial autocorrelation expected in genetic patterns among proximate empirical sites, which may otherwise mislead estimates of ancestry coefficients. The parameter should be informed by the distribution of empirical sites and the biology of the focal species.
num_sites: An integer representing the pool of empirical sites considered when selecting num_tested sites to estimate ancestry coefficients. If empirical sites are highly clustered and rarefaction due to empirical_pt_dist causes many to be discarded, this variable will likely need to be increased at the cost of computing time.
num_tested_vec: A vector of integers representing the values of num_tested that should be tested during jackknifing. All pairwise combinations with popmod_vec will be tested.
popmod_vec: A vector of floats representing the values of popmod that should be tested during jackknifing. All pairwise combinations with num_tested_vec will be tested.
dist_prob_func: A function defining the relationship between distance and the contribution of an empirical site’s ancestry coefficients to the estimation of ancestry coefficients at an inference cell. The default equation defines the relationship in Fig. 2 of Massatti & Winkler (2022).
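
The rarefaction described under empirical_pt_dist and num_sites can be sketched as a greedy selection. This is only one reading of the description, not the package's implementation: rarefy_sites() and the toy sites table are hypothetical, and plain Euclidean distances on planar coordinates are assumed.

```r
# Hypothetical helper: greedy rarefaction of empirical sites around one focal point
rarefy_sites <- function(focal, sites, num_tested, empirical_pt_dist, num_sites) {
  # Euclidean distance from the focal point to every site (planar coordinates assumed)
  d_focal <- sqrt((sites$x - focal[1])^2 + (sites$y - focal[2])^2)
  # Restrict the pool to the num_sites closest sites, nearest first
  pool <- head(order(d_focal), num_sites)
  kept <- integer(0)
  for (i in pool) {
    if (length(kept) == num_tested) break
    # Distance from candidate i to each site already kept
    d_kept <- sqrt((sites$x[kept] - sites$x[i])^2 + (sites$y[kept] - sites$y[i])^2)
    # Keep the candidate only if it is farther than the threshold from all kept sites
    if (all(d_kept > empirical_pt_dist)) kept <- c(kept, i)
  }
  sites$name[kept]
}

# Toy data: A/B and C/D form two tight clusters, E is isolated
sites <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  x = c(1, 2, 10, 11, 30),
  y = c(0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

selected <- rarefy_sites(focal = c(0, 0), sites = sites, num_tested = 3,
                         empirical_pt_dist = 5, num_sites = 10)
selected  # "A" "C" "E"
```

In this toy case B and D are discarded because each lies within 5 units of a site that was already kept. If num_sites were set to 3, the pool would be exhausted after C and only two sites would be returned, which illustrates why clustered sampling may call for a larger num_sites.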

References

Massatti R & Winkler DE. (2022) Spatially explicit management of genetic diversity using ancestry probability surfaces. Methods in Ecology and Evolution. http://dx.doi.org/10.1111/2041-210X.13902

Author

Rob Massatti

Examples

if (FALSE) { # \dontrun{
    # Aggregate cells of the example raster to reduce computation time
    ex_raster <- raster::aggregate(hija_raster, fact = 16)
    jack_data <- jackknife(input_raster = ex_raster, input_locs = hija_struc, surface = "G")
} # }