Generate CLUMPP output from a qlist

Takes a qlist and combines the repeated runs for each K into a single file, and writes a matching parameter file (paramfile) suitable for input to CLUMPP. The two output files for each K are organised into a separate folder. This function only creates the input files for CLUMPP. It does not run CLUMPP.
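The grouping step can be illustrated in plain R. This is a minimal sketch, not pophelper's own code: it assumes a toy qlist of made-up data frames whose column names (`Cluster1`, `Cluster2`, ...) are chosen only for illustration, and it treats K as the number of columns in each run. The real combined file is a text file in CLUMPP's input format, not an R data frame.

```r
# Toy qlist: a named list of data frames, one per run (values are random, for illustration only)
set.seed(1)
make_run <- function(k, ind = 5) {
  m <- matrix(runif(ind * k), nrow = ind)
  q <- as.data.frame(m / rowSums(m))        # each row sums to 1, like admixture proportions
  names(q) <- paste0("Cluster", seq_len(k)) # hypothetical column names
  q
}
qlist <- list(run1 = make_run(2), run2 = make_run(2),
              run3 = make_run(3), run4 = make_run(3), run5 = make_run(3))

# Group run names by K (the number of cluster columns in each run)
runs_by_k <- split(names(qlist), vapply(qlist, ncol, integer(1)))

# Stack all runs for one K, analogous to combining them into one file per K
combined_k3 <- do.call(rbind, qlist[runs_by_k[["3"]]])
dim(combined_k3)  # 15 rows (3 runs x 5 individuals), 3 columns
```

Grouping by column count means runs with different K never end up in the same combined file.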

clumppExport(
  qlist = NULL,
  prefix = NA,
  parammode = NA,
  paramrep = NA,
  exportpath = NULL,
  path = NULL
)

Arguments

qlist	A qlist (list of data frames), such as the output of `readQ`.
prefix	A character prefix for folder names. If NA (default), 'pop' is used.
parammode	A numeric 1, 2 or 3 indicating the algorithm option (M) in the CLUMPP paramfile. If NA (default), it is calculated automatically. Set this value to 3 if CLUMPP runs too long. See details.
paramrep	A numeric indicating the number of repeats in the CLUMPP paramfile. If NA (default), it is calculated automatically. See details.
exportpath	The directory to which output files are written. To use the current working directory, set `exportpath=getwd()`.
path	Deprecated. Use exportpath.

Value

For each K, the combined file and its paramfile are written to a separate folder named using prefix and K.

Details

This function only generates the files needed to run CLUMPP. The CLUMPP executable must be downloaded separately and run for downstream steps. It can be obtained from https://web.stanford.edu/group/rosenberglab/clumpp.html. Please remember to cite CLUMPP if you use it.

When several runs (repeats) are performed for each K, the cluster labels may be permuted between runs (label switching): cluster 1 in one run may correspond to cluster 2 in another. As a result, when multiple runs within a K are plotted, the same colour may not represent the same cluster in every run. CLUMPP overcomes this issue by finding the permutation of clusters that best aligns the runs. clumppExport() combines the runs for each K into a single file and generates a parameter file so that CLUMPP can be run directly. For further details on CLUMPP, see: Jakobsson, M., and Rosenberg, N. A. (2007). CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23(14), 1801-1806.
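Label switching and its correction can be shown with two small Q matrices. The sketch below is a brute-force search over all column permutations, in the spirit of CLUMPP's FullSearch for a single pair of runs. It is not CLUMPP's implementation: the helper names `perms()` and `align_run()` are hypothetical, and the matching criterion (smallest sum of squared differences) is an assumption chosen for simplicity.

```r
# All permutations of 1..n as rows of an integer matrix (recursive; only practical for small K)
perms <- function(n) {
  if (n == 1) return(matrix(1L, nrow = 1, ncol = 1))
  p <- perms(n - 1)
  out <- lapply(seq_len(n), function(i) cbind(i, ifelse(p >= i, p + 1L, p)))
  unname(do.call(rbind, out))
}

# Reorder the columns of `run` so that it best matches `ref`
align_run <- function(ref, run) {
  P <- perms(ncol(ref))
  # Sum of squared differences for every candidate column ordering
  cost <- apply(P, 1, function(p) sum((ref - run[, p, drop = FALSE])^2))
  run[, P[which.min(cost), ], drop = FALSE]
}

# Two runs with identical results but swapped cluster labels
ref  <- matrix(c(0.9, 0.8, 0.1,
                 0.1, 0.2, 0.9), ncol = 2)
run2 <- ref[, c(2, 1)]
identical(align_run(ref, run2), ref)  # TRUE: labels restored
```

The number of permutations grows as K!, which is why the exhaustive approach quickly becomes impractical and faster greedy options exist.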

parammode
The parammode (M) is the type of algorithm used: option 1 is 'FullSearch' (slowest), option 2 is 'Greedy' and option 3 is 'LargeKGreedy' (fastest). If CLUMPP takes more than a few minutes, consider changing parammode to a higher number (e.g. from 2 to 3), or open the exported paramfile and manually change M to 3.
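In a CLUMPP paramfile, the algorithm is selected by the `M` line (1 FullSearch, 2 Greedy, 3 LargeKGreedy), so switching an exported file to LargeKGreedy amounts to editing that one line. The sketch below assumes a hypothetical file path and writes an illustrative excerpt of a paramfile; the real exported file contains more settings. `set_clumpp_param()` is a hypothetical helper, not part of pophelper.

```r
# Hypothetical path; point this at the paramfile exported for one K
paramfile <- file.path(tempdir(), "paramfile")
# Illustrative excerpt only, not the exact contents written by clumppExport()
writeLines(c("K 3", "R 5", "M 2", "GREEDY_OPTION 2", "REPEATS 20"), paramfile)

# Replace the value of one "KEY value" line in a paramfile
set_clumpp_param <- function(file, key, value) {
  lines <- readLines(file)
  hit <- grepl(paste0("^", key, "[[:space:]]"), lines)  # "M " does not match "MISCFILE"
  if (!any(hit)) stop("parameter not found: ", key)
  lines[hit] <- paste(key, value)  # any trailing comment on that line is dropped
  writeLines(lines, file)
  invisible(lines)
}

set_clumpp_param(paramfile, "M", 3)
readLines(paramfile)
```

The same helper could adjust `REPEATS`, which appears to be the setting controlled by paramrep.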

By default, parammode and paramrep for the CLUMPP paramfile are set from the quantity `X <- factorial(k)*((runs*(runs-1))/2)*k*ind`, where k is the number of populations (clusters), runs is the number of runs for that k, and ind is the number of individuals. If X <= 100000000, parammode is set to 2 and paramrep to 20; otherwise, parammode is set to 3 and paramrep to 500.
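The default selection rule can be written as a small R function. This sketch reproduces the rule as documented here rather than pophelper's source; the name `clumpp_params()` is hypothetical, and the assumption that only NA values are filled in mirrors the "calculated automatically by default" wording of the arguments.

```r
clumpp_params <- function(k, runs, ind, parammode = NA, paramrep = NA) {
  # Size of the search: k! permutations x number of run pairs x k x individuals
  x <- factorial(k) * ((runs * (runs - 1)) / 2) * k * ind
  small <- x <= 1e8
  # Only fill values the user left as NA
  if (is.na(parammode)) parammode <- if (small) 2 else 3
  if (is.na(paramrep)) paramrep <- if (small) 20 else 500
  list(x = x, parammode = parammode, paramrep = paramrep)
}

clumpp_params(k = 3, runs = 10, ind = 100)   # x = 81000: parammode 2, paramrep 20
clumpp_params(k = 10, runs = 10, ind = 100)  # x is about 1.6e11: parammode 3, paramrep 500
```

The factorial term dominates: for ten runs and 100 individuals, the switch to LargeKGreedy happens somewhere between k = 6 and k = 10.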

To find out more about parammode (algorithm type) and paramrep (repeats), refer to the CLUMPP documentation.

See the vignette for more details.

Examples


if (FALSE) {

# generate input files for CLUMPP from STRUCTURE files
sfiles <- list.files(path=system.file("files/structure",package="pophelper"),
                     full.names=TRUE)
clumppExport(qlist=readQ(sfiles),exportpath=getwd())

# generate input files for CLUMPP from ADMIXTURE files
afiles <- list.files(path=system.file("files/admixture",package="pophelper"),
                     full.names=TRUE)
clumppExport(qlist=readQ(afiles),exportpath=getwd())

}
