Roy Francis | 29-Dec-2020
Files are uploaded interactively in the Upload tab. Input format option Auto should generally work. The following file types are expected: STRUCTURE, BAPS, TESS, ADMIXTURE output files or fastSTRUCTURE meanQ files. If using files as inputs from other software or modified files, from spreadsheets etc, they must be all numeric tabular data without headers in tab-delimited, space-delimited or comma-separated format. Decimal must be defined by dot. Combined, aligned and merged CLUMPP file from R package pophelper
is also supported. If you think, a file format is incorrectly identified, you can manually set Input format. Mixed file formats are not supported.
On successful completion of upload, a summary table of uploaded files is shown. Fields shown are filename, file format and size of the file. File formats identified are STRUCTURE, BAPS, TESS, CLUMPP or BASIC files. Other text files or files of incorrect format are displayed as UNIDENTIFIED. ADMIXTURE and fastSTRUCTURE files are simple tables without headers differing in delimiter spaces. Any text file that is all numeric in a tabular format with space/tab/comma delimiter will be identified as BASIC format.
Fig. 1: A preview of the input files. STRUCTURE file (Left), TESS file (centre) and BASIC (ADMIXTURE) file (right). To download sample files, see downloads section.
The Data tab shows tabulated and summarised tables.
A tabulated table is displayed for a set of one or more identified run files. Tabulated data lists all runs sorted by loci, individuals and K for STRUCTURE runs. The table is sorted by individuals and K for other run formats. Tabulated data is not displayed if uploaded files consist of mixed formats. The tabulated data can be download using the download button the left side.
The summarised table is created from the tabulated table. For STRUCTURE runs, the summarised data table is sorted by loci, ind, K and runs followed by elpdmean, elpdsd, elpdmin and elpdmax. For other run formats, the table is sorted by ind, K and runs. Summarised data is not displayed if uploaded files consist of mixed formats. The summarised data can be download using the download button the left side.
The Evanno method is used to estimate the number of K. The method is based on Evanno et al., (2005) and only applicable to STRUCTURE runs. The Evanno method can be performed in the Evanno tab.
The Input tab shows a tabulated list of STRUCTURE files on the left side. By default, all runs are selected. A subset of runs can be used for the Evanno analysis by selecting (by clicking) the rows/runs in the input table. A K-Plot (Mean of Estimated Log-likelihood over K) for the selected run is shown on the right side.
The Output tab shows a table of the Evanno results showing various derivatives as well as a scatterplots based on the same table. The data is available to download as a text file.
The Evanno method is computed only if the selected runs fulfill the following criteria:
This is the minimum requirement. Evanno method benefits from having several repeats for each K across a wide range of K. The Evanno results table and plot can be downloaded using the download options on the left of the page.
Fig. 2: An Evanno method output plot.
Two types of plots are available: Standard Plot and Interactive Plot. For finer control and print, the Standard Plot may be better. For quick assessment of results using mouse hover tool tips and zoom control, use Interactive Plot.
The Plot options are common for Standard and Interactive plots.
A list of uploaded files are displayed. The default selection type is Datatable. With this, the selection order is maintained. In case the Datatable option does not work, the Picker option is available. In this case, the selection order is not maintained.
Runs to be plotted needs to be selected. The run is plotted as a barplot on the right side. If more than one run is selected, then runs are plotted one below the other as joined plots. Multiple plots are plotted as joined plots for standard plot and plotted separately for interactive plots.
A selection of colour palettes are available to colour the clusters on barplots. The default is the Rich colour palette. Selecting any colour palette generates input widgets with colours equal to the max number of clusters in the selected runs. The colours are selected based on the palette, but can be changed if required. Manually input colours must be in hexadecimal format or picked using the colour picker. Input must look like this: #2121D9
. Transparency is not supported. The number of colours available in each palette is shown in parenthesis ()
. If a number is not displayed, then there are no limits to the number of colours. Colour palettes are grouped into Function-based colours (FB) and Pre-defined colours (PD). PD and FB have slightly different properties.
FB colours are generated from a function. FB colours are not limited in numbers and can take any K value. New colours are added to the middle of the sequence as K increases. For example, Rich palette produces Blue and Red for K=2 and Blue, LightBlue and Red when K=3 and Blue, LightBlue, Yellow and Red when K=4 etc. Function-based colour schemes are shown below.
Fig. 3: Function-based colour schemes showing scheme name. There is no limit on the number of colours when using these schemes.
PD colours have pre-defined values as well as a pre-defined length of colours. PD colours are always selected in the same sequence of colours. For example, Standard palette produces Blue and Red when K=2 and Blue, Red and Green when K=3 etc. New colours are added to the end of sequence as K increases. Pre-defined colour schemes are shown below.
Fig. 4: Pre-defined colour schemes showing scheme name and length of colours.
An extra category of Pre Defined colours called Pre Defined ColorBrewer is available. This option makes available colour palettes from Colorbrewer. Note that Colourbrewer palettes have a limit on the number of colours.
Fig. 5: Pre-defined colour schemes from Colorbrewer showing palette name and length of colours.
When two or more runs are selected, a dropdown menu is available to align clusters or merge runs. Align clusters/Merge runs only works if the selected runs are of same run format and same number of individuals. For structure runs, the number of loci must be same too. Align within K align clusters between replicate runs of each K. Align across K first aligns clusters within K and then across K. The runs are reordered after alignment, but the number of runs and the order of individuals remain the same. Merge runs aligns runs across K and then merges replicate runs within each K. The number of runs will be lower after merging. If all runs belong to the same K, aligning across K is disabled and merging returns one run.
Order/Sort individuals by individual labels, one of the clusters or all clusters. Based on the selected run/runs, the sortable clusters are shown in the dropdown menu. For eg: if runs K=3 and K=4 are selected, sortable clusters would be Cluster1, Cluster2 and Cluster3, since Cluster4 is not present in one of the files. If one of the clusters is selected (Eg. Cluster1), all selected runs are sorted by the selected cluster. If All is selected, then run(s) are sorted using all the clusters using a method similar to Sort by Q option in the STRUCTURE software. If group labels (section Group labels) are in use, individuals are sorted within the groups.
Individuals are labelled numerically by default. By checking Use individual labels, individual labels can be read in from input STRUCTURE files, pasting into the input field or by uploading an individual labels file. Selecting Upload file allows to upload a single column text file with individual labels. This file must be tab-delimited or comma-separated. It must have no headers. Selecting Paste text allows for the labels to be pasted or manually typed in. It is possible to copy-paste from a spreadsheet (Excel) or text editor. The number of labels must be equal to the number of individuals in the run file. These individual labels are used when sorting individuals by labels. By checking Show individual labels, these labels are displayed under the plot (for standard and interactive plot). Checking Use individual labels also displays the Individual label options panel. More details about this panel is described under Individual label options.
When group labels (section Group labels) are used alongside individual labels, a checkbox option Concatenate ind & group labels becomes active. Checking this concatenates all group labels with individual labels to create a long individual label per sample. The separator between the concatenated labels can be specified.
Population labels can be displayed under the plots (for standard plots) and in the hover tooltips (for interactive plot). Selecting the Upload file, group labels can be uploaded as a text file (tab-delimited or a comma-separated) with one or more columns. The file MUST have a header. Each column is referred to as a group label set. The headers are referred to as group label titles. And the actual labels are referred to as group label text. The number of rows must be equal to the number of individuals in the run file. Download a sample file in Downloads (section Downloads).
Selecting Paste text allows for the labels to be pasted or manually typed in. In this case only one label set (one column) is allowed. One label per line. There should be no header. Just to be clear, if this one column group label is uploaded as a file, it must have a header.
All available group label titles are listed in the dropdown menu. The first group label title is automatically selected as the active set. This can be changed by selecting another group label title from the dropdown menu. The active group label set is used for grouping when sorting individuals (section Order individuals), subsetting (section Subset/Order group), ordering groups (section Order group labels) and when computing mean (section Compute mean over groups) over groups.
Based on the active group label set (section Active group label title), group labels from that set are available to subset or reorder. The default is None which means do not subset or reorder. None must be deleted for this widget to be active.
Groups can be subsetted by selecting one or more groups. Delete a selected pop using the Backspace key. Use the Left Right arrow keys to jump to labels. Position of groups can be changed by changing their order in the widget. For eg. An initial order of Grp A, Grp B can be changed to Grp B, Grp A.
Fig. 6: Sample of a single plot on left and a sample of joined plot on the right.
These are basic options that apply to the standard plot. The standard options are organised into the following sections: general options, side panel options, title options, subtitle options and legend options. Each section has a Reset panel button which resets all settings in that section to default settings.
Show Y axis when checked displays the Y-axis. This also displays additional widgets to control size and length of ticks. When two or more runs are selected, the Panel spacer option is used to adjust spacing between run panels on the plot.
Show side panel when checked displays a side panel on the right side of standard displaying the filename and K value. The default labels can be changed by providing comma separated labels. Other side panel options control side panel position, text size, text colour and background colour. Click on the Reset panel button to reset all the settings in the side panel options.
Checking Show plot title displays a title on top of the plot. The label can be changed as well as the title text size, title spacer (the space between the title and the region below), horizontal and vertical justification and the title colour.
Checking Show plot subtitle displays a subtitle on top of the plot. The label can be changed as well as the subtitle text size, subtitle spacer (the space between the subtitle and the region below), horizontal and vertical justification and the subtitle colour.
Checking Show cluster legend displays a legend showing cluster colours. The legend labels default to cluster names, but this can be changed by providing a comma separated label list. The legend text size, key size and spacing can be adjusted.
Checking Use individual labels (section Individual labels) displays the Individual label options panel. The Common individual labels option when checked displays individual labels only under the lowermost plot (for standard plot and in case multiple plots are displayed). Unchecking this option displays individual labels under each plot panel. When sorting individuals by cluster (section 6.1.3), Common individual labels option must be unchecked.
The individual labels panel height can be adjusted along with the spacer (space between the individual labels and the region above). The label size, colour, horizontal and vertical justification and angle can be adjusted.
Checking Use group labels (section Group labels) displays the Group label options panel. This panel is organised into the following sections: general options, text options, point options, line options and divider options.
Fig. 7: A typical joined plot with group labels. Various parts of the figure and options to customise.
The group labels panel height can be adjusted along with the spacer (space between the group labels and the region above). The panel ratio takes two numbers separated by a comma denoting ratio of plot area to group-label area like 3,1
. The Label marker colour is the colour applied to all label marker elements such as lines and points.
The text options control the text y-position, text colour, size, angle and horizontal justification. Label points separate the groups. The point size (ex. 1) and point type can be adjusted. The default point type is |
. This can be changed to any character. If a number is provided, then a standard R plotting symbol (pch) is used instead. Point options control point size and point type.
Fig. 8: Number for plotting characters.
The horizontal line seen with the labels is the label line. Line options control line y-position, line thickness and line type. The line type can be a single digit or double digit number. A single digit denotes one of the line types in the figure below. A double digit denotes the length of mark and length of the space. For example, 22 means a line with a mark of distance 2 followed by a space of distance 2.
Fig. 9: A few of the line types.
Divider options control the vertical lines separating groups. The divider lines are drawn between groups of the active group label set (section Active group label title). This can be changed to one or more group label sets by selecting the group label title. Use arrow keys to move cursor left/right. Use backspace to remove a label. Other options allow to control the thickness of the div line and the line type.
The image preview scale slider allows to adjust the display size of the preview plot in the browser. The scale does not affect the download options.
Empty fields for height and width will generate default values (recommended). Note: The height is height of one run panel and not the height of the entire figure. If 2 runs are selected, a height of 2 cm means each run will be 2 cm and the final figure will be around 4 cm high. The default image resolution is 200dpi. When specifying height and width, note that it is in cm. Changing the file type does not change the preview but it changes the downloaded file type.
The interactive plot makes it easier to explore the data. It allows zoom control, drag/slide and hover tool tips to identify samples. Note that interactive plots are computationally more intensive to display and change especially for datasets with large number of individuals.
Hovering the cursor over the plot shows the current position (Cur Pos), original position (Orig Pos), individual ID, cluster (K) and the y-axis value (assignment probability) in the tool tip. If group labels are in use, group label from each group label set is also displayed for that individual. The original position may be useful when sorting the individuals. The original position does not work with Align/Merge runs (within K) option. When individual labels are in use, individual labels is shown as individual ID.
Click and drag on the plot (horizontally) to zoom in. Click Reset to reset zoom. When zoomed in, press Shift and drag (horizontally) to slide the view. Clusters can be turned on or off by clicking on the legend icons (top right). When multiple runs are selected, plots are created one below the other in the order in which the files are selected. Click on the top left corner and choose an option to download the plot.
Samples files for download are available here.
Nothing visible in Evanno tab: Evanno option is available only for structure runs. Likely issue is that uploaded files are not STRUCTURE runs or consists of mixed formats.
When uploading lots of files to the web app, the upload bar tends to be frozen: The bar sometimes appear frozen, but the upload is in progress. The summary table should be visible after a few minutes. Better to zip all the files into one file and upload. This issue is only applicable to the online version of the app.
A popup error with ‘The application unexpectedly exited.’: This error is reported when the app has crashed. This can happen due to a whole range of unexpected issues. The solution is to refresh the browser and continue as normal. Else, close tab and open webpage in new tab. If this happens consistently, please report to me with explanation on what you were doing when it happened.
After working with some files, reuploading new files in the same session often produces a pop-up error like ‘The application unexpectedly exited’: The solution is to refresh the browser or reload the page.
Under standard plot options, the sub sections can be collapsed by clicking on the section title. For example > Standard options, > General options etc. This might be useful to quickly reduce clutter. But note that variables in the hidden widgets are not accessible to the plotting function and can lead to strange errors.
Downloads do not work: When running locally, open app in a system browser rather than the RStudio browser. In the RStudio browser, click on Open In Browser. Downloads only work in a system browser.
To exit the running app locally, press the Esc key in the R console.
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/local/R/4.0.0/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-2 captioner_2.2.3 knitr_1.30
##
## loaded via a namespace (and not attached):
## [1] viridisLite_0.3.0 rprojroot_2.0.2 digest_0.6.27 crayon_1.3.4
## [5] assertthat_0.2.1 R6_2.5.0 magrittr_2.0.1 evaluate_0.14
## [9] stringi_1.5.3 rlang_0.4.9 rstudioapi_0.13 fs_1.5.0
## [13] ragg_0.4.0 rmarkdown_2.6 pkgdown_1.6.1 textshaping_0.2.1
## [17] desc_1.2.0 tools_4.0.0 stringr_1.4.0 yaml_2.2.1
## [21] xfun_0.19 compiler_4.0.0 systemfonts_0.3.2 memoise_1.1.0
## [25] htmltools_0.5.0