F.A.I.R. and reproducible analysis for the paper: "Exploring the data that explores the oceans: working towards robust eDNA workflows for ocean wildlife monitoring (submitted)"
by Jessica R. Pearce 1, Philipp E. Bayer 1,2, Adam Bennett 1, Eric J. Raes 1,2, Marcelle E. Ayad 1, Shannon Corrigan 1,2, Matthew W. Fraser 1,2, Denise Anderson 3, Priscila Goncalves 1,2, Benjamin Callahan 4, Michael Bunce 5, Stephen Burnell 1,2, Sebastian Rauschert 1,2,*
1 Minderoo Foundation, Perth 6000, WA
2 The UWA Oceans Institute, The University of Western Australia, Crawley 6009, WA
3 INSiGENe Pty Ltd.
4 North Carolina State University, Raleigh, 27606, USA
5 Department of Conservation, Wellington, New Zealand
*Corresponding author
Start here to immediately re-analyse the data
Launch analysis:
This repository contains all data and code needed to generate the figures and statistics in the paper. Click the binder button above to launch an RStudio session in the browser, with access to all code and data in this GitHub repository. There, the code can be changed interactively, and the plots and statistics can be (re-)created.
For an overview of what binder is, please check out this link.
This repository contains the phyloseq objects for all three data sets analysed in the paper. The objects were generated with the Minderoo OceanOmics amplicon nextflow pipeline. Below is a detailed description, including code, of how to recreate the phyloseq objects.
The three data sets can be found here:
- West et al. 2021: North West Western Australia
- Minderoo OceanOmics: Cocos Keeling Island transect
- Minderoo OceanOmics: Rowley Shoals Islands
Additionally, this repository includes a list of Australian marine fish species, named Aust_fish_species_list.csv. This was manually curated by domain experts, with data drawn from Atlas of Living Australia and the Global Biodiversity Information Facility.
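As a quick sanity check of the species list, the following is a minimal sketch in Python that reports the column names and the number of data rows in the CSV. It assumes only that the file has a header row and sits in the repository root; the helper name `summarise_species_list` is hypothetical, and no particular column layout is assumed.

```python
import csv
from pathlib import Path


def summarise_species_list(path):
    """Return (column_names, row_count) for a CSV file with a header row.

    Hypothetical helper for illustration; it makes no assumption about
    which columns the species list actually contains.
    """
    # utf-8-sig reads plain UTF-8 and also strips a byte-order mark if present
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first row is assumed to be the header
        row_count = sum(1 for row in reader if row)  # blank lines are skipped
    return header, row_count


if __name__ == "__main__":
    # Assumed location: the repository root. Adjust if the file lives elsewhere.
    csv_path = Path("Aust_fish_species_list.csv")
    if csv_path.exists():
        columns, n_rows = summarise_species_list(csv_path)
        print("Columns:", columns)
        print("Data rows:", n_rows)
    else:
        print(f"{csv_path} not found; run this from the repository root.")
```

Inside the binder RStudio session, the same check can be done with R's built-in CSV reader instead.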
The files in metadata were generated as part of the data collection and sequencing, and are downloaded as part of the "downloading the data" description.
Lastly, the read_qc folder contains the read QC statistics output by the seqkit part of the nextflow pipeline.
Everything in this section is optional and not required for re-analysing the results of the paper. It is documented for full transparency and reproducibility of all results, should anyone wish to repeat these steps. Information on setting up the compute environment, downloading the data, and creating the phyloseq object can be found in the docs folder or via the clickable links.