
Bioinformatic contribution for analysis of Pacific Ocean Metagenome

This repository contains the metagenomic analysis of Pacific Ocean data.

To use this tutorial:


Metagenomic sequencing reads from oceans were taken from:

Biller, S., Berube, P., Dooley, K., et al. Marine microbial
metagenomes sampled across space and time. Sci Data 5, 180176 (2018).

The marine samples were collected by a GEOTRACES cruise in the Pacific Ocean (32.49567 S, 164.99233 W).



| Sample | Depth | Zone | Material | Collection date | Latitude | Longitude |
|---|---|---|---|---|---|---|
| SRR5788417 | 50 m | Epipelagic | Water | 2011-06-13T22:40:00 | -32.49567 | -164.99233 |
| SRR5788416 | 75 m | Epipelagic | Water | 2011-06-13T22:40:00 | -32.49567 | -164.99233 |
| SRR5788415 | 100 m | Epipelagic | Water | 2011-06-13T22:40:00 | -32.49567 | -164.99233 |
| SRR5788422 | 204 m | Mesopelagic | Water | 2011-06-13T22:40:00 | -32.49567 | -164.99233 |
| SRR5788421 | 1023 m | Abyssopelagic | Water | 2011-06-13T22:40:00 | -32.49567 | -164.99233 |
| SRR5788420 | 5100 m | Abyssopelagic | Water | 2011-06-13T22:40:00 | -32.49567 | -164.99233 |

Downloading the data

The accession numbers to download were chosen according to …. FIXME …. and are listed in scripts_download/SRA_Acc_List.txt.

This section uses the following software:

  1. sratoolkit version 2.9.6
  2. fasterq-dump version 2.11.3

To download all the accessions on the Mazorka server, first obtain the paths of the listed accessions with the script scripts_download/make_paths.sh. Run it on the server with:


qsub make_paths.sh

This step should be very quick.

Once you have the path_file.txt file, download the SRA files with the script scripts_download/download_ocean_sra.sh. Run it with:

qsub download_ocean_sra.sh

This step will take several hours. It is recommended that you leave it overnight.
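Because the download runs unattended for hours, it can help to check afterwards which accessions actually arrived. The following is a minimal sketch, not part of the repository: `missing_accessions` is a hypothetical helper name, and it assumes that each download appears in the target directory as a file or directory whose name starts with the accession (for example `SRR5788417` or `SRR5788417.sra`).

```shell
#!/usr/bin/env bash
# missing_accessions is a hypothetical helper, not one of the repository scripts.
# Usage: missing_accessions <accession_list> <download_dir>
# Prints every accession from the list that has no file or directory
# starting with that accession inside the download directory.
missing_accessions() {
    local list="$1" dir="$2" acc
    # tr removes carriage returns in case the list was edited on Windows.
    tr -d '\r' < "$list" | while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue    # skip blank lines
        # ls fails when the glob matches nothing, which marks the accession as missing.
        ls -d "${dir}/${acc}"* > /dev/null 2>&1 || echo "$acc"
    done
}

# Example (paths are assumptions):
#   missing_accessions scripts_download/SRA_Acc_List.txt /LUSTRE/usuario/username
```

An empty output means every listed accession has at least one matching entry. The check only looks at names, so it does not detect truncated files.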

Unfortunately, Mazorka cannot convert the downloaded files into .fastq.gz files, so the downloaded files have to be copied to your local computer and from there to the Betterlab server. To copy them to your local computer, run:

scp username@mazorka.langebio.cinvestav.mx:/LUSTRE/usuario/username/SRR* .

This step may take a few hours. Make sure you have a stable internet connection.

When the copy has finished, move the files to Betterlab:

scp SRR* betterlab@

This step may take a few hours. Make sure you have a stable internet connection.
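Since the files cross the network twice, a checksum comparison is one way to confirm that nothing was corrupted along the way. The sketch below is an optional addition, not part of the original workflow. It assumes GNU `md5sum` is available on every machine involved and that the `SRR*` entries are regular files; `write_checksums`, `verify_checksums` and the manifest name `checksums.md5` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking transfers; not part of the repository.

# Record an MD5 checksum for every SRR* file in the given directory.
# Run this on the machine that holds the original files.
write_checksums() {
    local dir="$1"
    ( cd "$dir" && md5sum SRR* > checksums.md5 )
}

# Recompute the checksums in the given directory and compare them with
# the manifest. Returns non-zero if a file is missing or differs.
verify_checksums() {
    local dir="$1"
    ( cd "$dir" && md5sum --check --quiet checksums.md5 )
}

# Example:
#   write_checksums /LUSTRE/usuario/username    # on Mazorka, before copying
#   verify_checksums .                          # on the destination, after copying
```

The manifest has to be copied together with the data, because the pattern `SRR*` in the `scp` commands does not include it.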

Enter raw_data/. Now you can convert the SRA files into .fastq.gz files with:


ls | while read -r line; do fasterq-dump "$line" -S -p -e 12; gzip "${line}"_*.fastq; done
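The same conversion can be written as two small functions, which makes it easier to stop on errors and to see which accession failed. This is a sketch under assumptions: `convert_sra` and `convert_all` are hypothetical names, the directory is assumed to contain only entries named exactly after their accession, and the long options are the documented equivalents of `-S`, `-p` and `-e`.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around fasterq-dump; not part of the repository.

# Convert one accession and compress the resulting FASTQ files.
convert_sra() {
    local acc="$1" threads="${2:-12}"
    # Stop here if the conversion fails, so gzip never runs on partial output.
    fasterq-dump "$acc" --split-files --progress --threads "$threads" || return 1
    # The braces matter: "$acc_" would be read as a variable named acc_.
    gzip "${acc}"_*.fastq
}

# Convert every SRR* entry in the current directory, reporting failures.
convert_all() {
    local acc
    for acc in SRR*; do
        [ -e "$acc" ] || continue    # the glob stays literal when nothing matches
        convert_sra "$acc" "${1:-12}" || echo "failed: $acc" >&2
    done
}

# Example, run from inside raw_data/ with 12 threads:
#   convert_all 12
```

Running `convert_all` a second time in the same directory would also pick up the `.fastq.gz` files it produced, so move the finished files elsewhere before repeating it.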

Once you have the .fastq.gz files, you can transfer them back to Mazorka to process them there.

Raw-reads processing

This guide covers the processing of raw reads from shotgun metagenomic libraries of the Pacific Ocean. Before starting, install FastQC and Trimmomatic on your computer for quality control analysis and read filtering (on Mazorka these programs are already installed):

  1. FastQC
  2. Trimmomatic

Flow chart for processing raw-reads in each library

It’s important to use a high-performance computing cluster.

  1. Run a quality control analysis of the raw reads with FastQC to assess the quality of each library: :corn:
    qsub fastqc_po.sh
  2. Filter the reads and clip the Nextera transposase adapters (NexTranspSeq-PE.fa) with Trimmomatic: :corn:
    qsub trimming_pocean.sh
  3. Evaluate the filtered reads in each library to check their quality and number: :corn:
    qsub fastqc_po.sh
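For readers without access to the cluster scripts, the three steps above can be sketched for a single paired-end library as follows. This is an illustration, not the content of fastqc_po.sh or trimming_pocean.sh. The function name `process_library`, the `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming, the output directories, the thread count and every trimming threshold are assumptions. It also assumes that Trimmomatic is callable as `trimmomatic` (as in conda installations) and that NexTranspSeq-PE.fa is in the working directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the three-step flow for ONE paired-end library.
# Usage: process_library <sample> [threads]
process_library() {
    local sample="$1" threads="${2:-4}"
    local r1="${sample}_1.fastq.gz" r2="${sample}_2.fastq.gz"

    mkdir -p qc_raw trimmed qc_trimmed

    # 1. Quality report for the raw reads.
    fastqc --threads "$threads" --outdir qc_raw "$r1" "$r2" || return 1

    # 2. Adapter clipping and quality filtering (thresholds are assumptions).
    #    P = reads whose mate also survived, U = reads left unpaired.
    trimmomatic PE -threads "$threads" "$r1" "$r2" \
        "trimmed/${sample}_1P.fastq.gz" "trimmed/${sample}_1U.fastq.gz" \
        "trimmed/${sample}_2P.fastq.gz" "trimmed/${sample}_2U.fastq.gz" \
        ILLUMINACLIP:NexTranspSeq-PE.fa:2:30:10 \
        SLIDINGWINDOW:4:15 MINLEN:36 || return 1

    # 3. Quality report for the surviving paired reads.
    fastqc --threads "$threads" --outdir qc_trimmed \
        "trimmed/${sample}_1P.fastq.gz" "trimmed/${sample}_2P.fastq.gz"
}

# Example:
#   process_library SRR5788417 8
```

Comparing the reports in qc_raw and qc_trimmed shows how many reads were removed and whether the adapter content dropped after trimming.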

Overrepresentation analysis

Here you can see the R script made for the overrepresentation analysis.

Co-occurrence analysis

Here you can see the R script made for the co-occurrence analysis.

NetCoMi analysis

Here you can see the R script made for the NetCoMi analysis.