P. Universidad Católica de Chile

Facultad de Ciencias Biológicas
Departamento de Genética Molecular y Microbiología

Plant Systems Biology Lab

Genomic Footprinting Analyses from DNase-seq data to Construct Gene Regulatory Networks


The pipeline described in Moyano et al. (2020) should be used primarily as a learning tool and as a starting point from which to carefully evaluate and decide how to implement a pipeline in a research project. All command lines used in this chapter can be downloaded as a text file from this link: script.txt. Many parts of this pipeline were extracted and modified from https://slowkow.github.io/CENTIPEDE.tutorial.



Requirements:



A personal computer or server with internet access. Hardware requirements depend on the amount of data to be analyzed; in this guide we use 64 GB of RAM, two hexa-core processors (24 threads) and 1 TB of free disk space.


The pipeline was built on a Unix-based system. The example was run on the Ubuntu 18.04 distribution, which can be downloaded from the Ubuntu web site.


To run the pipeline you must download and install the following tools:


Conda (https://anaconda.org/anaconda/conda) with the Bioconda channel (https://bioconda.github.io).


Java (OpenJDK version "1.8.0_112").


R version 3.6.1.


For network visualization, we recommend using Cytoscape: http://www.cytoscape.org.



The files needed to run the pipeline are as follows:



TAIR10 genome file in FASTA format, without spaces in the sequence (chromosome) names: T10.fa.


WARNING! If you are using another genome, be careful with non-alphanumeric characters in the chromosome names. For example, we recommend replacing "." with "_".
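If your genome headers contain spaces or dots, they can be cleaned before running the pipeline. The Python sketch below is one way to do it, under stated assumptions: it keeps only the first whitespace-delimited token of each header (treating the rest as a description the pipeline does not need), replaces "." with "_", and, going slightly beyond the warning above, turns any other character that is not a letter, digit or "_" into "_". The function names and the command-line usage are hypothetical; this is not one of the chapter's scripts.

```python
import re
import sys


def sanitize_header(line: str) -> str:
    """Return a cleaned FASTA header line (without the trailing newline).

    Assumption: only the first whitespace-delimited token is the sequence
    name; "." becomes "_", and any other character that is not a letter,
    digit or "_" also becomes "_".
    """
    tokens = line[1:].split()
    name = tokens[0] if tokens else ""
    name = name.replace(".", "_")
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    return ">" + name


def sanitize_fasta(src: str, dst: str) -> None:
    """Copy a FASTA file, rewriting header lines and leaving sequence lines untouched."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                fout.write(sanitize_header(line) + "\n")
            else:
                fout.write(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python sanitize_fasta.py input.fa T10.fa
    sanitize_fasta(sys.argv[1], sys.argv[2])
```

For example, a header such as ">Chr1 CHROMOSOME dumped from ADB" becomes ">Chr1". After renaming, it is worth checking that the new names are still unique, since "a.1" and "a_1" would collapse to the same name.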


DNase-seq data in FASTQ format used in this chapter:

DH_012_KCl_replicate1.fastq.gz
DH_012_KCl_replicate2.fastq.gz
DH_012_KNO3_replicate1.fastq.gz
DH_012_KNO3_replicate2.fastq.gz
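Before starting, it can help to confirm that each FASTQ file downloaded completely. The Python sketch below streams a gzipped FASTQ file and counts its reads, assuming standard four-line records without wrapped sequence lines. The helper name count_fastq_reads and the script usage are hypothetical. A truncated download typically surfaces as an EOFError raised by Python's gzip module, and a line count that is not a multiple of four is reported as a ValueError.

```python
import gzip
import sys


def count_fastq_reads(path: str) -> int:
    """Count reads in a FASTQ file (gzipped if the name ends in .gz).

    Assumes every record spans exactly four lines: header, sequence,
    "+" separator and quality string.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Hypothetical usage: python count_reads.py DH_012_KCl_replicate1.fastq.gz ...
    for fastq in sys.argv[1:]:
        print(fastq, count_fastq_reads(fastq), sep="\t")
```

Comparing the counts between replicates also gives a first impression of sequencing depth before alignment.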


A file with R functions extracted and modified from https://slowkow.github.io/CENTIPEDE.tutorial, plus other functions written for this chapter: DNAse-functions.R.


A file with the footprints for the four conditions: protected.tgz.


A Perl script that groups data into a single row: collapse.pl.
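The input and output format of collapse.pl is not described here. As an illustration only, the Python sketch below implements one common meaning of "group data into a single row": given key/value pairs (for example, a gene ID and a transcription factor whose footprint lies in its promoter), it emits one row per key with all of that key's values joined by commas. It is a hypothetical stand-in, not a port of collapse.pl; the tab-separated two-column input, the comma separator and the removal of repeated values are all assumptions.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def collapse(pairs: Iterable[Tuple[str, str]], sep: str = ",") -> List[Tuple[str, str]]:
    """Group (key, value) pairs so that each key appears on a single row.

    Keys keep their order of first appearance; a value repeated for the
    same key is kept only once; values are joined with `sep`.
    """
    grouped: Dict[str, List[str]] = {}
    for key, value in pairs:
        values = grouped.setdefault(key, [])
        if value not in values:
            values.append(value)
    return [(key, sep.join(values)) for key, values in grouped.items()]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python collapse.py input.tsv  (lines of "key<TAB>value")
    with open(sys.argv[1]) as handle:
        pairs = [tuple(line.rstrip("\n").split("\t", 1)) for line in handle if "\t" in line]
    for key, joined in collapse(pairs):
        print(f"{key}\t{joined}")
```

Keeping the first-appearance order makes the output easy to compare line by line with the input; sort the result afterwards if a stable alphabetical order is needed.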


Genome coordinates of the 1000 bp upstream of the 5' UTR of each gene, in BED format: TAIR10.1000pb5p.bed.
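TAIR10.1000pb5p.bed is provided ready to use, and this section does not say how it was built. The Python sketch below only illustrates the strand-aware arithmetic such a file relies on, assuming each input feature is a 5' UTR (or gene) in BED's 0-based, half-open coordinates with a strand column. On the + strand the 5' end is the feature start, so the upstream window ends there; on the - strand the 5' end is the feature end, so the window begins there. The names BedRecord, upstream_region and to_bed_line are hypothetical.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    name: str
    strand: str  # "+" or "-"


def upstream_region(feature: BedRecord, length: int = 1000,
                    chrom_size: Optional[int] = None) -> BedRecord:
    """Return the `length` bp immediately upstream of a feature's 5' end.

    Windows are clipped at the chromosome start and, if `chrom_size` is
    given, at the chromosome end, so they can be shorter than `length`.
    """
    if feature.strand == "+":
        start, end = max(0, feature.start - length), feature.start
    elif feature.strand == "-":
        start, end = feature.end, feature.end + length
        if chrom_size is not None:
            end = min(end, chrom_size)
    else:
        raise ValueError(f"unknown strand {feature.strand!r} for {feature.name}")
    return BedRecord(feature.chrom, start, end, feature.name, feature.strand)


def to_bed_line(rec: BedRecord) -> str:
    """Format a record as BED6 (chrom, start, end, name, score, strand), score fixed at 0."""
    return f"{rec.chrom}\t{rec.start}\t{rec.end}\t{rec.name}\t0\t{rec.strand}"
```

For a hypothetical + strand feature spanning 5000 to 5200 on Chr1, the window is 4000 to 5000; for the same span on the - strand it is 5200 to 6200. Because BED is half-open, no +1 or -1 corrections are needed at the boundaries.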


Genes regulated in response to nitrate at 60 min (KNO3 vs. KCl, 5 mM): response_to_nitrate060min.txt.


Gene symbols and descriptions for Arabidopsis genes: gene.att.araport11.
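The gene attribute file can be used to give readable names to gene lists such as response_to_nitrate060min.txt or to network nodes before loading them into Cytoscape. The layout of gene.att.araport11 is not described here, so the Python sketch below assumes a tab-separated file with the gene ID in column 1, the symbol in column 2 and the description in column 3, and assumes the gene list has the gene ID as the first whitespace-delimited token of each line. Adjust the column indices if the real files differ. Function names and the command-line usage are hypothetical.

```python
import csv
import sys
from typing import Dict, Iterable, List, Tuple

Attributes = Dict[str, Tuple[str, str]]


def load_gene_attributes(path: str) -> Attributes:
    """Map upper-cased gene ID -> (symbol, description).

    Assumed layout: tab-separated, ID in column 1, symbol in column 2,
    description in column 3; short rows and "#" comment lines are skipped.
    """
    attributes: Attributes = {}
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps stray quote characters inside descriptions intact.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 3 or row[0].startswith("#"):
                continue
            attributes[row[0].upper()] = (row[1], row[2])
    return attributes


def annotate(gene_ids: Iterable[str], attributes: Attributes) -> List[Tuple[str, str, str]]:
    """Attach symbol and description to each ID; unknown IDs get empty strings.

    IDs are matched case-insensitively, so "At1g01010" and "AT1G01010" agree.
    """
    rows = []
    for gene_id in gene_ids:
        symbol, description = attributes.get(gene_id.upper(), ("", ""))
        rows.append((gene_id, symbol, description))
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python annotate.py gene.att.araport11 response_to_nitrate060min.txt
    attrs = load_gene_attributes(sys.argv[1])
    with open(sys.argv[2]) as fh:
        ids = [line.split()[0] for line in fh if line.strip()]
    for row in annotate(ids, attrs):
        print("\t".join(row))
```

Genes missing from the attribute table are kept with empty fields rather than dropped, so the output always has one row per input ID.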