High-throughput or next-generation sequencing

The Sanger sequencing method (1977) and capillary-based automated sequencing
represent the first-generation sequencing technology, which contributed to a
series of breakthrough discoveries, including the completion of the Human Genome
Project (Sanger et al. 1977; Kheterpal et al. 1996; International Human Genome
Sequencing Consortium, 2004). In spite of its many successes, first-generation
technology is limited by its low throughput, sample requirements, and high
expense. To overcome these limitations, “second generation sequencing” or “next
generation sequencing” (NGS) technologies emerged a decade ago. Compared with
first-generation sequencing, NGS technologies generate millions of short
sequencing reads, which are useful in annotating genomes and in understanding
the underlying biological processes for future molecular-genetics research. The
most widely used NGS platforms in genomics and transcriptomics studies are
Roche 454, Illumina/Solexa, and ABI/SOLiD.
The Roche 454 platform uses emulsion PCR for the isolation and amplification of
DNA fragments, and a pyrophosphate-based single-nucleotide addition sequencing
method on a micro-fabricated array of picoliter-scale wells (Margulies et al.
2005). In the Illumina/Solexa platform, template amplification is achieved
through bridge PCR, followed by four-color cyclic reversible termination steps
in the sequencing and imaging process (Bentley et al. 2008). The SOLiD platform
uses DNA ligase, rather than polymerase, to drive sequencing by ligation
(Valouev et al. 2008). These three platforms are widely used because of their
lower cost and higher throughput compared with first-generation sequencing
technologies (Margulies et al. 2005; Bentley et al. 2008; Metzker 2010;
Shendure and Ji, 2008; Mardis 2008).

Transcriptome sequencing (RNA-seq) and analysis

The transcriptome is the entire set of expressed RNA transcripts in a specific
cell at a given time under specific physiological conditions (Ozsolak and Milos,
2011). Understanding the transcriptome is essential for interpreting the
function and regulation of genes, and it offers invaluable insights into the
mechanisms of development and disease (Wang et al. 2013; Li et al. 2015; Chen
et al. 2016; Jayaswall et al. 2016; Matic et al. 2016). Previously, microarrays
were used for high-throughput, large-scale RNA-level studies. In recent years,
transcriptome sequencing has rapidly replaced microarray technology because of
its high throughput, better resolution, and higher reproducibility. The main
objectives of transcriptomics studies are: (i) reconstructing/assembling all
transcripts, including mRNA and noncoding RNA (Garg et al. 2011; Scarano et al.
2017), small RNA (Kumar et al. 2017), etc.; (ii) identifying transcript
structures, e.g., transcript start/end sites (Yang et al. 2011), exon-intron
structure (Marquez et al. 2012), and alternative splicing (Sun et al. 2015);
and (iii) quantifying the expression levels of transcripts under certain
biological conditions, e.g., development and stress (Trapnell et al. 2012).

Transcriptome sequencing offers the advantage of studying the complete
transcriptome of an organism at low cost, with or without its genome sequence.
An increasing number of non-model species have undergone transcriptomic study
before their genome sequences were determined, especially polyploid crops (Wang
et al. 2013; Ward et al. 2012; Duan et al. 2012). Currently, the volume of
RNA-seq data submitted to public databases is increasing exponentially each
year (Rung and Brazma, 2012). More than 60 plants have been subjected to de
novo transcriptome study by RNA-seq (Schliesky et al. 2012), yielding insights
into the mechanisms of development and gene regulation. In transcriptome
sequencing, the reconstruction or assembly of the transcriptome can follow two
different approaches: a reference-based method, in which reads are mapped back
to a reference genome, and a ‘de novo’ assembly strategy, in which reads are
assembled based on the overlapping sequences between reads without the need for
a reference genome. Because it cannot rely on a reference genome, the de novo
assembly approach is considered more difficult than reference-based assembly using
short sequence reads. The overall approach in de novo analysis of a non-model
organism is: 1) pre-processing the high-throughput short-read sequences to
remove adapter sequences and low-quality or contaminated sequences; 2)
optimizing assembly and mapping parameters using partial data; 3) processing the
total data using these optimized mapping and de novo parameters; 4) comparing
the assembled contigs against the BLAST database, leading to the extraction of
candidate duplicate genes; and 5) functionally annotating the assembled
contigs/unigenes, which gives insight into the particular molecular functions,
cellular components, and biological processes in which the putative proteins are
involved. Following annotation, KEGG (Kyoto Encyclopedia of Genes and Genomes)
enables visualization of metabolic pathways and molecular interaction networks
captured in the transcriptome. Results of the analysis are visualized at
different stages for validation purposes. To date, a large number
of non-model plants, such as chickpea, turmeric, coconut, red raspberry, pea,
rubber, radish, faba bean, and many more, have been sequenced successfully by
NGS for crop improvement and for studying biological processes (Garg et al.
2011; Annadurai et al. 2013; Hyun et al. 2014; Fan et al. 2013; Alves-Carvalho
et al. 2015; Liu et al. 2015; Sun et al. 2016; Braich et al. 2017).
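
To make the pre-processing and overlap-based assembly steps described above more
concrete, the following is a minimal Python sketch under simplifying
assumptions. The function names (trim_read, overlap, greedy_assemble), the
quality threshold of 20, the minimum overlap length, and the toy reads are all
hypothetical and chosen only for illustration. Real pipelines rely on dedicated
trimming and assembly software and must also handle sequencing errors, reverse
complements, and alternative transcripts, none of which is modelled here.

```python
from typing import List


def trim_read(seq: str, quals: List[int], adapter: str, min_q: int = 20) -> str:
    """Cut an exactly matching 3' adapter, then drop trailing low-quality bases.

    Assumption: quals holds one Phred-like score per base of seq.
    """
    idx = seq.find(adapter)
    if idx != -1:
        seq, quals = seq[:idx], quals[:idx]
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0.

    Only overlaps of at least min_len bases are reported (min_len >= 1).
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembler: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no remaining pair overlaps enough
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Toy usage: one read with an adapter and a low-quality tail, then assembly.
cleaned = trim_read("ATGGCGTAGATCGG", [30] * 5 + [10, 10] + [30] * 7, "AGATCGG")
contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
```

In this toy run, the three reads overlap by four bases each and collapse into a
single contig. The greedy strategy is used here only because it is easy to
follow; it is not claimed to be the method used in any of the studies cited
above.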