High-throughput or next-generation sequencing
The Sanger sequencing method (1977) and capillary-based automated sequencing represent the first-generation sequencing technology, which contributed to a series of breakthrough discoveries, including the completion of the Human Genome Project (Sanger et al. 1977; Kheterpal et al. 1996; International Human Genome Sequencing Consortium, 2004). In spite of its many successes, first-generation technology is limited by its low throughput, sample requirements, and high cost. To overcome these limitations, “second-generation sequencing” or “next-generation sequencing” (NGS) technologies emerged about a decade ago.
Compared with first-generation sequencing, NGS technologies generate millions of short sequencing reads, which are useful for annotating genomes and for understanding the underlying biological processes in future molecular-genetics research. The most widely used NGS platforms in genomics and transcriptomics studies are Roche 454, Illumina/Solexa, and ABI/SOLiD. The Roche 454 platform uses emulsion PCR to isolate and amplify DNA fragments, and a pyrophosphate-based single-nucleotide addition sequencing method on a micro-fabricated array of picoliter-scale wells (Margulies et al. 2005). In the Illumina/Solexa platform, template amplification is achieved through bridge PCR, followed by four-color cyclic reversible termination steps in the sequencing and imaging process (Bentley et al. 2008). The SOLiD platform uses DNA ligase, rather than polymerase, to drive sequencing by ligation (Valouev et al. 2008). These three platforms are widely used because of their low cost and high throughput compared with first-generation sequencing technologies (Margulies et al. 2005; Bentley et al. 2008; Metzker 2010; Shendure and Ji, 2008; Mardis 2008).
Transcriptome sequencing (RNA-seq) and analysis
The transcriptome is the entire set of RNA transcripts expressed in a specific cell at a given time under a specific physiological condition (Ozsolak and Milos, 2011). Understanding the transcriptome is essential for interpreting the function and regulation of genes, which provides invaluable insights into the mechanisms of development and disease (Wang et al. 2013; Li et al. 2015; Chen et al. 2016; Jayaswall et al. 2016; Matic et al. 2016). Previously, microarrays were used for high-throughput, large-scale RNA-level studies. In recent years, transcriptome sequencing has rapidly replaced microarray technology because of its high throughput, better resolution and higher reproducibility. The main objectives of transcriptomics studies are: (i) reconstructing/assembling all transcripts, including mRNA and noncoding RNA (Garg et al. 2011; Scarano et al. 2017), small RNA (Kumar et al. 2017), etc.; (ii) identifying transcript structures, e.g., transcript start/end sites (Yang et al. 2011), exon-intron structure (Marquez et al. 2012), and alternative splicing (Sun et al. 2015); and (iii) quantifying the expression levels of transcripts under certain biological conditions, e.g., development and stress (Trapnell et al. 2012).
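Objective (iii), quantifying expression levels, is usually reported with a measure that corrects raw read counts for transcript length and sequencing depth. The following is a minimal sketch that assumes transcripts per million (TPM) as that measure; the text above does not name a specific one, and the function name, transcript identifiers, counts and lengths are all hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts     -- dict: transcript id -> number of mapped reads
    lengths_bp -- dict: transcript id -> transcript length in base pairs
    """
    # Step 1: reads per kilobase, which corrects for transcript length.
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    # Step 2: rescale so that all values sum to one million,
    # which corrects for sequencing depth.
    total = sum(rpk.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rpk[t] / total * 1e6 for t in counts}


# Hypothetical example data, not taken from any study.
counts = {"tx1": 100, "tx2": 300, "tx3": 600}
lengths_bp = {"tx1": 1000, "tx2": 3000, "tx3": 2000}
values = tpm(counts, lengths_bp)
```

In this toy data set tx2 has three times the reads of tx1 but is also three times as long, so the two receive the same TPM value; this is the length correction at work.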
Transcriptome sequencing offers the advantage of studying the complete transcriptome of an organism at low cost, with or without its genome sequence. A growing number of non-model species have undergone transcriptomics study before their genome sequences were determined, especially polyploid crops (Wang et al. 2013; Ward et al. 2012; Duan et al. 2012). Currently, the volume of RNA-seq data submitted to public databases is increasing exponentially each year (Rung and Brazma, 2012). More than 60 plants (Schliesky et al. 2012) have been subjected to de novo transcriptome study by RNA-seq, yielding insights into the mechanisms of development and gene regulation.
In transcriptome sequencing, the reconstruction or assembly of the transcriptome can follow two different approaches: a reference-based method, in which reads are mapped back to a reference genome, and a ‘de novo’ assembly strategy, in which reads are assembled based on the overlapping sequences between reads without the need for a reference genome. Because it cannot rely on a reference genome, de novo assembly from short sequence reads is considered more difficult than reference-based assembly.
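The idea of assembling reads "based on the overlapping sequences between reads" can be illustrated with a toy greedy merger. This is only a sketch under strong assumptions: reads are error-free, come from one strand, and are few enough for an all-against-all comparison. Production assemblers use graph-based methods and handle sequencing errors, which this example does not attempt. The function names and the reads are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(a, b, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining pair overlaps enough
        # Join the two sequences, writing the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical short reads covering one 13-base transcript fragment.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
```

With these three reads the merger produces a single contig, because each read shares a four-base overlap with the next. With real data, repeats and sequencing errors create ambiguous overlaps, which is one reason de novo assembly is harder than mapping to a reference.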
The overall approach to de novo analysis in a non-model organism is: 1) pre-process the high-throughput short-read sequences to remove adapter, low-quality or contaminated sequences; 2) optimize the assembly and mapping parameters using a subset of the data; 3) process the full data set using these optimized mapping and de novo assembly parameters; 4) search the assembled contigs against a BLAST database, which leads to the extraction of candidate duplicate genes; and 5) functionally annotate the assembled contigs/unigenes, which gives insight into the particular molecular functions, cellular components, and biological processes in which the putative proteins are involved. Following annotation, KEGG (Kyoto Encyclopedia of Genes and Genomes) enables visualization of the metabolic pathways and molecular interaction networks captured in the transcriptome. Results of the analysis are visualized at different stages for validation purposes.
To date, a large number of non-model plants, such as chickpea, turmeric, coconut, red raspberry, pea, rubber, radish, faba bean and many more, have been sequenced successfully by NGS for crop improvement and for studying biological processes (Garg et al. 2011; Annadurai et al. 2013; Hyun et al. 2014; Fan et al. 2013; Alves-Carvalho et al. 2015; Liu et al. 2015; Sun et al. 2016; Braich et al. 2017).
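The pre-processing step of the workflow described above (removing adapter and low-quality sequence) can be sketched as follows. This is an illustration only, assuming Phred+33 quality encoding, exact adapter matching and trimming from the 3' end; the adapter string, thresholds and function names are hypothetical, and dedicated trimming tools handle mismatches and partial adapters that this sketch ignores.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def preprocess_read(seq, quality, adapter, min_quality=20, min_length=5):
    """Trim adapter and low-quality 3' bases; return (seq, quality) or None."""
    # 1. Cut the read at the first exact occurrence of the adapter.
    pos = seq.find(adapter)
    if pos != -1:
        seq, quality = seq[:pos], quality[:pos]
    # 2. Remove bases from the 3' end while their score is below threshold.
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    seq, quality = seq[:end], quality[:end]
    # 3. Discard reads that became too short to be informative.
    if len(seq) < min_length:
        return None
    return seq, quality


# Hypothetical read: 'I' encodes Phred 40 and '#' encodes Phred 2.
ADAPTER = "AGATCGG"  # hypothetical adapter sequence
cleaned = preprocess_read("ACGTACGTAAGATCGG", "IIIIIII##IIIIIII", ADAPTER)
rejected = preprocess_read("ACGT", "####", ADAPTER)
```

Here the first read loses its adapter and two low-quality trailing bases, while the second is discarded because nothing of acceptable quality remains. The thresholds would normally be tuned on a subset of the data, in line with step 2 of the workflow.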