Mol. Cells 2017; 40(10): 714-730
Published online October 31, 2017
https://doi.org/10.14348/molcells.2017.2297
© The Korean Society for Molecular and Cellular Biology
Correspondence to: kim750a11@gmail.com
Pre-mRNA splicing further increases protein diversity acquired through evolution. The underlying driving forces for this phenomenon are unknown, especially in terms of gene expression. A rice alternatively spliced transcript detection microarray (ASDM) and RNA sequencing (RNA-Seq) were applied to differentiate the transcriptome of 4 representative organs of
Keywords: alternatively spliced transcript detection microarray, coefficient value, rice, RNA sequencing, transcriptome
Eukaryotic organisms have evolved ingenious mechanisms to increase protein diversity. At the genome level, diversity is amplified during evolution through various forms of gene duplication, such as whole-genome (WG), segmental, and tandem duplication (Freeling, 2009). Beyond the genome, alternative RNA splicing adds another layer of protein diversity, whereby transcripts of many multi-exonic genes are processed with the inclusion or exclusion of particular exons in the final mRNA molecule (Berget et al., 1977; Chow et al., 1977). It has been reported that ~95% of multi-exonic genes are alternatively spliced in humans, compared with ~33% in rice (Pan et al., 2008; Zhang et al., 2010). Pre-mRNA splicing is regulated by trans-acting RNA-binding proteins that bind to cis-acting sites on the primary transcript, followed by recruitment of spliceosomal proteins (Ser/Arg-rich proteins, SRs) to the target site. SRs are well known for their role in alternative splicing, which they exert by recruiting the small nuclear ribonucleoproteins (snRNPs) that form the spliceosome (Glisovic et al., 2008; Lunde et al., 2007). Although many RNA isoforms can be generated from a single locus, most loci produce only two or three abundant isoforms.
Splicing isoforms fall into several groups corresponding to exon truncation or extension, intron retention, or inclusion/exclusion of entire exons in the mature transcript (Campbell et al., 2006; Kriventseva et al., 2003). Several research groups have classified such splice modes. For example, Lewis et al. (2003) proposed classifying splice modes by splice-site usage and effect on the coding sequence. They reported that splice sites can be introduced or lost depending on the splice type, such as an intra-exon splice, an imperfect exon skip, alternate sites, exon inclusion, or a perfect exon skip, and they also considered changes in the coding region. Campbell et al. (2006) classified alternative splice forms in rice into 9 types, including alternate acceptor/donor sites and retained or skipped exons. Several web tools are also available for detecting alternative splicing (Huang et al., 2003; Koscielny et al., 2009; Zhu et al., 2014). Expression Atlas (
Rice (
Over the last two decades, DNA microarray technology has been extensively applied in gene expression studies in various areas of biology, such as transcriptome changes during shoot, root, and callus development (Che et al., 2006; Jiao et al., 2009; Ma et al., 2005; Su et al., 2007; Wang et al., 2010). Microarrays have also been used to determine gene expression profiles of whole plants or various organs during stress or physiological changes (Kreps et al., 2002; Lenka et al., 2011; Minh-Thu et al., 2013; Rabbani et al., 2003; Sarkar et al., 2014; Seki et al., 2002; Shankar et al., 2014; Shinozaki and Yamaguchi-Shinozaki, 2000). In these analyses, several versions of rice chips have been used, such as the GeneChip (25 bp) from Affymetrix (
In this study, we designed feature probes for an alternatively spliced transcript detection microarray (ASDM) for rice. The 40,139 transcripts/36,176 loci in ASDM can be detected in the current
All four organs in this study were prepared from
An ASDM for rice was designed based on the genome information of IRGSP_1_0_representative_2014_06_25 (
Target RNA was labeled using the Agilent Low RNA Input Linear Amplification Kit PLUS following the manufacturer’s protocol (
Biological term enrichment among CVI, CVII, CVIII-i, and CVIII-ii genes was assessed using GoMiner (Ashburner et al., 2000; Zeeberg et al., 2003), as described previously (Minh-Thu et al., 2013). To identify a tentative ortholog of a rice gene in the
mRNA was purified using the TruSeq RNA sample preparation kit (
Programs such as TopHat, Cufflinks, and HTSeq were used to process sequence reads. Differential exon usage was examined with the DEXSeq package in Bioconductor. Briefly, the rice genome sequence (IRGSP-1.0_2014-06-25) in the RAP database (
For RT-PCR, 5 μg of total RNA was reverse transcribed using oligo(dT) and a RevertAid H Minus First Strand cDNA Synthesis Kit (Thermo Scientific™, USA). PCR was performed with one-hundredth of the first-strand cDNA mixture and gene-specific primers under the following conditions: 10 min at 94°C, followed by 30 cycles of 30 s at 94°C, 30 s at 55°C, and 30 s at 72°C. Each 20-μl real-time PCR reaction contained 10 μl Solg™ 2X Real-Time PCR Smart mix (Solgent, Korea), 2 μl cDNA, 1 μl primers, and 7 μl water, and was run on a CFX96 Touch Real-Time PCR Detection System (Bio-Rad, USA) with three technical replicates. Expression was assessed from threshold cycle (CT) values. The relative expression level of tested genes was normalized to the
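The text states only that expression was assessed from CT values and normalized to a reference gene (whose name is missing here). A common way to do this is the comparative 2^-ΔΔCT method; the sketch below assumes that method, near-100% amplification efficiency, and averaging of the three technical replicates. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, target_ct_cal, reference_ct_cal):
    """Fold change of a target gene by the comparative 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values. The calibrator
    sample (e.g., a chosen reference organ) defines a fold change of 1.
    Assumes both amplicons double every cycle (~100% efficiency).
    """
    # Normalize the target to the reference gene within each sample (dCt).
    d_ct_sample = mean(target_ct) - mean(reference_ct)
    d_ct_calibrator = mean(target_ct_cal) - mean(reference_ct_cal)
    # Compare sample with calibrator (ddCt) and convert cycles to fold change.
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical triplicates: relative to the reference gene, the target crosses
# the threshold 2 cycles earlier in the sample than in the calibrator.
fold = relative_expression(
    target_ct=[24.1, 24.3, 24.2],
    reference_ct=[18.0, 18.1, 17.9],
    target_ct_cal=[26.1, 26.3, 26.2],
    reference_ct_cal=[18.0, 18.1, 17.9],
)
print(round(fold, 2))
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be needed instead of the fixed base of 2 used here.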
It has been reported that the 3′ regions of plant genes are more specific than the 5′ regions (Eveland et al., 2008). To exploit this feature, four 60-nt feature probes were designed for each gene: the first starts 60 bp upstream of the end of the stop codon, and each subsequent probe is shifted 30 bp downstream, so that together the four probes cover 60 bp of the CDS and 90 bp of the 3′ UTR (arrows collectively indicate R for locus
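The probe arithmetic can be checked with a short sketch: four 60-nt probes whose starts are offset by 30 nt span 60 + 3 × 30 = 150 nt, of which 60 nt lie in the CDS and 90 nt in the 3′ UTR. The sketch assumes 0-based, half-open transcript coordinates, with `stop_end` pointing just past the last base of the stop codon; the function names are illustrative, and strand/orientation handling for a real array is omitted.

```python
PROBE_LEN = 60   # nt per feature probe
STEP = 30        # shift between consecutive probes
N_PROBES = 4
UPSTREAM = 60    # first probe starts this far upstream of the stop-codon end


def probe_windows(stop_end):
    """Return (start, end) half-open windows for the tiled 3'-end probes."""
    return [
        (stop_end - UPSTREAM + i * STEP, stop_end - UPSTREAM + i * STEP + PROBE_LEN)
        for i in range(N_PROBES)
    ]


def cds_utr_coverage(windows, stop_end):
    """Split the union of the windows into CDS and 3' UTR lengths."""
    first = min(start for start, _ in windows)
    last = max(end for _, end in windows)
    return stop_end - first, last - stop_end


def probe_sequences(transcript, stop_end):
    """Slice probe sequences (sense orientation) from a transcript string."""
    windows = probe_windows(stop_end)
    if windows[0][0] < 0 or windows[-1][1] > len(transcript):
        raise ValueError("transcript too short around the stop codon")
    return [transcript[s:e] for s, e in windows]


windows = probe_windows(stop_end=500)
print(windows)                                  # [(440, 500), (470, 530), (500, 560), (530, 590)]
print(cds_utr_coverage(windows, stop_end=500))  # (60, 90)
```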
A total of 37,868 genes are representatively annotated in the IRGSP and RAP databases (
We applied the rice ASDM to detect alternatively spliced genes in the leaf, root, panicle at 0.5–1.5 cm (P1cm), and seed at 20–30 days after pollination (S21DAP). The 175,193 feature probes for 41,953 transcripts were normalized as described in the Methods section (
To confirm the detection of alternatively spliced transcripts by ASDM, several genes were tested. As shown in Fig. 1B (ASDM), among the four organs, the strongest intensity for
We also examined
The microarray data showed that transcript
The microarray probe intensities showed the highest expression of
In a parallel analysis, RNA-Seq was performed using the Illumina HiSeq 2000 platform, as indicated in the Methods section. Approximately 17–43 million reads were obtained from 30–80 Gb raw reads (
We tested genome-wide co-linearity between the microarray intensities and the RNA-Seq counts. The log2-transformed average intensities of the leaf microarray data were compared with the log2-transformed counts for the organs (Fig. 2). Pearson’s correlation coefficients between the two methods ranged from 0.30 to 0.36 for the leaf, root, P1cm, and S21DAP, indicating a modest positive correlation between the datasets. We also downloaded publicly available RNA-Seq data from TENOR (
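The co-linearity check amounts to a Pearson correlation between log2-transformed values from the two platforms. A minimal sketch follows; the pseudocount added before taking log2 (so that zero RNA-Seq counts stay finite) is an assumption not stated in the text, and the input values are hypothetical. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumption for zero counts."""
    return [math.log2(v + pseudocount) for v in values]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-transcript values for one organ
asdm_intensity = [120.0, 450.0, 3000.0, 80.0, 15000.0]
rnaseq_count = [0, 12, 40, 3, 900]
r = pearson(log2_transform(asdm_intensity), log2_transform(rnaseq_count))
print(f"Pearson r = {r:.2f}")
```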
A list of exons of the transcripts was generated from the GTF file of IRGSP-1.0 (subversion 2016-03-09) with a Python script, as described in the Methods section (Anders et al., 2012). The program splits overlapping exon regions into non-overlapping “counting bins”, and read counts in these bins are compared using the program’s statistical model. The bins of transcripts such as
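The idea of counting bins can be illustrated with a simplified sketch: every exon start and end within a gene becomes a breakpoint, and each resulting non-overlapping segment covered by at least one transcript becomes a bin labeled with the transcripts that contain it. This is not DEXSeq’s annotation-preparation script; it assumes half-open coordinates on a single gene, uses hypothetical transcript names, and omits read counting and the statistical test.

```python
def counting_bins(transcripts):
    """Flatten overlapping exons of one gene into non-overlapping bins.

    transcripts: dict mapping transcript name -> list of (start, end) exons,
    half-open coordinates on the same strand.
    Returns a list of (start, end, [transcripts containing the bin]).
    """
    breakpoints = sorted(
        {pos for exons in transcripts.values() for start, end in exons for pos in (start, end)}
    )
    bins = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        members = sorted(
            name
            for name, exons in transcripts.items()
            if any(start <= left and right <= end for start, end in exons)
        )
        if members:  # segments covered by no exon are intronic and skipped
            bins.append((left, right, members))
    return bins


# Hypothetical gene with an alternative acceptor site in the second exon
gene = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(100, 200), (350, 400)],
}
for bin_ in counting_bins(gene):
    print(bin_)
```

The bin (300, 350) belongs only to tx1, so reads falling there report on the alternative acceptor choice, which is the kind of signal differential exon usage testing relies on.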
Among the 45,128 transcripts, the number with normalized RNA-Seq counts below 1 ranged from 12,000 to 15,000, and the median ASDM intensity of these transcripts was 100.5. Based on the microarray data, transcripts (32,718) with an intensity greater than 300 (approximately three times this median) in at least one organ were chosen for further analysis. The average values are presented in
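The expression filter can be written as two small functions: a background threshold set at a multiple of a median intensity (3 × 100.5 ≈ 300 in the text), and a test that keeps a transcript if it exceeds the threshold in at least one organ. The function names, the data layout, and the strict `>` comparison (following the wording “greater than 300”) are illustrative assumptions; the profiles are hypothetical.

```python
from statistics import median

ORGANS = ("leaf", "root", "P1cm", "S21DAP")


def background_threshold(background_intensities, factor=3.0):
    """Threshold = factor x median intensity of a background transcript set."""
    return factor * median(background_intensities)


def expressed_in_any_organ(profile, threshold=300.0):
    """True if the transcript's intensity exceeds the threshold in >= 1 organ."""
    return any(profile[organ] > threshold for organ in ORGANS)


# Hypothetical intensity profiles (per organ) for three transcripts
profiles = {
    "tx_a": {"leaf": 5200.0, "root": 90.0, "P1cm": 150.0, "S21DAP": 60.0},
    "tx_b": {"leaf": 110.0, "root": 95.0, "P1cm": 120.0, "S21DAP": 80.0},
    "tx_c": {"leaf": 310.0, "root": 305.0, "P1cm": 290.0, "S21DAP": 400.0},
}
kept = sorted(name for name, p in profiles.items() if expressed_in_any_organ(p))
print(kept)  # ['tx_a', 'tx_c']
```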
Based on these ASDM CV values, genes were tentatively categorized into 3 groups (Fig. 3A): CVI (CV higher than 1.1), CVII (CV between 0.7 and 1.1), and CVIII (CV below 0.7). We designated these groups as organ-preferential (CVI, 5,410 transcripts), organ-enriched (CVII, 6,547), and nonspecific (CVIII, 20,761). CVIII was further subdivided into -i (highest), -ii (medium), and -iii (lowest) according to the level of expression in ASDM. Because the CV values from RNA-Seq were comparable to those from ASDM, the RNA-Seq CV values were categorized in the same way in Fig. 3B and
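Grouping by CV can be sketched as follows, taking CV as the standard deviation of a transcript’s intensities across the four organs divided by their mean. Whether the population or sample standard deviation was used, and which group receives a value exactly at 0.7 or 1.1, are not stated, so both are assumptions here (population SD; boundary values go to the lower-variability group). The profiles are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / mean (an assumed definition)."""
    return pstdev(values) / mean(values)


def cv_group(cv, organ_preferential=1.1, organ_enriched=0.7):
    """Assign CVI/CVII/CVIII using the ASDM cut-offs given in the text."""
    if cv > organ_preferential:
        return "CVI"    # organ-preferential
    if cv > organ_enriched:
        return "CVII"   # organ-enriched
    return "CVIII"      # nonspecific


# Hypothetical intensities in leaf, root, P1cm, S21DAP
for profile in ([1000, 100, 100, 100], [1200, 300, 300, 200], [800, 400, 400, 400]):
    cv = coefficient_of_variation(profile)
    print(profile, round(cv, 2), cv_group(cv))
```

With only four organs, the choice between population and sample standard deviation shifts every CV by a factor of about 1.15, which is one reason fixed cut-offs are only comparable within a single study design.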
When the thresholds were set at ≥ 300 for ASDM intensities and ≥ 1 for RNA-Seq counts, the effective number of expressed genes was approximately 37,000 for both technologies. Genes detected by both technologies were then identified for each CV group (Fig. 3C): approximately 40–50% of transcripts were commonly detected in each of the CVI and CVII groups, compared with approximately 60% in CVIII. For each technology, the number of commonly detected transcripts in most CV groups exceeded the number detected exclusively by that technology.
Because CVI genes in the ASDM dataset showed organ enrichment, those with a single transcript (4,172) and those with alternatively spliced transcripts (1,169) were analyzed by hierarchical clustering (Fig. 4). Highly expressed single-transcript genes in the leaf (618) were enriched for metabolic processes represented by photosynthesis (
Highly expressed genes with alternatively spliced transcripts in the leaf clustered into 4-5 groups (
To examine biological term enrichment, GO analyses were performed for genes of the CVI, CVII, CVIII-i, and CVIII-ii groups from the ASDM dataset (Ashburner et al., 2000; Zeeberg et al., 2003). This comparison was performed for all loci and separately for loci with a single transcript (16,999) or with alternatively spliced transcripts (5,844) (
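Term enrichment tools of this kind typically ask whether a GO term is over-represented in a gene list relative to a background, using a one-sided Fisher’s exact (hypergeometric) test. The sketch below shows only that core calculation with hypothetical counts; it is not GoMiner’s implementation and omits the multiple-testing correction that a genome-wide GO analysis needs.

```python
from math import comb


def enrichment_p_value(k, background, annotated, selected):
    """One-sided hypergeometric P(X >= k).

    background: total genes on the array/genome
    annotated:  genes in the background annotated with the GO term
    selected:   genes in the list (e.g., one CV group)
    k:          genes in the list annotated with the term
    """
    denominator = comb(background, selected)
    upper = min(annotated, selected)
    numerator = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 40 of 618 listed genes carry a term that annotates
# 300 of 30,000 background genes (about 6 expected by chance).
print(enrichment_p_value(k=40, background=30000, annotated=300, selected=618))
```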
In the rice genome, there are 5,254 loci with two or more alternative transcripts (
We also assessed IAST for the CV groups based on RNA-Seq analysis (Table 3). The IAST values for CVI, CVII, and CVIII were 0.09, 0.12, and 0.21, respectively, comparable to those based on ASDM. Within the CVI and CVII groups, the values were relatively higher in the leaf and lower in the root, P1cm, and S21DAP samples, although they were somewhat lower overall than those based on ASDM. As in the ASDM analysis, IAST values in the CVIII subgroups were generally higher than those in the other groups and lower than that of the 100 constitutive genes (Consti100 in Table 2). These data confirm the ASDM observation that IAST is higher for genes constitutively expressed across organs than for organ-enriched genes.
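IAST is the number of loci with alternatively spliced transcripts divided by the total number of loci in a group. The sketch below recomputes the group-level values from the locus counts in Tables 2 and 3 and reproduces the rounded values quoted in the text.

```python
def iast(loci_single_ts, loci_as_ts):
    """Index of loci with alternatively spliced transcripts in a group."""
    return loci_as_ts / (loci_single_ts + loci_as_ts)


# (loci with a single transcript, loci with alternatively spliced transcripts)
ASDM = {"CVI": (4172, 690), "CVII": (4607, 1276), "CVIII": (12707, 4028)}
RNA_SEQ = {"CVI": (4700, 461), "CVII": (6603, 892), "CVIII": (11378, 3099)}

for name, table in (("ASDM", ASDM), ("RNA-Seq", RNA_SEQ)):
    values = {group: round(iast(*counts), 2) for group, counts in table.items()}
    print(name, values)
```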
We tested organ preference among the alternatively spliced transcripts of the CVI group enriched in each organ. The numbers of these transcript variants were 527, 179, 120, and 124, from 367, 137, 91, and 96 loci in the leaf, root, P1cm, and S21DAP, respectively (Table 4). We compared CV groups and organ preference between a major transcript and its alternatively spliced transcripts. For example, two transcripts,
We examined the range of expression among spliced transcripts within a locus by comparing the transcripts with the highest and lowest expression values. To simplify the description of this variability, we calculated the log2 ratio between the most and least highly expressed transcripts in the ASDM dataset. The log ratios for all tested organs are shown in
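The per-locus comparison can be sketched as grouping transcript intensities by locus and taking log2(highest / lowest) for loci with at least two transcripts. The mapping from transcript to locus below assumes RAP-DB-style identifiers (a transcript such as `Os01t0100100-01` belonging to locus `Os01g0100100`); the IDs and intensities are hypothetical.

```python
import math
from collections import defaultdict


def locus_of(transcript_id):
    """Assumed RAP-DB convention: Os01t0100100-01 -> Os01g0100100."""
    return transcript_id.split("-")[0].replace("t", "g", 1)


def max_min_log2_ratios(intensities):
    """log2(highest / lowest) intensity per locus with >= 2 transcripts."""
    by_locus = defaultdict(list)
    for transcript_id, value in intensities.items():
        by_locus[locus_of(transcript_id)].append(value)
    return {
        locus: math.log2(max(values) / min(values))
        for locus, values in by_locus.items()
        if len(values) >= 2
    }


# Hypothetical ASDM intensities for one organ
leaf = {
    "Os01t0100100-01": 1600.0,
    "Os01t0100100-02": 400.0,
    "Os01t0100100-03": 800.0,
    "Os02t0200200-01": 900.0,   # single-transcript locus, skipped
}
print(max_min_log2_ratios(leaf))  # {'Os01g0100100': 2.0}
```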
Campbell et al. (2006) classified alternative splicing events into 9 types based on the results of the Program to Assemble Spliced Alignments (PASA): alternate acceptor, alternate donor, alternate terminal exon, retained exon, skipped exon, initiation within an intron, termination within an intron, spliced intron, and retained intron. We categorized the alternative splicing events to assess the relationship between splice type and the variation in expression between the most and least highly expressed transcripts. PASA was first applied to whole transcripts, detecting 5,028 alternative splices for 2,438 of 4,143 loci in the genome and 3,598 alternative splices for 2,415 loci in the microarray (
It has been reported that expression varies among alternatively spliced transcripts. Nonsense-mediated mRNA decay (NMD) has been proposed as a pathway that degrades such transcripts (Lewis et al., 2003). When an alternative splice introduces a premature stop codon, the corresponding mRNA isoform is classified as an NMD candidate and is expected to be expressed at a lower level than the isoform with the longer CDS. If the spliced transcript escapes NMD, the result is a transcript with a shorter CDS. To examine how differences in CDS length caused by alternative splicing affect expression, we sorted alternatively spliced transcripts by expression level and compared their CDS lengths by organ and CV group. As shown in Table 5, among alternatively spliced transcripts, those with a longer CDS tended to be more highly expressed than those with a shorter CDS in all organs and CV groups. A paired t-test gave a p-value of 0.029, suggesting that these differences are significant.
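The paired t-test can be reproduced in outline from Table 5. The sketch assumes the test compared the “highest intensity” and “lowest intensity” counts across the 11 organ/CV rows (excluding the Total row); under that assumption it gives t ≈ 2.54 with 10 degrees of freedom, which lies between the two-sided critical values for p = 0.05 (2.228) and p = 0.02 (2.764), consistent with the reported p = 0.029.

```python
import math
from statistics import mean, stdev

# Table 5: (transcripts with highest intensity and longer CDS,
#           transcripts with lowest intensity and longer CDS)
TABLE5 = {
    "CVI_Leaf": (63, 54), "CVI_Root": (23, 10), "CVI_P1cm": (16, 13),
    "CVI_S21DAP": (21, 13), "CVII_Leaf": (72, 41), "CVII_Root": (76, 35),
    "CVII_P1cm": (63, 40), "CVII_S21DAP": (48, 30), "CVIII_i": (227, 172),
    "CVIII_ii": (408, 250), "CVIII_iii": (492, 287),
}


def paired_t(pairs):
    """Paired t statistic and degrees of freedom for (a, b) pairs."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t(TABLE5.values())
print(f"t = {t:.2f}, df = {df}")
```

With SciPy available, `scipy.stats.ttest_rel` on the two columns returns the same statistic together with an exact two-sided p-value.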
Genome-wide gene expression profiling using microarray and RNA-Seq technologies has provided abundant information on the basic biological mechanisms underlying development, stress responses, and other processes (Gordon et al., 2014; Ma et al., 2005; Severin et al., 2010; Shinozaki and Yamaguchi-Shinozaki, 2000; Wakasa et al., 2014). The current version of the rice ASDM uses 60-nt oligomers and includes 40,953 of the 44,552 transcripts deposited in IRGSP 1.0. We improved the feature probe design to assess alternative splicing. Because long oligomers can be designed to span the junction between two exons in a spliced region, this approach may have advantages over short-oligomer microarrays. Several loci, such as
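The text does not say exactly how junction probes were positioned, so the sketch below assumes one simple design: a 60-nt probe centred on the exon-exon junction, taking 30 nt from the 3′ end of the upstream exon and 30 nt from the 5′ end of the downstream exon. Such a probe matches contiguously only the isoform that joins those two exons. The function name and sequences are hypothetical.

```python
def junction_probe(upstream_exon, downstream_exon, length=60):
    """Hypothetical junction-centred probe (sense orientation).

    Takes length // 2 nt from the end of the upstream exon and the remainder
    from the start of the downstream exon.
    """
    half = length // 2
    if len(upstream_exon) < half or len(downstream_exon) < length - half:
        raise ValueError("exon too short for a junction-centred probe")
    return upstream_exon[-half:] + downstream_exon[: length - half]


# Hypothetical exon sequences for an exon-skipping event
exon1 = "G" * 45
exon2 = "A" * 40
exon3 = "C" * 50
inclusion_probe = junction_probe(exon1, exon2)  # exon1|exon2 junction
skipping_probe = junction_probe(exon1, exon3)   # exon1|exon3 junction (exon2 skipped)
print(inclusion_probe != skipping_probe)
```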
CV thresholds for constitutive expression vary with the number of organ samples. In this report, CV values of 0.7 and 0.8 were tentatively used for ASDM and RNA-Seq, respectively (Fig. 3). Wang et al. (2010) and our previous study (Chae et al., 2016) reported values of 0.1 and 0.3 using 39 tissues and 7 representative organs/tissues, respectively. For both technologies, the variation in, and hence the criteria for, constitutive expression increased as the number of samples decreased. Because the values vary with the approach, the middle range of the CVII group was adopted as an intermediate for each technology. These demarcations showed that genes with alternatively spliced transcripts can be analyzed in a similar manner with both technologies. The effective number of expressed transcripts was approximately 33,000 for both ASDM and RNA-Seq (Fig. 3C), and 40 to 70% of genes were detected by both, more than the number of transcripts detected exclusively by either technology in each CV group. Among the organs examined, both datasets contained more constitutively expressed genes in the commonly detected group. In particular, the results of GO analysis of the CV groups were quite similar between ASDM and RNA-Seq (
Expressed TFs with a single transcript in P1cm included MADS box TFs
Transcriptome analysis using ASDM may be useful for studying alternatively spliced transcripts, including those of TFs. Indeed, a TF with an alternatively spliced transcript in P1cm,
Pre-mRNA splicing, including alternative splicing, occurs in eukaryotic organisms, some of which have become highly developed through various processes. Many trans- and cis-acting factors have been identified, but a basic question remains: how does the pressure on alternative splicing differ between commonly expressed and organ-preferential genes? To address this question, we examined how organ preference and expression level influence the proportion of loci with alternatively spliced transcripts. This proportion was designated IAST. The IAST value of expressed genes was 0.19, slightly higher than those of the genome (0.14) and ASDM (0.15). The values were 0.14, 0.22, and 0.24 for CVI, CVII, and CVIII, respectively (Table 2). When CVIII was divided according to expression level, IAST ranged from 0.31 to 0.34. IAST values were also higher in groups with greater constitutive expression, culminating in 0.43 for the 100 most constitutively expressed genes reported by Wang et al. (2010). These observations were confirmed by RNA-Seq data (Table 3). Although the IAST values for CVI, CVII, and CVIII (0.09, 0.12, and 0.21, respectively) were slightly lower than those based on ASDM, the increase toward constitutively expressed groups and the variation among organs were similar in these independent analyses.
To date, it remains unclear why constitutively expressed genes would have a higher IAST. Exon-intron boundaries depend on canonical splice-site dinucleotides (e.g., GT-AG, GC-AG, AT-AC), and every base in the genome is exposed to mutation (Burset et al., 2000; Lorkovic et al., 2005; Mount and Steitz, 1981). DNA damage occurs at high frequency in higher eukaryotes; a single mammalian cell can accumulate ~10,000 abasic (apurinic) lesions per day (Lindahl and Nyberg, 1972). For comparison, mutation rates in microbes with DNA-based chromosomes are close to 1/300 per genome per replication (Drake et al., 1998). A high level of transcription has been associated with elevated rates of spontaneous mutation and recombination in eukaryotic organisms (Datta and Jinks-Robertson, 1995). For example, the effect of transcription on the rate of spontaneous mutation in the yeast
According to our analysis, when one transcript confers organ preference, its alternatively spliced transcript(s) show the same preference or less organ specificity; they do not exhibit an additional, distinct organ preference (Table 4). In other words, among CVI genes with strong organ preference, none of the alternatively spliced forms belongs to the CVI group of another organ. These findings suggest that two alternatively spliced transcripts from the same locus may not be permitted to have different organ specificities within the CVI group. However, alternative splicing may be tolerated for genes in which it is less detrimental to organogenesis and development, consistent with the lower IAST observed in CVI compared with CVIII. Alternatively spliced transcripts that escape an organism’s surveillance may nonetheless evolve additional functionality and become more abundant in other organs, placing them in CVII or CVIII. These constraints may restrict pre-mRNA splicing of CVI genes, resulting in the lower IAST shown in Tables 2 and 3.
Numerous alternatively spliced transcripts have been implicated in flowering (Marquardt et al., 2014; Wang et al., 2014), the circadian clock (Filichkin et al., 2010; Seo et al., 2011), and abiotic stress responses (Shang et al., 2017). In these studies, the alternatively spliced transcripts had similar biological functions. In the example of
Pre-mRNA splicing might help balance the maximization of a gene’s function against the acquisition of additional functionality, while avoiding entirely distinct functions. For alternatively spliced mRNA variants, there may be no way to separate their functional differences, which risks wasting resources. Differential regulation through components of the spliceosome might be possible and would add complexity to gene regulation. In this regard, pre-mRNA splicing may represent a strategy that contrasts with that of gene families, whose members achieve direct functional distinctness by using unique promoters that respond to different signals (Freeling, 2009). As an example of a gene family, two transcripts
Our data also suggest that expression is reciprocal, or complementary, among transcripts within a locus, regardless of organ specificity (Liu et al., 2011). The log2 ratios between the highest and lowest expressed transcripts were 2.5–2.7 (sd = 1.9) (
There may be additional mechanisms for maintaining alternative splicing. Reciprocal or complementary expression patterns have been reported, whereby one copy of a gene duplicate is expressed in a certain organ or tissue type and the other copy is expressed to a different extent in other organs or tissue types (Liu et al., 2011). Approximately 30% of whole-genome duplicate pairs and 38% of tandem duplicate pairs show reciprocal expression patterns in
In this study, rice ASDM and RNA-Seq were applied to differentiate alternatively spliced transcripts among representative rice organs: the leaf, root, panicle at 1 cm, and young seed. Transcripts were classified according to organ enrichment, and the results are collectively interpreted in terms of evolutionary pressure. The proportion of loci with multiple transcripts due to alternative splicing was higher among constitutively expressed genes than among organ-preferential loci. Alternative splicing appears to be tolerated most when the alternatively spliced transcript is not expressed under unwanted circumstances. Within loci, transcripts with a longer CDS tended to be more highly expressed than those with a shorter CDS. These genome-wide technologies may be useful for studying transcriptomes and the biological significance of pre-mRNA splicing.
Summary of loci in IRGSP v1.0 and design features of the rice ASDM
Chr | Loci | Alternative loci a) | Alternative transcripts b) | Total transcripts c) | Designed loci d) | Designed alternative loci e) | Designed alternative transcripts f) | Designed total transcripts g) |
---|---|---|---|---|---|---|---|---|
Os01 | 5,273 | 778 | 1,765 | 6,260 | 5,070 | 522 | 1,093 | 5,641 |
Os02 | 4,216 | 642 | 1,464 | 5,038 | 4,056 | 423 | 908 | 4,541 |
Os03 | 4,550 | 761 | 1,735 | 5,524 | 4,407 | 478 | 1,010 | 4,939 |
Os04 | 3,359 | 473 | 1,083 | 3,969 | 3,200 | 347 | 729 | 3,582 |
Os05 | 3,027 | 437 | 985 | 3,575 | 2,914 | 284 | 610 | 3,240 |
Os06 | 3,159 | 406 | 919 | 3,672 | 3,042 | 278 | 592 | 3,356 |
Os07 | 2,870 | 362 | 829 | 3,337 | 2,753 | 260 | 547 | 3,040 |
Os08 | 2,588 | 350 | 799 | 3,037 | 2,456 | 241 | 514 | 2,729 |
Os09 | 2,111 | 285 | 641 | 2,467 | 2,012 | 205 | 434 | 2,241 |
Os10 | 2,115 | 260 | 582 | 2,437 | 1,981 | 174 | 369 | 2,176 |
Os11 | 2,424 | 247 | 543 | 2,720 | 2,240 | 161 | 340 | 2,419 |
Os12 | 2,176 | 253 | 593 | 2,516 | 2,045 | 166 | 356 | 2,235 |
Summary | 37,868 | 5,254 | 11,938 | 44,552 | 36,176 | 3,539 | 7,502 | 40,139 |
RAP-DB annotation based on the genome IRGSP_1_0_representative_2014_06_25 was used.
a)The number of loci with alternative splicing
b)The number of transcripts due to alternative splicing at these loci
c)The total number of transcripts, both single and alternatively spliced transcript(s)
d)The number of loci included in this ASDM
e)The number of loci with alternative splicing included in this ASDM
f)The number of transcripts with alternative splicing included in this ASDM
g)The number of transcripts with single or alternative splicing included in this ASDM
The portion of loci with alternatively spliced transcripts among organ-preferential transcripts based on ASDM analysis
Group | ts_total a) | loci_total b) | loci_single_ts c) | loci_as_ts d) | loci_single_ts/loci_total | loci_as_ts/loci_total e) | ts_variant_from_as_loci |
---|---|---|---|---|---|---|---|
Genome | 44,553 | 37,869 | 32,615 | 5,254 | 0.86 | 0.14 | 11,938 |
ASDM | 38,399 | 36,141 | 30,897 | 5,244 | 0.85 | 0.15 | 7,502 |
CV total | 32,718 | 26,573 | 21,486 | 5,087 | 0.81 | 0.19 | |
CVI | 5,410 | 4,862 | 4,172 | 690 | 0.86 | 0.14 | |
CVII | 6,547 | 5,883 | 4,607 | 1,276 | 0.78 | 0.22 | |
CVIII | 20,761 | 16,735 | 12,707 | 4,028 | 0.76 | 0.24 | |
CVI_Leaf | 1,755 | 1,397 | 1,030 | 367 | 0.74 | 0.26 | |
CVI_Root | 1,716 | 1,638 | 1,501 | 137 | 0.92 | 0.08 | |
CVI_P1cm | 771 | 720 | 629 | 91 | 0.87 | 0.13 | |
CVI_S21DAP | 1,168 | 1,108 | 1,012 | 96 | 0.91 | 0.09 | |
CVII_Leaf | 1,636 | 1,433 | 1,029 | 404 | 0.72 | 0.28 | |
CVII_Root | 1,713 | 1,534 | 1,203 | 331 | 0.78 | 0.22 | |
CVII_P1cm | 1,980 | 1,821 | 1,488 | 333 | 0.82 | 0.18 | |
CVII_S21DAP | 1,218 | 1,127 | 887 | 240 | 0.79 | 0.21 | |
CVIII_i | 5,293 | 4,786 | 3,154 | 1,632 | 0.66 | 0.34 | |
CVIII_ii | 7,937 | 7,302 | 5,056 | 2,246 | 0.69 | 0.31 | |
CVIII_iii | 7,531 | 6,848 | 4,497 | 2,351 | 0.66 | 0.34 | |
Consti100 f) | 100 | 100 | 57 | 43 | 0.57 | 0.43 | |
The loci_as_ts/loci_total ratios are designated the index of loci with alternatively spliced transcripts in the group (IAST). These values are 0.14, 0.22, and 0.24 in CVI, CVII, and CVIII, respectively.
a)Total number of transcripts
b)Total number of genes
c)Number of genes that produce a single transcript
d)Number of genes that have alternatively spliced transcripts
e)IAST, the fraction of loci with alternatively spliced transcripts among all loci
f)The list of 100 constitutively expressed genes in Supplementary Table S8, reported by Wang et al. (2010), was used.
The portion of loci with alternatively spliced transcripts among organ-preferential transcripts based on RNA-Seq analysis
Group | ts_total a) | loci_total b) | loci_single_ts c) | loci_as_ts d) | loci_single_ts/loci_total | loci_as_ts/loci_total e) |
---|---|---|---|---|---|---|
CV total | 32,666 | 26,759 | 22,071 | 4,688 | 0.82 | 0.18 |
CVI | 5,721 | 5,161 | 4,700 | 461 | 0.91 | 0.09 |
CVII | 8,574 | 7,495 | 6,603 | 892 | 0.88 | 0.12 |
CVIII | 18,371 | 14,477 | 11,378 | 3,099 | 0.79 | 0.21 |
CVI_Leaf | 1,938 | 1,570 | 1,280 | 290 | 0.82 | 0.18 |
CVI_Root | 1,648 | 1,565 | 1,493 | 72 | 0.95 | 0.05 |
CVI_P1cm | 706 | 668 | 631 | 37 | 0.94 | 0.06 |
CVI_S21DAP | 1,429 | 1,362 | 1,301 | 61 | 0.96 | 0.04 |
CVII_Leaf | 2,542 | 2,143 | 1,815 | 328 | 0.85 | 0.15 |
CVII_Root | 2,127 | 1,850 | 1,616 | 234 | 0.87 | 0.13 |
CVII_P1cm | 2,197 | 2,014 | 1,857 | 157 | 0.92 | 0.08 |
CVII_S21DAP | 1,708 | 1,531 | 1,389 | 142 | 0.91 | 0.09 |
CVIII_i | 3,467 | 2,678 | 2,048 | 630 | 0.76 | 0.24 |
CVIII_ii | 5,858 | 4,746 | 3,805 | 941 | 0.80 | 0.20 |
CVIII_iii | 9,046 | 7,760 | 6,675 | 1,085 | 0.86 | 0.14 |
The loci_as_ts/loci_total ratios are designated the index of loci with alternatively spliced transcripts in the group (IAST), as defined for the ASDM dataset. These values are 0.09, 0.12, and 0.21 in CVI, CVII, and CVIII, respectively.
a)Total number of transcripts
b)Total number of genes
c)Number of genes that produce a single transcript
d)Number of genes that have multiple (alternatively spliced) transcripts
e)IAST, the fraction of loci with alternatively spliced transcripts among all loci.
Summary of CVI transcripts from loci that have alternatively spliced transcripts
CVI_organ | as_ts a) | loci_as_ts b) | as_ts_CVI_organ c) | loci_CVI_same_organ d) | as_ts_CVI_diff_organ e) | as_ts_Not_CVI f) |
---|---|---|---|---|---|---|
CVI_Leaf | 527 | 367 | 305 | 147 | 0 | 222 |
CVI_Root | 179 | 137 | 81 | 39 | 0 | 98 |
CVI_P1cm | 120 | 91 | 57 | 28 | 0 | 63 |
CVI_S21DAP | 124 | 96 | 54 | 26 | 0 | 70 |
a)The number of transcript variants in each CVI organ
b)Number of loci of these transcripts
c)Number of CVI transcript variants that are grouped in the same organ
d)Number of loci of these transcripts
e)Number of CVI alternatively spliced transcript variants that are grouped in different CVI organ(s)
f)Number of alternatively spliced transcripts assigned to CVII, CVIII, or a low-expression group rather than to CVI
Number of genes with a longer CDS among alternatively spliced transcripts with different degrees of expression
Group | Highest intensity a) | Lowest intensity b) |
---|---|---|
CVI_Leaf | 63 | 54 |
CVI_Root | 23 | 10 |
CVI_P1cm | 16 | 13 |
CVI_S21DAP | 21 | 13 |
CVII_Leaf | 72 | 41 |
CVII_Root | 76 | 35 |
CVII_P1cm | 63 | 40 |
CVII_S21DAP | 48 | 30 |
CVIII_i | 227 | 172 |
CVIII_ii | 408 | 250 |
CVIII_iii | 492 | 287 |
Total | 1,509 | 945 |
Genes with alternatively spliced transcripts were collected, their transcripts were sorted by intensity, and the transcript with the longer CDS was identified. A paired t-test between the highest and lowest expressed transcripts with a longer CDS within a locus showed a p-value of 0.029, suggesting that the differences are significant.
a)The number of transcripts with highest intensity and a longer CDS
b)The number of transcripts with lowest intensity and a longer CDS
Songhwa Chae1, Joung Sug Kim1, Kyong Mi Jun2, Sang-Bok Lee3, Myung Soon Kim4, Baek Hie Nahm1,2, and Yeon-Ki Kim1,*
1Division of Bioscience and Bioinformatics, Myongji University, Yongin 17058, Korea, 2GreenGene Biotech Inc., 116, Yongin 17058, Korea, 3Central Area Crop Breeding Research Division, National Institute of Crop Science, Chuncheon 24219, Korea, 4Genomictree Inc., 44-6, Daejeon 34027, Korea
Correspondence to:*Correspondence: kim750a11@gmail.com
Pre-mRNA splicing further increases protein diversity acquired through evolution. The underlying driving forces for this phenomenon are unknown, especially in terms of gene expression. A rice alternatively spliced transcript detection microarray (ASDM) and RNA sequencing (RNA-Seq) were applied to differentiate the transcriptome of 4 representative organs of
Keywords: alternatively spliced transcript detection microarray, coefficient value, rice, RNA sequencing, transcriptome
Eukaryotic organisms have evolved ingenious mechanisms to increase protein diversity. At the genome level, diversity is amplified through various mechanisms of gene duplication, such as whole-genome (WG) duplication and segment and tandem duplications, during evolution (Freeling, 2009). At the protein level, alternative RNA splicing provides another layer of diversity, whereby transcripts of many multi-exonic genes are processed with the inclusion or exclusion of particular exons in the final mRNA molecule (Berget et al., 1977; Chow et al., 1977). It has been reported that ~95% of multi-exonic genes are alternatively spliced in humans; in contrast, the number is ~33% in rice (Pan et al., 2008; Zhang et al., 2010). Pre-mRNA splicing is regulated by a system of trans-acting RNA-binding proteins that bind to cis-acting sites on the primary transcript, followed by recruitment of spliceosomal proteins (Ser/Arg-rich proteins, SRs) to the target site. By recruiting small nuclear ribonucleoproteins (snRNPs) that form the spliceosome, SRs are well known for their role in alternative splicing (Glisovic et al., 2008; Lunde et al., 2007). Although many RNA isoforms can be generated from a single locus, the majority of loci generate only two or three isoforms in abundance.
Splicing isoforms consist of several groups corresponding to patterns of exon truncation or extension, intron retention or inclusion/exclusion of entire exons in a mature transcript (Cambell et al., 2006; Kriventseva et al., 2003). Such splice modes have been classified by several research groups. For example, Lewis et al. (2003) proposed classifying splice modes based on splice site usage and effects on the coding sequence. These authors reported that splice sites can be introduced or lost according to the mode of splice type, such as an intra-exon splice, an imperfect exon skip, alternate sites, exon inclusion, and a perfect exon skip; changes in coding region were also considered. Campbell et al. (2006) classified alternative splice forms into 9 types, including alternate acceptor/donor sites and retained or skipped exons in rice. Several web tools are also available for detecting alternative splicing (Huang et al., 2003; Koscielny et al., 2009; Zhu et al., 2014). Expression Atlas (
Rice (
Over the last two decades, DNA microarray technology has been extensively applied in gene expression studies in various areas of biology, such as transcriptome changes during shoot, root, and callus development (Che et al., 2006; Jiao et al., 2009; Ma et al., 2005; Su et al., 2007; Wang et al., 2010). Microarrays have also been used to determine gene expression profiles of whole plants or various organs during stress or physiological changes (Kreps et al., 2002; Lenka et al., 2011; Minh-Thu et al., 2013; Rabbani et al., 2003; Sarkar et al., 2014; Seki et al., 2002; Shankar et al., 2014; Shinozaki and Yamaguchi-Shinozaki, 2000). In these analyses, several versions of rice chips have been used, such as the GeneChip (25 bp) from Affymetrix (
In this study, we designed feature probes for an alternatively spliced transcript detection microarray (ASDM) for rice. The 40,139 transcripts/36,176 loci in ASDM can be detected in the current
All four organs in this study were prepared from
An ASDM for rice was designed based on the genome information of IRGSP_1_0_representative_2014_06_25 (
Probes were labeled using Agilent Low RNA Input Linear Amplification Kit PLUS following the manufacturer’s protocol (
Biological term enrichment among CVI, CVII, CVIII-i, and CVIII-ii genes was assessed using GoMiner (Ashburner et al., 2000; Zeeberg et al., 2003), as described previously (Minh-Thu et al., 2013). To identify a tentative ortholog of a rice gene in the
mRNA was purified using the TruSeq RNA sample preparation kit (
Programs such as TopHat, Cufflinks and HTSeq were used to process sequence reads. Differential exon usage was examined with the DEXSeq package in Bioconductor. Briefly, the rice genome sequence (IRGSP-1.0_2014-06-25) in the RAP database (
For RT-PCR, 5 μg of total RNA was reverse transcribed using oligo (dT) and a RevertAid H Minus First Strand cDNA Synthesis Kit (Thermo Scientific™, USA). PCR was performed with one-hundredth of the first-strand cDNA mixture and gene-specific primers under the following conditions: 10 min at 94°C, followed by 30 cycles of 30 s at 94°C, 30 s at 55°C, and 30 s at 72°C. Each 20-μl real-time PCR reaction contained 10 μl of Solg™ 2X Real-Time PCR Smart mix (Solgent, Korea), 2 μl of cDNA, 1 μl of primers, and 7 μl of water, and real-time PCR was performed using the CFX96 Touch real-time PCR detection system (Bio-Rad, USA) with three technical replicates. Expression was assessed by evaluating threshold cycle (CT) values. The relative expression level of tested genes was normalized to the
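The normalization sentence above is truncated, so the reference gene and the exact calculation cannot be recovered from this section. Purely as an illustration, the sketch below assumes the widely used 2^-ΔΔCT (Livak) approach: technical-replicate CT values are averaged, the target gene is normalized to a reference gene, and the result is compared with a calibrator sample. The function name, the calibrator, and all CT values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ddCT method (an assumption; the source does not name its formula).

    Each argument is a list of technical-replicate CT values:
      ct_target, ct_ref          -> target and reference gene in the sample of interest
      ct_target_cal, ct_ref_cal  -> the same two genes in a calibrator sample
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)              # normalize to the reference gene
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_ref_cal)  # same normalization for the calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Hypothetical CT values, three technical replicates each
fold = relative_expression([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                           [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
print(fold)  # 4.0: the target crosses the threshold 2 cycles earlier than in the calibrator
```

The 2^-ΔΔCT form assumes near-100% amplification efficiency for both genes; if efficiencies differ, an efficiency-corrected model would be more appropriate.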
It has been reported that the 3′ regions of plant genes are more gene-specific than the 5′ regions (Eveland et al., 2008). To take advantage of this feature, four 60-nt feature probes were designed for each gene: the first starts 60 bp upstream of the end of the stop codon, and each subsequent probe is shifted 30 bp downstream, so that together the four probes cover 60 bp of the CDS and 90 bp of the 3′ UTR (arrows collectively indicate R for locus
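The tiling arithmetic can be checked with a short sketch: four 60-nt windows offset by 30 nt span 60 + 3 × 30 = 150 nt, and starting 60 nt before the end of the stop codon puts 60 of those nt in the CDS and 90 in the 3′ UTR. The coordinate convention (0-based, end-exclusive, `stop_end` as the position just after the stop codon) and the synthetic transcript are assumptions for illustration, not the ASDM design pipeline.

```python
def tile_probes(mrna, stop_end, n_probes=4, length=60, step=30, upstream=60):
    """Tile 3'-end feature probes as described in the text.

    mrna: transcript sequence (5'->3').
    stop_end: 0-based position just after the last base of the stop codon (assumed convention).
    The first probe starts `upstream` nt before stop_end; each later probe is shifted `step` nt
    toward the 3' end. Returns (start, end, sequence) tuples, 0-based and end-exclusive.
    """
    first = stop_end - upstream
    probes = []
    for i in range(n_probes):
        start = first + i * step
        end = start + length
        if start < 0 or end > len(mrna):
            raise ValueError("transcript too short for the requested tiling")
        probes.append((start, end, mrna[start:end]))
    return probes


# Hypothetical transcript: 200-nt CDS (ending in the stop codon) followed by a 100-nt 3' UTR
mrna = "A" * 200 + "C" * 100
probes = tile_probes(mrna, stop_end=200)
print([(s, e) for s, e, _ in probes])  # [(140, 200), (170, 230), (200, 260), (230, 290)]
```

The union of the four windows (positions 140 to 290) contains 60 CDS and 90 UTR positions, matching the coverage stated above; the default arguments simply encode the numbers given in the text.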
A total of 37,868 genes are representatively annotated in the IRGSP and RAP databases (
We applied the rice ASDM to detect alternatively spliced genes among the leaf, root, panicle at 0.5–1.5 cm (P1cm) and seed at 20–30 days after pollination (S21DAP). The 175,193 feature probes for 41,953 transcripts were normalized as described in the Methods section (
To confirm the detection of alternatively spliced transcripts by ASDM, several genes were tested. As shown in Fig. 1B (ASDM), among the four organs, the strongest intensity for
We also examined
The microarray data showed that transcript
The microarray probe intensities showed the highest expression of
In a parallel analysis, RNA-Seq was performed using the Illumina HiSeq 2000 platform, as indicated in the Methods section. Approximately 17–43 million reads were obtained from 30–80 Gb raw reads (
We tested the genome-wide concordance between microarray intensities and RNA-Seq counts. The log2-transformed average intensities of the leaf microarray data and the log2-transformed counts for the organs were compared (Fig. 2). Pearson's correlation coefficients between the two methods ranged from 0.30 to 0.36 for the leaf, root, P1cm and S21DAP, indicating a modest positive correlation between the datasets. We also downloaded publicly available RNA-Seq data from TENOR (
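A comparison of this kind amounts to Pearson's r computed on log-transformed values of the same transcripts. The sketch below assumes a pseudocount of 1 before taking log2 (so that zero counts are defined) and plain per-transcript pairing; which summary values were actually paired, and whether a pseudocount was used, are not stated in this section. The input values are hypothetical.

```python
import math


def log2_pearson(intensities, counts, pseudocount=1.0):
    """Pearson's r between log2 ASDM intensities and log2 RNA-Seq counts of the same transcripts.

    A pseudocount is added before the log so that zero counts are defined (an assumption).
    """
    if len(intensities) != len(counts) or len(counts) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x = [math.log2(v + pseudocount) for v in intensities]
    y = [math.log2(v + pseudocount) for v in counts]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four transcripts in one organ
r = log2_pearson([1, 3, 7, 15], [3, 15, 63, 255])
print(r)  # 1.0: the log2 values rise in lockstep
```

With genome-wide inputs, many low-count or saturated transcripts scatter around the trend, which is one reason coefficients such as 0.30–0.36 can arise even when both platforms measure the same transcripts.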
A list of exonic regions of the transcripts was generated from a GTF file of IRGSP-1.0 (subversion 2016-03-09) with a Python script, as described in the Methods section (Anders et al., 2012). The program divides overlapping exon regions into “counting bins”, and the read counts in each bin are then evaluated with the program's statistical model. The bins of transcripts such as
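The idea of counting bins can be illustrated with a small sketch: every exon boundary in a gene is treated as a breakpoint, and each resulting interval that lies inside at least one exon becomes a bin labelled with the transcripts that contain it. This mirrors the concept behind DEXSeq's annotation flattening but is not its script; it ignores strand, 1-based GTF coordinates, and exons shared with neighbouring genes. The gene and transcript IDs are hypothetical.

```python
def counting_bins(transcripts):
    """Split the exons of one gene into disjoint counting bins.

    transcripts: dict of transcript ID -> list of (start, end) exons, 0-based and end-exclusive.
    Returns (start, end, [IDs of transcripts whose exons cover the bin]) for every exonic bin.
    """
    # Every exon boundary of every transcript becomes a potential bin edge
    points = sorted({p for exons in transcripts.values() for s, e in exons for p in (s, e)})
    bins = []
    for a, b in zip(points, points[1:]):
        covering = sorted(tid for tid, exons in transcripts.items()
                          if any(s <= a and b <= e for s, e in exons))
        if covering:  # intervals covered by no exon are introns and are skipped
            bins.append((a, b, covering))
    return bins


# Hypothetical gene with an alternative 5' splice site (T2) and an alternative last exon (T3)
gene = {"T1": [(100, 200), (300, 400)],
        "T2": [(100, 250), (300, 400)],
        "T3": [(100, 200), (500, 600)]}
for b in counting_bins(gene):
    print(b)
```

Reads falling in the bin 200–250 can only come from T2, so a change in that bin's share of the gene's reads between organs is the kind of signal a differential exon usage test looks for.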
Among 45,128 transcripts, the number with normalized counts below 1 ranged from 12,000 to 15,000, and the median ASDM intensity of these transcripts was 100.5. Based on the microarray data, the transcripts (32,718) with an intensity greater than 300 (approximately three times the median intensity of transcripts with RNA-Seq counts below 1) in at least one organ were chosen for further analysis. The average values are presented in
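The cut-off of 300 is roughly three times the background median of 100.5 reported for transcripts with normalized counts below 1. The two-step filter can be sketched as follows, assuming that the background set is exactly those transcripts and that "at least one organ" means any single organ exceeding the cut-off; the dictionaries and IDs are hypothetical.

```python
from statistics import median


def background_threshold(intensities, rnaseq_counts, factor=3.0):
    """`factor` x the median ASDM intensity of transcripts whose RNA-Seq count is below 1.

    intensities, rnaseq_counts: dict transcript ID -> value for the same sample(s).
    """
    background = [intensities[t] for t, c in rnaseq_counts.items()
                  if c < 1 and t in intensities]
    return factor * median(background)


def expressed_in_any_organ(profiles, threshold):
    """IDs of transcripts whose intensity exceeds `threshold` in at least one organ.

    profiles: dict transcript ID -> dict organ -> intensity.
    """
    return sorted(t for t, by_organ in profiles.items()
                  if any(v > threshold for v in by_organ.values()))


# Hypothetical data: three transcripts lack RNA-Seq support, one is clearly expressed
intensities = {"a": 100.0, "b": 101.0, "c": 90.0, "d": 5000.0}
counts = {"a": 0.0, "b": 0.5, "c": 0.2, "d": 50.0}
cutoff = background_threshold(intensities, counts)  # 3 x median(100, 101, 90) = 300.0
profiles = {"t1": {"leaf": 250.0, "root": 310.0},
            "t2": {"leaf": 299.0, "root": 120.0}}
print(cutoff, expressed_in_any_organ(profiles, cutoff))  # 300.0 ['t1']
```

In the article the computed value (about 301.5) was rounded to a fixed cut-off of 300; in practice the threshold would be passed as that constant rather than recomputed per run.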
Based on these CV values in ASDM, genes were tentatively categorized into 3 groups, as shown in Fig. 3A: CVI (above 1.1), CVII (0.7–1.1), and CVIII (below 0.7). We designated these as an organ-preferential group (CVI, 5,410 transcripts), an organ-enriched group (CVII, 6,547), and a nonspecific group (CVIII, 20,761). CVIII was subdivided into -i (highest), -ii (medium) and -iii (lowest) according to expression level in the ASDM data. As the CV values of RNA-Seq were comparable to those of ASDM, the CV values from RNA-Seq were categorized similarly to those from ASDM in Fig. 3B and
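The grouping rule can be sketched assuming that CV is the standard deviation divided by the mean of a transcript's expression across the four organs. Whether the population or sample standard deviation was used, and how values exactly at 0.7 or 1.1 were assigned, are not stated here, so both choices below are assumptions; the expression profiles are hypothetical.

```python
from statistics import mean, pstdev


def cv_group(values, high=1.1, low=0.7):
    """Return (group, CV) for one transcript's expression across organs.

    values: intensities in leaf, root, P1cm and S21DAP (any order).
    CV = population SD / mean; the SD variant and tie handling at the cut-offs are assumptions.
    """
    m = mean(values)
    if m <= 0:
        raise ValueError("mean expression must be positive")
    cv = pstdev(values) / m
    if cv > high:
        return "CVI", cv
    if cv > low:
        return "CVII", cv
    return "CVIII", cv


for profile in ([1000, 0, 0, 0], [900, 300, 200, 100], [500, 450, 520, 480]):
    group, cv = cv_group(profile)
    print(profile, group, round(cv, 2))
```

With four organs and the population SD, the CV cannot exceed √3 ≈ 1.73, the value reached when only one organ expresses the transcript. The attainable range therefore depends on the number of samples, which is one reason CV cut-offs are not directly transferable between studies.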
When the cut-offs were set at ≥ 300 for ASDM intensities and ≥ 1 for RNA-Seq counts, the effective number of expressed genes was approximately 37,000 for both technologies. Transcripts detected by both technologies were then identified for each CV group (Fig. 3C): approximately 40–50% of transcripts were commonly detected in each of the CVI and CVII groups, and approximately 60% in CVIII. For each technology, the number of transcripts commonly detected in most CV groups was greater than the number detected exclusively by that technology.
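The per-group comparison reduces to set operations on transcript IDs. A minimal sketch, assuming each technology supplies, for every CV group, the set of transcripts passing its cut-off; the denominator of the "commonly detected" fraction is not stated in the text, so using the union of both sets is an assumption. IDs are hypothetical.

```python
def detection_overlap(asdm, rnaseq):
    """Per CV group, count transcripts detected by both technologies or by only one.

    asdm, rnaseq: dict group name -> set of transcript IDs passing each technology's cut-off.
    common_fraction uses the union of both sets as the denominator (an assumption).
    """
    summary = {}
    for group in sorted(set(asdm) | set(rnaseq)):
        a, r = asdm.get(group, set()), rnaseq.get(group, set())
        union = a | r
        summary[group] = {
            "common": len(a & r),
            "asdm_only": len(a - r),
            "rnaseq_only": len(r - a),
            "common_fraction": len(a & r) / len(union) if union else 0.0,
        }
    return summary


# Hypothetical transcript IDs for one group
result = detection_overlap({"CVI": {"t1", "t2", "t3"}}, {"CVI": {"t2", "t3", "t4"}})
print(result)
```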
As the genes of the CVI group for ASDM showed organ enrichment, those with a single transcript (4,172) and alternatively spliced transcripts (1,169) in this group were delineated by hierarchical clustering (Fig. 4). Highly expressed genes (618) in the leaf with a single transcript were enriched for a metabolic process represented by photosynthesis (
Highly expressed genes with alternatively spliced transcripts in the leaf clustered into 4-5 groups (
To examine biological term enrichment, GO analyses were performed for genes of the CVI, CVII, CVIII-i, and CVIII-ii groups from the ASDM dataset (Ashburner et al., 2000; Zeeberg et al., 2003). This comparison was performed for loci of whole genes and those with single (16,999) or alternatively spliced (5,844) transcripts (
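GoMiner reports enrichment using Fisher's exact test; for over-representation, the one-sided test is equivalent to the upper tail of the hypergeometric distribution. The sketch below computes that tail for one GO term and one gene group. It is not GoMiner's implementation and omits any multiple-testing correction. The counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided P(X >= k) under the hypergeometric distribution.

    N: background loci (e.g., all expressed loci)
    K: background loci annotated with the GO term
    n: loci in the query group (e.g., CVI loci)
    k: query loci annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical: 30 of 200 group loci carry a term that covers 600 of 10,000 background loci
print(enrichment_p(30, 200, 600, 10_000))
```

Only about 12 annotated loci would be expected by chance in this example, so the tail probability is small. For genome-scale inputs, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` returns the same tail probability without summing exact binomial coefficients.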
In the rice genome, there are 5,254 loci with two or more alternative transcripts (
We also assessed IAST for CV groups based on RNA-Seq analysis (Table 3). The IAST values for CVI, CVII, and CVIII were 0.09, 0.12 and 0.21, respectively, comparable to those based on ASDM. Within the CVI and CVII groups, the values were relatively higher in the leaf and lower in the root, P1cm, and S21DAP samples, but were somewhat lower than those based on ASDM. As observed for ASDM, the IAST values of the CVIII subgroups were higher than those of the other groups, and they were lower than that of the 100 constitutive genes (Consti100 in Table 2). These data confirm the observation in the ASDM analysis that the IAST of genes constitutively expressed across organs was higher than that of organ-enriched genes.
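IAST is a simple ratio, defined in the footnotes of Tables 2 and 3 as the fraction of loci in a group that have alternatively spliced transcripts. Recomputing it from the tabulated counts reproduces the group values quoted in the text, which is a quick consistency check on the tables. The counts are copied from the tables; the function name is ours.

```python
def iast(loci_as, loci_total):
    """Index of loci with alternatively spliced transcripts: loci_as_ts / loci_total."""
    return loci_as / loci_total


# (loci_as_ts, loci_total) per CV group, copied from Table 2 (ASDM) and Table 3 (RNA-Seq)
tables = {
    "ASDM": {"CVI": (690, 4862), "CVII": (1276, 5883), "CVIII": (4028, 16735)},
    "RNA-Seq": {"CVI": (461, 5161), "CVII": (892, 7495), "CVIII": (3099, 14477)},
}
for method, groups in tables.items():
    print(method, {g: round(iast(*c), 2) for g, c in groups.items()})
# ASDM {'CVI': 0.14, 'CVII': 0.22, 'CVIII': 0.24}
# RNA-Seq {'CVI': 0.09, 'CVII': 0.12, 'CVIII': 0.21}
```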
We tested organ preference among alternatively spliced transcripts of the CVI group enriched in each organ. The numbers of these transcript variants were 527, 179, 120 and 124, from 367, 137, 91 and 96 loci in the leaf, root, P1cm and S21DAP, respectively (Table 4). We compared CV groups and organ preference between a major transcript and its alternatively spliced transcripts. For example, two transcripts,
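Table 4 sorts the transcripts of these loci into three bins: CVI in the same organ, CVI in a different organ, and not CVI. The bookkeeping can be sketched as below, assuming each transcript has already been assigned a CV group and, for CVI, its preferred organ. The data structures, locus names, and transcript IDs are hypothetical.

```python
from collections import Counter


def tally_variants(locus_transcripts, assignment, organ):
    """Tally transcripts of alternatively spliced loci that have a CVI transcript in `organ`.

    locus_transcripts: dict locus -> list of transcript IDs (loci with >= 2 transcripts)
    assignment: dict transcript ID -> (group, organ), e.g. ("CVI", "Leaf") or ("CVIII", None);
                transcripts missing from it are treated as low-expressed (not CVI).
    """
    tally = Counter()
    for tss in locus_transcripts.values():
        if not any(assignment.get(t) == ("CVI", organ) for t in tss):
            continue
        tally["loci"] += 1
        for t in tss:
            group, t_organ = assignment.get(t, ("low", None))
            if group == "CVI" and t_organ == organ:
                tally["CVI_same_organ"] += 1
            elif group == "CVI":
                tally["CVI_other_organ"] += 1
            else:
                tally["not_CVI"] += 1
    return tally


# Hypothetical loci and group assignments
loci = {"L1": ["L1.1", "L1.2"], "L2": ["L2.1", "L2.2"], "L3": ["L3.1", "L3.2"]}
groups = {"L1.1": ("CVI", "Leaf"), "L1.2": ("CVIII", None),
          "L2.1": ("CVI", "Root"), "L2.2": ("CVI", "Root"),
          "L3.1": ("CVI", "Leaf"), "L3.2": ("CVI", "Leaf")}
print(tally_variants(loci, groups, "Leaf"))
```

In Table 4 the "CVI in a different organ" column is 0 for every organ, which is the observation discussed later in the article.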
We examined the extent of expression differences among spliced transcripts within a locus by comparing the transcripts with the highest and lowest expression values. To summarize expression variability, we calculated the log2 ratio between the transcripts with the highest and lowest expression in the ASDM dataset. The log ratio of the highest and lowest values is shown for all tested organs in
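The per-locus statistic is log2(highest / lowest) over the transcripts of that locus in one organ. Its mean and standard deviation across loci give summary values like those reported later in this article. A minimal sketch with hypothetical intensities; rejecting loci whose lowest intensity is zero is an assumption.

```python
import math
from statistics import mean, stdev


def log2_max_min_ratio(intensities):
    """log2(highest / lowest) intensity among the transcripts of one locus in one organ."""
    if len(intensities) < 2 or min(intensities) <= 0:
        raise ValueError("need at least two positive intensities")
    return math.log2(max(intensities) / min(intensities))


# Hypothetical loci with transcript intensities from one organ
loci = {"locusA": [1200.0, 300.0], "locusB": [800.0, 800.0, 100.0]}
ratios = [log2_max_min_ratio(v) for v in loci.values()]
print(ratios, mean(ratios), stdev(ratios))  # mean is 2.5
```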
Campbell et al. (2006) classified alternative splicing into 9 types based on the results of the Program to Assemble Spliced Alignments (PASA): alternate acceptor, alternate donor, alternate terminal exon, retained exon, skipped exon, initiation within an intron, termination within an intron, spliced intron, and retained intron. We categorized the alternative splicing events by type to assess how splice type relates to the variation in expression between transcripts with the highest and lowest expression. PASA was first applied to whole transcripts, detecting 5,028 alternative splices for 2,438 of 4,143 loci in the genome and 3,598 alternative splices for 2,415 loci in the microarray (
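PASA's own classification is more elaborate, but the logic behind several of the nine categories can be shown by comparing the intron chains of two isoforms. The sketch below handles only four event types, assumes plus-strand coordinates (on the minus strand, donor and acceptor swap), and is not PASA's algorithm. The exon coordinates are hypothetical.

```python
def introns(exons):
    """Introns between consecutive exons; 0-based, end-exclusive, plus strand."""
    ex = sorted(exons)
    return [(ex[i][1], ex[i + 1][0]) for i in range(len(ex) - 1)]


def classify_events(exons_a, exons_b):
    """Label each intron used by isoform A but not by isoform B (simplified, plus strand only).

    Returns (event, intron_start, intron_end) tuples. "retained_intron" means B retains
    the intron that A splices out; "skipped_exon" means A skips exon(s) that B includes.
    """
    ia, ib = introns(exons_a), introns(exons_b)
    events = []
    for s, e in ia:
        if (s, e) in ib:
            continue
        same_start = [k for k, (bs, _) in enumerate(ib) if bs == s]
        same_end = [k for k, (_, be) in enumerate(ib) if be == e]
        if any(i < j for i in same_start for j in same_end):
            label = "skipped_exon"        # A's intron spans several of B's introns
        elif same_start:
            label = "alternate_acceptor"  # same donor (5' end), different acceptor
        elif same_end:
            label = "alternate_donor"     # same acceptor (3' end), different donor
        elif any(bs <= s and e <= be for bs, be in exons_b):
            label = "retained_intron"
        else:
            label = "other"
        events.append((label, s, e))
    return events


b = [(0, 100), (150, 200), (300, 400)]                          # reference isoform
print(classify_events([(0, 100), (300, 400)], b))               # [('skipped_exon', 100, 300)]
print(classify_events([(0, 120), (150, 200), (300, 400)], b))   # [('alternate_donor', 120, 150)]
```

Because only A's unique introns are examined, the comparison is directional; calling it with the isoforms swapped reports the events seen from B's side.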
It has been reported that expression varies among alternatively spliced transcripts. Nonsense-mediated mRNA decay (NMD) has been postulated as a mechanism that degrades some of these transcripts (Lewis et al., 2003). When an alternative splice introduces a premature stop codon, the corresponding mRNA isoform is classified as an NMD candidate and is expected to be expressed at a lower level than the isoform with a longer CDS. If the spliced transcript escapes NMD, the result is a transcript with a shorter CDS. To examine how changes in CDS length caused by alternative splicing affect the degree of expression, alternatively spliced transcripts were sorted by expression level, and their CDS lengths were compared by organ and CV group. As shown in Table 5, longer-CDS transcripts appeared to be more highly expressed than shorter ones in all organs and CV groups. A paired t-test gave a p-value of 0.029, suggesting that these differences are significant.
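Lewis et al. (2003) flagged isoforms as NMD candidates using the "50-nt rule": a termination codon located more than 50 nt upstream of the final exon-exon junction. A minimal sketch of that rule on a spliced transcript follows. Measuring the distance from the last base of the stop codon is an assumed convention, and the exon lengths are hypothetical.

```python
def is_nmd_candidate(exon_lengths, stop_start, min_distance=50):
    """Apply the 50-nt rule to one spliced transcript.

    exon_lengths: exon lengths in 5'->3' order along the mRNA.
    stop_start: 0-based mRNA position of the first base of the stop codon.
    """
    if len(exon_lengths) < 2:
        return False  # single-exon transcripts have no exon-exon junction
    last_junction = sum(exon_lengths[:-1])  # mRNA position of the final junction
    stop_end = stop_start + 3               # position just after the stop codon
    return last_junction - stop_end > min_distance


exons = [200, 150, 300]               # final junction at mRNA position 350
print(is_nmd_candidate(exons, 100))   # True: stop codon ends 247 nt upstream
print(is_nmd_candidate(exons, 320))   # False: only 27 nt upstream
print(is_nmd_candidate(exons, 400))   # False: stop codon lies in the last exon
```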
Genome-wide gene expression profiling using microarray and RNA-Seq technologies has provided abundant information regarding the basic biological mechanisms underlying development, stress responses, and other processes (Gordon et al., 2014; Ma et al., 2005; Severin et al., 2010; Shinozaki and Yamaguchi-Shinozaki, 2000; Wakasa et al., 2014). In the current version of rice ASDM, 60-nt oligomers were used, and 40,953 of the 44,552 transcripts deposited in IRGSP 1.0 were included. We improved the feature probe design to assess alternative splicing. Because long oligomers can be designed to span the junction between two exons, the approach may have advantages over short oligomer-based microarrays. Several loci, such as
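A 60-nt probe placed across an exon-exon junction is expected to hybridize preferentially to isoforms that join those two exons. This is what allows a long-oligo array to discriminate, for example, an exon-skipping isoform from the full-length one. A minimal sketch of building such a probe from exon sequences; the even 30/30 split and the exon sequences are hypothetical, not the ASDM design rule.

```python
def junction_probe(exon_seqs, junction, length=60, upstream=None):
    """Probe spanning the junction between exon `junction` and exon `junction + 1`.

    exon_seqs: exon sequences of the spliced isoform in 5'->3' order.
    By default half the probe comes from each exon (a hypothetical split).
    """
    if upstream is None:
        upstream = length // 2
    if not 0 < upstream < length:
        raise ValueError("probe must include bases from both exons")
    downstream = length - upstream
    left, right = exon_seqs[junction], exon_seqs[junction + 1]
    if len(left) < upstream or len(right) < downstream:
        raise ValueError("exon too short; the probe would span more than one junction")
    return left[-upstream:] + right[:downstream]


exons = ["A" * 50, "C" * 40, "G" * 100]   # hypothetical isoform
probe = junction_probe(exons, 0)
print(probe.count("A"), probe.count("C"))  # 30 30
```

A 25-mer placed the same way leaves only about 12 nt on each side of the junction, so longer oligomers give more room to keep both halves specific.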
CV cut-offs for constitutive expression vary according to the number of organ samples. In this report, CV values of 0.7 and 0.8 were tentatively used for ASDM and RNA-Seq, respectively (Fig. 3). Wang et al. (2010) and our previous study (Chae et al., 2016) reported values of 0.1 and 0.3 using 39 tissues and 7 representative organs/tissues, respectively. For both technologies, the variation, and hence the cut-off for constitutive expression, increased as the number of samples decreased. Because values vary depending on the approach, the middle range of the CVII group was adopted as an intermediate for each technology. These demarcations showed that genes with alternatively spliced transcripts can be analyzed in a similar manner with both technologies. The effective number of expressed transcripts was approximately 33,000 for ASDM and RNA-Seq (Fig. 3C), and commonly detected genes ranged from 40 to 70%, more than the number of transcripts exclusive to each technology in each CV group. Among the organs examined, constitutively expressed genes were more numerous in the commonly detected group in both datasets. In particular, the results of GO analysis of the CV groups were quite similar between ASDM and RNA-Seq (
Expressed TFs with a single transcript in P1cm included MADS box TFs
Analysis of the transcriptome using ASDM might be useful for studying alternatively spliced transcripts, including those of TFs. Indeed, a TF with an alternatively spliced transcript in P1cm,
Alternative splicing of pre-mRNA occurs in eukaryotic organisms, some of which have developed high complexity through various processes. Many trans- and cis-acting factors have been identified, but a basic question remains regarding the pressure on alternative splicing in commonly expressed versus organ-preferential genes. To address this question, we examined how organ preference and the degree of expression influence the proportion of loci with alternatively spliced transcripts in the genome. This proportion was designated the IAST. The IAST value of expressed genes was 0.19, slightly higher than those of the genome (0.14) and ASDM (0.15). These values were 0.14, 0.22 and 0.24 for CVI, CVII, and CVIII, respectively (Table 2). When CVIII was divided according to expression level, IAST ranged from 0.31 to 0.34. IAST values were also higher in groups with greater constitutive expression, culminating in 0.43 for the 100 most constitutively expressed genes reported by Wang et al. (2010). These observations were confirmed by the RNA-Seq data (Table 3). Although the IAST values for CVI, CVII, and CVIII were 0.09, 0.12 and 0.21, respectively, slightly lower than those based on ASDM, the tendency toward higher values in constitutively expressed groups and the variations among organs were similar in these independent analyses.
To date, it remains unclear why constitutively expressed genes would have a higher IAST. Exon boundaries require splice-site dinucleotides (predominantly GT-AG, and less frequently GC-AG or AT-AC), and every base in the genome can be exposed to mutation (Burset et al., 2000; Lorkovic et al., 2005; Mount and Steitz, 1981). DNA damage events can occur at a high frequency in higher eukaryotes; a single mammalian cell can accumulate ~10,000 abasic (apurinic) lesions per day (Lindahl and Nyberg, 1972). For comparison, mutation rates in microbes with DNA-based chromosomes are close to 1/300 per genome per replication (Drake et al., 1998). A high level of transcription has been associated with elevated rates of spontaneous mutation and recombination in eukaryotic organisms (Datta and Jinks-Robertson, 1995). For example, the effect of transcription on the rate of spontaneous mutation in the yeast
According to our analysis, when one transcript of a locus confers organ preference, its alternatively spliced transcript(s) confer the same preference or show less organ specificity; the alternatively spliced transcripts do not exhibit additional organ distinctness or preference (Table 4). In other words, among CVI genes that show strong organ preference, none of the alternatively spliced forms belong to the CVI group of another organ. These findings suggest that two alternatively spliced transcripts of a locus may not be allowed to show different organ specificities within the CVI group. However, alternative splicing may be tolerated in genes for which it is less detrimental to organogenesis and development, consistent with the lower IAST in CVI compared with CVIII noted above. Alternatively spliced transcripts that escape an organism's surveillance may evolve additional functionality and become more abundant in other organs, resulting in CVII or CVIII classification. These aspects may place more constraint on pre-mRNA splicing of CVI genes, consistent with their lower IAST (Tables 2 and 3).
Numerous alternatively spliced transcripts have been implicated in flowering (Marquardt et al., 2014; Wang et al., 2014), circadian clock (Filichkin et al., 2010; Seo et al., 2011), and abiotic stress (Shang et al., 2017). In these analyses, alternatively spliced transcripts have a similar biological function. In the example of
Pre-mRNA splicing might help balance maximizing the function of a locus against acquiring additional, distinct functions. For alternatively spliced mRNA variants, functional differences cannot easily be distinguished, which may risk a waste of resources. Differential regulation through components of the spliceosome might be possible, which would provide additional complexity in gene regulation. In this regard, pre-mRNA splicing might represent a strategy that contrasts with that of gene families, whose members achieve direct functional distinctness by using unique promoters to respond to different signals (Freeling, 2009). As an example of a gene family, two transcripts
Our data also suggest that transcripts within a locus show reciprocal, complementary expression regardless of organ specificity (Liu et al., 2011). The log2 ratios between the highest and lowest expressed transcripts were 2.5–2.7 (sd = 1.9) (
There may be additional mechanisms for maintaining alternative splicing. Reciprocal or complementary expression patterns have been reported, whereby a copy among gene duplicates is expressed in a certain organ or tissue type and another copy is expressed to a different extent in other organs or tissue types (Liu et al., 2011). Approximately 30% of whole-genome duplicate pairs and 38% of tandem duplicate pairs show reciprocal expression patterns in
In this study, rice ASDM and RNA-Seq were applied to differentiate alternatively spliced transcripts among representative rice organs, including the leaf, root, panicle at 1 cm, and young seed. Transcripts were classified according to organ enrichment, and the results were interpreted collectively in terms of evolutionary pressure. The proportion of loci with multiple transcripts due to alternative splicing was higher among constitutively expressed genes than among organ-preferential loci. Alternative splicing appears to be best tolerated when expression of the alternative transcript is avoided under inappropriate circumstances. Within loci, transcripts with a longer CDS tended to be more highly expressed than those with a shorter CDS. These genome-wide technologies might be useful for studying transcriptomes and the biological significance of pre-mRNA splicing.
Table 1. Summary of loci in IRGSP v1.0 and design features of the rice ASDM.
Chr | Loci | Alternative loci a) | Alternative transcripts b) | Total transcripts c) | Designed loci d) | Designed alternative loci e) | Designed alternative transcripts f) | Designed total transcripts g) |
---|---|---|---|---|---|---|---|---|
Os01 | 5,273 | 778 | 1,765 | 6,260 | 5,070 | 522 | 1,093 | 5,641 |
Os02 | 4,216 | 642 | 1,464 | 5,038 | 4,056 | 423 | 908 | 4,541 |
Os03 | 4,550 | 761 | 1,735 | 5,524 | 4,407 | 478 | 1,010 | 4,939 |
Os04 | 3,359 | 473 | 1,083 | 3,969 | 3,200 | 347 | 729 | 3,582 |
Os05 | 3,027 | 437 | 985 | 3,575 | 2,914 | 284 | 610 | 3,240 |
Os06 | 3,159 | 406 | 919 | 3,672 | 3,042 | 278 | 592 | 3,356 |
Os07 | 2,870 | 362 | 829 | 3,337 | 2,753 | 260 | 547 | 3,040 |
Os08 | 2,588 | 350 | 799 | 3,037 | 2,456 | 241 | 514 | 2,729 |
Os09 | 2,111 | 285 | 641 | 2,467 | 2,012 | 205 | 434 | 2,241 |
Os10 | 2,115 | 260 | 582 | 2,437 | 1,981 | 174 | 369 | 2,176 |
Os11 | 2,424 | 247 | 543 | 2,720 | 2,240 | 161 | 340 | 2,419 |
Os12 | 2,176 | 253 | 593 | 2,516 | 2,045 | 166 | 356 | 2,235 |
Summary | 37,868 | 5,254 | 11,938 | 44,552 | 36,176 | 3,539 | 7,502 | 40,139 |
RAP-DB annotation based on the genome IRGSP_1_0_representative_2014_06_25 was used.
b)The number of transcripts due to alternative splicing at these loci
c)The total number of transcripts, both single and alternatively spliced transcript(s)
d)The number of loci included in this ASDM
e)The number of loci with alternative splicing included in this ASDM
f)The number of transcripts with alternative splicing included in this ASDM
g)The number of transcripts with single or alternative splicing included in this ASDM
Table 2. The proportion of loci with alternatively spliced transcripts among organ-preferential transcripts based on ASDM analysis.
Group | ts_total a) | loci_total b) | loci_single_ts c) | loci_as_ts d) | loci_single_ts/loci_total | loci_as_ts/loci_total e) | ts_variant_from_as_loci |
---|---|---|---|---|---|---|---|
Genome | 44,553 | 37,869 | 32,615 | 5,254 | 0.86 | 0.14 | 11,938 |
ASDM | 38,399 | 36,141 | 30,897 | 5,244 | 0.85 | 0.15 | 7,502 |
CV total | 32,718 | 26,573 | 21,486 | 5,087 | 0.81 | 0.19 | |
CVI | 5,410 | 4,862 | 4,172 | 690 | 0.86 | 0.14 | |
CVII | 6,547 | 5,883 | 4,607 | 1,276 | 0.78 | 0.22 | |
CVIII | 20,761 | 16,735 | 12,707 | 4,028 | 0.76 | 0.24 | |
CVI_Leaf | 1,755 | 1,397 | 1,030 | 367 | 0.74 | 0.26 | |
CVI_Root | 1,716 | 1,638 | 1,501 | 137 | 0.92 | 0.08 | |
CVI_P1cm | 771 | 720 | 629 | 91 | 0.87 | 0.13 | |
CVI_S21DAP | 1,168 | 1,108 | 1,012 | 96 | 0.91 | 0.09 | |
CVII_Leaf | 1,636 | 1,433 | 1,029 | 404 | 0.72 | 0.28 | |
CVII_Root | 1,713 | 1,534 | 1,203 | 331 | 0.78 | 0.22 | |
CVII_P1cm | 1,980 | 1,821 | 1,488 | 333 | 0.82 | 0.18 | |
CVII_S21DAP | 1,218 | 1,127 | 887 | 240 | 0.79 | 0.21 | |
CVIII_i | 5,293 | 4,786 | 3,154 | 1,632 | 0.66 | 0.34 | |
CVIII_ii | 7,937 | 7,302 | 5,056 | 2,246 | 0.69 | 0.31 | |
CVIII_iii | 7,531 | 6,848 | 4,497 | 2,351 | 0.66 | 0.34 | |
Consti100 f) | 100 | 100 | 57 | 43 | 0.57 | 0.43 | |
These ratios are designated as the index of loci with alternatively spliced transcripts in the group (IAST). These values are 0.14, 0.22 and 0.24 in CVI, CVII, and CVIII, respectively.
b)Total number of genes
c)Number of genes that produce a single transcript
d)Number of genes that have alternatively spliced transcripts
e)IAST, the fraction of loci with alternatively spliced transcripts among all loci in the group
f)List of 100 constitutively expressed genes in Supplementary Table S8, reported by Wang et al. (2010), was used.
Table 3. The proportion of loci with alternatively spliced transcripts among organ-preferential transcripts based on RNA-Seq analysis.
Group | ts_total a) | loci_total b) | loci_single_ts c) | loci_as_ts d) | loci_single_ts/loci_total | loci_as_ts/loci_total e) |
---|---|---|---|---|---|---|
CV total | 32,666 | 26,759 | 22,071 | 4,688 | 0.82 | 0.18 |
CVI | 5,721 | 5,161 | 4,700 | 461 | 0.91 | 0.09 |
CVII | 8,574 | 7,495 | 6,603 | 892 | 0.88 | 0.12 |
CVIII | 18,371 | 14,477 | 11,378 | 3,099 | 0.79 | 0.21 |
CVI_Leaf | 1,938 | 1,570 | 1,280 | 290 | 0.82 | 0.18 |
CVI_Root | 1,648 | 1,565 | 1,493 | 72 | 0.95 | 0.05 |
CVI_P1cm | 706 | 668 | 631 | 37 | 0.94 | 0.06 |
CVI_S21DAP | 1,429 | 1,362 | 1,301 | 61 | 0.96 | 0.04 |
CVII_Leaf | 2,542 | 2,143 | 1,815 | 328 | 0.85 | 0.15 |
CVII_Root | 2,127 | 1,850 | 1,616 | 234 | 0.87 | 0.13 |
CVII_P1cm | 2,197 | 2,014 | 1,857 | 157 | 0.92 | 0.08 |
CVII_S21DAP | 1,708 | 1,531 | 1,389 | 142 | 0.91 | 0.09 |
CVIII_i | 3,467 | 2,678 | 2,048 | 630 | 0.76 | 0.24 |
CVIII_ii | 5,858 | 4,746 | 3,805 | 941 | 0.80 | 0.20 |
CVIII_iii | 9,046 | 7,760 | 6,675 | 1,085 | 0.86 | 0.14 |
These ratios are designated as the index of loci with alternatively spliced transcripts in the group (IAST) as indicated in the ASDM dataset. These values are 0.09, 0.12 and 0.21 in CVI, CVII, and CVIII, respectively.
b)Total number of genes
c)Number of genes that produce a single transcript
d)Number of genes that have multiple (alternatively spliced) transcripts
e)IAST, the fraction of loci with alternatively spliced transcripts among all loci in the group.
Table 4. Summary of CVI transcripts from loci that have alternatively spliced transcripts.
CVI_organ | as_tsa) | loci_as_tsb) | as_ts_CVI_organc) | loci_CVI_same_organd) | as_ts_CVI_diff_organe) | as_ts_Not_CVIf) |
---|---|---|---|---|---|---|
CVI_Leaf | 527 | 367 | 305 | 147 | 0 | 222 |
CVI_Root | 179 | 137 | 81 | 39 | 0 | 98 |
CVI_P1cm | 120 | 91 | 57 | 28 | 0 | 63 |
CVI_S21DAP | 124 | 96 | 54 | 26 | 0 | 70 |
a)The number of transcript variants in each CVI organ
b)Number of loci of these transcripts
c)Number of CVI transcript variants that are grouped in the same organ
d)Number of loci of these organs
e)Number of CVI alternatively spliced transcript variants that are grouped in different CVI organ(s)
f)Number of transcripts that have the alternatively spliced transcript in CVII, CVIII or a group expressed in low amounts.
Table 5. Number of genes with a longer CDS among alternatively spliced transcripts with different degrees of expression.
Group | Highest intensity a) | Lowest intensity b) |
---|---|---|
CVI_Leaf | 63 | 54 |
CVI_Root | 23 | 10 |
CVI_P1cm | 16 | 13 |
CVI_S21DAP | 21 | 13 |
CVII_Leaf | 72 | 41 |
CVII_Root | 76 | 35 |
CVII_P1cm | 63 | 40 |
CVII_S21DAP | 48 | 30 |
CVIII_i | 227 | 172 |
CVIII_ii | 408 | 250 |
CVIII_iii | 492 | 287 |
Total | 1,509 | 945 |
Genes with alternatively spliced transcripts were collected, their transcripts were sorted by intensity, and the transcripts with the highest and lowest intensity were compared to determine which has the longer CDS. A paired t-test between the highest and lowest expressed transcripts within a locus with a longer CDS showed a p-value of 0.029, suggesting that the differences are significant.
b)The number of transcripts with lowest intensity and a longer CDS
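The paired t-test reported in the note to Table 5 can be re-run on the eleven group rows. The sketch assumes the test paired the two columns row by row (excluding the Total row). It recomputes only the t statistic. With 10 degrees of freedom, the statistic falls between the two-sided 5% (2.228) and 2% (2.764) critical values, consistent with the reported p = 0.029 for a two-sided test. `scipy.stats.ttest_rel(highest, lowest)` would return the p-value directly.

```python
import math
from statistics import mean, stdev

# Table 5 rows (CVI_Leaf ... CVIII_iii): loci whose highest- or lowest-expressed transcript has the longer CDS
highest = [63, 23, 16, 21, 72, 76, 63, 48, 227, 408, 492]
lowest = [54, 10, 13, 13, 41, 35, 40, 30, 172, 250, 287]

diffs = [h - l for h, l in zip(highest, lowest)]
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))  # paired t statistic
df = len(diffs) - 1
print(sum(highest), sum(lowest))  # 1509 945, matching the Total row
print(round(t_stat, 2), df)       # 2.54 10
```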