Shan Yang, Petr Cápal, Jaroslav Doležel, Xueting Li, Wang Qian, Zhiqiang Wang, Kai Zeng, Peiting Li, Hongkai Zhou, Rui Xia, Muqing Zhang*, Zuhu Deng*
a National Engineering Research Center for Sugarcane & State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources,Fujian Agriculture and Forestry University,Fuzhou 350002,Fujian,China
b State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources & Guangxi Key Laboratory of Sugarcane Biology,Guangxi University,Nanning 530004,Guangxi,China
c Institute of Experimental Botany of the Czech Academy of Sciences, Center of the Region Haná for Biotechnological and Agricultural Research, Olomouc CZ-77900, Czech Republic
d College of Coastal Agricultural Sciences,Guangdong Ocean University,Zhanjiang 524088,Guangdong,China
e State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources,South China Agricultural University,Guangzhou 510642,Guangdong,China
Keywords: Flow cytometry; GISHIS; Chromosome sorting; Genome analysis; Sugarcane
ABSTRACT Erianthus arundinaceus is a wild relative of sugarcane (Saccharum officinarum L.) with many desirable agronomic traits for sugarcane genetic improvement. However, limited knowledge of the complex genome of hexaploid E. arundinaceus has impeded the development of the required molecular tools. Dissecting complex genomes into single chromosomes can simplify analyses. Here we report the flow-cytometric sorting of a single chromosome of E. arundinaceus from a Saccharum-Erianthus introgression line. A novel approach called genomic in situ hybridization in suspension (GISHIS) was used to discriminate the alien chromosome from sugarcane chromosomes of the same size. A total of 218,000 copies of E. arundinaceus chromosome 1 (EaC1) were sorted to >97% purity, and the amplified DNA was sequenced using Illumina and PacBio technologies. The resulting assembly comprised 70.93 Mb of contig sequence with an N50 of 19.62 kb. A total of 56.69 Mb of repeat sequences were predicted, accounting for 79.1% of the chromosome, together with 2646 genes with a total length of 1.84 Mb, representing 2.59% of the chromosome. Of these genes, 1877 (70.9%) were functionally annotated. Phylogenetic analysis using the chromosome 1 sequence revealed that E. arundinaceus was distantly related to Oryza sativa and Zea mays, more closely related to Sorghum bicolor, and most closely related to S. spontaneum and Saccharum spp. hybrids. This study provides the first insights into the characteristics of EaC1, and the results will provide tools to support molecular improvement and alien introgression breeding of sugarcane.
Erianthus arundinaceus Retz.Jeswiet is a perennial grass native to southern China and comprises diploid,tetraploid,and hexaploid cytotypes with the basic chromosome number x=10[1].E.arundinaceus is a wild relative of sugarcane(Saccharum officinarum L.),which produces over 85% of the global sugar supply and 40% of the bioethanol[2].Modern sugarcane cultivars,highly heterozygous poly-aneuploids with somatic chromosome number 2n=100-130,originated from a few interspecific crosses between octoploid S.officinarum(2n=8x=80),and wild relatives(2n=4x=32 to 16x=128).S.officinarum with high sugar content was domesticated about 8000 years ago,whereas wild S.spontaneum of varying ploidies contributes to disease resistance,vigor,and adaptability.Nobilization breeding gave rise to modern cultivars with complex genomes comprising a majority of chromosomes from S.officinarum,10%-20% chromosomes from its wild relatives,and about 10% recombinant chromosomes[3,4].Thus,modern sugarcane cultivars are vegetatively propagated and have a narrow genetic base.
A promising approach to introducing new genes and alleles into sugarcane is intergeneric crossing with wild relatives from the so-called “Saccharum complex”, which comprises four genera: Erianthus, Miscanthus, Narenga, and Sclerostachya [5]. Of these, E. arundinaceus is the most attractive gene donor owing to its high biomass productivity, superior ratooning ability, and tolerance to biotic and abiotic stresses [6]. However, attempts to transfer favorable genes from E. arundinaceus have been hampered by cross incompatibilities, low seed setting, and irregular chromosome transmission [7,8]. Thus, F1 hybrids between S. officinarum and E. arundinaceus have 68 to 69 chromosomes rather than the expected 70, with 40 S. officinarum and 28-29 E. arundinaceus chromosomes [9,10]. Similar chromosome transmission irregularities were observed in hybrids between Saccharum spp. and E. arundinaceus [11].
Although the absence of some sugarcane chromosomes in hybrids could compromise the performance of newly developed lines, the loss of E. arundinaceus chromosomes should not be problematic, as the aim is to introduce only small regions of the E. arundinaceus genome. Elimination of unwanted E. arundinaceus chromosomes can be achieved by backcrossing F1 hybrids to Saccharum spp., as demonstrated by Wu et al. [10] and Huang et al. [12]. The resulting plants in the BC4 progeny carried only 1-6 E. arundinaceus chromosomes [13]. Intergeneric chromosome translocations were detected in the backcross progenies [12,13], indicating the occurrence of recombination events during meiosis that should facilitate introgression of small genome regions from E. arundinaceus. The appearance and agronomic traits of BC4 progenies were similar to those of commercial sugarcane cultivars [14], confirming the potential of this strategy for sugarcane breeding and genetic improvement.
The development of chromosome introgression lines requires methods for identifying alien chromosomes and their segments during the breeding process. Genomic in situ hybridization (GISH) has been used to discriminate chromosomes of Saccharum spp. from those of E. arundinaceus and to detect large intergeneric chromosome translocations [11,12,15]. However, GISH is slow and laborious, and higher-throughput methods are needed to screen larger populations. PCR with primers for the 5S rDNA spacer and microsatellite loci proved suitable for identifying S. officinarum × E. arundinaceus F1 hybrids and for detecting E. arundinaceus chromatin in backcross (BC1) progeny derived from crosses between selected F1 clones and Saccharum spp. [8,16]. However, because only a small number of genome loci were amplified, the presence of only certain chromosomes or chromosome regions of E. arundinaceus could be verified. More single-copy markers are needed, but their development has been slowed by the poor availability of genomic information for Saccharum spp. and E. arundinaceus.
Significant efforts have been made to sequence the sugarcane genome[17].It is challenging to assemble and annotate the large and complex genome[11],comprising a mixture of homo(eo)logous chromosomes in different numbers[18].Alignment of sugarcane genomic BAC library clones to the sorghum genome revealed 4660 clones corresponding to a‘‘mosaic”monoploid with minimum tiling path[18].The clones were sequenced,assembled,and annotated to yield a 382 Mb sequence with 25,316 predicted protein-coding genes.Soon afterward,a whole-genome shotgun approach was employed to characterize the sugarcane gene space[19].After assembly of a 4.26-Gb sequence,373,869 putative genes and promoter regions were predicted.However,to date,no reference genome of modern sugarcane has been published.
Genome complexity may also explain why a genome sequence is unavailable for the main parent, octoploid S. officinarum (2n = 8x = 80), which has an average genome size of 7.88 Gb [20]. Autopolyploid S. spontaneum is the second main progenitor of modern sugarcane cultivars. A haploid plant, AP85-441, derived from the octoploid cytotype SES208 (2n = 8x = 64) has been sequenced and assembled into chromosomes [21]. The haploid had 2n = 4x = 32 and a genome size of 3.36 Gb [20]. A total of 35,156 BAC clones were sequenced using Illumina technology, and genomic DNA was sequenced using both PacBio and Illumina technologies. The resulting assembly of 32 pseudochromosomes comprises eight homologous chromosome groups of four chromosomes each, anchoring 2.9 Gb of the genome and defining 35,525 genes with alleles. Nascimento et al. [22] identified more genes (39,234) using a reference-guided transcript analysis of six different tissues.
Despite the high potential of E.arundinaceus for alien introgression breeding of sugarcane,knowledge of its genome remains poor.The only genome sequence currently available was produced recently by Zeng et al.[23].A genome sequence of hexaploid line HN-92-77(2n=6x=60)used in sugarcane breeding was surveyed,whose genome size was estimated by k-mer analysis to be 3.23 Gb,which is less than the 3.67 Gb[11]and 3.57 Gb[24]estimated using flow cytometry.The genome was sequenced to~52-fold coverage by Illumina HiSeq 2100 platform(Illumina,Inc.,San Diego,CA,USA)and assembled into 3156 Mb in 15,238,738 scaffolds with an N50 length of 216 bp[23].A total of 36,616 microsatellite motifs were screened from the scaffolds and used for identifying E.arundinaceus chromatin introgressed into sugarcane.
Flow cytometric sorting can simplify complex genomes by dissecting them into single chromosomes [25]. This approach has been beneficial for mapping and cloning genes from alien introgressed chromosomes or their segments [26-28]. Flow cytometry requires samples in the form of suspensions of intact chromosomes prepared from plant materials enriched for mitotic metaphase cells [29,30]. Metcalfe et al. [31] and Yang et al. [32] demonstrated the feasibility of inducing a high degree of mitotic synchrony in sugarcane root tip cells, allowing the isolation of large quantities of chromosomes. Five groups of chromosomes were identified by flow cytometric analysis based on the relative DNA content of sugarcane chromosomes, with each group containing one or more homo(eo)logs of specific chromosomes [31]. Each group represented an ancestral subgenome or, more likely, subgenomes that have undergone full-genome duplication. The study by Metcalfe et al. [31] indicated that the sugarcane genome could be dissected into small parts but not into single chromosomes. A similar situation has been observed for other species and was overcome by sorting single chromosome copies [33,34] or by fluorescently labeling DNA repeats on chromosomes prior to flow cytometry using fluorescence in situ hybridization in suspension [35,36].
Despite its potential for genome analysis, no attempts have been made to flow-sort E. arundinaceus chromosomes. This study aimed to develop a novel approach, based on GISHIS, for discriminating the alien chromosome in a Saccharum-Erianthus introgression line carrying a single chromosome from E. arundinaceus. The single E. arundinaceus chromosome was flow-sorted, purified, sequenced, and assembled in order to identify and annotate its genes and establish its synteny with related species.
The clone 1679-33 used in the study is the BC5 progeny of a cross between Saccharum spp. and Erianthus arundinaceus. This clone contains 114 Saccharum spp. chromosomes and one E. arundinaceus chromosome and has no translocations between E. arundinaceus and Saccharum chromosomes. The maternal and paternal parents of the 1679-33 clone were LC03-1137 and YCE07-71, respectively, with the latter containing five E. arundinaceus chromosomes. The 1679-33 clone was planted in the sugarcane breeding field of Fujian Agriculture and Forestry University. Stems of adult plants were cut into single-bud segments, cleaned and soaked in 0.5% carbendazim solution for 24 h, placed in a plastic tray, covered with wet perlite, and incubated at 25 °C in the dark. Cell-cycle synchronization and accumulation of metaphases were performed following Doležel et al. [37]. When the roots were about 1.5 cm long, the segments were washed in ddH2O, transferred to a plastic tray filled with 150 mL 0.1× Hoagland solution (0.1× HS) containing 2 mmol L⁻¹ hydroxyurea (HU), and incubated at 25 °C for 18 h in the dark. After the recovery treatment, the roots were immersed in 2.5 μmol L⁻¹ amiprophos-methyl (APM) solution and incubated for 3 h at 25 °C, then thoroughly rinsed in ddH2O, transferred to HU-free 0.1× HS, and incubated at 25 °C for 5 h with aeration.
Chromosome suspensions were prepared following Doležel et al. [37] with modifications. Synchronized root tips were cut off and rinsed in ddH2O, fixed in 2% (v/v) formaldehyde (10 mmol L⁻¹ Tris, 10 mmol L⁻¹ Na2EDTA, 100 mmol L⁻¹ NaCl, 0.1% Triton X-100, 2% formaldehyde stock solution (37%, v/v), pH 7.5) in Tris buffer (10 mmol L⁻¹ Tris, 10 mmol L⁻¹ Na2EDTA, 100 mmol L⁻¹ NaCl, pH 9.0) for 20 min at 4 °C, and then washed three times with Tris buffer for 5 min each at 4 °C. The terminal 1-1.5 mm of 30 root tips was cut off with a sterile scalpel, collected in 0.5 mL LB01 buffer (15 mmol L⁻¹ Tris, 2 mmol L⁻¹ Na2EDTA·2H2O, 0.5 mmol L⁻¹ spermine·4HCl, 80 mmol L⁻¹ KCl, 20 mmol L⁻¹ NaCl, 15 mmol L⁻¹ HOCH2CH2SH, 0.1% (v/v) Triton X-100, pH 7.5) [38], and homogenized with a Polytron PT1300 homogenizer (Kinematica AG, Litau, Switzerland) at 18,000 r min⁻¹ for 15 s to release chromosomes. The chromosome suspension was filtered through a 50 μm nylon mesh and stored at 4 °C. The quality of the suspensions was checked microscopically after DAPI staining.
To prepare the probe, 1 μg E. arundinaceus gDNA was added to 6 μL dNTP mix with fluorescein-12-dUTP (FITC, Thermo Fisher Scientific, Waltham, MA, USA), 4 μL nick translation mix, and ddH2O in a total volume of 20 μL, which was then incubated at 15 °C for 5 h. The correct probe size (<800 bp) was confirmed by gel electrophoresis. The hybridization solution contained 50 μL formamide, 10 μL 20× saline sodium citrate (SSC), and 20 μL E. arundinaceus gDNA probe in a total volume of 100 μL. The hybridization solution was denatured at 99 °C for 5 min and then cooled on ice. Separately, 200 μL formamide was added to 500 μL of filtered chromosome suspension and mixed thoroughly. The samples were denatured at 99 °C for 5 min and immediately placed in ice-water for 10 min. Then, 100 μL denatured hybridization solution was added to the sample tube, which was incubated for 5 h at 37 °C in a HulaMixer sample mixer (Thermo Fisher Scientific) with continuous rotation and shaking. After hybridization, the samples were centrifuged at 5000 r min⁻¹ for 8 min, and the pellet was resuspended in 500 μL LB01 buffer. The effectiveness of chromosome labeling was assessed by fluorescence microscopy. Chromosome suspensions were stored at 4 °C prior to flow sorting.
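The final concentrations implied by these volumes can be checked with simple dilution arithmetic. The sketch below is illustrative only: it assumes that volumes are additive, that formamide is 100% stock, and that the 200 μL formamide, 500 μL chromosome suspension, and 100 μL hybridization solution make up the whole reaction. The function name is hypothetical and not part of the protocol.

```python
def final_concentration(stock_conc, stock_vol_ul, total_vol_ul):
    """Dilution rule C1*V1 = C2*V2, solved for C2."""
    return stock_conc * stock_vol_ul / total_vol_ul

# Hybridization solution: 50 uL formamide + 10 uL 20x SSC + 20 uL probe in 100 uL.
hyb_total_ul = 100.0
formamide_pct_hyb = final_concentration(100.0, 50.0, hyb_total_ul)  # % (v/v)
ssc_x_hyb = final_concentration(20.0, 10.0, hyb_total_ul)           # x SSC

# Complete reaction (assumption: volumes simply add up).
reaction_total_ul = 200.0 + 500.0 + 100.0
formamide_ul = 200.0 + 50.0  # added to the suspension + carried in the hyb mix
formamide_pct_rxn = 100.0 * formamide_ul / reaction_total_ul
ssc_x_rxn = final_concentration(20.0, 10.0, reaction_total_ul)

print(formamide_pct_hyb, ssc_x_hyb, formamide_pct_rxn, ssc_x_rxn)
```

Under these assumptions the hybridization solution is 50% formamide in 2× SSC, and the complete reaction is diluted further by the chromosome suspension.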
Chromosome suspensions were filtered through a 20 μm nylon mesh and stained with DAPI (2 μg mL⁻¹). Chromosomal populations were gated on FSC-A vs. DAPI-A parameters, with subsequent dependent gates representing E. arundinaceus chromosomes drawn in a DAPI-A vs. FITC-A (log) scatter plot. The samples were sorted in a FACSAria II SORP flow cytometer (BD Biosciences, San Jose, CA, USA) at a rate of 3000-4000 events per second, with E. arundinaceus chromosomes sorted into ddH2O. The purity of the sorted chromosomes was evaluated microscopically on 1000 to 2000 chromosomes sorted onto a slide (three replicate slides, six fields of view per slide). E. arundinaceus chromosomes were identified by their FITC fluorescence, whereas Saccharum spp. chromosomes carried only the DAPI signal.
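The two-step (parent, then dependent) gating logic can be sketched as follows. This is a simplified illustration, not the instrument software: real gates are drawn as regions on the cytometer, whereas the sketch uses rectangular gates, and all thresholds and event values are hypothetical.

```python
import math

def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high

def select_events(events, parent_gate, child_gate):
    """Return events passing the FSC-A/DAPI-A parent gate and then the
    dependent DAPI-A/log10(FITC-A) gate."""
    selected = []
    for ev in events:
        # Parent gate: chromosome-sized particles on FSC-A vs. DAPI-A.
        if not (in_range(ev["FSC-A"], parent_gate["FSC-A"])
                and in_range(ev["DAPI-A"], parent_gate["DAPI-A"])):
            continue
        # Dependent gate: FITC-A is evaluated on a log scale.
        log_fitc = math.log10(ev["FITC-A"]) if ev["FITC-A"] > 0 else float("-inf")
        if (in_range(ev["DAPI-A"], child_gate["DAPI-A"])
                and in_range(log_fitc, child_gate["log10 FITC-A"])):
            selected.append(ev)
    return selected

# Hypothetical gates and events, for illustration only.
parent = {"FSC-A": (100, 1000), "DAPI-A": (100, 1000)}
child = {"DAPI-A": (200, 400), "log10 FITC-A": (3.0, 5.0)}
events = [
    {"FSC-A": 500, "DAPI-A": 300, "FITC-A": 5000},   # FITC-labeled chromosome
    {"FSC-A": 500, "DAPI-A": 300, "FITC-A": 50},     # unlabeled chromosome
    {"FSC-A": 5000, "DAPI-A": 300, "FITC-A": 5000},  # outside the parent gate
]
selected = select_events(events, parent, child)
print(len(selected))
```

Only events that pass the parent gate are tested against the dependent gate, which mirrors how a dependent gate inherits its population from the gate above it.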
The sorted E. arundinaceus chromosomes were treated with proteinase, and their DNA was purified and amplified using a GenomiPhi V2 DNA Amplification Kit (GE Healthcare, Chalfont St. Giles, UK) following Šimková et al. [39]. Two sequencing libraries were constructed from the amplified DNA. The DNA was randomly sheared into 350-bp fragments by sonication for the Illumina library and into 11-kb fragments using a g-TUBE (https://www.pacb.com/wp-content/uploads/2015/09/Covaris-g-TUBE-protocol.pdf) for the PacBio library. A short-read Illumina library was prepared with a NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA), and a long-read PacBio sequencing library was prepared with PacBio's SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA, USA). The short-read library was sequenced on an Illumina NovaSeq 6000 instrument (Illumina, Inc., San Diego, CA, USA) in 150-bp paired-end mode, and the long-read library was sequenced on a PacBio Sequel II instrument (Pacific Biosciences) in Circular Consensus Sequencing (CCS) mode. Short-read sequences of the single chromosome from E. arundinaceus were checked with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), trimmed with TrimGalore (https://github.com/FelixKrueger/TrimGalore), and aligned to the Sorghum bicolor genome using bwa [40] before conversion to BAM format using Samtools [41]. Coverage depth was characterized using bedtools [42] and plotted using a Python script. The PacBio CCS reads were quality-filtered using SMRT Link 8.0 (https://www.pacb.com/wp-content/uploads/SMRT_Link_Troubleshooting_Guide_v80.pdf) before being subjected to single-chromosome assembly with hifiasm (https://github.com/chhylp123/hifiasm).
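The plotting script itself is not described in the text. A minimal sketch of the summary step that would precede plotting is given below, assuming per-base depth in the three-column layout (chromosome, position, depth) written by `bedtools genomecov -d`. The function name and the toy input are hypothetical and are not data from this study.

```python
from collections import defaultdict

def mean_depth_by_chrom(lines):
    """Average per-base depth for each reference chromosome.

    Expects three tab-separated columns per line: chromosome,
    1-based position, depth.
    """
    total = defaultdict(int)
    count = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, _pos, depth = line.split("\t")
        total[chrom] += int(depth)
        count[chrom] += 1
    return {chrom: total[chrom] / count[chrom] for chrom in total}

# Toy input for illustration only.
example = [
    "Chr01\t1\t12",
    "Chr01\t2\t18",
    "Chr02\t1\t0",
    "Chr02\t2\t2",
]
depths = mean_depth_by_chrom(example)
best = max(depths, key=depths.get)
print(depths, best)
```

The reference chromosome with the highest mean depth is the candidate homolog of the sorted chromosome; in practice the depth would usually be averaged in windows along each chromosome before plotting.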
2.6.1. DNA repeat prediction
Two methods were used to predict DNA repeats. (i) Homology-based prediction: RepeatMasker [44] and RepeatProteinMask [44] were used with the RepBase repeat database [43] to predict sequences similar to known repeats. (ii) Ab initio prediction: RepeatModeler [45] was used to build a repeat sequence library, RepeatMasker was then used with this library to make the prediction, and TRF [46] was used to find tandem repeats in the genome.
2.6.2. Gene prediction
Three methods were used for gene prediction. (i) Ab initio prediction: Augustus 2.4 [47] and SNAP [48] were used to make gene predictions. (ii) Homology-based prediction: BLAST [49] and GeneWise [50] were used for gene prediction based on homologous species. (iii) Transcript-based prediction: PASA 2.0.2 [51] was used to predict genes based on transcriptome unigene sequences. EVM 1.1.1 [52] was then used to integrate the predictions and remove redundancy.
2.6.3. Non-coding RNA prediction
Genome-wide alignment against Rfam [53] was performed with BLAST to identify microRNA, rRNA, snRNA, and snoRNA. tRNAscan-SE [54] was used to identify tRNA.
2.6.4. Gene function annotation
For gene function annotation, the predicted genes were aligned with BLAST 2.2.31 [49] against protein and gene databases including SwissProt [55], NT (https://www.ncbi.nlm.nih.gov/nucleotide/), NR [56], PFAM [57], eggNOG [58], KOG [59], GO [60], and KEGG [61], with the e-value threshold set to 1e-5.
All 2646 gene sequences predicted from the scaffolds of E. arundinaceus chromosome 1 (EaC1) were compared by BLAST against the coding sequences (CDS) of Sorghum bicolor and S. spontaneum to identify syntenic regions in these two genomes. The following filtering criteria were applied: first BLAST hits showing at least 70.0% identity over a minimum alignment length of 200 bp were considered homologous. Collinear gene blocks were then analyzed using the Multiple Collinearity Scan toolkit (MCScanX) [62]. The results were visualized with TBtools software [63].
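The filtering criteria can be sketched as follows. This is an illustration under stated assumptions rather than the script used in the study: it assumes BLAST tabular output (`-outfmt 6`, with query ID, subject ID, percent identity, and alignment length in the first four columns) and assumes that the hits for each query are listed best-first. The function names and sequence IDs are hypothetical.

```python
def filter_first_hits(lines, min_identity=70.0, min_length=200):
    """Keep each query's first hit if it passes both thresholds."""
    seen = set()
    homologs = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if query in seen:
            continue  # only the first hit per query is considered
        seen.add(query)
        if identity >= min_identity and length >= min_length:
            homologs[query] = subject
    return homologs

def blast_row(query, subject, identity, length):
    # The remaining outfmt 6 columns are placeholders in this toy example.
    return "\t".join([query, subject, str(identity), str(length),
                      "0", "0", "1", str(length), "1", str(length),
                      "1e-50", "500"])

toy_hits = [
    blast_row("geneA", "subj1", 85.0, 350),  # first hit, passes both filters
    blast_row("geneA", "subj2", 99.0, 500),  # ignored: not the first hit
    blast_row("geneB", "subj3", 65.0, 400),  # fails the identity threshold
    blast_row("geneC", "subj4", 90.0, 150),  # fails the length threshold
]
homologs = filter_first_hits(toy_hits)
print(homologs)
```

A query whose first hit fails either threshold is dropped rather than falling back to a later hit, which is one reading of "first BLAST hits"; a different reading would require a small change to the loop.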
The CDS of chromosome 1 from E. arundinaceus, Sorghum bicolor, S. spontaneum, Saccharum hybrids, Zea mays, and Oryza sativa were aligned using MAFFT v7.205 [64]. Regions with low alignment scores were removed using Gblocks v0.91b [65] (parameter: -b5=h), and “supergenes” were obtained by concatenating all matching gene family sequences of each species. The JTT+1 model, selected with the ModelFinder tool [66] in the IQ-TREE 1.6.11 package [67], was used for tree construction, and the maximum likelihood tree was built with 1000 bootstrap replicates.
No individual chromosome peaks were resolvable in univariate histograms obtained after flow cytometric analysis of DAPI-stained chromosomes isolated from the 1679-33 clone (Fig. S1A). To verify the quality of the chromosome suspension, 1000 particles were sorted from each of sort windows 1 to 4 (Fig. S1B). The particles in these sort windows were chromosomes, indicating that the chromosome suspension could be used for subsequent experiments (Fig. S1C-F). The chromosome population of E. arundinaceus could be identified on bivariate flow karyotypes of DAPI-A vs. FITC-A fluorescence after GISHIS (Fig. 1). Chromosome morphology remained intact after GISHIS, as shown by a similar univariate histogram (Fig. 2A vs. Fig. S1A).
A total of 1200 particles were sorted from sort windows P2 and P3 (Fig. 2B) onto a microscope slide and evaluated by fluorescence microscopy. Chromosomes sorted from the P2 sort window carried FITC fluorescence, and the purity of the sorted fraction was 97.85% (Fig. 3). The particles sorted from the P3 sort window were a mixture of Saccharum and E. arundinaceus chromosomes, with approximately 75% of chromosomes carrying FITC signals (Fig. S2). Based on these observations, a total of 218,000 chromosomes were flow-sorted from the P2 sort window and used for DNA amplification.
The chromosomes sorted from the P2 sort window were deproteinized and purified, and their DNA was amplified to yield 29.63 μg DNA. The amplified DNA fragments were checked by 1% agarose gel electrophoresis; their size ranged from 3 to 45 kb, with the main band at about 45 kb (Fig. S3).
The amplified DNA was initially sequenced as 150-bp paired-end reads on an Illumina NovaSeq 6000, yielding 10.45 Gb of raw data with a Q30 of 91.29%. The reads were aligned to the Sorghum bicolor genome (BioProject: PRJNA13876) to identify the homologous group to which the sorted E. arundinaceus chromosome belonged. The percentages of mappable and unique reads were 41.68% and 58.32%, respectively, with the reads mapping mainly to chromosome 1 of Sorghum bicolor (Fig. 4), indicating that the sorted chromosome was E. arundinaceus chromosome 1 (EaC1).
PacBio sequencing of the amplified chromosomal DNA yielded 72.43 Gb of reads with an N50 length of 179,114 bp (Table S1). Removal of sequencing adapters and low-quality reads yielded 58.60 Gb of subreads with an N50 length of 9819 bp (Table S1). The final CCS data set comprised 1.33 Gb, with an N50 of 10,499 bp (Table S1). The CCS assembly of EaC1 had a total length of 70.93 Mb in 3696 contigs, with a contig N50 of 19,624 bp and a GC content of 44.90%. The total length of the assembled chromosome was lower than its predicted length (~119 Mb), and the contig N50 was low given that the N50 of the CCS reads was 10.50 kb. When the CCS reads were aligned against Sorghum chromosome 1, the mean length of mapped fragments was 1725 bp, which may explain the discontinuity of the assembly.
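For reference, N50 is the length L such that sequences of length L or longer together contain at least half of the total bases. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

# Toy example: total = 30, half = 15; 10 + 8 = 18 >= 15, so N50 = 8.
print(n50([10, 8, 6, 4, 2]))
```

The same function applies to reads or contigs; only the list of lengths changes.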
Fig.1.Microscopic examination of a crude chromosome suspension.The suspension was stained with DAPI(blue)and subjected to GISHIS to label E.arundinaceus chromosomes(green).(A)All particles show DAPI fluorescence.(B)Only E.arundinaceus chromosomes are labeled by FITC.Red arrows indicate chromosomes with a green signal.Scale bars,5μm.
Fig. 2. Flow karyotypes obtained by analyzing chromosomes isolated from the Saccharum spp. hybrid-E. arundinaceus chromosome introgression line 1679-33. (A) Monovariate plot of DAPI-A fluorescence. (B) Bivariate plot of DAPI-A vs. FITC-A. Sort windows P2 and P3 are indicated.
Fig.3.Microscopic examination of a chromosome population sorted from the P2 sort window.(A)and(C)show green signals.(B)and(D)show green and blue signals together.Scale bars,5μm.
Fig.4.Mapping depth of next-generation sequence reads on the Sorghum bicolor genome.
A total of 2646 genes were predicted, with a total length of 1.84 Mb, accounting for 2.59% of the chromosome assembly (Table 1). Of these, 1877 genes were annotated, representing 70.94% of the predicted genes (Table S2). A total of 861 non-coding RNAs with a combined length of 109,391 bp were predicted, accounting for 0.153% of the assembly (Table S3). A total of 56.44 Mb of repeat sequences were detected in the EaC1 assembly using different prediction methods, accounting for 79.58% of the assembly (Table S4). Long terminal repeat (LTR) retrotransposons, DNA transposons, long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs) occupied 36.26, 9.36, 4.54, and 0.065 Mb of the EaC1 assembly, respectively, representing 51.11%, 13.20%, 6.40%, and 0.09% of the chromosome (Table S5).
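These proportions can be recomputed from the reported totals. The sketch below uses only figures stated in the text; because those inputs are themselves rounded, the recomputed percentages may differ from the reported ones in the second decimal place. The helper name is illustrative.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)

assembly_mb = 70.93  # total length of the EaC1 assembly

gene_pct = pct(1.84, assembly_mb)     # predicted genes, by length
annotated_pct = pct(1877, 2646)       # annotated genes, by count
repeat_pct = pct(56.44, assembly_mb)  # repeat sequences, by length

print(gene_pct, annotated_pct, repeat_pct)
```

The recomputed values agree with the reported 2.59%, 70.94%, and 79.58% to within rounding.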
All 2646 gene sequences predicted from the EaC1 scaffolds were used to identify syntenic regions in the Sorghum bicolor and S. spontaneum genomes. In total, 2666 and 5542 genes matching the EaC1 genes were identified in Sorghum bicolor and S. spontaneum, respectively. After filtering, 2161 of the 2646 EaC1 genes (81.67%) had homologs in both Sorghum and S. spontaneum, suggesting the presence of collinear regions among E. arundinaceus, Sorghum, and S. spontaneum. The 2161 homologous genes of EaC1 were plotted according to their positions on the chromosomes of the respective species, and clear syntenic regions between Sorghum and S. spontaneum were observed (Fig. 5). The EaC1 syntenic regions in Sorghum and S. spontaneum were located mainly on chromosome 1. Syntenic regions were also found on other chromosomes of the two species, but at low gene density.
As expected, E. arundinaceus showed the closest evolutionary relationship with S. spontaneum and Saccharum hybrids. Sorghum bicolor branched in the second tier, indicating a greater evolutionary distance (Fig. 6). The greatest evolutionary distances were found between E. arundinaceus and both O. sativa and Zea mays (Fig. 6).
Alien introgression breeding is an attractive approach to improving crops by introducing desired genes from wild relatives[68].Sugarcane would benefit significantly from the introgression of genes from its wild relatives.One of them is E.arundinaceus,a potential donor of genes for disease resistance,drought resistance,and high biomass[6].Unfortunately,knowledge of its genome remains limited[23].Examination and sequencing of complex plant genomes can be simplified by dissecting the genomes into individual chromosomes using flow cytometric sorting[69].Chromosome sorting technology has been developed for 29 species,including sugarcane[25,31].Metcalfe et al.[31]demonstrated the feasibility of dissecting the sugarcane genome into small parts,though not single chromosomes.Flow cytometric classification of chromosomes according to DNA content alone resulted in flow karyotypes having composite peaks representing more than one chromosome of similar sizes.This problem was encountered for several species[29,70].One approach to overcoming this problem was labeling DNA repeats on chromosomes prior to flow cytometry using fluorescence in situ hybridization in suspension[35,36].
The result presented here represents the first demonstration that DNA of E.arundinaceus chromosomes in suspension can be fluorescently labeled using a genomic probe and that GISHIS permits purification of alien introgressed chromosomes by flow cytometry.As sugarcane chromosome introgression lines carrying chromosomes of E.arundinaceus were available[13],GISH in suspension was used to discriminate the introgressed chromosomes in analogy with FISH in suspension.If the introgression lines carry different alien chromosomes,this approach should make it possible to sequence single chromosomes of E.arundinaceus.The same strategy could be used to sequence other E.arundinaceus chromosomes introgressed into sugarcane.This method should generally be applicable for any species and,in principle,could also facilitate the sorting of translocated chromosomes if the translocated segment of an alien chromosome is sufficiently large to allow detection.
The short reads sequenced from the sorted chromosome (EaC1) mapped to Sorghum chromosome 1, and the EaC1 syntenic regions were located on chromosome 1 of Sorghum and S. spontaneum. The long-range organization of chromosomes was conserved relative to other species, and no exchange or translocation occurred during evolution [21]. We inferred that EaC1 was relatively conserved based on its high collinearity with S. spontaneum and Sorghum.
The phylogenetic relationship of E.arundinaceus confirmed its evolutionary distance from O.sativa and Zea mays and that it was close to Sorghum and closest to Saccharum hybrids and S.spontaneum.Thus,although E.arundinaceus,Sorghum,Saccharum hybrids,and S.spontaneum had a common ancestor,their divergence times were different.Tsuruta et al.[71]showed that Sorghum was more closely related to S.hybrids than E.arundinaceus and estimated that E.arundinaceus diverged from the subtribe Sorghinae before the divergence of Sorghum bicolor.The difference between our results and those of Tsuruta et al.[71]could be due to the analysis of different genotypes,and its explanation invites further study.
Considering that the genome size of hexaploid E. arundinaceus is ~3.57 Gb [24], the expected average size of one chromosome is ~119 Mb, a value greater than the PacBio assembly length of 70.93 Mb. The PacBio assembly comprised 3696 contigs with an N50 of 19,624 bp. The assembly quality was probably affected by the sequencing of amplified chromosomal DNA [28,72-74]. To our knowledge, this is the first report of PacBio sequencing of flow-sorted chromosomal DNA, and the assembly is better than chromosome assemblies obtained after sequencing on the Illumina platform [28,74]. We expect that this assembly of EaC1 will facilitate gene cloning, the development of DNA markers, and the development of chromosome typing markers in this species.
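The ~119 Mb estimate is consistent with dividing the 3.57 Gb genome size by 30 chromosomes, that is, half of 2n = 6x = 60. The divisor is inferred here from the reported numbers and is not stated explicitly in the text, so the sketch below should be read as a plausibility check rather than the original calculation.

```python
genome_size_mb = 3570.0   # ~3.57 Gb estimated by flow cytometry [24]
chromosomes = 60 // 2     # assumption: size spread over n = 30 chromosomes
expected_mb = genome_size_mb / chromosomes

assembly_mb = 70.93       # total length of the EaC1 PacBio assembly
fraction = assembly_mb / expected_mb

print(expected_mb, round(100 * fraction, 1))
```

Under this assumption the assembly covers roughly 60% of the expected average chromosome length, although the true size of chromosome 1 may differ from the average.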
Table 1. Gene prediction statistics for EaC1.
Fig.5.Syntenic regions for E.arundinaceus chromosome 1 annotated genes in the Sorghum bicolor and S.spontaneum genome.Blue-green chromosomes indicate S.spontaneum chromosomes.Gray chromosomes indicate Sorghum bicolor chromosomes.
Fig.6.Diagram of evolutionary relationships among six species.
Our analysis predicted 56.44 Mb of repeat sequences on EaC1, accounting for 79.58% of its sequence; LTR retrotransposons, DNA transposons, LINEs, and SINEs represented 51.11%, 13.20%, 6.40%, and 0.09% of the chromosome, respectively. The proportion of repeat sequences is consistent with those reported for the Sorghum, Zea mays, O. sativa, and S. spontaneum genomes [21,75-77]. A total of 2646 protein-coding genes were predicted in EaC1, with a total length of 1.84 Mb, accounting for 2.59% of the chromosome. The functions of 1877 (70.94%) of these genes were annotated. The low number of coding genes in EaC1 might be due to the poor assembly of regions around the centromeres. For comparison, Zhang et al. [21] identified 35,525 putative coding genes in S. spontaneum, and the genome of diploid Sorghum bicolor contains 34,129 genes [78].
To conclude,this new approach allowed purification of a single E.arundinaceus chromosome by flow-cytometric sorting,and the DNA sequence of this chromosome was determined.Genes on the chromosome were identified and annotated from the assembled sequences,and synteny with related species was established.This study provides initial insights into the characteristics of EaC1,and the results will support molecular improvement and alien introgression breeding of sugarcane.
Data availability
The EaC1 genome assembly has been deposited in the NCBI database under BioProject number PRJNA715955 and BioSample number SAMN18388831.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Shan Yang: Visualization, Writing - original draft. Petr Cápal: Visualization, Data curation, Writing - review & editing. Jaroslav Doležel: Funding acquisition, Project administration, Writing - review & editing. Xueting Li: Methodology, Software. Wang Qian: Investigation. Zhiqiang Wang: Methodology. Kai Zeng: Resources. Peiting Li: Visualization, Data curation. Hongkai Zhou: Formal analysis. Rui Xia: Funding acquisition. Muqing Zhang: Methodology, Resources, Supervision, Funding acquisition, Writing - review & editing. Zuhu Deng: Conceptualization, Funding acquisition, Project administration, Writing - review & editing.
Acknowledgments
This work was funded by the National Natural Science Foundation of China (31771863), the Science and Technology Major Project of the Fujian Province of China (2015NZ0002-2), the Special Fund for Scientific and Technological Innovation of the Fujian Agriculture and Forestry University (KFA17168A), Doctoral Students of Fujian Agriculture and Forestry University Going Abroad to Cooperative Research (324-112110082), and the Key Laboratory of Conservation and Utilization of Subtropical Agricultural Biological Resources (SKLCUSA-a201912). Jaroslav Doležel and Petr Cápal were supported by the ERDF Project “Plants as a tool for sustainable global development” (CZ.02.1.01/0.0/0.0/16_019/0000827). None of the funding institutions played a role in the study design, data collection, analysis, or manuscript writing. We thank the Key Laboratory of Conservation and Utilization of Subtropical Agricultural Biological Resources for providing the flow cytometer used in this study. We thank Annoroad Gene Technology (Beijing, China) for Illumina and PacBio sequencing.
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2022.02.001.