
10.1 Introduction

10.1.1 Economic Importance of the Crop

Tobacco (Nicotiana spp.), also called the “golden leaf”, is an important commercial crop cultivated in more than 120 countries (FAO 2019). The major tobacco-growing countries include China, India, Brazil, United Republic of Tanzania, Indonesia, Zimbabwe, Malawi, USA, Zambia, Mozambique, Turkey, Democratic People's Republic of Korea, Bangladesh, Argentina and Pakistan. Tobacco is grown on less than one percent of the world’s agricultural land, across a wide variety of soils and climates. Currently, it is grown on 3.62 million hectares, with a global production of 6.69 million tons (FAO 2019). China is the largest producer with 2.61 million tons/year, and India comes second with 0.8 million tons/year. Together, China and India account for just over 50% of the world total (3.41 of 6.69 million tons). Dried tobacco leaves are mainly used for smoking in the form of cigarettes, cigars, pipe tobacco and flavored shisha tobacco. Leaf is also consumed as snuff, chewing tobacco, dipping tobacco and snus. The crop generates substantial revenue for national governments and provides employment to millions of people worldwide.

10.1.2 Reduction in Yield and Quality due to Biotic Stresses

Diseases and pests attack tobacco from the seedling stage until leaf harvest (and even during the curing process), thereby affecting the yield and quality of the leaf, which is the economic product. However, the extent of losses due to pests and diseases varies from year to year and from location to location, depending on the weather and the level of pest infestation (Seebold et al. 2007).

According to a rough estimate, the annual world loss due to tobacco diseases exceeds 570 million dollars (Hossain 2020). Loss in yield due to leaf blight and black shank varies from 2 to 10% annually depending on weather conditions. Frog eye disease lowers the nicotine (46.7%) and reducing sugar (24.3%) contents of tobacco leaves and affects leaf quality (Patel et al. 2001). In severe cases, frog eye spots coalesce into bigger spots, leading to drying of leaves (Lucas 1975). Losses to the crop caused by hollow stalk range from 5 to 30% (Roy et al. 2012). Tobacco brown spot, caused by Alternaria fungal species, is one of the most damaging diseases and results in significant yield losses (Luo et al. 2009a; Mo et al. 2012). Alternaria produces a variety of toxic secondary metabolites that damage plant tissues, resulting in necrotic lesions and premature leaf aging, thereby affecting leaf quality (Melton and Shew 2000). Tobacco blue mold is one of the major foliar diseases in the United States and Canada, where epidemics cause annual losses of more than $200 million (Schiltz 1981; Nesmith 1984; Heist et al. 2002). Blue mold also caused severe losses in Cuba between 1978 and 1980 (Pérez et al. 2003). Fusarium wilt adversely affects the tobacco crop because it is seed transmissible and can survive in soil without a host plant for years (LaMondia 2015). In addition to causing severe disease, the wilt pathogen may render heavily infested fields unsuitable for tobacco production. Losses due to bacterial or Granville wilt reached up to 7% of the crop in South Carolina in 1998 (Fortnum and Martin 1998).

Tobacco Mosaic Virus (TMV) was estimated to cause a loss of one million dollars each year between 1960 and 1965 in North Carolina’s flue-cured tobacco (Gooding 1969). Damage caused by TMV depends on the stage at which infection occurs and on the genetic makeup of the variety, and affects both yield and quality. TMV infection of younger plants results in a greater yield loss than infection of older plants. Early infection can result in up to 60% reduction in crop value (Valleau and Johnson 1927). When plants were inoculated with the virus at transplanting, losses of 20–31% in yield (Wolf and Moss 1933) and 24% in value were reported, whereas late infections at topping resulted in smaller yield losses of 13–17% (Wolf and Moss 1933; Johnson et al. 1983; Hossain 2020). In severe leaf curl infection, yield losses vary from 60 to 70% because infected leaves do not cure well. In mild infections, the yield loss is marginal (Valand and Muniyappa 1992).

Plant parasitic nematodes cause severe damage to the tobacco crop. The worldwide loss in tobacco yield due to nematodes was estimated at 14.7% (Sasser and Freckman 1987). Losses in transplant production and cured leaf yield of bidi tobacco in India due to root-knot nematodes are reported to be around 50% (Markose and Patel 1977; Shah et al. 1983; Patel et al. 1986). The symptoms may be aggravated under drought conditions, as nematodes directly impair the plant’s ability to take up water and nutrients and predispose it to secondary pathogens (e.g. Fusarium wilt).

Crop yield losses due to the root parasite Orobanche spp. are reported to vary from 20 to 75%, depending on the time of infection and the availability of soil moisture (CABI 2021a). Annual losses due to Orobanche in the Middle East are estimated at $1.3 to $2.6 billion (Aly 2007).

Severe incidence of ground beetles reduces crop stand by up to 50–60% (Sitaramaiah et al. 1999), necessitating replanting, which not only adds to the cost of cultivation but also reduces the yield and quality of tobacco due to variation in crop growth. The tobacco caterpillar (Spodoptera) causes up to 80–100% loss of transplantable seedlings in the nursery (ICAR-CTRI 2021a) and 33–71% yield loss in the planted crop (Sitaramaiah et al. 1994). Whitefly infestation causes stunted plant growth, leading to considerable yield reduction. Tobacco aphid not only reduces cured leaf and bright leaf yields (by 125 kg and 70 kg/ha, respectively) but also lowers leaf quality through the development of sooty mold (Sitaramaiah et al. 1994). Further, aphids and whiteflies transmit viral diseases, causing additional yield losses. In flue-cured Virginia (FCV) tobacco, early incidence of capsule borer caused losses of up to 2891 and 426 kg/ha in green leaf and cured leaf yields, respectively (Sreedhar et al. 2005). The average estimated seed loss due to capsule borer was around 89 kg/ha in chewing tobacco (Chari et al. 1983). In years of severe capsule borer incidence, seed production may be severely affected, resulting in huge losses.

Thus, depending on the percent infestation and the control measures taken under field conditions, losses in tobacco due to biotic stresses may range from minimal to 100% in the nursery and up to 80% in the planted crop. These stresses not only affect yield but also reduce leaf quality, and hence require suitable control measures to contain the losses.

10.1.3 Growing Importance in the Face of Climate Change and Increasing Population

Climate change is affecting every country on every continent. It is disrupting national economies and affecting lives. Weather patterns are changing, sea levels are rising, and weather events are becoming more extreme. The year 2019 was the second warmest year on record and marked the end of the warmest decade (2010–2019) ever recorded. Levels of carbon dioxide (CO2) and other greenhouse gases in the atmosphere rose to new records in 2019 (United Nations 2019).

Climate change is likely to alter the balance between pests, their natural enemies and their hosts, and is an important factor driving the spread of pests and diseases. It affects the population size, survival rate and geographical distribution of pests, as well as the intensity, development and geographical distribution of diseases (Doody 2020). Temperature and rainfall are the main drivers of shifts in how and where pests and diseases spread. Since 1960, crop pests and diseases have been found to move at an average of 3 km a year towards the earth’s north and south poles as temperatures increase (Bebber et al. 2013). Higher temperatures and precipitation levels can slow the growth and reproduction of some pest species and destroy their eggs and larvae by washing them off the host plant. Climate change also affects the ecology and biology of insect pests. An increase in temperature would cause insect species to migrate towards higher latitudes. Studies using computer prediction models indicate that crop yield losses worldwide are likely to increase by 10–25% due to global warming.

Insect pests of crop plants are highly affected by global climate change. The changing climate could result in insect outbreaks, migration, changes in biodiversity, species extinction, host shifts, and the emergence of new pests or biotypes (Kumar and Singh 2016). Insect pest and disease problems in tobacco have shifted in the recent past due to climate, ecosystem and technological changes. There has been an overall decline of budworm and a rise in the incidence of tobacco caterpillar, aphid, whitefly, stem borer, ground beetle, Cucumber Mosaic Virus (CMV) and Orobanche (Sreedhar 2016). Damage due to certain pests such as ground beetles is usually greater in drought years and during prolonged hot spells immediately after planting of tobacco (Sitaramaiah et al. 1999). Climate change also affects disease incidence and severity. Where latent infection is high, the entire tobacco crop can be wiped out by hollow stalk in the event of increased rainfall and waterlogging (ICAR-CTRI 2005, 2006). Tobacco plants aged 35–60 days were found to be highly susceptible to bacterial wilt infection in the field following high temperature (25–30 °C) and rainfall. Thus, climate change is likely to change the composition and severity of pests and diseases in tobacco, requiring preparedness to contain the possible crop losses.

10.1.4 Limitations of Traditional Breeding and Rationale of Genome Designing

One of the strategies for effectively managing biotic stresses in tobacco is host plant resistance. Conventional breeding has played a significant role in developing tobacco genotypes with higher yields and biotic stress resistance, and it continues to be important for tobacco improvement. However, it has limitations: restricted availability of sources of resistance and other desirable traits, narrow genetic variability, barriers to natural crossing among existing species, and undesirable associations between the presence of a resistance gene and yield and quality characters, arising either from pleiotropic effects of the resistance gene per se or from linkage drag caused by deleterious genes linked to the gene of interest (Chaplin et al. 1966; Chaplin and Mann 1978; Legg et al. 1981; Zeven et al. 1983; Friebe et al. 1996; Brown 2002). Also, suppressed recombination within introgressed chromatin (Paterson et al. 1990; Liharska et al. 1996) can make it difficult to alleviate linkage drag through backcrossing (Stam and Zeven 1981; Young and Tanksley 1989). This can even complicate efforts to discriminate between pleiotropic and linkage drag effects (Purrington 2000; Brown 2002). Other limitations of conventional breeding are the relatively long time required to combine different genes and the laborious process of screening and phenotyping segregating generations while developing a variety. Traditional breeding depends on phenotypic screening for biotic stress resistance. Such screening under natural conditions is not reliable, as stress incidence depends on environmental factors that are highly variable under field conditions. Screening under artificial conditions depends on the crop stage and on creating conditions favorable for infection development. Therefore, screening at early crop stages may not be possible, and large numbers of undesirable plants have to be handled up to the full crop stage before rejection. These screening-related issues limit progress in resistance breeding.

Recent technological advancements make it possible to design plant genomes with desirable phenotypes and high yield by accumulating favorable alleles and eliminating deleterious and undesirable alleles. Genome manipulation strategies alleviate the limitations faced in conventional breeding, as they depend on precise modification or editing of the genome regions responsible for a phenotypic trait. Complete knowledge of the target trait, in terms of the genomic regions and genes responsible, gene expression patterns and regulation, linked markers and quantitative trait loci (QTLs), aids in targeted manipulation of desirable traits. Such modifications overcome linkage drag and undesirable gene associations and reduce the time required for resistance gene transfer. Molecular marker-assisted breeding with foreground and background selection saves time compared with classical breeding and allows precise selection of characters in segregating populations from early stages, without artificial inoculation or environmental influence. Current advancements in gene editing technologies make it possible to modify and edit target genes to yield desirable phenotypes while avoiding the nontarget effects observed when mutation breeding is employed. In genome-assisted breeding (GAB), genotypes are selected on the basis of breeding values estimated from all available marker information, which will pave the way for developing genotypes that carry the favorable alleles needed for maximum attainable yield along with desirable quality and resistance to biotic stresses. The time taken to develop genotypes with desirable genes can be reduced through precise manipulation of genomes, without requiring generations of selfing to achieve homozygosity. Transgenic and cisgenic transfer of desirable genes from unrelated and distantly related species will enlarge the sources of resistance and other desirable traits. Genome edited crops may be considered non-GMO (Genetically Modified Organism) plants and hence may avoid GMO regulations. Thus, genome designing offers greater scope for accelerated varietal development and overcomes the limitations of conventional breeding.
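
As a rough illustration of the selection step in GAB described above, the following sketch computes a breeding value for each candidate line as the sum of its marker allele dosages weighted by marker effects, and then ranks the lines. The marker names, effect sizes, genotype codes and line names are all hypothetical and chosen only for illustration; in practice, marker effects would first have to be estimated from a genotyped and phenotyped training population, a step not shown here.

```python
# Minimal sketch: rank candidate lines by genomic estimated breeding value (GEBV).
# All marker names, effects, and genotypes below are hypothetical illustrations.

# Assumed additive effect of the scored allele at each marker (arbitrary units).
MARKER_EFFECTS = {"m1": 0.8, "m2": -0.3, "m3": 0.5}

# Genotypes coded as the number of copies (0, 1, or 2) of the scored allele.
GENOTYPES = {
    "line_A": {"m1": 2, "m2": 0, "m3": 1},
    "line_B": {"m1": 1, "m2": 2, "m3": 2},
    "line_C": {"m1": 0, "m2": 1, "m3": 0},
}


def gebv(genotype, effects):
    """Sum allele dosage times marker effect over all markers."""
    return sum(genotype[marker] * effect for marker, effect in effects.items())


def rank_lines(genotypes, effects):
    """Return (line, GEBV) pairs sorted from highest to lowest GEBV."""
    scores = {line: gebv(g, effects) for line, g in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for line, score in rank_lines(GENOTYPES, MARKER_EFFECTS):
        print(f"{line}: {score:.2f}")
```

With these assumed values, the line carrying two copies of the allele with the largest positive effect ranks first. A real programme would weigh such a ranking against leaf quality traits and resistance gene status rather than a single index.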

10.2 Description of Different Biotic Stresses

10.2.1 Taxonomy of Diseases and Insects Infecting Tobacco

Several species of insect pests, diseases and root parasites pose a serious threat to the tobacco crop, adversely affecting leaf yield and quality. Some pests also transmit pathogens that cause viral diseases (Lucas 1975; Gopalachari 1984; Doroszewska et al. 2013).

10.2.1.1 Insect Pests Infesting Tobacco Crop

A number of insects attack tobacco and cause damage to various degrees. The taxonomic details of the insect pests that infest tobacco in most of its growing areas are given in Table 10.1. Insects belonging to various families of the orders Lepidoptera, Homoptera, Coleoptera and Hemiptera are found to attack the tobacco crop. Among them, tobacco caterpillar, Spodoptera litura (nursery and main field); stem borer, Scrobipalpa heliopa (nursery and main field); whitefly, Bemisia tabaci (nursery and main field), the vector of Leaf Curl Virus disease; tobacco aphid, Myzus persicae nicotianae; and budworm and seed capsule borer, Helicoverpa armigera (main field) are the major pests, whereas ground beetles, Mesomorphus villiger, and Spodoptera exigua are minor and sporadic in nature. Other pests cause insignificant damage to the crop.

Table 10.1 Taxonomic details of insect pests infesting tobacco crop

10.2.1.2 Microorganisms that Cause Diseases in Tobacco

A number of fungal, bacterial and viral pathogens infect tobacco, causing economic losses (Lucas 1975; Gopalachari 1984; Shew and Lucas 1991). The taxonomic details of the organisms causing disease in tobacco are given in Table 10.2. The major fungal diseases are damping off, leaf blight and black shank, brown spot and blue mold; the major bacterial diseases are leaf spots, wilt and hollow stalk; and the major viral diseases are TMV, leaf curl, CMV etc. Mixed infections of viruses are quite frequently observed in tobacco (Blancard et al. 1999).

Table 10.2 Taxonomic details of microorganisms that cause diseases in tobacco

10.2.1.3 Nematodes Infecting Tobacco

Two types of nematodes mostly affect tobacco, causing malformation of the roots. They are root-knot nematodes (Meloidogyne spp.) and cyst or gall nematodes (Globodera spp.). Both belong to Kingdom Animalia, Phylum Nematoda, Class Secernentea, Subclass Tylenchia and family Heteroderidae. Out of several Meloidogyne spp., M. incognita (Kofoid and White) and M. javanica (Treub) are the most widespread and the most damaging on tobacco. Two other species, M. arenaria and M. hapla (Chitwood 1949), have limited distribution and cause less damage. Several subspecies of Globodera tabacum, such as G. tabacum tabacum, G. tabacum solanacearum and G. tabacum virginiae, have been reported on tobacco in the world (CABI 2021b). Other nematodes that cause damage in a few areas are reniform (Rotylenchulus reniformis) and stunt nematodes (Tylenchorhynchus vulgaris) (Bairwa and Patel 2016).

10.2.1.4 Broomrape (Orobanche spp.)

Orobanche, generally known as broomrape, is a root parasite of tobacco and belongs to the family Orobanchaceae (Gevezova et al. 2012). The genus is divided into four sections: Gymnocaulis Nutt., Myzorrhiza (Phil.) Beck, Trionychon Wallr., and Orobanche (syn. Osproleon Wallr.) (Greuter et al. 2000). The most important species from an agronomic perspective are found in the sections Trionychon and Orobanche. Section Trionychon includes O. ramosa L. and O. aegyptiaca (Paran et al. 1997). O. cernua, O. ramosa and O. aegyptiaca are found to cause damage in tobacco.

10.2.2 Races, Isolates, Biotypes

Races, isolates or biotypes that are distinguishable on host differentials exist in the causal organisms of a few biotic stresses of tobacco. B. tabaci is documented as a complex of cryptic species with two most important biotypes, MEAM1 (Middle East-Asia Minor 1; biotype B) and MED (Mediterranean; biotype Q) (Yao et al. 2017). Two races, race 0 and race 1, are found in Phytophthora parasitica f. sp. nicotianae (Woodend and Mudzengerere 1992). Likewise, two races, race 0 and race 1, are reported in Pseudomonas syringae pv. tabaci, which causes wildfire. Two isolates in the PVYN strain group, PVYNTN and PVYNW, which produce necrotic symptoms on “VaVa” plants (BURLEY 21, K 326, NC 95), are found to infect tobacco (Verrier and Doroszewska 2018). CMV comprises two subgroups, I and II, based on severity of symptoms and virulence (Blume et al. 2017). TMV has mutated into many strains, and strains such as TMV-O, TMV-C and TMV-N can infect most members of Solanaceae (Holmes 1946).

10.2.3 Stages and Extent of Damage

Biotic stresses damage tobacco at different stages of crop growth, from the seedling stage in the nursery to the seed collection stage in the main field. The damage caused by some major insect pests, diseases, nematodes and Orobanche in tobacco is discussed here.

10.2.3.1 Insect Pests

Tobacco caterpillar (S. litura): The tobacco caterpillar is one of the most destructive polyphagous pests worldwide (Xue et al. 2010). The young caterpillars feed on leaf tissues in both the nursery and the main field. Larvae feed voraciously and also cut the stems of small and tender seedlings, and hence are known as cutworms. In severe cases, larvae consume the entire lamina, leaving only veins and petioles, which leads to leaf skeletonisation and heavy defoliation.

Tobacco hornworm (M. sexta): The tobacco hornworm is a common pest of plants in the family Solanaceae, which includes tobacco, eggplant, tomato, pepper, and various ornamentals and weeds (del Campo and Renwick 1999). The larvae are voracious feeders and may completely defoliate plants if not controlled. M. sexta is widely distributed throughout the New World, occurring as far south as Chile.

Stem borer (S. heliopa): The larvae bore into the stem and midribs, in the nursery as well as in the transplanted crop, and feed on the internal tissues. As a result, a swelling appears where the borer lodges. Borer-infested seedlings remain stunted when planted in the field, and sometimes show unusual branching.

Whitefly (B. tabaci): Whitefly is a cryptic species complex and a destructive insect pest, reported to attack and damage about 600 plant species (Nombela and Muniz 2010). Whiteflies are small fly-like insects seen on the underside of leaves. Both adults and nymphs suck the sap from the leaves and transmit tobacco leaf curl virus (TLCV) disease to healthy seedlings and plants. Virus-infected plants are stunted and twisted; their leaves are puckered and thickened, with abnormally prominent veins. The “B” biotype was found to infest tobacco and transmit TLCV.

Ground beetles (M. villiger): These beetles damage newly transplanted tobacco plants by gnawing/cutting the tender stem, resulting in death of the seedlings causing up to 50–60% gaps in the field (Sitaramaiah et al. 1999).

Tobacco aphid (M. persicae nicotianae): Under heavy infestation, hundreds of aphids can be seen on the underside of the leaf. By constantly sucking sap from the leaf, they debilitate the plant and thereby retard its growth. They secrete a sugary fluid known as ‘honey dew’ on the leaf, on which sooty mold develops, rendering the leaf unfit for curing. In addition, they transmit virus diseases, e.g. CMV and Rosette or Bushy Top Virus.

Tobacco budworm (H. virescens): It is principally a field crop pest, attacking crops such as tobacco, alfalfa, clover, cotton and soybean. The budworm larvae make holes in shoots and flower buds. Sometimes larvae can be found on the growing tips, the leaf petioles and the stems. In the absence of reproductive tissue, the larvae feed on leaf material.

Budworm/Capsule borer (H. armigera): It is a polyphagous pest. During the vegetative phase (30–50 days), it feeds on the terminal bud and the surrounding young leaves, causing loss. Generally, one larva is seen on the terminal bud in the earlier stages, whereas more than one borer per plant is seen during the flowering phase. During the flowering and capsule formation stages, larvae feed on flower buds and bore into the capsules to feed on the developing seeds.

10.2.3.2 Fungal Diseases

Damping off (Pythium spp.): Damping off is one of the most important diseases of tobacco nurseries and is responsible for poor stand of seedlings or complete loss of nursery beds. It is caused by several soil-inhabiting fungi, predominantly P. aphanidermatum (Edson) Fitz and P. debaryanum Hesse; Phytophthora spp. and sometimes Rhizoctonia solani are also involved. The disease attacks the root or stem region near the soil surface. It may appear at any stage of seedling growth, but maximum damage is observed 5–6 weeks after sowing, when it causes rotting of the tiny seedlings. Older seedlings show shriveling and dark brown discoloration of the stem at the base, and ultimately collapse and topple over. The wet rotting and collapse of seedlings start in circular patches and may extend to the entire bed if unchecked.

Leaf blight and Black shank (P. parasitica f. sp. nicotianae): The disease occurs in both the nursery and the field crop. Young, tiny seedlings in the nursery rot and die suddenly. Seedlings show blackening of roots and stem at ground level. Under continuous wet weather, large circular to irregular water-soaked patches appear on the leaf surface, causing leaf blight. Symptoms of black shank on transplanted tobacco are seen as blackening of roots and stalk. Blackening of the stalk starts at the base near the soil and gradually extends upwards, up to 30 cm or more above ground level. The leaves turn yellow, and the whole plant wilts and dies.

Brown spot (A. alternata): It is a disease of senescence. The symptoms on older leaves appear as small water-soaked lesions that enlarge quickly. Once the spots enlarge, their centers die and become brown, leaving a clear demarcation between diseased and healthy tissue. Circular brown lesions with concentric rings appear on lower leaves. In severe infection, spots enlarge, coalesce and damage large areas, making the leaf dark brown, aged and worthless.

Blue mold of tobacco (P. hyoscyami f. sp. tabacina): Blue mold is one of the most important foliar diseases of tobacco and causes significant losses in the Americas, south‐eastern Europe and the Middle East. It is highly destructive to tobacco seed beds, transplants and production fields. Single or grouped yellow lesions appear on the older and shaded leaves. The spots often grow together to form light brown necrotic areas. Leaves become puckered and distorted, large portions disintegrate, and the entire leaf may fall apart. Blue mold can destroy all leaves at any growth stage if favorable weather conditions persist. Lesions may also appear on buds, flowers and capsules.

Fusarium wilt (F. oxysporum f. sp. nicotianae): Fusarium infection causes chlorosis, wilting and necrosis of tobacco leaves, leading to stunted growth and death. The symptoms often appear vertically on one side of the plant, or even on one side of the leaf midvein. A diagnostic chocolate-brown discoloration of the vascular tissue develops up to the top of the plant. In due course, the discoloration becomes visible on the exterior of the green stalk.

Frog eye leaf spot (C. nicotianae): Generally, this disease is seen 4–5 weeks after germination, 30 days after transplanting, and on the harvested crop. Round brown spots resembling a frog’s eye appear initially on the lower leaves and spread gradually to the upper leaves. Under hot, dry weather, frog eye lesions may be pinpoint in size and may go unrecognized.

Powdery mildew (E. cichoracearum var. nicotianae): Powdery mildew, also known as white mold or ash disease, occurs in all types of tobacco in several Asian countries, Oceania, the Mediterranean, Africa and Canada. The disease causes more severe damage in flue-cured tobacco than in other types of tobacco. Under favorable conditions of low temperature (16–23 °C) and high humidity, white patches spread to the upper leaves, enlarge and cover the entire leaf surface. Mildew-affected leaves get scorched on curing and show brown patches or blemishes, which render them unfit for marketing or reduce their commercial value.

Black root rot (T. basicola): Infected roots appear dark brown or black due to the presence of large numbers of black spores, and rotting of root tissues greatly reduces the number of roots. Large scars may be seen on the main tap root. Above ground, plants show temporary nutrient deficiency symptoms, stunting and irregular growth.

10.2.3.3 Bacterial Diseases

Angular leaf spot (P. syringae pv. angulata): Pseudomonas syringae pv. angulata causes angular, dark brown to black spots surrounded by a yellow halo. Lesions are restricted between veins, and leaves appear puckered and tear easily. The pathogen was regarded as a mutant of Pseudomonas syringae pv. tabaci that does not produce tabtoxinine (Braun 1955). P. syringae pv. tabaci causes a severe type of angular leaf spot, well known as wildfire, in Tamil Nadu, India (Gnanamanickam et al. 1977).

Bacterial leaf spot (D. dadantii): It affects bidi and late-planted FCV tobacco in Karnataka, India. The plant develops circular, water-soaked yellow lesions with a minute brown centre, which expand with a translucent border and a wide chlorotic halo (Wolf and Foster 1917). The disease also causes vascular discoloration of the stems, wilting and stunting (Johnson 1923; Komatsu et al. 2002).

Bacterial wilt (R. solanacearum): Also known as ‘Granville wilt’, the disease affects both the nursery and the field crop. In the field, the first symptom is drooping of 1–2 leaves during the day and their recovery in the evening. One half of an affected leaf becomes flaccid, a characteristic symptom of bacterial wilt of tobacco. As the disease slowly progresses, the affected leaves turn light green and may gradually turn yellow, midribs and veins become flaccid, and large leaves may droop in an umbrella-like fashion.

Hollow stalk (E. carotovora subsp. carotovora): Hollow stalk has been reported from the U.S.A., Canada, India and China (Roy et al. 2008). The disease may appear at any time following stem injury, but it is commonly observed 35–40 days after topping operations. The pith undergoes rapid browning and hollowing due to soft rot, and the tissue eventually collapses. Initially the top leaves wilt, and the infection slowly spreads downward. The black leg phase of the disease is characterized by the formation of black stripes or bands girdling the stalk and cured leaves.

10.2.3.4 Viral Diseases

Tobacco Mosaic Virus (TMV): TMV is worldwide in distribution and reduces cured leaf yield, quality and price. It is sap transmissible and spreads mainly through mechanical means. Characteristic symptoms include an irregular mosaic pattern of dark and light green areas on the leaf, leaf malformation and stunted plant growth. Infected young leaves are often malformed and show puckering or wrinkling. “Mosaic burn”, with large, irregular, burned or necrotic areas, appears on affected mature leaves, causing extensive loss to the crop.

Potato Virus Y (PVY): This virus is transmitted by aphids and affects the tobacco crop worldwide. It causes tobacco veinal necrosis or vein banding, i.e. the appearance of dark green bands along the brown necrotic veins. Necrosis extends to the vascular region, and plants die from pith necrosis.

Tobacco Leaf Curl Virus (TLCV): The disease is caused by a virus of the genus Begomovirus, family Geminiviridae. TLCV is transmitted by the insect vector, whitefly (B. tabaci). Diseased leaves become brittle and puckered, and exhibit downward curling of margins with enations or leafy outgrowths on the under surface. Leaves show vein clearing, abnormal vein thickening and twisting of petioles. Internodes are shortened, resulting in dwarfing of the plant. Late-infected plants show mild symptoms, with small, inwardly curled top leaves.

Cucumber Mosaic Virus (CMV): CMV enters the leaf through wounds, principally those made by aphids. Infected plants show typical mottling and mosaic patterns, and narrowing and distortion of the leaves, sometimes accompanied by plant stunting. Under severe infection, vein banding and interveinal yellowing are observed, accompanied by leaf blisters, shriveling, and chlorotic and necrotic lines, resulting in filiform leaves. Mild CMV strains cause only a faint mottling of the leaves.

Tobacco Distorting Virus: This virus causes abnormal suckering and stunting of the plant. Leaves become mottled, puckered and distorted, with rat-tails. At times only a long midrib is seen, without lamina.

Tobacco Etch Virus: It is transmitted by sap and also by aphids in a non-persistent manner. It infects solanaceous plants and causes rat-tailing. On lower leaves, vein clearing is seen along with some necrotic lines or etching. Mottling is seen with chlorotic and necrotic spots.

10.2.3.5 Nematode Diseases and Broomrape

Meloidogyne spp.: In most countries Meloidogyne spp. (root-knot nematodes) are considered as major factors limiting the production of tobacco crops. The nematode infection causes root galls and plant stunting. As nematodes damage plants at root level, above ground symptoms may not be prominent. At moderate to high infection, plants may appear wilted, yellowed and stunted.

Globodera spp.: The presence of Globodera on the roots is evident from the numerous cysts scattered over young or old roots, with resultant reduced growth. The cysts vary in size among species. Cyst nematodes cause less damage on tobacco than Meloidogyne spp.

Broomrape (Orobanche spp.): Broomrape affects the growth, development and morphogenesis of tobacco in the main field (Krishnamurthy 1994).

10.2.4 Control Methods

10.2.4.1 Cultural Methods of Control

Selection of a disease-free nursery site, deep summer ploughing, raising the seed beds, rabbing seed beds, use of an optimum seed rate and regulation of watering to avoid dampness help control fungal diseases in the nursery.

Deep summer ploughing, soil fumigation, crop rotation, intercropping, companion/border crops (maize, sorghum, Tagetes etc.), trap cropping (castor as an ovipositional trap for insects; sesame, jowar, black gram, green gram etc. for Orobanche) and application of fermented farm yard manure (to avoid Orobanche seed) have been highly effective in the management of insects, nematodes, viral diseases and the Orobanche weed (Chari et al. 1999; Sreedhar et al. 2007).

The phyto-sanitary measures adopted for controlling insects and diseases in tobacco include planting at the right time, transplanting healthy seedlings, removing and destroying diseased/virus-infected plants and alternate hosts from the field, using recommended fertilizer doses (including potash and avoiding excess nitrogen), and collecting and destroying the crop debris after harvest.

Scheduling inter-culture operations in infected fields last, disinfecting implements before entering healthy fields, washing hands with soap water before and after entering infected fields, and preventing smoking and the use of other tobacco products effectively control viral diseases and Orobanche (Lucas 1975). Physical removal of Orobanche before it flowers, followed by burning, reduces Orobanche incidence in the next season (Krishnamurthy 1994).

10.2.4.2 Chemical Methods of Control

The chemicals recommended for the control of insects and diseases, and their doses, vary from country to country. Guidance Residue Levels (GRLs) have been developed by the CORESTA (Cooperation Centre for Scientific Research Relative to Tobacco) Agro-Chemical Advisory Committee (ACAC) to guide tobacco growers and those in the tobacco industry interested in the application of Crop Protection Agents (CPAs) and the implementation of Good Agricultural Practice (GAP) in tobacco production (CORESTA 2020). Importing countries buy tobacco when the pesticide residues in the cured leaf are below the established GRLs and the maximum residue levels (MRLs) fixed by them. The increasing stringency of regulations controlling the registration and use of CPAs has resulted in an ever-decreasing list of insecticides registered and recommended for use on tobacco (CORESTA 2020). Hence, new selective insecticides with low active ingredient (ai) content, bio-pesticides and novel molecules are being evaluated and recommended to cope with the pesticide residue problem (Sreedhar 2020).

10.2.4.3 Biocontrol Methods with Natural Products and Biotic Agents

In view of the increasing concern about pesticide residues in tobacco, biopesticides viz. Bacillus thuringiensis var. kurstaki based formulations, Neem seed kernel suspension (NSKS), Nuclear Polyhedrosis Virus (NPV), fungal pathogens viz. Nomuraea rileyi, Beauveria bassiana, Verticillium lecanii, Trichoderma harzianum, T. viride etc., the bacterium Pseudomonas aeruginosa, the VA mycorrhiza Glomus fasciculatum, Azotobacter etc. have emerged as a strong component of integrated pest management strategies in tobacco (Chari et al. 1996; Sreeramulu et al. 1998; Ramaprasad et al. 2000; Dam et al. 2010; Sreedhar et al. 2014).

10.2.4.4 Integrated Management of Pests and Disease (IPM/IDM)

Integration of host plant resistance with cultural, chemical and biological management is necessary for effectively controlling various diseases, nematodes and Orobanche without environmental pollution and with minimum residues of Crop Protection Agents (CPAs), which are an issue of concern in exported tobacco. Bio-intensive IPM modules with genetic, cultural, biological and habitat management techniques as major components, along with need-based use of selective insecticides, were highly effective in reducing pest damage and enhancing natural enemy activity, and help in increasing the production of residue-free tobacco with favorable economics (Rao et al. 1994; Sitaramaiah et al. 2002; Sreedhar and Subbarao 2014).

10.2.5 Traditional Breeding Methods

Tobacco is a self-pollinated crop with very low natural out-crossing, and all the breeding methods applicable to self-pollinated crops, such as introduction, mass selection, pure line breeding, the pedigree method, back cross breeding and hybridization, are being utilized in tobacco (Bowman and Sisson 2000; Sarala et al. 2012). Introduction played an important role in the initial years, until the 1960s, in bringing tobacco cultivation into European, Australian and Asian countries. Pure line selection and hybridization of selected parents followed by selection in the segregating populations through pedigree, back cross and recurrent selection methods were the predominant methods adopted in breeding high yielding tobacco varieties worldwide. Back-cross breeding was used for the efficient transfer of a number of resistance factors, viz. TMV (from Vamorr-50), powdery mildew (Kofun), black shank (Beinhart 1000-1), caterpillar (DWFC), Fusarium wilt (Speight G33) etc., into tobacco varieties from other tobacco types and wild species (Sarala et al. 2012). Backcrossing to the common tobacco variety was practiced to eliminate the genetic drag due to undesirable alleles and to recover the plant type and quality characteristics of the adapted variety. Distant (interspecific) hybridization was used to introgress useful genes such as those for resistance to TMV, wild fire, black shank, brown spot, black root rot, blue mold, aphid, tobacco caterpillar, root knot nematode, powdery mildew, Tomato Spotted Wilt Virus (TSWV), PVY, cyst nematode etc. (Milla et al. 2005; Sarala et al. 2012). Incompatibility barriers in interspecific hybridization were overcome through the use of hormones, the bridge cross technique and in vitro rescue methods (Ramavarma et al. 1980). Mutation breeding has also played an important role in creating variability and developing high yielding improved tobacco varieties and parental materials for use in breeding programs (Sarala et al. 2012).
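The recovery of the adapted variety's plant type through repeated backcrossing, mentioned above, follows simple textbook arithmetic: in the absence of selection and linkage, the expected share of the recurrent parent's genome is 1 - (1/2)^(n+1) after n backcrosses. The following is a minimal Python sketch of that expectation only; the function name is hypothetical, and real introgression lines deviate from it because of linkage drag around the transferred gene.

```python
def expected_recurrent_fraction(backcrosses: int) -> float:
    """Expected fraction of recurrent-parent genome after n backcrosses.

    Assumes no selection and no linkage (textbook expectation only).
    n = 0 corresponds to the F1, n = 1 to BC1, and so on.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be >= 0")
    return 1.0 - 0.5 ** (backcrosses + 1)


def generations_needed(target_fraction: float) -> int:
    """Smallest number of backcrosses whose expectation reaches the target."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    n = 0
    while expected_recurrent_fraction(n) < target_fraction:
        n += 1
    return n
```

Under these assumptions the F1 carries half of the recurrent genome, BC1 three quarters and BC2 seven eighths, which is why several backcross generations are usually needed before the adapted plant type is recovered.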

10.2.6 Use of Morphological Markers

Characters that can be readily detected in the phenotype and are useful for identifying and characterizing plants are referred to as morphological markers. In tobacco, these markers mostly relate to variation in the plant (shape, habit, height, internodal length etc.), leaf (number of economic leaves, size, color, shape, margin, tip shape, maturity interval period etc.), floral characters (time of 50% flowering, size, color, development of stamens, height of pistil relative to stamens etc.), capsule (shape and size) and seed (shape, color and surface characters) (Sarala et al. 2018). Morphological markers generally require neither sophisticated equipment nor preparatory procedures for scoring. Monogenic or oligogenic morphological traits are generally simple, rapid and inexpensive to score, even from preserved specimens (Bretting and Widrlechner 1995).

Morphological, karyotypical and physiological characters have been used to study the genetic background of tobacco (Goodspeed 1954; Zhang 1994; Lu 1997). Morphological markers have played an important role in breeding improved tobacco varieties to date and will continue to be effective in the future. As the leaf is the economic product, improvement in the number of harvestable leaves and in leaf weight is important for realizing higher yields in tobacco along with desirable chemical quality characters. Sarala et al. (2005) observed continuous improvement in plant height, total leaves, harvestable leaves, days to flowering, leaf area, leaf growth rate, specific leaf weight and carotene from old to recently released tobacco varieties. They further suggested that improvement in leaf number, leaf area and specific leaf weight in future cultivars can result in higher yields.

Conventional plant-breeding approaches in tobacco, as in other crops, rely on morphological markers representing desirable agronomic and product characteristics (that is, phenotypes including resistance to biotic stresses) for the selection of parents, for the creation of variation through their crossing or mutagenesis from the tobacco germplasm, and for the identification of targeted genotypes in segregating generations. Resistance-related morphological characters are expressed only when there is an incidence of pests/diseases, and such characterization may not be possible under natural conditions. These morphological traits are limited in number and are influenced by plant growth stage and various environmental factors (Eagles et al. 2001), so phenotypic identification can be misleading because of the complex genotype and environment interaction that governs the trait of interest. Consequently, development of resistant varieties requires more than twelve years in the case of recessive traits. As these markers are generally expressed late in the development of the tobacco plant and are highly influenced by environmental factors or growing conditions, their detection depends on the developmental stage of the crop. Further, they are less polymorphic and exhibit dominance, pleiotropy and epistasis. In view of these limitations, genotypic markers that can be identified in early stages without environmental influence are highly useful in accelerating tobacco varietal improvement.

10.2.7 Limitations and Prospect of Genomic Designing

Conventional biotic resistance breeding approaches in tobacco are phenotype-based, time-consuming and resource-intensive. Hence, the progress obtained through traditional tobacco breeding is slow and is hampered by linkage drag and the lack of easily transferable sources of resistance. Genome designing strategies overcome this, as they depend on knowledge of the genome composition of the plants and on gene sequence information. Targeted modification or designing of the plant genome, including the addition of alien genes, will accelerate the tobacco varietal development process through precise manipulation of gene functions for higher yields and stress resistance. Transfer of resistance from distant wild relatives and other unrelated sources to cultivated tobacco can also be made successfully using genome designing through trans- and cis-genesis approaches involving various processes, viz. gene mapping, identification, gene transfer and gene editing. Marker-assisted and genome-assisted breeding are going to be the order of the day with rapidly evolving technological advancements.

The recent availability of draft genomes of N. tabacum and a few wild species, together with various data sharing and analysis platforms (databases), has made it possible to understand genes, their sequences and linked molecular markers for target traits with the help of innovative bioinformatics tools and comparative genomic studies. The information thus available can be effectively utilized to edit genome sequences with rapidly evolving, relatively precise gene editing technologies, viz. meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), homing endonucleases, and clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein (Cas) 9. However, gene editing technologies at times suffer from lower specificity due to their off-target side effects (Khan 2019). The availability of high density molecular maps and genome information is further helpful in providing knowledge of molecular markers and QTLs that are either closely linked to or located within the target gene(s), and also allows map-based cloning of target traits. Linked markers and QTLs identified in tobacco for various biotic stresses are going to pave the way for marker-assisted breeding for resistance traits. The available information can be effectively utilized for estimating the breeding value of individuals in genome-assisted breeding, and selections can be made accordingly.

However, genetic engineering tools have certain limitations, including time-consuming and complicated protocols, potential tissue damage, DNA incorporation in the host genome, and low transformation efficiency. Unlike traditional breeding strategies, genome designing technology is resource-intensive and requires technical expertise for handling the processes.

10.3 Genetic Resources of Resistance/Tolerance Genes

Availability of stable and heritable sources of resistance to biotic stresses is essential in any crop when breeding resistant varieties. The gene pool, which consists of various easily crossable tobacco lines and Nicotiana species, is to be explored for available variability and resistance factors in developing resistant tobacco cultivars. If resistance is not available in any of these gene pools, it needs to be created through mutations or incorporated through genome designing.

Currently, a large number of Nicotiana species and cultivated tobacco varieties are available (Lewis 2011). Around 83 species are available within the genus Nicotiana (Berbec and Doroszewska 2020). Currently there are 92 Nicotiana species and varieties listed at the Taxonomy Browser of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) and 307 records of Nicotiana species and varieties available in The Plant List database (http://www.theplantlist.org/tpl1.1/search?q=Nicotiana). The International Plant Name Index (http://www.ipni.org/ipni/plantnamesearchpage.do) contains 450 records for the keyword ‘Nicotiana’. A large number of these species are reliable sources of resistance to the biotic stresses that affect the tobacco crop (Lewis 2011). Another important advantage of wild Nicotiana species is their cytoplasmic genomes, which have provided the source of cytoplasmic male sterility (CMS) for developing male-sterile isolines of inbred lines and cultivars. CMS is a prerequisite for technically feasible and economically viable seed production of hybrid cultivars. Various available sources of resistance in tobacco genetic resources are discussed below.

10.3.1 Primary Gene Pool

The primary gene pool consists of tobacco genotypes that can easily be crossed with cultivated tobacco to produce fertile offspring. These may belong to the cultivated or the wild gene pool. The cultivated gene pool covers commercial varieties of the crop as well as landraces, while the wild gene pool comprises putative ancestors and closely related species that show a fair degree of fertile relationships with domesticated tobacco. A large number of varieties have been developed by breeders in different countries, and fairly large collections of N. tabacum and N. rustica germplasm resistant to various pests and diseases are available (Table 10.3). As gene transfer from such sources is easy, the first priority of breeders is to explore them for resistance to target biotic stresses and to transfer that resistance to cultivated varieties through either traditional breeding methods or genetic engineering.

Table 10.3 Few genetic resources of resistance/tolerance available in the primary gene pool of tobacco

10.3.2 Secondary Gene Pool

The secondary gene pool refers to wild relatives of tobacco that are distinct from the cultivated species yet closely enough related to be crossable with it, at least to a certain extent, to produce a few fertile offspring. Genetic resources collected from gene centers, such as closely related species, primitive cultivars and old land races that evolved and adapted to different environments, are a valuable source of resistance to biotic stresses. The majority of Nicotiana species (58) hybridize either with N. tabacum or with at least one other sister Nicotiana species. Some (N. mutabilis, N. petunioides, N. attenuata, N. corymbosa, N. linearis, N. burbidgeae, N. thyrsiflora, and N. wigandioides) hybridize with at least one other Nicotiana species but not with N. tabacum (Berbec and Doroszewska 2020). N. tabacum was found to produce inviable hybrids with N. africana (2n = 46), N. excelsior (2n = 38), N. goodspeedii (2n = 40), N. gossei (2n = 36), N. maritima (2n = 32), N. megalosiphon (2n = 40) and N. velutina (2n = 32) after crossing at 28 °C (Tezuka et al. 2010). However, the Type II hybrid lethality observed in these crosses, with its characteristic symptoms of browning of the hypocotyl and roots, was suppressed at elevated temperatures (34 or 36 °C). Utilization of genes from these materials in breeding programs is tedious due to incompatibility and undesirable linkages. Modern molecular and biotechnological tools can help in overcoming such difficulties.

10.3.3 Tertiary Gene Pool

This pool is made up of even more distantly related crop wild relatives. Fourteen species (N. azambujae, N. acaulis, N. ameghinoi, N. paa, N. cutleri, N. longibracteata, N. spegazzini, N. faucicola, N. fatuhivensis, N. heterantha, N. stenocarpa, N. truncata, N. monoschizocarpa, and N. symonii) have no hybridization records with N. tabacum (Berbec and Doroszewska 2020). Genes present in both the S and T subgenomes of N. tabacum appear to be responsible for hybrid lethality in crosses with incompatible Nicotiana species (Tezuka and Marubashi 2012). Transferring genes from such pools requires specific breeding techniques, such as bridge crosses and various treatments of male and female flower parts; in vitro techniques such as ovary/ovule culture and embryo rescue; chromosome and genome manipulation, such as increasing (polyploidization) or decreasing (haploidization) the number of genomes; methods of partial genome transfer, including chromosome addition and substitution lines, mutagenesis and cell fusion, and translocation breeding; exchange of nuclear and cytoplasmic genomes (mitochondrial and/or chloroplastic); grafting; marker-assisted breeding (MAB); in vitro tissue culture; and genetic engineering (Weil et al. 2010).

10.3.4 Artificially Induced/Incorporated Traits/Genes

When a source of resistance is not available in any of the above pools, creation of resistance through mutations (physical and chemical mutagens), genetic engineering for the transfer of alien genes, gene manipulation and genome editing technologies are to be adopted in developing resistant cultivars.

Some sources of resistance to biotic stresses available in the primary gene pool of cultivated tobacco are provided in Table 10.3. Various resistance factors introgressed into N. tabacum from Nicotiana species (secondary and tertiary pools) are listed in Table 10.4.

Table 10.4 Biotic stress resistance traits introgressions to N. tabacum from wild Nicotiana species

10.4 Glimpses of Classical Genetics and Traditional Breeding

10.4.1 Classical Mapping Efforts

In tobacco, initial gene mapping efforts were made by Anderson and De Winton (1931), East (1932), and Brieger (1935), who identified linkage between a pollen color factor and the sterility factors. Brieger estimated that S and br are about 20 crossover units apart in the first linkage group. Brieger (1935) summarized the data on linkages in N. langsdorffii and N. sanderae and established, with some degree of certainty, the first two linkage groups: (1) the self-sterility allele (S) and lethality (I), and (2) the basic gene for anthocyanin color (C) and a recessive gene causing a peculiar type of growth, cr (crassa). Smith (1937) confirmed the existence of linkage between self-sterility and pollen anthocyanin color. The first account of monosomics in N. tabacum was reported by Clausen and Goodspeed (1926), who established two types of monosomics, of which haplo-C (called “corrugated”) was found to be associated with the chromosome on which the basic color factor, Wh, is located. Later, Clausen and Cameron (1944) developed a complete set of 24 monosomics and studied the association in transmission between monosomics and Mendelian characters, which led to the location of 18 genes on nine chromosomes. Though the inheritance of a number of traits has been studied and the linkage of a few of them with other traits established, genetic linkage maps are not fully developed in tobacco due to its allopolyploid nature (Suen et al. 1997; Narayanan et al. 2003).
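To make a distance such as "about 20 crossover units" concrete, the relation between recombination frequency and map distance can be sketched in Python. This is an illustration only: it treats a crossover unit as one centimorgan and uses Haldane's mapping function (which assumes no crossover interference); the source does not state how the classical estimates were computed, and the function names are hypothetical.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Observed recombination fraction from testcross progeny counts."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def haldane_cm_to_r(cm: float) -> float:
    """Expected recombination fraction for a map distance in centimorgans."""
    if cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


def haldane_r_to_cm(r: float) -> float:
    """Map distance in centimorgans for a recombination fraction below 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)
```

Under Haldane's function a 20 cM interval corresponds to an expected recombination fraction of roughly 0.165, slightly below 0.20, because double crossovers between the two loci go undetected.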

10.4.2 Limitations of Classical Endeavors and Utility of Molecular Mapping

Though genes are very useful markers, they are by no means ideal for mapping studies. Mapping based on morphological markers is tedious and time-consuming, and genes governing quantitative traits cannot be mapped (Worland et al. 1987). The limited number of visual phenotypes whose inheritance can be studied, and the complications that arise in analyzing them due to pleiotropic effects of genes, are major limitations of gene (trait) mapping through the classical approach. The allopolyploid nature of tobacco makes it difficult to score individual gene effects and recombination events at the phenotype level, because epistatic interactions between homeoalleles present on the constituent genomes, and their functional redundancy, pose a problem in classical mapping.

To make gene maps more comprehensive, it is necessary to identify characteristics that are clearly distinctive and less complex than visual ones. However, only a fraction of the total number of genes in tobacco exist in allelic forms that can be distinguished conveniently, which makes it difficult to construct classical maps. One of the reasons why knowledge of the details of inheritance in tobacco was so meager is the prevailingly quantitative or semi-quantitative nature of the majority of characters, including flower color (Clausen and Cameron 1944). Gene maps are, therefore, not very comprehensive in tobacco, and a detailed map based entirely on genes is not available.

As plant breeding progressed, biochemical markers such as protein and isozyme markers were developed (Markert and Moller 1959). Protein and isozyme markers have been successfully applied in the detection of genetic diversity, population structure, gene flow and population subdivision in tobacco (Mateu-Andres and De Paco 2005). Isozymes and other proteins mostly have a neutral effect on plant phenotype and are often expressed co-dominantly, which allows homozygotes and heterozygotes to be distinguished easily. However, due to their limited availability and the requirement of a different protocol for each isozyme system, utilization of protein and isozyme markers in plant breeding programs and mapping endeavors is very limited.

Hence, there is a need for other types of markers that are abundantly available, stable over environments and easily classified into distinct categories. DNA based molecular markers developed in 1990 could satisfy these requirements. Because molecular markers are so abundant in a genome and can easily be detected, when mapped by linkage analysis they fill the voids between genes of known phenotype. In mapping, a DNA marker is not important in itself; the heterozygous site is merely a convenient reference point for marking chromosomal locations. However, molecular marker development and genetic map construction in tobacco have lagged behind other Solanaceae crops such as tomato, potato and pepper (Tanksley et al. 1992; Barchi et al. 2007). Molecular marker based maps can serve as effective anchoring points for the identification of linked traits, for their isolation and cloning, and for use in marker-assisted breeding.

10.4.3 Breeding Objectives

Tobacco breeding mainly aims at enhancing the leaf yield potential of the cultivar while maintaining leaf quality and resistance to biotic and abiotic stresses. Tobacco leaf yield is a dependent variable and is the result of associated yield-attributing independent traits such as plant height, number of leaves, leaf length and width, days to maturity, and resistance to biotic and abiotic stresses (Sarala et al. 2005). In breeding improved high-yielding tobacco varieties, some of these traits, such as number of leaves, days to maturity, resistance, and leaf length and width, are to be selected in the positive direction, while others, such as plant height, internodal length, reducing sugars and alkaloids, are to be selected in the negative direction. Breeding for improved (higher level) resistance to various biotic stresses, as well as finding novel or improved sources of resistance, remains a central part of the majority of present flue-cured breeding programs.

Bowman and Sisson (2000) analyzed the relative change in several traits of flue-cured tobacco cultivars grown in the early 1960s compared with the late 1990s and concluded that yield, leaves per plant and days to flower were selected in the positive direction, while plant height, internodal length, reducing sugars and total alkaloids were selected in the negative direction. Breeders were successful in increasing yield while maintaining leaf quality, even though the two are negatively correlated. The newer cultivars were modified to have a longer growing period, more leaves prior to flower bud formation, and relatively lower plant height, internodal length, reducing sugars and total alkaloids. Sarala et al. (2005) observed positive selection for plant height, total leaves, harvestable leaves, days to flowering, leaf area, leaf growth rate, specific leaf weight and carotene in tobacco varieties.

10.4.4 Classical Breeding Achievements

Traditional tobacco breeding has aimed at developing improved tobacco varieties with higher yield, better leaf quality, and resistance to pests and diseases. Significant progress has been made over the years in enhancing tobacco leaf yield through both varietal and hybrid development, in addition to improving disease and insect resistance, without significantly sacrificing ease of curing. The classical breeding achievements in tobacco are enumerated here.

10.4.4.1 Yield and Quality

Breeders have made progress in improving tobacco leaf yield and quality over the years (Bowman and Sisson 2000; Sarala et al. 2005). A number of high yielding varieties have been released so far using conventional breeding methods (https://crosscreekseed.com; https://content.ces.ncsu.edu/flue-cured-tobacco-information; ICAR-CTRI 2021b). The release of two high yielding flue-cured tobacco cultivars, Coker 139 (in 1955) and K 326 (in 1981), provided important germplasm for yield enhancement, and both have been used widely by breeders as parental lines since their release. K 326 was the most successful tobacco cultivar of the 20th century, being cultivated in the majority of tobacco growing countries (Bowman and Sisson 2000). With the development of high yielding disease resistant lines and CMS lines, F1 hybrids were developed in tobacco. Hybrid burley cultivars have been available since the 1960s, but few flue-cured hybrids were released before the late 1990s (Bowman and Sisson 2000; Sarala et al. 2012). The key advantage of hybrids was found to be the speed and ease of obtaining multiple disease resistance through the selection of desirable parental lines having the target traits. Several Burley and FCV hybrids have been released and are under cultivation worldwide (https://crosscreekseed.com; https://content.ces.ncsu.edu/flue-cured-tobacco-information; ICAR-CTRI 2021b). One of the main reasons for the popularity of hybrids is the repeal of the 1944 Federal Tobacco Seed Law, which prohibited the sale of domestic tobacco seed abroad. Hybrid cultivars ensure the proprietary status of a new cultivar, and seed sales and distribution can be controlled (Bowman and Sisson 2000).

10.4.4.2 Disease Resistance

Much of the early tobacco breeding concentrated on incorporating resistance to the diseases that were the primary factors limiting production (Bowman and Sisson 2000). The varieties developed in recent years have been found to combine disease resistance with high yield potential. Conventional breeding and interspecific hybridization were used to transfer resistance to diseases and pests, viz. TMV, wild fire, black shank, brown spot, black root rot, blue mold, root knot nematode, powdery mildew, TSWV, PVY, cyst nematode etc. (Bowman and Sisson 2000; Sarala et al. 2012). Most commercial varieties are highly susceptible to blue mold disease (Rufty and Main 1989), and the functional/partial resistance reported in N. debneyi, N. velutina, N. goodspeedii and N. exigua (Clayton 1968; Gillham et al. 1977; Wark 1970) has been transferred through interspecific hybridization (Milla et al. 2005). The N gene conferring TMV resistance was successfully transferred from N. glutinosa to N. tabacum by the bridge cross technique (Clausen and Goodspeed 1925; Holmes 1938). The majority of present flue-cured tobacco cultivars possess resistance to important diseases and pests such as TMV, black shank, Granville wilt and root knot nematodes. However, resistance to black shank and Granville wilt is variable among cultivars (Bowman and Sisson 2000). The five CMV genes from wild Nicotiana species were pyramided into the tobacco line Holmes, and resistance derived from Holmes for multiple virus resistance has been incorporated into many cultivars (Holmes 1960, 1961; Wan et al. 1983).

Once resistance to a single disease had been achieved, pyramiding resistance to several diseases into a single cultivar became a top priority (Bowman and Sisson 2000). This objective has continued into the twenty-first century. The majority of current flue-cured tobacco cultivars have resistance to Granville wilt, black shank and root knot nematodes, even though the levels of resistance are highly variable among cultivars. Many of the modern hybrids possess resistance to two or more biotic stresses (https://crosscreekseed.com; https://content.ces.ncsu.edu/flue-cured-tobacco-information; ICAR-CTRI 2021b). The popular FCV cultivar K 326 is resistant to the common root-knot nematodes and has a low level of resistance to black shank and bacterial wilt (https://crosscreekseed.com/usa-varieties/usa-flue-cured-tobacco-seed/k-326/). The popular burley tobacco cultivar Banket A1 is resistant to brown spot, wildfire and TMV (Lapham 1976).

10.4.4.3 Insect Resistance

Unlike disease resistance, incorporation of insect resistance has not met with much success in tobacco. A tobacco variety, CU 263, with a moderate level of resistance to tobacco budworms was released in 1995. Tobacco varieties with resistance to tobacco aphids (Myzus nicotianae) and tobacco caterpillar have been developed for commercial cultivation in India (Joshi and Sitaramaiah 1975; Joshi et al. 1978; Murthy et al. 2014). The FCV variety CTRI Sulakshana is tolerant to aphids and resistant to TMV. Meenakshi (CR) and Abirami (CR) are caterpillar-resistant chewing tobacco varieties developed through back cross breeding (ICAR-CTRI 2021b).

10.4.5 Limitations of Traditional Breeding and Rationale for Molecular Breeding

The major limitations of traditional breeding in tobacco are undesirable gene associations, pleiotropic gene effects and linkage drag caused by the presence of deleterious genes linked to the gene of interest (Legg et al. 1981; Zeven et al. 1983; Friebe et al. 1996; Brown 2002). Such effects were reported by Chaplin et al. (1966) and Chaplin and Mann (1978), and suppressed recombination within introgressed chromatin (Paterson et al. 1990; Liharska et al. 1996) makes it difficult to alleviate linkage drag through back crossing (Stam and Zeven 1981; Young and Tanksley 1989). Developing resistant cultivars also requires sources of resistance that are easily crossable or that can be transferred through special techniques such as bridge crossing, embryo rescue and chemical treatments.

With the advent of advanced molecular breeding techniques such as marker-assisted selection, plant transformation and CRISPR/Cas9 gene editing, only the gene of interest can be modified or incorporated into the selected cultivar, in a short span of time and without these limitations. Molecular marker-assisted breeding for foreground and background selection, combined with speed breeding strategies, will help in saving time.
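Foreground and background selection can be illustrated with a small Python sketch. It assumes a hypothetical marker coding in which 'A' is homozygous for the recurrent (adapted) parent allele, 'H' is heterozygous and 'B' is homozygous for the donor allele; the plant names and genotype calls in the usage note are invented for illustration and do not come from any tobacco data set.

```python
def recurrent_genome_percent(calls):
    """Percent recurrent-parent alleles over scored background markers.

    Calls other than 'A', 'H' or 'B' are treated as missing and skipped.
    """
    scored = [c for c in calls if c in ("A", "H", "B")]
    if not scored:
        raise ValueError("no scored background markers")
    # 'A' carries two recurrent alleles, 'H' one, 'B' none.
    recurrent_alleles = sum(2 if c == "A" else 1 if c == "H" else 0
                            for c in scored)
    return 100.0 * recurrent_alleles / (2 * len(scored))


def select_plants(plants, min_percent):
    """Foreground step, then background step.

    plants maps a plant name to (foreground_call, background_calls).
    A plant passes the foreground step if it carries the donor allele
    at the target-linked marker ('H' or 'B'); survivors are ranked by
    recurrent-parent genome recovery, highest first.
    """
    kept = []
    for name, (foreground, background) in plants.items():
        if foreground not in ("H", "B"):
            continue  # target allele absent
        percent = recurrent_genome_percent(background)
        if percent >= min_percent:
            kept.append((name, percent))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, with `plants = {"P1": ("H", ["A", "A", "A", "H"]), "P2": ("A", ["A", "A", "A", "A"]), "P3": ("H", ["A", "H", "H", "B"])}`, calling `select_plants(plants, 80.0)` keeps only P1: P2 lacks the donor allele at the target marker, and P3 recovers too little of the recurrent background.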

10.5 Brief on Diversity Analysis

The major interest to study the genetic diversity in tobacco is for the conservation of genetic resources, broadening of the genetic base and cultivar development in breeding programs. Narrow genetic diversity can lead to crop losses due to reduced flexibility of varieties to combat infestations of new strains of pests or pathogens or to adapt populations to changing environmental conditions such as increasing temperatures or salinity (Moon et al. 2009a). The study of the genetic diversity is important for identifying sources for economically important traits and diverse parental combinations to create segregating progenies with maximum genetic variability for further selection in breeding (Barrett and Kidwell 1998). Utilization of diverse parents helps in developing tobacco varieties having diverse genetic backgrounds with improvement in yield and quality along with stress resistance traits. Deploying varieties with diverse sources of resistance will reduce genetic vulnerability of cultivars to evolving races and biotype of insects and pathogens. Thus, genetic diversity is essential for continued progress in breeding varieties for adaptation to future environmental challenges.

10.5.1 Phenotype-Based Diversity Analysis

Phenotypic characters, viz. morphological, karyotypical and physiological characters, have been used to study the genetic diversity of tobacco germplasm (Goodspeed 1954; Zhang 1994; Lu 1997; Zhang et al. 2005). To date, several agro-morphological traits (Zhang 1994; Wenping et al. 2009; Zeba and Isbat 2011; Sarala et al. 2018; Baghyalakshmi et al. 2018) and chemical and cytological traits (Tso et al. 1983; Okumus and Gulumser 2001; El-Morsy et al. 2009; Darvishzadeh et al. 2011) have been used to study the genetic variation of tobacco germplasm. Agro-morphological traits usually vary with environment, which affects diversity estimates obtained under different environments (Lu 1997). Germplasm is screened under artificial conditions to identify sources of resistance to various biotic stresses, with phenotyping based on the extent and type of damage caused to the crop.

The tolerance/resistance mechanism to various stresses is now being extensively studied through newer techniques of phenotyping namely high throughput phenotyping where the system quantifies a number of traits within a described set of plant population with automated image collection and analysis, thus effectively streamlining the plant phenomics. This technology is gaining importance due to its non-destructive sampling methods, rapid screening of larger population and automated data analysis. To measure tolerance, visual cameras are used to capture the plant growth, architecture, chlorosis and necrosis all of which can be negatively affected by insect infestation. Photosynthetic activity of crop plants can be measured in terms of chlorophyll fluorescence through fluorescence cameras as an indication of tolerance mechanisms occurring in response to insect attack (Buschmann and Lichtenthaler 1998). The technology can have numerous applications in the measure of insect damage and plant resistance to insects (Goggin et al. 2015). Ultimately the use of high throughput phenotyping systems could possibly reduce the amount of labour and screening time to identify plants tolerant to insects.

10.5.2 Genotype-Based Diversity Analysis

Before the advent of DNA markers, morphological, karyotypical, physiological and isozyme markers were mainly used in diversity analysis of tobacco. However, morphological characters usually differ with environment. Karyotypical characters are limited in number, and the study of genotypic diversity based on isozyme variation is restricted to a small number of loci that control a few polymorphic enzyme systems (Lu 1997). Until now, only limited information has been available on the relationship between morphological variability and genome diversity in cultivated tobacco. Later, attempts were made to examine the degree of relatedness among tobacco cultivars and the diversity of germplasm based on variability at the DNA level. About 77% of the total genomic DNA content of tobacco is composed of repetitive sequences and, therefore, the remaining non-repetitive sequences are responsible for morphological and quality trait variability (Narayan 1987).

Molecular markers offer advantages such as a highly polymorphic nature, codominant inheritance, easy access, fast assay, high reproducibility and easy exchange of data between laboratories. The different types of molecular markers introduced over the last two decades have revolutionized the biological sciences, including tobacco research (Liu and Zhang 2008). Molecular markers provide a relatively unbiased estimation of genetic diversity in plants and are abundantly available throughout the genome. DNA-based molecular markers have been found to be versatile tools with applications in various fields such as characterization of genetic variability, genome fingerprinting, genome mapping, gene localization, analysis of genome evolution, population genetics, taxonomy, genome comparisons, gene mapping, quantitative trait loci analysis, marker-assisted breeding and diagnostics.

Restriction fragment length polymorphism (RFLP), randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), microsatellites or simple sequence repeats (SSR), single-nucleotide polymorphisms (SNP), inter simple sequence repeats (ISSR), etc. have been employed in studying genetic diversity, gene mapping and marker-assisted breeding of tobacco.

RFLPs were the most popular molecular markers in the late eighties due to their reproducibility, locus-specificity, Mendelian inheritance and codominant nature (Nadeem et al. 2018). RFLP methodology is simple and requires no special equipment. Another advantage of RFLP is that the sequence used as a probe need not be known. RFLPs, being codominant markers, can detect the coupling phase of DNA molecules, as DNA fragments from all homologous chromosomes are detected. RFLPs are highly reliable markers in linkage analysis and breeding because they can easily differentiate whether the linked trait is present in a homozygous or heterozygous state in an individual, and such information is highly desirable for recessive traits (Winter and Kahl 1995). However, the large amount of DNA required for restriction digestion and the laborious Southern blotting technique hampered the utility of the assay. Further, it is time-consuming and labor-intensive, and only a few out of numerous markers may be polymorphic, making it highly inconvenient, especially for crosses between closely related species. RFLP markers were the first molecular markers used in tobacco research, especially to study the function of a few cloned genes (Bretting and Widrlechner 1995).

With the invention of polymerase chain reaction (PCR) technology, PCR-based markers such as RAPD and AFLP emerged in the beginning of the nineties, and later microsatellite markers were used by different workers to study genetic diversity in tobacco. Compared to RFLP, these PCR-based markers are preferred because of the relative ease with which PCR assays can be carried out. RAPD technology utilizes short, synthetic, arbitrary oligonucleotide primers to amplify fragments. Its methodology is simple and time-saving, and requires only a small amount of genomic DNA. Due to these advantages, RAPD technology has been widely used in studying genetic diversity, localization of target genes, genetic mapping and evolutionary genetics. However, RAPD markers are anonymous and their reproducibility is very low due to the non-specific binding of short, random primers. AFLP technology combines the power of RFLP with the flexibility of PCR-based markers. It offers a universal, multi-locus marker tool that can be applied to complex genomes from any organism. Though AFLP markers are advantageous in terms of their reproducibility and sensitivity, their detection method is lengthy, laborious and not amenable to automation. Both RAPD (Xu et al. 1998; Del Piano et al. 2000; Evanno et al. 2005; Zhang et al. 2005, 2008a; Arslan and Okumus 2006; Sarala and Rao 2008; Sivaraju et al. 2008; Denduangboripant et al. 2010; D’hoop et al. 2010) and AFLP (Huang et al. 2008; Zhang et al. 2008a; Chuanyin et al. 2009; Liu et al. 2009a) were used for genetic diversity analysis and varietal identification in tobacco. In tobacco, the RAPD technique has also been used to identify markers linked to genes for resistance to pathogens (Bai et al. 1995; Yi et al. 1998a, b; Johnson et al. 2002; Collard et al. 2005; Julio et al. 2006; Gong et al. 2020). Based on AFLP markers, five TMV-resistant tobacco accessions were found to carry a TMV resistance gene (N gene derived from N. glutinosa L.) on chromosome H and seven accessions were found to carry a resistance factor on an alternative chromosome (Lewis et al. 2005).
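As an illustration of how dominant marker data of the RAPD or AFLP type can be turned into a diversity estimate, the following minimal Python sketch scores each accession as a presence (1) or absence (0) profile across bands and computes a pairwise Jaccard similarity. The choice of the Jaccard coefficient is an assumption made here for illustration; the source does not state which coefficient the cited studies used, and the function name and band profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two band presence/absence profiles.

    Each profile is a sequence of 0/1 scores, one per marker band.
    Bands absent in both accessions are ignored, which is why this
    coefficient is often preferred for dominant markers.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be scored for the same bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    # Two profiles with no bands at all carry no information; return 0.0.
    return shared / either if either else 0.0


# Hypothetical band scores for two accessions across five bands.
accession_1 = [1, 1, 0, 1, 0]
accession_2 = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(accession_1, accession_2)
distance = 1 - similarity  # usable as input to a clustering method
```

A matrix of such pairwise distances is the usual starting point for cluster analysis of cultivars.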

Soon after the discovery of SSR markers in the late 1990s and the beginning of the twenty-first century, they became the markers of choice as they were able to overcome the drawbacks of earlier DNA marker technologies (Jafar et al. 2012). SSRs are highly reproducible, polymorphic, and amenable to automation. In spite of higher working costs, SSR markers are extensively employed in plant molecular genetics and breeding. Development of PCR-based molecular markers is, in general, inefficient in tobacco in view of the very limited genetic diversity between tobacco cultivars, particularly between those of the same type (Del Piano et al. 2000; Julio et al. 2006; Rossi et al. 2001), and because cultivated tobacco is a tetraploid species with a very large genome (Livingstone et al. 1999; Ren and Timko 2001; Doganlar et al. 2002). However, Bindler et al. (2007) for the first time reported the use of microsatellite markers in the identification of tobacco varieties. They used around 637 functional SSR markers, of which 282 were highly polymorphic and were used for variety identification. Since then, SSR markers have been used in estimating diversity in tobacco. Later, an additional set of 5119 new and functional SSR markers was developed for mapping and diversity studies (Bindler et al. 2011). Tong et al. (2012b) published a set of 1365 genomic SSRs and 3521 expressed sequence tag (EST)-SSRs, which only slightly overlapped with the set published by Bindler et al. (2007, 2011). Madhav et al. (2015) developed a new set of microsatellite markers using 70 motifs (including perfect and imperfect repeats) and validated their applicability in differentiating types of tobacco and diverse cultivars of Flue Cured Virginia (FCV) tobacco, as well as their transferability across a wide range of Nicotiana species. Cai et al. (2015) used EST databases of tobacco to develop EST-SSR markers and validated them in studying the genetic differentiation between N. rustica and N. tabacum, and between oriental tobacco and other accessions of N. tabacum. Later, Wang et al. (2018) identified a huge number of SSRs through comparative genome-wide characterization of ~20 Gb of sequence from seven genomes, viz. those of N. benthamiana, N. sylvestris, N. tomentosiformis and N. otophora, and of the N. tabacum cultivars TN90, K326 and BX, representing ~74%, ~80%, ~78%, ~81%, ~84%, and ~73% of each of the genomes, respectively. Their study resulted in the development of a total of 1,224,048 non-redundant NIX (Nicotiana multiple (X) genome) markers (SSRs), of which 99.98% are novel (Wang et al. 2018). SSR marker development in tobacco has enabled the analysis of molecular diversity of genetic resources (Moon et al. 2009b; Davalieva et al. 2010; Fricano et al. 2012; Gholizadeh et al. 2012; Prabhakararao et al. 2012; Xiang et al. 2017), DUS testing (Binbin et al. 2020), assessment of the genetic relatedness of cultivated varieties (Moon et al. 2008), estimation of changes in diversity due to breeding interventions (Moon et al. 2009a) and the identification of markers and QTLs linked to various biotic stresses, which are discussed later in this chapter.
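To make the notion of a perfect SSR motif concrete, the sketch below scans a DNA sequence for tandem repeats of short units using a regular expression with a backreference. It is a minimal illustration only, not the pipeline used in any of the cited studies; the function name, the unit-length range (2 to 6 bp), the minimum repeat count and the example sequence are all assumptions chosen for demonstration.

```python
import re


def find_perfect_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    A perfect SSR is a unit of min_unit..max_unit bases repeated in
    tandem at least min_repeats times without interruption.
    """
    sequence = sequence.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # Group 1 captures the unit; \1{n,} demands n further tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "ATAT"), since the shorter unit already reports them.
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical sequence containing an (AT)6 and a (GAA)5 repeat.
example = "GGC" + "AT" * 6 + "CCGTC" + "GAA" * 5 + "TTC"
ssrs = find_perfect_ssrs(example)
```

Imperfect repeats, such as those included by Madhav et al. (2015), would need a more tolerant matching rule than this exact-repeat pattern.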

Genetic diversity in tobacco varieties has also been assessed using inter simple sequence repeat (ISSR) markers (Yang et al. 2005, 2007; Qi et al. 2006) and inter-retrotransposon amplification polymorphism (IRAP) markers (Yang et al. 2007).

In recent years, SNP markers, first discovered in the human genome, have proved to be universal, abundant, ubiquitous, and amenable to high- and ultra-high-throughput automation, and are increasingly being identified and used (Jafar et al. 2012). Although SNPs, being biallelic, are less polymorphic than SSR markers, their abundance compensates for this drawback (Ghosh et al. 2002). The application of SNPs in tobacco is complicated and challenging due to its tetraploid nature and complex genetic architecture (Ganal et al. 2009). However, recent studies have identified a number of SNPs in tobacco (Xiao et al. 2015; Thimmegowda et al. 2018; Tong et al. 2020). These are being utilized to identify markers linked to important traits, including resistance to biotic stresses, as well as to develop molecular maps and to study genome structure and organization.

Wang et al. (2021) used genotyping-by-sequencing (GBS) technique on 113 cigar tobacco accessions for the identification of 47 core Kompetitive allele specific PCR (KASP) and 24 candidate core markers utilizing SNP data. KASP markers are able to discriminate between two alleles of a SNP using a common reverse primer paired with two forward primers, one specific to each allele. Core markers were used for varietal identification and fingerprinting in 216 cigar germplasm accessions.

10.5.3 Relationship with Other Cultivated Species and Wild Relatives

Cultivated tobacco belongs to the genus Nicotiana, one of the five major genera of the family Solanaceae. Comparative morphological, cytological and biochemical studies, examinations of organellar (plastid and mitochondrial) genome organization, and analysis of molecular features such as repetitive DNA sequences and the structure of various nuclear gene family members have been employed to study evolution and genetic diversity in the genus Nicotiana (Kostoff 1943; Goodspeed 1954; Komarnitsky et al. 1998; Lim et al. 2000; Liu and Zhang 2008).

The genus Nicotiana resembles two genera, Cestrum (8 pairs of chromosomes) and Petunia (7 pairs), in habit and habitat (Darlington and Janaki Ammal 1945). The origin, evolution and relationships among the various species of the genus have been summarized in three phylogenetic arcs. In the first two arcs, the genus is envisaged as derived from a pre-generic reservoir of two related genera and evolving into three complexes, at the 12-paired level, that are hypothetical precursors of the three modern subgenera. The third arc contains the present-day species, at the 12- and 24-paired chromosome levels. Although no 6-paired species of Nicotiana is known, the predominance of 12-paired species and their compound morphological character, along with a frequency of 4–8 pairs (with a mode of 6 pairs) in a large number of F1 hybrids combining 12-paired species, indicate that 6 is the basic chromosome number for Nicotiana and that both 12 and 24 are derived numbers. Additional evidence for this has been provided by secondary association and pairing in haploids of 12-paired species (Kostoff 1943; Goodspeed 1954). It is assumed that, having a higher survival value, the allopolyploids have eliminated the older 6-paired types. Hybridization between the ancestral 6-paired and the present-day 12-paired members was considered to be responsible for the evolution of 9- and 10-paired species. N. tabacum, N. rustica and the other 24-paired species are modern descendants of 12-paired progenitors and are of amphiploid origin. The aneuploid species of the section Suaveolentes, with chromosome numbers between 16 and 24 pairs, are presumed to have originated as products of hybridization and segregation. Thus, among various evolutionary mechanisms, amphiploidy superimposed by amphiploidy seems to be the basic evolutionary process responsible for the 6-, 12-, 24-paired sequence in the genus (Goodspeed 1954). Further enlargement of the genus took place through aneuploidy resulting from hybridization, genetic recombination and mutation.

The systematic classification of the genus was presented in detail by Goodspeed (1954) and Goodspeed and Thompson (1959) mainly based on cytogenetic studies involving chromosome morphology, behaviour in interspecific hybrids, aneuploids and amphiploids. Subsequently, additions to the classification were made by Burbridge (1960). In the revised systematics based on molecular research (Chase et al. 2003; Clarkson et al. 2004; Knapp et al. 2004), subgenera were dropped retaining the division into sections. As per the recent classification, N. trigonophylla Dun. was renamed N. obtusifolia Martens et Galeotti, N. affinis Hort is synonymous with N. alata Link et Otto, and N. bigelovii (Torrey) Watson with N. quadrivalvis Pursh. N. sanderae Hort. is considered to be a hybrid between N. alata and N. forgetiana Hemsl. (Nicotiana x sanderae) whereas N. eastii Kostoff is a variant of autotetraploid version of N. suaveolens Lehm. (Chase et al. 2003; Knapp et al. 2004).

N. tabacum and N. rustica are the cultivated species of the genus Nicotiana, alongside 83 wild species (Lewis 2011; Berbec and Doroszewska 2020). N. tabacum is highly polymorphic, with a wide range of morphological types and diversified uses, viz. smoking, snuff, chewing, etc. Other species are cultivated on a smaller scale: N. repanda Willd ex Lehm., N. attenuata Torrey ex S. Watson and N. quadrivalvis Pursh for smoking; N. sylvestris Spegazzini & Comes, N. alata Link and Otto, N. langsdorffii Weinmann, N. forgetiana Hemsley, and N. sanderae (Hort) as ornamentals; and N. glauca Graham for industrial purposes (Lester and Hawkes 2001).

It is known that N. tabacum is a natural amphidiploid, i.e., an allopolyploid (2n = 4x = 48), that arose by hybridization of the wild progenitor species N. sylvestris (S-genome) × N. tomentosiformis (T-genome), and that N. rustica L. arose from a cross between N. paniculata/N. knightiana (P/K-genome) × N. undulata Ruiz & Pav. (U-genome) (Goodspeed 1954; Clarkson et al. 2005; Lim et al. 2005; Leitch et al. 2008; Edwards et al. 2017; Sierro et al. 2018). Comparison of whole-genome sequences indicated that the genomes of N. sylvestris and N. tomentosiformis contributed 53 and 47%, respectively, to the genome of N. tabacum (Sierro et al. 2014). Comparative mapping of the diploid ancestor of the T-genome (N. tomentosiformis) and a species related to the S-genome (N. acuminata) using conserved ortholog sequences (COS) and SSR markers revealed that the tetraploid tobacco genome has undergone a number of chromosomal rearrangements compared to these diploid genomes (Wu et al. 2009). Furthermore, in the same study, it was observed that a number of reciprocal translocations and inversions (>10) differentiate the ancestral tobacco genomes from the tomato genome. Gong et al. (2016) identified a large number of genome rearrangements occurring after the polyploidization event through mapping of the ancestral genomes of N. tomentosiformis and N. sylvestris with SNP markers. Based on the study of cpSSRs and MtSSRs, the S genome in N. tabacum was identified as originating from an N. sylvestris ancestor (Murad et al. 2002). The chloroplast genome of N. otophora revealed that N. otophora is a sister species to N. tomentosiformis within the genus Nicotiana, and that Atropa belladonna and Datura stramonium are the closest relatives (Asaf et al. 2016).

Sierro et al. (2018) reported that 41% of the N. rustica genome originated from the paternal donor (N. undulata) and 59% from the maternal donor (N. paniculata/N. knightiana). Comparison of families of repetitive sequences, two from N. paniculata and one from N. undulata, indicated that the P- and U-genomes of N. rustica were similar to those of the putative parents, N. paniculata and N. undulata, respectively (Lim et al. 2005). Genomic in situ hybridization confirmed that N. rustica originated as an allotetraploid from N. paniculata (maternal P-genome donor) and N. undulata (paternal U-genome donor), and that interlocus sequence homogenization has caused the replacement of many N. paniculata-type intergenic spacers (IGS) of the 18-5.8-26S rDNA in N. rustica with an N. undulata-type sequence (Matyasek et al. 2003). However, the study of nuclear and chloroplast genomes, together with gene analyses, showed that N. knightiana is closer to N. rustica than N. paniculata is. Gene clustering revealed 14,623 ortholog groups common to other Nicotiana species and 207 unique to N. rustica (Sierro et al. 2018).

Both N. tabacum and N. rustica share the basic chromosome number of n = 12 with many other solanaceous species such as tomato, potato, pepper and eggplant (Lim et al. 2004; Clarkson et al. 2005). Microsynteny was observed at the protein level between the genomes of N. tabacum cv. TN90, K326 and BX and those of tomato and potato (Sierro et al. 2014).

10.5.4 Relationship with Geographical Distribution

The genus Nicotiana, considered to be of recent origin, is presumed to have had its original habitat in and around the Andes region in South America and Central America, probably from the mild to low altitude forest margin (Goodspeed 1954). Although it occurs naturally as a perennial plant, tobacco is farmed as an annual crop. In the genus Nicotiana, 20 species are native to Australia, one to Africa, and 54 are indigenous to North/South America (Goodspeed 1954; Burbridge 1960; Clarkson et al. 2004). N. benthamiana, a species native to Australia, is used extensively as a model system to study various biological processes. N. tabacum L. (common tobacco) and N. rustica L., two species native to America, are the major species cultivated throughout the world.

Some 40% of Nicotiana species are allopolyploids, which were generated independently in six polyploidy events several million years ago (Clarkson et al. 2004; Leitch et al. 2008). Taxonomically, some of the diploid genome donors that make up the various allopolyploids are closely related, while others belong to distantly related taxonomic sections (Clarkson et al. 2004; Leitch et al. 2008).

Darvishzadeh et al. (2011) reported that clustering of oriental-type tobacco genotypes based on morphological traits was in agreement with their geographical distribution and growth characteristics. However, genetic diversity studies conducted using SSR markers in oriental tobaccos (Darvishzadeh et al. 2011), and RAPD and AFLP markers in flue-cured tobaccos (Zhang et al. 2008a), did not indicate any clear pattern related to geographical origin. Instead, the clustering of tobacco genotypes based on molecular diversity was found to correspond to commercial classes (Flue-Cured, Burley, etc.), manufacturing trait and parentage (Sivaraju et al. 2008; Fricano et al. 2012).

10.5.5 Extent of Genetic Diversity

The existence of morphological diversity has been confirmed in various germplasm collections, viz. FCV, burley, mutant lines, bidi, chewing, cheroot and cigar filler germplasm (Baghyalakshmi et al. 2018; Sarala et al. 2018), maintained at the tobacco genebank in India. Moon et al. (2009b) observed quite large average gene diversity among N. tabacum accessions from the U.S. Nicotiana Germplasm Collection compared with FCV tobacco accessions. Fricano et al. (2012) reported lower SSR diversity per locus than in similar investigations carried out on TI accessions of tobacco. Genetic diversity assessment in tobacco cultivars indicated a low degree of polymorphism (Xu et al. 1998; Del Piano et al. 2000; Rossi et al. 2001; Yang et al. 2005; Zhang et al. 2005; Julio et al. 2006). The richest genetic diversity was reported for the local group of tobacco varieties and lower diversity for introductions, with higher genetic similarity values between the introduction and breeding groups (Xiang et al. 2017). In contrast, variation among tobacco lines was found to be higher for the chemical composition of the plants (Tso et al. 1983; Darvishzadeh et al. 2011) as well as for susceptibility to diseases such as stem rot and powdery mildew (Darvishzadeh et al. 2010). The relatively low levels of diversity revealed by molecular markers in tobacco cultivars (Ren and Timko 2001) may be due to the utilization of only a small proportion of the variability of the gene pools of the progenitor species in breeding programs (Lewis et al. 2007). The level of polymorphism among the varieties of N. tabacum was observed to be higher than in N. rustica (Sivaraju et al. 2008). However, wild tobacco accessions were found to have higher genetic diversity (Chuanyin et al. 2009).

10.6 Association Mapping Studies

Association analysis, also known as linkage disequilibrium (LD) mapping or association mapping, uses a natural population as the target material. The analysis is based on linkage disequilibrium and detects significant associations between the genetic variation of markers or candidate genes and the target traits in a population (Bradbury et al. 2007; Pritchard et al. 2000). LD is defined as the non-random association of alleles at two or more loci (Fricano et al. 2012). Compared with QTL mapping, association analysis has two prominent advantages: (1) there is no need to construct mapping populations and genetic maps, and (2) it explores elite genes across a number of germplasm resources in a single analysis, while also providing an indication of genetic diversity. LD mapping has the potential to outperform traditional mapping and to facilitate fine mapping in a random-mating population, as only close linkage between markers and traits persists over several generations in such a population.

10.6.1 Extent of Linkage Disequilibrium

In cultivated plants, the extent of linkage disequilibrium is influenced by mating system, mutation rate, genetic drift, selection, recombination rate, gene conversion, and population size and structure (Flint-Garcia et al. 2003). High-density genome fingerprinting can uncover long- as well as short-range LD. In the first case, in species with large genomes, a lower number of molecular markers can be tested (Waugh et al. 2009), although this will result in a lower mapping resolution. Conversely, short-range LD enables the fine mapping of causal polymorphisms, if large panels of markers are available (Myles et al. 2009).

Knowledge of the extent of LD is essential to determine the minimum distance between markers required for effective coverage when conducting genome-wide association studies (GWAS). Fricano et al. (2012) identified a set of 89 genotypes that captured the whole genetic diversity detected at 49 SSR loci and evaluated LD using 422 SSR markers mapping on seven linkage groups. The pattern of intra-chromosomal LD revealed that LD in tobacco was clearly dependent on population structure and extended up to distances as great as 75 cM with r² > 0.05, or up to 1 cM with r² > 0.2.
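For readers unfamiliar with the r² statistic quoted above, the following minimal sketch computes the standard coefficient of disequilibrium D = p(AB) - p(A)p(B) and r² = D² / [p(A)(1 - p(A)) p(B)(1 - p(B))] for a pair of loci. It assumes biallelic loci and phased haplotypes (for example, from inbred lines); multiallelic SSR data, as used by Fricano et al. (2012), require an extension that is not shown here. The function name and the example data are hypothetical.

```python
def ld_d_and_r2(haplotypes, allele_1, allele_2):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples.
    allele_1, allele_2: the reference allele tracked at each locus.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for a, _ in haplotypes if a == allele_1) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_2) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_1 and b == allele_2) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when a locus is monomorphic")
    return d, d * d / denom


# Hypothetical data: complete association between the two loci.
complete = [("A", "B")] * 4 + [("a", "b")] * 4
# Hypothetical data: all four haplotypes equally frequent (no LD).
independent = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]

d_complete, r2_complete = ld_d_and_r2(complete, "A", "B")
d_indep, r2_indep = ld_d_and_r2(independent, "A", "B")
```

Plotting such pairwise r² values against the genetic distance between markers gives the LD decay pattern from which thresholds like those above are read.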

10.6.2 Target Gene Based LD Studies

Thornsberry et al. (2001) were the first to apply association analysis in plants. Currently, LD is used in association mapping to locate QTLs or major genes, based on the co-segregation of specific marker alleles and traits in tobacco (Zhu et al. 2008; Rafalski 2010). Zhang et al. (2012a, b) conducted association analysis on 13 agronomic traits in 258 flue-cured tobaccos and detected significant associations of six agronomic traits with 18 sequence-related amplified polymorphism markers. Based on association analysis, Yu et al. (2014) found that the polymorphic loci of an SSR marker and six microsatellite-anchored fragment length polymorphism markers were significantly associated with the levels of tobacco-specific nitrosamines. Ren et al. (2014) found 24 SSR loci associated with aroma substances in tobacco, and Basirnia et al. (2014) identified only one SSR locus, from linkage group 13, that was significantly associated with low chloride accumulation rate in 70 oriental-type tobaccos. Fan et al. (2015) performed a marker–trait association analysis and obtained 11 SSR markers associated with potassium content in tobacco; five of the SSR markers were used to validate the stability of the associated markers in 130 tobacco germplasms. Through association mapping, Darvishzadeh (2016) identified the linkage of five SSR loci from linkage groups 2, 10, 11 and 18 of a tobacco reference map (Bindler et al. 2007, 2011) with gene(s) controlling Orobanche resistance in tobacco. Using a mixed linear model, Tahmasbali et al. (2021) identified a total of 16 loci significantly (P < 0.05) associated with agronomic traits under normal (without Orobanche) and stress (with Orobanche) conditions, with some markers common to both conditions for a few traits. Based on association mapping using the responses of 219 accessions to BS at two sites, Sun et al. (2018) detected six significant marker-trait associations and identified two probable candidate genes for resistance to BS (Nitab 4.5_0000264g0050.1 and Nitab 4.5_0000264g0130.1) among 31 predicted genes at locus Indel53.

SNPs, due to their abundance at genome-wide level, biallelic and reproducible nature, are considered to be the most desirable, precise and efficient tools for assessing the genetic characteristics of populations or germplasm, QTL mapping and facilitating the selection of breeding materials that bear desired genes/alleles or haplotypes in both plant genetics and breeding programs (Pace et al. 2015; Mora et al. 2016). Tong et al. (2020) made simultaneous association analysis of leaf chemistry traits in natural populations with a large amount of tobacco germplasms based on genome-wide SNP markers.

10.6.3 Genome Wide LD Studies

Association mapping studies largely depend on the genetic structure of the population. Molecular diversity in germplasm collections can be utilized to reconstruct the population structure in tobacco for association studies (Moon et al. 2009b). Ganesh et al. (2014) analyzed the genetic structure of 135 FCV (Flue Cured Virginia) tobacco genotypes and observed that 25 unlinked SSR markers delineated the 135 genotypes, revealing a total of 85 alleles with an average of 3.4 alleles per locus. However, to date, genome-wide LD studies have not been reported in tobacco.
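The per-locus summary statistics commonly reported in such SSR surveys can be sketched as follows. The code counts alleles at one locus and computes gene diversity (expected heterozygosity, 1 - Σp²) and the polymorphism information content (PIC) in its standard form; it also reproduces the reported average of 85 alleles over 25 loci. The allele calls and the function name are hypothetical, and the source does not say which of these statistics Ganesh et al. (2014) computed beyond the allele counts.

```python
from collections import Counter


def locus_summary(allele_calls):
    """Return (number_of_alleles, gene_diversity, pic) for one locus.

    allele_calls: one allele label per accession (e.g. fragment size),
    assuming homozygous inbred lines so each accession carries one allele.
    """
    counts = Counter(allele_calls)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    gene_diversity = 1 - sum(p * p for p in freqs)
    # PIC subtracts the pairwise term 2 * p_i^2 * p_j^2 over all i < j.
    pairwise = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                   for i in range(len(freqs))
                   for j in range(i + 1, len(freqs)))
    return len(freqs), gene_diversity, gene_diversity - pairwise


# Hypothetical fragment sizes (bp) scored at one SSR locus in four lines.
n_alleles, he, pic = locus_summary(["150", "150", "154", "154"])

# Average alleles per locus from the totals reported in the text.
mean_alleles_per_locus = 85 / 25
```

With two equally frequent alleles the gene diversity is 0.5 and the PIC is 0.375, which shows why PIC is always the more conservative of the two measures.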

10.6.4 Future Potential for the Application of Association Studies for Germplasm Enhancement

The population-based association study exploits broader genetic variation from a wider genetic background for marker-trait correlations (i.e., many alleles are evaluated simultaneously). Such studies provide higher-resolution maps because they utilize the majority of the recombination events from the large number of meioses that have occurred throughout the history of germplasm development, and they exploit historically measured trait data for association, without the development of expensive and tedious biparental populations, in a time-saving and cost-effective way (Abdurakhmonov and Abdukarimov 2008). The linkage disequilibrium (LD)-based association study, as a high-resolution, cost-effective gene tagging approach with broader allele coverage, provides an opportunity to dissect and exploit existing natural variation in tobacco germplasm resources for tobacco improvement. Owing to the availability of large collections of tobacco germplasm worldwide, association studies can help to detect neutrally inherited markers in close proximity to the genetic causatives or genes controlling complex quantitative target traits, including resistance to biotic stresses, for further germplasm enhancement and exploitation.

10.7 Brief Account of Molecular Mapping of Resistance Genes and QTLs

10.7.1 Brief History of Mapping Efforts

Gene mapping in tobacco began with the identification of linkage between a pollen color factor and the sterility factors (Anderson and De Winton 1931; East 1932; Brieger 1935; Smith 1937). Clausen and Goodspeed (1926) developed two types of monosomics and showed that both haplo-C (then called “corrugated”) and the basic color factor, Wh, are associated with a chromosome. Later, Clausen and Cameron (1944) studied the association in transmission between the 24 monosomics they had developed and Mendelian characters, which led to the location of 18 genes on nine chromosomes. Though genes controlling various characters and their linkages with other genes have been identified, a detailed map based entirely on genes is not available in tobacco (Suen et al. 1997; Narayanan et al. 2003).

However, with the advent of DNA markers in the early 1990s, efforts were made to map the tobacco genome using molecular markers. Initially, RFLP, RAPD and AFLP markers were used to map and tag genes for resistance to biotic stresses. In 2001, RFLP and RAPD markers were used for the first time to map Nicotiana spp. (Lin et al. 2001). Later, RAPD, AFLP and ISSR markers were used in the construction of genetic maps (Lin et al. 2001; Nishi et al. 2003; Julio et al. 2006; Xiao et al. 2006). After the discovery of SSR markers in the late 1990s, the first SSR-based molecular map, showing 24 linkage groups, was developed in N. tabacum (Bindler et al. 2007). This SSR map was further improved with the identification of more SSRs (Bindler et al. 2011; Tong et al. 2012b). Recently, with the identification of SNPs, a high-density SNP-based tobacco genetic map with 24 linkage groups has been developed (Tong et al. 2020). Currently, maps are available for FCV tobacco, burley tobacco and their intra-type crosses. Maps are also available for a few Nicotiana spp.

10.7.2 Evolution of Marker Types

Molecular markers allow detection of variations or polymorphisms that exist among individuals for specific regions of DNA, and thus serve as useful tools for mapping genetic material. Molecular genetic markers such as RFLP, RAPD, AFLP, SSR and SNP markers have been used in genetic linkage mapping and QTL mapping in tobacco (Liu and Zhang 2008).

Initially, molecular genetic markers such as RFLP, RAPD and AFLP were used in mapping studies. In the early 1990s, PCR-based RAPD markers were used by different workers to map and tag genes for resistance to biotic stresses because of their relative ease of use, in spite of their reproducibility issues. Although the reproducibility and sensitivity of AFLP markers are higher, they were used only to a limited extent in mapping resistance genes, owing to their lengthy and laborious detection method and their unsuitability for automation.

After the discovery of SSR markers in the late 1990s, SSRs and EST-SSRs became the markers of choice for mapping in tobacco (Bindler et al. 2007, 2011; Tong et al. 2012b). Currently, more than 10,000 SSR markers are available in tobacco for use in QTL/gene mapping studies (Bindler et al. 2007, 2011; Tong et al. 2012b; Cai et al. 2015; Madhav et al. 2015). In addition, Wang et al. (2018) identified about 1,200,000 non-redundant and novel NIX (Nicotiana multiple (X) genome) SSR markers for use in tobacco.

In recent years, SNPs have been identified and used in mapping the tobacco genome. Xiao et al. (2015) developed SNPs using two different methods (with and without a reference genome) based on restriction site-associated DNA sequencing (RAD-seq). Thimmegowda et al. (2018) identified SNPs by whole-genome resequencing of 18 flue-cured Virginia (FCV) tobacco genotypes and positioned the SNPs in linkage groups. Using the genome of N. tabacum (K326 cultivar) as a reference, Tong et al. (2020) identified and mapped 45,081 SNPs to 24 linkage groups in the tobacco genetic map.

In addition to the above markers, sequence-specific amplification polymorphism (SSAP), sequence-related amplified polymorphism (SRAP), cleaved amplified polymorphic sequence (CAPS) and diversity arrays technology (DArT) markers have also been used in generating molecular linkage maps in tobacco.

10.7.3 Mapping Populations Used

For molecular mapping in tobacco, diverse populations, viz. F2 populations, doubled haploid (DH) lines, recombinant inbred lines (RILs), BC1 progenies, and BC1F1 and BC4F3 populations, have been used as mapping populations (Table 10.5). The majority of the maps were developed from F2 and DH populations, and three maps were developed using NGS technologies. In practice, mapping populations of 99–381 individuals have been used for higher resolution and fine mapping.

Table 10.5 Genetic linkage maps constructed in Nicotiana (in chronological order)

10.7.4 Mapping Software Used

The Mapmaker program (Lander et al. 1987; Lin et al. 2001; Wu et al. 2010), JoinMap® 3.0 (Van Ooijen and Voorrips 2001; Bindler et al. 2007, 2011), Map Manager QTXb20 (Manly et al. 2001; Bindler et al. 2011; http://www.mapmanager.org), JoinMap 4.0 (Van Ooijen 2006; Lu et al. 2012; Tong et al. 2016) and LepMap3 (Rastas 2017; Tong et al. 2020, 2021) have been used in developing molecular maps in tobacco. Among these mapping programs, JoinMap 3.0/4.0 is the most widely used in tobacco for the construction of marker-based molecular maps. LepMap3 was used in the construction of maps from NGS data.

10.7.5 Maps of Different Generations

Genetic linkage maps are essential for studies of genetics, genomic structure and genomic evolution, and for mapping essential traits. Construction of molecular genetic maps in tobacco has lagged behind that in other Solanaceae crops such as tomato, potato and pepper (Tanksley et al. 1992; Jacobs et al. 2004; Barchi et al. 2007). Until the end of the twentieth century, very little information was available on genetic mapping and molecular development in tobacco (Suen et al. 1997). Construction of genetic linkage maps in tobacco began at the start of the twenty-first century (Lin et al. 2001). The various maps constructed are briefly discussed here.

10.7.5.1 Mapping of Nicotiana Species

Lin et al. (2001) constructed a genetic linkage map based on 99 F2 plants derived from a cross between the wild tobacco species Nicotiana plumbaginifolia and N. longiflora. This map covers 1062 cM, with 60 RFLP and 59 RAPD loci distributed on nine major linkage groups. Owing to the shortage of markers, the map did not coalesce into ten linkage groups, which would correspond to the haploid chromosome number of N. plumbaginifolia. Later, Wu et al. (2010) generated two further maps for the wild diploid Nicotiana species N. tomentosiformis and N. acuminata, with 12 linkage groups spanning ~1000 cM. The N. tomentosiformis map was built from a combination of 489 SSR and CAPS markers, and the map of N. acuminata (closely related to N. sylvestris) from 308 SSR and CAPS markers (Wu et al. 2010).

10.7.5.2 Mapping of Burley Tobacco

Nishi et al. (2003) constructed a genetic linkage map of burley tobacco containing 10 linkage groups using AFLP markers, based on DH lines derived from F1 hybrids between the burley entries W6 and Michinoku 1. Cai et al. (2009) constructed the currently available high-density burley linkage map, spanning 1953.6 cM, using a doubled haploid (DH) population derived from a cross between Burley 37 (high nicotine content) and Burley 21 (low nicotine content), and assembled 112 AFLP loci and six SRAP loci into 22 linkage groups (A1-A22).

10.7.5.3 Mapping of Flue-Cured Tobacco

The first linkage map of flue-cured tobacco was constructed by Xiao et al. (2006) from a DH population derived from a cross between Speight G-28 and NC2326, using 169 ISSR/RAPD molecular markers covering 27 linkage groups. Julio et al. (2006) constructed a molecular linkage map of flue-cured tobacco with 18 linkage groups comprising 138 ISSR, AFLP and SSAP markers, based on 114 flue-cured tobacco recombinant inbred lines.

Bindler et al. (2007) constructed the first SSR-based linkage map using F2 plants derived from a cross of Hicks Broadleaf × Red Russian. This map was later improved (Bindler et al. 2011) with the identification of more SSR markers. The resulting high-density SSR map of flue-cured tobacco, constructed with 2318 SSR markers mapped on 24 linkage groups covering a total length of 3270 cM, is the most widely used map of tobacco. The length of individual linkage groups in this map varies from 86 to 199 cM, and the average genetic distance between adjacent markers is 1.4 cM. However, in spite of the high marker density, some gaps spanning ~16 cM remain in the map (Fig. 10.1; Bindler et al. 2011). Tong et al. (2012a, b) used a population of 207 DH lines derived from a cross between ‘Honghua Dajinyuan’ and ‘Hicks Broad Leaf’ to construct a genetic map of flue-cured tobacco consisting of 611 SSR loci distributed on 24 tentative linkage groups, covering a total length of 1882.1 cM with a mean distance of 3.1 cM between adjacent markers. Using 213 backcross (BC1) individuals derived from an intra-type cross between two flue-cured tobacco varieties, Y3 and K326, Tong et al. (2016) constructed a further genetic map consisting of 626 SSR loci distributed across 24 linkage groups, covering a total length of about 1120 cM with an average distance of 1.79 cM between adjacent markers.

Fig. 10.1
A diagram illustrates the map of tobacco. It includes an SSR-based linkage map with 1 to 6 linkage groups with 2318 SSR markers mapped covering a total length of 3270 centimorgan.

Part of the SSR map (linkage groups 1–6) constructed by Bindler et al. (2011) with 2318 microsatellite markers covering a total length of 3270 cM
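The average marker spacings quoted for the SSR maps can be reproduced, to the reported precision, by dividing total map length by the number of mapped loci. The short sketch below does this with the figures given in the text; it assumes that simple ratio, which may differ slightly from the interval-based calculation used by the original authors.

```python
# (map, total length in cM, number of mapped SSR loci) as quoted in the text
ssr_maps = [
    ("Bindler et al. 2011", 3270.0, 2318),
    ("Tong et al. 2012", 1882.1, 611),
    ("Tong et al. 2016", 1120.0, 626),
]


def mean_spacing(total_cm: float, n_loci: int) -> float:
    """Approximate mean distance between adjacent markers (cM per locus)."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return total_cm / n_loci


for label, length_cm, n_loci in ssr_maps:
    print(f"{label}: {mean_spacing(length_cm, n_loci):.2f} cM between adjacent markers")
```

The same ratio can be used as a quick consistency check when comparing the marker density of maps built from different populations.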

Xiao et al. (2015) constructed two SNP linkage maps of flue-cured tobacco using two different methods (with and without a reference genome) based on restriction site-associated DNA sequencing (RAD-seq) of a backcross population. Overall, 4138 and 2162 SNP markers, with total lengths of 1944.74 and 2000.9 cM, were mapped to 24 linkage groups in the maps built with and without the reference genome, respectively. An SNP-based high-density genetic map, the N. tabacum 30 k Infinium HD consensus map 2015 (Fig. 10.2), was released at the Sol Genomics Network to facilitate the fine mapping of traits of interest (https://solgenomics.net/cview/map.pl?map_version_id=178). Cheng et al. (2019) constructed a high-density SNP genetic map of flue-cured tobacco using restriction site-associated Illumina DNA sequencing. In this map, a total of 13,273 SNP markers were mapped on 24 linkage groups spanning around 3422 cM, with an average distance of 0.26 cM between adjacent markers. Tong et al. (2020) performed whole-genome sequencing of an intraspecific RIL population, an F1 generation and their parents, and identified SNPs. Using the N. tabacum (K326 cultivar) genome as reference, a total of 45,081 SNP markers (with 7038 bin markers) were used to construct a high-density SNP genetic map of flue-cured tobacco spanning a genetic distance of around 3487 cM (Fig. 10.3). Tong et al. (2021) constructed another high-density genetic map containing 24,142 SNP markers using a BC4F3 population derived from the inbred flue-cured tobacco lines Y3 (recurrent parent) and K326 (donor parent). This map included 4895 bin markers, with a genetic distance of 2885.36 cM and an average genetic distance of 0.59 cM. Based on the genotypes of these located markers, a bin map was also constructed using the chromosome as a unit.

Fig. 10.2
A diagram illustrates the SNP-based, high-density Infinium HD consensus map of Nicotiana tabacum, which supports fine mapping of different traits of interest.

N. tabacum 30 k Infinium HD consensus map 2015 (https://solgenomics.net/cview/map.pl?map_version_id=178)

Fig. 10.3
A diagram illustrates the linkage map designed using the N. tabacum genome as a reference, having a total of 45,081 identified SNP markers covering a total length of 3486.78 centimorgan.

Linkage map constructed based on the tobacco reference genome by Tong et al. (2020). These linkage maps were constructed with 45,081 SNP markers (with 7038 bin markers) covering a total length of 3486.78 cM
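The "bin markers" reported for the sequencing-based maps are groups of co-segregating SNPs that behave as a single locus in the mapping population. The sketch below shows one simple way to form such bins, by merging consecutive markers whose genotype calls are identical across all individuals. It is a simplified illustration with hypothetical marker names and genotype strings, not the procedure used by Tong et al. (2020, 2021); real pipelines also have to tolerate missing calls and genotyping errors.

```python
from itertools import groupby
from typing import List, Tuple


def bin_markers(ordered_markers: List[Tuple[str, str]]) -> List[List[str]]:
    """Merge consecutive markers with identical genotype strings into bins.

    ordered_markers: (marker_name, genotype_string) pairs in map order, where
    each character of genotype_string is the call for one individual.
    """
    bins = []
    for _, group in groupby(ordered_markers, key=lambda marker: marker[1]):
        bins.append([name for name, _ in group])
    return bins


# Hypothetical calls for five individuals ("A" = parent 1, "B" = parent 2)
toy_markers = [
    ("snp1", "AABBA"),
    ("snp2", "AABBA"),  # co-segregates with snp1, so it joins the same bin
    ("snp3", "AABBB"),  # one recombinant individual starts a new bin
    ("snp4", "AABBB"),
    ("snp5", "ABBBB"),
]

bins = bin_markers(toy_markers)
print(len(toy_markers), "SNPs collapse into", len(bins), "bins")
```

Because only recombination breakpoints separate bins, the number of bins is limited by population size and recombination, which is why tens of thousands of SNPs reduce to a few thousand bin markers.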

Lu et al. (2012) developed a high-density integrated linkage map (2291 cM) of flue-cured tobacco that included 851 markers [238 DArT and 613 SSR] in 24 linkage groups. Gong et al. (2016) generated a high-density 2662.43 cM length integrated genetic map of flue-cured tobacco containing 4215 SNPs and 194 SSRs distributed on 24 linkage groups (LGs) with an average distance of 0.60 cM between adjacent markers.

10.7.5.4 Intra Type Genetic Maps

Ma et al. (2008) constructed an intra-type genetic map containing 26 linkage groups and 112 markers from flue-cured and burley tobaccos, based on SRAP and ISSR markers.

Currently, the high-density maps in tobacco are constructed with SSR (Bindler et al. 2011) and SNP (Gong et al. 2016; Tong et al. 2020) markers. The widely referred SSR map of Bindler et al. (2011) was constructed with 2318 microsatellite markers covering a total length of 3270 cM, while the SNP map of Tong et al. (2020) covers 3486.78 cM with 45,081 SNPs. Combining SNPs with genetic maps, where such resources are developed, helps in designing precise breeding strategies and genomic selection in tobacco. The diverse genetic maps constructed can be effectively utilized in QTL mapping, positional cloning, comparative genomic analysis, marker-assisted breeding, genomic selection, etc. Genetic linkage maps will need to be built for the different cultivated types of tobacco so that they can be used effectively in breeding those types.

10.7.6 Enumeration of Mapping of Simply-Inherited Stress Related Traits

The availability of draft Nicotiana genome sequences (Sierro et al. 2014; Edwards et al. 2017) and high-density molecular maps in recent times is laying the foundation for trait discovery and fine mapping of traits of interest in tobacco (Yang et al. 2019). Molecular markers linked to resistance to various biotic stresses, namely black root rot, black shank, wildfire, blue mold, brown spot, powdery mildew, TMV, PVY, TSWV, root-knot, etc., have been identified in tobacco for use in breeding programs and are enumerated in Table 10.6. The majority of the markers are dominant.

Table 10.6 Biotic stress resistance traits and linked markers in tobacco

10.7.7 Framework Maps and Markers for Mapping Resistance QTLs

Framework maps are being constructed mostly using SSR and SNP markers that have been identified and mapped to linkage groups in tobacco (Tong et al. 2020; Lu et al. 2012). The high-density SSR map of Bindler et al. (2011), constructed with 2318 microsatellite markers covering a total length of 3270 cM, and the SNP map of Tong et al. (2020), covering 3486.78 cM with 45,081 SNPs, can be used in constructing framework maps when mapping various traits (Edwards et al. 2017). The SNP-based high-density genetic map, N. tabacum 30 k Infinium HD consensus map 2015, can be one of the finest resources for fine mapping any trait of interest (https://solgenomics.net/cview/map.pl?map_version_id=178).

RAPD and SSR markers are the most widely used markers in QTL mapping of biotic stress resistance traits in tobacco, followed by SNP and AFLP markers (Table 10.7). In a few cases, RAPD markers have been converted into SCAR markers for reproducibility and reliability. SCAR markers are developed by sequencing RAPD bands and designing 18–25 base PCR primers that amplify the sequenced DNA segment more specifically and reliably. CAPS, COS, random amplified microsatellite polymorphism (RAMP), ISSR and target region amplification polymorphism (TRAP) markers are some of the other markers used. The CAPS markers used are based on primers designed from the known sequence of a gene of interest. The COS primers used are universal primers based on sequence alignments of orthologs (genes that are conserved in sequence and copy number) from multiple solanaceous species. RAMP markers include SSR primers that amplify genomic DNA in the presence or absence of RAPD primers (Liu et al. 2009b). TRAP uses two PCR primers, one designed from a target EST and the other an arbitrary primer.

Table 10.7 Details of few trait wise QTLs identified in tobacco

10.7.8 QTL Mapping Software Used

Mapmaker/Exp 3.0 is the most widely used software for QTL mapping of various biotic stress traits in tobacco, followed by various versions of JoinMap and MapChart (Table 10.7). Mapmaker/QTL, QTL IciMapping 4.1, QTL Network 2.1, R/QTL, AYMY-SS, Stat Graphics Plus 5.0 and QTL Cartographer V 2.5 are some of the other programs used.

10.7.9 Details on Trait Wise QTLs

QTLs have been identified for resistance to various biotic stresses in tobacco for use in gene introgression and genomic selection (Table 10.7). Details of the QTLs identified for resistance to bacterial wilt, black shank, brown spot and CMV are briefly discussed here.

10.7.9.1 Resistance to Bacterial Wilt Disease

Resistance to bacterial wilt in tobacco is governed either by polygenes or by a combination of polygenes and a major gene (Smith and Clayton 1948; Kelman 1953; Matsuda 1977; Hayward 1991). Furthermore, environmental influence makes screening and phenotyping difficult; hence it is essential to identify closely linked molecular markers for developing resistant varieties through marker-assisted selection. Earlier, QTLs affecting resistance to bacterial wilt were reported on chromosomes 6, 7 and 12 (Margin et al. 1999; Danesh et al. 1994). Nishi et al. (2003) detected a QTL for the bacterial wilt resistance of W6 explaining more than 30% of the variance. Qian et al. (2013) detected four QTLs, viz. qBWR-3a, qBWR-3b, qBWR-5a and qBWR-5b, on linkage groups 3 and 5 that were strongly associated with resistance and explained 9.00, 19.70, 17.30 and 17.40% of the variance in resistance, respectively. These loci were closely linked to the markers PT20275 and PT3022. Lan et al. (2014) identified eight QTLs with significant main effects on chromosomes 2, 6, 12, 17 and 24 through genome-wide QTL analysis, and also detected a major QTL (qBWR17a) on chromosome 17 that explained up to 30% of the phenotypic variation.

10.7.9.2 Resistance to Black Shank Disease

Clayton (1958) suggested that resistance to black shank disease in tobacco was simply inherited and controlled by recessive alleles. Moore and Powell (1959) reported resistance to be partially dominant and affected by modifying factors. Others have suggested that resistance in the line Florida 301 is polygenic and additive in nature (Crews et al. 1964; Chaplin 1966). Incomplete penetrance and variable expressivity have probably complicated the interpretation of data generated to investigate the inheritance of resistance in this line. Vontimitta and Lewis (2012a, b) suggested that Beinhart 1000 and Florida 301 share a major gene affecting black shank resistance, but probably differ in allelic variability at a fair number of additional loci with smaller effects. The genetic control of black shank resistance in Florida 301 is of the classic polygenic type, controlled by a combination of a few genes with large effects and a greater number of genes with small to intermediate effects. Zhang et al. (2018) identified stable QTLs (qBS7 and qBS17) for resistance to black shank disease using F2 and BC1F2 individuals and BC1F2:3 lines derived from a cross between Beinhart 1000-1 and Xiaohuangjin 1025 (Fig. 10.4). QTL qBS7 was mapped to the region between PT30174 and PT60621 and explained 17.40–25.60% of the phenotypic variance under various conditions. One major QTL, labeled Phn7.1, was found to be the largest contributor to partial P. nicotianae resistance in the highly black shank resistant cigar tobacco cultivars Beinhart 1000 and Florida 301 and the highly resistant flue-cured tobacco cultivar K346 (Vontimitta and Lewis 2012b; Xiao et al. 2013; Ma et al. 2019). A second QTL with a large effect, Phn15.1, was identified on N. tabacum linkage group 15 in Beinhart 1000 (Ma et al. 2020). Gong et al. (2020) fine mapped the QTLs using 177 F7:8-9 recombinant inbred lines generated from a cross between the resistant cultivar ‘Yunyan 85’ and the susceptible entry ‘Dabaijin 599’. A total of 10 QTLs associated with resistance to P. nicotianae were detected across multiple environments, and two major QTLs, qBS7 and qBS14, were repeatedly identified in all five environments, explaining 56 and 6.78% of the mean phenotypic variance, respectively, with high logarithm of the odds (LOD) scores.

Fig. 10.4
A line graph illustrates the LOD value against map position for the BC1F2 population and for the BC1F2:3 population in the field and in the greenhouse, for black shank disease.

Likelihood plots and position (cM) of the QTL associated with resistance to black shank in different conditions. Red line: LOD plots for the BC1F2 population; Green line: LOD plots for the BC1F2:3 population in the field; Black line: LOD plots for the BC1F2:3 population in the greenhouse; dashed lines represent the significant LOD threshold at the level of 2.35 (Zhang et al. 2018)
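To make the LOD scores and "phenotypic variance explained" figures quoted for these QTLs more concrete, the sketch below computes both for a single marker by comparing a model with one overall mean against a model with one mean per genotype class. This is plain single-marker analysis under a normal-error assumption, not the interval mapping procedures used in the cited studies; the genotype labels and disease scores are invented for illustration, and the 2.35 threshold is simply the value shown in Fig. 10.4 for Zhang et al. (2018), which would not transfer to other data sets.

```python
import math
from collections import defaultdict
from typing import Sequence, Tuple


def single_marker_lod(genotypes: Sequence[str],
                      phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Return (LOD, proportion of phenotypic variance explained) for one marker."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)  # no-QTL model

    by_class = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        by_class[g].append(y)
    rss_marker = 0.0                                           # one mean per genotype
    for values in by_class.values():
        class_mean = sum(values) / len(values)
        rss_marker += sum((y - class_mean) ** 2 for y in values)
    if rss_marker == 0 or rss_null == 0:
        raise ValueError("degenerate data: no residual or no phenotypic variation")

    lod = (n / 2) * math.log10(rss_null / rss_marker)
    pve = 1 - rss_marker / rss_null
    return lod, pve


# Hypothetical disease scores for 12 lines carrying the resistant (R) or
# susceptible (S) marker allele
genotypes = ["R"] * 6 + ["S"] * 6
scores = [2, 3, 2, 4, 3, 2, 5, 6, 4, 7, 5, 6]

lod, pve = single_marker_lod(genotypes, scores)
print(f"LOD = {lod:.2f}, variance explained = {100 * pve:.1f}%")
print("exceeds illustrative threshold of 2.35:", lod > 2.35)
```

Under this model the two quantities are linked by PVE = 1 − 10^(−2·LOD/n), so for a fixed population size a higher LOD peak corresponds to a larger share of phenotypic variance explained.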

10.7.9.3 Resistance to Brown Spot

The cigar tobacco line Beinhart 1000 and the flue-cured tobacco variety Jingyehuang are considered important sources of partial resistance in the USA and China, respectively. Beinhart 1000 is derived from a selection of the tobacco Quin Diaz and shows a high level of partial resistance to brown spot. Using SSR markers, Tong et al. (2012a) detected three QTLs for resistance to brown spot on linkage groups 2, 3 and 5 using an F2 population derived from a cross between the brown spot susceptible variety Changbohuang and the resistant source Jingyehuang (Fig. 10.5). The major QTL, mapped on linkage group 3, explained 14.3% of the phenotypic variation. Sun et al. (2018) evaluated F2, F2:3 and BC3F2:3 populations developed from a cross between the brown spot resistance source Jingyehuang and the susceptible flue-cured variety NC82 for field resistance under different environments and identified QTLs by linkage mapping. A major QTL mapped on chromosome 15 explained 8.6–18.0% of the phenotypic variation under diverse conditions. Furthermore, association mapping of 219 accessions evaluated for their response to brown spot at two sites detected six significant marker-trait associations. Of these markers, Indel53 showed the most significant association with resistance to brown spot and explained around 21% of the phenotypic variation at the two sites. An approximately 2-Mb physical interval at the Indel53 locus contained 31 predicted genes, and two of these (Nitab 4.5_0000264g0050.1 and Nitab 4.5_0000264g0130.1) were identified as probable candidate genes for resistance to brown spot.

Fig. 10.5 
A diagram illustrates linkage groups L1 to L23 with the mapped QTLs for resistance to brown spot in flue-cured tobacco.

Mapping of quantitative trait loci conferring resistance to brown spot in flue‐cured tobacco (N. tabacum). Three QTLs for brown spot resistance were mapped on LG2a, LG3b and LG5, respectively (Tong et al. 2012a)

10.7.9.4 Resistance to Cucumber Mosaic Virus

Cheng et al. (2019) identified seven QTLs, including two for incidence of disease and four for disease index, for resistance to CMV in a BC1F1 population at the seedling stage. qID5, for incidence of disease, was mapped to the interval mk6533mk646 on LG 5 and explained 7.70% of the total phenotypic variance. For disease index, qDI8, mapped on LG 8, explained 7.20% of the total phenotypic variance, indicating stable genetic effects in diverse environments.

10.7.10 Mendelization of QTL

The QTL Phn7.1 was found to have an additive effect on resistance to black shank disease and was localized to a genetic interval of approximately 3 cM (Ma et al. 2019). A second QTL, Phn15.1, on N. tabacum linkage group 15 in Beinhart 1000, was localized to a genetic interval of approximately 2.7 cM using subNILs containing varying amounts of Beinhart 1000-derived Phn15.1-associated genetic material (Ma et al. 2020). Incorporation of this allelic variability into breeding programs could increase the level, range and durability of genetic resistance to P. nicotianae in newly released tobacco cultivars. However, Phn15.1 is very closely linked to the gene NtCPS2, which is associated with Z-abienol biosynthesis (Vontimitta et al. 2010). Z-abienol is a trichome exudate that contributes to the flavor and aroma characteristics of Oriental and some cigar tobaccos but is considered undesirable in flue-cured and burley tobacco. To enable effective use of Phn15.1 in flue-cured and burley cultivar development, Ma et al. (2020) dissociated the favorable Beinhart 1000 Phn15.1 alleles from the Beinhart 1000 NtCPS2 allele.

10.8 Marker-Assisted Breeding for Resistance Traits

Marker-assisted breeding (MAB) refers to a breeding program in which detection of DNA markers and selection of desirable genotypes are integrated. The status and prospects of MAB are discussed below.

10.8.1 Germplasm Characterization and DUS

Germplasm needs to be characterized for desirable traits, and molecular markers tightly linked to those traits need to be identified. Identification of reliable linked markers is critically important for initiating a marker-assisted breeding program. DNA marker polymorphisms are available throughout the genome, and DNA markers can be detected at any stage of plant growth. The presence or absence of these markers is not affected by the environment and usually does not directly affect the phenotype. Identifying and selecting markers located in close proximity to, or within, the target gene helps ensure successful selection of that gene. Therefore, DNA markers are the predominant type of genetic marker for MAB (Xu 2010). These tightly linked markers can be utilized in MAB when screening parents, F1s and segregating materials to select plants with desirable traits. Molecular-assisted breeding of pest/disease resistant tobacco can help to identify resistant plants at an early stage, reduce the workload by avoiding pest/disease inoculation procedures, increase selection efficiency, accelerate the utilization of resistance sources, and shorten the breeding cycle (Liu and Zhang 2008).

Information on marker-trait associations in germplasm lines and mapping populations can be obtained through gene mapping, QTL analysis, association mapping, classical mutant analysis, linkage or recombination analysis, bulked segregant analysis, etc. It is also essential to know the linkage phase, i.e. cis/trans (coupling or repulsion) linkage of the marker with the desired allele of the trait. The most commonly used molecular markers in tobacco include RAPD, AFLP, SSR, SCAR, CAPS, dCAPS (derived CAPS) and KASP (Yang et al. 2019). Each type of DNA marker has advantages and disadvantages for specific purposes. Relatively speaking, SSRs have most of the desirable features, and the availability of a large number of SSRs makes them the markers of choice in tobacco. SNPs require a detailed understanding of the single-nucleotide DNA changes responsible for genetic variation among individuals. A fairly large number of SNPs have become available in tobacco, making them an important choice of marker for MAB.

Genetic mapping, QTL analysis and association mapping (AM) have accelerated the dissection of the genetic control of biotic stress resistance traits in tobacco. A large number of studies screening germplasm and sources of resistance have been carried out in tobacco to identify markers closely linked to resistance to various biotic stresses, for use in introgression into cultivated varieties. Tightly linked markers are also important tools for DUS (distinctness, uniformity and stability) characterization of varieties. The tightly linked molecular markers, genic markers and QTLs identified for some of the traits conferring resistance to various biotic stresses, particularly diseases, in tobacco are listed in Tables 10.6 and 10.7. This information has the potential to make marker-assisted selection (MAS) a successful option for tobacco improvement.

10.8.2 Marker-Assisted Gene Introgression

Marker-assisted backcrossing (MABC) is the simplest form of marker-assisted gene introgression and is the one most widely and successfully used for transferring biotic stress resistance genes into elite cultivars. MABC aims to transfer one or a few genes/QTLs for resistance from one genetic source (donor parent) into a superior cultivar or elite breeding line (recurrent parent) to improve its stress resistance. In contrast to traditional backcrossing, MABC relies on molecular markers linked to the gene(s)/QTL(s) of interest rather than on the phenotypic performance of the target trait. An MABC program that combines two types of selection, viz. foreground selection for the donor-parent marker allele(s) at the target locus (e.g. resistance) and background selection for the recurrent-parent marker alleles in all genomic regions carrying desirable (agronomic) traits except the target locus, can transfer resistance into elite genotypes effectively (Hospital 2003). Foreground selection ensures the transfer of the target trait from the donor parent, and background selection takes care of recovery of the recurrent-parent genome.
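The two selection steps of MABC can be sketched in a few lines. In the illustration below, foreground selection keeps only backcross plants that are heterozygous ("H") for the donor allele at the target marker, and background selection ranks the survivors by the share of background markers already homozygous for the recurrent-parent allele ("R"). The plant names, marker names and genotype codes are hypothetical. The helper `expected_rp_fraction` gives the standard expectation for recurrent-parent genome recovery without any background selection, 1 − (1/2)^(t+1) after t backcrosses.

```python
from typing import Dict, List, Tuple


def expected_rp_fraction(backcross_generation: int) -> float:
    """Expected recurrent-parent genome share in BC_t without background selection."""
    return 1 - 0.5 ** (backcross_generation + 1)


def mabc_select(plants: Dict[str, Dict[str, str]],
                target: str = "target") -> List[Tuple[str, float]]:
    """Foreground-select on `target`, then rank by recurrent-parent background recovery."""
    selected = []
    for name, calls in plants.items():
        if calls[target] != "H":          # foreground: donor allele must be present
            continue
        background = [call for marker, call in calls.items() if marker != target]
        recovery = sum(call == "R" for call in background) / len(background)
        selected.append((name, recovery))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical BC1 genotype calls: "H" = heterozygous donor/RP, "R" = homozygous RP
bc1_plants = {
    "plant_1": {"target": "H", "bg1": "R", "bg2": "H", "bg3": "R", "bg4": "R"},
    "plant_2": {"target": "R", "bg1": "R", "bg2": "R", "bg3": "R", "bg4": "R"},
    "plant_3": {"target": "H", "bg1": "H", "bg2": "H", "bg3": "R", "bg4": "H"},
    "plant_4": {"target": "H", "bg1": "R", "bg2": "R", "bg3": "R", "bg4": "R"},
}

ranking = mabc_select(bc1_plants)
for name, recovery in ranking:
    print(f"{name}: {100 * recovery:.0f}% of background markers recovered")
```

Here plant_2 is discarded despite its fully recovered background because it lacks the donor allele at the target locus, which is the case that phenotype-free foreground selection is meant to catch early.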

MAS can be used when other characters are to be combined from two parents along with the resistance trait. However, MAS is more effective for simply inherited characters controlled by a few genes than for highly complex characters governed by a large number of genes. In tobacco, SCAR markers have been used in marker-assisted breeding for resistance to black shank, black root rot, PVY, blue mold and TMV (Whitham et al. 1994; Johnson et al. 2002; Milla et al. 2005; Julio et al. 2006; Li and Miller 2010).

10.8.3 Gene Pyramiding

Pyramiding of multiple genes/QTLs may be achieved through multiple-parent crossing or complex crossing, multiple backcrossing, and recurrent selection. A suitable breeding scheme for marker-assisted gene pyramiding depends on the number of genes/QTLs required to be transferred, the number of parents that contain the desired genes/QTLs, the heritability of traits of interest, and other factors (e.g. cost of genotyping). Pyramiding of three or four desired genes/QTLs existing separately in three or four lines can be realized by three-way, four-way or double crossing, convergent backcrossing or stepwise backcrossing. More than four genes/QTLs can be pyramided by way of complex or multiple crossing and/or recurrent selection.

Gene pyramiding through MABC may be achieved through three different strategies or breeding schemes, viz. stepwise, simultaneous/synchronized and convergent backcrossing or transfer. In stepwise backcrossing, the target genes/QTLs are transferred from the donor parents into the recurrent parent (RP) one after the other. In the first step of backcrossing, one gene/QTL is targeted and transferred, followed by a next step of backcrossing for another gene/QTL, until all target genes/QTLs have been introgressed into the RP. The advantage of stepwise backcrossing is that it is more precise and easier to adopt, as it involves only one gene/QTL at a time and therefore requires a small population size and lower genotyping cost. The disadvantage of this method is that it takes longer to complete. In simultaneous or synchronized backcrossing, the recurrent parent is first crossed to each of the donor parents, the resultant single-cross F1s are crossed with each other to produce two double-cross F1s, and the two double-cross F1s are then crossed again to produce a hybrid integrating all target genes/QTLs in the heterozygous state. The hybrid and/or progeny with heterozygous markers for all the target genes/QTLs are subsequently crossed back to the RP until satisfactory recovery of the RP genome. Finally, homozygosity of the line with the recovered RP genome can be achieved through selfing. Simultaneous or synchronized backcrossing takes a shorter time to transfer multiple genes but requires a large population and more genotyping, as all target genes/QTLs are involved at the same time. Convergent backcrossing combines the stepwise and synchronized backcrossing strategies. First, each of the target genes/QTLs from the donors is transferred separately into the recurrent parent through single crossing followed by backcrossing based on the linked markers to produce improved lines. The improved lines are crossed with each other and the resultant hybrids are then intercrossed to integrate all the genes/QTLs and develop the final improved line with all the genes/QTLs pyramided. Convergent backcrossing not only reduces time (compared to stepwise transfer) but also makes it easier to fix and/or pyramid genes (compared to simultaneous transfer).
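The population-size penalty of simultaneous backcrossing can be made concrete with a small calculation. Assuming the target genes/QTLs are unlinked, a backcross plant carries the donor allele at all n targets with probability (1/2)^n, and the smallest population giving a chosen chance of finding at least one such plant follows directly. The sketch below works this out; the independence assumption and the 99% default confidence are illustrative choices, not values taken from the text.

```python
import math


def p_all_targets(n_genes: int) -> float:
    """Chance that one backcross plant is heterozygous at all n unlinked target loci."""
    return 0.5 ** n_genes


def min_population(n_genes: int, confidence: float = 0.99) -> int:
    """Smallest population with at least `confidence` chance of one or more desired plants."""
    p = p_all_targets(n_genes)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


for n_genes in (1, 2, 3, 4):
    print(f"{n_genes} target gene(s): at least {min_population(n_genes)} plants per backcross")
```

Stepwise transfer corresponds to the one-gene case in every cycle, whereas handling four genes at once needs roughly ten times as many plants per generation, before any background selection is applied.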

Marker-assisted complex or convergent crossing (MACC) can be undertaken to pyramid multiple genes/QTLs if all the parents are improved cultivars with complementary genes or favorable alleles for the traits of interest. In MACC, the hybrid from convergent crossing is self-pollinated for several consecutive generations, with MAS for the target traits, until genetically stable lines with the desired marker alleles and traits have been developed. Detecting and selecting the most important genes/QTLs in early generations and the less important ones in later generations can effectively reduce population size and avoid the loss of important genes/QTLs.

Theoretically, MABC and MACC can be applied for pyramiding target genes/QTLs in tobacco through the various schemes discussed above. However, no information is currently available on the release of commercial cultivars developed using these strategies.

10.8.4 Limitations and Prospects of MAS and MABC

Though MAS and MABC breeding have a number of advantages, they may not be universally useful (Jiang 2013). A rapid DNA extraction technique and a high-throughput system of marker detection are essential for handling a large number of samples and for large-scale screening of multiple markers in breeding programs. Suitable bioinformatics and statistical software packages are required for efficient and quick labeling, storing, retrieving, processing and analyzing of large data sets, and even for integrating data sets available from other programs. Hence, the startup expenses and labor costs involved in MAS and MABC breeding are higher than those of conventional techniques, putting them out of reach of many researchers (Morris et al. 2003).

As the distance between the marker and the gene of interest increases, the chance of recombination between gene and marker increases, thereby making marker-based selection of resistant plants ineffective due to false positives. This may be avoided by using flanking markers on either side of the locus of interest to increase the probability that the desired gene is selected. Sometimes the markers that were used to detect a locus may not be 'breeder-friendly'. Such markers, viz. RFLP and RAPD, may need to be converted into more reliable and easier-to-use markers. RFLP markers may be converted to STS (sequence tagged site) markers for detection via PCR protocols (Ribaut and Hoisington 1998), and RAPD markers into SCAR markers for reliable and repeatable amplification (Milla et al. 2005; Lewis 2005). The RAPD technique may be considered less reliable for MAS, as RAPD results vary from lab to lab, largely due to the low binding specificity of short (10-base) PCR primers. Hence, SCAR markers are developed by sequencing RAPD bands and designing more specific 18–25 base PCR primers to amplify the same DNA segment more reliably. Imprecise estimates of QTL locations and effects may result in slower progress than expected through MAS (Beavis 1998). Sometimes markers developed for MAS in one population may not be suitable for screening other populations, due to the absence of polymorphism for the identified markers or a lack of marker-trait association.
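
The benefit of flanking markers can be sketched numerically. The example below assumes a backcross population, markers on both sides of the gene at recombination fractions r1 and r2, and no crossover interference (assumptions made here for illustration only). With a single marker, a fraction r of marker-positive plants lack the gene. With two flanking markers, a plant carrying both donor marker alleles lacks the gene only after a double crossover. The function names are hypothetical.

```python
def false_positive_single(r: float) -> float:
    """Fraction of marker-positive backcross plants lacking the target gene
    when selecting on one marker at recombination fraction r."""
    return r


def false_positive_flanking(r1: float, r2: float) -> float:
    """Fraction of plants carrying both flanking donor marker alleles that
    nevertheless lack the gene (double crossover, no interference assumed)."""
    double_crossover = r1 * r2
    no_crossover = (1.0 - r1) * (1.0 - r2)
    return double_crossover / (no_crossover + double_crossover)


if __name__ == "__main__":
    r = 0.05  # illustrative value: markers 5% recombination from the gene
    print(f"single marker:    {false_positive_single(r):.4f}")
    print(f"flanking markers: {false_positive_flanking(r, r):.4f}")
```

In this sketch, two markers each at 5% recombination from the gene reduce the false-positive rate from 5% to well below 1%, at the cost of discarding single-recombinant plants.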

MAB is going to become a powerful and reliable tool in the genetic manipulation of agronomically important traits in tobacco in view of the increasing utilization of molecular markers in various fields, viz. germplasm evaluation, genetic mapping, map-based gene discovery, characterization of traits etc. Currently available high-density linkage maps in tobacco provide a framework for identifying marker-trait associations and selecting markers for MAB. Markers linked to the resistance traits discussed in previous sections can effectively be utilized in MAB in tobacco. However, only markers that are closely associated with the target traits or tightly linked to the genes offer adequate promise of success in practical breeding. The availability of new high-throughput marker genotyping platforms for the detection of SSR and SNP markers, along with sequence information for cultivated tobacco and wild relatives of Nicotiana, is going to have a great impact on discovering marker-trait associations that can be used for MAS in the future. Array-based methods such as DArT (Lu et al. 2012) and single feature polymorphism (SFP) detection (Rostoks et al. 2005) offer prospects for lower-cost marker technology that can be used for whole-genome scans in tobacco. Rapid growth in genomics research and the huge amount of data generated from functional genomics in tobacco in recent years are leading to the identification of many candidate genes for numerous traits, including biotic stress resistance. SNPs within candidate genes could be extremely useful for 'association mapping' and circumvent the requirement for construction of linkage maps and QTL analysis for genotypes that have not been mapped previously. The availability of large numbers of publicly available markers, the parallel development of user-friendly databases (Sol genome network, NCBI etc.) for the storage of marker and QTL data, and the increasing number of studies on genes and marker-trait associations will undoubtedly encourage the more widespread use of MAS in tobacco.

Selection for all kinds of traits at the seedling stage in MAB helps to minimize costs, as undesirable genotypes are eliminated at early stages. Closely linked markers allow the selection of disease/pest resistance traits even without the incidence of pests and diseases. MAS based on reliable markers tightly linked to multiple genes/QTLs for traits of interest can be more effective in pyramiding desirable genes than conventional breeding. The use of co-dominant markers (e.g. SSR and SNP) in MAB allows effective selection of recessive alleles in the heterozygous state without selfing or test crossing, thus saving time and accelerating breeding progress. As newer techniques become available, genotypic assays based on molecular markers may become faster, cheaper and more accurate than conventional phenotypic assays, and thus MAB may in future be more effective and more efficient in terms of the time, resources and effort saved.

Conventional breeding methodologies have proven extensively successful in the development of tobacco cultivars and germplasm. Subjective assessment and empirical selection play a significant role in conventional breeding. As a new addition to the whole family of plant breeding, MAB has brought great challenges, opportunities and prospects for breeding crops including tobacco. However, like transgenic breeding or genetic manipulation, MAB cannot replace conventional breeding but is a supplementary addition to it. High genotyping costs and the technical/equipment requirements of MAB will be major limiting factors for its large-scale deployment in the near future, especially in developing countries (Collard and Mackill 2008). Therefore, integration of MAB into conventional breeding programs will be a promising strategy for tobacco improvement in the future. It can be expected that the drawbacks of MAB will be gradually overcome, leading to its widespread adoption in practical breeding programs as its theory, technology and application are further developed and improved.

10.9 Map-Based Cloning of Resistance Genes

10.9.1 Traits and Genes

High levels of redundancy between genes in the large and complex genome of tobacco, together with the absence of molecular markers and genomic resources until recent years, made the identification and subsequent mapping of interesting mutants a very difficult prospect. However, with 64% of the genome assembly now anchored to chromosomal locations, it has become possible to apply map-based approaches to the discovery of biotic stress resistance genes in the species (Edwards et al. 2017). The first reported instance of successful map-based cloning in tobacco was that of Edwards et al. (2017), who cloned NtEGY1 and NtEGY2, homeologous candidate genes for the YB1 and YB2 loci conferring the white stem phenotype in the recessive condition in burley tobacco. However, map-based cloning of pest and disease resistance genes based on genetic maps has not yet been reported in tobacco.

10.9.2 Strategies: Chromosome Landing and Walking

Chromosome landing and walking strategies are used to identify clones carrying the gene of interest for map-based cloning. Recently available high-density genetic maps, genome sequences and bacterial artificial chromosome (BAC) clones are paving the way for map-based cloning of resistance genes in tobacco.

In the only reported case of map-based cloning in tobacco, Edwards et al. (2017) genotyped pairs of NILs carrying dominant or recessive alleles of the YB1 and YB2 loci (cultivars SC58, NC95, and Coker 1) with a custom 30 K Infinium iSelect HD BeadChip SNP chip (Illumina Inc., San Diego, CA), which was also used in developing a high-density genetic map (N. tabacum 30 k Infinium HD consensus map 2015; https://solgenomics.net/cview/map.pl?map_version_id=178). Genomic regions containing SNP polymorphisms that distinguished the nearly isogenic lines were identified, the SNP markers closely linked to the loci were aligned to the genome assembly, and potential candidate genes were predicted. Coding regions of the candidate genes were then amplified from first-strand cDNA of tobacco cultivars K326 and TN90 using specifically designed primers. The amplified fragments were then cloned into a vector.

10.9.3 Genomic Libraries

High-capacity genomic libraries are essential resources for physical mapping, comparative genome analysis, molecular cytogenetics etc. Such libraries are also powerful tools for large-scale gene discovery, elucidation of gene function and regulation, and map-based cloning of target trait loci or genes associated with important agronomic and resistance traits for their further study and use in crop improvement programs. BAC libraries are the large-insert DNA libraries (inserts of up to 200,000 base pairs) of choice for genomics research. Cloning of larger DNA segments (more than 1000 kb) is possible with yeast artificial chromosome (YAC) libraries, which greatly facilitates chromosome walking and physical mapping around the target locus. Transformation-competent artificial chromosome (TAC) libraries, in turn, make it possible to clone and transfer genes efficiently into plants. In recent years, BAC libraries have been constructed and utilized in tobacco for genome sequencing, mapping and comparative genome analysis. However, construction of YAC and TAC libraries has not yet been reported in tobacco.

The Tobacco Genome Initiative (TGI) generated a BAC library (9.7-fold genome coverage) for assembling the partial genome of the Hicks Broadleaf variety (Opperman et al. 2003; Rushton et al. 2008). Sierro et al. (2013b) used a library of 425,088 BAC clones for construction of a physical map and ancestral annotation of the tobacco cultivar Hicks Broadleaf. Edwards et al. (2017) constructed two libraries containing 150,528 BACs from the K 326 variety using HindIII or EcoRI, with average insert sizes of 115 kb and 135 kb, respectively (representing ~8× coverage of the genome), and used them to generate a whole-genome profile (WGP) map from sequence reads at EcoRI and HindIII restriction sites. Jingjing (2018) reported a tobacco genome sequence of the HongDa cultivar, produced by a combination of BAC-to-BAC and whole-genome shotgun technologies. Dong et al. (2020) constructed a BAC library of 414,720 clones from the black shank resistant flue-cured tobacco line 14–60, with an average insert size of 123 kb (range 97.0–145.5 kb), covering 11 genome equivalents. They further confirmed the utility of this library by screening it with gene-specific primers. A BAC library of the wild tobacco N. tomentosiformis, one of the parents of N. tabacum, was constructed with insert sizes ranging from 50 to 200 kb and an estimated average size of 110 kb (Yuhe 2012). These libraries are important resources for map-based cloning of resistance traits through various strategies.
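
The genome-equivalent figures quoted for such libraries follow from simple arithmetic: coverage equals the number of clones multiplied by the average insert size, divided by the genome size. The sketch below reproduces the roughly 11-fold coverage of the Dong et al. (2020) library, assuming a tobacco genome size of 4.5 Gbp, and adds the standard Clarke and Carbon estimate of the probability that any given sequence is represented in the library. The function names are hypothetical, and random, unbiased cloning is assumed.

```python
def library_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Genome equivalents represented in a library."""
    return n_clones * mean_insert_bp / genome_bp


def probability_of_representation(n_clones: int, mean_insert_bp: float,
                                  genome_bp: float) -> float:
    """Clarke-Carbon estimate: chance that a given locus is in at least
    one clone, assuming clones sample the genome at random."""
    fraction_per_clone = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


if __name__ == "__main__":
    GENOME_BP = 4.5e9  # assumed N. tabacum genome size
    coverage = library_coverage(414_720, 123_000, GENOME_BP)
    p = probability_of_representation(414_720, 123_000, GENOME_BP)
    print(f"coverage: {coverage:.1f} genome equivalents")
    print(f"probability a given locus is represented: {p:.6f}")
```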

10.9.4 Test for Expression (Mutant Complementation)

Transforming a cloned gene into a mutant plant and looking for rescue of the wild-type phenotype helps to validate the function of the target gene. However, mutant complementation studies with cloned genes have not yet been reported in tobacco, owing to the absence of map-based cloning of functional genes in general and of biotic stress resistance genes in particular.

10.10 Genomics-Aided Breeding for Resistance Traits

10.10.1 Structural and Functional Genomic Resources Developed

The aim of structural genomics is to characterize the structure of the genome. Understanding the genome structure of a species can be advantageous in manipulating its genes and DNA fragments. In tobacco, a large body of genomic resources has been generated from in-depth genomic studies by various researchers. These resources are available for furthering research in genomics, gene tagging, identification, isolation and cloning for genome-assisted breeding in tobacco as well as in other crop species.

With advances in next-generation sequencing (NGS) technologies, the genome sequences of 12 Nicotiana species, viz. N. tabacum, N. rustica, N. attenuata, N. benthamiana, N. knightiana, N. obtusifolia, N. otophora, N. paniculata, N. undulata, N. tomentosiformis, N. sylvestris and N. glauca, have been decoded. Sequencing details of Nicotiana spp. are presented at the NCBI website (https://www.ncbi.nlm.nih.gov/) and the Sol Genome Network (SGN) (Asaf et al. 2016).

The plastid genomes of 10 tobacco species, N. tabacum, N. attenuata, N. tomentosiformis, N. sylvestris, N. otophora, N. knightiana, N. rustica, N. paniculata, N. obtusifolia and N. glauca, have been sequenced and the data made available (Asaf et al. 2016; Mehmood et al. 2020).

Developments in transcriptomics have resulted in large data sets and tools for the progression of functional genomics in tobacco. A database of 2513 tobacco (N. tabacum) TFs representing all of the 64 well-characterized plant TF families was created using a dataset of 1,159,022 gene-space sequence reads (GSRs) (Rushton et al. 2008). Further, the transcriptional activity of thousands of tobacco genes in different tissues throughout the life cycle of tobacco, from seed to senescence, was characterized using a tobacco expression microarray built from a set of over 40 k unigenes (a unigene being a set of transcripts that appear to stem from the same transcription locus), with gene expression measured in 19 different tobacco samples (Edwards et al. 2010). Of the 2513 transcription factors earlier recognized in tobacco, 772 were mapped to the array, with 87% of them being expressed in at least one tissue in the resulting Tobacco Expression Atlas (TobEA). Based on the co-expression of these transcription factors, putative transcriptional networks could be identified. SGN contains a collection of transcriptome sequences of N. sylvestris (32,276), N. tomentosiformis (31,961) and N. tabacum (26,284) from transcriptome projects, and unigene data sets of N. sylvestris (6300), N. tabacum (84,602) and N. benthamiana (16,024).

The data hosted at the NCBI and SGN databases include the majority of the available genomic resource information pertaining to cultivated tobacco and wild Nicotiana spp. Hence, the information available in these databases is discussed here as an indicator of genome resource availability in Nicotiana. A large collection of data on nucleotide, gene and protein sequences submitted by various researchers on Nicotiana is available at the NCBI site (Table 10.8). More than 3.3 million (33 lakh) nucleotide sequences of 20 Nicotiana spp., including genomic DNA/RNA, mRNA, cRNA, ncRNA, rRNA, tRNA and transcribed RNA, have been generated by various researchers and were available at the NCBI website as of 30.09.2021 (Table 10.8). Among them, around 895,700 sequences belong to the comprehensive, integrated, non-redundant, well-annotated set of reference sequences, including genomic, transcript and protein sequences. These sequences also include 456,507 expressed sequence tags (ESTs) and 1,420,639 genomic survey sequences (GSS). Further, a total of over 201,560 gene sequence records belonging to 12 Nicotiana spp., viz. N. tabacum, N. tomentosiformis, N. sylvestris, N. attenuata, N. undulata, N. otophora, N. suaveolens, N. glauca, N. stocktonii, N. repanda, N. amplexicaulis and N. debneyi, exist at NCBI (Table 10.8).

Table 10.8 Nucleotide, gene, SRA, GEO and protein sequences of Nicotiana spp.

The Sequence Read Archive (SRA) is the largest publicly available repository of high-throughput sequencing data and is accessible through several cloud providers and NCBI servers. The archive accepts data from different branches of life, including metagenomic and environmental surveys. SRA stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis (https://www.ncbi.nlm.nih.gov/sra). Nearly 5080 SRA records of 20 Nicotiana spp. were available at the NCBI website as of 30.09.2021 (Table 10.8). Around 4860 curated gene expression datasets, as well as original series and platform records, of 11 Nicotiana spp. were available at the Gene Expression Omnibus (GEO) repository of NCBI as of 30.09.2021 (Table 10.8; https://www.ncbi.nlm.nih.gov/gds). A collection of around 275,000 protein sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and Third Party Annotation (TPA) sequences, as well as records from other databases, is available for 20 Nicotiana spp. at NCBI (Table 10.8).

The genomic resource collection of SGN includes information on genome sequences, transcriptome sequences, mRNAs and predicted proteins for one or another of five wild Nicotiana spp., namely N. attenuata, N. benthamiana, N. tomentosiformis, N. sylvestris and N. otophora, and for four versions of N. tabacum. SGN also hosts 39 transcript libraries of N. tabacum and two of N. sylvestris. Though both NCBI and SGN utilize information from the same bio-projects, the numbers of sequences, predicted proteins, CDS and mRNAs for the different species are not identical in these databases, due to the use of different bioinformatics software and gene prediction models in analyzing the sequences.

Proteomic data generated globally are stored and accessed through the Universal Protein Resource (UniProt). UniProt provides several sets of proteins (proteomes) thought to be expressed by organisms whose genomes have been completely sequenced (https://www.uniprot.org/). There were 73,606 protein entries associated with the Nicotiana tabacum proteome (UP000084051) in the UniProt database as of 31.03.2021.

10.10.2 Details of Genome Sequencing

10.10.2.1 Nuclear Genome Sequencing

Although tobacco has the same basic chromosome number (n = 12) as other solanaceous crops such as tomato, potato, chili and eggplant, its genome (4.5 Gbp) is the largest in the Solanaceae (Arumuganathan and Earle 1991), with a large proportion of repetitive sequences (Zimmerman and Goldberg 1977; Kenton et al. 1993; Leitch et al. 2008; Sierro et al. 2014). The N. tabacum genome is about 50% larger than the human genome. Analysis of the tobacco genome began in the early 2000s. The Tobacco Genome Initiative (TGI) initiated the first tobacco genome sequencing project in 2003 with the aim of sequencing the open reading frames of N. tabacum, utilizing methyl-filtration technology to reduce genome complexity (Opperman et al. 2003; Rushton et al. 2008). The Hicks Broadleaf variety, an ancestor of some of the currently cultivated flue-cured tobacco cultivars, was chosen as the genotype for generating the bacterial artificial chromosome (BAC) libraries used for sequencing because of its low introgression content (Sierro et al. 2013b). The project was completed in 2007 and the sequences are available from the National Center for Biotechnology Information (NCBI) GenBank (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA29349). Around 689 Mb of genomic sequence generated through Sanger sequencing was assembled into 81,959 contigs with an average length of 1.2 kb and 871,255 singletons with an average length of 688 bp (Wang and Bennetzen 2015). However, the generated sequences comprised only a small portion of the tobacco genome, mainly because methylation filtration enriches only for genes, and for the portions of genes that are undermethylated relative to transposable elements (TEs).
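
How small a portion of the genome the TGI sequences represent can be checked with a short calculation using only the figures quoted above. The variable and function names are hypothetical. Because the reported average lengths are rounded, the total rebuilt from them (about 698 Mb) only approximately matches the reported 689 Mb.

```python
GENOME_SIZE_BP = 4.5e9          # genome size quoted in the text
REPORTED_SEQUENCE_BP = 689e6    # total TGI sequence quoted in the text

N_CONTIGS, MEAN_CONTIG_BP = 81_959, 1_200
N_SINGLETONS, MEAN_SINGLETON_BP = 871_255, 688


def total_from_averages() -> int:
    """Total sequence implied by the contig and singleton counts and
    their (rounded) average lengths."""
    return N_CONTIGS * MEAN_CONTIG_BP + N_SINGLETONS * MEAN_SINGLETON_BP


def fraction_of_genome(sequence_bp: float, genome_bp: float = GENOME_SIZE_BP) -> float:
    """Share of the genome represented by the given amount of sequence."""
    return sequence_bp / genome_bp


if __name__ == "__main__":
    print(f"total from averages: {total_from_averages() / 1e6:.0f} Mb")
    print(f"share of genome: {fraction_of_genome(REPORTED_SEQUENCE_BP):.1%}")
```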

With advances in next-generation sequencing (NGS) technologies, there has been rapid progress in sequencing the entire genomes of cultivated tobacco and important wild relatives in recent times (Table 10.9). Sequencing of three N. tabacum cultivars has been completed since 2014. Eleven complete genomes of wild Nicotiana species, viz. N. knightiana, N. paniculata, N. rustica, N. glauca, N. obtusifolia, N. otophora, N. attenuata, N. sylvestris, N. tomentosiformis, N. undulata and N. benthamiana, have been sequenced and released since 2013. The scaffold-level assemblies of N. sylvestris and N. tomentosiformis, with 94.0× and 146.0× genome coverage and lengths of 2222 and 1688 Mb, respectively, were the first sequences released, in 2013 (Sierro et al. 2013a). This was followed by the release of three genome assemblies of N. tabacum, one at scaffold level (cv. TN90) and two at contig level (cv. K326 and Basma Xanthi, BX), by Philip Morris International in 2014 (Sierro et al. 2014). The length of these assemblies was around 3700 Mb, with a GC content of 39% and genome coverages ranging from 29.0 to 49.0×; the assembly of TN 90 is used as the reference genome. A reference genome is a comprehensive, integrated, non-redundant and well-annotated set of sequences for a given genome. Later, an improved assembly of N. tabacum (cv. K 326) at scaffold level, with 86.0× genome coverage (4600 Mb) and a GC content of 33.5%, was submitted by British American Tobacco in 2017 (Edwards et al. 2017). This assembly covers over 4 Gb of non-N sequence (90% of the predicted genome size), an increase from 3.6 Gb (81% of the predicted genome size) in the previously published version (Sierro et al. 2014). It achieved an N50 size of 2.17 Mb and anchored 64% of the genome to pseudomolecules, a significant increase from the previous value of 19%, providing more complete coverage of the tobacco genome.

Table 10.9 Genome sequencing details of Nicotiana species (as available at NCBI site)

Currently, a total of 16 assemblies of 12 Nicotiana species, viz. N. sylvestris, N. tomentosiformis, N. benthamiana, N. tabacum, N. otophora, N. attenuata, N. obtusifolia, N. glauca, N. knightiana, N. paniculata, N. rustica and N. undulata, are available at NCBI GenBank. Two assemblies are available for N. attenuata and four for N. tabacum. The N. attenuata reference sequence (2366 Mb) is assembled at chromosome level (12 haploid chromosomes), while those of N. tabacum (K326 and Basma Xanthi) and N. benthamiana are at contig level. All the other Nicotiana assemblies are available at scaffold level. Genome coverage of the sequences ranges from 18.0× in N. attenuata strain UT to 146.0× in N. tomentosiformis. GC content of the assemblies ranges from 29.1 to 41.3%, and length ranges from 62 Mb in N. benthamiana (diploid genome) to 4647 Mb in N. tabacum cv. K326 (tetraploid). Most of the genomes were sequenced using high-throughput Illumina HiSeq sequencing technology. Detailed statistics, including the assembly level and the N50 and L50 values for each genome, are provided in Table 10.9.
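
For readers unfamiliar with the assembly statistics mentioned above, N50 is the length of the shortest sequence among the largest sequences that together make up at least half of the assembly, and L50 is the number of such sequences. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not taken from any Nicotiana assembly.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the shortest sequence needed to reach half the
            # assembly; 'count' is how many sequences were needed.
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # arbitrary units
    n50, l50 = n50_l50(toy_lengths)
    print(f"N50 = {n50}, L50 = {l50}")
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is why these values are reported alongside the assembly level.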

Genomic resource collections of five wild Nicotiana spp. and four versions of N. tabacum are available at the Sol Genome Network (SGN) (Fernandez-Pozo et al. 2015). They include the contig-level methylation-filtered genome sequences generated under the TGI project; the genomes, predicted proteins and mRNAs of N. tabacum cv. BX, N. tabacum cv. K326 and N. tabacum cv. TN90; and the genome scaffolds, proteins and cDNA of the improved K326 assembly (Sierro et al. 2014; Edwards et al. 2017). Further, scaffold-level genome assemblies of five Nicotiana species, namely N. attenuata, N. tomentosiformis, N. benthamiana, N. sylvestris and N. otophora, are also available in this database, along with predicted proteins and mRNAs for the first four species. A chromosome-level assembly is available for N. attenuata. Though the source of these assemblies is the same as that of NCBI, the sequence information is not identical, due to differences in the bioinformatics software employed in analysing the sequences.

10.10.2.2 Organelle Genome Sequencing

The plastid and mitochondrial genomes of tobacco are circular DNA molecules. The chloroplast genome of tobacco was first sequenced in 1986 (Shinozaki et al. 1986), and since then a number of studies have sequenced the chloroplast genomes of different entries of N. tabacum and its wild relatives. Currently, about 219 complete plastid genome sequences, including 12 PopSets, related to five Nicotiana spp. are available (Table 10.10). PopSet data are collections of related DNA sequences derived from population, phylogenetic, mutation and ecosystem studies. The size of the plastid genome of Nicotiana species is around 0.156 Mb. In addition, Mehmood et al. (2020) recently assembled the plastid genomes of five more tobacco species, N. knightiana (155,968 bp), N. rustica (155,849 bp), N. paniculata (155,689 bp), N. obtusifolia (156,022 bp) and N. glauca (155,917 bp), and compared them with one another. Reference plastid genomes of five Nicotiana species, namely N. tabacum (155,943 bp), N. attenuata (155,886 bp), N. tomentosiformis (155,745 bp), N. sylvestris (155,941 bp) and N. otophora (156,073 bp), are available at NCBI (Table 10.9).

Table 10.10 Nicotiana organelle genome resources available at NCBI (as on 31.03.2021)

Sequencing of the complete mitochondrial genome of tobacco started in 2003 (Sugiyama et al. 2005). Eight complete mitochondrial genome sequences, including 5 PopSets, from three Nicotiana species, namely N. tabacum (430,597 bp), N. attenuata (394,341 bp) and N. sylvestris (430,597 bp), have been completed to date, and the details are available at the NCBI site (Table 10.10). Further, reference mitochondrial genomes are available for N. tabacum, N. attenuata and N. sylvestris at NCBI.

10.10.3 Gene Annotation

Genome annotation is the process of identifying functional elements along the sequence of a genome. Assigning function to genome sequence is necessary because DNA sequencing produces sequences of unknown function. Once a genome is sequenced, it needs to be annotated so that its functions are understood and it can be utilized successfully in genetic manipulation. In tobacco, both nuclear and organelle genomes have been successfully annotated, and annotation reports are available in various databases. The gene annotation records available at the NCBI and SGN databases are discussed here, as they host the majority of the data for Nicotiana spp.

The NCBI Eukaryotic Genome Annotation Pipeline is an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. Currently, annotation reports are available for four Nicotiana species, viz. N. sylvestris, N. tabacum cv. TN 90, N. attenuata strain UT and N. tomentosiformis, at the NCBI website (Table 10.11). The first annotation was released in 2014 for N. sylvestris, followed by N. tabacum cv. TN 90 (2016), N. attenuata strain UT (2016) and N. tomentosiformis (2020). Before annotation, repeat sequences in the reference sequences were masked with RepeatMasker, which makes use of curated libraries of repeats, and with WindowMasker (Morgulis et al. 2006). RepeatMasker masked 1.33–3.0% of the genomes, while WindowMasker masked 47.93–53.07%. Based on gene prediction tools, genes and pseudogenes, protein-coding and non-coding sequences etc. were estimated for all four genomes, while alignments with RefSeq and other known transcripts and proteins were used to predict mRNAs, other RNAs and proteins. The final set of annotated proteins from each Nicotiana species was aligned against the available known RefSeq proteins from Arabidopsis thaliana, using the high-quality proteins as the target and the annotated proteins as the query. For each annotated Nicotiana species, the proteins with an alignment covering 50% or more of the query, and those with an alignment covering 95% or more of the query, were counted. The details of the annotations are given in Table 10.11.

Table 10.11 Gene annotation reports of Nicotiana species (as per NCBI)

Annotated gene predictions are also available at the SGN site for the published genomes of N. attenuata, N. benthamiana, N. tomentosiformis, N. sylvestris and four versions of N. tabacum (Table 10.12). For N. tabacum, the number of predicted proteins ranged from 69,500 to 122,388 and of mRNAs from 145,503 to 189,413. Fewer proteins (33,449–54,497) and mRNAs (33,449–87,234) were predicted for the other Nicotiana species than for N. tabacum.

Table 10.12 Gene annotation reports of Nicotiana species (as per SGN)

The Nicotiana tabacum proteome (UP000084051) in the UniProt database contained 73,606 protein entries as of 31.03.2021 (https://www.uniprot.org/). Proteins related to biotic stress response are also included in this list. Based on gene ontology analysis, Edwards et al. (2017) identified predicted proteins showing good overlap with those of the related Solanaceae species tomato and potato, as well as other flowering plants.

Annotations predicting genes and proteins have also been made for the published organelle genome sequences (Table 10.10). For plastid genomes, the number of predicted genes in five Nicotiana spp. varies from 129 to 155 and of proteins from 84 to 108. For mitochondria, more genes and proteins are predicted for the cultivated species N. tabacum (183 and 153, respectively) than for the wild species N. attenuata (68; 40) and N. sylvestris (64; 37).

10.10.4 Impact on Germplasm Characterization and Gene Discovery

Sequencing of Nicotiana spp. has made it possible to compare their genomes with one another and with those of other solanaceous crops. These studies have helped to identify the relationships between cultivated and wild species, and their progenitor species, in terms of sequence similarities and genome rearrangements (Wu et al. 2009; Sierro et al. 2014; Asaf et al. 2016; Gong et al. 2016; Edwards et al. 2017). The synteny between the genomes of N. tabacum cv. TN90, K326 and BX and those of tomato and potato could be evaluated at the protein level, leading to the detection of homologous genes (Sierro et al. 2014).

Annotation of published genome sequences has assisted in the identification of functional sequences and of predicted mRNAs and proteins that can be expressed in tobacco. Through comparison of genome assemblies, genomic regions responsible for virus tolerance were recognized in the draft genomes (Sierro et al. 2014). A nearly perfect match to the N. glutinosa N gene sequence (the source of TMV resistance in many tobacco cultivars) was found in the draft genome sequence of a TMV resistant cultivar, TN 90, whereas only weak identity was found in the susceptible genomes (K326 and BX). Genes that are differentially expressed between PVY resistant and susceptible RIL plants were identified by comparison with a reference transcriptome (Julio et al. 2014). Further, through annotation of the differentially expressed genes, Julio and co-workers confirmed that a functional eIF4E gene mapped on chromosome 21 of the tobacco genetic map is responsible for PVY susceptibility in the majority of tobacco lines carrying the dominant Va locus. One copy each of the eIF4E1, eIF4E2 and eIF(iso)4E genes was identified in the N. sylvestris genome, whereas two copies of eIF4E1 and one copy of each of the other genes were identified in N. tomentosiformis (Sierro et al. 2014). With the exception of the N. sylvestris eIF4E1 gene, which was absent in TN90, all identified N. sylvestris and N. tomentosiformis genes were observed in the TN90, K326 and BX genomes. With this observation, Sierro et al. (2014) confirmed that the genomic deletion of the S-form eIF4E1 locus is responsible for TVMV, TEV and PVY resistance in TN90, and its presence for susceptibility in K326 and BX.

Based on sequencing of Nicotiana genomes and EST data, a large number of SSRs and SNPs were identified (Bindler et al. 2011; Tong et al. 2012b, 2020; Cai et al. 2015; Xiao et al. 2015; Thimmegowda et al. 2018; Wang et al. 2018). The identified markers have been used for characterization of germplasm, diversity studies, DUS testing and assessment of the genetic relatedness of cultivated varieties (Moon et al. 2008, 2009a, b; Davalieva et al. 2010; Fricano et al. 2012; Gholizadeh et al. 2012; Prabhakararao et al. 2012; Xiang et al. 2017; Binbin et al. 2020). Wang et al. (2021) used core markers developed through genotyping-by-sequencing for varietal identification and fingerprinting of cigar tobacco accessions. The high-density maps developed from SSR and SNP markers will be useful in the characterization of germplasm and in the identification of target traits, including resistance to biotic stresses. Genome-wide SNP markers were used for simultaneous association analysis of leaf chemistry traits in natural populations of tobacco germplasm (Tong et al. 2020). SSRs and SNPs were used to identify QTLs linked to biotic stress resistance traits, viz. bacterial wilt (Qian et al. 2013), brown spot (Tong et al. 2012a; Sun et al. 2018), black shank (Ma et al. 2019, 2020; Gong et al. 2020) and CMV (Cheng et al. 2019).

The release of the N. tabacum 30 K Infinium HD consensus map 2015 also provides the tobacco genetic research community with resources to detect genome-wide DNA polymorphisms, and to fine map and clone traits of interest. Genome-wide DNA polymorphisms can be detected using the custom 30 K Infinium iSelect HD BeadChip SNP array (Edwards et al. 2017). Map-based cloning of target traits is becoming a reality in tobacco, with the cloning of two homeologous candidate genes that confer the white stem phenotype in the recessive condition in burley tobacco (Edwards et al. 2017).

Advances in tobacco genomics provide further means to advance the understanding of diversity at the species and gene levels, and allow DNA markers to hasten the pace of genetic improvement. Novel genes or alleles for any given trait can be discovered through genotyping-by-sequencing, whole-genome re-sequencing, sequence-based mapping, etc. Genomics tools also enable rapid identification and selection of novel beneficial genes and their controlled incorporation into germplasm. Similarly, genome-wide association studies (GWAS) can help to identify the genomic regions controlling traits of interest in diverse germplasm collections that have been both genotyped and phenotyped, through statistical associations between DNA polymorphisms and trait variation. Genomics has the potential to increase the diversity of alleles available to breeders through mining the gene pools of crop wild relatives (CWRs).
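The statistical association between DNA polymorphisms and trait variation that underlies GWAS can be illustrated with a minimal single-marker scan. The sketch below is only an illustration under stated assumptions: genotypes are coded as allele dosages (0, 1, 2), the data are simulated, and the function name `single_marker_scan` is hypothetical. A real GWAS would additionally correct for population structure and kinship, which this toy example does not attempt.

```python
import numpy as np

def single_marker_scan(genotypes, phenotype):
    """Squared correlation (r^2) between each marker and the trait.

    genotypes: (n_lines, n_markers) allele dosages coded 0, 1, 2 (assumed coding).
    phenotype: (n_lines,) trait values.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    Gc = G - G.mean(axis=0)          # centre each marker
    yc = y - y.mean()                # centre the trait
    cov = Gc.T @ yc
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic markers have zero variance; report r = 0 for them.
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r ** 2

# Simulated toy data: 200 lines, 50 markers, one causal marker (index 7).
rng = np.random.default_rng(0)
n_lines, n_markers, causal = 200, 50, 7
G = rng.integers(0, 3, size=(n_lines, n_markers))
y = 2.0 * G[:, causal] + rng.normal(0.0, 1.0, size=n_lines)

r2 = single_marker_scan(G, y)
top_marker = int(np.argmax(r2))
print("marker with strongest association:", top_marker)
```

In this simulation the causal marker explains most of the trait variance, so it stands out clearly; in real germplasm panels effects are usually much smaller and significance thresholds must account for multiple testing.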

10.10.5 Application of Structural and Functional Genomics in Genomics-Assisted Breeding

Recent advances in high-throughput sequencing and phenotyping platforms are transforming molecular breeding into genomics-assisted breeding (GAB). GAB is going to be the key to designing future tobacco cultivars through optimizing tobacco genomes by accumulating beneficial alleles and eliminating deleterious alleles (Varshney et al. 2021). The availability of draft Nicotiana genomes and of transcriptome and metabolome profiles is paving the way for understanding the genomic regions, their ancestral origins, genes, gene products, expression patterns, control elements and associated allelic variations responsible for resistance to biotic stresses. The availability of molecular markers (Sects. 10.5.2 and 10.7.2), high-density genetic maps (Sect. 10.7), and structural and functional resources (Sect. 10.10), together with the identification of DNA markers linked to traits of interest and the identification and mapping of trait-specific QTLs in recent years, provides the starting point for the utilization of GAB as an important tool for tobacco improvement.

Currently, the application of structural and functional genomics in GAB in tobacco is at an initial stage. The application of DNA markers to facilitate marker-aided selection (MAS), a preliminary form of GAB, for tobacco improvement has only just commenced. As progress is made in genomics, genotypic and phenotypic datasets on training populations can be used to develop models that predict the breeding value of lines for genomic selection (GS). All the available marker data for a population can be used as predictors of the breeding value of a line. Breeding value serves as a predictor of how well a plant will perform as a parent for crossing and generation advance in a breeding program, based on the resemblance of its genomic profile to that of other plants in the training population that are known to have performed well in the target environment(s). The genotypic data obtained from a seed or seedling on all the favorable alleles can be used to predict the phenotypic performance of mature individuals without the need for extensive phenotypic evaluation over years and environments during GAB (Varshney et al. 2014). The time required for breeding will come down drastically because desirable lines can be selected accurately in early generations, without the confounding influence of the environment.
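The idea of predicting breeding values from marker data on a training population can be sketched with ridge regression, one of several models used for genomic selection. The example below is a minimal sketch on simulated data, not a description of any published tobacco GS pipeline: the function names, the 0/1/2 dosage coding, and the ridge penalty value are all assumptions made for illustration.

```python
import numpy as np

def fit_marker_effects(G_train, y_train, ridge=1.0):
    """Estimate marker effects by ridge regression on a training population."""
    G = np.asarray(G_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    marker_means = G.mean(axis=0)
    Gc = G - marker_means                       # centre markers
    y_mean = y.mean()
    A = Gc.T @ Gc + ridge * np.eye(G.shape[1])  # penalised normal equations
    effects = np.linalg.solve(A, Gc.T @ (y - y_mean))
    return marker_means, y_mean, effects

def predict_breeding_value(G_new, marker_means, y_mean, effects):
    """Predict genomic estimated breeding values for genotyped-only lines."""
    return y_mean + (np.asarray(G_new, dtype=float) - marker_means) @ effects

# Simulated data: 300 phenotyped training lines, 100 unphenotyped candidates.
rng = np.random.default_rng(1)
n_train, n_candidates, n_markers = 300, 100, 200
true_effects = rng.normal(0.0, 0.3, size=n_markers)
G_train = rng.integers(0, 3, size=(n_train, n_markers))
G_cand = rng.integers(0, 3, size=(n_candidates, n_markers))
y_train = G_train @ true_effects + rng.normal(0.0, 1.0, size=n_train)
true_bv_cand = G_cand @ true_effects            # known only because simulated

model = fit_marker_effects(G_train, y_train, ridge=10.0)
gebv = predict_breeding_value(G_cand, *model)

# Prediction accuracy and the ten candidates with the highest predicted value.
accuracy = np.corrcoef(gebv, true_bv_cand)[0, 1]
selected = np.argsort(gebv)[::-1][:10]
print("prediction accuracy (simulated):", round(float(accuracy), 2))
```

The candidates are ranked from genotype alone, which mirrors the point made above: selection can take place at the seed or seedling stage, before any field evaluation. In practice the penalty would be tuned by cross-validation or derived from variance components.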

10.11 Recent Concepts and Strategies Developed

Genome designing involves all biotechnological interventions that result in the accumulation of desirable alleles and the elimination of undesirable alleles and gene combinations, so as to realize the maximum possible genetic potential with tolerance to various stresses. In addition to MAS and genetic engineering, gene editing and nanotechnology are emerging as key concepts in genome designing of crop plants. Gene modification with these techniques is briefly discussed here.

10.11.1 Gene Editing

Gene editing (GE) techniques have revolutionized the biological sciences through precise modifications in the genomes of both plants and animals that yield desirable changes in phenotype. These technologies make structural changes to the DNA of target genes, or epigenetic changes that alter gene expression. The techniques used to edit genomes evolved from earlier approaches such as nuclease technologies, homing endonucleases, and certain chemical methods (Khan 2019). Molecular tools such as meganucleases (MegaNs), TALENs, and ZFNs emerged initially as genome-editing technologies. Currently, several open engineering platforms are available for the construction of ZFNs and TALENs (Townsend et al. 2009). These initial technologies suffer from lower specificity owing to off-target effects. The more recent discovery of the CRISPR/Cas9 nuclease system appears more encouraging in view of its higher efficiency and feasibility, advancing genome-engineering techniques to the level of molecular engineering. GE tools are broadly categorized into three generations: MegaNs and ZFNs are first-generation tools, TALENs are second-generation tools, and CRISPR-associated systems are considered third-generation tools. Third-generation GE tools, e.g., CRISPR/Cas9 and CRISPR/Cpf1 (CRISPR from Prevotella and Francisella 1), were found to be powerful tools for modifying genome sequences in a precise and straightforward manner (Ahmar et al. 2021). For delivery to the plant cell, ZFNs, TALENs and CRISPR/Cas9 typically rely on either Agrobacterium-mediated or protoplast transformation.

Meganucleases, also termed molecular DNA scissors, originate from micro-organisms such as bacteria and yeasts and recognize relatively large DNA sequences (18 to 30 nucleotides) in comparison with standard nucleases (Daboussi et al. 2015). Meganucleases cut only very specific sequences that seldom occur in plant DNA. Their potential to excise large pieces of DNA was recognized as a genetic tool for modifying DNA (Khan 2019). Double-strand breaks can be obtained at the target region of the genome under consideration through a sequence-specific nuclease. However, meganuclease technology is costly and time-consuming (Townsend et al. 2009). The advantages of meganucleases are their lower toxicity, in view of their natural occurrence, and their site-specific cleavage ability. Modification of target genes with specifically engineered meganucleases was first demonstrated in tobacco (Puchta 1999; Honig et al. 2015).

ZFNs were first demonstrated as site-specific nucleases for cutting DNA at strictly defined sites in 1996. ZFNs are proteins composed of a zinc finger part and a nuclease part. By coupling the nuclease to a zinc finger, a protein that binds with great accuracy to a specific DNA fragment, the nuclease is able to cut DNA only at that location. Repair of ZFN-induced double-strand breaks (DSBs) by error-prone non-homologous end-joining (NHEJ) results in the introduction of insertion or deletion mutations (indels) at the site of the DSB. Alternatively, homology-directed repair of a DSB with an exogenously introduced donor template can promote the introduction of alterations or insertions at or near the break site (Sander et al. 2011). ZFNs have been used to introduce specific mutations and transgene insertions that confer herbicide resistance in tobacco (Townsend et al. 2009).

TAL effectors for DNA targeting were discovered in 2009. TALENs closely resemble ZFNs in terms of construction and mode of action (Khan 2019). They are made on a similar principle, in which a restriction nuclease (FokI) is fused to a DNA-binding protein domain called a TAL effector that guides the nuclease to a specific DNA sequence. Fusing a nuclease to a TAL effector allows the nuclease to cut at only one specific place in the plant DNA. The construction of TALENs is easier and more popular than that of ZFNs; however, the repetitive sequences present in TALENs can enhance the rate of homologous recombination. Each TAL effector repeat recognizes a single nucleotide, whereas each zinc finger recognizes about 3 nt, which makes TALENs somewhat more site specific, with fewer off-target effects, than ZFNs. Methods were optimized for targeted modification of plant genomes using TALENs in tobacco (Zhang et al. 2013). Although the mitochondrial genomes of higher plants are, in general, not transformable, TALENs carrying mitochondrial localization signals (mitoTALENs) could be used effectively to achieve targeted modification of the mitochondrial genomes of rice and Arabidopsis (Arimura et al. 2020). Therefore, it is theoretically possible to use TALENs to engineer changes into any part of a plant’s genome, including that of tobacco.

The CRISPR/Cas9-sgRNA system is a novel targeted genome-editing technique derived from a bacterial immune system. It is an easy, less expensive, user-friendly and rapidly adopted genome-editing tool that enables precise genomic modifications in many different organisms and tissues (Surender et al. 2016). The Cas9 protein is an RNA-guided endonuclease that creates targeted double-strand breaks, with a short RNA sequence directing recognition of the target site in animals and plants. Hence, CRISPR/Cas9 gene targeting requires a custom single guide RNA (sgRNA) that contains a targeting sequence (crRNA sequence) and a Cas9 nuclease-recruiting sequence (tracrRNA). The crRNA region is a 20-nucleotide sequence that is homologous to a region in the target gene and directs Cas9 nuclease activity. Because this technique can produce genetically edited crops similar to those developed by conventional or mutation breeding, it is a promising and extremely versatile tool for genotype improvement. An inducible CRISPR/Cas9 system was developed to avoid constitutive expression of the Cas9 protein (Ren et al. 2020). Multiplexing can be performed with CRISPR/Cas9 simply through co-infiltration of multiple tobacco rattle viruses encoding different sgRNAs in tobacco (Ali et al. 2015). Through CRISPR/Cas9-mediated homologous recombination, Hirohata et al. (2019) created three nucleotide substitutions (ATG to GCT) leading to resistance to the herbicide chlorsulfuron (Cs). This technology has also been used for improving seed oil in tobacco (Tian et al. 2020, 2021).
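The role of the 20-nucleotide targeting sequence can be illustrated with a short script that scans a DNA sequence for candidate target sites. This is a minimal sketch, not a guide-design tool: it assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of the target (the PAM requirement is not discussed in the text above), and the function names and toy sequence are hypothetical. It does not score off-target risk or cutting efficiency.

```python
import re

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def find_guides(seq, guide_len=20):
    """List candidate target sites as (strand, start, protospacer, pam).

    A site is a guide_len-nt protospacer followed directly by an NGG PAM
    (assumption: SpCas9). For the '-' strand, start is the offset within
    the reverse-complemented sequence.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(s):
            guides.append((strand, match.start(), match.group(1), match.group(2)))
    return guides

# Hypothetical toy sequence: one 20-nt protospacer followed by an AGG PAM.
toy_seq = "ATGCATGCATGCATGCATGCAGGATAT"
guides = find_guides(toy_seq)
for strand, start, protospacer, pam in guides:
    print(strand, start, protospacer, pam)
```

For the toy sequence the only candidate is the forward-strand site at position 0. In a polyploid genome such as that of N. tabacum, each candidate would also need to be checked against homeologous gene copies before use.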

Engineered nuclease systems, viz. ZFNs, TALENs, and CRISPR-Cas, have emerged as innovative genome-editing tools with high genetic engineering efficiency and specificity. The successful demonstration of these techniques in tobacco holds promise for their utilization in targeted editing of economically important traits in future.

10.11.2 Nanotechnology

Genetic engineering modifies plant cell genomes and depends on the efficient delivery of modifier biomolecules, as genetic cargo, to the targeted plant (Ahmar et al. 2021). Of the conventional tools for biomolecule delivery, Agrobacterium-mediated transformation (AMT) and biolistic delivery of DNA are the most widely used. AMT is employed for transferring target DNA to the nuclear genome of a limited number of plant species. The AMT method results in random DNA integration, disruption of endogenous plant genes and alteration in gene expression arising from the insertion sites (Niazian et al. 2017). Biolistic delivery uses a high-pressure gene gun to deliver DNA directly across cell walls and membranes into plant tissues, where it integrates randomly into the chromosomes. This at times results in destruction of tissues and multiple insertions in random portions of the plant genome (Toda et al. 2019). Because of the unavoidably high velocity of the genetic cargo during biolistic delivery, the bombarded particles damage the cell wall through penetration and disrupt the homeostasis of the target cell. Other less commonly used systems include electroporation, viral vectors and chemical delivery. Polymer-based chemical transformation leads to cytotoxicity in plant cells because of the accumulation of high-density charged polymer-based genetic cargo, whereas a decrease in charge impairs the bioconjugated complex. The disadvantages of viral vectors are high host specificity and limited cargo size. In addition to biolistic methods, PEG-mediated transformation is one of the widely used methods for inserting genetic cargo into chloroplasts (Yu et al. 2020). This method can carry various genetic cargo types, such as DNA and RNAs (small interfering RNA [siRNA] and miRNA), into target cells (Cunningham et al. 2018). However, it requires regeneration from protoplasts, which is highly challenging because only a limited number of plant species are amenable to protoplast regeneration. Plant transformation therefore presents a major bottleneck for GE technology. Conventional biomolecule delivery technologies are time-consuming and involve complicated protocols. Low efficiency of gene transfer, a narrow species range, limited cargo types, and tissue damage are thus the critical drawbacks of conventional delivery methods.

Advances in nanotechnology have created opportunities to overcome the limitations of the above conventional methods. Nanoparticles (NPs) are promising for the species-independent passive delivery of various genetic cargo [DNA, RNA, proteins (site-specific recombinases or nucleases), and ribonucleoprotein (RNP)] across plant systems (Cunningham et al. 2018). The use of NPs for the transfer of biological molecules into plant cells addresses many of the issues that have hindered the success of GE, and is emerging as a promising technique for improving the effectiveness, robustness and flexibility of GE. The development of genetically modified organisms (GMOs) through nanotechnology, in which NPs serve as nanocarriers by forming a binding complex with biomodifier molecules (e.g., the CRISPR/Cas system), represents a powerful technique for transgene delivery into plant cells (Demirer et al. 2021).

The unmatched potential of NP-based delivery of biomolecules to plant cells has revolutionized the GE delivery process (Ahmar et al. 2021). NP-bound GE nucleases can be efficiently transferred to plant cells without resultant damage to the target tissue. Thus, the use of NP-based methods for genetic cargo delivery has emerged as a cutting-edge technology offering new insights and more robust GE (Cunningham et al. 2018). The small size of NPs enables them to traverse the cell wall and to overcome the obstacles to delivering biomolecules to plant tissues without species- and tissue-specific limitations. NPs can also be engineered to facilitate cargo delivery to subcellular compartments, such as mitochondrial or chloroplast DNA, that AMT cannot target.

NPs commonly used for biomolecule delivery in both animal and plant systems are classified according to the base material used: bio-inspired, carbon-based, silicon-based, polymeric, and metallic/magnetic (Cunningham et al. 2018). Each NP type delivers different genetic cargos. For example, carbon nanotubes (CNTs) can carry RNA and DNA, but metallic NPs can deliver only DNA as genetic cargo. Silicon-based NPs can deliver DNA and proteins, and polymeric NPs (PEG and polyethyleneimine) transfer RNA, DNA, and proteins into cells (Silva et al. 2010). Cationic NPs are preferred for gene delivery into plants as they can bind to the negatively charged plant cell wall and perform gene transfer, whereas CNT NPs have been used to deliver plasmid DNA into various crops. Several NPs can penetrate the cell wall (e.g., CNTs and mesoporous silica), whereas other NPs, such as gold NPs and magnetic NPs (MNPs), require additional physical methods (e.g., magnetofection and electroporation) for genetic cargo delivery into the cells.
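The cargo compatibilities listed above can be captured in a small lookup table, which is convenient when shortlisting carriers for a given experiment. The sketch below encodes only the four examples given in this paragraph; the names `NP_CARGO` and `carriers_for` are hypothetical, and the table is not an exhaustive or authoritative compatibility chart.

```python
# Cargo types each NP class is described as delivering in the paragraph above.
NP_CARGO = {
    "carbon nanotube": {"DNA", "RNA"},
    "metallic": {"DNA"},
    "silicon-based": {"DNA", "protein"},
    "polymeric": {"DNA", "RNA", "protein"},
}

def carriers_for(cargo):
    """Return the NP classes (sorted by name) listed as able to carry `cargo`."""
    return sorted(name for name, cargos in NP_CARGO.items() if cargo in cargos)

print(carriers_for("protein"))
print(carriers_for("RNA"))
```

A lookup of this kind only narrows the choice; whether a given NP also needs a physical delivery aid, as noted for gold and magnetic NPs, has to be considered separately.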

NP-mediated cargo delivery can be achieved through physical and non-physical means. Physical methods include the creation of transient pores in the cell membrane with electric fields, sound waves, or light, as well as magnetofection, microinjection, and biolistic particle delivery. Non-physical methods include the use of cationic carriers, incubation, and infiltration. Optimizing the use of NPs in different plant species with respect to dose and spatiotemporal tuning is essential, as NPs behave differently in specific plant cells (Ahmar et al. 2021). NPs are considered to enable efficient transformation of plants because of their ability to protect the genetic cargo from cellular enzymatic degradation (Ahmar et al. 2021).

Literature on NP–tobacco systems focuses mainly on the synthesis of metallic nanomaterials using genetically modified tobacco mosaic virus, NPs as pesticides, NP uptake, effects on plant growth, biomolecule delivery systems, etc. (Burklew et al. 2012; Love et al. 2014, 2015; Wang et al. 2016; Tirani et al. 2019). Delivery of DNA into Nicotiana tabacum plants via biolistic delivery of 100–200-nm gold-capped mesoporous silica NPs (MSNs) was first demonstrated in 2007 by Torney and colleagues. Fu et al. (2012) used zinc NPs to deliver plasmid DNA into tobacco. By contrast, NPs such as silicon carbide whiskers (SCW) and MSNs have been used effectively to transfer genes into tobacco without other physical methods (Golestanipour et al. 2018). However, the disadvantage of the SCW method compared with other NP-mediated plant transformation methods is that an adequate protocol is required for plant regeneration from cell cultures. Silva et al. (2010) introduced siRNA into tobacco protoplasts using polymer NPs as an alternative gene knockout mechanism. Meanwhile, NP-mediated passive delivery of DNA plasmids has been reported in tobacco through CNTs (Burlaka et al. 2015; Kwak et al. 2019), and of dsDNA through clay nanosheet NPs (Mitter et al. 2017). Demirer et al. (2019) achieved passive delivery of DNA plasmids and protected siRNA using functionalized CNT NPs for transient silencing of a constitutively expressed gene in transgenic N. benthamiana leaves with 95% efficiency.

In spite of these significant advantages, a few challenges hamper the successful use of NPs in GE. The first relates to the nanophytotoxic effect of NPs on plant growth, which can damage either the plant or the environment through the subsequent release of NPs up to a toxic level (Ahmar et al. 2021). In general, only small amounts of engineered NPs, nontoxic to both the environment and the plant, are required for genetic cargo delivery. However, questions of cell structural stability, disturbance of metabolic pathways, and the deposition and dispersal of NPs to other plant cells after application necessitate further research to improve the use of NPs as carriers of genetic cargo. Another issue requiring attention for improving the efficacy of NP-mediated GE relates to the efficiency with which NPs bind to biomolecules and the breakdown of the binding complex in plant cells. Different biomolecules have different binding affinities for various NPs depending on the chemical composition, structure, surface area, and charge of the NPs, which determine their suitability for a bioconjugation complex. Optimization of biomolecule-specific binding therefore requires further research to increase the versatility of NPs as carriers of genetic cargo.

10.12 Brief on Genetic Engineering for Resistance Traits

Tobacco is widely used as a model in plant transgenic research because of its well-studied molecular genetics, nearly complete genomic mapping, readily achievable genetic transformation, ability to survive under in vitro and greenhouse conditions, and large biomass-yielding potential (Jube and Borthakur 2007). Transgenic tobacco plants are also ideal model organisms for the study of basic biological functions, such as plant-pathogen interactions, environmental responses, growth regulation, senescence, etc. In view of this, a number of studies have been undertaken in which genes from bacteria, animals and other plant species were incorporated into tobacco and their functional roles validated. Genetic engineering of tobacco plants for resistance-related traits is reviewed here.

10.12.1 Target Traits and Alien Genes—Biotic Stress Resistance

Tobacco transgenics carrying transgenes from bacteria, other plants and viruses were validated to confer resistance to various pests and diseases. These transgenic studies clearly demonstrated that tobacco cultivars with resistance to biotic stresses can be successfully developed for commercial cultivation.

A number of bacterial genes were found to confer resistance to diseases and pests, viz. P. syringae pv. phaseolicola (argK: ornithine carbamoyl transferase from P. syringae), P. syringae pv. tabaci (bO: bacterio-opsin from Halobacterium halobium), P. parasitica var. nicotianae (popA: PopA protein from R. solanacearum), E. carotovora (expI: N-oxoacyl-homoserine lactone from E. carotovora; aiiA: acyl-homoserine lactonase from Bacillus spp.), Helicoverpa spp. and Spodoptera spp. (cry genes: crystal proteins from B. thuringiensis), boll weevil larvae (choM: ChoM protein from Actinomyces), tobacco hornworm (ipt: cytokinin isopentenyl transferase from A. tumefaciens), etc. (Jube and Borthakur 2007). Further, tobacco transgenics carrying bacterial genes were found to tolerate herbicides, viz. bialaphos (bar: PPT acetyl transferase from Streptomyces hygroscopicus), glyphosate (aroA-M1: EPSPS from E. coli), phenmedipham (pcd: PMPH from Arthrobacter oxydans), 2,4-D (tfdA: 2,4-D monooxygenase from Ralstonia eutropha), paraquat (pqrA: paraquat resistance protein PqrA from Ochrobactrum anthropi), etc. (Jube and Borthakur 2007).

Exhaustive lists of tobacco transgenics validating the effectiveness of plant pathogenesis-related genes, lectins, proteinase inhibitors, trypsin inhibitors, transcription factors, etc. from various plant sources in conferring resistance to Spodoptera, Heliothis, aphids, TMV, phytopathogens and nematodes are available in the literature (Sane et al. 1997; Luo et al. 2009b; Malone et al. 2009; Priya et al. 2011; Guo et al. 2013). Plant-derived antifungal proteins were found to give protection from fungal pathogens. Animal-derived antimicrobial magainin analogs, avidins, gamma-aminobutyrate (GABA), proteinase inhibitors, etc. were found to enhance the resistance of tobacco to diseases, pests and nematodes (Jach et al. 1995; Li et al. 2001; Burgess et al. 2002; Christeller et al. 2002; McLean et al. 2003).

Transgenic tobacco plants showing resistance to CMV, TMV and TLCV were developed by the transfer of transgenes of viral, plant or other origin (Prins et al. 2008). Approaches such as viral coat protein-mediated resistance, replicase proteins, movement proteins, proteases, antisense sequences, R genes from plants, plantibodies, double-stranded RNA (dsRNA), etc. were used to confer virus resistance in tobacco (Powell-Abel et al. 1986; Day et al. 1991; Audy et al. 1994; Xiao et al. 2000; Hofius et al. 2001; Spassova et al. 2001; Kalantidis et al. 2002). In a large number of transgenics developed using transgenes of viral origin, resistance was found to be conferred by post-transcriptional gene silencing (Kalantidis et al. 2002).

10.12.2 Review on Achievements of Transgenics

Tobacco has served as a model plant for producing a large number of transgenics carrying genes for pest and disease resistance and other economically important traits. However, no genetically transformed tobacco varieties (transgenic cultivars) have been released for commercial cultivation in any country, in view of the opposition faced by GM tobacco in the global market (Bowman and Sisson 2000). Although the GM Approval Database of the International Service for the Acquisition of Agri-biotech Applications (ISAAA) reports two GM tobacco events, viz. (1) oxynil herbicide tolerance and (2) nicotine reduction, antibiotic resistance (GM approval database 2021), neither is cultivated on a commercial scale in any country. In contrast, millions of hectares of genetically engineered soybean, corn, cotton and canola are being grown throughout the world (ISAAA 2019). Thus, tobacco breeding efforts lag behind those of other crops in genetic engineering. In addition, the strong opposition of European countries to genetically modified organisms (GMOs) is acting as a hindrance to transgenic tobacco breeding. Genetic engineering of tobacco cultivars is therefore on hold until the trade-related obstacles are alleviated. Nevertheless, this methodology holds great promise for improving tobacco cultivars in terms of disease and pest resistance, and possibly health-related constituents in the cured leaf.

10.12.3 Organelle Transformation

Manipulation of the nuclear genome through genetic engineering is performed widely in most economically important plant species. However, nuclear transformation has several drawbacks, including unpredictable expression of the gene of interest and gene silencing due to the random location of transfer DNA integration and/or position effects (Meyers et al. 2010). As organelles that carry genetic material in small DNA genomes, plastids (chloroplasts) and mitochondria provide an alternative opportunity for transformation in plants (Butow and Fox 1990; Rascon-Cruz et al. 2021).

Plastid genomes of tobacco are typically about 150 kb in size and code for about 140 genes. Plastids are the seat of some of the important biosynthetic pathways and processes, including photosynthesis, photorespiration, the metabolism of amino acids, lipids, starch, carotenoids, other isoprenoids, phenolic compounds, purines and pyrimidines, and the synthesis of pigments and vitamins; they are also implicated in the metabolism of phytohormones such as cytokinins, abscisic acid, and gibberellins (Kuchuk et al. 2006; Rascon-Cruz et al. 2021). Compared with conventional nuclear genetic engineering, plastid genome transformation offers several advantages (Kuchuk et al. 2006; Li et al. 2021). A high level of transgene expression is possible with chloroplasts, as there are about 100 chloroplasts per cell, each containing about 100 copies of the genome. Plastid transformation can therefore yield up to about 10,000 copies of a transgene per cell. Gene silencing and position effects have not been described for plastid genes, so the level of expression is much more predictable. Unlike integration into the nuclear genome, integration of heterologous DNA into a plastome occurs through homologous recombination, which allows very precise genetic manipulations. Multigene engineering is also possible by stacking transgenes in synthetic operons in a single transformation event. Maternal inheritance of plastomes in the majority of crop species reduces the risk of uncontrolled transgene release into the environment (Kuchuk et al. 2006; Li et al. 2021).

Stable transformation of the plastome was achieved first in the unicellular alga Chlamydomonas reinhardtii in 1988, and two years later in the dicotyledonous seed plant tobacco (N. tabacum) (Svab et al. 1990). Over the years, plastid transformation in tobacco has become increasingly routine, with a transformation efficiency equivalent to that of nuclear transformation (Svab and Maliga 1993; Daniell et al. 2016; Li et al. 2021). Plastids of N. tabacum var. Petit Havana (Svab et al. 1990), N. benthamiana (Davarpanah et al. 2009) and N. sylvestris (Maliga and Svab 2010) have been transformed by different workers. Plastid transformation technology has led to transgene expression, genome editing, and RNA editing analyses in plastids.

The first biotechnological application of transplastomic technology for pest control was the expression of the cry1A(c) gene from B. thuringiensis (Bt) in the tobacco plastid. The transplastomic tobacco plants accumulated high amounts of the Bt insecticidal protein (3–5% of total soluble protein, TSP) and displayed high levels of resistance to herbivorous insects (McBride et al. 1995). When cry2Aa2 was introduced as an operon along with two small open reading frames, the Cry2Aa2 protein accumulated up to 45% of TSP and led to the formation of crystals (De Cosa et al. 2001). Higher expression levels of cry9Aa2 (10% of TSP) in the tobacco plastid genome resulted in severe growth retardation of the transplastomic plants (Chakrabarti et al. 2006), indicating that the transgene expression level needs to be carefully optimized to provide sufficient protection without a yield penalty. The advances made in other crops clearly indicate that developing tobacco with high levels of resistance to insects, to bacterial, fungal and viral diseases, and to different types of herbicides is quite possible with plastid transformation (Adem et al. 2017).

The similarities between plastids and mitochondria suggest that it should be possible, in principle, to transform the mitochondrial genome with suitably designed constructs. However, reliable methods for the transformation of mitochondria using a biolistic device currently exist only for yeasts (Johnston et al. 1988) and green algae (Remacle et al. 2006), and no successful transformation of mitochondria in plant systems has been reported to date (Li et al. 2021). Most of the plant mitochondrial genome is composed of non-coding repeated sequences, intergenic spacer sequences and introns. A system for genetic transformation of plant mitochondria would facilitate functional analyses of the mitochondrial genome and its products, and would also open the way to modifying mitochondrial metabolism or introducing cytoplasmic male sterility (CMS) into new crops and varieties (European Commission 1989; Wang et al. 2020).

10.12.4 Biosynthesis and Biotransformation

10.12.4.1 Biosynthesis

Alkaloids are important compounds synthesized in Nicotiana plants and are essential in establishing the commercial quality of tobacco as well as its defense against herbivores (Zenkner et al. 2019). Both wild and domesticated forms of Nicotiana spp. accumulate nicotine, the content and composition of which vary among species. Nicotine is, in general, produced in tobacco roots and translocated to the leaves. Some wild Nicotiana species produce N-acyl-nornicotine, an alkaloid with more potent insecticidal properties than nicotine. The regulation of nicotine biosynthesis is considered a complex physiological response, and many TFs are directly or indirectly involved in its regulation (Kajikawa et al. 2017; Xu et al. 2017; Qin et al. 2020). Six TFs (from three TF families) were found to affect nicotine metabolism, with two basic helix-loop-helix genes positively regulating the jasmonate activation of nicotine biosynthesis (Todd et al. 2010). Metabolic engineering for the biosynthesis of nicotine and the more potent N-acyl-nornicotine in tobacco, without compromising the commercial quality of the leaf, could be an option for developing biotic stress-resistant cultivars. The resulting high-nicotine cultivars could also be used to extract nicotine for use as a pesticide.

10.12.4.2 Biotransformation

Biotransformation of the xenobiotic chemicals (pesticides, herbicides etc.) applied for controlling biotic stresses is essential for maintaining tobacco quality, reducing pesticide residues and, ultimately, protecting the health of tobacco users. Transgenic tobacco lines carrying genes for tolerance to herbicides such as bialaphos (bar gene, producing PPT acetyltransferase, from S. hygroscopicus), glyphosate (aroA-M1, EPSPS, from E. coli), phenmedipham (pcd, PMPH, from A. oxydans), 2,4-D (tfdA, 2,4-D monooxygenase, from R. eutrophus) and paraquat (pqrA, paraquat resistance protein PqrA, from O. anthropi) were found to reduce the effect of these herbicides on tobacco through their biotransformation in the plant (Jube and Borthakur 2007). This ability makes it possible to use these herbicides for controlling weeds in tobacco fields without affecting the main crop. Biotransformation of pesticides to less toxic forms within a reasonable time frame in tobacco will assist in reducing pesticide residues. Further research in this direction is essential for adding to the list of crop protection agents that can be used on tobacco.

10.12.5 Metabolic Engineering Pathways and Gene Discovery

Metabolic engineering is defined as “the improvement of cellular activities by manipulation of regulatory, enzymatic, and transport functions of the cell with the use of recombinant DNA technology” (Bailey 1991). It is motivated by commercial applications, in which strains are improved for the production of useful metabolites, and is basically meant for altering metabolic pathways for the production of chemicals, pharmaceuticals, fuels and medicines. A metabolic pathway can be defined as any sequence of feasible and observable biochemical reaction steps connecting a specified set of input and output metabolites. The rate at which input metabolites are processed to form output metabolites is known as the pathway flux. Metabolic engineering involves the purposeful alteration of metabolic pathways to better understand and utilize them. This involves overexpression or downregulation of certain proteins in a metabolic pathway in such a way that the cell produces a new product or more of an existing one.

The first step in successful engineering is a complete understanding of the metabolic pathway, the genes involved in the pathway and the host cell to be genetically modified (Fuentes et al. 2018). The engineering of metabolic pathways in plants frequently requires the concerted expression of more than one gene involved in the pathway. With traditional transgenic approaches, the expression of such multiple transgenes has been a challenge. Recent progress in transformation techniques has made it possible to integrate multiple transgenes into host genomes. New technological options include combinatorial transformation (large-scale co-transformation of the nuclear genome) and transformation of the chloroplast genome with synthetic operon constructs (Bock 2013). Metabolic pathway engineering of the plastid (chloroplast) genome offers significant advantages, including straightforward multigene engineering by pathway expression through operons, higher levels of transgene expression, and transgene containment due to maternal inheritance. Further, it allows direct access to the large number of diverse metabolite pools in chloroplasts and other non-green plastid types.

In contrast to most structural genes, TFs tend to control multiple pathway steps and hence facilitate the engineering of complex metabolic pathways for higher levels of metabolites (Broun 2004; Grotewold 2008). TFs often exist as gene families and regulate target genes in tissue- and species-specific patterns (Bovy et al. 2002). In most cases, detailed studies have not been made on the specificity of the regulatory genes. Studies on variations in transcriptomes and metabolomes assist in understanding regulation by transcription factors in heterologous systems. Flavonoids are a group of compounds involved in several aspects of plant growth and development, such as pigment production, pollen growth, seed coat development, pathogen resistance and UV light protection (Harborne 1986). Hence, manipulation of the phenylpropanoid pathway responsible for flavonoid production can be a strategy for improving biotic and abiotic stress resistance.

Metabolic engineering using three monoterpene synthases from lemon altered fragrance of tobacco plants (Lucker et al. 2004). Engineering of synthetic operon constructs comprising three genes for the key enzymes of vitamin E (tocochromanol) biosynthesis resulted in an increase of up to tenfold in total tocochromanol accumulation in transplastomic tobacco (Lu et al. 2013). Astaxanthin content in the transplastomic tobacco plants was enhanced through plastid transformation of a synthetic operon consisting of three genes that redirect lycopene into the synthesis of β-carotene and ultimately astaxanthin, a high-value ketocarotenoid (Lu et al. 2017). Grafting of transplastomic tobacco onto the non-transformable Nicotiana glauca enabled the horizontal transfer of the transgenic chloroplast genomes through the graft junction (Lu et al. 2017). Thus, grafting may be helpful in the transplastomic engineering of plant species that are otherwise not amenable.

Metabolic engineering of the artemisinic acid biosynthetic pathway provided a proof of concept for combining plastid and nuclear transformation to optimize product yields from complex biochemical pathways in chloroplasts (Fuentes et al. 2016). Transplastomic tobacco carrying two synthetic operons expressing the core artemisinic acid biosynthetic pathway accumulated only low levels of the metabolite. However, supertransformation of the transplastomic lines using the COSTREL (combinatorial supertransformation of transplastomic recipient lines) approach increased the artemisinic acid content up to 77-fold.

Photorespiration could be reduced in tobacco through the introduction of three distinct alternative glycolate metabolic pathways into tobacco chloroplasts (South et al. 2019). Coupling the reduced expression of a glycolate and glycerate transporter to limit glycolate flux out of the chloroplast with alternative photorespiratory pathway could raise the biomass productivity by >40% under field conditions (South et al. 2019). In this study, about 17 constructs were designed for nuclear transformation; these multienzyme pathways could effectively be introduced into the chloroplast in the form of operons.

The feasibility of metabolic engineering for pest resistance was demonstrated by modulating the transcriptome and metabolome of tobacco (Nicotiana tabacum) with the Arabidopsis transcription factor AtMYB12 (Misra et al. 2010). Expression of AtMYB12 in tobacco resulted in enhanced expression of genes involved in the phenylpropanoid pathway, leading to severalfold higher accumulation of flavonols. The transgenic tobacco lines developed resistance against the insect pests S. litura and H. armigera due to enhanced accumulation of one of the flavonols, rutin. This study indicates that metabolic engineering can be successfully employed in developing stress tolerant tobacco genotypes.

10.12.6 Gene Stacking

Gene stacking, gene pyramiding or multigene transfer refers to the incorporation of two or more genes of interest into a single plant. The combined traits obtained from this process are called stacked traits. A biotech crop variety that bears stacked traits is called a biotech stack or simply a stack (ISAAA 2020). A biotech stack may be a plant transformed with two or more genes that code for proteins having different modes of action on a pest, or a hybrid plant expressing both insect resistance and herbicide tolerance genes derived from two genetic sources. Biotech stacks are engineered to overcome a myriad of problems, including insect pests, diseases, weeds and environmental stresses, in order to increase farm-level productivity. Insect resistance based on multiple genes is more stable than resistance based on a single gene, which may break down owing to co-evolution of the pests. Gene stacks can be generated through methods such as (i) the simultaneous introduction of transgenes through co-transformation, and (ii) the sequential introduction of genes using re-transformation processes or the sexual crossing of separate transgenic events.

Pyramiding of multiple genes has been found to impart resistance to insects and pathogens as well as herbicide tolerance in tobacco. Stacking of genes conferring insect resistance and herbicide (glyphosate) tolerance (Sun et al. 2012), and of dsRNAs silencing chitin biosynthesis pathway genes for root-knot nematode resistance, has been reported in tobacco. Stacked insect resistance genes cry1Ac and cry2A (Bakhsh et al. 2018) and three codon-optimized cry2Ah1 genes (Li et al. 2018) were found to confer resistance to lepidopteran insects. Pyramiding of two pathogenesis-related genes imparted resistance to three filamentous fungi in tobacco (Boccardo et al. 2019). Stacking of protease inhibitors from sweet potato and taro was found to impart resistance to both insects (H. armigera) and pathogens (damping-off disease caused by P. aphanidermatum and bacterial soft rot caused by E. carotovora) (Senthilkumar et al. 2010). Though stacked products are promising and technically feasible in tobacco, to date none of the stacks produced through genetic engineering has been approved for commercial cultivation, mainly because of their transgenic tag. Gene pyramiding events in tobacco are mainly used as proofs of concept or for gene function and interaction studies.

Regulatory principles and procedures for approval and release of biotech stacks differ globally (ISAAA 2020). In the USA and Canada, no separate or additional regulatory approval is required for commercialization of hybrid stacks that are products of crossing a number of already approved biotech lines. This policy is based on the assertion that individual trait components that have not been shown to pose an environmental or health hazard would not give rise to new or altered hazards when they interact in a stack. However, combinations of “plant incorporated protectants” or PIPs (e.g. Bt insecticidal proteins) may give rise to higher or altered toxicity; hence, the US Environmental Protection Agency calls for a separate safety review of a stack where any such hazard is identified. In contrast, stacks are considered new events in Japan and European Union (EU) countries, even when the individual events have commercial approval, and must pass through a separate regulatory approval process, including risk assessment of their safety, similar to mono-trait biotech events (ISAAA 2020). Risk assessment is focused on the identification of additional risks that could arise from the combined genes.

10.12.7 Gene Silencing

Gene silencing is the regulation of gene expression in a cell to prevent or reduce the expression of a certain gene. In plants, gene silencing is used as a means of developing species-specific pest control methods that are alternatives to potentially harmful chemical methods. RNA interference (RNAi) is a promising method for controlling insect pests and diseases by silencing the expression of vital genes of the pest or disease-causing organism so as to interfere with its development and physiology. RNAi technology is based on the expression of dsRNA that shares nearly 100% sequence homology with the desired target gene for optimal silencing. The mechanism of RNAi mediated gene silencing is based mainly on the exogenous production of short interfering RNAs/microRNAs (siRNAs/miRNAs) in an organism to control gene expression. Expression or introduction of dsRNA in eukaryotic cells can trigger sequence-specific gene silencing of transgenes, endogenes and viruses. Transgenic plants producing dsRNAs with homology to pest or viral sequences are likely to exhibit pathogen-derived resistance to the pests and diseases. Tobacco rattle virus based posttranscriptional gene silencing (termed virus-induced gene silencing or VIGS) is the most widely used method to downregulate the expression of a target plant gene (Bachan and Dinesh-Kumar 2012; Senthil-Kumar and Mysore 2014). Temporal and spatial control of gene silencing could be achieved through inducible (ethanol) expression of double-stranded RNAs in tobacco (Chen et al. 2003). dsRNAs synthesized in planta within the plastids are more effective than nuclear-expressed dsRNAs, as they circumvent the native RNAi pathways of the eukaryotic host. M. sexta (tobacco hornworm) genes could be effectively silenced through plastid transformation with dsRNA genes targeting the hornworm genes (Burke et al. 2019).
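
The requirement for near-complete sequence homology between the dsRNA and its target can be illustrated with a short Python sketch. It is a simplified illustration and not a published design tool: the function names are hypothetical, the 21-nt window length is an assumed typical siRNA size, and only exact matches on the sense strand are counted.

```python
def sirna_windows(dsrna: str, k: int = 21) -> list[str]:
    """Return every k-nt window of the dsRNA sense strand (candidate siRNAs).

    k = 21 is an assumed, typical siRNA length; it is not taken from the text.
    """
    seq = dsrna.upper().replace("U", "T")  # treat RNA and DNA letters alike
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def matching_fraction(dsrna: str, target: str, k: int = 21) -> float:
    """Fraction of k-nt windows of the dsRNA found exactly in the target."""
    windows = sirna_windows(dsrna, k)
    if not windows:  # dsRNA shorter than one window
        return 0.0
    tgt = target.upper().replace("U", "T")
    hits = sum(1 for w in windows if w in tgt)
    return hits / len(windows)
```

A value close to 1.0 indicates that almost every candidate siRNA derived from the dsRNA matches the target transcript perfectly. Running the same function against transcripts of the host plant or of non-target organisms gives a rough indication of possible off-target matches.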

Tobacco being a model organism, a number of studies have been conducted in it on silencing genes of various pests and disease-causing organisms, thereby reducing their growth and development and leading to resistance. Transgenic tobacco plants expressing a hairpin RNA (hpRNA) targeting a putative zinc finger transcription factor of the root-knot nematode (M. javanica) effectively suppressed the growth of nematodes feeding on the roots of the transgenic plants (Fairbairn et al. 2007). Vietnamese scientists succeeded in breeding tobacco for virus resistance through gene silencing (RNAi) technology, using three expression vectors carrying a single gene (TMV, TSWV etc.) or multiple genes (TMV, TSWV, CMV and TYLCV) (http://agrobiotech.gov.vn/NewsDetail.aspx?ID=821&CatID=7). Expression of dsRNA homologous to pest genes was found to enhance host plant resistance to whitefly (Thakur et al. 2014; Malik et al. 2017), H. armigera and S. exigua (Zhu et al. 2012), nematodes (Mani et al. 2020) and CMV (Kalantidis et al. 2002) in tobacco.

10.12.8 Prospects of Cisgenics

In cisgenesis, as in transgenesis, extra DNA is stably built into the plant DNA. The major difference between transgenesis and cisgenesis is the origin of the DNA (Schouten and Jacobsen 2008). With cisgenesis, the extra DNA originates from a plant with which the acceptor plant (the plant that will receive the extra DNA) can cross-breed. ‘Cis’ refers to within the same crossable group (Schouten et al. 2006a, b). The cisgenesis approach combines traditional breeding techniques with modern biotechnology and dramatically speeds up the breeding process. Cisgenic plants are presumed to be safer than conventionally bred plants because of the lack of linkage drag (Hou et al. 2014). Cisgenesis introduces just the desired genes without undesirable genes and hence avoids the hazards associated with induced translocation or mutation breeding (Telem et al. 2013). Through cisgenesis, various biotic and abiotic stress resistance genes can be pyramided to provide wider and longer-lasting forms of resistance. In conventional hybridization programs, several backcross generations are required to get rid of undesired genes, which takes a long time (Telem et al. 2013). In cisgenesis, by contrast, the time taken for introducing a single gene, and more so multiple genes, can be drastically reduced because only the target traits are incorporated.

In cisgenesis, the introduction of exogenous genes related to the transfer process can be avoided through the use of new transformation protocols without bacterial selection markers (de Vetten et al. 2003; Schaart et al. 2004) and of species-specific P-DNAs instead of bacterial T-DNAs for insertion of the isolated genes (de Vetten et al. 2003; Rommens et al. 2004). Further, new methods such as promoter trapping and RNA fingerprinting for the isolation of native regulatory elements can now be exploited for the precise expression of the desired traits (Meissner et al. 2000; Trindade et al. 2003). The majority of the methods for producing cisgenic crops without exogenous genes, including removal of selectable marker genes, use of P-DNAs and segregation of independently integrated T-DNAs, have been patented; therefore, scientists need either to use the existing patents or to design new protocols for eliminating undesired DNA sequences from host genomes (Holme et al. 2013).

Vast prospects exist for cisgenesis in the tobacco crop. The availability of a large number of wild species and germplasm resources, genome sequence information for cultivated tobacco and a few wild relatives, comparative genomic techniques, and efficient gene isolation techniques such as map-based cloning and allele mining is opening avenues for the identification and cloning of resistance traits from tobacco and its wild relatives. The cisgenes thus isolated can be used for imparting resistance to biotic stresses in tobacco. Gene editing techniques such as ZFNs (Townsend et al. 2009), TALENs (Zhang et al. 2013) and CRISPR-Cas (Upadhyay et al. 2013; Ali et al. 2015) have been used successfully to induce targeted alleles in tobacco, which supports the feasibility of cisgenesis in this crop (Hou et al. 2014). Identification, cloning and transfer of single or multiple biotic stress resistance cisgenes into tobacco can be achieved in future with the ever-improving gene technologies.

Though both transgenesis and cisgenesis employ the same genetic modification techniques for introducing gene(s) into a plant, cisgenesis uses only genes of interest from the plant itself or from a crossable species, which could otherwise also be transferred by traditional breeding techniques. The release of cisgenic plants into the environment is therefore not expected to pose any additional environmental risk, and they are considered as safe as traditionally bred plants. Hence, compared to transgenesis, cisgenesis is more similar to traditional plant breeding, and cisgenic plants may be considered non-transgenic in spite of the use of molecular biology and plant genetic engineering methods in their development. Restrictions of any form on cisgenesis could delay further research on the development of improved crop varieties, particularly at a time when a greater number of genes from crops and their crossable wild relatives are being isolated and are amenable to cisgenesis. Hence, it is necessary to distinguish cisgenesis from transgenesis.

The general public is found to be more accepting of cisgenic/intragenic crops than of transgenic crops. Surveys indicated that cisgenic plants are more likely to be acceptable to the public than transgenic plants (Viswanath and Strauss 2010; Gaskell et al. 2011; Mielby 2011). However, in the majority of countries, current GMO regulations to prevent negative effects on the environment or human health are based on transgenic organisms and do not distinguish transgenic plants from cisgenic plants. Canada has adopted a product-based regulation system instead of a process-based one, which makes it legally possible to regulate cisgenic plants less strictly than transgenic plants. As per the Australian Gene Technology Regulations, “a mutant organism in which the mutational event did not involve the introduction of any foreign nucleic acid” is not specified as a GMO, and cisgenic plants are thus treated differently (Russell and Sparrow 2008). The European Food Safety Authority (EFSA 2012) states that cisgenic plants are similar to traditionally bred plants and safe in terms of environment, food and feed.

Though biotic stress resistant transgenics are available, they could not be used for commercial cultivation due to worldwide GMO regulations. If cisgenics are treated differently, that will be advantageous to tobacco, in which GMOs are not acceptable. Lighter regulation of cisgenics would boost cisgenesis research in tobacco for improving yields and resistance factors.

10.13 Brief Account on Role of Bioinformatics as a Tool

Recent technological advances have resulted in the accumulation of large volumes of biological data in terms of nucleic acid sequences and various details of biomolecules produced in tobacco and other crops under different situations and life stages. In order to store and analyze these data, a number of general and crop-specific databases have been created. These databases may hold information related to one or more omics types in an integrated way. Information pertaining to tobacco is stored in and accessed through quite a number of databases globally. Some of the key databases that cover most of the data related to tobacco are discussed hereunder.

10.13.1 Gene and Genome Databases

Ever-improving sequencing technologies, gene mapping and tagging projects, and phylogenetic studies have resulted in accumulation of large volumes of genomic data in tobacco. Computer databases are an increasingly necessary tool for organizing such vast amounts of biological data generated and for making it easier for researchers to access and analyze relevant information. The Genomic databases serve as hubs for storing, sharing and comparison of data across research studies, data types, individuals and organisms.

Among the various databases, the key genome databases harboring Nicotiana genome and gene information are NCBI Genome, Sol Genomics Network (SGN), Kyoto Encyclopedia of Genes and Genomes (KEGG genome), EnsemblPlants, Nicotiana attenuata data hub (NaDH), the International Nucleotide Sequence Database Collaboration (INSDC), Gramene etc. (Table 10.13). At present, genome sequences of 12 Nicotiana spp. viz. N. tabacum, N. tomentosiformis, N. sylvestris, N. attenuata, N. undulata, N. otophora, N. suaveolens, N. glauca, N. stocktonii, N. repanda, N. amplexicaulis and N. debneyi at scaffold or contig level, chloroplast genomes of five species and mitochondrial genomes of three species are available in one or another of these databases. Further, a total of more than two lakh (200,000) records of gene sequences belonging to 12 Nicotiana species exist in various databases. NCBI and SGN together are the principal databases that cover all the currently available information on genomes and genes of various Nicotiana spp. These databases share the stored information with other databases and provide extensive tools for sequence analysis and annotation. INSDC is a long-standing foundational initiative that operates between the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) and NCBI, and covers the spectrum of data from raw reads, through assemblies and alignments, to functional annotation. The other databases mentioned above provide access to Nicotiana resources mostly through collaboration with other databases, along with additional analysis tools of their own. The Nicotiana attenuata data hub (NaDH) exclusively covers information on N. attenuata and its similarities with other Nicotiana species and 11 published dicot species. A website of the Boyce Thompson Institute for N. benthamiana resources provides access to the N. benthamiana genomic resources existing at SGN, including gene and protein data, markers, a genes-to-phenotypes database etc. (https://btiscience.org/our-research/research-facilities/research-resources/nicotiana-benthamiana). It also provides tools for alignment, annotation, designing siRNAs for VIGS, CRISPR designing etc.

Table 10.13 Some of the main genomic resource databases for Nicotiana species

In addition to above databases, the Gene Ontology resource database provides access to scientific information about the molecular functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria, their cellular locations and processes those gene products may carry out (Table 10.13). Currently, 25,761 genes and gene products are found to be associated with the term Nicotiana in The Gene Ontology resource database.

Most of the gene and genomic databases provide tools for searching, alignment and comparison of sequences with other Nicotiana spp., solanaceous crops and other crops, as well as for annotation, marker designing, QTL analysis, designing virus-induced gene silencing constructs (VIGS Tool), CRISPR designing etc. Genome analysis aims to describe the functions of genes and proteins, as well as the relationship that exists between a given genotype and phenotype. Gene and genome sequence alignments help in the identification of inherited genetic variations such as SNPs and other alterations in gene sequences, and of their relationship to the resultant phenotypes. Apart from analysis of genome sequence data, various genome databases facilitate the analysis of gene variation and expression, prediction of gene/protein structure and function, prediction and detection of gene regulation networks, etc.

10.13.2 Comparative Genome Databases

The increasing availability of genomic sequence from multiple organisms has provided biomedical scientists with a large dataset for orthologous-sequence comparisons. The rationale for using cross-species sequence comparisons is to identify biologically active regions of a genome based on the observation that sequences that perform important functions are frequently conserved between evolutionarily distant species, distinguishing them from nonfunctional surrounding sequences. This is most readily apparent for protein-encoding sequences but also holds true for the sequences involved in the regulation of gene expression. However, examination of orthologous genomic sequences from several vertebrates has shown that the inverse is also true.

Genome sequences can be compared using pairwise and multiple whole-genome alignments and based on these alignments, synteny, sequence conservation scores and constrained elements can be determined. Comparison of whole-genome sequences at the level of nucleotide or protein provides a highly detailed narration of how organisms are related to each other at the genetic level. Comparative genome studies on genomic variations will identify the types of genes, gene families, and their location, as well as provide clues on the history of evolutionary gene rearrangements and duplications that might be responsible for the recognized genetic variations. By carefully comparing genome characteristics that define various organisms, researchers can pinpoint regions of similarity and difference. This information can be used to identify putative genes and regulatory elements for various traits that may lead to their cloning and further utilization.
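
As a minimal illustration of how sequence conservation can be scored from a pairwise alignment, the following Python sketch computes percent identity in sliding windows. It assumes two pre-aligned, gapped sequences of equal length; the function names, window size, step and threshold are illustrative choices, not the settings of any particular database or tool.

```python
def window_identity(aln_a: str, aln_b: str, window: int = 10, step: int = 5):
    """Percent identity per window of a pairwise alignment.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Gap columns never count as matches. Returns (start, percent) tuples.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    a_up, b_up = aln_a.upper(), aln_b.upper()
    scores = []
    for start in range(0, len(a_up) - window + 1, step):
        pairs = zip(a_up[start:start + window], b_up[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append((start, 100.0 * matches / window))
    return scores


def conserved_windows(scores, threshold: float = 80.0):
    """Start positions of windows whose identity meets the threshold."""
    return [start for start, pct in scores if pct >= threshold]
```

Windows that stay above the chosen threshold across distantly related species are candidates for constrained elements such as coding exons or regulatory sequences. Whole-genome tools apply the same idea at much larger scale and with more refined scoring.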

A variety of tools for comparing complete genome sequences within or among different species are made available by different databases. Almost all the gene and genome databases of tobacco, namely NCBI, SGN, NaDH etc., offer tools for such comparative genome analysis. VISTA provides a comprehensive suite of programs and databases for comparative analysis of genomic sequences. There are two ways of using VISTA: one can submit one's own sequences and alignments to the VISTA servers for analysis, or examine pre-computed whole-genome alignments of different species.

Gramene is a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops including tobacco. The current release, #64 (September 2021), hosts 114 reference genomes, and around 3.0 million genes from 90 plant genomes with 3,256,006 input proteins in 123,064 families with orthologous and paralogous classifications. Its comparative genomics section contains around 340 pairwise DNA alignments and 80 synteny maps. Plant Reactome portrays pathway networks using a combination of manual biocuration and orthology-based projections to 106 species. The Reactome platform enables comparison of reference and projected pathways, gene expression analyses and overlays of gene–gene interactions. Gramene integrates information extracted from plant-focused journals on genetic, epigenetic, expression, and phenotypic diversity; ontology-based protein structure–function annotation; and gene functional annotations.

Many online/web applications can be effectively used for comparative analyses at genomic and genic levels. Apart from the tools provided by various genomic databases, applications such as BRIG (Alikhan et al. 2011), Mauve (Darling et al. 2004), Artemis Comparison Tool (ACT) (Carver et al. 2005), geneCo (Jung et al. 2019) etc. can be used for comparative genomics. At the Nicotiana attenuata data hub, genes of 11 published dicot species were compared and found to cluster into 23,340 homologous groups (HGs) based on their sequence similarity, each with at least two homologous sequences. Phylogenetic trees were also constructed for all these HGs.

Comparative analyses of Nicotiana plastid genomes among themselves and with currently available Solanaceae genome sequences indicated similar GC and gene content, codon usage, simple sequence and oligonucleotide repeats, RNA editing sites, and substitutions (Asaf et al. 2016; Mehmood et al. 2020). Such analysis also revealed that N. otophora is a sister species to N. tomentosiformis within the Nicotiana genus, and that Atropa belladonna and Datura stramonium are their closest relatives (Asaf et al. 2016).

Comparison of whole nuclear and plastid genomes facilitated the identification and confirmation of wild progenitor species and their relative genome contributions in the evolution of the N. tabacum and N. rustica genomes (Murad et al. 2002; Lim et al. 2005; Leitch et al. 2008; Sierro et al. 2014, 2018; Edwards et al. 2017). Whole-genome sequence comparison indicated that the genomes of N. sylvestris and N. tomentosiformis contribute 53 and 47%, respectively, to the genome of N. tabacum, confirming a larger, biased reduction of the T genome (Sierro et al. 2014). Comparison of chloroplast genomes revealed that N. tomentosiformis is a sister species, and Atropa belladonna and Datura stramonium are the closest relatives, to N. otophora (Asaf et al. 2016). Further, it also revealed that the maternal parent of the tetraploid N. rustica was a common ancestor of N. paniculata and N. knightiana, and that the latter species is more closely related to N. rustica.

Comparative studies of Nicotiana genome sequences provided insights into how speciation impacts plant metabolism, and in particular alkaloid transport and accumulation. In a genome evolution study, Sierro et al. (2018) found that 41% of the allotetraploid genome of N. rustica originated from the paternal donor (N. undulata) and 59% from the maternal donor (N. paniculata/N. knightiana). Gene clustering analysis revealed 14,623 ortholog groups common to the Nicotiana species and 207 specific to N. rustica. It was speculated from the results that the higher nicotine content of N. rustica leaves is due to the combination of progenitor genomes and more active transport of nicotine to the shoot. Comparative genomics of pests and pathogens is also helping in distinguishing isolates, races and strains (Hou et al. 2016).
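
Counts of shared and species-specific ortholog groups, such as those quoted above, come down to set operations on the output of gene clustering. The Python sketch below shows the idea on hypothetical data: the function name and input layout are assumptions made for illustration and do not reproduce the pipeline of Sierro et al. (2018).

```python
def shared_and_specific(groups: dict[str, set[str]], focal: str):
    """Split ortholog groups into those common to all species and those
    found only in the focal species.

    `groups` maps an ortholog-group ID to the set of species that have at
    least one member gene in that group (hypothetical input layout).
    """
    all_species = set().union(*groups.values())  # every species in the data
    shared = [g for g, sp in groups.items() if sp == all_species]
    specific = [g for g, sp in groups.items() if sp == {focal}]
    return shared, specific
```

With real clustering output, the lengths of the two returned lists correspond to the "common" and "specific" group counts of the kind reported for N. rustica.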

10.13.3 Gene Expression Databases

With technological advancements, large volumes of data are being generated on gene expression patterns in tobacco from seed to senescence and under different growing conditions, including during the incidence of biotic stresses. The expression of genes in plants is measured at the transcriptome level. In tobacco, the efforts of the Tobacco Genome Initiative (TGI) resulted in enrichment of the sequence information for the transcriptionally active regions of the tobacco genome. The information generated in the form of Expressed Sequence Tags (ESTs), which are short, single-pass sequence reads derived from complementary DNA (cDNA) libraries, and methyl-filtered Genome Space Sequence Reads (GSRs) laid the foundation for gene expression analysis in tobacco. Kamalay and Goldberg (1980) measured the extent to which structural gene expression is regulated in an entire tobacco plant. Later, Matsuoka et al. (2004) analyzed the change of gene expression during the growth of tobacco BY-2 cell lines and found 9200 EST fragments corresponding to about 7000 genes. Rushton et al. (2008) reported 2513 TFs representing all of the 64 well-characterized plant TF families and used them to create a database of tobacco transcription factors (TOBFAC). Edwards et al. (2010) designed an Affymetrix tobacco expression microarray from a set of over 40 k unigenes and measured gene expression in 19 different tobacco samples to produce the Tobacco Expression Atlas (TobEA). TobEA provides a snapshot of the transcriptional activity of thousands of tobacco genes in different tissues throughout the lifecycle of the plant. Expression profiling of tobacco leaf trichomes resulted in the identification of putative genes involved in resistance to biotic and abiotic stresses (Harada et al. 2010; Cui et al. 2011). Differentially expressed long noncoding RNAs (lncRNAs) were found to be involved in regulating jasmonate (JA) mediated plant defense against M. sexta attack (Li et al. 2020).

Storing and integrating different types of expression data and making these data freely available in formats appropriate for comprehensive analysis is essential for their effective utilization. Analysis of gene expression data provides hints towards understanding various aspects of plant development and resistance to biotic and abiotic stresses, and helps define the molecular and genetic pathways associated with these processes. The expression databases host the transcript/RNA/probe information of different genes under varied native or test conditions, along with the relevant software tools for analysis and retrieval of the data. EST databases constructed years ago for this purpose have evolved to host microarray and RNA sequence data as technology advanced. ESTs provide an insight into transcriptionally active genes in a biological sample under a given set of conditions, but generating them is relatively expensive and time consuming. Microarrays, however, offer a faster and less expensive alternative for simultaneously measuring the expression of thousands of genes, and can be easily and reproducibly applied to identify genes showing specific expression patterns or responses across a broad range of conditions or treatments.

At present, gene expression data are archived as microarray and RNA-seq datasets in public databases such as Gene Expression Omnibus (GEO), ArrayExpress (AE) and Genomic Expression Archive (GEA) (Table 10.13). These databases act as useful resources for the functional interpretation of genes. GEO hosts around 4,860 curated gene expression data sets as well as original series and platform records of 11 Nicotiana spp. (Table 10.8; https://www.ncbi.nlm.nih.gov/gds). The ArrayExpress database has gene expression profiles from around 75 experiments, including N. attenuata, N. benthamiana, N. langsdorffii x N. sanderae and N. tabacum. The Genomic Expression Archive has 205 gene expression records related to 11 Nicotiana species. SGN maintains two transcript libraries of N. sylvestris and 39 of N. tabacum. Further, there are exclusive expression databases for Nicotiana attenuata (NaDH) and N. benthamiana (https://btiscience.org/our-research/research-facilities/research-resources/nicotiana-benthamiana), along with the Sol Genomics Network for expression analysis among solanaceous members.
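As an illustration of how such archives can be queried programmatically, the following minimal Python sketch builds a search URL for GEO DataSets through the NCBI E-utilities esearch service. The function name build_geo_search_url, the organism filter and the retmax value are illustrative assumptions rather than part of any workflow described in this chapter; the sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities "esearch" service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(organism: str, retmax: int = 20) -> str:
    """Return an esearch URL that looks up GEO DataSets records for an organism.

    The function name and default values are hypothetical, chosen for illustration.
    """
    params = {
        "db": "gds",                      # GEO DataSets database
        "term": f"{organism}[Organism]",  # restrict the search to one organism
        "retmax": retmax,                 # maximum number of record IDs returned
        "retmode": "json",                # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_geo_search_url("Nicotiana tabacum")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; the number of records returned will differ from the counts quoted above as the archive grows.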

10.13.4 Protein or Metabolome Databases

Proteins are the end products of some of the expressed genes in an organism, and a proteome is the set of proteins thought to be expressed by an organism in its life cycle. The metabolome can be defined as the complete complement of all small molecule (<1500 Da) metabolites found in a specific cell, organ or organism. The metabolome of a species is the link between its genotype and phenotype. It indicates the stage-specific and organ-specific response of the plant, through gene expression, to its environment. Metabolites can influence both the gene expression and the protein function of the plant, which makes the metabolome a central component in elucidating cellular systems and deciphering gene functions.

Proteomics and metabolomics approaches play a pivotal role in functional genomics and are essential for understanding plant development and stress tolerance. Proteome and metabolome profiling is an attractive tool for phenotyping plants confronted by environmental changes and biotic stresses. Such studies contribute significantly to stress biology by distinguishing different classes of compounds, such as auxiliary products of stress metabolism from biosynthetically unrelated pathways, stress-induced signal molecules, and molecules that are part of the plant acclimation process. The resultant metabolic compounds can be studied further by direct measurement or by correlation with changes in transcriptome and proteome expression during stress, and can be confirmed by mutant analysis. Thus, metabolome studies aid in unravelling the different pathways related to plant development and response to stresses. With the advent of high-throughput systems, proteome and metabolome profiling has been carried out extensively in model plants such as tobacco to examine stress signaling pathways and cellular and developmental processes.

The principal databases hosting tobacco proteome information are UniProt, Pfam, KEGG, SGN, NCBI, etc., and the metabolome databases are SolCyc, REACTOME, the Golm Metabolome Database (GMD), MoNA (MassBank of North America), etc. The salient features of these databases are outlined below.

The proteomic data generated globally in various organisms are stored and accessed through the Universal Protein Resource (UniProt), a comprehensive resource for protein sequence and annotation data. UniProt was established as a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR) and consists of three databases that are optimized for different uses. The UniProt Knowledgebase (UniProtKB) is the central access point for information on extensively curated proteins, including their function, classification and cross-references. The UniProt Reference Clusters (UniRef) pool closely related sequences into a single record to speed up sequence similarity searches. The UniProt Archive (UniParc) is a comprehensive repository of all protein sequences, consisting only of unique identifiers and sequences. UniProt provides several sets of proteins assumed to be expressed by organisms whose genomes have been completely sequenced, termed “proteomes”. There are 73,606 protein entries associated with the N. tabacum proteome and a total of 154,728 entries for all Nicotiana species as of 30.09.2021.

The Pfam database contains a large collection of protein families, each represented by multiple sequence alignments. This database provides tools for protein alignment and annotation, analysis of the domain organization of a protein sequence, etc. The search term Nicotiana returned about 4970 unique results in this database as of 30.09.2021, indicating the number of protein entries available.

The KEGG database, in addition to protein information, projects biological processes from various organisms onto pathways consolidated in large network schemata. At present, KEGG contains annotated proteins of N. tabacum (61,780), N. tomentosiformis (30,989), N. sylvestris (33,816) and N. attenuata (34,218). The Sol Genomics Network database also provides data on proteins annotated from the draft genome sequences of N. tabacum, N. benthamiana and N. attenuata. A collection of around 275,000 protein sequences from several sources is available for 20 Nicotiana spp. at NCBI, along with annotation reports for four Nicotiana spp. (Tables 10.8 and 10.11).

SolCyc contains a collection of Pathway Genome Databases (PGDBs) for Solanaceae species that are generated using Pathway Tools. It is a database hub at SGN for the manual curation of metabolic networks and includes annotated metabolic, regulatory and signaling processes in solanaceous plants based on omics data obtained from multiple resources. It has species-specific databases for tomato (LycoCyc), potato (PotatoCyc), pepper (CapCyc), coffee (CoffeaCyc), petunia (PetCyc), N. tabacum (K326Cyc), N. attenuata (NattCyc), N. sylvestris (NiSylCyc), N. tomentosiformis (NiTomCyc) and N. benthamiana (BenthaCyc), and multi-species databases for the combined Nicotiana genus (NicotianaCyc) and the combined Solanaceae (SolanaCyc). In addition to proteomic data, NaDH also provides metabolome data of N. attenuata, with analysis tools and facilities for searching metabolites and fragments based on annotation and measured values.

REACTOME offers intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, modeling, genome analysis, systems biology and education. GMD is an open access metabolome database that provides public access to custom mass spectral libraries and metabolite profiling experiments, along with additional information and tools. MoNA (MassBank of North America) is a centralized and collaborative database of metabolite mass spectra, metadata and associated compounds. At present, MoNA contains over 200,000 mass spectral records from experimental and in-silico libraries as well as from user contributions.

Proteomic studies in tobacco have revealed interesting facts about different stress responses (Amme et al. 2005). Analysis of the proteome of glandular trichomes revealed an enrichment of proteins that are components of stress defense responses. In another study, comparative proteomics of tobacco mosaic virus-infected N. tabacum plants identified major host proteins involved in photosystems and plant defense (Das et al. 2019). A metabolome study under water stress in tobacco identified a useful marker of drought stress for members of the Solanaceae (Rabara et al. 2017).

10.13.5 Integration of Different Data

Analysis of a single type of omics data (e.g. genome, proteome, transcriptome or metabolome) provides biological understanding at one specific molecular layer. However, many agronomic and quality traits involve complex crosstalk between the various molecular layers, viz., genome, transcriptome, proteome, and metabolome. These four ‘omes’ collectively form the building blocks of systems biology. An integrative analysis of multiple layers of molecular data, or systems biology, helps to discover and elucidate the molecular mechanisms underlying complex traits and thus provides clues for genome designing. In the holistic study of complex biological processes, it is imperative to have an integrative approach that combines multi-omics data to highlight the interrelationships of the involved biomolecules and their functions. With the advent of high-throughput techniques and the accessibility of multi-omics data from large sets of samples, a number of promising methods and tools have been developed for data integration and interpretation. Most biological databases collect and integrate data from different sources.
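The idea of integrating molecular layers can be sketched, under strong simplifying assumptions, as a join of two data sets on a shared gene identifier. In the Python example below the gene names, fold-change values, the threshold of 2.0 and the function integrate_layers are all hypothetical and serve only to show the principle; real integration methods are considerably more elaborate.

```python
# Hypothetical per-gene fold changes from two omics layers (illustrative values only).
transcript_fold_change = {"geneA": 4.0, "geneB": 0.5, "geneC": 2.5}
protein_fold_change = {"geneA": 3.0, "geneC": 0.8, "geneD": 2.2}


def integrate_layers(transcripts, proteins, threshold=2.0):
    """Join two omics layers on gene ID and flag genes induced in both layers."""
    # Only genes measured in both layers can be compared directly.
    shared = sorted(set(transcripts) & set(proteins))
    merged = {}
    for gene in shared:
        t, p = transcripts[gene], proteins[gene]
        merged[gene] = {
            "transcript": t,
            "protein": p,
            # True when both layers meet or exceed the assumed threshold.
            "induced_in_both": t >= threshold and p >= threshold,
        }
    return merged


merged = integrate_layers(transcript_fold_change, protein_fold_change)
consistent = [gene for gene, rec in merged.items() if rec["induced_in_both"]]
print(consistent)
```

In this toy data set only geneA is induced at both the transcript and the protein level, whereas geneC is induced at the transcript level alone, the kind of discrepancy that a single-layer analysis would not reveal.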

INSDC, NCBI, SGN, NaDH, KEGG genome, EnsemblPlants, DAVID (Database for Annotation, Visualization, and Integrated Discovery), the BioStudies Database, etc. are some of the integrated databases and resources that collect and integrate omics data from different plants, including tobacco (Table 10.13). As a collaborative foundational initiative, INSDC covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations. NCBI, in addition to providing wide-ranging basic tobacco data in its various databases, offers tools for the integration of structural and functional genomic data and their annotations. The curated proteome data of Nicotiana species in UniProt and metabolome data from different resources are being integrated in new data hubs like SGN, KEGG, NaDH, etc., to provide researchers with holistic information from gene to pathway. KEGG is an integrated database resource consisting of sixteen databases that include genes and proteins, metabolites and other chemical substances, biochemical reactions, enzymes, disease-related network variations, etc.

EnsemblPlants is an integrative resource presenting genome-scale information for a growing number of sequenced plant species (currently 33). It provides data on genome sequence, gene models, polymorphic loci, and functional annotation, as well as variation data comprising phenotype data, population structure, individual genotypes, and linkage. In each release, comparative analyses are performed on whole genome and protein sequences, and genome alignments and gene trees are made available showing the implied evolutionary history of each gene family. DAVID is a web-accessible database that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. Numerous public sources of protein and gene annotation have been parsed and integrated into the DAVID database, which currently contains information on over 1.5 million genes covering more than 65,000 species. The European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) is building and maintaining the BioStudies Database as a resource for encapsulating all the data associated with a biological study. One of the goals of BioStudies is to accept and archive data generated in experiments that can be characterized as "multi-omics".

These efforts of building archives, databases, and analysis tools in an integrated approach have been successful at enabling a better understanding as well as comparison of the omic resources of tobacco.

10.14 Brief Account on Social, Political and Regulatory Issues

10.14.1 Concerns and Compliances

Tobacco is an important high-value commercial crop, valued for its potential to generate higher farm income and employment for farmers and farm laborers, and revenue for national governments. Today the tobacco sector is caught between diametrically conflicting concerns: the livelihood security of those associated with tobacco production, processing and marketing on one hand, and the serious health risks to consumers on the other. The use of huge quantities of forest wood as a source of energy for flue-curing of FCV tobacco, the environmental pollution caused by tobacco smoking, and the spitting habits associated with chewing tobacco are issues of serious concern to tobacco production. Further, climate change impacts, emerging biotic and abiotic stresses, pesticide residues, consumer preferences and tobacco regulatory policies are becoming increasingly complex and represent future challenges for tobacco cultivation (ICAR 2015).

The WHO Framework Convention on Tobacco Control (FCTC), with an overwhelming membership of 182 countries, envisages non-price, price and tax measures to reduce the supply of and demand for tobacco in the world. The FCTC signatory countries are under obligation to support these measures. On May 31st every year, the world observes World No Tobacco Day (WNTD), promoted by the World Health Organization (WHO), with a primary focus on encouraging users to refrain from consuming tobacco and its related products for a period of at least 24 h.

10.14.2 Patent and IPR Issues

Researchers are developing various management strategies for minimizing crop yield losses at field level due to biotic stresses. These include the development of biotic stress resistant varieties through conventional breeding and biotechnological interventions. Advances in biotechnology and bioinformatics have generated various genome-based tools, techniques, genes and gene constructs in the field of stress resistance (Dangl and Jones 2001). Intellectual property rights (IPR) for plants help protect investments made in research and development of new tobacco varieties (CORESTA 2005). In turn, this encourages further investment and helps continue the development of new varieties that increase economic returns throughout the tobacco supply chain. Over the past 25 years, an increasing number of governments and international organizations have enacted laws, regulations, or policies that acknowledge the need for IPR. While these provide protection to inventors or breeders, some of them also recognize a farmer’s exemption for saving seed for their own use.

Patents offer the owner the exclusive right to make, use, sell, offer for sale, or import for those purposes a patented product. It also offers the rights to a patented process and to make, use, sell, offer for sale, or import for those purposes the direct product of the patented process. The ability to patent plant varieties is recognized in some countries, but in many countries like India it is disallowed. There have been few patents granted for tobacco varieties in the world (CORESTA 2005). Court decisions in the United States have permitted the use of utility patents to protect plant varieties, and to date the United States Patent and Trademark Office has issued over 200 utility patents with claims to seeds. European countries that are members of the European Patent Office (EPO) ban the patenting of plant varieties, but recent determinations support patent claims directed to plants of more than one variety. Thus, a utility patent and the ruling by the EPO may suggest a type of broad protection for a novel plant trait that is not recognized under plant variety protection (PVP) legislation. Rights granted under patents are also distinguishable from PVP in that exceptions are not allowed for using a patented variety for experiments, breeding purposes, or for private or non-commercial purposes. On the other hand, a patent provides explicit information on the processes and methods used to develop the variety, and it is often tied to a public notification of seed availability once the patent expires. The term of a patent varies across the world, but in the US a utility patent is valid for 20 years after the application date.

Recent breakthroughs in crop science, especially in molecular biology and plant genomics, have opened numerous opportunities for utilizing tobacco plants for commercial and medicinal purposes. Genetic improvements in agronomic traits of crops, such as yield attributes and pest and disease resistance, through molecular and plant genomic approaches have been accompanied by a great abundance of new patents issued in these fields. There is scope for patenting novel methods of making biotic stress resistant tobacco genotypes, methods of introgressing nucleic acid molecules associated with biotic stresses, and genes conferring resistance to various tobacco pests, viruses, bacteria, fungi and nematodes (Hefferon 2010). Patenting activity in resistance genes in tobacco began in 1992, progressed considerably from the year 2000, and became more prominent from 2010 onwards (Prabhakararao et al. 2016). The majority of these patent documents (around 60%) are in the jurisdictions of the USA and China.

The intellectual information generated in the frontier areas is available in non-patent and patent literature. Non-patent literature (NPL) consists of peer-reviewed scientific papers, publications such as conference proceedings, databases (DNA structures, gene sequences, chemical compounds, etc.) and other literature (translation guides, statistical manuals, etc.). Nearly 80% of all the technical information available in the world is hidden in patent documents and other IP assets (Prabhakararao et al. 2016). Patent mapping helps in retrieving and exploring the information protected in intellectual property documents. Collections of patent documents are available in a number of patent information databases (https://guides.library.queensu.ca/patents/databases). Most patent offices provide free access to patent documents via public databases, and some of the largest and most popular patent databases are given in Table 10.14. Patent information can be used to decide the patentability of an invention, avoid re-invention and infringement, establish the current state of the art in a given field of technology, and find the latest trends in R&D being pursued by peers and competitors.

Table 10.14 Patent databases

10.14.3 Disclosure of Sources of GRs, Access and Benefit Sharing

Genetic Resources (GRs) are a key source of numerous biotechnology innovations (Steward 2018). Historical studies reveal that less than 1% of species have provided the basic resources necessary for the progress of all civilization so far. Therefore, unexplored GRs are expected to have potential value for the further advancement of civilization. The main characteristic of bioprospecting is uncertainty, as it is seldom possible to forecast which genes, species or ecosystems will turn out to be valuable in the future. With the aim of improving the sustainable use of GRs, protecting biodiversity, and supporting benefit sharing with originating countries, Access-Benefit Sharing (ABS) systems/regulations have been developed in recent decades. The ‘Convention on Biological Diversity’ (CBD) of 1992 serves as the starting point in many countries for biodiversity conservation and use. As a supplementary agreement to the CBD, the Nagoya Protocol of 2010 is aimed at improving the fair and equitable sharing of benefits arising out of the utilization of genetic resources. ABS systems vary widely from country to country. GR-rich countries tend to organize their ABS systems more strictly and focus on securing an equitable share of the benefits from the products developed through the use of GRs. In order to enhance ABS compliance, several governments have over the years introduced disclosure requirements (DRs) in their patent systems. However, the study conducted by Steward (2018) in Brazil and India indicated that DRs might increase R&D costs, with increased uncertainty and delay in using GRs. The uncertainty may relate to unclear definitions of GRs (e.g., Brazil, India) and to the fact that the content of the disclosure requirements is not verified by the IP authorities in these countries. This may provide ample scope for challenging patents on ABS conditions even after approval. The extent of the DR effects on R&D cycles depends largely on local market conditions and on the efficiency of ABS legislation (Steward 2018).

10.14.4 Farmers’ Rights

Since the dawn of agriculture, farmers around the world have been the custodians and innovators of agricultural biodiversity (FAO 2017; Craig 2004). Farmers are responsible for collecting the best seeds and cultivating different types and species of tobacco throughout the world. Through the careful selection of their best seeds and propagating material, and exchange with other farmers, it became possible to develop and diversify crop varieties. Over thousands of years of constant management and innovation by farmers, a small number of initial crops and varieties have progressed into an incredible treasure of plant diversity for food and farming. In certain countries like India, households have traditionally raised different tobacco landraces in their kitchen gardens for generations from seeds collected from their own crops, thus maintaining and protecting biodiversity.

Farmers’ access to seed and propagating material and opportunities for exchanging the planting material are strongly influenced by seed regulations (variety release and seed marketing regulations), legislation linked to intellectual property rights (patents and plant breeders’ rights), and regulations concerning the bio-prospecting of genetic resources. Farmers’ Rights are the rights arising from the contributions of farmers in conserving, improving, and making available plant genetic resources, mainly those in the centers of origin/diversity, in the past, present and future.

The notion of Farmers’ Rights was developed during the early 1980s to counter increased demands for Plant Breeders’ Rights (PBR) being voiced in international negotiations. The aim was to draw attention to the unremunerated innovations of farmers, which were seen as the foundation of all modern plant breeding. The concept first emerged in international negotiations within FAO in 1986. As early as 1987, practical solutions were being proposed, serving as the foundation for all further negotiations on Farmers’ Rights and providing substantial input to the framing of the current understanding of the issue. In 1989, Farmers’ Rights gained formal recognition by the FAO Conference. In 1991, the Conference decided to set up a fund for the realization of these rights, but this has never materialized. In May 1992, the Convention on Biological Diversity (CBD) was adopted with a resolution on the interrelationship between the CBD and the promotion of sustainable agriculture. Through this resolution, FAO was advised to initiate negotiations for a legally binding international regime on the management of plant genetic resources and, in this context, to resolve the question of Farmers’ Rights. Agenda 21, a dynamic program approved at the UN Conference on Environment and Development held in Rio de Janeiro in 1992, voiced similar demands. In November 1996, the Global Plan of Action for the Conservation and Sustainable Utilization of Plant Genetic Resources for Food and Agriculture (hereafter referred to as the Global Plan of Action), which acknowledges the need to realize Farmers’ Rights, was endorsed by the FAO Council, by the Conference of the Parties to the CBD, and by the World Food Summit at FAO.
The Second Global Plan of Action, prepared under the aegis of the Commission on Genetic Resources for Food and Agriculture, was adopted by the FAO Council on 29 November 2011 and contains a set of recommendations and activities intended as a framework, guide and catalyst for action at community, national, regional and international levels. The International Treaty on Plant Genetic Resources for Food and Agriculture (hereafter referred to as the International Treaty), adopted in 2001, addresses the issue of Farmers’ Rights in its Article 9 and in its Preamble. The Treaty recommends that the Contracting Parties protect and promote Farmers’ Rights in agreement with their national laws. In Article 9, the Contracting Parties of the International Treaty recognize the enormous contribution that farmers of all regions of the world have made, and will continue to make, to the conservation and development of plant genetic resources as the basis of food and agricultural production throughout the world. The responsibility for implementing Farmers’ Rights lies with national governments, which can choose the measures to do so as per their needs and priorities. Measures covering the protection of traditional knowledge, benefit-sharing and participation in decision-making are suggested. The farmers’ rights to save, use, exchange and sell farm-saved seeds and planting material are also addressed, but without any specific direction for execution. In addition, other articles in the Treaty clearly support these rights, albeit not explicitly (for example, the provisions on conservation and sustainable use and on benefit sharing). There are no legally binding provisions in the International Treaty on how to implement Farmers’ Rights at national level.

10.14.5 Traditional Knowledge

Indigenous peoples and local communities often live in harsh natural environments and have had to cope with extreme weather and adapt to environmental change for centuries in order to survive (Swiderska et al. 2011). They use their traditional knowledge (TK), generated through long-standing traditions and practices, for adaptive ecosystem management and sustainable use of natural resources. Hence, the diversity of traditional varieties sustained by farmers around the world is increasingly valuable for adaptation to climate change and for coping with emerging pests and diseases, particularly as modern agriculture relies on a very limited number of crops and varieties. Landraces or traditional varieties are genetically more diverse than recent varieties and are considered to be good sources of resistance to biotic stresses (https://www.cbd.int/traditional/what.shtml). Local communities use wild foods to supplement their diets and thus conserve wild species, which are valuable sources of stress resistance genes. Traditional farmers are well placed to identify resilient crop species and varieties resistant to biotic stresses, given their accumulated TK and their long experience in cultivating crops under changing climates. Traditional knowledge about resilient properties, such as drought and pest resistance traits, and about biotic stress resistant varieties and wild crop relatives (Jarvis et al. 2008) can also be valuable in developing biotic stress tolerant varieties in tobacco.

10.14.6 Treaties and Conventions

International agreements like the Convention on Biological Diversity (CBD), the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA) and the International Union for the Protection of New Varieties of Plants (UPOV) are of special significance for the agricultural sector in general and biotechnology in particular. They are briefly discussed below.

10.14.6.1 Convention on Biological Diversity (CBD)

The Convention on Biological Diversity (CBD), known informally as the Biodiversity Convention, is a legally binding international treaty that was opened for signature in 1992 at the Rio Earth Summit (https://www.cbd.int/convention/). The Convention currently has 196 Parties (168 Signatures). The three main goals of the Convention are the conservation of biological diversity (or biodiversity), the sustainable use of its components, and the fair and equitable sharing of benefits arising from genetic resources. In other words, its objective is to develop national strategies for the conservation and sustainable use of biological diversity. It is considered the key document with regard to sustainable development.

The Convention recognized, for the first time in international law, that the conservation of biological diversity is “a common concern of humankind” and is an integral part of the development process. All ecosystems, species, and genetic resources are covered by the agreement. It links traditional conservation efforts to the economic goal of using biological resources sustainably. It has put in place principles for the fair and equitable sharing of the benefits arising from the use of genetic resources, notably those destined for commercial use.

The Contracting Parties shall promote international technical and scientific cooperation in the field of conservation and sustainable use of biological diversity, wherever necessary, through appropriate international and national institutions. In pursuance of the objectives of this Convention and in accordance with national legislation and policies, the Contracting Parties shall encourage and develop methods of cooperation for the development and use of both indigenous and traditional technologies. For this purpose, the Contracting Parties shall also promote cooperation in the training of personnel and exchange of experts.

In respect of handling of biotechnology and distribution of its benefits, Article 19 of the Convention states—

  • Each Contracting Party shall take legislative, administrative or policy measures, as appropriate, to provide for the effective participation in biotechnological research activities by those Contracting Parties, especially developing countries, which provide the genetic resources for such research, and where feasible in such Contracting Parties.

  • Each Contracting Party shall take all practicable measures to promote and advance priority access on a fair and equitable basis by Contracting Parties, especially developing countries, to the results and benefits arising from biotechnologies based upon genetic resources provided by those Contracting Parties.

  • The Parties shall consider the need for and modalities of a protocol setting out appropriate procedures, including, in particular, advance informed agreement, in the field of the safe transfer, handling and use of any living modified organism resulting from biotechnology that may have adverse effect on the conservation and sustainable use of biological diversity.

  • Each Contracting Party shall, directly or by requiring any natural or legal person under its jurisdiction providing the organisms referred to in paragraph 3 above, provide any available information about the use and safety regulations required by that Contracting Party in handling such organisms, as well as any available information on the potential adverse impact of the specific organisms concerned to the Contracting Party into which those organisms are to be introduced.

The CBD has three Protocols, viz. the Nagoya Protocol on Access and Benefit-sharing, the Cartagena Protocol on Biosafety, and the Nagoya–Kuala Lumpur Supplementary Protocol on Liability and Redress to the Cartagena Protocol on Biosafety. The essence of these protocols is given below.

10.14.6.2 Cartagena Protocol on Biosafety

The Cartagena Protocol on Biosafety to the Convention on Biological Diversity is an international treaty governing the movements of living modified organisms (LMOs) resulting from modern biotechnology from one country to another (http://bch.cbd.int/protocol). It was adopted as a supplementary agreement to the Convention on Biological Diversity on 29 January 2000 and entered into force on 11 September 2003. Currently, it has 173 Parties (103 Signatures). The objective of the Protocol is to contribute to ensuring an adequate level of protection in the field of the safe transfer, handling and use of ‘living modified organisms resulting from modern biotechnology’ that may have adverse effects on the conservation and sustainable use of biological diversity, taking also into account risks to human health, and specifically focusing on trans-boundary movements. The Protocol provides for Parties to enter into bilateral, regional and multilateral agreements and arrangements regarding intentional trans-boundary movements of living modified organisms.

The Protocol establishes a Biosafety Clearing-House to: (a) facilitate the exchange of scientific, technical, environmental and legal information on, and experience with, living modified organisms; and (b) assist Parties to implement the Protocol, taking into account the special needs of developing country Parties, in particular the least developed and small island developing States among them, and countries with economies in transition as well as countries that are centres of origin and centres of genetic diversity.

10.14.6.3 Nagoya Protocol

The Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization (ABS) to the Convention on Biological Diversity is a supplementary international agreement (https://www.cbd.int/abs/). The Protocol was adopted on 29 October 2010 in Nagoya, Japan and entered into force on 12 October 2014. Currently, it has 131 Parties (132 ratifications) (92 signatories). The Protocol aims at fair and equitable sharing of benefits arising from the utilization of genetic resources, thereby contributing to the conservation and sustainable use of biodiversity. It offers a transparent legal framework for the effective implementation of one of the three objectives of the CBD: the fair and equitable sharing of benefits arising out of the utilization of genetic resources. The Nagoya Protocol applies to genetic resources that are covered by the CBD, and to the benefits arising from their utilization. It also covers traditional knowledge (TK) associated with genetic resources that are covered by the CBD and the benefits arising from its utilization.

10.14.6.3.1 The Nagoya–Kuala Lumpur Supplementary Protocol on Liability and Redress to the Cartagena Protocol on Biosafety

This Supplementary Protocol entered into force on 5 March 2018 and currently, it has 49 Parties. Adopted as a supplementary agreement to the Cartagena Protocol on Biosafety, it aims to contribute to the conservation and sustainable use of biodiversity by providing international rules and procedures in the field of liability and redress relating to living modified organisms (http://bch.cbd.int/protocol/supplementary/). The Supplementary Protocol requires that response measures are taken in the event of damage resulting from living modified organisms which find their origin in a transboundary movement, or where there is sufficient likelihood that damage will result if timely response measures are not taken. This Protocol gives a definition of ‘damage’, referring to an adverse effect on the conservation and sustainable use of biological diversity that is measurable or otherwise observable and significant, taking also into account risks to human health. The Protocol requires that a causal link between the damage and the living modified organism be established. Along with imposing a requirement for response measures, the Protocol obliges Parties to continue to apply existing legislation on civil liability or to develop specific legislation concerning liability and redress for material or personal damage associated with damage to the conservation and sustainable use of biological diversity.

10.14.6.4 The International Treaty on Plant Genetic Resources for Food and Agriculture

This International Treaty was adopted in 2001 (FAO 2009). The objectives of the International Treaty on Plant Genetic Resources for Food and Agriculture are the conservation and sustainable use of all plant genetic resources for food and agriculture and the fair and equitable sharing of the benefits arising out of their use, in harmony with the Convention on Biological Diversity, for sustainable agriculture and food security. This legally binding Treaty covers all plant genetic resources relevant for food and agriculture. The Treaty is vital for ensuring the continuous availability of the plant genetic resources for feeding people from different countries. The Treaty recognizes the enormous contribution that the local and indigenous communities and farmers of all regions of the world have made, and takes measures to protect and promote Farmers’ Rights. The Contracting Parties agree to establish a multilateral system, which is efficient, effective, and transparent, both to facilitate access to plant genetic resources for food and agriculture, and to share, in a fair and equitable way, the benefits arising from the utilization of these resources, on a complementary and mutually reinforcing basis. The Treaty provides for (a) protection of traditional knowledge relevant to plant genetic resources for food and agriculture; (b) the right to equitably participate in sharing benefits arising from the utilization of plant genetic resources for food and agriculture; and (c) the right to participate in making decisions, at the national level, on matters related to the conservation and sustainable use of plant genetic resources for food and agriculture.

10.14.6.5 The UPOV Convention

For the protection of plant varieties, the UPOV Convention was adopted in Paris in 1961 and entered into force in 1968. It was first revised slightly in 1972 and more considerably in 1978 and 1991. The 1978 Act entered into force in 1981, and the 1991 Act in 1998 (www.upov.int). The Convention established an inter-governmental organization called the Union Internationale pour la Protection des Obtentions Végétales (UPOV). Internationally, the most prevalent means of protecting plant varieties is through Plant Variety Protection (PVP), which is enabled in the 77 countries that are members of the International Union for the Protection of New Varieties of Plants, which is headquartered in Geneva, Switzerland. UPOV’s stated mission is “To provide and promote an effective system for plant variety protection, with the aim of encouraging the development of new varieties of plants for the benefit of society”. The current Act of the Convention was adopted in 1991 and recognizes breeder’s rights to a variety if the variety is: (1) new; (2) distinct; (3) uniform; and (4) stable. The breeder’s rights in the 1991 Act require authorization of the breeder to perform the following: (1) production or reproduction (multiplication), (2) conditioning for the purpose of propagation, (3) offering for sale, (4) selling or marketing, (5) exporting, (6) importing, (7) stocking for any purpose mentioned in (1)–(6) above. Breeder’s rights to a variety remain in effect for a period of at least 20 years from the date on which the rights were granted.

For the first time, the Act of 1991 extended protection to “essentially derived” varieties, that is, varieties predominantly derived from the protected variety, as well as to varieties that are not clearly distinguishable from the protected variety and to varieties whose production requires repeated use of the protected variety. Examples of the manner in which an essentially derived variety could be developed from a protected variety include: (1) the selection of a natural or induced mutant, or of a somaclonal variant, (2) the selection of a variant individual from plants of the initial variety, (3) backcrossing, (4) transformation by genetic engineering. Exceptions to breeder’s rights were granted for: (1) acts done privately and for non-commercial purposes, (2) acts done for experimental purposes, (3) acts done for the purpose of breeding other varieties, except for the generation of essentially derived varieties.

10.14.7 Participatory Breeding

Conservation of diversity and selection of naturally occurring high-yielding and stress-resistant variants by cultivators was the principal method of tobacco improvement for thousands of years prior to the 1800s, during and after the domestication of Nicotiana species. The systematic varietal improvement later undertaken by scientists in established research organizations in different countries led to the release of a number of biotic stress resistant tobacco varieties using conventional plant breeding techniques. This has resulted in erosion of genetic diversity in farmers’ fields due to large-scale cultivation of a few improved varieties. The resultant genetic vulnerability to diseases and pests necessitates the development of stress-resistant cultivars.

In view of the limitations to formal breeding and the threats to farmers’ seed systems, participatory plant breeding (PPB) emerged as a means to bring farmers back into the breeding process as active participants (Greenberg 2018). The role played by farmers in the conservation and use of agricultural biodiversity is turned to advantage by making them partners in breeding tobacco varieties. In the development of improved biotic stress resistant varieties, PPB ensures the improvement of adapted local genetic materials, using the diversity available either with the farmers or in public gene banks, to suit farmers’ needs and to provide resistance to the races and biotypes prevailing under their conditions. This also empowers farmers in terms of technical and organizational skills in preserving and evolving materials under their control, on-farm management, and local creativity/innovation. PPB involves the active participation of farmers in a few or all of the sequenced breeding program activities, viz. priority setting, genetic materials acquisition and selection, crossing, selection at early and advanced stages, in situ experimentation/testing, and production and sharing of genetic materials and knowledge. However, PPB may be a substitute for station-based research or scientist-managed on-farm trials rather than a complementary breeding process (Hardon et al. 2005; Aguilar-Espinoza 2007; Ceccarelli et al. 2009).

10.15 Future Perspectives

10.15.1 Potential for Expansion of Productivity

The genetic potential for increasing crop yield through conventional approaches is still obtainable in tobacco (Sarala et al. 2016). Combining traditional breeding techniques with genome designing strategies can further accelerate genotype improvement in tobacco for increasing and stabilizing yields. Further advances in genomics, transcriptomics, proteomics and metabolomics, and their integrated analysis, would assist in designing appropriate genome-assisted breeding strategies for genome designing of tobacco, so as to attain maximum potential yields with good quality and pest and disease resistance in a short span of time.

10.15.2 Potential for Expansion into Nontraditional Areas

The genus Nicotiana comprises a well-defined group of species, of which tobacco (N. tabacum L.) is an important agricultural crop plant that plays a significant role in the economies of many countries (FAO 2019). Nicotiana species are also used in the elucidation of various principles related to disease resistance, synthesis of secondary metabolites and basic aspects of plant physiology. In view of its higher level of biomass accumulation, tobacco is considered to be a promising crop for the production of commercially important substances (e.g., medical drugs and vaccines) through molecular farming, and for cultivation for its valuable native phytochemicals, viz. nicotine, solanesol, proteins and organic acids (Sarala 2019).

Tobacco is considered to be one of the most important model systems in plant biotechnology to date and is likely to remain so. In view of its easy transformation ability, the tobacco plant serves as an experimental system for various pilot studies on the expression of novel transgenes that are later used in important food crops (Krishnamurthy et al. 2008), for the study of polyploidy, and for investigation of secondary product biosynthesis (Wang and Bennetzen 2015). Scientific research in Nicotiana is expected to accelerate in a wide range of areas with the availability of entire genome sequences and transcriptomic profiles. As Nicotiana species are found to be among the best plants for transient transgene expression in leaf via simple A. tumefaciens infiltration, better knowledge of the Nicotiana genome will provide the necessary raw material for studying the function of any transformed gene.

Both wild and domesticated forms of tobacco accumulate a wide variety of alkaloids and other phytochemicals, the content and composition of which vary among species. The work done so far has brought out the tremendous scope for exploiting the crop for extraction of many valuable native phytochemicals, viz. nicotine, solanesol, organic acids, proteins (green leaf) and seed oil, having pharmaceutical and industrial importance (Chakraborty et al. 1982; Chida et al. 2005; Narasimha Rao and Prabhu 2005; Patel et al. 1998). Solanesol is a major component of tobacco (from traces to 4.7%) and is used as an intermediate in the production of valuable pharmaceuticals, viz. coenzyme Q9, coenzyme Q10, vitamin K2, vitamin E and N-solanesyl-N, N′-bis (3,4-dimethoxybenzyl) ethylenediamine (SDB), with potential for use in the treatment of migraines, osteoporosis, neurodegenerative diseases, hypertension and cardiovascular diseases, in anti-aging applications, in improving brain health and as a dietary supplement for type 2 diabetics (Sarala 2019). Nicotine is the principal alkaloid; it is synthesized in the roots and accumulated in the leaf (0.1–5%). Nicotine can be used as a pesticide for controlling many agriculturally important pests and in the pharmaceutical industry in smoking cessation products. Fraction-1 protein is the most abundant protein in tobacco and constitutes about 50% of soluble protein and 25% of total protein. This can be used for manufacturing food supplements. Tobacco leaf contains malic acid (4.0–4.5%) and citric acid (0.5–2.0%). These acids can be extracted from the leaf and used for solubilisation of rock phosphate and in foods and beverages. Tobacco seed contains 32–42% oil, which can be used as edible oil and in the paint and soap industries. Thus, with the potential to generate considerable volumes of biomass per unit area, tobacco genotypes can be bred for higher amounts of native phytochemicals, viz. nicotine, solanesol, organic acids, proteins (green leaf) and seed oil, for pharmaceutical and industrial purposes.

Tobacco has an established history as a routine system for molecular farming and often is chosen as a production platform due to its easy genetic modification. Other advantages of tobacco are established technology for gene transfer and expression, higher biomass yield, potential for rapid scale-up in view of prolific seed production, and availability of required infrastructure for processing. Although many tobacco cultivars produce high levels of toxic alkaloids, there are low-alkaloid ones that can be utilized for the production of pharmaceutical proteins (Ma et al. 2003).

Tobacco is the first, and probably the only, plant in which plastid transformation is successfully established as a routine (Svab and Maliga 1993; Daniell et al. 2002). High levels of transgene expression are possible with chloroplasts and hence there is remarkable potential for large-scale production of active proteins through molecular farming. The TMV genome can be rapidly manipulated and can be used as a vector, as the virus has the ability to rapidly infect the tobacco plant. The tobacco plants then express the target protein transiently. Molecular farming in tobacco hairy roots is another option, in which secretion of a pharmaceutical antibody can be triggered. Generally, recombinant pharmaceutical proteins expressed in hairy root cultures can be secreted into the medium to improve product homogeneity and to facilitate purification.

Transgenic tobacco was the first plant used for recombinant human antibody production (in 1989); this was quickly followed by production of human serum albumin (1990). Furthermore, tobacco is not used as food or feed, and therefore it is easier to manufacture active compounds in tobacco without fear of these compounds mixing with food or feed. Important pharmaceutical proteins, viz. human growth hormone, human serum albumin, erythropoietin, human-secreted alkaline phosphatase, collagen, protein C, granulocyte–macrophage colony-stimulating factor, epidermal growth factor, hepatitis B virus envelope protein, Escherichia coli heat-labile enterotoxin, diabetes autoantigen, cholera toxin B subunit, immunoglobulin G1, immunoglobulin M and secretory immunoglobulin A, have been successfully produced in tobacco (Thomas et al. 2002; Ma et al. 2003). In a review, Ashraful et al. (2014) reported that bio-diesel from tobacco seed oil could be successfully used to run a diesel engine, giving excellent performance and favourable regulated emissions. Tobacco genotypes with high biomass and seed yield can be identified and tailored to biosynthesize and accumulate plant-derived storage lipids such as triacylglycerol that can act as sustainable, carbon-neutral alternative biofuels (Carlsson et al. 2011). Considering the various potential uses of tobacco in the farming of important biomolecules, tailoring tobacco for molecular farming is going to be an important objective for tobacco improvement programs.