Introduction

Research articles (RAs) and doctoral dissertations (DDs) constitute two important academic texts that most disciplinary communities use for the “dissemination and ratification of knowledge” (Koutsantoni, 2006, p. 20). RAs are major academic publications that are considered a professional genre (Kawase, 2015) through which new research results and contributions are reported and shared with the scientific community. Thus, RAs, written by more seasoned writers, constitute one of the most researched academic genres (Soler, 2007). A dissertation or thesis, terms that are derived from the difference between US and UK conventions [Footnote 1] (Kawase, 2018), is the “highest level of advanced academic literacy” (Thompson, 2005, p. 307) and the most advanced educational genre (El-Dakhs, 2018; Kawase, 2015). Specifically, DDs constitute a genre positioned on a continuum between student-generated coursework papers and RAs subject to rigorous peer review (Xiao & Sun, 2020) and are characterized by specific features and conventions inherent to the genre (El-Dakhs, 2018; Kawase, 2015; Koutsantoni, 2006; Swales, 2004). Previous studies have compared specific linguistic features of the two genres, such as hedges (Koutsantoni, 2006) and stances (Qiu & Ma, 2019), as well as certain sections, such as abstracts (El-Dakhs, 2018; Hyland & Tse, 2005) and introductions (Bunton, 2002; Kawase, 2015; Taş, 2008), revealing the existence of genre variations that impact the writing of each section. Therefore, variations within the same sections of RAs and DDs need to be examined in terms of their context (El-Dakhs, 2018).

Titles are an essential component of academic texts, including RAs and DDs. Titles represent a “highly condensed” (Wang & Bai, 2007, p. 397) small subgenre that plays a pivotal role in indicating the contents of a document, attracting the correct groups of readers, and showing the expertise and professional identities of the authors. They are “front matter and summary matter” and cover the key information in a text (Swales, 1990, p. 179). Scanning titles in databases, references, and catalogs is an “early decision point” (Swales, 1990, p. 222) for academics to decide whether to read further. After Swales (1990) claimed that titles had not been thoroughly studied as an academic genre, interest in RA titles increased dramatically in various aspects (e.g., Cheng et al., 2012; Haggan, 2004; Hartley, 2007a, 2007b; Kerans et al., 2020; Lewison & Hartley, 2005; Pearson, 2020, 2021; Soler, 2007, 2011), including structure (e.g., Wang & Bai, 2007; Xie, 2020), content (e.g., Goodman et al., 2001; Sahragard & Meihami, 2016), length (e.g., Bramoullé & Ductor, 2018; Jiang & Hyland, 2023), and lexical density (e.g., Li & Xu, 2019; Nagano, 2015); additionally, titles have been examined within a single discipline (e.g., Fox & Burns, 2015; Kerans et al., 2016) and across multiple disciplines (e.g., Ball, 2009; Moslehi & Kafipour, 2022), both synchronically (e.g., Nair & Gibbert, 2016; Méndez et al., 2014) and diachronically (e.g., Chen & Liu, 2023; Guo et al., 2015; Jiang & Jiang, 2023; Whissell, 2012, 2013; Yang, 2019). A plethora of related research has demonstrated the prominence of titles in RA academic writing. An RA may be one of many publications that a researcher has produced, whereas the vast majority of scholars traditionally write only one dissertation (Finlay et al., 2012). PhD candidates, as academic beginners, are confronted with pressure to publish to sustain employment and progress in their professional trajectories. Recognizing the generic similarities and distinctions between DDs, which constitute an educational genre, and RAs, which belong to a professional genre, can yield greater benefits for their future scholarly publications. Compared with RA titles, DD titles have received limited attention, and cross-generic comparative studies of RA and DD titles have rarely been performed. Thus, whether the titles of academic works differ across genres has not been addressed (Gesuato, 2008; Soler, 2011).

Although brief, titles for DDs and RAs in reputed journals are challenging to write. They should be attractive, informative (Ball, 2009; Hartley, 2005; Pearson, 2021) and concise (Bahadoran et al., 2019; Jiang & Jiang, 2023; Kerans et al., 2020; Wang & Bai, 2007). Moreover, scholars must construct titles under specific disciplinary academic community conventions, as title variations across disciplines have been identified (Ball, 2009; Diao, 2021; Haggan, 2004; Hartley, 2007a; Hyland & Zou, 2022). Thus, “before advice can be given on title writing, extensive research needs to be conducted to determine the discourse conventions within and across different disciplines and fields” (Anthony, 2001, p. 193). To this end, in this paper, we attempt to fill this gap by performing an exploratory comparison between RA and DD titles in various disciplines and providing preliminary research on the length, punctuation usage, structure, and content information of titles. This study will act as a reference for PhD candidates, supervisors, journal reviewers and editors, as well as EAP researchers, contributing to the existing literature on titleology.

Literature review

A good title is a “concise statement of the main topic of the research and should identify the variables or theoretical issues under investigation and the relationship between them” (American Psychological Association, 2020, p. 31); thus, writers may spend an “inordinate amount of time, discussion and mental effort” (Swales, 1990, p. 222) on making their titles appropriate and attractive. The qualities of good titles are listed in some studies. For example, authors must keep titles concise, specific, descriptive, and neutral; avoid interrogative or declarative components; and avoid acronyms (Dewan & Gupta, 2016). Swales and Feak (2012, p. 379) state that titles should be “self-explanatory,” and Bahadoran et al. (2019) propose that a well-written title is informative, accurate, specific, concise, and unambiguous. Conciseness and informativeness are frequently reported as vital for generating good titles. These two characteristics are indicated by an essential feature of titles: length.

Thus, title length is frequently studied as a significant factor influencing the scholarly impact of RAs. Contradictory conclusions have been reached concerning title length and RA citation numbers. Positive correlations have been observed between title length and citation number (e.g., Braticevic et al., 2020; Habibzadeh & Yadollahie, 2010; Jacques & Sebire, 2009). Longer titles are more informative because they contain more keywords, making articles more accessible for readers to retrieve from databases. However, opposing findings have been reported in which shorter titles, which are succinct and straightforward, were shown to attract more views, downloads, and citations (Gnewuch & Wohlrabe, 2017; Hallock & Dillner, 2016; Jamali & Nikzad, 2011; Paiva et al., 2012). Therefore, writers often face the dilemma of balancing title conciseness and informativeness.

Heterogeneity has been observed in the length of titles within the soft and hard sciences. Some studies have demonstrated that titles in hard disciplines are longer than those in soft disciplines (Nagano, 2015; Soler, 2007), while others have revealed the opposite (Hyland & Zou, 2022; Lewison & Hartley, 2005). One explanation for this is the discrepancy between disciplines and journals, as the conventions of particular journals influence the results (Hyland & Zou, 2022). Another reason is that title patterns might change; thus, studies in different years could yield different results.

Accordingly, diachronic studies of titles emerged over the last decade. Whissell (2013) analyzed titles in psychology over 45 years and found an increasingly frequent use of longer titles. This trend has also been observed in other disciplines, such as applied linguistics (Jiang & Jiang, 2023), pragmatics (Li & Xu, 2019), linguistics and literature (Xiang & Li, 2020), and economics (Guo et al., 2015). Jiang and Hyland (2023) confirmed that titles in the six disciplines selected have been increasing in length, and the number of compound titles has increased in softer sciences over the past 20 years. Furthermore, the titles of articles in applied linguistics journals increasingly provided more information about the research method/design from 1975 to 2015 (Sahragard & Meihami, 2016). Although Méndez et al. (2014) noted the same title length findings, they also observed an opposite trend regarding title structure and content information in astrophysics between 1998 and 2012; namely, more nominal phrase titles instead of compound titles were used, and more information about the results was included instead of information pertaining to methods. The aforementioned studies show that RA titles appear to have increased in length and informative nature over recent decades but with disciplinary variations.

In addition, disciplinary variations in the structure and content of titles have been discussed. NP structures are claimed to be prevalent in both soft (e.g., Nagano, 2015; Soler, 2011; Xie, 2020) and hard sciences (e.g., Diao, 2021; Morales et al., 2020; Wang & Bai, 2007). Moreover, researchers in soft disciplines have frequently used compound titles with colons as the major linking devices (e.g., Hartley, 2007a, 2007b; Jalilifar, 2010). Haggan (2004), for example, showed that science papers have the lowest percentage (21.5%) of compound titles, while this type appears three times more often in literature papers (60.8%). Furthermore, the information presented in titles varies across disciplines. Goodman et al. (2001) noted that the contents of RA titles of medical journals include “Topic only” and “Methods/design,” “Dataset,” “Results,” and “Conclusion,” and the first two types of information are most common. Kerans et al. (2020) explored titles in top-ranked general and specialty clinical medicine journals and reported massive variations regarding whether methods and results were mentioned. In contrast, in astrophysics RA titles, with minor variations, “Methods” appear less frequently than “Purpose” and “Results” (Méndez et al., 2014).

The above studies indicate a wealth of exploration of RA titles within one discipline or across several disciplines synchronically or diachronically, contributing to our knowledge of title characteristics. Cross-generic studies comparing RA titles with other academic titles have been conducted recently; for example, comparative studies on the titles of RAs and review articles (Kerans et al., 2016; Soler, 2007, 2011), conference papers (Hartley, 2007a), case reports (Salager-Meyer et al., 2013), and theses and dissertations (Jalilifar, 2010). The findings of these studies confirm the cross-generic variation in titles. Soler’s (2007, 2011) surveys on research and review paper titles are noteworthy. She examined the structural construction of titles in the biological sciences and social sciences, reporting that structure is the distinctive feature between the titles of these two genres (Soler, 2007). Continuing the previous study, she investigated crosslinguistic and cross-generic differences between English and Spanish title constructions. The results revealed variations in title length depending on discipline and language, indicating that the predominance of nominal-group titles is not confined to discipline, genre, or language (Soler, 2011).

A DD is regarded as “the longest and most challenging piece of assessed writing” (Thompson, 2013, p. 284), with few graduate students ever encountering a comparable task. Consequently, DDs hold significant academic value and have garnered increasing amounts of attention in recent decades (Qiu & Ma, 2019; Zhou & Jiang, 2023). As the most advanced educational genre (Kawase, 2015), the DD is an “under-theorized, under-studied, and under-taught text” (Paré et al., 2009, p. 179) that embodies a complex and multifaceted nature (Paré, 2019) characterized by distinctive requirements and traditions that diverge from those of RAs. Swales (2004) noted that DD introductions present a general framework that resembles RAs. Accordingly, comparative analyses of certain sections of these two genres have been performed. For example, Kawase (2015) investigated the utilization of metadiscourse in the introductions of DDs and subsequently published RAs, revealing that a substantial number of authors exhibit a heightened frequency of metadiscourse use in their article introductions. As asserted by the author, these differences can be attributed to the characteristics of DDs as an educational genre and RAs as a professional genre where authors must navigate intense competition to have their manuscripts accepted for publication. From a genre perspective, El-Dakhs (2018) compared 200 abstracts of DDs from renowned American and British universities with 200 abstracts of RAs in journals with high rankings and concluded that abstracts of DDs and RAs exhibit dissimilarities stemming from their representation of distinct genres, thereby exerting a significant influence on the composition of all sections and underscoring the significance of investigating disparities among corresponding segments of DDs/RAs in relation to their contextual factors. 
Previous studies on DD titles have been cross-disciplinary or cross-generic, focusing on the differences and similarities between the predominant title forms (Jalilifar, 2010; Soler, 2018). Jalilifar (2010) performed a comparative study on MA and PhD thesis titles and RA titles in linguistics, reporting that the thesis titles seemed to have a higher level of informativeness, whereas RA titles exhibited a wider range of structural diversity. Although these studies have revealed specific titular characteristics of DDs, it is noteworthy that no prior comparative studies on DD and RA titles across various academic fields have been conducted. Hence, we aim to address the following research questions:

  • (1) What are the word lengths of RA and DD titles by discipline?

  • (2) Does the use of punctuation marks highlight disciplinary and generic differences?

  • (3) Are there disciplinary and generic similarities or differences in the structure and content information of these titles?

Data and methods

Data collection

A bespoke corpus of 1600 titles was constructed and included two subcorpora: RA titles and DD titles (800 per group). Two hundred titles were chosen for each of the following selected disciplines under the categories recognized by SCImago [Footnote 2] (Hyland & Zou, 2022): linguistics (Ling), economics (Econ), medicine (Med) and computer science (CS). These were representative of soft (e.g., Gesuato, 2008; Haggan, 2004; Nagano, 2015; Soler, 2007; Xie, 2020) or hard (e.g., Anthony, 2001; Soler, 2011; Wang & Bai, 2007; Yitzhaki, 2002) disciplines, and RA titles in these disciplines have already been studied, which facilitates a comparison of the present study with existing research findings. The titles were chosen from original research articles and doctoral dissertations published between 1 January 2017 and 31 December 2021. Four contemporary high-performing journals (see Table 1) in each discipline and DDs from prestigious universities in the US and UK were selected considering the following parameters:

Table 1 Selected journals for the RA title subcorpus

RA titles:

  • (1) RAs from top-tier journals with high impact factors [Footnote 3] ranging from 14.4 to 168.9 in hard disciplines and 2.3 to 13.7 in soft disciplines (Web of Science) were included. The hard-discipline journals are all SCIE-indexed (Science Citation Index Expanded), while the soft-discipline journals are SSCI-indexed (Social Sciences Citation Index).

  • (2) Titles from research articles were included, and reviews, perspectives, editorials, and discussions were excluded.

DD titles:

  • (1) DDs were included from US and UK universities among the top 200 in the “QS World University Rankings 2022” [Footnote 4].

  • (2) The main subject of the DDs was limited to the terms of the included disciplines (“linguistics,” “economics,” “medicine,” and “computer science”); interdisciplinary DDs involving other disciplines were excluded.

We applied stratified random sampling to select and download all the titles from the Web of Science (WoS) and ProQuest databases and cross-checked the data manually to ensure that they conformed to the abovementioned parameters. RAs and DDs from prestigious journals and universities were selected to guarantee that the chosen works effectively exemplified high standards of writing norms and conventions in their respective fields while also receiving approval from esteemed authorities within the academic community, such as journal editors, reviewers and university professors, whether they serve as readers or examiners (El-Dakhs, 2018). The quality of RAs is typically judged by the journals in which they are published, which provides a more representative measure of good RAs according to discipline than does including titles from random journals of varying quality and uncertain value in the field (Jiang & Hyland, 2023). The writers’ identities, whether they were native or nonnative speakers, were considered inconsequential because their writings garnered approval from esteemed gatekeepers within the academic community, indicating adherence to established quality standards in the field. Thus, differentiating the cultural backgrounds of the writers was not prioritized in this study, as all accepted RAs and DDs met the necessary criteria for publication or examination (El-Dakhs, 2018). This approach is applicable to the present research, which focuses on the established norms of RAs and DDs in accordance with the consensus among members of the academic community.

Methods

All the RA titles were downloaded from WoS, and DD titles were collected from ProQuest and stored in a Microsoft Excel spreadsheet for coding and further analysis. The title word count, coding of punctuation marks, structure, and content information types were all assessed using Excel. The corpus was coded manually and searched or calculated automatically using formulas in the software. The researcher coded, classified and examined the data twice with a time interval of two weeks. Only eight discrepancies appeared, and these were resolved by discussion with two professors in applied linguistics. We used SPSS 22.0 to determine the statistical significance of the results. Specifically, chi-square tests were used.

The data analysis procedure followed the sequence of four title features: length, punctuation usage, structure, and content information type. The length of the titles was determined by the “word count formula” in Excel. Abbreviations, acronyms, and hyphenated words were counted as one word (e.g., Li & Xu, 2019; Milojevic, 2017), while words with an apostrophe were automatically counted by Excel as two words in some cases (e.g., “Teachers’ Perceptions”) and one word in other cases (e.g., “Student’s”). The counting results were checked manually by the authors. A few examples are as follows:

Consumption of ultra-processed foods and cancer risk: results from NutriNet-Sante prospective cohort (Med RA) (12 words)

Investigating Syntactic Complexity in EFL Learners' Writing across Common European Framework of Reference Levels A1, A2, and B1 (Ling RA) (18 words)

A three-fold approach to the imperative's usage in English and Dutch (Ling RA) (11 words)
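The counting rule illustrated by these examples can be approximated with a minimal Python sketch. It is not the Excel formula used in the study; it assumes that a word is any whitespace-delimited token, so that abbreviations, acronyms, hyphenated words, and words with apostrophes each count as one word. The function name `title_word_count` is hypothetical.

```python
def title_word_count(title: str) -> int:
    """Count whitespace-delimited tokens in a title.

    Assumption: hyphenated words, acronyms, and words carrying an
    apostrophe are single tokens, because they contain no whitespace.
    """
    return len(title.split())


# The three example titles quoted in the text.
examples = [
    "Consumption of ultra-processed foods and cancer risk: "
    "results from NutriNet-Sante prospective cohort",
    "Investigating Syntactic Complexity in EFL Learners' Writing across "
    "Common European Framework of Reference Levels A1, A2, and B1",
    "A three-fold approach to the imperative's usage in English and Dutch",
]

for title in examples:
    print(title_word_count(title), title)
```

Under this assumption, the sketch yields 12, 18, and 11 words for the three example titles, matching the counts given with them. Any automatic count of this kind would still need the manual check described above.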

The punctuation marks were searched and counted in spreadsheets, and ten punctuation mark types were identified. The number of each type is the number of title(s) containing the particular punctuation, as in the following examples:

Reconfigurable magnetic microrobot swarm: Multimode transformation, locomotion, and manipulation (CS RA) (colon; comma)

Modernizing Retailers in Emerging Markets: Investigating Externally-Focused and Internally-Focused Approaches (Econ DD) (colon; hyphen)
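The tallying rule, in which each punctuation type is counted once per title that contains it, can be sketched in Python as follows. This is an illustration rather than the spreadsheet procedure used in the study. The character sets assigned to each of the ten types are assumptions (for instance, straight and curly apostrophes are treated alike, and single quotation marks are not distinguished from apostrophes), and the names `PUNCTUATION_TYPES`, `punctuation_types`, and `titles_per_type` are hypothetical.

```python
# Assumed mapping from each punctuation type to the characters that signal it.
PUNCTUATION_TYPES = {
    "colon": ":",
    "hyphen": "-",
    "comma": ",",
    "parenthesis": "()",
    "apostrophe": "'\u2019",
    "quotation mark": "\"\u201c\u201d",
    "question mark": "?",
    "slash": "/",
    "period": ".",
    "exclamation point": "!",
}


def punctuation_types(title: str) -> list[str]:
    """Return the punctuation types that occur at least once in a title."""
    return [
        name
        for name, chars in PUNCTUATION_TYPES.items()
        if any(ch in title for ch in chars)
    ]


def titles_per_type(titles: list[str]) -> dict[str, int]:
    """For each type, count the titles containing it (not the occurrences)."""
    counts = {name: 0 for name in PUNCTUATION_TYPES}
    for title in titles:
        for name in punctuation_types(title):
            counts[name] += 1
    return counts


cs_ra = ("Reconfigurable magnetic microrobot swarm: Multimode transformation, "
         "locomotion, and manipulation")
econ_dd = ("Modernizing Retailers in Emerging Markets: Investigating "
           "Externally-Focused and Internally-Focused Approaches")

print(punctuation_types(cs_ra))    # ['colon', 'comma']
print(punctuation_types(econ_dd))  # ['colon', 'hyphen']
```

Because a title is counted once per type, the CS RA title adds one to the comma tally even though it contains two commas.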

Title structure was classified based on preliminary research and previous studies (e.g., Diao, 2021; Haggan, 2004; Li & Xu, 2019; Soler, 2007; Wang & Bai, 2007; Xie, 2020). The following five types were identified: nominal phrase (NP), compound (CP), full sentence (FS), verbal phrase (VP), and prepositional phrase (PP) titles. See Table 2.

Table 2 Classification of title types by structure

The category of content information shows the information type provided by the titles. Based on the framework of Goodman et al. (2001) (see also Li & Xu, 2019; Sahragard & Meihami, 2016; Xiang & Li, 2020), the titles were divided into Topic Only, Methods/design, Dataset, Results, and Conclusion titles. Table 3 shows the descriptions of each category. One title can include several types of information listed in the framework above; such titles were counted separately under each type, as in the following example:

Association between active commuting and incident cardiovascular disease, cancer, and mortality: prospective cohort study (Med RA) (Methods/design + Results)

Table 3 Framework for examining titles according to content information (adapted from Goodman et al., 2001 and Sahragard & Meihami, 2016)

This title contains information on Methods/design (“prospective cohort study”) and Results (“association”).

Results and discussion

Title length

Title length is considered crucial for RA retrieval and subsequent citation and for DDs to pass the examiners’ review, thereby enhancing the academic significance of these works and aligning with the requirements of journals or dissertation committees. Longer titles provide a more comprehensive description of studies, increasing their discoverability in online literature searches and enhancing their potential impact (Habibzadeh & Yadollahie, 2010; Jacques & Sebire, 2009); however, other studies have shown that shorter titles may also imply a more precise focus and have the potential to attract a greater number of views and citations (Jamali & Nikzad, 2011; Paiva et al., 2012). As the aforementioned studies focused on RA titles across various disciplines, it became apparent that the average length of titles is heavily influenced by disciplinary conventions. Our study confirms disciplinary variations in title length for both RAs and DDs across four disciplines. Additionally, generic variations were detected in this study.

Overall, the average length of titles in Med and Ling is longer than that of titles in CS and Econ within the two respective genres (shown in Table 4), which is partly in line with the observations of Haggan (2004) and Soler (2007, 2011) that the average length of RA titles is longer in natural science than in social science. Specifically, the titles of Med RAs (average length: 16.44 words) were the longest among the four disciplines, while the titles of Econ DDs (average length: 7.06 words) were the shortest.

Table 4 Length of titles across disciplines and genres

The rationale behind the variations in preferences for title length likely resides within disciplinary community conventions and the requirements set by academic journals. For example, Med RAs, which have the longest titles on average, tend to share an analogous structure, as shown below:

Neoadjuvant atezolizumab in combination with sequential nab-paclitaxel and anthracycline-based chemotherapy versus placebo and chemotherapy in patients with early-stage triple-negative breast cancer (IMpassion031): a randomised, double-blind, phase 3 trial (28 words) (Med RA)

Pembrolizumab versus paclitaxel for previously treated, advanced gastric or gastro-oesophageal junction cancer (KEYNOTE-061): a randomised, open-label, controlled, phase 3 trial (20 words) (Med RA)

Filgotinib as induction and maintenance therapy for ulcerative colitis (SELECTION): a phase 2b/3 double-blind, randomised, placebo-controlled trial (17 words) (Med RA)

This structure contains dense information and a more specialized vocabulary, distinguishing one study from other articles and attracting a significant number of authors working in the field (Hyland & Zou, 2022); it also reflects the journal’s tradition.

From a cross-generic perspective, the average length of RA titles was longer than that of DD titles in all four selected disciplines. Genre-specific features and requirements may explain these findings. For RAs, given the growing competition to captivate readers and garner citations, authors and editors need to be aware of the significance of informative titles (Jiang & Jiang, 2023). In contrast, DDs, which function as “a final examination in a long student career” (Johns & Swales, 2002, p. 16), should comply with the guidelines of their respective universities; for example, the “Guidelines for the PhD Dissertation” of the Graduate School of Arts and Sciences at Harvard University states that “the title should be as concise as possible, consistent with giving an accurate description of the dissertation” [Footnote 5]. Thus, DD titles could be more concise and shorter. Taking the titles of Econ DDs, which have the shortest average length, as an example, we frequently observe the following structures:

Essays on Economics of Immigration (5 words) (Econ DD)

Essays on Finance and Macroeconomics (5 words) (Econ DD)

Essays on Healthcare Economics (4 words) (Econ DD)

An analysis of the abstracts and contents of these DDs revealed that they were “topic-based” DDs, which “typically commence with an introductory chapter which is then followed by a series of chapters which have titles based on sub-topics of the topic under investigation” (Paltridge, 2002, p. 132). Within the community of economics, titles tend to follow the structure of “essays on + (a relatively broad) topic”.

Punctuation marks

As shown in Table 5, the titles in soft disciplines included 463 (57.9%) with punctuation marks and 337 (42.1%) without; in the hard disciplines, 478 titles (59.8%) included punctuation, and 322 (40.3%) did not. The chi-square test was used to check for statistical difference, and the results demonstrated that overall, there was no significant difference (Pearson chi-square = 0.581, p = 0.446, p > 0.05) between soft and hard disciplines in terms of punctuation marks. Generally, from the cross-disciplinary perspective, titles share more similarities than differences in punctuation mark use since more titles have punctuation. Nevertheless, Econ and CS DDs are exceptions: the number of titles without punctuation is greater than that of those with punctuation.
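As an illustrative check, and not the SPSS procedure itself, the Pearson chi-square for the 2 × 2 comparison of soft and hard disciplines can be recomputed from the counts reported above. The sketch assumes no continuity correction and one degree of freedom, for which the p-value can be obtained from the complementary error function; the function name `pearson_chi_square_2x2` is hypothetical.

```python
import math


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (uncorrected) and p-value for the table [[a, b], [c, d]].

    With one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: soft and hard disciplines; columns: titles with and without punctuation.
chi2, p_value = pearson_chi_square_2x2(463, 337, 478, 322)
print(round(chi2, 3), round(p_value, 3))  # 0.581 0.446
```

The recomputed values agree with the reported statistic (0.581) and p-value (0.446), which supports the conclusion that the soft and hard disciplines do not differ significantly in overall punctuation use.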

Table 5 Titles with/without punctuation marks

Punctuation in titles functions to “coordinate structures, negotiate text space, and express authors’ intention and emotions” (Diao, 2021, p. 6051). Thus, punctuation is an indicator of title complexity: the higher the percentage of punctuation usage is, the more complex the title. Table 5 also shows distinct differences between the two genres, as the percentage of titles with punctuation marks among RAs is greater than that among DDs across all four disciplines. The statistical test revealed a significant difference (Pearson chi-square = 100.133, p < 0.001) between RAs and DDs regarding punctuation use in titles; overall, in this context, RA titles are more complex than DD titles.

Ten punctuation marks were identified from the corpus: colon, hyphen, comma, parenthesis, apostrophe, quotation mark, question mark, slash, period, and exclamation point. The apparent disciplinary differences between the soft and hard disciplines are shown in Table 6. Punctuation marks were more varied in soft disciplines, especially linguistics; nine and ten punctuation types were observed in the Ling RA and Ling DD titles, respectively. Titles in both genres of Econ and Med included a similar number of punctuation types, while titles of CS DDs had the fewest punctuation types (three out of the ten). From the cross-generic perspective, punctuation usage is similar across RAs and DDs in all four disciplines; for example, the same seven punctuation types appear in Econ RAs and Econ DDs.

Table 6 Usage of punctuation marks

In addition to punctuation type, the frequency of each type varied between soft and hard disciplines. The most recurrent punctuation in the soft disciplines was the colon, and the second most frequent was the hyphen. Across genres, in all four disciplines, more RA titles than DD titles had colons or hyphens.

Colons connect components in compound titles (Morales et al., 2020) and are frequently used in titles (Haggan, 2004; Hartley, 2007a, 2007b; Soler, 2011). The popularity of colons in titles is probably because titles with colons are more informative than those without (Lewison & Hartley, 2005). Authors in both genres across all disciplines prefer to use colons to construct compound titles to achieve informativeness.

Hyphens account for a relatively high percentage of punctuation marks in titles across all disciplines and genres. Hyphens combine two or more words to create new words, for example, “Data-Driven,” “Alpha-T-Catenin,” and “Quasi-experiment.” The use of hyphens can be interpreted as increasing lexical density and complexity to achieve informativeness and conciseness. Hyphens also appear as non-alphanumeric characters in compound titles to relate the two components, serving the same function as colons, increasing the use of this type of punctuation in titles.

The comma is the third most frequent punctuation mark in titles in most disciplines and genres except in Econ and Ling RAs. The comma functions like a conjunction, joining a series of words to show the parallel components in titles. The relatively high-frequency use of commas indicates the multiple objects of study, as the following examples show.

Essays on Asset Pricing, Portfolio Choice, and International Finance (Econ DD)

Abstraction, Generalization, and Embodiment in Neural Program Synthesis (CS DD)

Assessing Epidemiology, Treatment Selection, and Outcomes of Head and Neck Merkel Cell Carcinoma (Med DD)

The apostrophe in Ling RAs, parenthesis in Med RAs, and question mark in Econ RAs demonstrate the highest frequency among the remaining punctuation types. Apostrophes transform different words into plurals, contractions, and possessives, making the titles more logical. Among Ling RAs, the possessive apostrophe is most commonly used; for example, “students’,” “teachers’” and “children’s” represent the experimental group or participants of the studies. Parentheses are used to convey technical information in titles, most often to introduce acronyms:

Accuracy of the Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis (Med RA)

Strengthening the reporting of observational studies in epidemiology using mendelian randomisation (STROBE-MR): explanation and elaboration (Med RA)

Parentheses frequently appear in Med RA titles, indicating that medicine scholars tend to use abbreviations to avoid redundant and lengthy repetition of technical terms. A question mark indicates that a sentence is a question and is found in CP or FS titles. The use of question marks in titles has three motivations: “to formulate the central issue of the study,” “to stimulate interest,” and “to express the dubious nature of results” (Ball, 2009, pp. 676–677). This finding shows that using question marks in Econ RA titles is traditionally preferred, corresponding with the high percentage of FS titles (interrogative sentences) in this category.

In summary, regarding punctuation usage, RA titles are more complex than DD titles are, revealing a generic difference between the two genres. Regarding discipline, titles in hard and soft disciplines demonstrate similar complexity in terms of punctuation use; however, the latter show greater diversity. The colon, hyphen, and comma outweigh other punctuation in frequency, which could be explained by the title structure variable.

Structure

Figure 1 shows the occurrence of the five categories of structures found in the titles analyzed, namely, NP, CP, FS, VP, and PP. The preference for title types is clear: overall, NP and CP are the two most preferred structures in all disciplines and genres analyzed, while PP is the least common type. The Pearson chi-square test results showed that there was no significant difference in the use of NP titles between soft and hard disciplines (Pearson chi-square = 0.643, p = 0.422, p > 0.05). In contrast, a significant difference existed in the use of CP titles (Pearson chi-square = 10.726, p = 0.001, p < 0.05). Regarding genre, we detected significant differences between RA and DD titles in the use of CP (Pearson chi-square = 86.429, p < 0.001) and NP (Pearson chi-square = 79.629, p < 0.001).
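For readers unfamiliar with the test, a Pearson chi-square statistic for a 2 × 2 table (e.g., genre by whether a title is CP) can be computed as in the following minimal Python sketch. The counts are hypothetical and are not taken from this study; the function name chi_square_2x2 is likewise illustrative, and the sketch assumes no continuity correction, which may differ from the settings of the statistical software actually used.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With one degree of freedom, the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, NOT taken from the study:
# rows = genre (RA, DD), columns = (CP titles, non-CP titles).
example = [[20, 10], [10, 20]]
stat, p = chi_square_2x2(example)
print(f"Pearson chi-square = {stat:.3f}, p = {p:.4f}")
```

A p value below the 0.05 threshold, as in this made-up table, would be read as a significant association between genre and the use of CP titles.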

Fig. 1 Title structures across disciplines and genres

In line with the findings of previous studies (e.g., Soler, 2007; Wang & Bai, 2007; Xie, 2020), our results show a stronger preference for NP titles in the two genres across the four disciplines, except in Ling and Med RAs. NP titles, as single-format titles, are unadorned and explicit, as they directly and straightforwardly present the information of the study to the community (Hyland & Zou, 2022). They have the “powerful ability to compact information in an economical way through various pre- and post-modifiers” (Wang & Bai, 2007, p. 395). The prevalence of NP titles observed in this study suggests that the authors of both RAs and DDs favor the linear construction of NP titles that is widely accepted in each community.

CP titles account for a large proportion of Med, Ling, and Econ RA titles. CP titles are divided by punctuation marks into two or more parts with specific or logical relationships to allow readers to obtain more information (Li & Xu, 2019). Our observation from Ling and Econ RA titles underpins the previous statement that CP titles are “extremely commonplace across the soft knowledge disciplines” (Hyland & Zou, 2022, p. 7). However, Med RA authors also show this preference for CP titles, which contradicts Wang and Bai’s (2007) finding that 99.0% of Med RAs have NP titles. One explanation for this difference is that the titles of the aforementioned groups were collected only from the New England Journal of Medicine, and the journal’s requirements for RA titles are distinct from those of other journals. Another interesting finding is that most Med RA titles were CP (60.0%) or NP (39.5%), and only one title was VP; thus, Med RA titles demonstrated the lowest structural variation. Relating this finding to the previous discussion of title length and punctuation in Med RAs, it seems justifiable to conclude that although they represent the fewest title types, Med RA titles have the highest complexity and informativeness in the corpus, since CP titles are common and the titles include more punctuation and are the longest on average.

Overall, a relatively small proportion of titles are FS; however, according to our counts, such titles occur relatively more often in Econ and CS RAs than in other disciplines and genres. Since economics is a social science field, this observation contradicts the findings of previous studies to a certain degree because it has been claimed that the FS structure is specific to titles in the natural sciences (Berkenkotter & Huckin, 1995; Li & Xu, 2019; Soler, 2007). FS titles accounted for 8.5% of the Econ RA titles, ranking third among all title types, just behind CP and NP, and all FS titles in the corpus were interrogative. For example,

Are CEOs' purchases more profitable than they appear? (Econ RA)

Was the post-1870 fertility transition a key contributor to growth in the West in the twentieth century? (Econ RA)

This finding also explains why the question mark usage frequency was highest in the titles of Econ RAs among all the disciplines and genres studied. Hyland (2004) stated that social sciences, which encompass the study of human subjects, rely on qualitative analyses or statistical probabilities to construct and represent knowledge. Consequently, comprehensive exposition and a cautious approach in articulating assertions need to be used.

In summary, DDs in all four disciplines show a surprisingly high frequency of NP titles, while RA authors alternate between NP and CP. These findings are broadly consistent with those of previous studies on RA titles (Haggan, 2004; Soler, 2007, 2011; Wang & Bai, 2007; Xie, 2020), which indicated that NP is the most common title type across disciplines, genres, and cultures. The exception observed in this study is the prevalence of CP titles, which can be attributed to disciplinary and generic characteristics: they are predominantly found in Ling and Med RAs, thereby confirming the adherence to disciplinary conventions in constructing titles for RAs.

Content information

The five categories of content information in the titles are unevenly distributed across all four disciplines and two genres (see Table 7), revealing disciplinary and generic variations. Noticeably, Topic Only titles are the most frequent in both Econ and CS. This consistency is understandable in that the specific norms of the two disciplines lead authors to prefer titles that do not mention detailed research information but instead provide descriptive information about the topic, yielding concise and implicit titles. Topic Only titles are less informative (Sahragard & Meihami, 2016), are typically shorter, and have single structures (NP, VP, PP, or FS). We can now speculate on the interrelationship among the three parameters, namely, length, structure, and content information: Econ and CS titles, which include more Topic Only types, are shorter on average, and the incidence of CP structures among them is lower (except among Econ RAs).

Table 7 Content information by discipline and genre

The Methods/design category is most prevalent among Med RAs, accounting for 54.5% of the total. This category is less frequently used in other disciplines and genres. This finding is inconsistent with previous findings regarding Med RA titles. Goodman et al. (2001) found Topic Only titles to be the most common (40%) among Med RAs and indicated that Methods/design titles were less common (30%). These two studies collected titles from the same four medical journals but during different periods, thus presenting conflicting results. This lack of conformity could be explained by changes in journal policies and editorial boards (Sahragard & Meihami, 2016; Whissell, 2013). Regarding Med DD titles, among which content information types are distributed more evenly, a lower proportion mentioned Methods/design. Thus, Med RA titles indicate a clear difference between the two genres. One reason for the heterogeneity between genres is that authors in the medical field write their titles strictly in conformity with specific journal policies to make their papers publishable and allow readers to quickly discover their work after publication (Whissell, 2013).

The Dataset category is distinctive and important among Ling RAs and DDs, as 68.5% and 69.0%, respectively, of these titles include data, indicating a sharp difference from other disciplines. Mentioning research data provides more detailed information; in linguistics, for example, the language data investigated and/or the research participants (Sahragard & Meihami, 2016) fall into this category. Titles containing data information are prevalent among Ling RAs and DDs; the following are examples:

Exploring Social and Linguistic Diversity across African Americans from Rochester, New York (Ling DD)

Referential cohesion in Swedish preschool children's narratives (Ling RA)

The use of the Results category varies among the four disciplines. In soft disciplines, the numerical results are not contrastive, while discipline disparity exists within hard disciplines. More titles containing information about results are found in Med than in CS, especially in Med RAs. Notably, among the Med RAs, 90.3% of the titles with results information also contained other types of information, supporting the previous statement regarding the preference for more informative and complicated titles among Med RA authors. Examples are shown below:

Effectiveness of therapeutic heparin versus prophylactic heparin on death, mechanical ventilation, or intensive care unit admission in moderately ill patients with covid-19 admitted to hospital: RAPID randomised clinical trial (Med RA) (Results, Dataset, and Methods/design)

Association of Gestational Diabetes With Maternal Disorders of Glucose Metabolism and Childhood Adiposity (Med RA) (Results and Dataset)

Conclusion information is included in the titles across genres and disciplines least frequently, which is consistent with the findings of previous diachronic studies on applied linguistics journal titles (Li & Xu, 2019; Sahragard & Meihami, 2016), which showed that titles with conclusion information have always been uncommon. Titles containing conclusion information are more implicit than those containing Methods/design and Results information (Haggan, 2004; Jalilifar, 2010). Hence, conclusion information is not frequently included in titles.

Conclusion

We examined four key features of RA and DD titles in soft and hard disciplines, namely, length, punctuation marks, structure, and content information, and the findings confirmed the disciplinary and generic variations. The title length results partially contradict previous findings that titles in hard disciplines are usually longer than those in soft disciplines (Haggan, 2004; Soler, 2007, 2011). The average length of Econ titles is the shortest in both genres, and that of Med titles is the longest. From the cross-generic perspective, the average length of RA titles was longer than that of DD titles across all disciplines selected. The punctuation mark analysis revealed that, in general, more titles in hard disciplines than in soft disciplines contain punctuation, with the exception of CS DDs. More RA titles than DD titles employ punctuation in all disciplines, suggesting the greater complexity of RA titles in punctuation use. Regarding punctuation variety, there was no noticeable difference across the four disciplines and two genres, except for CS DDs. Moreover, the examination of title structure is partly congruent with the findings of previous research concluding that NP titles and CP titles are the two most common types and that PP titles are the least common (Soler, 2007; Wang & Bai, 2007; Xie, 2020). The title content information analysis revealed variations across disciplines. The authors of both genres, with the exception of Med, prefer titles containing the same type of content information.

It is challenging to provide an explanation for these variances across disciplines. Hyland and Zou (2022) contend that it is crucial for scientists in the hard disciplines to differentiate their research from that of others. They operate within a highly competitive research landscape with substantial investments that necessitates rapid publication to establish early precedence. Furthermore, their focus lies within a distinct and easily identifiable field of study. However, social scientists’ studies often encompass diverse and interdisciplinary aspects, thereby posing difficulties in identifying a cohesive target audience. Therefore, RA titles in the soft disciplines need to be more captivating and precise, aiming to capture readership for the article rather than assuming it. Nieuwenhuis (2023) discovered a greater prevalence of the same creative components in the titles of soft science RAs than in those of hard science RAs, which is consistent with the assertion made by Hyland and Zou (2022). Our findings support this assertion, as question marks appear to be more frequent in Econ RA titles than in the titles of other disciplines. Questions employed in titles can be perceived as attention-grabbing rhetorical elements (Chen & Liu, 2023), serving as a marketing tool to pique readers’ interest (Ball, 2009; Nieuwenhuis, 2023).

The present research findings provide empirical evidence supporting the notion that RAs and DDs are distinct genres with their own genre-specific features (El-Dakhs, 2018; Kawase, 2015; Koutsantoni, 2006), even within the smallest subgenre, the title. These findings have both pedagogical and academic implications. First, the results of this study can be applied to research writing or academic writing classes or guidebooks to help teachers educate students on composing titles for RAs or DDs by instructing them on various classifications of titles (Sahragard & Meihami, 2016). Specifically, titles are generated not from an infinite range of options but rather from a relatively limited subset that reflects authors’ understanding of their respective communities (Hyland & Zou, 2022). ESP instructors should also encourage students in these four disciplines to employ the appropriate NP or CP titles that are commonly utilized within their respective fields while incorporating frequently observed information into the titles. Furthermore, the findings of the current research could offer valuable insights for authors aspiring to publish internationally across the four disciplines under investigation, particularly within the sixteen selected journals examined in this study (Xiang & Li, 2020). Additionally, this study offers ESP/EAP researchers and practitioners some inspiration for title research, contributes to the literature on titleology, and supports the statement that titles are “serious stuff” (Swales, 1990, p. 224).

Finally, the current study is not without its limitations. The syntactic structures and information types of the titles were manually analyzed. Despite the author’s consistent adherence to established procedures, the classification may still entail a certain degree of subjectivity. Moreover, we recognize that this was a relatively small-scale study, and much more must be done in this area. Further studies may consider academic titles from additional disciplines, e.g., creative disciplines and emerging disciplines, and titles of underresearched academic genres, such as academic blog posts and short videos.