1 Introduction

Text analysis has been applied in many research efforts aimed at identifying mental illness. Such analysis has been used to predict various psychological states (Pennebaker and King 1999). The ability to detect such psychological states at an early stage, by means of Natural Language Processing (NLP) techniques, is of vital importance before they lead to an unfortunate outcome for the individual.

Furthermore, previous research has focused on detecting suicidal tendencies through the writer’s language use, as suicide is a leading cause of death worldwide (Bertolote and Fleischmann 2012). More specifically, several recent attempts have focused on suicide notes.

1.1 Literature review

Pestian et al. (2010) concluded that NLP could be used to differentiate between genuine and elicited suicide notes, achieving a higher classification rate than mental health professionals. Many have tried to tackle the problem of emotion detection in suicide notes using various methods, including entropy classification (Wicentowski and Sydes 2012) and latent sequence models (Cherry et al. 2012), in pursuit of a better understanding of the suicidal mind.

Additionally, the widespread use of the Internet gave rise to microblogging and social media, and an increasingly large proportion of people now use these media daily to express their feelings and opinions. There has therefore been major interest in examining whether such content can be useful in predicting suicidal ideation. Zhang et al. (2015) deduced that such a task is feasible using the Chinese version of the Linguistic Inquiry and Word Count (LIWC), as well as Latent Dirichlet Allocation (LDA), on Chinese micro-blog users’ data. Litvinova et al. (2017) developed a mathematical model to predict suicidal tendencies based on Russian Internet texts, employing numerical rather than linguistic features. Burnap et al. (2015) aimed to identify suicide-related topics in Twitter posts by training baseline classifiers and then improving upon them with an ensemble classifier.

It has been shown that suicide rates among artists are significantly higher than those of the general population (Jamison 1997; Raeburn 1999; Stack 1997). Lightman et al. (2007) deployed Coh-Metrix and LIWC to contrast textual features of suicidal and non-suicidal songwriters. Mulholland and Quinn (2013) composed a corpus (development, training and test sets) of songs by various English-language lyricists, from which lexical, syntactic, semantic class and n-gram features were derived. These were then fed into the Waikato Environment for Knowledge Analysis (Weka) to compare the performance of multiple machine learning algorithms in classifying whether or not a song was written by a suicidal lyricist; SimpleCart achieved the highest classification rate, with an overall accuracy of 70.6%. Pająk and Trzebiński (2014) used LIWC to analyze Polish poems by six poets, and then applied ANOVA and logistic regression to extract the features most indicative of suicidal predisposition, drawing conclusions similar to those of previous research.

Perhaps one of the most influential contributions to the text analysis of poets is that of Stirman and Pennebaker (2001), considering that their results have been the basis of many later studies, such as Lightman et al. (2007), Mulholland and Quinn (2013) and Pająk and Trzebiński (2014). Their methodology consisted of collecting 300 poems and studying, using LIWC, linguistic characteristics that could distinguish suicidal from non-suicidal poets. Furthermore, they investigated how such characteristics accord with the two most dominant suicide models, namely Durkheim’s model (Durkheim 2005), where the suicide rate is linked to a society’s level of integration, and the hopelessness model (Petrie and Brook 1992), where an individual is overcome with negative emotions, such as hopelessness and helplessness, which ultimately drive them to suicide. They also examined how these characteristics varied across different stages of the poets’ careers. They concluded that suicide can be predicted from language use, found stronger support for the social integration model, and observed no significant variation over time.

As a basis for comparison, the results of previous studies are summarized here. Pestian et al. (2010) distinguished between elicited and genuine suicide notes with 78% accuracy. Litvinova et al. (2017) reached a 71.5% classification rate in deciding whether Russian Internet texts were suicidal or not. Mulholland and Quinn (2013) achieved 70.6% accuracy in identifying the suicidal tendencies of songwriters through their lyrics. Coppersmith et al. (2018) used deep learning to detect suicide risk in social media data, achieving a true positive (TP) rate of 0.70–0.85. Nobles et al. (2018) presented a DNN architecture for identifying suicidality among young adults from text messages, achieving 70% accuracy and 81% recall. Du et al. (2018) used Convolutional Neural Networks (CNN) on Twitter data to detect psychiatric stressors, a major cause of suicide, achieving 74% accuracy, 78% precision and an F-1 measure of 83%. Shing et al. (2018) created a dataset for studying the assessment of suicide risk from online Reddit postings and reached an F-1 score of 0.42 with a CNN in their baseline experiments. Morales et al. (2019) used deep learning for suicide risk assessment of social media posts, achieving 52% accuracy and an F-1 score of 0.57.

1.2 Aims and scope

The purpose of this study is to train a machine learning model to classify a poem by the poet’s tendency for suicide, based on textual and semantic features. In contrast to previous research, the study focuses on Greek poetry of the twentieth century and aims to examine whether earlier results, derived mostly from English-language works, can be verified. These include a higher use of first-person active and passive verbs compared to second- and third-person verbs, more frequent use of death- and sex-related words, and more positive emotion words overall (Stirman and Pennebaker 2001). Language analysis has long been used as a tool to understand a person’s emotions and thoughts. Tausczik and Pennebaker (2010) proposed a computerized text analysis method using the LIWC program to detect words that provide information about a writer’s emotional state and motivation. Shing et al. (2018) identified thoughts and feelings (such as lack of hope, a sense of agitation or impulsivity, and a mixed depressive state) as risk factors for suicide; they used Empath- and depression-based lexical categories and emotion features (anger, anticipation, disgust, fear, joy, sadness, surprise and trust), as well as syntactic features (the proportion of active and passive verbs), to predict suicide risk. Morales et al. (2019) used personality features based on the Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism: OCEAN) and tone features covering emotion (anger, disgust, fear, joy, sadness) and social dimensions (openness, conscientiousness, extroversion, agreeableness, emotional range vs neuroticism) for the assessment of suicide risk with deep learning systems.

Based on these studies, we annotated the verses containing emotional words or words revealing personality traits and used them as features for suicide prediction in Greek poetry (Goldberg 1990). Overall, our methodology differs from previous efforts: given the lack of available tools for Greek, we investigated verb suffixes as features and measured the emotion of a poem’s verses in a novel manner, on a [0, 2] scale, in order to increase the overall reliability of the corpus annotation. The algorithms used were for the most part examined in previous research and have been shown to perform relatively well in these kinds of tasks.

The work described herein is an extension of earlier work (Zervopoulos et al. 2019). In detail, the novel aspects of the current work compared to the previous version include

  • the use of deep learning structures, i.e. DNN, in contrast to the shallow learning schemata employed earlier, leading to a significant increase in predictive performance

  • the extension of the feature set with additional morphosyntactic and semantic features based on the writers' emotions and the Big Five Personality Trait model, thus enabling a comparative evaluation of the predictive power of the various feature groups and

  • the application of feature selection filters to identify the most appropriate feature subset for suicide prediction in Greek poetry.

The main purpose of this study was to apply NLP methods and machine learning techniques to Greek-language text. Greek poses certain difficulties, as it is a low-resource language lacking NLP tools. Applying sophisticated approaches (DNNs), as well as comparing morphosyntactic (suffix) and semantic (Big Five personality trait model) features, were also major challenges. In particular, these methods were applied to suicide tendency detection in Greek poetry. To the authors' knowledge, this is the first time such a study has been applied to Greek-language poetry.

The rest of this paper is organized as follows. Section 2 describes the collected data, the process and criteria for gathering it, as well as the feature vector that represents each poem and why the respective values were selected. In Sect. 3, the classification experiments are described in detail, including feature selection, the algorithms used and their results. Section 4 discusses some of the difficulties faced in classifying a poet’s suicidal ideation from their poems, especially in a language with a scarcity of processing tools. Finally, Sect. 5 presents future improvements that could be made and concludes on what was accomplished in this study.

2 Resources and proposed methodology

2.1 Data collection

A corpus of 90 poems was constructed, consisting of poems from 7 poets who committed suicide and 6 who did not. The poems are equally distributed between the two groups, and the number of poems per poet ranges from 5 to 9. The vast majority of the poets are male (with the exception of two female poets in the suicidal group), and all lived in approximately the same period, the early to mid 1900s. The rationale behind this requirement was for the poets to belong to the same phase of Greek language history, as the writing style of Greek changed significantly during the second half of the twentieth century. Specifically, Katharevousa was abolished as the official form of Greek in 1976 and was replaced by Modern Greek.

It was of significant importance that the poets in the suicide group undoubtedly took their own lives. Similarly, special attention was paid to ensure that poets in the non-suicide group had no history of self-harm or suicide attempts (Table 1). The length of the poems varies between 80 and 300 words, with an average of 155.35 words. For each poet, poems were selected at random, with no regard to the nature of their content; for example, a poem particularly high in negative emotion was not deliberately selected for a poet who committed suicide. These criteria, combined with the overall low suicide rate among Greek poets, significantly restricted the size of the corpus.

Table 1 Composition of corpus

A Python script, implemented by the authors of this paper, was used to remove all punctuation marks and accentuation from the poems. The script checks each character of a text to determine whether it is accented (has a stress mark) or is a punctuation mark. Accented characters are replaced with the corresponding non-accented characters, while punctuation marks are removed. For example, the string "ν᾿ ἀφήσουν ἄδεια σώματα κεῖ ποὺ οἱ ψυχὲς δὲν ἄντεχαν ἐκεῖ ποὺ ὁ νοῦς δὲν πρόφταινε καὶ λύγιζαν τὰ γόνατα." was converted to "ν αφησουν αδεια σωματα κει που οι ψυχες δεν αντεχαν εκει που ο νους δεν προφταινε και λυγιζαν τα γονατα".
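The following is a minimal sketch of such a preprocessing step (not the authors’ original script), assuming Unicode normalization can stand in for the character-by-character replacement described above; it may need extending for further polytonic codepoints.

```python
# Hedged sketch: strip accents/breathings and remove punctuation from Greek text.
import unicodedata

def normalize_poem(text: str) -> str:
    # Decompose characters so accents become separate combining marks (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    kept = []
    for ch in decomposed:
        cat = unicodedata.category(ch)
        if cat in ("Mn", "Sk"):       # combining accents and spacing breathing marks
            continue
        if cat.startswith("P"):       # punctuation marks
            continue
        kept.append(ch)
    # Recompose the remaining characters.
    return unicodedata.normalize("NFC", "".join(kept))

if __name__ == "__main__":
    sample = "ν᾿ ἀφήσουν ἄδεια σώματα κεῖ ποὺ οἱ ψυχὲς δὲν ἄντεχαν"
    # expected: ν αφησουν αδεια σωματα κει που οι ψυχες δεν αντεχαν
    print(normalize_poem(sample))
```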

2.2 Feature description

2.2.1 Initial feature set

In the initial feature set (Zervopoulos et al. 2019) every poem was represented as a feature-value learning vector. Some features were based on the work of Kao and Jurafsky (2012) and Stirman and Pennebaker (2001), whereas others were novel and mostly specific to the Greek language. A large number of the selected features were counts of occurrences of linguistic characteristics of interest throughout the entire poem, normalized by the poem’s length. These values were then normalized across the entire corpus using the well-known min-max transformation of Eq. 1, where $v$ is the initial feature value, $v_{\min}$ and $v_{\max}$ are the minimum and maximum values of the feature across all poems, and $v_{norm}$ is the resulting normalized value

$$v_{norm} = \frac{v - v_{\min}}{v_{\max} - v_{\min}}.$$
(1)

Overall, the initial feature set comprised 37 features.
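As an illustration, a minimal sketch of the normalization of Eq. 1 follows; the function and variable names are ours and hypothetical, not taken from the original implementation.

```python
# Hedged sketch of Eq. 1: min-max normalization of one feature across all poems.
import numpy as np

def minmax_normalize(values):
    """Normalize one feature column (one value per poem) to [0, 1]."""
    v = np.asarray(values, dtype=float)
    v_min, v_max = v.min(), v.max()
    if v_max == v_min:                # constant feature: map everything to 0
        return np.zeros_like(v)
    return (v - v_min) / (v_max - v_min)

# Example: raw counts already divided by each poem's length.
print(minmax_normalize([0.021, 0.040, 0.013, 0.055]))
```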

2.2.1.1 Vocabulary features

The vocabulary feature chosen was the type-token ratio (TTR), which is used as an indicator of a poem’s vocabulary richness (Kao and Jurafsky 2012).
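For illustration, a minimal sketch of the TTR computation on a pre-processed poem, assuming simple whitespace tokenization:

```python
# Hedged sketch: type-token ratio = distinct word forms / total word count.
def type_token_ratio(poem_text: str) -> float:
    tokens = poem_text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

print(type_token_ratio("ν αφησουν αδεια σωματα κει που οι ψυχες δεν αντεχαν"))
```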

2.2.1.2 Morphosyntactic features

The social integration model suggests that individuals who commit suicide have failed to integrate with society, so they are expected to be more self-centered, which appears to be manifested through the use of more first-person singular words as opposed to first-person plural words (Stirman and Pennebaker 2001). Based on this observation, two morphosyntactic features were selected: the counts of occurrences of (i) first-person singular and (ii) first-person plural verbs (Zervopoulos et al. 2019).

Due to the lack of reliable morphological analysis tools for Greek, especially for the historical phase of the language targeted herein, these counts were obtained using a set of predefined verb suffixes that constitute person and number morphemes. These suffixes were also used as features in their own right, to examine whether a subset of them was particularly prevalent in the classification. All in all, these constituted 32 of the 37 features examined, 30 of which were the aforementioned suffixes.
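As an illustration, the person/number counts could be obtained by suffix matching as sketched below; the suffix lists shown are hypothetical placeholders, not the 30 suffixes actually used in this work.

```python
# Hedged sketch of suffix-based counting of first-person singular/plural verbs.
FIRST_SINGULAR_SUFFIXES = ("ω", "ομαι", "ουμαι")   # illustrative only
FIRST_PLURAL_SUFFIXES = ("ουμε", "ομε", "ομαστε")  # illustrative only

def person_features(poem_text: str) -> dict:
    tokens = poem_text.split()
    length = len(tokens) or 1                       # normalize by poem length
    first_sg = sum(1 for t in tokens if t.endswith(FIRST_SINGULAR_SUFFIXES))
    first_pl = sum(1 for t in tokens if t.endswith(FIRST_PLURAL_SUFFIXES))
    return {"first_singular": first_sg / length, "first_plural": first_pl / length}
```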

2.2.1.3 Semantic class features

According to the hopelessness suicide model (Petrie and Brook 1992), suicidal poets are expected to deal more with negative than with positive feelings. To test this, each verse was assigned a number from 0 to 2 for the positive emotion it expressed and another from 0 to 2 for the negative emotion. This range is lower than those used in previous studies (for example, [− 5, 5] in Nielsen 2011) and was chosen to increase the overall reliability of the manual annotation. These scores were then summed over all verses of a poem, resulting in two features: one sum reflecting the positive and one reflecting the negative emotion.
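A minimal sketch of this aggregation step follows; the per-verse scores are assumed to come from the manual annotation described above.

```python
# Hedged sketch: aggregate per-verse emotion annotations into two poem-level sums.
def emotion_features(verse_scores):
    """verse_scores: list of (positive, negative) pairs, each value in {0, 1, 2}."""
    positive = sum(pos for pos, _ in verse_scores)
    negative = sum(neg for _, neg in verse_scores)
    return {"positive_emotion": positive, "negative_emotion": negative}

# Example: a three-verse poem annotated as mostly negative.
print(emotion_features([(0, 2), (1, 1), (0, 2)]))
```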

Sexual and death-related references were also included as features, as they are considered to contribute to identifying suicidal tendencies (Stirman and Pennebaker 2001). For these features, the number of verses deemed to contain sexual or death references was counted for each poem.

2.2.2 Extended feature set

In addition to the earlier work (Zervopoulos et al. 2019), 21 new attributes were used: 4 morphosyntactic and 17 semantic class features. The morphosyntactic features were: (1) first-person active verbs, (2) second- or third-person active verbs, (3) first-person passive verbs and (4) second- or third-person passive verbs. These features were used in previous research aimed at detecting suicidal tendencies (Shing et al. 2018). They were represented as counts of occurrences throughout the entire poem, normalized by the poem’s total number of verbs; these values were then normalized across the entire corpus using the transformation of Eq. 1 described above.

Furthermore, 17 semantic features regarding the poets' feelings and personality traits were included. Some of these features (extraversion, introversion, optimism, pessimism, trust, denial, instability, joy, anger, fear, desperation) relate to the Big Five (OCEAN) personality trait model (Goldberg 1990; Morales et al. 2019), while the others capture the poets' emotional range (escapism, sadness, disgust, complaint, love, hate). Extraversion and introversion provide information about the poets' openness, while trust and denial refer to their social attitude. Optimism and pessimism, together with emotional instability (joy-sadness, love-hate), fear, anger, complaint, disgust and desperation, mark a lack of hope that things will get better, which is associated with suicidal tendencies (Morales et al. 2019; Zirikly et al. 2019). Escapism is used as a means of relieving persistent feelings and is associated with depression (Cronkite et al. 1998). These emotion features were selected to determine the poets' psychological state, since feelings revealing lack of hope, a depressive state or isolation can provide tone measures for suicide risk assessment (Morales et al. 2019; Shing et al. 2018; Zirikly et al. 2019). Annotation examples are presented in Table 2. The features were represented as counts of occurrences per verse throughout the entire poem, normalized by the poem’s total number of verses; these values were then normalized across the entire corpus using the transformation of Eq. 1 described above.

Table 2 Annotation examples

Each poem was annotated with the aforementioned semantic information by two native speakers of Greek. When a verse led to conflicting annotation decisions, the feature value was decided by a majority vote of the authors of the present work. All features used in this research are presented in Table 3. The morphosyntactic features are language-dependent, as they relate to the specific syntax of Greek; the semantic features are language-independent and unrelated to the morphosyntactic ones.

Table 3 List of features by type

Various experiments were run using the features described in Table 3. Feature selection was applied to optimize the results, and as a result 7 feature sets were created (Table 4).

Table 4 Description of feature sets

2.3 Model implementation

Each poem was represented using the aforementioned features. The RapidMiner Studio data science platform (Hofmann and Klinkenberg 2013) was used to run prediction experiments on the poets’ suicidal tendencies. Since the corpus was not divided into separate training and test sets, k-fold cross validation was employed to evaluate the learning schemata. A variety of tree- and rule-based algorithms were tested in our previous work (Zervopoulos et al. 2019). In this extended work, the Deep Learning and Generalized Linear Model (GLM) algorithms, both using the H2O 3.8.2.6 implementation (Candel et al. 2018), were compared on this task. These are well-established algorithms for such tasks: GLM was used by Passos et al. (2016), while deep learning was used by Coppersmith et al. (2018), Du et al. (2018), Nobles et al. (2018), Shing et al. (2018) and Zirikly et al. (2019). The parameter settings that achieved the best results are described below.

The H2O Deep Learning algorithm is based on a multi-layer feed-forward artificial neural network trained with stochastic gradient descent using back-propagation. The network contained two hidden layers of 100 and 50 neurons respectively, with the ExpRectifier (Exponential Rectifier Linear Unit) activation function. Advanced features such as an adaptive learning rate, rate annealing, momentum training, dropout and L1 or L2 regularization enable high predictive accuracy. Each compute node trains a copy of the global model parameters on its local data asynchronously with multi-threading, and contributes periodically to the global model via model averaging across the network. The operator initiates a 1-node local H2O cluster and runs the algorithm on it; although it uses one node, the execution is parallel.

The Deep Learning H2O-1 algorithm was run with the following settings: leave-one-out cross validation, 2 epochs and the adaptive rate option activated. The implemented adaptive learning rate algorithm (ADADELTA) automatically combines the benefits of learning rate annealing and momentum training to avoid slow convergence, and specifying only two parameters (rho and epsilon) simplifies hyper-parameter search. The values used in our experiments were rho = 0.99 and epsilon = 1.0E−8, as also used in previous research (Kamminga et al. 2017; Kim et al. 2018; Achmad et al. 2019). Standardization of the data was set to automatic, with the cross-entropy loss function and the Bernoulli distribution function. Deep Learning H2O-2 had the same settings but with tenfold cross validation and stratified sampling, 2 hidden layers of 200 neurons each, a multinomial distribution function for the training data, the quadratic loss function and 20 epochs.
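For illustration, a hedged sketch of a roughly equivalent Deep Learning H2O-1 configuration, expressed through H2O’s Python API, follows; the experiments were actually run through the RapidMiner H2O operator, and the file and column names below are hypothetical.

```python
# Hedged sketch, not the original RapidMiner setup: an H2O Deep Learning model
# configured with settings similar to Deep Learning H2O-1.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
poems = h2o.import_file("poem_features.csv")      # hypothetical feature file
poems["suicidal"] = poems["suicidal"].asfactor()  # binary target column (assumed name)

dl_h2o_1 = H2ODeepLearningEstimator(
    hidden=[100, 50],                             # two hidden layers
    activation="ExpRectifier",                    # exponential rectifier linear unit
    epochs=2,
    adaptive_rate=True, rho=0.99, epsilon=1e-8,   # ADADELTA settings
    loss="CrossEntropy",
    distribution="bernoulli",
    standardize=True,
    nfolds=poems.nrow,                            # leave-one-out cross validation
)
dl_h2o_1.train(y="suicidal", training_frame=poems)
print(dl_h2o_1.model_performance(xval=True))
```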

The Deep Learning-1 algorithm used tenfold cross validation with stratified sampling and a local random seed. Cross-entropy was the selected loss function, with the Adam updater and 100 epochs. Xavier uniform weight initialization was selected, with stochastic gradient descent as the optimization method. The network was a simple neural net whose input dimension was set equal to the number of input features, with 10 epochs per log. It comprised fully connected layers of 256, 128 and 2 neurons respectively, using the ReLU (Rectified Linear Unit) activation function. The output layer had 2 neurons, since this is a binary classification task, and the sizes of the first two layers were selected by trial and error (Sheela and Deepa 2013). Deep Learning-2 had the same settings as Deep Learning-1, with a different optimization method (line gradient descent).
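As an illustration only, the topology described for Deep Learning-1 could be written as follows in Keras; the experiments were run in RapidMiner, so this is an assumed, approximate sketch, and the feature count is an assumption based on the extended feature set.

```python
# Hedged sketch of the Deep Learning-1 topology (256-128-2, ReLU, Adam, cross-entropy).
import tensorflow as tf

n_features = 58   # assumed: 37 initial + 21 additional features; adjust per feature set

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(n_features,),
                          kernel_initializer="glorot_uniform"),   # Xavier uniform
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_initializer="glorot_uniform"),
    tf.keras.layers.Dense(2, activation="softmax"),               # two-neuron output layer
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",                    # cross-entropy loss
              metrics=["accuracy"])
model.summary()
```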

GLMs are an extension of traditional linear models. This algorithm fits generalized linear models to the data by maximizing the log-likelihood. The elastic net penalty is used for parameter regularization. The model fitting computation is parallel, extremely fast, and scales extremely well for models with a limited number of predictors with non-zero coefficients. The operator initiates a 1-node local H2O cluster, and runs the algorithm on it. Although it uses one node, the execution is parallel.

The GLM-1 algorithm was used with the following settings: tenfold cross validation with stratified sampling, a local random seed and the L-BFGS solver. Standardization and regularization with lambda search were also selected. GLM-2 had the same settings but with 20-fold cross validation. Although 20-fold cross validation might be excessive for such a small corpus, this setting was used in order to present results comparable with the other algorithms.
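A hedged sketch of a GLM-1-like configuration in H2O’s Python API follows; again, the actual experiments used the RapidMiner H2O operator, and the frame and column names are hypothetical, reusing the `poems` frame from the earlier sketch.

```python
# Hedged sketch, not the original RapidMiner setup: a GLM configured like GLM-1.
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

glm_1 = H2OGeneralizedLinearEstimator(
    family="binomial",              # binary target: suicidal vs non-suicidal poet
    solver="L_BFGS",
    standardize=True,
    lambda_search=True,             # regularization with lambda search
    nfolds=10,
    fold_assignment="Stratified",   # stratified sampling across folds
    seed=1,                         # local random seed
)
glm_1.train(y="suicidal", training_frame=poems)
print(glm_1.model_performance(xval=True))
```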

3 Results

At the beginning of the experiments, the new algorithms were applied to the dataset described in Sect. 2.2.1 (referred to as Dataset 1 hereinafter). The results are presented in Table 5.

Table 5 Results on the initial dataset

In our previous work (Zervopoulos et al. 2019), C4.5 was the algorithm that achieved the highest classification rate, reaching 84.5% (F: 0.818). On Dataset 1, Deep Learning-1 reached 82.2% (F: 0.830), while GLM-1 outperformed both, reaching 85.56% (F: 0.854).

As also mentioned in Zervopoulos et al. (2019), the suffixes did not seem to form a clear pattern for suicidal tendency recognition. To confirm this assessment, we applied the algorithms to feature sets A and B separately (Table 6).

Table 6 Results for feature sets A and B

Our previous assessment is confirmed: the results for feature set B (suffixes) are very low, while those for feature set A are remarkably higher (Table 6).

The algorithm that eventually achieved the highest classification rate was Deep Learning H2O-2 with feature set E, reaching 84.4% (F: 0.854). Various statistics detailing some of the tests are presented in Table 7 and Fig. 1.

Table 7 Performance for both classes for each algorithm
Fig. 1 F-1 scores per feature set for each algorithm

The resulting models indicate that the overall classification result is largely based on the semantic type features.

Deep learning classifiers perform fairly well and result in simple models, which is partly attributable to the manual nature of the annotation process, a consequence of the lack of reliable processing tools for Greek, and partly to the nature of the semantic features, which encode high-level linguistic knowledge. Such classifiers have also been shown to perform well in similar past research. Our results are significantly better than those of the previous research mentioned in Sect. 1.1, as our best algorithm achieved 84.4% accuracy and an F-1 score of 0.854; the DNN architecture contributes to the better predictive results. Compared to our previous work (Zervopoulos et al. 2019), the results are also improved: although the accuracy is almost the same (84.5% vs 84.4%), the F-1 score improves from 0.841 to 0.854.

4 Discussion

The main motivation for our study was to apply suicide tendency detection to Greek-language text. Tackling the task of identifying suicidal tendencies in poetry, particularly in a language for which no previous research has been conducted, poses many difficulties. Constructing the corpus was particularly difficult due to the strict criteria described in Sect. 2.1. It was hard to determine whether someone had never at least attempted suicide in their lifetime, as an attempt may not have become known outside their close social circle. Furthermore, where it was ambiguous whether the cause of death was actually suicide, as in the case of Maria Polydouri, the poet was left out, which further reduced the number of candidate poems. One solution would have been to focus on the few poets who have written many poems; this, however, introduces the risk of the classifier learning the patterns of those particular poets and being unable to accurately classify poems by others.

Additionally, and perhaps most importantly, the process of annotating the corpus was especially demanding, since the most prevalent tools used in previous research are not available for Greek. Such tools include LIWC, the UAM CorpusTool (O’Donnell 2008), word lists used in sentiment analysis tasks, such as AFINN and the Affective Norms for English Words (ANEW) (Nielsen 2011), as well as corpora from previous studies that could have been used for comparison. This necessitated the manual annotation of a significant portion of the selected features. It was also a major restriction on feature selection, as the more sophisticated a feature is, the more rigorous the process of identifying it in the text needs to be. Not adhering to the appropriate level of rigor introduces further potential bias into the data, and given the annotators’ lack of a professional literary background, this would have been an increasingly precarious task.

The experiments confirmed that the suffixes do not form a clear pattern that could help identify suicidal ideation; even when combined with the semantic class features, the suffixes affect the predictive results negatively. The semantic class feature sets (E and F) perform better regardless of the algorithm, and semantic features combined with deep neural networks enable suicidal prediction in poetry. Positive and negative emotion words and the TTR contribute to the improvement of the results, whereas first-person singular and first-person plural words cannot be considered essential. Introducing additional features covering the poets’ emotional range and personality traits proved to improve classification.

In conclusion, the semantic (language-independent) features perform better than the morphosyntactic (language-dependent) features. Consequently, the suggested model is language-independent: it can also be applied to other languages, regardless of their specific syntax, and for the same reason it does not depend on the poets’ generation or writing style.

The algorithms tested achieved high accuracy and strong precision and recall, especially for the TRUE (suicide) class, which is the sensitive one. The use of new attributes describing the poets’ emotional state and personality traits, based on the Big Five (OCEAN) model, contributed to a more robust prediction model, in line with the work of Coppersmith et al. (2018). The results also confirm that deep learning models can outperform more traditional machine learning systems for suicide risk assessment (Morales et al. 2019).

5 Conclusions and implications

In this paper we extend our previous work on suicide prediction in Greek poetry (Zervopoulos et al. 2019) by (i) using DNNs, (ii) adding morphosyntactic and semantic features based on the writers' emotions and the Big Five Personality Traits model and (iii) applying feature selection, for accurate suicide tendency prediction in Greek poetry.

In conclusion, a corpus of poems was composed in order to identify suicidal ideation in Greek poetry of the twentieth century. This proved to be a challenging task, since no previous work had been done on the selected language. Nonetheless, the resulting classifier, Deep Learning H2O-2, reached an accuracy of 84.4% (F: 0.854), improving upon the results of previous research. The features explored were mainly semantic in nature, while morphosyntactic features, such as language-specific verb suffixes that had not been investigated before, were also utilized.

As suicide rates among artists are significantly higher than those of the general population, the results of this research cannot be applied to the general population, nor can they be used in psychology at large; they are suitable for poets. Nevertheless, the results are promising overall, and the selected features are easily portable across different strategies, which should allow future studies to confirm how successful our methodology is.

The construction of a properly annotated corpus proved to be the most challenging part of this task. The development of such a corpus would therefore be worth looking into, as it would greatly aid future efforts to study NLP-related topics for Greek texts. As showcased by the attempts in this study, it is of critical importance to pay special attention when manually annotating data and to adhere to best practices that have been studied extensively before, for example by Wiebe, Wilson and Cardie (2005).

There is also a scarcity of available tools, which are often required in NLP tasks, owing to the complex structure of the Greek language. For example, tools for lemmatization and stemming are somewhat difficult to implement, partly due to punctuation and grammar rules. This suggests it would be worthwhile to develop such tools before further progress on more demanding NLP tasks is made. Given the above, any future advancements in the area will be interesting to follow, now that this first attempt has been made, to see how such difficulties are handled.