1 Introduction

A medical record comprises all the documents containing data, assessments and information of any kind on a patient’s situation and clinical development throughout the care process. It identifies all the professionals who have intervened and aims to integrate the clinical documentation of every patient as fully as possible, at least within each center [1]. Current legislation allows clinical information to be processed in electronic form, the only exception being informed consent [1]. Since the introduction of Electronic Health Records (EHR) in the National Health System, health personnel have become users of video display terminals (VDT). Most VDT workplaces are designed for sedentary work. For health personnel, however, the workplace design frequently involves using a VDT while standing; this is the case, for example, with medication administration workstations or with portable terminals, which allow work at the bedside.

Work with VDT has brought multiple benefits, but it also entails a series of risks for the health of workers, since it imposes both physical and mental load. It can cause ocular and visual disorders, musculoskeletal disorders and other alterations such as mental fatigue. Prolonged VDT use is related to intense visual effort [2, 3] as well as to changes in the ocular surface and in the condition of the tear film [4, 5]. It requires good coordination of eye movements in order to fuse the images of both eyes and obtain suitable binocular vision [2]. In addition, the blink rate decreases during computer work, which increases the evaporation of the tear film and compromises the condition of the ocular surface. The American Optometric Association [6] defines computer vision syndrome (CVS) as a group of eye and vision-related problems that result from prolonged computer use. The most common symptoms associated with CVS are eyestrain, blurred vision, dry eyes and headaches, among others. In Spain, two instruments have recently been developed and validated to measure this syndrome [7, 8].

The prevalence of CVS is high, although it varies between studies according to the characteristics of the sample, the method and the data collection instruments used. It is related to individual and work-related factors: the higher the exposure to VDT, the greater the prevalence of CVS. Ranasinghe et al. [9] estimated the prevalence of CVS at 67.4% in a group of 2,210 computer office workers in Sri Lanka. Lower prevalences have been observed in a sample of 426 office workers in Spain (53%) [10] and in a group of 476 call center operators in Brazil (54.6%) [11]. To date, our literature review has identified no studies that evaluate the effects of VDT exposure on the visual health of health personnel. The most common occupations in the samples of the reviewed literature are office workers, bank employees, high-tech workers, graphical editors and call center operators. Only two studies, both carried out in Turkey, included samples of computer users from hospitals. The first one [12] included secretaries, computer operators and hospital data management system users. The second one [5] did not specify whether the participants were health personnel.

The aim of the present work was to select the characteristics of the subject that are most relevant to the occurrence of CVS and then to develop a classification model for its prediction in health personnel.

2 Methods

2.1 Case Study

This was an observational, cross-sectional epidemiological study based on two self-administered questionnaires completed by health personnel of the Monte Naranco Hospital in Oviedo (Spain). The hospital specializes in geriatrics and palliative care and started using the EHR in 2007.

The study included health personnel who used VDT at work and belonged to the following occupational categories: physicians and surgeons, residents, nurses, advanced practice nurses (APNs) in training and auxiliary nurses.

Of a total of 172 workers, 151 (87.79%) took part in the study. After applying the inclusion and exclusion criteria, the final sample comprised 139 workers. The reasons for exclusion were: 3 for dry eye, 1 for amblyopia, 2 for non-surgically controlled cataracts, 3 for vitreous disease and 1 for retinal disease. In addition, one worker was excluded for having been in the job for less than one year and another for lack of information on seniority.

All the participants signed an informed consent form accepting their participation, in which data confidentiality was guaranteed during the entire process. The study was authorized by the Health Authority and approved both by the Research Ethics Committee of the Principality of Asturias and by the Ethics Committee of the University of Alicante, Spain (coordinating institution of the study), in accordance with the tenets of the Declaration of Helsinki.

2.2 Data Collection

The health workers who took part in the study answered the following self-administered questionnaires:

  1. Anamnesis and History of Exposure Questionnaire, which was specifically developed for this study to gather information about gender, age, history of eye diseases and treatment, previous eye surgeries, job, work schedule, service or unit of work and daily VDT usage at and outside work.

  2. The Computer Vision Syndrome Questionnaire (CVS-Q), designed and validated by Seguí et al. in 2015 [7], which was used to measure perceived ocular and visual symptoms during or immediately following computer work. This questionnaire evaluates the frequency (never, occasionally or often/always) and the intensity (moderate or intense) of 16 ocular and visual symptoms: burning, itching, feeling of foreign body, tearing, excessive blinking, eye redness, eye pain, heavy eyelids, dryness, blurred vision, double vision, difficulty focusing for near vision, increased sensitivity to light, colored halos around objects, feeling that sight is worsening, and headache. Subjects with a score of 6 or more on the questionnaire are classified as symptomatic (suffering CVS).
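The classification rule of the CVS-Q can be illustrated with a minimal sketch. Only the answer categories and the threshold of 6 come from the text above; the numeric coding of the answers and the recoding of the frequency-by-intensity product are assumptions based on our reading of the CVS-Q publication [7], and the function names are hypothetical.

```python
# Assumed numeric coding of the answer categories (not stated in this text).
FREQUENCY = {"never": 0, "occasionally": 1, "often/always": 2}
INTENSITY = {"moderate": 1, "intense": 2}
# Assumed recoding of frequency x intensity into a per-symptom severity.
RECODE = {0: 0, 1: 1, 2: 1, 4: 2}
CVS_THRESHOLD = 6  # stated in the text: a score of 6 or more means CVS


def cvsq_score(answers):
    """answers: one (frequency, intensity) pair per symptom.

    Intensity is ignored when the frequency is 'never'.
    """
    total = 0
    for frequency, intensity in answers:
        f = FREQUENCY[frequency]
        i = INTENSITY[intensity] if f > 0 else 0
        total += RECODE[f * i]
    return total


def has_cvs(answers):
    """True when the total score reaches the symptomatic threshold."""
    return cvsq_score(answers) >= CVS_THRESHOLD
```

Under these assumptions, a worker who reports six symptoms as occasional and moderate and the remaining ten as absent scores exactly 6 and is classified as symptomatic.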

2.3 Support Vector Machines

Support vector machines (SVM) are machine learning techniques. Like other mathematical models for similar problems [13,14,15], SVM are used to model physical systems by adapting their parameters [16,17,18,19]. These methods are broadly known for their use in classification and regression problems [20, 21]. In the present research, the SVM is used as a classifier. This technique has been selected because of its well-known performance, which relies on fitting the model to data previously taken from the system to be modelled, the training data set. To train an SVM on a classification problem, the vectors of the training data are used to determine the hyperplanes that separate the classes. The output estimation provided by a trained SVM can be formulated as:

$$ {\hat{\text{y}}}_{i} = a^{T} \varPhi \left( {x_{i} } \right) + b $$
(1)

where \( x_{i} \) corresponds to the input vectors of the training set. The function \( \varPhi \left( {x_{i} } \right) \) maps the inputs to a feature space in which the relationship between inputs and outputs is linear. In this scenario, the parameters are \( a \), a vector of the same dimension as the image of \( \varPhi \), and \( b \), a scalar coefficient. These parameters are determined by solving the following constrained optimization problem:

$$ \mathop {\hbox{min} }\nolimits_{{a, \varepsilon, \eta_{i} , \eta_{i}^{{\prime }} }} \frac{1}{2} a^{T} a + c \left( {\frac{1}{N} \sum\nolimits_{i = 1}^{N} {\left( {\eta_{i} + \eta_{i}^{{\prime }} } \right)} + v\varepsilon } \right) $$
(2)
$$ a^{T} \varPhi \left( {x_{i} } \right) + b - y_{i} \le \varepsilon + \eta_{i} $$
(3)
$$ y_{i} - a^{T} \varPhi \left( {x_{i} } \right) - b \le \varepsilon + \eta_{i}^{{\prime }} $$
(4)

where \( c \) is a regularization parameter and \( \varepsilon \) is the tolerance error for each input \( x_{i} \). Both \( \eta_{i} \) and \( \eta_{i}^{{\prime }} \) are slack variables, which take non-negative values. Finally, \( v \) is a parameter for the adjustment of the tolerance. Therefore, the output of the SVM [22] can be expressed as:

$$ {\hat{\text{y}}} = F\left( x \right) = \sum\nolimits_{i = 1}^{N} {\left( {\beta_{i}^{{\prime }} - \beta_{i} } \right) \varPhi \left( {x_{i} } \right)^{T} \varPhi \left( x \right)} + b $$
(5)

In this expression, \( \beta_{i} \) and \( \beta_{i}^{{\prime }} \) are the Lagrange multipliers corresponding to the constraints above. In this context, the kernel function \( K \) can be defined as \( K\left( {x_{i} ,x_{j} } \right) = \varPhi \left( {x_{i} } \right)^{T} \varPhi \left( {x_{j} } \right) \). Consequently, the estimation of the SVM turns into the following expression:

$$ {\hat{\text{y}}} = \sum\nolimits_{i = 1}^{N} {\left( {\beta_{i}^{{\prime }} - \beta_{i} } \right) K\left( {x_{i} ,x} \right)} + b $$
(6)

In general, the SVM for classification is determined by the parameters \( c \) and \( v \), since, as stated above, \( a \) and \( b \) are obtained as the optimal solution of the quadratic problem [20]. Depending on the function chosen as kernel, other parameters must be determined as well.
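As an illustration of Eq. (6), the following sketch evaluates the output of a trained SVM from its support vectors, dual coefficients and bias. The four kernel forms follow a common parameterization (as used, for instance, in LIBSVM) and are an assumption, since the text does not write them out; all function names are hypothetical.

```python
import math


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(ui * vi for ui, vi in zip(u, v))


def polynomial_kernel(u, v, gamma, alpha0, degree):
    """K(u, v) = (gamma * u . v + alpha0) ** degree (assumed form)."""
    return (gamma * linear_kernel(u, v) + alpha0) ** degree


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2) (assumed form)."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def sigmoid_kernel(u, v, gamma, alpha0):
    """K(u, v) = tanh(gamma * u . v + alpha0) (assumed form)."""
    return math.tanh(gamma * linear_kernel(u, v) + alpha0)


def svm_output(x, support_vectors, coefficients, b, kernel):
    """Eq. (6): sum_i coef_i * K(x_i, x) + b, where coef_i = beta'_i - beta_i."""
    return sum(c * kernel(xi, x) for c, xi in zip(coefficients, support_vectors)) + b
```

A kernel with fixed parameters can be passed as a closure, for example `kernel=lambda u, v: sigmoid_kernel(u, v, gamma=2.1e-4, alpha0=0.0)`.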

2.4 Genetic Algorithms

Genetic algorithms are procedures developed to simulate the evolution of a population so as to optimize the survival of the next generation. These algorithms were first developed for chromosome studies [23], but they now work on the premise of improving the fitness over the iterations until a solution to an optimization problem is reached. In each generation, the individuals are adjusted by means of four basic genetic operators: crossover, mutation, reproduction and elitism [24]. The optimization problem and the iterations of the genetic algorithm can be formulated as follows [25]:

For a function \( f :D \to R \) and a set of restrictions, the minimization problem consists of finding the value \( x \) in the domain \( D \) such that \( f\left( x \right) \le f\left( y \right) \) for all \( y \) in \( D \). A value \( x^{{\prime }} \) in the domain \( D \) is a local minimum of \( f \) when there exists a neighborhood \( N\left( {x^{{\prime }} } \right) \) of \( x^{{\prime }} \) such that \( f\left( {x^{{\prime }} } \right) \le f\left( z \right) \) for all \( z \) in \( N\left( {x^{{\prime }} } \right) \).

In this scenario, the genetic algorithm begins with an initial population \( P_{0} = \left\{ { I_{0}^{1} , \ldots , I_{0}^{n} } \right\} \). At each step, the algorithm evaluates the objective function together with its corresponding performance measures and then generates a new population, whose elements are selected according to a given rule based on the four genetic operators defined above. After \( m \) steps, the population is denoted as \( P_{m} \). The algorithm stops when the performance measures have not improved significantly over a chosen number of iterations.

The aim of the proposed algorithm is to find the variables and parameters with which the SVM properly models the proposed classification problem. The iterations of the genetic algorithm focus on maximizing the AUC (Area Under Curve) of each of the SVM models calculated. The steps of the algorithm are shown in Fig. 1.

Fig. 1. Algorithm diagram.

The algorithm begins by setting the parameters of the genetic algorithm, namely the crossover, mutation and elitism probabilities and the population size. After this, an initial population is created. As stated by Galán et al. [25], although the setting depends strongly on the data considered, a range of optimal parameters can be determined. For the present research, the crossover probability took values from 0.5 to 1 in steps of 0.1, the mutation probability took the values 0.1, 0.2 and 0.3, and the elitism probability took the values 0.01, 0.05, 0.1 and 0.2. Population sizes of 10, 25 and 50 individuals were considered.
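The candidate settings listed above can be enumerated as a grid. The sketch below assumes that every combination of the four settings is a candidate, which the text does not state explicitly; the variable names are hypothetical.

```python
from itertools import product

# Values quoted in the text.
crossover_probs = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
mutation_probs = [0.1, 0.2, 0.3]
elitism_probs = [0.01, 0.05, 0.1, 0.2]
population_sizes = [10, 25, 50]

# Assumption: the full Cartesian product of the four settings is explored.
ga_settings = [
    {"crossover": pc, "mutation": pm, "elitism": pe, "population_size": n}
    for pc, pm, pe, n in product(
        crossover_probs, mutation_probs, elitism_probs, population_sizes
    )
]

print(len(ga_settings))  # 6 * 3 * 4 * 3 = 216 combinations
```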

Each individual of the initial population must be a vector whose length matches the number of SVM parameters to be varied. Since that number depends on the type of kernel, a single population would mix individuals of different lengths. To avoid this problem, the algorithm is branched: four branches run in parallel, and in each branch the genetic algorithm is performed with one of the possible kernel types. The sets of initial populations can then be created. Each individual of the initial population has the following form, depending on the kernel used: Linear: \( \left( {c,v, x_{1} , \ldots , x_{k} } \right) \). Polynomial: \( \left( {\gamma ,\alpha_{0} , \alpha , c, v, x_{1} , \ldots , x_{k} } \right) \). Radial basis: \( \left( {\gamma , c, v , x_{1} , \ldots , x_{k} } \right) \). Sigmoid: \( \left( {\gamma , \alpha_{0} , c, v, x_{1} , \ldots , x_{k} } \right) \).

The parameters common to all the populations are: the cost \( c \) of constraint violation, which corresponds to the constant of the regularization term in the Lagrange formulation of the SVM and ranges from \( 10^{ - 2} \) to \( 10^{2} \); the tolerance error \( v \) for the determination of the SVM, which takes values from \( 5 \cdot 10^{ - 4} \) to \( 5 \cdot 10^{ - 3} \); and the variables \( x_{1} , \ldots , x_{k} \), which take the value 1 or 0 depending on whether or not the corresponding variable is taken into account in the SVM model. As for the other parameters, \( \gamma \) is needed in all kernels except the linear one and takes values from \( 1/\left( {2 \cdot data\,dimension} \right) \) to \( 1/\left( {data\,dimension} \right) \); \( \alpha_{0} \) is the coefficient of the sigmoid and polynomial kernels (from −1 to 1); and \( \alpha \) is the degree of the polynomial kernel (from 3 to 5).

The SVM model is then estimated with these values in each of the branches, and its AUC is calculated on a validation subset of the data. The stop criterion is satisfied if the AUC does not change by more than 0.01% over the last 100 iterations of the algorithm.
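The encoding of the individuals and the stop criterion can be sketched as follows. The parameter ranges are those quoted in the text; the uniform sampling, the dictionary layout, the reading of the 0.01% tolerance as a relative change, and all names are assumptions made for illustration.

```python
import random


def parameter_ranges(data_dim):
    """Ranges quoted in the text; data_dim is the number of candidate variables k."""
    return {
        "c": (1e-2, 1e2),
        "v": (5e-4, 5e-3),
        "gamma": (1.0 / (2 * data_dim), 1.0 / data_dim),
        "alpha0": (-1.0, 1.0),
        "alpha": (3, 5),  # integer degree of the polynomial kernel
    }


# c and v appear for every kernel because the text calls them common to all populations.
KERNEL_PARAMS = {
    "linear": ["c", "v"],
    "polynomial": ["gamma", "alpha0", "alpha", "c", "v"],
    "radial": ["gamma", "c", "v"],
    "sigmoid": ["gamma", "alpha0", "c", "v"],
}


def random_individual(kernel, data_dim, rng):
    """Draw kernel parameters (uniformly, an assumption) and a 0/1 variable mask."""
    ranges = parameter_ranges(data_dim)
    params = {}
    for name in KERNEL_PARAMS[kernel]:
        low, high = ranges[name]
        params[name] = rng.randint(low, high) if name == "alpha" else rng.uniform(low, high)
    mask = [rng.randint(0, 1) for _ in range(data_dim)]
    return {"kernel": kernel, "params": params, "mask": mask}


def should_stop(auc_history, window=100, tolerance=1e-4):
    """True when the AUC moved by at most 0.01% (relative) over the last `window` iterations."""
    if len(auc_history) <= window:
        return False
    recent = auc_history[-(window + 1):]
    reference = recent[0]
    return all(abs(a - reference) <= tolerance * abs(reference) for a in recent)
```

For example, `random_individual("sigmoid", data_dim=20, rng=random.Random(0))` returns one candidate for the sigmoid branch, and `should_stop` would be called once per iteration with the AUC values recorded so far.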

When the stop criterion is not satisfied, a new population has to be created by applying elitism, crossover and mutation. In the elitism step, the individuals are sorted by their AUC and only those with the highest AUC are retained. Crossover then combines sections of the chosen individuals, and finally mutation introduces a random modification into the population. The whole process is repeated until the stop criterion is satisfied.
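One generation of this process can be sketched on the binary part of the individuals (the 0/1 variable mask). The specific operators used here (truncation selection from the better half, one-point crossover and bit-flip mutation) are illustrative assumptions, since the text does not specify them, and `next_generation` is a hypothetical name. Individuals must have at least two genes and the population at least two individuals.

```python
import random


def next_generation(population, fitness, rng,
                    p_crossover=0.8, p_mutation=0.1, p_elitism=0.1):
    """population: list of equal-length 0/1 lists.
    fitness: callable returning the score to maximize (for example, the validation AUC).
    """
    ranked = sorted(population, key=fitness, reverse=True)

    # Elitism: the best individuals are copied unchanged.
    n_elite = max(1, round(p_elitism * len(ranked)))
    new_population = [ind[:] for ind in ranked[:n_elite]]

    # Selection (assumption): parents are drawn from the better half.
    parents = ranked[: max(2, len(ranked) // 2)]

    while len(new_population) < len(population):
        mother, father = rng.sample(parents, 2)
        child = mother[:]
        if rng.random() < p_crossover:
            # One-point crossover: head of one parent, tail of the other.
            cut = rng.randint(1, len(child) - 1)
            child = mother[:cut] + father[cut:]
        # Bit-flip mutation: each gene is flipped with probability p_mutation.
        child = [1 - g if rng.random() < p_mutation else g for g in child]
        new_population.append(child)
    return new_population
```

Because the elite individuals are copied without modification, the best fitness in the population never decreases from one generation to the next. In the full method the fitness would be the AUC of an SVM trained with the encoded variables and parameters; a real-valued parameter block would need its own crossover and mutation rules.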

3 Results and Discussion

3.1 Results

The final model was selected on the basis of its AUC, which reached 0.9433029 on the validation data, indicating high performance. Fig. 2 shows, for each iteration, the average AUC of the models computed in that iteration. As can be observed, the performance of the models increases quickly in the first iterations and then oscillates between iterations 400 and 900. It then rises again to a high level, and finally the average value remains stable for at least 100 iterations, at which point the algorithm stops and the chosen model is determined. This model uses a sigmoid kernel with parameters \( \gamma = 2.1 \cdot 10^{ - 4} \), \( \alpha_{0} = 0 \), \( c = 1 \) and \( v = 5 \cdot 10^{ - 4} \).

Fig. 2. Average AUC of the estimated models over the genetic algorithm iterations.

The variables chosen are: VDT usage outside work, working as a nurse, hours per day of VDT use at the workplace, considering the software application easy, shifts including nights, current dryness, gender, current use of eye drops, contact lens wear and age. In addition, the AUC was compared with another performance metric, the Youden index, to corroborate the robustness of the model. The comparison was performed through the correlation between both metrics, which was 0.8594404 and implies a high degree of correspondence across the calculated models.
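Both metrics, and the correlation between them, can be computed as in the following minimal sketch. It assumes that the AUC refers to the ROC curve, that the Youden index is the maximum of sensitivity + specificity − 1 over all thresholds, and that the correlation is Pearson's; the text does not specify these details, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) for each distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def youden_index(labels, scores):
    """max over thresholds of TPR - FPR (= sensitivity + specificity - 1)."""
    return max(tpr - fpr for fpr, tpr in roc_points(labels, scores))


def pearson(xs, ys):
    """Pearson correlation, e.g. between the AUC and Youden index of each model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

Here `labels` holds the CVS-Q classification of each worker (1 for CVS, 0 otherwise) and `scores` the SVM outputs; applying `pearson` to the per-model lists of AUC and Youden values gives the kind of agreement measure described above.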

3.2 Discussion and Conclusions

Workers who use a computer outside work, those who work as nurses and those who are exposed to VDT for more hours per day at the workplace are more likely to suffer CVS. In other studies, the total number of hours spent working on a computer in a day (both at work and outside the workplace) was found to be related to visual discomfort [26]. Prolonged VDT use at work has also been linked to an increased risk of CVS [9, 10] and dry eye disease [27].

Female gender was associated with the risk of developing CVS as well. Many studies have reported an association between female gender and the prevalence of CVS [9,10,11]. In our study, age is also related to CVS; the average age of those with CVS is 45.9 years and the average age of those without CVS is 47.1 years. In contrast, Ranasinghe et al. [9] reported a higher mean age in those with CVS, and Rosenfield [28] indicated that it is unclear whether CVS is associated with age. Portello et al. [29] observed that greater computer-related visual symptoms correlated with the number of hours spent using a computer, with female gender and with dry eye; however, the symptoms did not vary significantly with age.

Those who use eye drops for dryness are more likely to suffer CVS, as well as contact lens wearers. These results are consistent with previous studies that found higher prevalence of CVS [10] and also higher prevalence of dry eye disease [30] in contact lens wearers.

A work schedule with shifts including nights was also associated with the prevalence of CVS. Previous studies [31] suggest an association between rotating night shift work and several conditions, including cardiovascular disease, cancer risk, diabetes, hypertension, chronic fatigue, sleeping problems and early spontaneous pregnancy loss.

On the other hand, the prevalence of CVS was lower among those workers who considered the software application “easy”.

Nevertheless, these results should be interpreted with caution given the limitations of our study. The first is its cross-sectional design, which means that we cannot be sure that the cause precedes the effect. The second is that we did not include ophthalmic examinations that would inform us of the workers’ refractive state. Finally, we did not take into consideration the use of mobile devices at and outside work, which could be a confounding factor.

Despite these limitations, the use of a validated questionnaire to measure CVS is a particular strength of this work, and this is the first study that describes the relationship between CVS and its associated factors in health personnel by means of a genetic algorithm based on support vector machines. Both machine-learning techniques have been combined: SVM is used to perform the classification because of its well-known performance, while genetic algorithms are employed to optimize the SVM parameters. Finally, we would like to remark that the main contribution of the present research is a novel hybrid methodology able to determine the most important variables in a classification problem.