
1 Introduction

Thanks to the web, an increasing amount of user generated content is available that contains not only objective but also subjective information, for instance in the form of product reviews, restaurant ratings and the like. This subjectivity can be utilized for different purposes, for example in tourism applications. Travel guides, whether books or mobile applications, mainly contain factual information such as the address of a sight, opening hours or entrance fees. A guide that is a mobile application may also be adapted to the user's spatial situation. Emotional aspects, however, are not taken into account, although emotions and space are fundamentally connected. Locations have an atmosphere that evokes diverse and often strong emotions in people (Mody et al. 2009). Places can provide feelings of privacy, control and security; they can attract people through the opportunity for social events; they can be perceived as boring, attractive, calming, scary or dangerous; and the loss of a place can be an emotional experience (Korpela 2002). Emotional aspects might also be of interest for adaptive information filtering on rating platforms or for the location-based communication of emotions in social networks. The project Emotional Maps for Mobile Applications (EMMA) aims to establish a basis for integrating such emotional aspects into location-based services. Emotions can be captured either as aggregated emotions, that is, as the average of the emotions felt by many people at one place, or as individual emotions. The captured emotions are to be considered both spatially (How do other people feel at the place where the user is located, or nearby?) and with regard to the user's emotional situation (Does the user long for adventure and thrill, or for a place of calm and relaxation?). This chapter focuses on an approach for extracting aggregated emotions from user generated content on the photo platforms Flickr and Panoramio.

2 State-of-the-Art

2.1 Volunteered Geographic Information

Owing to mobility and interactivity, today's map users capture data themselves and thereby undergo a transition from pure data users to producers and users united in one person, i.e., to ProdUsers (Budhathoki et al. 2008). This trend is accompanied by the widespread success of social networks, online communities and rating platforms where users exchange personal opinions and appraisals. Such content is generally called user generated content. Goodchild (2007) calls the phenomenon that thousands of people are willing to invest time in sharing geographically referenced content on the web, without any prospect of financial reward, volunteered geographic information (VGI) or volunteered geography. VGI varies widely, from photos that are geolocated by tags or georeferencing (as on the portal Flickr) to entirely user generated world maps such as OpenStreetMap, which is based on the CrowdSourcing approach.

According to Budhathoki (2010), local knowledge is the most significant determinant of such contributions. When users realize that they possess knowledge about a region of interest to them that is mapped incompletely or incorrectly, they are encouraged to map it, because they are evidently able to do so in more detail and more up to date than nonlocal agencies or mapping organizations.

The data provided by those agencies and organizations are formalized and accurate and allow geospace to be described consistently. They are unsuitable, however, when a subjective view is required or when the way a place is perceived is of interest. VGI offers access to such notions of locations because its content reflects the biases of individual users (Purves et al. 2011). For instance, according to Rorissa (2010), Flickr tags are much richer in semantic content than index terms assigned by professionals.

2.2 Tagging

Tagging is the assignment of keywords to web content for the purpose of linking, categorising and describing that content. Tags are not related to each other hierarchically; instead, they serve to group elements. Tags are stored as metadata and help users find the elements they are searching for (Sjurts 2011).

Sen et al. (2006) define three general classes of tags based on the seven detailed classes of Golder and Huberman (2006). The general classes reflect the intended use of the respective tags and are summarized in Table 1. Sigurbjörnsson and van Zwol (2008), in contrast, assigned tags to six categories: location, artefacts or objects, people or groups, actions or events, time, and other.

Table 1 Tag classes

A more specific categorization, developed particularly for Flickr images, is that of Beaudoin (2007), who distinguishes 18 categories of tags (see Table 2). The five most frequently used categories are place name (28.21 %), compound (14.05 %), thing (11.37 %), person (8.81 %) and event (5.69 %).

Table 2 Category model for image tags found in Flickr (Beaudoin 2007)

Geotags are a particular kind of tag: geospatial metadata that belong to the dimension of spatial tags named in Table 2. Geotags add geographical identification data to media; they usually contain latitude and longitude coordinates but can also include altitude, accuracy data and place names. With the help of geotags, users can find location-based news, websites or images taken close to a certain location. A less common term for geotagging is geocoding, which more often refers to non-coordinate-based geospatial identifiers such as street addresses (Miller et al. 2009). Geotagged media such as travelogues or photos can be used to extract tourism-related knowledge, e.g., for analyzing travel patterns (Girardin et al. 2008), for detecting cultural differences between local regions (Zheng et al. 2011) or for automatically generating travel routes (Choudhury et al. 2010).
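The proximity search mentioned above can be illustrated with a minimal Python sketch that keeps only those geotagged photos lying within a given radius of a query point, using the haversine great-circle distance. The `Photo` record, the sample coordinates (roughly central Dresden and Leipzig) and the 500 m radius are hypothetical; actual platforms answer such queries through their own APIs and spatial indexes rather than a linear scan.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Photo:
    """Hypothetical geotagged photo record, not a platform API type."""
    photo_id: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def photos_near(photos, lat: float, lon: float, radius_m: float):
    """Return the photos whose geotag lies within radius_m of (lat, lon)."""
    return [p for p in photos if haversine_m(lat, lon, p.lat, p.lon) <= radius_m]


if __name__ == "__main__":
    photos = [
        Photo("a", 51.0519, 13.7415),  # central Dresden (illustrative)
        Photo("b", 51.0530, 13.7450),  # a few hundred metres away
        Photo("c", 51.3397, 12.3731),  # Leipzig, about 100 km away
    ]
    print([p.photo_id for p in photos_near(photos, 51.0519, 13.7415, 500)])
```

A linear scan is adequate for a sketch; for hundreds of thousands of photos a spatial index would normally be used instead.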

While most tags relate to the subject of a document, some tags (such as ‘cool’ or ‘fun’) indicate the user’s emotional reaction to the object the document represents; these affective tags consist of words describing an emotional state. The use of affective tags shows that users may regard tagging and classification as a holistic process (Kipp 2007).

2.3 Emotions

The often cited statement of Fehr and Russell (1984), “Everyone knows what an emotion is, until asked to give a definition”, aptly summarizes the outcome of a literature search for such a definition. The enormous number of definitions suggests that the nature of emotions has not been completely resolved, even though scientific emotion research is more than 100 years old. Most definitions agree that emotions are a subjective phenomenon, i.e., an inner excitement that is experienced more or less consciously as pleasant or unpleasant and is accompanied by neurophysiological processes (Kroeber-Riel et al. 2009). Another important aspect of emotions is a high ego-involvement of the individual (Jahr 2000). There is consensus that there are basic emotions, i.e., inherent emotions such as surprise, anger or joy, as well as emotion schemas, which describe emotions that differ across cultures and individuals and arise only in interaction with other individuals, such as shame, guilt or pride (Izard 2009).

Related terms are sentiment, affect and feeling, which are often used as synonyms but should rather be distinguished. Sentiments are enduring, less intense, diffuse emotions (Jahr 2000) that are not related to particular issues and can influence cognitive processes such as perception, information processing and memory. The term feeling denotes the experiential aspect of an emotion, that is to say the interpretation of the conscious and subjective perception of an emotion (Kroeber-Riel et al. 2009). In Anglo-American usage, affect is a hypernym for mental processes, emotions, sentiments and also attitudes. Less often, affect refers merely to the valence of experiences in the sense of pleasure and displeasure, or positive and negative (Mau 2009). In German, by contrast, affects are essential, transient and intense feelings of acceptance or rejection, i.e., emotions that are barely controlled cognitively and hardly differentiated in content (Trimmel 2003; Mau 2009).

Because all these terms are closely interrelated, a clear demarcation is not always feasible. The occurrence of a certain feeling can be accompanied by sentiments and sensations (Jahr 2000).

2.4 Structuring Emotions

A definition of emotions makes it possible to distinguish emotional from non-emotional states and to structure emotional states. Approaches for structuring emotions can be divided into dimensional and differential ones (Schimmack 1999). Dimensional approaches try to reduce affective states to a few dimensions, so that each emotion can be described as a combination of different intensities along those dimensions. Recent research has produced two kinds of such models, with two or three dimensions. In almost every case, one dimension describes the valence of emotions. There is, however, disagreement on the nature of arousal, which is regarded as either one- or two-dimensional. While Russell (1980) considers one arousal dimension in combination with the valence dimension sufficient for describing emotions, other researchers see their assumption of two arousal dimensions beside one valence dimension confirmed by empirical results (Mau 2009). The two dimensions valence and arousal proposed by Russell (1980) can be described as ranging from positive/pleasing to negative/displeasing and from arousing/intense to unarousing/numbing (see Fig. 1).

Fig. 1 The two emotional dimensions valence and arousal

In contrast to dimensional approaches, which try to ascribe emotions to a few global dimensions, differential approaches emphasize the distinguishable, subjectively experienced qualities of emotions (Izard 1977). Emotions are structured according to complex similarities, which can be defined by the spectrum of emotional qualities in fundamental emotions or by statistical methods based on subjective appraisal (Mau 2009).

EMMA works with the two-dimensional approach of Russell (1980) because of the advantages of dimensional emotional models (Mau 2009):

  • The reduction of emotional experiences to a few dimensions simplifies the measurement and quantification of emotions. Not every possible emotional quality needs to be captured but merely the estimation of experience in two or three dimensions.

  • In most cases the emotional states gathered this way are described by two or three metrically scaled variables. This simplifies the analysis.

  • Because emotional states are related to only a few dimensions, the interpretation of results is simplified.
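Why this reduction simplifies analysis can be seen in a minimal Python sketch: once an emotional state is a pair of metric values, averaging several observations or labelling their region of the valence-arousal plane is plain arithmetic. The sketch assumes the scales EMMA adopts later in this chapter (valence −3…+3, arousal 1…5); the `EmotionPoint` type, the quadrant labels and the arousal midpoint of 3 used to split ‘calm’ from ‘aroused’ are illustrative assumptions, not part of Russell's model.

```python
from dataclasses import dataclass
from statistics import mean

AROUSAL_MIDPOINT = 3.0  # assumed split between 'calm' and 'aroused' on a 1..5 scale


@dataclass(frozen=True)
class EmotionPoint:
    valence: float  # -3 (negative) .. +3 (positive)
    arousal: float  # 1 (calm) .. 5 (intense)


def quadrant(p: EmotionPoint) -> str:
    """Illustrative label for the region of the valence-arousal plane."""
    pleasant = p.valence >= 0
    aroused = p.arousal >= AROUSAL_MIDPOINT
    if pleasant and aroused:
        return "pleasant/aroused"    # e.g. excitement
    if pleasant:
        return "pleasant/calm"       # e.g. relaxation
    if aroused:
        return "unpleasant/aroused"  # e.g. fear
    return "unpleasant/calm"         # e.g. boredom


def centroid(points) -> EmotionPoint:
    """Average emotional state of several observations (two metric variables)."""
    points = list(points)
    return EmotionPoint(mean(p.valence for p in points),
                        mean(p.arousal for p in points))


if __name__ == "__main__":
    c = centroid([EmotionPoint(2.0, 4.0), EmotionPoint(1.0, 2.0)])
    print(c, quadrant(c))
```

The same averaging would be meaningless for a differential model with many categorical emotions, which is the practical advantage the list above describes.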

Dimensional approaches claim to describe the space of possible emotional qualities completely (Russell and Mehrabian 1977), but this claim is called into question by results indicating that emotions experienced as qualitatively very different can have similar values of valence and arousal (Mau 2009).

2.5 Acquisition of Emotions

The overall emotional reaction has the following components (Battacchi et al. 1996, quoted by Jahr 2000):

  • physiological reactions (cardiovascular, respiratory, electrodermal)

  • tonic posture reactions (tension and relaxation of body)

  • instrumental motoric reactions (e.g., running away because of fear)

  • expressive motoric reactions (gestures, countenance, paralinguistic events)

  • expressive linguistic reactions (syntactic and lexical selection, stylistic varieties)

  • subjective experience components (emotions as such, referring to the feeling that everybody experiences when having emotions)

These reactions can be exploited for gathering emotions with verbal or non-verbal procedures. Asking people about their emotions is a verbal procedure, but it has the disadvantage that verbal statements on emotions are often difficult to obtain, not detailed enough and not made at the same time as the perception (Egner and Agüeras-Netz 2008), and they might be manipulated or filtered by the participant (Kroeber-Riel et al. 2009). Non-verbal procedures can be divided into physiological measurements, explicit emotion measurements (Egner and Agüeras-Netz 2008) and behavioural observations such as the analysis of facial expressions (Westerink et al. 2008). For explicit emotion measurements, emotional states are rated on a scale with the help of a slider, but this procedure requires a certain amount of training by the participant. Physiological measurements relate a scalar emotion value to a measurable physiological value, e.g., electrodermal activity (Egner and Agüeras-Netz 2008). Their advantage is that users do not have to recognize and interpret their emotions themselves. However, interpreting such data is usually difficult, though this complex extraction of meaning from behavioural observations and physiological measurements is counterbalanced by the fact that they capture emotions unfiltered and in real time (Westerink et al. 2008).

2.6 Environmental Influence on Emotion and Behaviour

In a certain way, emotional states and places can be seen as external and internal versions of one another (Gallagher 2007). The principle is simple: a good/bad environment evokes a good/bad mood, triggered by good/bad memories, which leads to good/bad behaviour. Usually these environmental stimuli are not even perceived consciously. Memories in particular play an important role. A dramatic example is drug addiction: the body craves the drug particularly in the environment where it usually receives it, or in an environment with cues resembling that one, so one could also speak of an environmental addiction. Thus, a successful drug withdrawal should involve systematic exposure to drug-related environmental cues. Another phenomenon is that addicts occasionally take their usual dose in an unfamiliar environment and die as if they had taken an overdose (Siegel et al. 1982). In an experiment, photos of jungle warfare or war movies such as ‘Platoon’ were shown to combat veterans of the Vietnam War. Looking at these pictures recalled the high arousal they had experienced in that exotic milieu and stimulated their nervous systems to produce surges of opiates, which serve to soothe temporary stress (van der Kolk et al. 1996).

Csikszentmihalyi (1990) gathered about 25,000 experience reports over 25 years by prompting participants eight times a day via beeper to write down where they were, what they were doing and how they were feeling (Csikszentmihalyi 1990, quoted by Gallagher 2007). In this way, Csikszentmihalyi (1990) found that most of his subjects felt happiest in parks, cafés and other sociable and carefree places and, for some reason, also liked being in a car. He furthermore found that the two genders favour different places at home: men prefer the basement, whereas women consider the bathroom the best place, but both like the bedroom as well. A similar study was carried out by MacKerron and Mourato (2011) for the UK. With the help of a smartphone app named Mappiness, they asked their participants several times a day how happy they were, whom they were with, where they were and what they were doing. The app determined the precise location of the participants via GPS at the moment they answered these questions, and weather data were also used for analyzing the results. MacKerron and Mourato (2011) found that their participants were less happy at work than at home and that outdoors they were happiest while doing activities typical of natural environments, such as gardening or running. Coastal locations were associated with the most happiness. Participants were particularly happy outdoors in good weather, i.e., with sunshine, no rain or fog, high temperature and low wind (MacKerron and Mourato 2011).

It is commonly held that nature restores humans. Kaplan and Kaplan (1989) examined this claim by monitoring the responses of people in an ‘outdoor challenging program’ and found that nature indeed eases so-called mental fatigue, a condition of inner weariness. The Kaplans identified the most notable reasons for this recovery as a sense of self-discovery in nature, a desire to make nature part of one's future life and enthusiasm for the experience (Gallagher 2007).

Less natural is the process of urbanization, which according to social scientists will be the most important environmental influence in the twenty-first century. Urban places emit many stimuli that change quickly and continuously and are often very intense, whereas in nature most stimuli change gradually and periodically and there are few people (Gallagher 2007). A largely permanent stimulus in urban space is noise. Noise facilitates outbursts of aggression, and if the noise comes from an uncontrollable source, both physiological arousal and aggression increase (Bronzaft 2002; Veitch and Arkkelin 1995). Another characteristic of urban areas is high population density and crowding. Studies have indicated that urbanites are less willing to help strangers than people in rural regions, which does not mean that they are less helpful or friendly, because paying less attention to other people might be a strategy for coping with excessive stimulation. Furthermore, conditions of high social density reduce interpersonal attraction (i.e., liking another person) and increase social withdrawal. For instance, students are less sociable, talkative and group-oriented when they are housed in a socially dense dormitory (Veitch and Arkkelin 1995). Perhaps this phenomenon should be attributed specifically to forced social density, because a crowded party with friends does not usually evoke social withdrawal.

Extreme environments evoke extreme emotions. High mountain ranges are extreme environments, as are polar or very hot regions and even the artificial environments of space flight, flying or diving. The easiest and probably most useful method for reducing the aversive arousal and stress caused by such environments is humour; a lot of humour and wit occurred in transcripts of astronaut communication and in some Arctic groups. Another method is so-called paratelic dominance, which means regarding an aroused state not as fear but as excitement, leading to more confident coping behaviour (Suedfeld 1991). Positive affective states in extreme and unusual environments are courage, self-sacrifice and altruism, with well-known examples such as people giving scarce food to others or health professionals giving up rest or the chance to escape in order to help patients (Gallagher 2007). The most salient negative affective state in extreme environments is fear. Others are aggression (direct or indirect), which often occurs within isolated groups, and boredom after adaptation to the extreme conditions, which leads to hypersensitivity to the characteristics of coworkers and in turn to hostility (Suedfeld 1991).

Behaviour or culture may not be predetermined by climate but can be affected by the limitations it imposes (Gallagher 2007). Research shows that aggression (and thus violence) increases with temperature but decreases again in blazing heat (Bell and Fusco 1986; Veitch and Arkkelin 1995). Domestic violence is significantly higher during heat waves (Bell and Fusco 1986), suicide peaks in May and June (Gallagher 2007), and in 1967 temperatures rose in the 1–3 days before the onset of urban riots in the USA, with outdoor temperatures of at least 27 °C (Bell and Fusco 1986; Bell and Greene 1984). Of course, negative behaviour is not solely a function of ambient temperature but also depends on other variables such as situational factors and individual differences in heat tolerance (Bell and Fusco 1986), as well as clothing, acclimatization, humidity or air speed (Bell and Greene 1984). For instance, judges in the Near East judge impulsive crimes less strictly if they were committed while the dry and hot Khamsin was blowing. Strong wind gives exposed people a feeling of loss of control and increases their arousal (Veitch and Arkkelin 1995).

2.7 Expression of Emotions in Language

Besides physical reactions such as facial expressions, emotions are reflected in language. Such non-verbal cues can indicate which general emotion a person is experiencing, but they typically convey no precise information about the specific form of that emotion. Language, however, makes it possible to express the richness of emotions (Valitutti et al. 2004).

For the expression of emotions in language, it is useful to distinguish between the ‘production’ of an expression and the ‘occurring’ (involuntary) expression. A produced emotional expression is usually based on a communicative purpose. In an occurring expression, by contrast, an emotion modifies behaviour and is expressed without any particular purpose, or at least without a communicative one (Fiehler 1990).

There are two ways of expressing emotions, primary and secondary, i.e., different kinds or levels of socially normalized expression. They differ regarding the situations in which they occur, their frequency of use and their degree of conventionalization. Primary kinds of expression are common, frequently used and form the normal repertoire for expressing an emotion. If a primary expression is inappropriate in a certain situation because of the social rules that apply, a secondary expression can replace it. Secondary expressions occur especially in all forms of institutional communication, where the dictate of emotional neutrality applies, which hampers the primary expression of emotions or makes it impossible. A secondary expression evolves into a primary one if the primary expression is permanently inappropriate in certain situations because of social rules (e.g., politeness) (Fiehler 1990). For instance, instead of the disappointed utterance “He failed completely.”, the secondary expression “He made reasonable efforts.” might be used. Nevertheless, according to Dittmann (1972, quoted by Fiehler 1990), “most emotional messages are probably sent without controls, or without very little effort to control”.

In the linguistic field of language-and-emotion research, two spheres can be distinguished: pragmatic-communicative approaches, which empirically examine emotions as phenomena that accompany and influence speech, and semantic-lexical approaches, which examine and describe the potential of expressive means in one or several languages. The latter investigate the emotional vocabulary that a language makes available to a linguistic community in a mental lexicon for naming emotional categories (Schwarz-Friesel 2007). Hence each culture has its own vocabulary, syntactic forms, semantics and range of pragmatic effects. Although emotions are regarded as transcultural, their characteristics and the manner in which they occur differ so much that culture and language influence the categorization of emotions (Jahr 2000).

The emotional aspects of utterance meaning that are coded by formal-grammatical factors are not related to discrete emotions named by lexemes such as hate, envy, anger, disgust, fear, mirth, affection, appreciation, joy, love or curiosity, but primarily to the affective appraisal of objects or issues (Fries 1996).

Table 3 summarizes formal-grammatical means of expression for emotional meanings.

Table 3 Formal-grammatical means of expressions for emotions (Fiehler 1990; Fries 1996; Schwarz-Friesel 2007)

Phonetic means of expression for emotional meaning are disregarded in this paper for the reason that EMMA only concerns written language.

2.8 Sentiment and Affect Analysis

Research areas studying the relationship between language and emotional information and dealing with their computational processing are sentiment analysis and affect analysis. Those approaches focus on text which is an important medium for extracting emotions because the majority of computer user interfaces are based on text (Valitutti et al. 2004).

Sentiment analysis originates from text mining and computational linguistics and is concerned less with analyzing the content of a document than with the overall polarity of the opinions and sentiments in it, usually in terms of positive, negative and neutral sentiments (Zafarani et al. 2010). Sentiment analysis is also called opinion mining, sentiment extraction or sentiment detection. Affect analysis, in contrast, considers a significantly larger number of potential emotions, such as joy, sadness, hate, excitement, fear etc. (Abbasi et al. 2008).

Sentiment and affect analysis require linguistic resources containing emotional knowledge (Valitutti et al. 2004). Possible resources are the Affective Norms for English Words (ANEW; Bradley and Lang 2010), the Berlin Affective Word List Reloaded (BAWL-R; Võ et al. 2009), the List of Emotional Words (LEW; Francisco and Hervás 2007) or SentiWordNet (Esuli et al. 2010). ANEW is a list of 2,476 English words with values for the three dimensions valence, arousal and dominance, each ranging from 1 to 9 (Bradley and Lang 2010). BAWL-R, with 2,901 German words, covers the dimensions valence (−3…+3), arousal (1…5) and imageability (1…7) and also contains a set of psycholinguistic factors known to influence word perception. Plotting the values of all words of ANEW and BAWL-R in a diagram reveals a boomerang-shaped distribution (see Fig. 2), which has been reported for many languages (Võ et al. 2009).

Fig. 2 Distribution of ANEW and BAWL-R words in valence-arousal space

Some of the influences of emotions on language summarized in Table 3 can be found in these word lists (for examples, see Table 4). Expressive verbs are rated as much more arousing in ANEW than inexpressive, more formal verbs. Insults and nicknames are rated as similarly arousing but differ clearly in valence. Affective adjectives have more pronounced values than adjectives describing condition or shape. Not all linguistic phenomena of Table 3 can be demonstrated with ANEW or BAWL-R, because those lists contain only nouns, verbs and adjectives in their basic forms.

Table 4 Valence- and arousal-values for selected words from ANEW

‘Beautiful picture of an ugly place’ is a project that applies sentiment analysis to Flickr photo comments in order to extract emotions related to the photo quality and emotions related to the place where the photo was taken, both on a positive-negative scale (Kisilevich et al. 2010). Four places in Poland (Krakow, Warsaw, Wisla and Auschwitz) and one in Germany (Dachau) were chosen for testing. Using linguistic features, the authors built their own lexicon of adjectives with opinion strength. The two concentration camp memorials, Auschwitz and Dachau, have a more negative overall sentiment, in contrast to the popular tourist cities Warsaw and Krakow with highly positive sentiments. The neutral place Wisla lies between these extremes.

2.9 Existing Projects Combining Cartography and Emotions

Emotions were first gathered in relation to space in 2004 in the project Bio Mapping (Nold 2009), with a device combining GPS and a biometric sensor for measuring electrodermal activity. The project brought together artists, psychogeographers, designers, cultural scientists, futurologists and neuroscientists to investigate the political, social and cultural implications of visualizing body data and emotions.

The project EmoMap (Ortag and Huang 2011) addresses emotions in combination with user generated content. EmoMap is based on the assumption that every person perceives urban space in a different way. Some places are perceived as beautiful, other places as unsafe. This perception is subjective and influenced by emotions of the particular person. The idea of EmoMap is to collect emotional spatial data in a CrowdSourcing approach and to make these data publicly available in the form of an online database (Gartner and Ortag 2011). The resulting data can be used for different purposes such as urban development and planning. Abdalla and Weiser (2011) believe that future urban planning should be oriented towards the computer game Sim City which contains a so called aura-layer with emotional information. However, EmoMap focuses on the visualization of emotional data and their utility for improving pedestrian navigation systems, i.e., EmoMap aims at adding a subjective layer for providing more satisfying navigation services. The data will be collected in situ in the study area of Vienna with the help of a mobile application asking people for their feelings regarding pleasantness/unpleasantness, stress-relaxation/excitement-boredom and environmental qualities (traffic, noise, smell, attractiveness etc.) (Klettner et al. 2012).

Another project is WiMo (Mody et al. 2009), which also works with a mobile application, based on a prototypical two-dimensional emotion matrix for location-based emotion tagging. One dimension ranges from ‘comfortable’ to ‘uncomfortable’; the other ranges from ‘Like it’ to ‘Don’t like it’. The matrix is built upon the finding that these two variables are used commonly and intuitively but are not necessarily correlated.

The web applications Emography and Twittermood extract emotional information from georeferenced Twitter messages and visualize them at coarse resolution. Twittermood merely distinguishes above- and below-average moods in the USA, whereas Emography focuses on Ekman’s six basic emotions (happiness, sadness, fear, anger, disgust and surprise) all over the world.

The project EmBaGIS is developing an innovative urban planning tool for identifying and removing spatial barriers for handicapped people (Bergner et al. 2011). Emotionally significant barriers are identified with the ‘Empirical Three-Level-Analysis’. On the first level, velocity is measured, based on the hypothesis that increasing kinetic energy indicates the impact of a spatial barrier. The second level uses electrodermal activity (EDA) as an indicator of attention, and on the third level, changes in skin temperature are used as an indicator of stress.

3 Approach for an Emotional Analysis of Photo Metadata

The aim of EMMA is to develop a tourism application that considers emotions connected with tourist travel motivations and expectations. This application is to contain emotional maps that suggest places perceived as pleasant, adventurous, relaxing etc., taking age and gender into account. The suggested places can be areal as well as point-like (e.g., for ‘relaxing’: park vs. thermal bath). EMMA focuses on Dresden as its study region.

To gather the location-based emotional data that serve as base data, affect analysis is applied to the metadata of user generated pictures on the photo platforms Flickr and Panoramio. Based on the assumption that users tag and describe their photos differently when they liked a place than when they felt uncomfortable there, certain metadata of the pictures are analyzed with the help of ANEW and BAWL-R. The metadata of these photos were downloaded with the respective platform's API and stored in a database. The most important metadata are:

  • title

  • description (only Flickr)

  • tags

  • geographical latitude and longitude.
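Storing these fields can be sketched with Python's built-in `sqlite3` module. The `PhotoMetadata` record, the `photo` table and its columns are hypothetical, not EMMA's actual schema, and the download step via the Flickr and Panoramio APIs is omitted; tags are serialized as JSON for simplicity, and the optional `description` reflects that only Flickr provides one.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhotoMetadata:
    """Hypothetical record holding the metadata fields listed above."""
    photo_id: str
    source: str                 # "flickr" or "panoramio"
    title: str
    description: Optional[str]  # only Flickr provides a description
    lat: float                  # geographical latitude
    lon: float                  # geographical longitude
    tags: list[str] = field(default_factory=list)


def store(conn: sqlite3.Connection, photos) -> None:
    """Create the hypothetical 'photo' table if needed and insert the records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS photo (
               source TEXT, photo_id TEXT, title TEXT, description TEXT,
               lat REAL, lon REAL, tags TEXT,
               PRIMARY KEY (source, photo_id))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO photo VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(p.source, p.photo_id, p.title, p.description, p.lat, p.lon,
          json.dumps(p.tags)) for p in photos],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, [PhotoMetadata("42", "panoramio", "Elbe at dusk", None,
                               51.05, 13.74, ["elbe", "sunset"])])
    print(conn.execute("SELECT COUNT(*) FROM photo").fetchone()[0])
```

Keying on the pair (source, photo_id) avoids collisions, since IDs from the two platforms are assigned independently.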

Title, description and tags are analyzed the following way (see Fig. 3): As an initial step, all non-characters as well as hyperlinks and the like are removed from title, description and tags of each photo. The Language Detection LibraryFootnote 11 is used to detect whether the language of title and description of a photo is English or German. Afterwards those two items are analyzed if they contain special cases, i.e., words indicating an intensification (e.g., ‘very’), alleviation (e.g., ‘not really’) or negation (e.g., ‘not’) of the affected word. With the help of Java WordNet Library (JWNLFootnote 12) for English words and Tree Tagger for Java (tt4jFootnote 13) for German words, the affected words are lemmatized, i.e., nouns are reduced from plural to singular, verbs are reduced to infinitive and all declinations and comparisons of adjectives are eliminated. After that, the obtained basic form of the word is looked up in ANEW or BAWL-R respectively. If the word is contained in the list, the appropriate valence and arousal values are altered according to the particular influencing language phenomenon; if not, a synonym for adjectives and adverbs or the most frequently used hypernym of verbs and nouns for the word is retrieved with JWNL for English vocabulary or respectively the most frequently used hypernym with the GermaNet Java APIFootnote 14 for German terms. Subsequently the determined synonym/hypernym is looked up again. If it is not contained in the respective list, it is skipped. After detecting these special cases, all remaining words of title and description are treated the same way but without altering the valence and arousal values of ANEW and BAWL-R. 
Tags are processed in the same way, with three exceptions: they are not checked for the special cases mentioned above, because tags are keywords rather than sentences; no synonyms or hypernyms are derived; and no language detection is performed, because many users tag in several languages. Instead, each tag is looked up in both ANEW and BAWL-R. Word repetition, as a formal-grammatical device, is taken into account in that words appearing multiple times are not reduced to a single occurrence. As a last step, all valence and arousal values obtained for a photo are averaged and stored in a database together with its photo ID, its URL and its geographical coordinates.
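The tag lookup and the final per-photo averaging can be sketched like this. It is a sketch under assumptions: the two dictionaries are tiny hypothetical stand-ins for ANEW and BAWL-R (their entries reuse values quoted in the Results section, lower-cased as keys by assumption), `PhotoEmotion` is an illustrative record, not EMMA's database schema, and a word present in both lexicons contributes twice here because the chapter does not say how such cases are handled.

```python
from dataclasses import dataclass
from statistics import mean

# Tiny stand-ins for the two lexicons, on EMMA's scales.
ANEW = {"airport": (0.68, 3.49), "airplane": (1.07, 3.89), "sky": (1.78, 2.64)}
BAWL_R = {"heide": (1.0, 1.61)}


@dataclass
class PhotoEmotion:
    photo_id: str
    url: str
    lat: float
    lon: float
    valence: float
    arousal: float


def tag_scores(tags: list[str]) -> list[tuple[float, float]]:
    """Look every tag up in both lexicons: no language detection, no modifiers,
    no synonym/hypernym fallback. Repeated tags are kept and counted again."""
    scores = []
    for tag in tags:
        for lexicon in (ANEW, BAWL_R):
            if tag.lower() in lexicon:
                scores.append(lexicon[tag.lower()])
    return scores


def summarize(photo_id, url, lat, lon, text_scores, tags):
    """Average all valence/arousal values of one photo; None if nothing matched."""
    scores = list(text_scores) + tag_scores(tags)
    if not scores:
        return None  # the photo is not usable for the emotional analysis
    return PhotoEmotion(photo_id, url, lat, lon,
                        valence=mean(v for v, _ in scores),
                        arousal=mean(a for _, a in scores))
```

For a hypothetical photo tagged only with airport, airplane and sky, the averages come out at a valence of about 1.18 and an arousal of 3.34; tagging sky twice would give that word twice the weight.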

Fig. 3

Simplified flowchart of emotional analysis process

Fig. 3 depicts the emotional analysis schematically and in simplified form: the detection of the special cases described above is not shown separately. The analysis of these special words differs only in that the valence and arousal values read from ANEW or BAWL-R are altered; otherwise the procedure is the same.

EMMA works with a valence-arousal space ranging from 1 to 5 for arousal and from −3 to +3 for valence. These ranges are the same as those of BAWL-R. They were chosen because valence represents a positive-negative feeling, which is best expressed on a bipolar scale, whereas arousal represents an intensity, which cannot be negative.
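ANEW reports valence and arousal on 9-point scales (1 to 9), so its values have to be brought into the BAWL-R ranges before both lexicons can be combined. The chapter does not state how this conversion is done; the sketch below simply assumes a linear mapping in which the ANEW midpoint 5 becomes a neutral valence of 0. The function name `anew_to_emma` is hypothetical.

```python
def anew_to_emma(valence_1_9: float, arousal_1_9: float) -> tuple[float, float]:
    """Linearly map ANEW ratings (1..9) onto EMMA's ranges (assumed mapping)."""
    valence = (valence_1_9 - 5.0) * 3.0 / 4.0        # 1..9 -> -3..+3, 5 -> 0
    arousal = 1.0 + (arousal_1_9 - 1.0) * 4.0 / 8.0  # 1..9 -> 1..5
    return valence, arousal
```

A linear mapping preserves the ordering and relative spacing of ANEW ratings; a different choice (e.g. re-centring on the empirical ANEW mean) would shift all English values relative to the German ones.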

4 Results

The method described above has been applied to 45,172 photos of the Dresden region (32,609 Flickr photos and 12,563 Panoramio photos). Of these, 28,983 photos (64 %) were suitable for the emotional analysis because their metadata include words, or synonyms/hypernyms of words, that are contained in ANEW or BAWL-R. On average, 4.3 words per photo could be used for the analysis. Figure 4 shows the distribution of the analyzed photos in valence-arousal space. The average value is 0.69 for valence and 2.74 for arousal.

Fig. 4

Distribution of analyzed photos in valence-arousal-space

Figure 5 shows an excerpt of these results as a map with colour-coded valence and arousal values. The map contains the borders of Dresden’s districts. Arousal is coded with a brightness gradient, and valence is visualized according to the traffic-light principle (red-yellow-green). The structure of the gradients within the map reflects the density of photos: in dense areas the structure is highly detailed and precise, whereas in areas of low density it is rather coarse. A particularly dense area is the historic city centre of Dresden (Innere Altstadt), where most of the tourist attractions are located. Example photos of the phenomena outlined below are placed to the left of the map.
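One way to realize this colour coding is to map valence to a hue between red and green and arousal to brightness in HSV colour space. This is a hypothetical sketch: the chapter gives neither the exact colour scale nor the direction of the brightness gradient, so the hue range, the brightness floor of 0.4 and "brighter means more arousing" are all assumptions.

```python
import colorsys


def emotion_to_rgb(valence: float, arousal: float) -> tuple[int, int, int]:
    """Traffic-light hue for valence, brightness for arousal (assumed scheme)."""
    v = min(max(valence, -3.0), 3.0)
    # Hue: red (0 deg) at valence -3, yellow (60 deg) at 0, green (120 deg) at +3.
    hue = (v + 3.0) / 6.0 * (120.0 / 360.0)
    a = min(max(arousal, 1.0), 5.0)
    # Brightness from 0.4 (arousal 1) to 1.0 (arousal 5); direction assumed.
    brightness = 0.4 + 0.6 * (a - 1.0) / 4.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, brightness)
    return round(r * 255), round(g * 255), round(b * 255)
```

Working in HSV keeps the two dimensions separable: hue alone carries valence and value alone carries arousal, so a reader can judge each independently on the map.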

Fig. 5

Colour-coded representation of valence- and arousal-values in Dresden (Photos 1, 2, 4, 5, 6: Flickr; Photos 3, 7, 8: Panoramio; district borders: OpenStreetMap)

The analysis provides expected as well as unexpected results. The places marked 1, 2, 3, 4 and 8 in Fig. 5 are among the expected results. Places 1 and 2 have comparatively high arousal and positive valence owing to the words used in the respective photo titles, descriptions and tags. For Dresden Airport (1), these words are primarily airport (valence: 0.68, arousal: 3.49), airplane (valence: 1.07, arousal: 3.89) and sky (valence: 1.78, arousal: 2.64). These emotional values reflect the positive excitement associated with flying. Photos of Dresden Zoo (2) are tagged with words such as reptile (valence: −0.17, arousal: 3.09), crocodile (valence: 0, arousal: 3.52), lion (valence: 0.43, arousal: 3.6) or giraffe (valence: 1.23, arousal: 3.0), which reveal Dresden Zoo to be an exciting place. In terms of arousal, the Dresdner Heide (4), a large city forest of more than 6,000 ha in the north-eastern part of Dresden, is the opposite case. According to our analysis, the Dresdner Heide is sensed as unarousing in combination with a positive valence, i.e., it is a calming and peaceful place. This result stems from nature-related tags such as nature (valence: 1.99, arousal: 2.69) or sun (valence: 1.91, arousal: 3.02) and from the German word Heide (valence: 1.0, arousal: 1.61) itself, which is part of the forest’s name (English: heath). Places with mid-range arousal and high valence are the district Dresden-Hellerau (3) and parts of the historic city of Dresden (8). Hellerau was founded in 1909 as the first garden city in Germany. The tags of the respective photos reflect characteristic features of this district, for instance tree (valence: 0.99, arousal: 2.21) and house (valence: 1.69, arousal: 2.78). Emotional hotspots within the historic city are the Church of Our Lady, Dresden Castle and the square Theaterplatz, which is surrounded by the baroque Zwinger and the Semperoper opera house. The emotional values of these hotspots result from a large number of different tags and words that are too numerous to list here.

An unexpected and interesting, but logical, phenomenon is the detection of several ruins. The four most prominent ruins in Dresden (5) are the Sachsenbad (a former indoor swimming pool) in Dresden-Pieschen, an old tram station in Dresden-Mickten, the former ruin of the Waldschänke (a former restaurant) in Dresden-Hellerau and the old granary of the former army bakery in Dresden-Albertstadt. These ruins are tagged with decay (valence: −1.74, arousal: 2.72), ruins (valence: −0.12, arousal: 3.24) and similar German terms, which result in unpleasant but arousing hotspots.

Because the pure numerical values of the valence-arousal space are hardly descriptive, emotions have been assigned to classes that each cover a field of 0.5 × 0.5 in valence-arousal space, in order to make this space more approachable. The assignment is depicted in Fig. 6 and was undertaken with the help of the Ontology of Emotional Categories (Francisco et al. 2010), a taxonomy that ranges from basic emotions to the most specific emotional categories. As a first step, all basic emotions of this ontology (affection, anger, bravery, disgust, fear, happiness, neutral, sadness, surprise) were looked up in ANEW and allocated to the appropriate area in valence-arousal space (instead of neutral, the values of its synonym indifferent were used). After that, all subordinate emotions of the ontology that are contained in ANEW were assigned. Remaining empty fields were filled with words from the respective range of ANEW that denote an emotional state or are very closely connected with one. As a last step, still unfilled fields were assigned emotional German words from BAWL-R, translated into English. Some fields remain empty because neither ANEW nor BAWL-R covers the entire valence-arousal space.
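With fields of 0.5 × 0.5, EMMA's space (valence −3 to +3, arousal 1 to 5) is divided into 12 × 8 = 96 fields. The sketch below finds the field containing a photo's averaged values and looks up its emotion label. It is illustrative only: the label table is supplied by the caller because the actual assignment of Fig. 6 is not reproduced here, and placing points on the upper border into the last field is an assumption.

```python
import math

V_MIN, A_MIN, CELL = -3.0, 1.0, 0.5
N_V = int((3.0 - V_MIN) / CELL)  # 12 valence columns
N_A = int((5.0 - A_MIN) / CELL)  # 8 arousal rows


def cell_of(valence: float, arousal: float) -> tuple[int, int]:
    """Return the (column, row) of the 0.5 x 0.5 field containing the point.
    Points on the upper border are put into the last field (assumption)."""
    if not (-3.0 <= valence <= 3.0 and 1.0 <= arousal <= 5.0):
        raise ValueError("point outside the EMMA valence-arousal space")
    col = min(math.floor((valence - V_MIN) / CELL), N_V - 1)
    row = min(math.floor((arousal - A_MIN) / CELL), N_A - 1)
    return col, row


def emotion_of(valence, arousal, labels: dict[tuple[int, int], str]):
    """Look up the emotion assigned to the field; None for unfilled fields."""
    return labels.get(cell_of(valence, arousal))
```

For instance, the overall average of 0.69 valence and 2.74 arousal falls into field (7, 3), i.e. valence 0.5 to 1.0 and arousal 2.5 to 3.0; fields left empty in the assignment simply return None.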

Fig. 6

Assignment of emotions to certain valence-arousal-ranges

5 Discussion and Evaluation

Another hotspot in Fig. 5 within the historic city of Dresden that has not yet been mentioned is the square Altmarkt, where the Striezelmarkt, Germany’s oldest Christmas market, takes place every year. This case reveals that temporal aspects need to be considered in the further processing of the emotional data as well. Another example of this phenomenon is the places marked (6) in Fig. 5. On an ordinary visit, nothing arousing is to be found at these places, yet there is a strong reason for these emotional hotspots. Such cases bring to mind Clark (2011), an American journalist who focuses on location-aware technologies and their power as storytelling tools and who states: “Every place has a story, and every story has a place”. Clark (2011) understands landscape as a structure formed over time by layers of stories, like geological strata. The stories detected by EMMA’s emotional analysis took place on the 13th and 19th of February in 2010 and 2011. On the 13th of February 1945, the bombing of Dresden in the Second World War began. Every year, a commemoration of the events of 13 February 1945 takes place, but over the last 15 years increasing numbers of right-wing extremists have used this event for their own propaganda purposes. This gave rise to counter-demonstrations, and both the demonstrations and the counter-demonstrations have since also been held on the 19th of February. In recent years there have been clashes and riots on both sides. These annual events are documented by tags such as Polizei (English: police, valence: −0.2, arousal: 3.17) or Nazi (valence: −2.9, arousal: 4.67), which produce the negative and arousing emotional appraisal in the map of Fig. 5.

Figure 4 reveals some outliers, which are caused by misinterpreted words, incorrect language detection or missing context. For instance, the photo with the lowest valence value shows Pillnitz Castle, a baroque castle at the eastern end of Dresden, which is certainly not an unpleasant place. The description of this photo is in Dutch but was interpreted as German, so the Dutch word tot (English: to, until) was looked up in BAWL-R. In German, tot means dead, which BAWL-R naturally rates with a low valence value. Another example of misinterpretation is place 7 in Fig. 5, the so-called Blaues Wunder (English: Blue Wonder), a blue-painted cantilever truss bridge. In the map of Fig. 5, this area is marked with negative valence and high arousal, although the bridge and its immediate surroundings are quite scenic and a popular photo motif. This is caused by an inappropriate base form detected by Tree Tagger: Wunder is the German term for wonder, but Tree Tagger interprets it as wunder, i.e., as an inflected form of the German adjective wund (English: sore). Since this word does not exist in BAWL-R, its hypernym verletzt (English: injured) is applied, which has a valence of −1.8 and an arousal of 3.94. Because it is caused by a proper name that recurs in the metadata of many photos, this misinterpretation is not an isolated case.

From all analyzed photos, 50 were selected at random to estimate the misinterpretation rate and to find and eliminate its causes. 20 of these 50 photos contained misinterpreted words. In this sample, each photo is annotated with 15.94 words on average; 20 % of these words, or their respective synonyms/hypernyms, were found in ANEW or BAWL-R, and 22 % of the words found were misinterpreted (see Fig. 7).
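The sample figures can be turned into approximate absolute counts with a few lines of arithmetic. Because the reported shares are rounded, the derived counts are approximations, not values stated in the study; the variable names are illustrative.

```python
# Figures reported for the 50-photo sample; the shares are rounded, so the
# derived counts are approximations only.
photos = 50
photos_with_errors = 20
words_per_photo = 15.94
found_share = 0.20           # words (or synonyms/hypernyms) found in a lexicon
misinterpreted_share = 0.22  # share of the words found

words = photos * words_per_photo               # about 797 words in total
found = words * found_share                    # about 159 words found
misinterpreted = found * misinterpreted_share  # about 35 words misinterpreted
overall_rate = found_share * misinterpreted_share  # share of all words

print(round(words), round(found), round(misinterpreted))  # 797 159 35
print(f"{overall_rate:.1%} of all words, "
      f"{photos_with_errors / photos:.0%} of photos")     # 4.4% of all words, 40% of photos
```

The comparison makes the two rates easier to read side by side: roughly 4.4 % of all annotated words are misinterpreted, yet because each photo carries many words, 40 % of the sampled photos contain at least one such word.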

Fig. 7

Interpretation and misinterpretation rate of 50 sample photos

Four causes of misinterpretation have been identified. The first is the proper-name problem already mentioned: the lemmatization of a word or the derived synonym/hypernym is not wrong in principle, but contextual information about the occurrence of a proper name is missing, as in the case of the bridge Blaues Wunder. The second cause is wrong lemmatization; e.g., in one case the noun building was interpreted as a verb and reduced to the infinitive build. A third cause is the derivation of improper synonyms/hypernyms. For instance, the verb abstain was used as a synonym for fast, which is not wrong as such, but the synonym quick was needed. Incorrect language detection is a fourth cause; e.g., the German term Gemäldegalerie (English: art gallery) was processed with the English lemmatization and reduced to gem.

These causes of misinterpretation can be eliminated by including contextual information in the algorithm of the emotional analysis so that proper names are taken into account. A further possibility is part-of-speech (POS) tagging, which determines the part of speech a word belongs to; cases like the one with ‘building’ mentioned above can thus be avoided. Improving language detection is difficult, because many users tag in multiple languages and applying language detection to single words greatly reduces its validity.
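The two remedies can be illustrated with a small sketch: known proper names are removed before word-by-word analysis, and lemmatization is keyed on the part of speech rather than on the word alone. Both `PROPER_NAMES` and `LEMMAS` are hypothetical toy tables (a real system would use a gazetteer and a POS tagger plus lemmatizer), and the POS tag is assumed to be supplied by such a tagger.

```python
import re

# Hypothetical gazetteer of multi-word proper names that must not be
# analysed word by word (spelling variants listed explicitly).
PROPER_NAMES = ["Blaues Wunder", "Blaue Wunder"]

# Hypothetical lemma table keyed by (word, part-of-speech tag).
LEMMAS = {("building", "NOUN"): "building", ("building", "VERB"): "build"}


def strip_proper_names(text: str) -> str:
    """Remove known proper names so their parts are not looked up separately."""
    for name in PROPER_NAMES:
        text = re.sub(re.escape(name), " ", text, flags=re.IGNORECASE)
    return " ".join(text.split())


def lemmatize(word: str, pos: str) -> str:
    """POS-aware lemmatization; unknown (word, POS) pairs are left unchanged."""
    return LEMMAS.get((word.lower(), pos), word.lower())
```

With the name removed, "Das Blaue Wunder bei Nacht" no longer exposes Wunder to the lemmatizer, and a building tagged as a noun keeps its base form instead of being reduced to build.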

Ascribing the emotional values of the example photos of Fig. 5 to the emotions assigned in Fig. 6 yields coherent results, which do not fit exactly in every case but are still satisfying. The photos of Dresden Airport (1) and Dresden Zoo (2) fall into the field of the emotion zealous, Dresden-Hellerau (3) turns out to be a calm place, and photos of the Dresdner Heide (4) belong to the range of calm as well, whereas most of the photos of Dresden’s historic city fall into the range of relief. The ruins (5) turn out to evoke fright, and photos of the demonstrations of the 13th and 19th of February (6) as well as photos of the bridge Blaues Wunder (7) belong to madness. Figure 4 shows that the majority of analyzed photos are concentrated in a certain region of the valence-arousal space. It is therefore advisable to refine the assignment of emotions with a higher resolution for this region, which would make the allocation of emotions to photographed places more appropriate.

6 Conclusion and Future Work

For the study area Dresden, emotional information hidden in the word choice of photo metadata could be extracted with the help of affect analysis. Not merely one overall ‘averaged’ feeling could be detected, but also emotional hotspots with high arousal and low valence, and vice versa. Consequently, the metadata of the photo platforms Flickr and Panoramio are suitable for an emotional analysis insofar as they include words, or synonyms/hypernyms of words, that can be found in ANEW or BAWL-R. Only 64 % of all photos of Dresden contained such words, because many photos are titled merely with the file name assigned by the camera, or an entire series of pictures is titled ‘Dresden 2010’, for instance. According to Beaudoin (2007), 41 % of Flickr photos are annotated with photographic tags and 85.76 % with place names. Such words do not provide any emotional information. For this reason, georeferenced tweets from Twitter will be included in the emotional analysis as well, because they consist of free-form written language.

The results of the emotional analysis require further processing, in particular simplification, because they are far too fine-grained. Because emotions can only be validated subjectively, it is difficult to determine whether this fine granularity in regions of high photo density is caused by different individual ways of sensing a place. At the same time, it is questionable whether places are sensed entirely differently and whether the metadata of photos are significantly influenced by incisive personal experiences (for instance, a car accident experienced on the bridge Blaues Wunder).

For further processing, the spatial density of photos needs to be considered as well, because it reflects why people take photos: they do so when something visually attracts them. If more photos are taken at a certain location, this place is therefore more attractive (Zheng et al. 2011) and is sensed as more pleasant.

Once tweets have been included in the emotional analysis, the results will be examined for gender differences and seasonal differences. Furthermore, it is conceivable to apply the emotional analysis to another region that differs from Dresden, for instance the Saxon Switzerland National Park southeast of Dresden, to examine whether significant emotional differences also appear in natural landscapes or whether they are a phenomenon of urban space.

The emotions retrieved by the presented emotional analysis represent aggregated emotions. To obtain a dataset of individual location-based emotional data, the physiological reactions of participants will be measured while they view photos of Dresden, following the experiments of Lang et al. (1993) and van der Kolk (1996).