Introduction

Research has recognised the benefits of opportunistically harvesting information from micro-blogging and social media services such as Twitter. In the time critical phases of a crisis, contributions can be made to situational awareness (MacEachren et al. 2011), which has been described as ‘…an individually as well as socially cognitive state of understanding “the big picture”…’ (Vieweg et al. 2010, p. 1079). Event types which have served as case studies include forest fires (De Longueville et al. 2009; Vieweg et al. 2010), floods (Poser and Dransch 2010), earthquakes (Stollberg and de Groeve 2012; Mendoza et al. 2010), and political protests from a geographically localised (Starbird et al. 2012) to country-wide scale (Cheong et al. 2012; Kumar et al. 2013; Starbird and Palen 2012). However, identifying relevant information pertaining to a particular event and assessing the credibility of users and information derived is challenging. In addition, researchers have identified false rumours spread via social media during crisis events (Mendoza et al. 2010; Goodchild and Glennon 2010), with false reports that the New York Stock Exchange was flooded during Hurricane Sandy in 2012 misleading even mainstream news networks such as CNN (Mirkinson 2012; Wortham 2012).

MacEachren et al. (2011) found maps that identify both the location of the event and the micro-blogger to be the most useful for crisis management personnel. These maps would enable the visual assessment of how close a micro-blogger is to the event they are contributing information about. Researchers with a focus on social media have identified that ‘People who are on the ground are uniquely positioned to share information that may not yet be available elsewhere in the information space’ (Starbird et al. 2012, p. 2). Diakopoulos et al. (2012) describe the journalistic focus and importance of eyewitnesses for breaking news, where being in close proximity to an event and being able to make a report is preferred to expert knowledge on a topic. Providing eyewitness accounts to a breaking news story lends credibility to the story and is one of the ‘… quintessential acts of journalism’ (Diakopoulos et al. 2012, p. 2452). Witnesses are also a fundamental component of the criminal justice system, and criteria to assess witness credibility include ensuring the witness had the opportunity to view the crime (Wells and Olson 2003).

For Twitter, researchers have identified numerous sources which can be analysed to identify the geographic location of micro-bloggers and their topics (e.g. Ostermann and Spinsanti 2010). These include, but are not limited to, the optional micro-blogger’s account metadata location (metadata location), the optional mobile device determined GPS location, and spatial descriptions within the content of the micro-blogs (content locations). Despite there being numerous sources, it is a complex problem to determine the location of micro-bloggers and events, especially to a granularity finer than a city. Hecht et al. (2011) found 34 % of metadata locations contained non-geographic or blank entries, and for valid geographic entries, less than 10 % were to a granularity of city. Micro-blog content relating to crisis events has been identified to contain location references in up to 40 % of cases dependent on the type of event (Vieweg et al. 2010), but deriving unambiguous locations from such content continues to be challenging. Recent reports indicate just 3 % of tweets have linked GPS coordinates (Leetaru et al. 2013). This complementary research contributes to identifying locations of the micro-blogger and/or the event they are discussing, but does not focus further on the relationship between the user and the event.

This research explores whether witness accounts of an event can be differentiated from other social media micro-blogs. The definition of a witness account will be further explored later, but for now a witness can be understood as a person who has directly observed the event and posted a micro-blog about their observation, called the witness account (WA). It is envisaged that a range of characteristics might differentiate likely WAs from those which are not. These characteristics include descriptions of sensing such as ‘I see’, ‘hear’, or ‘smell’, linked content such as photos, and explicit acknowledgement of being impacted by the event. The hypothesis for this work is that likely WAs of an event can be differentiated from unlikely WAs, based on a categorisation of characteristics of micro-blogs.

To be a witness, the micro-blogger must be in the region affected by the event, which will depend on the type of event and effect that is reported. If the relationship between the event and the micro-blogger can be inferred, it reduces the reliance on social media location sources to explicitly locate the micro-blogger for credibility assessment, and creates additional spatial intelligence. Previous research has identified that places may not be referred to by their name but as general place categories as they become assumed knowledge within the social network, and further used as reference points (Vieweg et al. 2010). This research also explores content that contains informal place categories related to personal action spaces (Goodchild 2009) such as ‘home’ and ‘work’ rather than place names, which may additionally identify these micro-bloggers as having local knowledge. Fewer place names in micro-blogs from people in close proximity to an event may have implications for research seeking to identify actionable content related to events. Specific assumptions to be tested include that WAs provide influence regions between micro-bloggers and events, and that they contain fewer named place references than non-witness accounts.

Additional goals of this research are motivated by questions raised from previous social media research. The use of metadata locations for research has limitations (Hecht et al. 2011), leading some researchers to reject them as a location source (e.g. Starbird et al. 2012), and others to accept them (e.g. Cheong et al. 2012). This research will explore metadata locations to determine whether they are appropriate for corroborating likely WAs.

The approach adopted is an in-depth manual analysis of a single case study. The case study event was a bushfire on the northern urban boundary of metropolitan Melbourne. This was a significant local event causing widespread disruption in the surrounding suburbs. Data collected included 1711 micro-blogs using the keyword ‘bushfire’, micro-blogger account metadata, and publicly available linked content. Using manual methods, 461 on-topic, individual and original (OIO) micro-blogs were identified, and of these 198 were identified as potential WAs which were further coded for identified categories. The results of this research include models of witnesses and related concepts and indicate that likely WAs can be differentiated and categorised by the reported effects. Influence regions between witnesses and the event, based on reported effects, are visualised and further characteristics related to place name usage by witnesses are also established.

Section “Background” presents background research, Sect. “Theory” develops the concepts, and Sect. “Methods” outlines the method for collecting and preparing the case study corpus. Results are presented in Sect. “Results” with specific interpretations, Sect. “Summary” provides a high-level discussion and evaluation, and conclusions and potential future research are given in Sect. “Conclusion and future work”.

Background

When an emergency is occurring, authorities and journalists alike will seek out witnesses who can provide accounts of the unfolding events. Journalists will seek witnesses because they can provide differentiation and credibility to a news story (Diakopoulos et al. 2012), and witnesses themselves will be assessed by authorities to determine suitability for providing evidence. The seminal paper of Fogg and Tseng (1999) defines the credibility of information as believability. Their synthesis of the academic research concludes that credibility is a perceived value with a number of dimensions including trustworthiness and expertise. Research has tested the perceived credibility of topics evolving in Twitter (Castillo et al. 2011), and which Twitter features people use to perceive the relative credibility of their fellow micro-bloggers (Yang et al. 2013; Ringel Morris et al. 2012). These examples of credibility and Twitter research do not consider the location information that can be generated from metadata and content. In comparison, Thomson et al. (2012) conclude that micro-bloggers who do not publish a metadata location can be correlated with sharing less credible information, but find this correlation is mitigated when micro-bloggers are from the same country as the event.

With in excess of 400 million micro-blogs daily (Twitter 2013), there appears to be universal agreement that the micro-blogging service Twitter is a noisy information stream. In this stream, research focuses on identifying the most relevant information, which differs from, but is related to, identifying the most credible information. In research with this focus, location information plays a larger role. Relevance theory states that an input is relevant to a user when its processing in context produces a positive cognitive effect (Sperber and Wilson 2004). Kumar et al. (2013) seek to identify the most relevant information about events, and to this end distinguish between ‘… local users who witness the unfolding event and remote users who are connected via social media’ (Kumar et al. 2013, p. 139). They describe automated methodologies employed to assign topic affinity and geo-relevancy scores. The spatial analysis used to derive the geo-relevancy scores does not attempt to determine if micro-bloggers are witnesses to any of the events. Rather, GPS or metadata locations are used to determine whether the micro-blogger is within the country where the events are occurring. So although micro-bloggers with a higher-than-average geo-relevancy and topic affinity score are labelled as ‘eyewitness users’, they are not necessarily witnesses according to the definitions of this research.

Kumar et al. (2013) also refer to a concept of micro-bloggers being ‘on-the-ground’ (OTG). Other research classifies micro-bloggers into two categories: ‘… those who were on the ground and tweeting information from the ground, and … those who were not on the ground or were not tweeting information about the protests from the ground’ (Starbird et al. 2012, p. 6). Manual content analysis was utilised to achieve this categorisation for the purposes of analysing crowd recommender behaviour, with earlier work looking specifically at retweet behaviour of local micro-bloggers in comparison to other micro-bloggers (Starbird and Palen 2010). For this body of research metadata locations were not used, as it was identified these may be purposely falsified by people tweeting in political protests. They also identified micro-bloggers who were ‘… tweeting real-time information from the ground without being physically present at the event’ (Starbird et al. 2012, p. 6) by utilising a live broadcast of the event. This highlights the challenge of distinguishing micro-bloggers whose information is gained from direct observation from those observing news broadcasts and other sources.

In contrast to the researchers already described, who primarily utilise spatial information to identify those micro-bloggers who are OTG, Diakopoulos et al. (2012) develop a dictionary-based technique to classify potential witnesses based on 741 words from numerous Linguistic Inquiry and Word Count (LIWC; footnote 1) categories including percept, see, hear, and feel. They compare their classifier with human analysis using three subjects sourced from Amazon Mechanical Turk, and conclude that their method is suitable for their purpose of prototyping a tool for journalists. However, they acknowledge their approach to be ‘… only a first step toward the challenge of classifying eyewitnesses …’ (Diakopoulos et al. 2012, p. 2455). Today, the main way for the news media to source early witness reports and images is from everyday citizens (Wigley and Fontenot 2010). Witness reports in social media have been identified to be two and a half times more likely to contain linked content (Harlow 2011). But journalists need to be increasingly cognisant of the definition of credibility as believability when considering these sources, with investigation of the most retweeted images related to Hurricane Sandy identifying many fakes (Burgess et al. 2012).

Vieweg et al. (2010) present an in-depth analysis of what location information was contained in social media content for two emergency events, a flood and a grassfire with a view towards extracting useful information to contribute to situation awareness (SA). They found 40 % of micro-blogs related to the fires contained a clearly identifiable address or place, while 18 % did so related to the flood. Though not explored, this research identifies the presence of what is named ‘markedness’ and ‘relative references to location’. Markedness of places is described as places no longer referred to by their name but by their general category as an emergency event unfolds, which impacts the ability to extract relevant micro-blogs. Relative references to locations are described as ambiguous reference points used in spatial descriptions. The authors identify for the flood and fire case studies that this occurs for 6 and 8 % of micro-blogs respectively.

Researchers have used Natural Language Processing (NLP) and Machine Learning (ML) techniques in efforts to automate the classification of micro-blogs that contain actionable information contributing to SA (Verma et al. 2011). The authors found that training the classifiers on spectrums of ‘…subjectivity, personal or impersonal style, and linguistic register (formal or informal style)’ enhanced the results (Verma et al. 2011, p. 386). This is based on the authors positing that micro-blogs ‘…that contribute to situational awareness are likely to be written in a style that is objective, impersonal, and formal…’ (Verma et al. 2011, p. 386). NLP has also been employed to extract place descriptions from micro-blogs (e.g. Gelernter and Balaji 2013). Numerous approaches exist for identifying, extracting and disambiguating place names found within the content of micro-blogs, alone or in combination with the metadata locations (e.g. Sankaranarayanan et al. 2009; Cheng et al. 2010). Identifying words common to geographic regions is another approach (e.g. Eisenstein et al. 2010). Additionally, approaches for identifying and disambiguating place descriptions in other content forms such as streaming news are relevant (e.g. Lieberman and Samet 2012). This significant body of complementary research generally seeks to determine the location of the micro-blogger or event (or may not distinguish between the two), rather than further distinguish the relationship between the micro-blogger and the event. These methodologies may be used to extract the on-topic micro-blogs; being on-topic, individual and from an original source are the pre-requisites for identifying witnesses. An exception is research which implements regression analysis to establish demographic indicators between micro-bloggers who contribute actionable content that can be geocoded (a dependency for the analysis), and a wildfire event (Kent and Capello 2013).

Although it has been clarified that geospatial information opportunistically harvested from social media differs conceptually from Volunteered Geographic Information (VGI) (Winter and Richter 2011; Harvey 2013), research into the quality of VGI can prove useful. In comparison to traditional authoritative geospatial datasets, it has been argued that the quality of VGI be considered in terms of credibility rather than accuracy (Flanagin and Metzger 2008). Approaches typically focus on deriving characteristics of the contributor, including their familiarity with the environment, experience, and value of past contributions (e.g. Keßler et al. 2009; Goodchild 2007). In VGI research, many of the computational approaches leverage the availability of the contributors’ spatial metadata, for example home locations (Bishr and Mantelas 2008) or more sophisticated mobility models (Mashhadi and Capra 2011), which are typically not available for micro-bloggers using services such as Twitter (Leetaru et al. 2013). However, evidence of a micro-blogger’s familiarity with the environment might be present in social media such as Twitter. Goodchild (2009) defines one’s action space as the locations in which daily life is played out, including places of home, work, school and leisure. The major places within an action space have been defined as where people spend long periods of time ‘…and people usually have an explicit name for them (home, work place, etc.)’ (Schmid 2007, p. 656). Using social media data, the question of whether place name references reflect a micro-blogger’s action space centred on their home location has been explored (Xu et al. 2013). This collective research underpins the idea that witnesses to many event types are likely to be within their action space (Gonzalez et al. 2008), and additionally, in the context of a social network of friends, may refer to informal places such as home, work, school rather than specific place names or addresses. Using these informal place names and categories suggests familiarity with the environment.

There is a significant history of research from numerous disciplines defining an event, including models relevant to geographic phenomena (e.g. Galton 2000). An event can be defined as ‘something that happens at a given place or time’ (Miller 1995), or as an occurrence (Worboys 2005). However, social media specific research into events does not always differentiate real world events from virtual space events. Boettcher and Lee (2012) disambiguate these concepts beginning with the definition of an event ‘…as a significant occurrence or happening that is restricted in time’ (Boettcher and Lee 2012, p. 358), and then differentiating virtual space events defined as those ‘…that are only relevant in the Twitter user community’ (Boettcher and Lee 2012, p. 358) from real world events. Real world events are described as being further categorised into those which are global events, not restricted to a specific location, in contrast to local events. In geographic information science, a more specific scale of space would be defined, for example the definition of a perceptual scale, where vista space is the space which can be ‘…apprehended from a single place without appreciable locomotion’ (Montello 1993, p. 315), which is complementary to the direct observation of a witness.

Theory

This section presents the development of concepts to identify (likely) witnesses.

Definitions

Critical to this research is the definition of witness. Common dictionary meanings of the word include a ‘person who sees an event happening, especially a crime or an accident’ (footnote 2). From a journalism perspective, witnesses may be defined as ‘people who see, hear, or know by personal experience and perception’ (Diakopoulos et al. 2012, p. 2455). ‘WordNet’ defines witness to be ‘someone who sees an event and reports what happens’ (Miller 1995), which suggests expansion from being able to perceive an event to being able to provide a report. The use of the word ‘seeing’ could be interpreted literally; however, this research will not restrict observation to the visual, but expand the meaning to include direct observation of the event by any of a person’s senses. This expansion is supported by the criminal justice system, with ‘earwitnesses’ providing accounts of conversations overheard, and by Australians living in bushfire-prone areas reporting the smell of smoke, as it may be detectable long before a visual verification. In this research events are generally recognised as occurrences (Worboys 2005), and as local events in social media (Boettcher and Lee 2012), with scale defined from the vista space to the geographic space (Montello 1993). The consequences of an event that can be witnessed by people are labelled ‘effects’. It is recognised that these effects may additionally be defined as events, that is a ‘phenomenon that follows and is caused by some previous phenomenon’ (Miller 1995). Witnesses are defined as people who directly observe the event or its effects and provide a report of these observations, their WA.

Table 1 presents the terminology adopted by this research for micro-bloggers and their micro-blogs, distinguished by the primary content categories of topic, location and time, for which Table 2 provides more details. A pre-requisite for establishing witnesses is that the content is OIO. In addition to a direct observation, the content categories of direct impact and relay have been coined. A micro-blog is classified as an impact account (IA) when the content does not include a direct observation, but indicates the user is personally impacted by the event or is undertaking an action because they are impacted by the event. For example, micro-blogs can provide a status of how close a bushfire is or indicate the activation of evacuation plans. Users who send IAs are defined as potential witnesses. Potentially, other micro-blogs in their social media timeline might be WAs. The micro-blog is classified as relayed when the account is about a direct observation or impact of a person who is not the micro-blogger. In the context of a social network, the micro-blogger might be relaying the observations of friends or family. An alternative to micro-blogging ‘I can see smoke’ is taking a picture with a phone and sharing it with a social network. This micro-blogger fits the definition of a witness providing a WA. As with the textual content of an account, the linked content needs to be classified as being original and depicting an effect, to be categorised as a WA.

Table 1 Witness and related categories
Table 2 Explanations of primary content categories

This research seeks to differentiate more probable WAs from those which are less probable, and characterise those from which the status of a micro-blogger as a witness or potential witness can only be inferred. It assumes that accounts have not been maliciously fabricated. It should also be noted that witnesses can only be identified from their micro-blogs. Micro-bloggers might therefore be witnessing the event and even micro-blogging on the topic, but will only be identified if their content contains witness characteristics that can be observed through human analysis.

Influence regions

As outlined previously, researchers have used a variety of mechanisms to identify micro-bloggers who are OTG, including manual analysis of a micro-blogger’s social media feed (e.g. Starbird et al. 2012), and automated analysis of metadata locations (e.g. Kumar et al. 2013). This research complements the existing work by inferring that a micro-blogger is OTG if it is determined they have provided a WA or IA. This research seeks to further refine OTG by the identification of an influence region, which is the region in which the micro-blogger can be inferred to be located, given the effect or impact they have reported.

Various events have different characteristics of their influence regions. For example, in a bushfire event a micro-blogger who reports seeing flames will need to be in a region in relatively close proximity to the event compared to a micro-blogger who reports seeing smoke. The boundaries of an influence region are considered to be vague in two ways. Spatially, in that the boundaries are indeterminate, and in actuality, in that it is uncertain which effects and impacts are considered to be caused by the event and which are not. This research assumes the micro-blogger links their account to an event by keywords. An assumption is that the effect or impact that the micro-blogger is reporting reflects their proximity. For example, a micro-blogger seeing flames reports this, rather than seeing smoke because it is a more pertinent observation to share. If the micro-blogger seeing flames and smoke reports smoke, the micro-blogger will be placed in the larger influence region.
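The inference described above can be illustrated with a minimal sketch. It assumes each reported effect is associated with a single outer radius around the event, which is a simplification, since the text treats the boundaries as vague. The radii and the names `INFLUENCE_RADIUS_KM` and `influence_region` are hypothetical and not taken from the case study; choosing the smallest region when several effects are reported is likewise an assumption of the sketch.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical outer radii (km) per reported effect; illustrative values only,
# not measurements from the case study.
INFLUENCE_RADIUS_KM = {
    "flames": 1.0,   # assumed: flames are only visible close to the fire
    "smoke": 30.0,   # assumed: smoke is observable over a much larger region
}


def influence_region(reported_effects: Iterable[str]) -> Optional[Tuple[str, float]]:
    """Return the (effect, radius) pair of the smallest region implied by the report.

    Only effects the micro-blogger actually reports are considered, so a person
    who sees flames but reports only smoke is placed in the larger smoke region.
    """
    known = [
        (effect, INFLUENCE_RADIUS_KM[effect])
        for effect in reported_effects
        if effect in INFLUENCE_RADIUS_KM
    ]
    if not known:
        return None  # no recognised effect, so no region can be inferred
    return min(known, key=lambda pair: pair[1])
```

For example, `influence_region(["smoke"])` places the micro-blogger in the larger smoke region, while a report mentioning both smoke and flames resolves to the smaller flames region.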

Typically spatio-temporal characteristics of many effects are defined to support prediction modelling for emergency response efforts, and in a real-life scenario these predictions would be adapted to accommodate evolving conditions. In conjunction with external data sources, a validity test of the observations being reported might be possible (e.g. Bishr and Mantelas 2008; Mashhadi and Capra 2011; Yanenko and Schlieder 2012). For example, as it is possible to infer that a micro-blogger is in close proximity to the event because they have reported seeing flames, it might also be possible to discredit such an account. In a bushfire, very few individuals would be in proximity to the fire, and if they were, they are likely to be defending their property and have limited opportunity to micro-blog about it. The number of individuals with the opportunity to observe and report smoke could be vast in comparison. Similar considerations can be established for other categories of events. A traffic accident can be in vista space, or the noise of the crash can be heard, or the queues of congesting cars behind the accident can be observed. An open air concert can be attended, or heard from a distance.

Place descriptions

This section outlines the concepts related to the characteristics of place name and place category references within the content of WAs and IAs.

Place names

Work in this paper will characterise how place names are used in WAs and IAs. In addition to place names, the use of formal and informal place categories will be explored, as to whether these are used more frequently by micro-bloggers who are OTG. For this purpose Table 3 provides descriptions of categories and terminology. Informal place categories are separated, as these may reflect more personal places such as neighbourhood rather than a broader environment.

Table 3 Explanation of place description categories

Place names are used to name an event

For many events, the names of the places in which they occur naturally become the names which are used to identify the event (e.g., the ‘Kinglake Bushfire’, or the ‘Queensland Floods’), or vice versa (Chan 2014). In emergency response scenarios, these event place names will be broadcast widely by news and emergency services to provide updates to citizens. For OIO micro-blogs which use place names, an exploration will be made of the use of those which were widely broadcast and those which were not, to establish if any characteristics can be identified. Though many witnesses are expected to use these event place names, an exploration of those who do not may provide unique observations of the event.

Corroboration

Existing methods to identify if micro-bloggers are OTG are used to corroborate WAs and IAs. Depending on the end-use of harvested information and the extents of the effects of the event, different granularities and needs for redundancies might be considered. This will have particular implications for accounts that do not contain place names in their content. To enable comparison with previous research, and enable observations of appropriateness, each account will be compared with the metadata location and GPS locations when available, to form a matrix of corroboration. This exploration will provide an indication of what proportion of accounts cause issues due to inconsistent locations, and what scenarios might cause these. Additionally, an analysis will also be completed for those micro-bloggers who micro-blogged multiple times, after their individual micro-blogs have been categorised. It is expected a mix of account categories will be present.
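As a rough illustration of what a matrix of corroboration could look like, the sketch below tallies accounts by category and by whether the metadata and GPS locations are consistent with the event region. The field names (`category`, `metadata_ok`, `gps_ok`) and the three status labels are assumptions made for the example; the consistency judgement itself is taken as given input rather than computed.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def status(consistent: Optional[bool]) -> str:
    """Map a consistency judgement to a label; None means the source is missing."""
    if consistent is None:
        return "unavailable"
    return "consistent" if consistent else "inconsistent"


def corroboration_matrix(accounts: Iterable[Dict]) -> Counter:
    """Count accounts per (category, metadata status, GPS status) cell."""
    matrix: Counter = Counter()
    for account in accounts:
        cell: Tuple[str, str, str] = (
            account["category"],               # e.g. "WA" or "IA"
            status(account.get("metadata_ok")),
            status(account.get("gps_ok")),
        )
        matrix[cell] += 1
    return matrix


# Hypothetical accounts, for illustration only.
example_accounts = [
    {"category": "WA", "metadata_ok": True, "gps_ok": None},
    {"category": "WA", "metadata_ok": True, "gps_ok": None},
    {"category": "IA", "metadata_ok": False, "gps_ok": True},
]
example_matrix = corroboration_matrix(example_accounts)
```

Cells in which the two sources disagree, such as the hypothetical IA above, are the ones that would need closer inspection.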

Method

This section uses a case study to apply the concepts presented previously, and test them against the research hypothesis.

Event description

The event used for this case study was a bushfire which commenced at approximately 1 pm on Monday 18th February 2013, which was not a school or public holiday. The ignition point was identified on a rural road on the northern urban boundary of metropolitan Melbourne, and the fire progressed southwards towards more densely populated suburbs, where residents were advised to evacuate. The Hume Freeway, the main arterial road connecting Melbourne and Sydney, was closed. The fire was attended by 175 fire-fighting personnel including volunteers and was considered under control during Tuesday 19th February 2013. The fire burnt approximately 2040 ha; a number of buildings were lost but casualties were not reported. The event is described as a significant event (footnote 3), leading evening news bulletins in the state of Victoria.

Data collection

Twitter is used as the source social media for this case study. Micro-blogs or ‘tweets’ were collected using the keyword ‘bushfire’, which is considered appropriate for the event of the case study. ‘Bushfire’ is used ubiquitously and predominantly in Australia in reference to uncontrolled fires outside urban areas, which enables the exclusion of uncontrolled fires outside of Australia, and urban building fires. As previous research indicates, only a small proportion of tweets come with GPS coordinates (e.g. Leetaru et al. 2013), and therefore, the more geographically specific keyword contributed to the ability to ensure the integrity of on-topic tweets.

The package of software tools described by Bruns and Liang (2012) and Bruns and Burgess (2011) was utilised. This includes a tweet retrieval and storage environment, which predominantly uses the Twitter streaming API to collect tweets containing user-configured keywords. This environment was running for the majority of the bushfire season for south-eastern Australia, and as such it collected tweets from the beginning of the event and as the event unfolded in real-time. Micro-blogger profile metadata was also collected for each micro-blogger that appeared in the tweet archive, with a script checking for new micro-bloggers every two hours.

Pre-processing to establish the OIO corpus

Each tweet in the archive was linked with the micro-blogger metadata. Manual processing on a number of passes was undertaken to establish a corpus of data with OIO tweet content. Figure 1 provides an overview of the processing of each tweet. This was a process of elimination and does not guarantee that the tweets left are OIO. However, during this event there were no other significant bushfire events bordering suburban regions of Australia. This contributes significantly to the confidence that tweets without place names or metadata locations are likely related to the event of the case study.

Fig. 1 Overview of the process to establish the OIO corpus

Linked content

Once the OIO corpus was established, the linked content was processed, which involved reconstructing URLs, manual inspection and collection of any original content. Of the 461 OIO tweets, 102 had linked content, 95 of which could be stored for analysis. The content that could not be stored was either privacy protected or had been removed since the event. The content was then manually inspected to establish if it was original. In this case study it was common for tweets with original text content to link to mainstream news and emergency service websites, which are relatively simple to identify as unoriginal sources once the URLs have been reconstructed. However, there were instances where very compelling and professional photographs were linked on personal websites but not credited, thus requiring further investigation.
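The screening of reconstructed URLs might be sketched as follows. This is not the procedure used in the case study, which was manual; it only shows how links to known news or emergency service sites could be set aside so that manual inspection concentrates on the remainder. The domain list `UNORIGINAL_DOMAINS` and the function name are hypothetical placeholders.

```python
from urllib.parse import urlparse

# Hypothetical placeholder domains standing in for mainstream news and
# emergency service websites; a real list would be compiled by the analyst.
UNORIGINAL_DOMAINS = {"news.example.com", "emergency.example.org"}


def needs_manual_inspection(url: str) -> bool:
    """Return True unless the URL's host is a known unoriginal source."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com alike
    return host not in UNORIGINAL_DOMAINS
```

Anything not matched, including URLs that cannot be parsed into a host, is kept for inspection, so uncredited photographs on personal websites would still reach a human reviewer.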

Coding of characteristics

The witness and related categories were coded by the author through manual inspection of each tweet over a number of passes. Each pass through the OIO corpus prioritised one category of interest, in the following order: direct observation, direct impact or action, and relays. Each tweet in each category was then further inspected, and sub-categories were developed and applied for observable effects and impacts. Linked content identified as original was further categorised as to whether it constituted a direct observation or not, for example a smoky sky versus a screen grab of a mobile phone application. The resulting dataset is referred to as the reference dataset.
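The pass order can be mimicked by a simple keyword pre-screen, sketched below. It is not the coding method of this research, which relied on human judgement; it only suggests a candidate category for a human coder to confirm. The sensing and family cue words echo examples quoted in this paper, while the impact cues, the regular expressions and the function name are assumptions for illustration.

```python
import re
from typing import Optional

# Passes in priority order: direct observation (WA), direct impact (IA), relay (RA).
# All cue lists are illustrative and deliberately small.
PASSES = [
    ("WA", re.compile(r"\b(see|saw|hear|heard|smell)\b", re.IGNORECASE)),
    ("IA", re.compile(r"\b(evacuat\w*|packing)\b", re.IGNORECASE)),  # assumed cues
    ("RA", re.compile(r"\b(son|mum|dad|cousin|family)\b", re.IGNORECASE)),
]


def candidate_category(text: str) -> Optional[str]:
    """Return the first category whose cue words appear in the text, else None."""
    for category, pattern in PASSES:
        if pattern.search(text):
            return category
    return None
```

A tweet such as ‘my mum can see smoke’ would be flagged as a candidate WA by this pre-screen even though it is a relay, which illustrates why each candidate still needs manual inspection.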

Coding process evaluation

Experiments were completed to evaluate the coding process by testing whether agreement could be stated for the reference dataset. Participants completed three experiments, with 10 % of the OIO corpus extracted to support training tasks. However, it must be noted that the training exercises could not replicate the familiarity the author has with the corpus. The first experiment required the participant to read each tweet in the OIO corpus and code the WAs and IAs that they identified in separate passes. Testing of the experimental procedures revealed that fatigue with the categorisation tasks could be reduced if the participants focused on a single category at a time, and completed experiments over a number of days rather than in a single sitting. Additionally, the limited number of RAs proved insufficient to enable adequate training or justify a third pass through the corpus. The second and third experiments were to read each WA and IA from the reference dataset respectively, and code the sub-categories defined. Testing of the experimental procedures revealed difficulties in training participants to identify the other sub-categories; however, these were included and their influence is discussed further in the results. Two participants completed the experiments; both can be described as native English speakers and Australian residents since childhood.

Results

In this results section, where complete or near complete tweet text or images from linked content are presented, the full URL is provided in footnotes. However, where ‘snippets’ of tweet text are presented the full URL of the source is not provided, and the authors can provide further details on request. The root source of snippets is https://twitter.com, accessed on Monday 18 or Tuesday 19 February 2013.

Summary

461 tweets representing OIO were differentiated. Table 4 presents a summary of results for the primary categories defined in Table 1. Accounts classified as delayed only include those that could be identified from the text content alone. With only ten tweets, RAs were not a large category, but distinguishable with careful consideration. References to personally known people (e.g. ‘son’, ‘mum’, ‘dad’, ‘cousin’, ‘family’), and the impacts these people are experiencing dominated the category with eight of the ten accounts. As this category is small, absolute numbers rather than percentages are presented throughout this paper.

Table 4 Summary results for each primary category

Witness accounts

As outlined in Table 5, in this case study WAs are dominated by observations of smoke (77 %) which is not unexpected due to the potential spatial extent of this effect. A total of 234 effects were identified, with 34 WAs coded with two effects and one with three effects. Examples of explicit and implicit sensing were present in the majority of sub-categories. Explicit observations of seeing smoke dominated during the day, with smelling smoke at night. Nine accounts report observation of a bushfire moon, which is assumed from descriptions to be an observation of what the moon looks like when smoke is in the sky at night. A number of WAs were additionally identified to include impact descriptions. For example, three WAs also refer to evacuation and numerous provide spatial descriptions indicating the bushfire is near to them or indications of being fearful.

Table 5 Summary of bushfire effect categories

Linked content

Of the 95 OIO tweets with linked content that could be collected, 59 were categorised as linking to original content, and of these 48 were categorised as direct observations of the event, and therefore considered WAs. Of these WAs, 34 also had textual content which, on its own, would have been categorised as a WA. Almost all are photographs of smoke in the sky, with a number of examples of recognisable buildings in the skyline (see Fig. 2). The only images reporting traffic related effects are also shown in Fig. 2. No original linked content showing actual fire-fighting was differentiated in the corpus, though uncredited examples were identified through comparison with images published by the media, as shown in Fig. 3. There were a number of daytime images posted at night (determined via timestamp) which were not previously coded as delayed. This suggests that more delayed accounts exist than can be identified from the text content alone.

Fig. 2 Linked content reporting: a traffic congestion on the Hume Highway (https://twitter.com/holly_yeatman/status/303394166686224384/photo/1. Access date 6 March 2013), b closure of the Hume Freeway (https://twitter.com/chriscorneschi/status/303411867542507520/photo/1. Access date 6 March 2013), c smoke from a suburban backyard (https://twitter.com/nmg75/status/303408200122769408. Access date 6 March 2013), d smoke from Melbourne’s central business district (CBD) (https://twitter.com/taitems/status/303351685848379392. Access date 6 March 2013)

Fig. 3 Example of linked content identified as uncredited media source, and therefore not a WA (https://twitter.com/SukhSandhu/status/303398640167288832/photo/1. Access date 6 March 2013)

Impact accounts

IAs are dominated by a category which has been named event near me because, as the example in Table 6 indicates, this is what the content communicates. This content category appears similar to the relative location referencing described by Vieweg et al. (2010), but at approximately 12 % of OIO it represents a greater proportion in this case study. For the evacuation category, potential witnesses considering their evacuation or ‘Bushfire Survival Plans’ were identified, as were those who had already evacuated. Similarly, content indicating that plans, mostly travel plans, might be affected or had been affected was categorised under plans change. With a larger corpus it might be possible to further differentiate categories of anticipated impacts from impacts which have already occurred. Ten of the other impact accounts were distinguished because of an apparent heightened emotional state alone. However, this can be observed in accounts across all categories in a variety of ways. Two tweets imply that the potential witnesses are involved with emergency response activities. On further investigation, it can be confirmed that one of the users is a volunteer fire fighter, and the other cannot be discredited.

Table 6 Summary of bushfire impact or action categories

Coding process evaluation

Table 7 presents the results of the first experiment to evaluate the coding process for primary WA and IA categorisation. These results indicate agreement, in particular for WA, and therefore, the validity of the methodology employed. Difficulty in training participants in other sub-category tweets (refer to Table 5 and Table 6 for WA and IA respectively) was reported, leading to the investigation of results with their inclusion and without. Their influence is more pronounced for IA, due in part to their larger proportion of this smaller category.

Table 7 Coding process evaluation results for primary categorisation

Participants A and B achieved 97 % and 98 % agreement on average with the WA sub-categories (refer to Table 5) in experiment two, and 80 % agreement on average with the IA sub-categories (refer to Table 6) in experiment three, or 84 % agreement when the other sub-categories are excluded. Of note, Participant A commented they had not coded all references to a ‘Bushfire Survival Plan’ as belonging to the evacuation sub-category because the plan might be to ‘stay and defend’ (Footnote 4) rather than evacuate, reducing their agreement for this sub-category to 67 % and highlighting the importance of robust definitions. For event near me, the IA sub-category with a significant number of examples to support training, Participants A and B achieved 88 % and 81 % agreement respectively. This acceptable but lower figure than for the WA sub-categories may reflect a reliance on expertise in spatial science, whereas WA sub-categories did not require such expertise.
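
The agreement figures above can be read as the share of items to which a participant assigned the same code as the reference dataset. The following is a minimal sketch of that calculation, assuming simple percent agreement (the exact measure is not restated here). The function name and the toy codes are hypothetical and are not data from the study.

```python
def percent_agreement(reference, participant):
    """Percentage of items given the same code by both coders."""
    if len(reference) != len(participant):
        raise ValueError("both coders must code the same items")
    if not reference:
        raise ValueError("no items to compare")
    matches = sum(r == p for r, p in zip(reference, participant))
    return 100.0 * matches / len(reference)


# Hypothetical toy codes for five tweets (not data from the study).
reference_codes = ["smoke", "smoke", "traffic", "other", "smoke"]
participant_codes = ["smoke", "smoke", "traffic", "smoke", "smoke"]

# The coders disagree on one of five items.
agreement = percent_agreement(reference_codes, participant_codes)
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable where one category dominates, as smoke does for WAs.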

Place descriptions

This section presents results for place descriptions contained within the different categories of accounts.

Place names

Place names and categories described in Table 3 were coded and results presented in Table 8. The categories defined as personalised place categories, informal place categories and personalised informal place categories described in Table 3 have been combined in Table 8 under the heading personal place category. This decision was made primarily due to the small sample size. The WAs and IAs in combination contain fewer place names, 47 % compared to 69 %, and more place categories or personal place categories, in combination 34 % compared to 5 %. These differences are more pronounced for certain categories such as event near me. Additionally, it can be noted that egocentric spatial descriptions are not present in NWIRA, which reflects that micro-bloggers who tweet from a personal perspective appear to be credible witnesses. The figures in Table 8 are the number of accounts that include place categories, not a count of instances within the tweet. There are very few examples observed where more than one instance of a non-place name was within a single tweet. However, a list of multiple place names within a single tweet is more common. Table 9 provides example tweet content and further observations.

Table 8 Place name and category summary
Table 9 Place name and category comments and example snippets

Place names used to name the event

An event such as the case study is widely broadcast via the news media. Below are two of the first WAs identified from Twitter for the case study event:

  1. yo does anyone know what’s up with the haze coming from the north of melbourne? smells like bushfire. can’t find any news online. (Footnote 5)

  2. Hmm what looks to be a sizeable bushfire off in the north-east? http://t.co/jRIk5upX (Footnote 6)

These accounts provide evidence that it might be difficult to determine from observing smoke in the sky which places are on fire. For most people, the names of the places affected might only become known from observing mainstream news. Identifying which accounts do not contain widely broadcast place names might provide further evidence supporting WAs. Table 10 lists which place names were identified as being widely broadcast and therefore eliminated, and Table 11 lists lesser used place names which remained for WAs and IAs versus NWIRAs. Lesser used place names for IAs and WAs are mostly suburb names, whereas NWIRAs appear to report more precise information. However, when these accounts are investigated, each appears attributable to another source such as the media rather than to the individual, which is why they were eliminated as WAs at processing time. Five of the seven road names presented in Table 11 can be attributed to a single micro-blogger who is identified as the most prolific in the corpus. This micro-blogger is discussed further in Sect. “Corroboration”.

Table 10 Place names identified as event place names or highly publicised place names
Table 11 Place names other than the event place names or highly publicised place names present in the corpus
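
The elimination of widely broadcast place names described above can be sketched as a simple filter. This is a minimal illustration, assuming place names have already been extracted from each account. The function name and all place names below are placeholders; the actual lists are those in Tables 10 and 11.

```python
def lesser_used_place_names(place_names, broadcast_names):
    """Drop place names that were widely broadcast, ignoring letter case."""
    broadcast = {name.casefold() for name in broadcast_names}
    return [name for name in place_names if name.casefold() not in broadcast]


# Placeholder names only (not the names in Tables 10 and 11).
broadcast_names = ["Event Town", "Event Highway"]
account_place_names = ["event town", "Suburb A", "Suburb B"]

# Only the names that were not widely broadcast remain.
remaining = lesser_used_place_names(account_place_names, broadcast_names)
```

Matching on exact strings is a simplification: abbreviations, misspellings and hashtags would need normalising before a filter of this kind could be applied to raw tweet text.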

Influence regions

Two methods were used to explore influence regions between the event and witnesses. Figure 4 maps the place names presented in Table 11, corresponding to WAs that reported smoke or traffic conditions. Google Maps Engine was used to geocode place names and create the visualisation, with point icons representing each place name as Google Maps suggests. In this small corpus, smoke and traffic conditions were the only categories with enough content to be suitable for this visualisation. The number of reports, and the potential spatial extents and proximity to the event which can be derived from this content, appear to fit the model described in Sect. “Influence regions”. Figure 4 also shows what is interpreted to be a boundary of ‘metropolitan Melbourne’ as presented by Google Maps in the search result for ‘Melbourne, VIC, Australia’ (Footnote 7). The fire extents are as depicted by Emergency Services (Footnote 8) during the event.

Fig. 4 An approximation of influence regions for smoke effects and traffic effects using WA which contain place names that are not event names or highly publicised place names, with fire extents and ‘metropolitan Melbourne’

The event near me category is characterised and differentiated by content that appears to directly define an influence region using spatial descriptions, which when generalised communicate bushfire near me. The qualitative spatial relation near is specified in a variety of ways, with varying levels of precision: some simply used the words near or close, ten accounts were in units of time (e.g. 10–15 min drive), five accounts in units of distance (e.g. not even 10 km away), and five accounts in units of suburbs (e.g. neighbouring suburb).
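
The ways in which near is specified suggest a simple rule-based grouping by unit. The following is a minimal sketch, assuming a small set of regular expressions checked in a fixed order. The patterns, labels and function name are illustrative assumptions; the coding in this research was manual.

```python
import re

# Illustrative patterns, checked in this order (an assumption, not the
# coding scheme used in the study).
PATTERNS = [
    ("time", re.compile(
        r"\b\d+(?:\s*[-–]\s*\d+)?\s*(?:minutes?|mins?|hours?|hrs?)\b", re.I)),
    ("distance", re.compile(
        r"\b\d+(?:\.\d+)?\s*(?:kilometres?|kilometers?|kms?)\b", re.I)),
    ("suburbs", re.compile(r"\bsuburbs?\b", re.I)),
    ("qualitative", re.compile(r"\b(?:near|close)\b", re.I)),
]


def near_unit(text):
    """Return the first matching unit label for a spatial description."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return None


# Snippet-style examples in the spirit of those quoted in the text.
labels = [near_unit(t) for t in (
    "10-15 min drive",
    "not even 10 km away",
    "neighbouring suburb",
    "the fire is really close",
)]
```

A description that matches none of the patterns returns None, which marks it for manual inspection rather than forcing it into a category.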

Corroboration

Table 12 presents a summary of the metadata locations for the OIO corpus. Note these statistics were calculated by the number of accounts, not the number of unique micro-bloggers. They compare broadly with those presented in Hecht et al. (2011); however, differences can be noted in the granularity of the valid geographic entries, perhaps in part due to population density differences between Australia and the USA. Potential witnesses creating IAs have significantly fewer valid geographic entries.

Table 12 Summary of metadata locations including breakdown by granularity for valid geographic entries

Table 13 presents corroboration results using metadata location or GPS. The metadata location is considered not suitable for corroboration if it is blank, contains a non-geographic entry, or valid geographic entry that is not specific enough. For this event, the granularities considered suitable were from the most precise GPS coordinates to metropolitan area (see Fig. 4 to visualise, for example, ‘metropolitan Melbourne’). In the majority of cases, the metadata location corroborates WAs and IAs, and most usefully for 60 and 29 accounts respectively, where place names were not present in the content. Seven WAs and IAs were identified which were not consistent, and these were explored further. Five were confirmed as WAs and IAs even though their metadata did not indicate Melbourne, and two were identified as false positives. The majority of NWIRA was also from micro-bloggers disclosing their location to be Melbourne, confirming that micro-bloggers can be micro-blogging about an event but not provide evidence that they are witnesses or are impacted. Another interpretation of these results might be that the event was primarily a significant local event resolved in less than 1 day, and therefore primarily of local interest only.

Table 13 Corroboration matrix indicating whether metadata or GPS locations corroborate the account
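
The suitability and corroboration rules described above can be expressed as a small decision function. This is a minimal sketch, assuming each metadata location has already been assigned a granularity label and a manual judgement of whether it indicates the event area. The granularity labels and function name are hypothetical; the levels used in this research are those in Table 12.

```python
# Hypothetical labels for granularities from GPS coordinates up to
# metropolitan area; anything else (blank, non-geographic, or coarser
# entries such as state or country) is treated as not suitable.
SUITABLE_GRANULARITIES = {"gps", "street", "suburb", "metropolitan"}


def corroboration_status(granularity, indicates_event_area):
    """Classify one account's metadata location for corroboration."""
    if granularity not in SUITABLE_GRANULARITIES:
        return "not suitable"
    return "corroborates" if indicates_event_area else "not consistent"


# A blank location is passed as None; a coarse entry as e.g. "country".
statuses = [
    corroboration_status("suburb", True),
    corroboration_status("metropolitan", False),
    corroboration_status(None, False),
    corroboration_status("country", True),
]
```

Accounts classified as not consistent are the ones that warrant further manual investigation, as was done for the seven WAs and IAs discussed above.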

Table 14 outlines the number of micro-bloggers who sent multiple accounts related to the event, and indicates how many could be described as mixed categories or single categories. A single category is assigned if all of the micro-blogger’s accounts are IAs and WAs, or are NWIRAs, and a mixed category is assigned if a combination of WAs and IAs and NWIRAs is detected. All micro-bloggers categorised as mixed categories were further investigated and verified, with only two requiring additional comment. The first micro-blogger had five accounts, all NWIRAs except one WA, which is consequently identified as a false positive. The credibility of this micro-blogger was already in question, as the example in Fig. 3 indicates. For the second micro-blogger, who contributed over 16 tweets, the context provided when considering all tweets collectively does bring into question the categories previous tweets were given. This suggests that as an event is progressing, the status of each micro-blogger might be reviewed when a certain number of tweets is reached. Additionally, this micro-blogger provides an example that a person can legitimately be a witness to parts of the event or be personally affected, but not necessarily be a witness to other parts of the event.

Table 14 Summary of micro-bloggers by the number of accounts, and category of accounts contributed
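
The single versus mixed assignment described above can be sketched as a grouping step followed by a set test. This is a minimal illustration, assuming accounts are available as (micro-blogger, category) pairs. The user identifiers and function name are hypothetical, and RAs are not treated separately here because the rule in the text refers only to WAs, IAs and NWIRAs.

```python
from collections import defaultdict

WITNESS_LIKE = {"WA", "IA"}


def categorise_micro_bloggers(accounts):
    """Map each micro-blogger with multiple accounts to 'single' or 'mixed'."""
    codes_by_user = defaultdict(list)
    for user, code in accounts:
        codes_by_user[user].append(code)

    result = {}
    for user, codes in codes_by_user.items():
        if len(codes) < 2:
            continue  # only micro-bloggers with multiple accounts are summarised
        has_witness_like = any(code in WITNESS_LIKE for code in codes)
        has_nwira = "NWIRA" in codes
        result[user] = "mixed" if has_witness_like and has_nwira else "single"
    return result


# Hypothetical accounts (not data from the study).
accounts = [
    ("user_a", "WA"), ("user_a", "IA"),
    ("user_b", "NWIRA"), ("user_b", "WA"),
    ("user_c", "NWIRA"),
]
summary = categorise_micro_bloggers(accounts)
```

Micro-bloggers flagged as mixed are candidates for the further investigation described above, not automatically suspect, since a person can legitimately witness some effects of an event and not others.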

Discussion

This case study was suited to test the hypothesis and models developed, despite being based on a small dataset and a single event. Manual analysis of the corpus enabled an in-depth exploration of the characteristics of the micro-blogs related to the event, and avoided the challenges of automatic interpretation which when dealing with unstructured social media data can be significant. Additional advantages of this approach and the event selected include the ability to incorporate local knowledge for the event area, the ability to relatively define temporal and spatial characteristics of the event, and the reasonable certainty of the integrity of the OIO corpus in the absence of absolute data such as GPS. Disadvantages include potential bias being introduced because of relying on human interpretation and that the model was applied to a single type of event, which might introduce characteristics not applicable to a general model. Additionally, it did not support an in-depth exploration of relayed or delayed accounts characteristics.

For this event, witnesses could be differentiated and categorised by the effects they reported observing. The effects were dominated by smoke (77 % of WAs). As smoke can be detected over vast areas, it might be expected that other types of events may have proportionally fewer witnesses. Micro-bloggers reported their direct observations in either explicit (e.g. ‘I see smoke’) or implicit ways (e.g. ‘thick orangey haze’). Additionally, subtle changes in language could be detected as day became night, where the sense of smell became more prominent. These findings provide clues to the challenges of future automatic interpretation to differentiate WAs. The categories of effects and sensing need to be identified and interpreted in both explicit and implicit forms. For two of the effect categories, smoke and traffic congestion, there were sufficient WAs with unique geocodable place names to support a visualisation of the influence regions. Additionally, it was observed that a significant number of WAs and IAs contained spatial descriptions, in particular those in the category event near me. Potentially, these may be formalised (e.g. Vasardani et al. 2013), to enable the refinement of influence regions for particular witnesses and potential witnesses.

The category of IA was created to include those micro-blogs which described how the micro-bloggers were directly impacted or what actions they were considering. Though these micro-blogs do not include direct observations, it can be inferred that the micro-bloggers may be in areas where effects would be experienced. Additionally, for some IA categories, it might be inferred that the potential witnesses could be in closer proximity than if they reported a direct observation, an example being IAs of evacuation compared to WAs of smoke. Impacts were not distinguished from actions in this research, nor were intentions to undertake an action distinguished from actions already completed. This categorisation is not always apparent in micro-blogs, but a larger corpus might enable further exploration of whether it is possible and beneficial.

Additionally, WAs and IAs are less likely to contain place names: 47 % compared to 69 % for NWIRAs. This was not because they included fewer spatial descriptions, but because they included more personal place categories, with 23 % compared to less than 1 % for NWIRAs. This finding suggests that likely witnesses and potential witnesses use personal place categories instead of place names, at least for a bushfire event, or perhaps more generally for events that occur in people’s action spaces with which they have familiarity. Additionally, within the context of a social network, the environmental context may not need to be explicitly stated. An analysis of place name presence, in accounts not widely broadcast by the media or emergency services, revealed both IAs and WAs versus NWIRAs were comparable, with 27 % versus 21 % respectively. However, on initial observation, the NWIRAs appeared to contain more street names and landmarks compared to suburb names. Further investigation revealed that in general, though the textual content for these NWIRAs was original, the information they contained could be attributed to other sources. In contrast, the place names in the WAs and IAs reflected unique perspectives of the event, with suburb names the preferred granularity level in an urban environment to report the effect of smoke.

Though some researchers have used manual analysis, as this research did, to determine the likely geographic location of micro-bloggers (e.g. Starbird et al. 2012), the majority rely on the metadata location (e.g. Kumar et al. 2013). Of the metadata locations, 64 % were identified as having a granularity between GPS coordinates and metropolitan Melbourne, which was deemed suitable for a corroboration exercise for this event. 60 WAs and 29 IAs without place names in their content were associated with corroborating metadata locations, and only seven were not. These were investigated further and five were found to be legitimate accounts, although the metadata location suggested otherwise, and two were confirmed as false positives. Depending on the intended end use of the information sourced from social media, future work might seek redundant location data for corroboration at the granularity of the effect or impact, applying greater scrutiny. For example, a report of a road block would not be considered corroborated unless it could be grounded with a location at that granularity. For micro-bloggers who had posted multiple accounts, additional investigation was completed to ascertain the extent to which individual accounts corroborated each other. The overarching outcome of this analysis was confirmation that it is legitimate for a single micro-blogger to send accounts categorised as WAs, IAs and NWIRAs. For example, a micro-blogger might post accounts indicating they see smoke in the sky of a neighbouring suburb, then that they might need to evacuate, then good wishes and luck to Melbournians. As events can span beyond the vista space it is valid to find that a single micro-blogger can be a witness for some effects but not others.

Although formal analysis was not undertaken on the linguistic style of the text content of the micro-blogs, it was observed that WAs and IAs seemed more credible, more believable, if they were personal and informal in style, as defined by Verma et al. (2011). Accounts which were formal and objective were often more difficult to accept as unique and created by an individual, because the style is similar to that which emergency services and the mainstream news media would use. This is especially so given that uncredited retweets, uncredited linked content, and accounts made by users who are micro-blogging about what they see on TV were identified. Contributing to the perception that the style of WAs and IAs is more informal might be the inclusion of personal place categories.

The accounts were categorised as to the primary category of interest, firstly a WA, then an IA or RA. It was observed that, in a small number of cases, WAs also made reference to evacuation, though it was more common for two effects to be found in a single tweet. Only one account was identified to have three effects reported. With a 140 character limit for micro-blogs in Twitter, there is probably a limit to how much can be communicated. Considering this overlap might be useful from a corroboration perspective, because the influence regions of multiple effects, impacts or combinations in a single micro-blog could be tested to be consistent with each other or not. Emotion is an exception. It is observed that the WA and IA categories do appear to have an elevated level of emotion. Although ten IAs were distinguished as being IAs for this reason alone, it appeared that emotion was elevated for all accounts, though this was not the subject of formal analysis.

Conclusion and future work

From a range of disciplines, previous research has stated a relationship between a micro-blogger’s proximity to an event and the relevance of their contributions, and a relationship between witnesses and credibility. But due to location information sparsity for social media including Twitter, it is only possible for the smallest minority of micro-bloggers to establish their proximity to an event in absolute terms. Consequently, this research sought to establish if it is possible that likely witnesses could be differentiated by assessing the content of their micro-blogs. A defining model of WAs, and related IAs and RAs, was established. The hypothesis was supported by the results of this research. Likely WAs could be differentiated and categorised by the effect the micro-blogger reported as their direct observation of the event. Direct observations of numerous effects were identified from the case study bushfire event with smoke dominating. Observations of traffic congestion, road closures and emergency response vehicles were also reported by witnesses and categorised. The witnesses often reported explicitly the sense used to make their observations, especially for observations of smoke. IAs could also be differentiated and categorised, with reports of having undertaken or intending to evacuate, and volunteer fire fighters travelling to and from the event identified. However, the dominant category observed was named event near me as this is what the potential witnesses reported, using a variety of qualitative spatial relations and personal place categories. Influence regions for the case study event type could be visualised based on geocoding content present in two sub-categories of WAs, smoke and traffic conditions. There were fewer place names in WAs and IAs for the case study event type, which was not because they had fewer spatial descriptions, but because of the increased presence of personal place categories. This may suggest that witnesses are reporting from their action spaces with which they have familiarity, and/or within the context of a social network, the environmental context may not need to be explicitly stated.

Many avenues for future research have been identified. Though this case study enabled initial models of witness categories to be defined and explored, other events with a larger corpus size and differing event types are required for further model testing and refinement. Specific challenges identified for the automatic interpretation of witness and related accounts may be pursued. Particular challenges are the identification and interpretation of categories of effects and sensing in both explicit and implicit forms. Expansion of testing to different event types will also provide direction on when the use of place names and categories found in this research can be assumed for other event types. Additionally, it would be beneficial to expand analysis of witness characteristics from a single micro-blogger and micro-blog to complete timelines, including off-topic content. A more formal exploration of the linguistic style of WAs might provide important contributions for research focused on identifying relevant or actionable content for situational awareness. Finally, exploration of whether spatial descriptions, which are present in a significant proportion of WAs and IAs, can be formalised may generate significant spatial intelligence in addition to the refinement of influence regions.