Cities as Cyberplaces for Social Capital

Cities are the heart of a complex body called ‘the space economy’, while interconnected (infrastructural, knowledge, social, trade or business) networks make up the blood circulation of this body. With the drastic conversion of the geography of our world into an ‘urban planet’ – coined the ‘New Urban World’ by Kourtit (2014a, b) – the urban economy has become a prominent feature of the 21st century. The fate of our economies seems to rest in particular in the hands of large cities, as these cities outperform all other human settlement patterns in terms of productivity and wealth creation (Glaeser 2012). Such cities have turned into complex and efficient intra-urban networks, while also being connected to the outside – sometimes global – world through multi-layer inter-urban networks (Neal 2013). The ‘New Urban World’ has thus become a complex evolutionary organism.

The complexity of the emerging global urban system is further rising as a result of the structural increase in the world population in the decades to come. Cities will become the necessary havens to accommodate the rising volume of people. The abovementioned long-term megatrend in population movement towards the city will intensify large-scale urbanization; cities will mirror both the natural population increase and large migration movements in various parts of the world. Consequently, the emerging urban challenges may become the most complex and critical factors for sustainable development of our world in the future. Despite (sometimes) negative perceptions of cities, it should be noted that in general cities are meant to be the ‘home of man’. Clearly, they will have to meet strict sustainability conditions in a dynamic – often global – environment. The urban economics literature has rightly argued that cities are able to create many positive external benefits, so that from an economic perspective there will be a structural tendency for an increasing influx of people into urban areas. Socio-demographic changes (e.g., ageing), migration and mobility, entrepreneurial dynamics, sustainability and efficiency of transport and energy systems, information and communication technology (ICT) (and other advanced technologies) and increasing returns to scale in urban agglomerations are the driving forces behind the persistent rise of urban settlement patterns in our modern society (Nijkamp and Kourtit 2011; Arribas-Bel et al. 2013, p.249; Kourtit and Nijkamp 2013, p. 172–173).

Cities are clearly no longer ‘isolated islands’, and have to be competitive to acquire a strong position in an open world (Kourtit et al. 2011). This new trend has prompted the popularity of the concept of a ‘smart city’, a city that is able to exploit its indigenous assets (knowledge, technology, entrepreneurship, accessibility, sustainability, culture, etc.) to gain a relatively strong (socio-)economic position in an open spatial system (Caragliu et al. 2011; Kourtit et al. 2013). An important vehicle for smart city initiatives is the design of, access to and use of appropriate ICT facilities on a fit-for-purpose basis. This awareness has led to the development of a varying nomenclature for ICT-instigated smart cities, such as: computer city, plug-in city, cyber city, fiber city, software city, wiki city, wired city, digital city (see also Grosveld 2002; Steenbruggen et al. 2014).

The urban system is indeed a multi-faceted dynamic organism with many characteristics of a complex economic, social, political and cultural nature. Cities are evolving systems driven by a multiplicity of actors and stakeholders. It goes without saying that cities have many ‘faces’; they are not uniform or identical. The management and governance of these modern, complex and ever-rising urban agglomerations call for effective information strategies and focused decision-making tools (Kourtit 2014a, p.183, b, c). There is a general awareness that traditional urban planning tools (models, stakeholder analysis, and consultation methods) are no longer able to cope with the multiplicity of challenges faced by cities in an open world, driven by digital technologies.

It is increasingly recognized that advanced ICT use in an urban setting may drastically change the organization of smart cities. A review of the various opportunities created by ICT services in the context of urban social capital can be found in Cohen-Blankshtain and Nijkamp (2013). It is clear that ICT adds a new dimension to social capital in the city, as introduced by Bourdieu (1986) amongst others. ICT offers a low-cost, high-frequency and user-friendly addition to local social capital, for instance, through digital opportunities such as Facebook, Twitter or text messages. This, combined with recent advances in the fields of statistical and machine learning, opens an entirely new range of possibilities in terms of sensing and measuring many of the urban phenomena that Jane Jacobs described in her seminal book “The Death and Life of Great American Cities” (Jacobs 1969) and that are at the heart of understanding cities. The use of ICT reinforces the impact and position of smart cities, not only from an efficiency perspective, but also from a social cohesion perspective (cf. Sadler 2005; Lewis 2008; Ratcliffe and Newman 2011). Social media are one of the most recent and rapidly growing phenomena (Facebook was launched in 2004, Panoramio in 2005, Twitter in 2006, FourSquare in 2009, and Instagram in 2010), and are essential in the generation of bottom-up information. For example, the concept of Web 2.0 and the related new information services, such as social media, have proved to be a powerful tool in directing the mindsets of tourists, and in creating new opportunities for a variety of tourism services and consumer product industries. Thereby, tourists can be seen as co-creators of innovation in mood analysis (e.g. Frank et al. 2013; Mitchell et al. 2013) and language diversity (Mocanu et al. 2013).
Social media are likely to change the functioning of urban spaces, as a permanent flow of interactive information will change the behaviour of citizens and visitors into dynamic urban consumerism (Pentland 2009). Modern cities tend to be increasingly controlled and influenced by – top-down and bottom-up organized – complex data platforms. Access to and smart use of such data platforms will most likely become critical success factors for our urban future.

The trend towards new geo-science applications (e.g., geo-design, geo-imaging) ties in with the current discussion on intelligent and smart cities, in which the use of digital technology in cyberspace (through the Internet) lays the foundation for modern urban analysis and planning. Essentially, a smart city emerges when urban intelligence is added to information. This information is created by sensors of several forms, is stored in ‘electronic warehouses’ (e.g. remote servers, data centres, etc.) and offers a new map of the structure and interactions in urban life. This new approach is often also termed internet geography or cyber geography (Batty 2012; Leamer and Storper 2001; Malecki 2002; Storper and Venables 2004; Tranos 2013). The information incorporated in these – often real-time – databases is increasingly used in social media and location-based services. For example, Foursquare and Facebook are able to inform others on the spatial imprint of the daily urban activity behaviour of residents or visitors, which opens new ways for digital geography and even psycho-geography (in relation to spatial mental maps). In the public domain this type of information can also be used for geocentric incident management and urban or regional crowd management (see also Boyd and Crawford 2012).

As argued by Nijkamp and Kourtit (2012), density, proximity, accessibility and connectivity are the cornerstones of a modern, ICT-connected city. Modern ICT-led cities are able to meet a series of strategic objectives (cf. Hollands 2008):

  • benefits from networked infrastructure in terms of enhanced economic productivity, a rise in administrative efficiency, and increased opportunities for social, cultural and urban development;

  • exploration of new opportunities offered by a business-led urban development;

  • utilization of urban social capital through the inclusion of urban residents, the business sector and visitors in public services;

  • strategic reliance on high-tech potentials of the city (including creative industries) for long-run urban development;

  • creation of a new socio-economic potential through the development of social connectivity and trust principles and initiatives in the city; and

  • development of new strategic initiatives and plans for social and ecological sustainability as a solid and promising basis for a smart city.

This paper presents a novel spatial analysis of one of these new sources of urban data from digital feeds. In particular, it analyses a set of Twitter messages geo-referenced within the municipality of Amsterdam. The contribution is two-fold: at a specific level, it offers insight into the spatial structure, dynamics and the distribution of population in a global city, adopting a distinct (spatial-)data-driven approach; at a more conceptual level, it presents a concise application, almost a pilot study, of how elements of the smart-city discourse can be readily applied to better understand the functioning of an urban area. This is realized by combining not only different data sources of recent appearance, but also techniques from the field of statistical and machine learning, in an exemplification of how urban studies can benefit from these seemingly foreign domains of expertise. This is by no means the first attempt at exploiting this kind of data to further urban understanding. Previous efforts in this same direction include Ferrari et al. (2011), Kling and Pozdnoukhov (2012), Ozdikis et al. (2013), Del Bimbo et al. (2014) and Lovelace et al. (2014).

This paper is organized as follows. After this introductory section, we will highlight the role of modern cities as social network machines producing ‘tons’ of socio-spatial interaction information, including tweets. The subsequent section is devoted to a description of the Twitter database used to study spatially detailed information patterns in the city of Amsterdam. Thereafter, a series of statistical and visual findings will be presented based on the Twitter data analysis for Amsterdam. The final section will then draw some policy lessons on the use of ‘big’ digital data for smart city management.

Cities as Social Network Machines

As argued in the introductory section, we live in the ‘New Urban World’, which is a new epoch in the history of mankind (Kourtit 2014a, b, c). The ‘New Urban World’ refers to a new global geography characterized by a persistent rise in the share of a nation’s population that lives in urban agglomerations, be it in a geographically concentrated form (e.g. cities) or a deconcentrated but functionally connected form (e.g. metropolitan areas, poly-nuclear spatial patterns, sprawl areas) (Kourtit 2014a, b, c). The ‘New Urban World’ does not exhibit a uniform and stable settlement pattern, but rather a spiky and dynamic geographic landscape (Nijkamp and Kourtit 2011; Kourtit and Nijkamp 2013, p. 180). This megatrend offers various great challenges and opportunities for urban development, but puts, at the same time, enormous pressure on our urban areas by inducing also negative externalities, such as pollution, congestion, security issues and social degradation. In addition, cities – especially in Europe – have an abundance of historical-cultural heritage that needs to be managed (Kourtit 2014a, b, c). A smart city has to integrate various dimensions that shape the future of the city through proactive intelligent data management. Such a city has to face the challenging task to govern a complex and open spatial system as a set of (internally and externally) connected intelligent subsystems (see also Komninos 2002).

A prominent feature of all cities – as dynamic and interconnected living ecosystems – is that they house human beings, be they residents or visitors. Their social behaviour forms altogether a fuzzy set of social interactions, sometimes weak (or even negligible) and sometimes strong (positive or negative). The type of interactions depends on various functions ranging from joint consumption (e.g., visiting a theatre or a football match) or joint production (e.g., working together in an office) to strong social interdependence (e.g., a protest demonstration or a meeting in a bar). Thus, social communication is not one-dimensional, but multifunctional and takes place in different layers and functional hierarchies. Modern high-tech cities tend to become multi-tasking smart engines of social and cognitive proximity (see also Boschma 2005).

Social networks and, in a more general sense, social capital (see e.g. Bourdieu 1986) have received a great deal of attention in social science research over the past decades. Social networks are functionally organized ways of social – often interpersonal – interaction (see Hanneman and Riddle 2005). It should be added that social relational structures also have a spatial constellation: the social space, which refers to the geographical constellation that shapes or facilitates interactions as interdependences among individuals or social groups in a given geographical environment. Thus, geographical referencing is the dual side of social interactions (see also Abbott 1997; Bottero and Crossley 2011; Pattison and Robins 2004).

Social network analysis does not refer to a ‘wonderland of no spatial dimensions’, but is tied to concrete and systematic geographical pattern analysis, including the presence of nodes and ties (edges or links), as well as neighbourhoods. Thus, social networks represent interrelational and functionally determined systems mapping out the behaviour of social actors.

An intriguing question is now whether a similarity among social actors (e.g., gender, age, education, social class, profession, language, religion) favours socially interactive behaviour and, in general, social communication. The reverse question is whether social distance (or socio-psychological heterogeneity) discourages interaction among people (Fleming and Petty 2000; Hipp and Perrin 2009; Massey 1981). Clearly, interaction may refer to the intensity or frequency of communication (‘strong and weak ties’) or to the nature and contents of communication among individuals (‘functional ties’). In both cases, geographical distance may play an important role and even induce hierarchical interaction patterns (Zipf 1949; Mouw and Entwisle 2006; Verdery et al. 2012; Modica et al. 2013). This observation is in agreement with the first law of Waldo Tobler (1970, p. 234) in geography: “everything is related to everything else, but near things are more related than distant things” (see also Nijkamp 2013). The role of geography is also prominently present in the ‘death of distance’ hypothesis, in which the assumption is advocated that in a digital world geography hardly matters (see for a critical review Tranos and Nijkamp 2013).

In recent years, some researchers have pointed at an interesting paradoxical development: when both physical and virtual mobility increases (e.g., thanks to the internet), the sense of neighbourhoods and places we know is intensified (see e.g. Ioannides and Zabel 2008; Krysan and Bader 2009; Blacksher and Lovasi 2011), most likely as a response to the need for self-identity in an open global world. From this perspective, digital information may be helpful in the search for social spaces where self-identity (including friendship, social entertainment, etc.) can be realized (see, for example, Eagle et al. 2009; Raento et al. 2009).

As mentioned above, our paper will address the issue of how ICT – and more specifically, digital Twitter information shared by group members (where groups are defined as social entities that have access to the same type of information) – will affect group behaviour in space and time. This calls for the analysis of large digital data systems on social network behavior, taking into consideration the fact that cities turn into smart spatial data and information engines that map out on-line or real-time spatial interactions in physical and digital urban space. As a consequence, digital spatiality tends to shape the future morphology and interaction patterns in urban agglomerations. In a smart city environment, open digital information sources (e.g., intelligent location-based services) allow for a reliable identification, understanding, contextualization and timely response to events, especially when there is a common willingness of all urban actors concerned to offer intelligent solutions for vital and sustainable cities (see also Abdoullaev 2011).

Smart cities are thus in general characterized by an intensive use of digital technology, including social media. Not only the socio-economic acceptance pattern of social media, but also the spatial distribution of the use of social media in smart cities is an important research challenge. In the present study we will focus on the analysis of Twitter data in the city of Amsterdam. Twitter data are popular vehicles for transmitting short messages to a wide audience, are easy to use and to receive, and have a high penetration rate in many countries and cities.

In the current research project, we tackle the classification of different areas of a city based on their function and spatio-temporal patterns of activity. In this way, Twitter data can be used to depict the social composition and diversity of tweets in a geographically referenced smart city, e.g. in terms of language, user categories, user frequency of tweets, etc. In the next section, we will describe the database used for our investigation.

The Digital Database for Amsterdam

Much data from information machines comes from social interaction and social network use. A social network is an organized social structure made up of individuals (actors or organizations) called ‘nodes’, which are tied (connected) by one or more specific types of bonds or interdependency, such as friendship or a common interest. Social media create new communication channels to collect, generate, share, circulate and exploit information, as well as to generate a different type of value-added information (personal comments and insights).

Social networks are relatively recent phenomena: Facebook was launched in 2004, Twitter in 2006, FourSquare in 2009. The diffusion has been so rapid and viral that the number of users of some of these websites is larger than the population of most nations in the world. With the advent of social media, the entire logic of communication has changed. While the institutional channels still remain in place, a series of autonomous and unstructured channels has emerged. Usually, the first source of information on a specific event is formed by ‘eyewitnesses’, who share images or texts almost immediately through websites and social networks. This process can be much faster than any other form of communication. This information is a prominent source for both the public and the media. The latter implement ways to capture and organize bottom-up sources and comments to create reports, as well as to ensure a real-time information feed. For example, Twitter messages for the emergency services embody three vital emergency functions: assessment (What is happening?), coordination (What is needed?), and response (What must be done?). This provides an additional source to support situational awareness for emergency organizations. The large amount of heterogeneous data pumped into the information ‘pot’ from the media, the public and government sources also leads to questions regarding its accuracy, reliability, and privacy (Beinat et al. 2011, p.63). In our study on smart cities we address in particular the potential of Twitter.

Twitter is a popular online social networking and micro-blogging service that enables its users to send, read and share text-based messages of up to 140 characters, known as ‘tweets’ (see Beinat et al. 2011). The structure of a Twitter message usually includes:

  • User information

    • Name and nickname

    • User ID

    • Time zone

    • Number of followers

    • Number of following

    • Account information (e.g. language and home location)

  • Message text

    • Message ID

    • Message text

    • Links to external sources, hashtags (#) and references to other user (@)

    • Source (service through which the tweet is generated)

    • Status (e.g. retweet)

  • Message metadata

    • Time stamp

    • Location (if applicable)
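As an illustration only, the structure listed above could be represented in code roughly as follows. This is a minimal sketch: the class and field names (`TwitterUser`, `Tweet`, `is_geolocated`, etc.) are hypothetical and chosen to mirror the list, not the field names of any actual Twitter API payload.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class TwitterUser:
    # User information (field names are illustrative assumptions)
    user_id: int
    name: str
    nickname: str
    time_zone: Optional[str] = None
    followers: int = 0
    following: int = 0
    language: Optional[str] = None
    home_location: Optional[str] = None


@dataclass
class Tweet:
    # Message text and metadata (field names are illustrative assumptions)
    message_id: int
    user: TwitterUser
    text: str
    timestamp: datetime
    source: Optional[str] = None      # service through which the tweet is generated
    is_retweet: bool = False          # status
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude), if applicable

    @property
    def is_geolocated(self) -> bool:
        """True when the tweet carries geographic coordinates."""
        return self.location is not None


# Made-up example record
example = Tweet(
    message_id=1,
    user=TwitterUser(user_id=42, name="Example User", nickname="example"),
    text="Cycling along the canals #amsterdam",
    timestamp=datetime(2012, 10, 1, 10, 30),
    location=(52.37, 4.89),
)
```

Keeping the location optional reflects the fact that only part of the messages carry coordinates.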

Through this structure, a tweet can contain the following elements of information:

  • users’ comments or reports concerning current events, which they consider important or interesting;

  • direct reference to the particular topic or group of posts by a hashtag “#” e.g. #eurocup. The creation of hashtags is community-driven;

  • reference to the particular user with the ‘@’ sign followed by a username, e.g. @Rijkwaterstaat; and

  • links to external websites or pictures using a hyperlink.
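The elements above can be pulled out of a message text with simple pattern matching. The sketch below is an assumption-laden illustration (the regular expressions are deliberately simplified and the function name `parse_tweet_text` is hypothetical); it is not the procedure used in this study.

```python
import re

# Simplified patterns; real-world tweets may need more careful rules
LINK = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def parse_tweet_text(text):
    """Return the links, hashtags and user references found in a tweet text."""
    links = LINK.findall(text)
    # Remove links first so that '#' or '@' inside a URL is not miscounted
    remainder = LINK.sub(" ", text)
    return {
        "links": links,
        "hashtags": HASHTAG.findall(remainder),
        "mentions": MENTION.findall(remainder),
    }


parsed = parse_tweet_text(
    "Watching the match @Rijkwaterstaat #eurocup http://example.com/photo"
)
```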

Twitter offers four different types of Application Programming Interfaces (APIs): (i) Twitter for websites with ready-to-use website components, e.g. ‘tweet’ and ‘follow’ buttons; (ii) REST API for handling posting messages to Twitter; (iii) Search API with the possibility of querying back in time up to one week, with a limited number of query types (access via HTTP GET); and (iv) Streaming API, which is the most suitable for downloading large amounts of data and is recommended for data mining (access via HTTP POST). A Twitter database basically contains two kinds of messages: those with and those without geo-location. For our case study in Amsterdam, we have used three months of data (October 2012–December 2012), which contained about 200,000 geo-located tweets from Greater Amsterdam. Messages are filtered based on their location: only those with attached geographic coordinates located within the Greater Amsterdam area are retained. The hourly distribution of tweets over an entire day can be seen in Fig. 1. The dashed line represents the average day, while the solid coloured lines show the aggregate for every day of the week.
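The filtering and hourly aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the bounding box is a rough, hypothetical stand-in for the Greater Amsterdam area (the study's actual boundary is not specified here), and tweets are represented as plain `(timestamp, latitude, longitude)` tuples.

```python
from collections import Counter
from datetime import datetime

# Hypothetical bounding box (min_lat, min_lon, max_lat, max_lon); illustrative only
BBOX = (52.25, 4.70, 52.45, 5.10)


def in_bbox(lat, lon, bbox=BBOX):
    """Check whether a coordinate pair falls inside the bounding box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def hourly_counts_by_weekday(tweets):
    """Count geo-located tweets inside the box per (weekday, hour).

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    counts = Counter()
    for timestamp, lat, lon in tweets:
        if lat is None or lon is None:
            continue  # message without geo-location
        if not in_bbox(lat, lon):
            continue  # outside the study area
        counts[(timestamp.weekday(), timestamp.hour)] += 1
    return counts


# Made-up sample records
sample = [
    (datetime(2012, 10, 6, 11, 5), 52.37, 4.89),   # Saturday, inside the box
    (datetime(2012, 10, 6, 11, 40), 52.36, 4.90),  # Saturday, inside the box
    (datetime(2012, 10, 7, 20, 15), None, None),   # no geo-location
    (datetime(2012, 10, 8, 9, 0), 51.92, 4.48),    # outside the box
]
counts = hourly_counts_by_weekday(sample)
```

Summing the counts over weekdays for each hour gives the average-day line, while keeping the weekday key gives one line per day of the week.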

Fig. 1 Tweet volume aggregated by hour of the day (0h-24h) by day of the week in Amsterdam

The main pattern is clear and intuitive: early hours in the morning show low activity. Activity starts picking up around 6am, and continues increasing until mid-morning, when it reaches the day peak; the afternoon displays a small valley in the volume of activity that is overcome towards the evening when work hours are over. Besides this general trend, there is also an interesting duality between weekdays and weekends. The time when the morning activity begins to pick up is clearly delayed for Saturday and, particularly, for Sunday, while it is very homogeneous and earlier for the rest of the week. The highest day peak is reached on Saturdays around 11am. Interestingly, Sundays show the largest drop between the morning peak and the evening trend, although they are second only to Saturdays in terms of morning activity; tweets on Sunday appear to decrease to the lowest level of the week in the evening hours. In the next section we outline an analysis of Twitter data representing smart functions for the city of Amsterdam.

Visualization and Analysis of Social Buzz in a Digital City – Twitter Feeds in Amsterdam

In order to identify the structure and to explore regularities in the Twitter activity of Amsterdam, we aggregate tweets by hour and to a postal code 4 level. This allows us to tackle the temporal and spatial dimensions at a level at which we do not lose much detail but also at which we can still count on enough data to see variation (using lower levels of detail yields many ‘empty bins’, points in time at which activity is zero in a particular area). The spatial distribution of the aggregate volume is shown in Fig. 2.
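A minimal sketch of this aggregation step is given below. It assumes that each tweet has already been assigned to a postal code 4 area (the spatial join itself is not shown), and the function name `build_trajectories` and the sample records are hypothetical.

```python
from collections import defaultdict


def build_trajectories(records):
    """Aggregate (postal_code, hour) records into one 24-value trajectory per area.

    Hours without any tweet stay at zero (the 'empty bins').
    """
    trajectories = defaultdict(lambda: [0] * 24)
    for postal_code, hour in records:
        trajectories[postal_code][hour] += 1
    return dict(trajectories)


# Made-up records: (postal code 4 area, hour of the day)
records = [("1012", 10), ("1012", 10), ("1012", 21), ("1071", 9)]
trajectories = build_trajectories(records)
```

Each resulting list is the hourly trajectory of one area; finer spatial or temporal bins would leave more entries at zero.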

Fig. 2 Tweet volume aggregated to a postal code 4 level, Amsterdam

The map in Fig. 2 contains 81 polygons distributed in the region of the Amsterdam municipality. The main challenge is to extract patterns out of the time trajectories of each postal code to investigate the space-time distribution of tweets. In a literal sense, this implies obtaining lines or trajectories, such as those in Fig. 1, for every single polygon. This is shown in Fig. 3, where all the 81 trajectories are plotted together.

Fig. 3 Hourly tweet volume by postal code 4 level

The method we propose is to use a clustering algorithm to group similar trajectories of postal codes by their shape. Essentially, what we are aiming for is to detect the main structures behind all of the trajectories shown in Fig. 3 and tag the polygons behind them that display a similar trajectory with the same label. This is a particular case of a standard clustering procedure, in which the attribute values on which the clustering is performed are the volume of tweets for each hour. In order to avoid level ‘distortions’ and to focus really on the shape of the trajectory, we use standardized trajectories for clustering. This means that, instead of using raw volumes of tweets per hour, we calculate the percentage that each hour represents, as a share of the total volume of that postal code. Clustering of time series is a well-known problem in pattern recognition. Following the taxonomy established in Warren Liao (2005), we use an approach that employs a general-purpose clustering algorithm on the raw data (as opposed to on features extracted from the data or on model outputs). Robustness of the approach is tested by initially using two very distinct techniques to cluster the trajectories: the widely-known K-means algorithm, and the somewhat more advanced self-organizing map (SOM) (Kohonen 2001). Although their function in this context is the same (to group observations based on attribute similarity), the underlying mechanics of both algorithms differ substantially: while K-means tries to optimize an objective function that minimizes the cluster variance, the SOM employs an iterative approach in which a neural network learns the properties of the training dataset to later assign the original observations to output neurons.
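The standardization and the K-means step can be illustrated with a small, self-contained sketch. It is not the implementation used in this study: the deterministic farthest-point initialisation is an assumption made here to keep the example reproducible, the toy trajectories have four ‘hours’ instead of 24, and all function names are hypothetical.

```python
def to_shares(trajectory):
    """Express each hour as a percentage of the area's total volume."""
    total = sum(trajectory)
    if total == 0:
        return [0.0] * len(trajectory)
    return [100.0 * value / total for value in trajectory]


def sq_dist(a, b):
    """Squared Euclidean distance between two trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    # Assumption: farthest-point initialisation instead of random starts
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two morning-heavy and two evening-heavy areas with very different levels
raw = [[10, 2, 1, 1], [100, 20, 10, 10], [1, 1, 2, 10], [10, 10, 20, 100]]
shares = [to_shares(t) for t in raw]
labels, centroids = kmeans(shares, k=2)
```

In this toy case, areas with the same shape end up in the same cluster even though their raw volumes differ by a factor of ten, which is the purpose of clustering on shares rather than on raw counts.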

After experimenting with different numbers of solutions and alternative parameters in both algorithms, we find that the outputs look very similar. The number of postal codes within each cluster is very close across the two solutions (i.e. polygons are assigned virtually the same labels by both algorithms), and so is the overall shape of the trajectories in each cluster. The robustness of these results is particularly reassuring, given the very different paradigm that each algorithm uses. Given these similarities, we will use results only from the K-means algorithm in the remaining part of the analysis, since it is an arguably more popular technique.
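One possible way to quantify how closely two clusterings agree, irrespective of how the clusters are numbered, is the Rand index, i.e. the share of pairs of areas that both solutions treat in the same way. The text does not state which comparison was used, so the sketch below is purely an illustrative assumption, with made-up labels.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of pairs that are grouped consistently by both labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up labels: same grouping, different cluster numbering
kmeans_labels = [0, 0, 1, 1, 2]
som_labels = [1, 1, 0, 0, 2]
agreement = rand_index(kmeans_labels, som_labels)
```

A value of 1 means that both algorithms produce exactly the same grouping; lower values indicate that some pairs of areas are split by one algorithm and joined by the other.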

Figure 4 shows the output from a solution with five clusters, which we find particularly interesting. After experimenting with higher and lower numbers of clusters in both the SOM and K-means, we decide to use five for the rest of our study. Additional clusters do not seem to add particularly different cases or new trajectory shapes, while smaller solutions remove some distinctive groups which we find interesting to focus on.

Fig. 4 Postal code 4 trajectories by cluster

There are three main clusters containing the vast majority of areas (≈93% of the sample) and two clusters with outliers. Almost half of the observations are labelled in a group with, roughly, two peaks and two valleys (#1, in lime colour). Hours of low volume in this case are late at night (1am to 6am) and, albeit with more activity, the afternoon (around 3pm). Rush hour in this group is divided between mid-morning, with the day peak around 10–11am, and the evening where it picks up from the afternoon. The second largest cluster (#0, in red) displays a similar profile, but the morning peak is not as pronounced and more activity seems to occur in the evening. The third most relevant cluster (#3, in blue) has a markedly different profile: it is characterized by only one peak that occurs towards the end of the morning and, after that, a decay in the intensity of tweets spans throughout the entire day, unlike in the previous two cases, without any second comeback in the evening.

The remaining two clusters only contain a handful of areas and thus capture very outlying trajectories. The one with more than one area (#2, in green) is characterized by a significant portion of its activity concentrated in the morning peak and a low profile for the rest of the day, while cluster #4, in purple, with only one area (a peripheral postal code with very little volume of tweets) has a profile that appears random throughout the day, but shoots up at the very end of the day (after 9 pm), indicating a spree of activity towards the end of the day.

Combining the temporal dimension with its spatial distribution, Fig. 5 shows the geographic location of the clusters and a time plot of the average trajectory in each of them.

Fig. 5 Space-time distribution of clusters

The colour scheme is consistent across views, so that a visual comparison can be quickly established. The lower panel of the figure displays the average volume of tweets by hour for the postal codes in each cluster. It is important to stress that, although the graph is based on the raw volume, the clustering was performed, as explained before, based on standardized values. Clustering on standardized values allows us to group by similar shapes, ignoring the level effect, while representing the output of the clustering with actual volumes allows us to get a sense not only of the shape of each group of areas, but also of the overall level of the cluster in the general picture. From a functional standpoint, the clusters can be interpreted as representations of very different types of areas in a city.
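The average profile per cluster described above can be computed from the raw volumes once the labels are known. The following is a minimal sketch with made-up numbers and a hypothetical function name; the labels are assumed to come from a clustering on standardized values.

```python
def mean_profile_by_cluster(raw_trajectories, labels):
    """Average raw tweet volume per hour over the areas in each cluster."""
    groups = {}
    for trajectory, label in zip(raw_trajectories, labels):
        groups.setdefault(label, []).append(trajectory)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }


# Made-up raw volumes (four 'hours' per area) and cluster labels
raw_volumes = [[10, 2, 1, 1], [100, 20, 10, 10], [1, 1, 2, 10]]
cluster_labels = [0, 0, 1]
profiles = mean_profile_by_cluster(raw_volumes, cluster_labels)
```

Because the averages are taken over raw volumes, the profiles convey both the shape and the overall level of each cluster.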

Cluster 1 (Lime)

Mixed-use areas, captured by the lime cluster, are highly frequented areas from which people tweet throughout the day and, particularly, during the morning, when the day peak occurs. Still, the volume of tweets sent from these areas remains high well into the evening. This is typically associated with areas where a range of urban uses, such as residential, commercial or office space, are intertwined and where people are attracted, and hence end up tweeting, not for only one reason but for several (e.g. residence, job, amenities). Perhaps not surprisingly, most of these areas are located in the center of the city as well as in more peripheral neighbourhoods which, as is well known for the Netherlands, maintain a high degree of land-use mix.

Cluster 0 (Red)

Residential areas, classified in red, show a similar trajectory, with a high total number of tweets and two marked peaks in the day, although in this case the most intense time of the day is concentrated towards the evening. This may be indicative of areas which people particularly frequent once they have left work, either because these are their places of residence or because they offer leisure activities such as bars and cafes, as is the case of the touristic Leidseplein in Amsterdam, the most central area in red, which is full of tourist amenities and remains lively well into the late evening. Red polygons in the West and North of the city are classical examples of residential areas and also fit this profile of Twitter activity, particularly high at the end of the day, although in this case the tweets originate most likely from homes.

Cluster 3 (Blue)

Work spaces, i.e. parts of the city with a high concentration of employment and a limited degree of mixed use, are captured in the blue cluster. This is reflected in the evolution of Twitter activity over the day. In effect, these areas are characterized by a single peak in the morning that starts a bit later than in the lime and red places, followed by a slow but constant decay throughout the rest of the day. This is typical of work places, which attract people for only one reason and fulfil only the work function. This lack of range of activities concentrates activity in one part of the day and leaves the rest of it with a lower number of tweets. Spatially, these areas are located in the periphery of the city and, to an eye familiar with the geography of Amsterdam, they map out nicely the largest employment centers of the city, e.g. the docks and the World Trade Center, among others.

Clusters 2 (Green) and 4 (Purple)

These are outliers: five areas are classified as green and one as purple. Green areas have a marked peak very early in the morning, a sudden drop and a bit of activity picking up towards the end of the day. These are also not the most popular places: on average, the volume of activity they attract is small compared with the previous groups. This profile could indicate remote residential areas where people can only tweet at the beginning and the end of the day because they are somewhere else (working) during the rest of the day. The only area categorized as purple is located in the north-east part of the city and comprises mostly very sparsely populated countryside. With a very low number of tweets, even a small peak in one hour (as is the case at the end of the day, when ten tweets are sent) can distort the entire profile of the area, and this change might be picked up by the clustering algorithm as something very distinct. This case is a good example of the importance of performing data quality checks throughout the entire analysis and of always being cautious when interpreting results from this type of approach with data from social media.
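A data quality check of the kind suggested by the purple outlier could be sketched as follows. This is only an illustration under stated assumptions: the threshold `min_total`, the function names and the tweet counts are hypothetical and are not taken from the analysis in this paper.

```python
def peak_share(profile):
    """Fraction of an area's daily tweets that falls in its busiest time slot."""
    total = sum(profile)
    return max(profile) / total if total else 0.0

def flag_sparse_areas(profiles, min_total=50):
    """Return the ids of areas with too few tweets for a reliable profile.

    min_total is an assumed threshold, not a value used in the study.
    """
    return [area for area, profile in profiles.items() if sum(profile) < min_total]

# Hypothetical tweet counts per time slot
profiles = {
    "central": [120, 340, 310, 280],
    "countryside": [0, 1, 0, 10],
}

# Areas below the threshold deserve manual inspection before interpretation
sparse = flag_sparse_areas(profiles)

# In the sparse area a single slot with ten tweets dominates the whole profile
countryside_peak = peak_share(profiles["countryside"])
```

Flagging such areas before or after clustering does not remove them from the analysis; it simply marks profiles whose shape may be driven by a handful of tweets.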

Finally, in order to verify some of our hypotheses about the function and use of the areas delineated by the five clusters, Fig. 6 shows the respective shares in each cluster for residential, office, industrial and commercial space (see Footnote 3). Very much in line with our previous description, clusters 0 and 1 (red and lime) have a high presence of residential and commercial space, and a medium level of office activities. As mentioned, areas in lime align more with a highly mixed use of space and, hence, their proportion of offices is also higher. Industrial use is very low in both clusters. Established as work destinations by the Twitter trajectories, blue areas are markedly defined by their industrial and office space, with medium levels of commercial and low shares of residential use. Green areas contain a mix of residential and industrial uses, with very little space devoted to other functions. Similarly, the outlier in purple contains a high share of residential and industrial use and virtually no space devoted to office or commercial use.

Fig. 6 Average proportion of land use by cluster
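The quantity summarized in Fig. 6 can be sketched as a simple aggregation. The sketch assumes that each area's land-use shares are first computed as fractions of that area's total space and then averaged over the areas in a cluster; how land use was actually measured is given in the footnote and is not reproduced here. The field names (`cluster`, `space`) and the numbers are hypothetical.

```python
from collections import defaultdict

LAND_USES = ("residential", "office", "industrial", "commercial")

def average_shares_by_cluster(areas):
    """Average, per cluster, the share of each land use within each area."""
    sums = defaultdict(lambda: dict.fromkeys(LAND_USES, 0.0))
    counts = defaultdict(int)
    for area in areas:
        total = sum(area["space"][use] for use in LAND_USES)
        if total == 0:
            # Skip areas without any recorded space to avoid dividing by zero
            continue
        counts[area["cluster"]] += 1
        for use in LAND_USES:
            sums[area["cluster"]][use] += area["space"][use] / total
    return {
        c: {use: sums[c][use] / counts[c] for use in LAND_USES}
        for c in counts
    }

# Two hypothetical areas assigned to the same cluster
areas = [
    {"cluster": 3, "space": {"residential": 10, "office": 40, "industrial": 40, "commercial": 10}},
    {"cluster": 3, "space": {"residential": 30, "office": 30, "industrial": 30, "commercial": 10}},
]

shares = average_shares_by_cluster(areas)
```

Averaging shares rather than pooling space gives every area equal weight, which is one possible reading of "average proportion"; pooling the space within a cluster before dividing would weight large areas more heavily.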

Policy Lessons for Smart Cities

This paper has focused on the use of digital information – in this case, tweets – in the city of Amsterdam. As argued before, smart cities are characterized by an intelligent use of e-data, and Twitter is one of these information technologies. Its rapid transmission potential and wide acceptance make Twitter an appropriate social capital tool in a smart city.

Such social media are still in their infancy, but they increasingly have a radical impact on the social fabric of the city. First, they have a drastic impact on communication behaviour, especially because of their fast and pervasive nature, so that communication patterns in urban areas are changing substantially. Next, even though cyberspace seems to act as a dimension disjoint from physical space, in reality there is a close connection between these two spaces. Patterns of tweets may resemble and mirror land use, diversity and proximity patterns in cities.

Consequently, urban policy should not regard social media use as a communication technology per se that is disconnected from real urban space. Instead, current land-use patterns and uses (e.g., entertainment centers) prompt the use of social media as a modern communication tool, while social media may in turn exert a great impact on the mobility, business and leisure patterns of urban residents and visitors. It thus seems plausible that the role of digital technology in urban dynamics and morphology will increase substantially in the future. In addition, regardless of the direct effects this technology has on actual patterns of behaviour, these new data sources offer an opportunity to more closely monitor, study and, ultimately, understand the idiosyncrasies and needs of cities. In this direction, the present study offers a window onto more ambitious possibilities. As an example, it does not take much imagination to envision a real-time system based on the analysis displayed here but constantly updated, offering the urban planner and policy maker a close connection to the city.

In conclusion, digital information has in recent years become a source of novel research. The hundreds of millions of tweets sent out on a daily basis offer an unprecedented research potential with great policy implications, not only at a societal level but also at a local community level. These information sources are also extremely interesting for analysing cultural diversity in cities on the basis of the language of tweets. The density and time pattern of tweets also offer important strategic information on the human geography (consumers and producers) of modern cities. For geoscience applications in urban planning (e.g., crowd management, spatial segregation trends, socio-spatial profile analysis, traffic management, etc.), the use of digital information including cell phone data, tweets, GPS information and other micro-based digital sources may open new avenues for smart cities.