Describing Geographical Characteristics with Social Images

Zheng, Huangjie; Yao, Jiangchao; Zhang, Ya

doi:10.1007/978-3-319-51811-4_10

Huangjie Zheng, Jiangchao Yao & Ya Zhang

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10132)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Images play an important role in providing a comprehensive understanding of our physical world. When thinking of a tourist city, one can immediately imagine pictures of its famous attractions. With the boom of social images, we explore the possibility of describing the geographical characteristics of different regions. We propose a Geographical Latent Attribute Model (GLAM) to mine regional characteristics from social images, with the aim of providing a comprehensive view of the regions. The model assumes that a geographical region consists of different “attributes” (e.g., infrastructure, attractions, events and activities) and that “attributes” are interpreted by different image “clusters”. Both “attributes” and image “clusters” are modeled as latent variables. Experimental analysis on a collection of 2.5M Flickr photos of Chinese provinces and cities shows that the proposed model is promising for describing regional characteristics. Moreover, we demonstrate the usefulness of the proposed model for place recommendation.

Y. Zhang—The work is partially supported by the High Technology Research and Development Program of China 2015AA015801, NSFC 61521062, STCSM 12DZ2272600.

1 Introduction

Geotagged images are pervasive and provide an intuitive, objective view of our lives. Because of these properties, images readily reflect personal, regional, and even social characteristics, and a large body of research has used social images to improve people’s lives. Geographical analysis of social media has been widely investigated in recent years. While most existing studies focus on landmarks, under the assumption that landmarks are representative of regions [1,2,3,4], other aspects such as local festivals and events can also be essential for profiling a region. We therefore study the problem of forming a comprehensive description of geographical characteristics from social media. Such a description helps us better understand a region and supports applications such as tourism advertising.

While some existing applications such as tourist recommendation and location retrieval could also be extended to this problem [5,6,7,8], they mainly rely on textual information, e.g., social tags. In our view, geotagged photos offer an intuitive understanding of a specific region and can benefit applications in several domains. For example, a system could generate recommendations based on its understanding of images, which frees users from the effort of finding proper words to describe a region. Since the goal is to understand a region from images, the challenge lies in mapping low-level visual features to semantic characteristics.

In this paper, we propose a Geographical Latent Attribute Model (GLAM) to learn geographical characteristics from photo collections. We assume that each region consists of some latent “attributes” (considered as characteristics) and each “attribute” consists of image “clusters”. The motivation of our model is illustrated in Fig. 1 using Beijing as an example. A city may be described by several aspects (e.g., historical buildings), and each aspect includes different image clusters (e.g., antiques, temples, sculptures). These clusters are summarized from images taken in Beijing. Following the idea of the generative model, we introduce corresponding latent variables to formalize this procedure. By learning the latent parameters, a comprehensive view about geographical regions is formed.

The major contributions of this paper can be summarized as follows:

We propose a Geographical Latent Attribute Model (GLAM) to learn geographical characteristics from photo collections without using any textual information.
We validate the proposed model on 2.5M Flickr photos taken in China and demonstrate its effectiveness both qualitatively and quantitatively.
As one potential application, we propose a region recommendation strategy based on the similarity between a region’s characteristics and a user’s interest, as inferred from his or her photo album.

The rest of the paper is organized as follows. In Sect. 2, we review related work. Section 3 explains our model and its inference technique. Experimental results are presented in Sect. 4, and we conclude in Sect. 5.

2 Related Work

Much work has been done on geographical analysis. Ji et al. [2] propose a hierarchical structure to mine city landmarks at the view, scene and city layers. Fang et al. [9] analyze attributes at the region level for region exploration, and Porzi et al. [10] address urban understanding with CNNs. Hollenstein and Purves [11] and Cranshaw et al. [12] use social media to study how people form their understanding of a city. Similarly, Kennedy and Naaman [1] extract tags that represent landmarks to better present views of a region. In [3, 4], the authors find popular landmarks using mean shift.

This work is also related to applications such as location retrieval and tourist recommendation. Cao et al. [5] share the viewpoint that users are more interested in a geographic area than in precise GPS coordinates. Our work therefore focuses on recommending a suitable geographic area to users rather than estimating exact geographic coordinates. The methods in [6, 7] give personalized tourist recommendations based on users’ interests and their similarity, while our work focuses on the similarity between a user’s interest and geographic characteristics.

3 Model

3.1 Geographical Latent Attribute Model

The plate notation of GLAM is illustrated in Fig. 2. Assuming that we have M regions and that region m has $N_m$ images, we aim to learn the regional attribute distributions $\{\theta _m\}_{m=1,...,M}$ from these images. We first use GoogLeNet to extract a D-dimensional feature vector $v_{mn}$ for each image. Our problem can then be formalized as learning $\{\theta _m\}_{m=1,...,M}$ from the feature collection $\{v_{11},...,v_{MN_M}\}$.

We cast this problem as a generative procedure: each region has a distribution over characteristics, and each characteristic has a distribution over clusters, which are modeled as the components of a Gaussian mixture. Both “characteristic” and “cluster” are introduced as latent variables in this hierarchical structure and can be inferred from the observed variables $\{v_{11},...,v_{MN_M}\}$. The generative procedure is summarized as follows:

Choose regional characteristic proportion $\theta _{m} \sim Dir(\alpha )$.
Choose the characteristic of one image $i_{mn} \sim Multinomial(\theta _{m})$.
Choose the cluster $z_{mn} \sim Multinomial(\phi _{i_{mn}})$, where $ i_{mn} \in \{ 1,2,...,K \}$.
Choose each visual vector $v_{mn} \sim \mathcal {N}(\mu _{z_{mn}},\sigma _{z_{mn}})$, where $z_{mn} \in \{1,2,...,K^{\prime }\}$.

In our model, $\{(\mu _{k^{\prime }},\sigma _{k^{\prime }})\}_{k^{\prime }=1,...,K^{\prime }}$ constitute the visual space and $\{\phi _k\}_{k=1,...,K}$ capture the characteristic-cluster distributions. The latent variables $z_{mn}$ and $i_{mn}$ are determined by $v_{mn}$ and in turn affect the regional characteristic distribution $\theta _m$. In short, we use a topic model structure to learn high-level concepts at the top layer and employ a Gaussian mixture model to cluster low-level visual features at the bottom layer.
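The four generative steps can be simulated directly, which is a convenient way to check an implementation on synthetic data. The following is a minimal sketch, assuming that each cluster is an isotropic Gaussian whose scalar parameter $\sigma _{k^{\prime }}$ is a variance; the function name `sample_region` and all numeric values are illustrative and not part of the original work.

```python
import numpy as np


def sample_region(n_images, alpha, phi, mu, sigma, rng):
    """Draw synthetic feature vectors for one region via the four generative steps.

    alpha: (K,) Dirichlet prior; phi: (K, K') characteristic-to-cluster
    distributions; mu: (K', D) cluster means; sigma: (K',) cluster variances
    (isotropic covariance is an assumption of this sketch).
    """
    K, K_prime = phi.shape
    D = mu.shape[1]
    theta = rng.dirichlet(alpha)                               # step 1: theta_m ~ Dir(alpha)
    i = rng.choice(K, size=n_images, p=theta)                  # step 2: characteristic per image
    z = np.array([rng.choice(K_prime, p=phi[k]) for k in i])   # step 3: cluster per image
    noise = rng.standard_normal((n_images, D))
    v = mu[z] + np.sqrt(sigma[z])[:, None] * noise             # step 4: Gaussian feature vector
    return theta, i, z, v


# Toy configuration (hypothetical sizes, far smaller than in the experiments).
rng = np.random.default_rng(0)
K, K_prime, D = 3, 5, 4
alpha = np.full(K, 0.5)
phi = rng.dirichlet(np.ones(K_prime), size=K)
mu = rng.standard_normal((K_prime, D))
sigma = np.full(K_prime, 0.1)
theta, i, z, v = sample_region(20, alpha, phi, mu, sigma, rng)
```

Sampling from known parameters and then running inference on the result gives a simple sanity check: the recovered $\theta _m$ should be close to the one that generated the data.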

3.2 Inference and Learning

In this part, we present our inference algorithm. The key inferential problem of our model is to compute the posterior distribution of the latent variables given the data, as in Eq. 1.

$$\begin{aligned} p(\theta ,i,z|\alpha ,\phi ,\mu ,\sigma ,v) = \frac{p(\theta ,i,z,v|\alpha ,\phi ,\mu ,\sigma )}{p(v|\alpha ,\phi ,\mu ,\sigma )} \end{aligned}$$

(1)

The above posterior is intractable because the denominator cannot be integrated in closed form, so an approximate method such as Gibbs sampling or variational approximation [13] must be employed. In this paper, we adopt a mean-field variational Bayes method [14] (variational EM). Following this methodology, we define the variational distribution as

$$\begin{aligned} q(\theta ,i,z) = q(\theta |\gamma )q(i|\psi )q(z|\varPhi ), \end{aligned}$$

(2)

where $\gamma $ is the Dirichlet parameter and $\psi $, $\varPhi $ are the multinomial parameters. With this specification, the posterior over the latent variables can be approximated by minimizing the Kullback-Leibler (KL) divergence between the variational distribution in Eq. 2 and the true posterior in Eq. 1:

$$\begin{aligned} {\arg \min }_{(\gamma ,\psi ,\varPhi )}D(q(\theta ,i,z|\gamma ,\psi ,\varPhi )\,\Vert \,p(\theta ,i,z|\alpha ,\phi ,\mu ,\sigma ,v)) \end{aligned}$$

(3)

By setting the derivatives of the objective in Eq. 3 with respect to the free parameters $\gamma $, $\psi $, $\varPhi $ to zero, we obtain the following update equations, where i indexes regions and j indexes the images of a region (m and n in Sect. 3.1), and $\varPsi (\cdot )$ denotes the digamma function.

$$\begin{aligned} \varPhi _{ijk^{\prime }} \propto \exp (\sum _{k}\psi _{ijk}\log {\phi _{kk^{\prime }}})\mathcal {N}(v_{ij}|\mu _{k^{\prime }},\sigma _{k^{\prime }}) \end{aligned}$$

(4)

$$\begin{aligned} \psi _{ijk} \propto \exp (\varPsi (\gamma _{ik}))\exp (\sum _{k^{\prime }}\varPhi _{ijk^{\prime }}\log \phi _{kk^{\prime }}) \end{aligned}$$

(5)

$$\begin{aligned} \gamma _{ik} = \alpha _{k} + \sum _{j}\psi _{ijk} \end{aligned}$$

(6)
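The coordinate-ascent updates in Eqs. 4 to 6 can be sketched for a single region as follows. This is a minimal illustration rather than the authors’ implementation: it assumes isotropic Gaussians with variance $\sigma _{k^{\prime }}$, reads the logarithm term in Eq. 4 as the model parameter $\phi _{kk^{\prime }}$, works in the log domain for numerical stability, and uses a small self-written `digamma` helper to avoid extra dependencies. All function names are hypothetical.

```python
import numpy as np


def digamma(x):
    """Digamma function for x > 0 (upward recurrence, then asymptotic series)."""
    x = np.atleast_1d(np.asarray(x, dtype=float)).copy()
    shift = np.zeros_like(x)
    while np.any(x < 6.0):
        small = x < 6.0
        shift[small] -= 1.0 / x[small]   # psi(x) = psi(x + 1) - 1/x
        x[small] += 1.0
    inv2 = 1.0 / (x * x)
    series = np.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series


def normalize_log_rows(log_w):
    """Exponentiate and normalize each row, subtracting the row max for stability."""
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)


def e_step_region(v, alpha, phi, mu, sigma, n_iter=50):
    """Iterate Eqs. 4-6 for one region; v is (N, D), phi is (K, K')."""
    N, D = v.shape
    K = phi.shape[0]
    log_phi = np.log(phi)
    sq_dist = ((v[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)            # (N, K')
    log_gauss = -0.5 * D * np.log(2 * np.pi * sigma) - sq_dist / (2 * sigma)
    psi = np.full((N, K), 1.0 / K)     # uniform initialization
    gamma = alpha + N / K
    for _ in range(n_iter):
        Phi = normalize_log_rows(psi @ log_phi + log_gauss)                  # Eq. 4
        psi = normalize_log_rows(digamma(gamma)[None, :] + Phi @ log_phi.T)  # Eq. 5
        gamma = alpha + psi.sum(axis=0)                                      # Eq. 6
    return gamma, psi, Phi
```

A fixed iteration count is used here for brevity; in practice one would more likely stop when the change in $\gamma $ falls below a tolerance.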

The most common approach to estimating the model parameters is to maximize the likelihood of the observed variables, i.e., $p(v|\alpha ,\phi ,\mu ,\sigma )$. Although this likelihood has no closed-form expression, Jensen’s inequality yields a tractable lower bound.

$$\begin{aligned}&\ln p(v|\alpha ,\phi ,\mu ,\sigma ) \nonumber \\&= \ln \int \limits _\theta {\sum \limits _{i,z} {p(v,\theta ,i,z|\alpha ,\phi ,\mu ,\sigma )d\theta } } \nonumber \\&= \ln \int \limits _\theta {\sum \limits _{i,z} {\frac{{p(v,\theta ,i,z|\alpha ,\phi ,\mu ,\sigma )q(\theta ,i,z)}}{{q(\theta ,i,z)}}d\theta } } \\&\geqslant {E_q}(\ln p(v,\theta ,i,z|\alpha ,\phi ,\mu ,\sigma )) - {E_q}(\ln q(\theta ,i,z)) \nonumber \\&\triangleq L(\alpha ,\phi ,\mu ,\sigma ) \nonumber \end{aligned}$$

(7)
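As a reading aid, the bound in Eq. 7 can be expanded term by term. The expansion below is a sketch of the standard mean-field decomposition and is not taken from the original text; it assumes that the joint distribution factorizes according to the four generative steps of Sect. 3.1 and that q factorizes as in Eq. 2.

$$\begin{aligned} L(\alpha ,\phi ,\mu ,\sigma )&= {E_q}(\ln p(\theta |\alpha )) + {E_q}(\ln p(i|\theta )) + {E_q}(\ln p(z|i,\phi )) + {E_q}(\ln p(v|z,\mu ,\sigma )) \\&\quad - {E_q}(\ln q(\theta |\gamma )) - {E_q}(\ln q(i|\psi )) - {E_q}(\ln q(z|\varPhi )) \end{aligned}$$

Each model parameter appears in only one or two of these terms, which is why the maximization with respect to $\phi $, $\mu $ and $\sigma $ separates into simple closed-form updates.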

With the free parameters $\gamma $, $\psi $, $\varPhi $ fixed at their optimal values, we maximize the lower bound L by setting its derivatives with respect to the parameters $\phi $, $\mu $, $\sigma $ to zero, which gives the following solutions:

$$\begin{aligned} \phi _{kk^{\prime }} \propto \sum _{i}\sum _{j}\psi _{ijk}\varPhi _{ijk^{\prime }} \end{aligned}$$

(8)

$$\begin{aligned} \mu _{k^\prime } = \frac{\sum _i\sum _j\varPhi _{ijk^{\prime }}v_{ij}}{\sum _i\sum _j\varPhi _{ijk^{\prime }}} \end{aligned}$$

(9)

$$\begin{aligned} \sigma _{k^{\prime }} = \frac{\sum _i\sum _j\varPhi _{ijk^{\prime }}(\mu _{k^{\prime }}-v_{ij})^\mathrm {T}(\mu _{k^{\prime }}-v_{ij})}{D\sum _i\sum _j\varPhi _{ijk^{\prime }}} \end{aligned}$$

(10)

For the Dirichlet prior $\alpha $, we use the Newton-Raphson method, as in LDA [15]. Iterating the inference and parameter estimation steps until convergence yields the solution of our model.
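The closed-form parameter updates in Eqs. 8 to 10 can be sketched in a few lines, given the variational parameters of every region. This is an illustrative sketch under the assumption that $\sigma _{k^{\prime }}$ is an isotropic variance, as the factor D in Eq. 10 suggests; the function name `m_step` and its list-based arguments are hypothetical, and the Newton-Raphson update of $\alpha $ is omitted.

```python
import numpy as np


def m_step(v_list, psi_list, Phi_list):
    """Update phi, mu, sigma from per-region features and variational parameters.

    v_list[i]: (N_i, D) features; psi_list[i]: (N_i, K); Phi_list[i]: (N_i, K').
    """
    K = psi_list[0].shape[1]
    K_prime = Phi_list[0].shape[1]
    D = v_list[0].shape[1]
    phi_counts = np.zeros((K, K_prime))
    weight = np.zeros(K_prime)
    weighted_sum = np.zeros((K_prime, D))
    for v, psi, Phi in zip(v_list, psi_list, Phi_list):
        phi_counts += psi.T @ Phi          # sum_j psi_ijk * Phi_ijk'
        weight += Phi.sum(axis=0)          # sum_j Phi_ijk'
        weighted_sum += Phi.T @ v          # sum_j Phi_ijk' * v_ij
    phi = phi_counts / phi_counts.sum(axis=1, keepdims=True)   # Eq. 8, normalized per k
    mu = weighted_sum / weight[:, None]                         # Eq. 9
    sq = np.zeros(K_prime)
    for v, Phi in zip(v_list, Phi_list):
        sq_dist = ((v[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        sq += (Phi * sq_dist).sum(axis=0)
    sigma = sq / (D * weight)                                   # Eq. 10
    return phi, mu, sigma
```

With real data, a small floor on `phi` and `sigma` is usually advisable so that later logarithms and divisions remain finite when a cluster receives almost no weight.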

4 Experimental Results

To validate GLAM for geographical analysis, we evaluate it both qualitatively and quantitatively on a Flickr dataset of 2.5M photos. In addition, we show its potential for retrieving regions of interest.

4.1 Experimental Settings

We collected 6.5M photos with GPS information from the YFCC100M dataset [16]. Using GADM (Note 1), a database containing the boundary geo-coordinates of each administrative region, we filter out the photos not taken in China; the remaining 2.5M photos are divided into 34 groups according to administrative region, as shown in Fig. 3. One feature vector is extracted for each image from the dropout layer (the second-to-last layer) of GoogLeNet [17].
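Assigning a photo to an administrative region from its GPS coordinates amounts to a point-in-polygon test against the region boundaries. The sketch below uses the classic ray-casting test on a single simple polygon per region and treats longitude and latitude as planar coordinates. The original text does not describe how the filtering was implemented, so this is only an illustration: real GADM boundaries are multi-polygons, for which a GIS library would normally be used, and the region names and coordinates below are made up.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count boundary crossings of a ray extending toward +longitude."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > lat) != (y2 > lat):                       # edge spans the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_region(lon, lat, regions):
    """Return the first region whose boundary contains the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Two hypothetical square "regions" given as (lon, lat) vertex lists.
regions = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

Photos for which `assign_region` returns `None` lie outside every listed boundary and would be discarded, which mirrors the filtering of photos not taken in China.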

Table 1. Comparing the correlation between ground truth and the three types of features.

4.2 Quantitative Evaluation

In this section, we provide a quantitative evaluation of GLAM, which aims to find a better description of regions based on social images. Since textual content is good at conveying semantic information, we use documents from the online tour guide “TravelChinaGuide” (Note 2), the largest and most authoritative online tour operator in China, as a reference for comparison. Each document covers a general introduction, facts, and even details of daily life for a region. We build topic models with LDA [15] from these documents and compute the Euclidean distance between regions based on the learned topic distributions. Similarly, we compute the distance between regions based on the features learned by GLAM, by a Gaussian Mixture Model (GMM), and on the average visual features extracted directly from GoogLeNet. The corresponding distance matrices are shown in Fig. 4, where brighter colors mean higher similarity. The matrix produced by our model is the most similar to that of the textual features, suggesting that our model generates a better semantic description of regions.
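The region-to-region distance matrices compared in this section can be computed from any per-region feature matrix, whether its rows are topic distributions, GLAM characteristic distributions, or averaged CNN features. A minimal sketch follows; the function name `pairwise_euclidean` and the toy input are illustrative.

```python
import numpy as np


def pairwise_euclidean(features):
    """Return the (M, M) matrix of Euclidean distances between the rows of features."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Toy example with two "regions" described by 2-dimensional features.
dist = pairwise_euclidean(np.array([[0.0, 0.0], [3.0, 4.0]]))
```

The result is symmetric with a zero diagonal, so only the upper triangle carries information when two such matrices are compared.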

To quantify this comparison, we employ Kernel Canonical Correlation Analysis (KCCA) to compute the correlation between the distance matrix obtained from the textual features and those obtained from the three types of visual features. As shown in Table 1, we learn 5, 10, 15, 20, 25 and 30 topics from the textual features. GLAM is trained separately with 200 and 500 clusters, and the number of characteristics K is set to 10, 15 and 20. The distance matrices built from GMM and average visual features correlate only weakly with that of the textual features, with highest correlations of 0.52 and 0.46, respectively, while the highest correlation for GLAM is 0.82, confirming that GLAM is closer to the textual features in terms of semantic region description. This advantage arises because geographical characteristics are abstract and semantic, whereas GMM and CNN features lack a mechanism to model semantic structure, which makes it difficult for them to discover complex patterns.
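As a simpler stand-in for KCCA, which is what the evaluation above actually uses, the agreement between two distance matrices can be checked roughly with the Pearson correlation of their upper-triangular entries. The sketch below shows only this linear proxy; it does not reproduce KCCA or the values in Table 1, and the function names are hypothetical.

```python
import numpy as np


def upper_triangle(dist):
    """Flatten the strictly upper-triangular entries of a square matrix."""
    rows, cols = np.triu_indices(dist.shape[0], k=1)
    return dist[rows, cols]


def distance_matrix_correlation(dist_a, dist_b):
    """Pearson correlation between the off-diagonal distances of two matrices."""
    a, b = upper_triangle(dist_a), upper_triangle(dist_b)
    return float(np.corrcoef(a, b)[0, 1])
```

A value near 1 indicates that the two feature types order region pairs similarly, whereas KCCA can additionally capture nonlinear relations between them.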

4.3 Qualitative Evaluation

We illustrate the results with an example (Fig. 5). A region is described by its dominant characteristics, and each characteristic is described by its top 5 clusters. Here we present only one set of results, with the number of characteristics and the number of clusters set to 15 and 500, respectively, the setting with the strongest correlation in Table 1. The remaining results can be accessed at: https://sites.google.com/site/geolatentim/.

Take Beijing and Shanghai, two famous cities in China, as an example. As shown in Fig. 5, characteristic 11 dominates Beijing’s characteristic distribution and can be regarded as the main descriptor of Beijing. To interpret this characteristic, we pick its top 5 representative clusters. We manually summarize these five clusters as Chinese antiques, Chinese towers, Chinese architecture, Chinese roof decoration and pedestrian streets, indicating that people in Beijing prefer a traditional Chinese atmosphere. This conclusion fits Beijing well, since it is the national center of Chinese history and culture and historical sites are common there. Similarly, Shanghai, the economic center of China, appears as a modern city with a large population: its characteristics are mainly described by skyscrapers, city scenes, urban night views, modern traffic, and street scenes with crowds. Among all the regions (Note 3), it is notable that some are dominated by a single characteristic (e.g., Beijing, Shanghai), while others possess diverse characteristics (e.g., Sichuan, Shandong) for geographical and cultural reasons.

4.4 City Recommendation

In this section, we introduce a strategy for region recommendation based on a user’s photo album. We evaluate the effectiveness of GLAM for recommendation using the Mean Reciprocal Rank (MRR) metric.

A photo collection can reflect a user’s interests, since it contains snapshots of things the user likes. We design a recommendation strategy based on the similarity between a user’s interest and a region’s geographical characteristics. First, we compute an interest distribution $\theta _{new}$ for a photo collection by Eq. 6. Then, we measure the similarity between this distribution and each region’s characteristics with the following distance:

$$\begin{aligned} d_i = ||\theta _i - \theta _{new}||^{2}, \quad i=1,...,M \end{aligned}$$

where $\theta _i$ is the characteristic distribution of the $i^{th}$ region. The smaller the distance, the more similar the collection and the region. The 3 most similar provinces are returned as the recommendation. In our experiments, we crawled additional photos with GPS information from the Flickr community (Note 4), not included in our training data, for both quantitative and qualitative evaluation.
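The recommendation step reduces to a nearest-neighbor search over the regional distributions. In the sketch below, `theta_new` is assumed to be obtained by normalizing the $\gamma $ vector from Eq. 6 so that it sums to one (the Dirichlet mean), which the text does not state explicitly; the function name, region names and numbers are purely illustrative.

```python
import numpy as np


def recommend_regions(theta_regions, theta_new, names, top_k=3):
    """Rank regions by squared Euclidean distance to the album's distribution."""
    d = ((theta_regions - theta_new) ** 2).sum(axis=1)
    order = np.argsort(d)[:top_k]
    return [(names[i], float(d[i])) for i in order]


# Hypothetical characteristic distributions for four regions (K = 3).
theta_regions = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.4, 0.4, 0.2],
])
names = ["A", "B", "C", "D"]
gamma_new = np.array([7.0, 2.0, 1.0])        # e.g. the output of Eq. 6 for an album
theta_new = gamma_new / gamma_new.sum()      # assumed normalization
ranking = recommend_regions(theta_regions, theta_new, names)
```

For this toy input the closest region is "A", followed by "D" and "B".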

For quantitative evaluation, we use the GPS information to choose 100 images from a province to form a virtual album, and the province is regarded as the label of this album. We then feed an increasing number of images from each album, up to 100, and compute the average MRR to measure recommendation accuracy. Figure 6 presents the average recommendation accuracy with different parameters. The best average MRR of the GLAM region feature ($K=15, K^{\prime }=500$) exceeds 40% when more than 70 images are input. Since an MRR of 0.4 corresponds to a harmonic-mean rank of 2.5, the labeled region typically appears within the top 3 recommended regions, which gives a reliable recommendation. GLAM, GMM features and average CNN visual features perform similarly when the number of input images is small. With more input images, however, our model clearly outperforms the GMM features and the average CNN visual features, because more images better cover a user’s personal characteristics.

For qualitative evaluation, we randomly pick several users and, from each user’s photo collection, randomly select 100 images to form a test album. Since the setting with 15 “attributes” and 500 “clusters” provides the best performance (Fig. 6), we use it here. Figure 7 presents one example: a photo collection consisting mostly of nature scenes with mountains and watersides, which suggests that its owner may be fond of traveling in nature. Our method recommends Yunnan, Chongqing and Jiangxi, which are famous for their landscapes. Browsing the photos from these regions, we observe that the scenery is similar to that in the photo collection.
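The MRR metric used in the quantitative evaluation averages the reciprocal of the rank at which each album’s label region appears. A minimal sketch follows, assuming that an album whose label is missing from the ranking contributes zero; the function name and the toy rankings are illustrative.

```python
def mean_reciprocal_rank(ranked_lists, labels):
    """Average of 1/rank of the true label; 0 for an album whose label is not ranked."""
    total = 0.0
    for ranking, label in zip(ranked_lists, labels):
        if label in ranking:
            total += 1.0 / (ranking.index(label) + 1)
    return total / len(labels)


# Three toy albums, all labeled "A", ranked first, second and third respectively.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
mrr = mean_reciprocal_rank(rankings, ["A", "A", "A"])
```

Here the reciprocal ranks are 1, 1/2 and 1/3, so the MRR is 11/18, roughly 0.61.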

5 Conclusion

In this paper, treating “attributes” as descriptors of regional characteristics, we have sought to estimate how relevant each characteristic is to a region and to describe the region by its most relevant ones. Representative clusters, formed from social images, are picked out to present the attributes of regions. Experiments on photos taken in China demonstrate, both qualitatively and quantitatively, that our model can semantically describe a region from image content. The regional features extracted by our model support a recommendation strategy that provides reliable results and outperforms GMM features and average CNN features in our experiments. Our model is therefore promising for a range of applications and could be further developed in future work on geographical characteristics.

Notes

1. https://www.gadm.org/
2. https://www.travelchinaguide.com/
3. To see other examples with different parameter sets, please go to our website: https://sites.google.com/site/geolatentim/.
4. https://www.flickr.com/

References

1. Kennedy, L.S., Naaman, M.: Generating diverse and representative image search results for landmarks. ACM, New York, April 2008
2. Ji, R., Xie, X., Yao, H., Ma, W.-Y.: Mining city landmarks from blogs by graph modeling. ACM, October 2009
3. Crandall, D.J., Backstrom, L., Huttenlocher, D., Kleinberg, J.: Mapping the world’s photos. ACM, New York, April 2009
4. Crandall, D., Snavely, N.: Modeling people and places with internet photo collections. Commun. ACM 55, 52–60 (2012)
5. Cao, L., Yu, J., Luo, J., Huang, T.S.: Enhancing semantic and geographic annotation of web images via logistic canonical correlation regression. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 125–134. ACM (2009)
6. Clements, M., Serdyukov, P., de Vries, A.P., Reinders, M.J.T.: Using Flickr geotags to predict user travel behaviour. ACM, New York (2010)
7. Popescu, A., Grefenstette, G.: Mining social media to create personalized recommendations for tourist visits. In: Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications, COM.Geo 2011, pp. 37:1–37:6. ACM, New York (2011)
8. Li, J., Qian, X., Lan, K., Qi, P., Sharma, A.: Improved image GPS location estimation by mining salient features. Image Commun. 38(C), 141–150 (2015)
9. Fang, Q., Sang, J., Xu, C.: Giant: geo-informative attributes for location recognition and exploration. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 13–22. ACM (2013)
10. Porzi, L., Bulò, S.R., Lepri, B., Ricci, E.: Predicting and understanding urban perception with convolutional neural networks. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 139–148. ACM (2015)
11. Hollenstein, L., Purves, R.: Exploring place through user-generated content: using Flickr tags to describe city cores. J. Spat. Inf. Sci. 2010, 21–48 (2010)
12. Cranshaw, J., Schwartz, R., Hong, J.I., Sadeh, N.: The livehoods project: utilizing social media to understand the dynamics of a city. In: International AAAI Conference on Weblogs and Social Media, p. 58 (2012)
13. Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
14. Xing, E.P., Jordan, M.I., Russell, S.: A generalized mean field algorithm for variational inference in exponential families. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, pp. 583–591. Morgan Kaufmann Publishers Inc. (2002)
15. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
16. Thomee, B., Elizalde, B., Shamma, D.A., Ni, K., Friedland, G., Poland, D., Borth, D., Li, L.-J.: YFCC100M: the new data in multimedia research. Commun. ACM 59(2), 64–73 (2016)
17. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9. IEEE (2015)

Author information

Authors and Affiliations

SJTU-ParisTech Elite Institute of Technology, Shanghai Jiao Tong University, Shanghai, China
Huangjie Zheng
Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China
Huangjie Zheng, Jiangchao Yao & Ya Zhang

Corresponding author

Correspondence to Huangjie Zheng.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
Reykjavík University, Reykjavik, Iceland
Gylfi Þór Guðmundsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

About this paper

Cite this paper

Zheng, H., Yao, J., Zhang, Y. (2017). Describing Geographical Characteristics with Social Images. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_10

DOI: https://doi.org/10.1007/978-3-319-51811-4_10
Published: 31 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4
eBook Packages: Computer Science, Computer Science (R0)
