1 Introduction

Many factors affect the audience rating of a TV show, e.g., the broadcasting time, the influence of the TV channel, and the marketing strategy. One notable phenomenon is that the popularity of a TV show usually differs across regions. In China, regional culture is an important factor in determining audience rating, and the regional division of viewing preferences has become increasingly pronounced in recent years [16, 21]. For example, in 2010 the TV show “Country Love” achieved a high audience rating of 21.41 % in Shenyang but less than 5 % in Guangzhou [21]. Some TV shows resonate with audiences in some regions but fail in others. Hence, it is important to predict the audience rating of a TV show before it is broadcast and to choose the right strategy for each region. However, the preference of a region’s audience is hard to describe or quantify, and it is not easy to establish the relationship between a TV program and a region. In this paper, we explore the problem of automatic popularity prediction for TV shows in different local regions from the angle of multimedia analysis. In essence, this is a recommendation problem whose purpose is to offer new TV shows to a region based on multiple sources of data.

Recent years have witnessed the rapid development of social networks. People spend more time on social networks and feel freer to express their views there. Many applications, such as social image search [6, 14], have been proposed. For TV shows, a social network platform such as Sina Weibo is a good place to promote the shows and gather comments from the audience. In a typical scenario, a viewer who has had a good or bad experience with a TV show posts his or her impressions on the social network and sometimes mentions friends using ‘@’. Moreover, the location of the users is also an important factor, as shown in [20]. To analyze the microblogs from a region on a platform such as Sina Weibo, we can select microblogs according to the registration place of the users who posted them. Hence, social networks offer an effective and convenient way to analyze TV show popularity.

Considerable effort has been devoted to recommendation systems in recent years, e.g., [9, 18, 24]. In [18], location semantic analysis is proposed to model every location for news article recommendation. In this method, each location has its own geographical topics, just as articles have topic vectors. Kim, Pyo et al. [9] proposed a TV program recommendation scheme in which the users’ interests in watched TV program contents are implicitly inferred and the recommendation is performed by collaborative filtering. Kaminskas et al. [8] proposed a location-aware music recommendation method, which is mainly based on representing both POIs and music with tags. These location-aware recommendation methods generally recommend items by matching the location profile with the semantic content of the items. They are not suitable for our location-aware TV show recommendation because new TV shows generally do not have any semantic tags and video auto-tagging is not reliable. To the best of our knowledge, location-aware video recommendation has not been studied, and no existing method can be applied directly.

How to represent a geographical region has been studied extensively in recent years. Yin et al. [25] proposed a latent geographical topic analysis method that combines location and text information. This method can be used to find regions of interest and compare topics across different locations. Son et al. [19] proposed a probabilistic explicit semantic analysis model to model topics probabilistically and construct the relevance of locations. However, these methods work mainly on general topics rather than TV show related topics, and their corpus is drawn mainly from Wikipedia.

In this paper, we propose an automatic location-aware TV show recommendation method. The method predicts the popularity of a TV show in a specific region before it is broadcast. Multiple data sources are employed, including the social media and geographic information of a region and the audio and visual features of the TV show. This work makes three main contributions. First, to the best of our knowledge, it is the first location-based TV show recommendation work that incorporates microblog information, geographic information, visual information, and audio information. Second, we propose a location profile representation based on geographic information and microblog text information. Third, we develop a TV show similarity metric learned from multiple video and audio features. A real-world dataset has been collected, and experiments have been conducted to demonstrate the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 introduces the whole work flow of this work. In Sects. 3 and 4, we detail the location profile model and the TV show representation model, respectively. Section 5 shows the experimental results. Finally, the paper is concluded in Sect. 6.

2 Whole framework

Fig. 1 The overall process of the localized TV show recommendation

We denote the popularity of different TV shows across different locations as a location-show matrix \(M\). Here, \(M_{ls}\) is the popularity of show \(s\) at location \(l\). For a new TV show \(v\), we first initialize the popularity using the popularity of other similar shows at the corresponding location:

$$\begin{aligned} M_{lv} = \sum _{u=1}^{n} M_{lu} \times {\rm SIM}^V_{uv}, \end{aligned}$$
(1)

where \(u = 1, \ldots , n\) indexes the TV shows with known popularity and \({\rm SIM}^V_{uv}\) denotes the similarity between TV show \(u\) and TV show \(v\). A relevance graph is constructed based on the relevance among different shows, and a random walk-based updating procedure is employed:

$$\begin{aligned} M_{lv}^{t+1} = \beta M_{lv}^0 + (1-\beta ) \sum _{u \ne v} {\rm SIM}^V_{uv} M_{lu}^t \end{aligned}$$
(2)
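The initialization in (1) and the update in (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the default \(\beta = 0.5\), and the fixed number of iterations are assumptions, since the text does not state them. The iteration settles to a fixed point only if \((1-\beta )\,{\rm SIM}^V\) (with zeroed diagonal) has spectral radius below 1; whether the similarities are normalized for this purpose is not specified.

```python
import numpy as np

def init_popularity(M_known, sim_to_new):
    """Eq. (1): M_lv = sum_u M_lu * SIM^V_uv for one new show v.

    M_known:    (n_locations, n_shows) popularity of shows with known ratings.
    sim_to_new: (n_shows,) similarity of each known show u to the new show v.
    """
    return M_known @ sim_to_new

def random_walk_update(M0, sim_v, beta=0.5, n_iter=50):
    """Eq. (2): M^{t+1} = beta * M^0 + (1 - beta) * M^t S, excluding u == v.

    beta and n_iter are assumed values, not taken from the paper.
    """
    S = np.array(sim_v, dtype=float)
    np.fill_diagonal(S, 0.0)           # the sum in Eq. (2) skips u == v
    M = np.array(M0, dtype=float)
    for _ in range(n_iter):
        M = beta * M0 + (1.0 - beta) * M @ S
    return M
```

Here each column of `M0` corresponds to one show, so the same update is applied to all shows at once.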

The initialized location-show popularity matrix is noisy. Latent representations of locations and shows are therefore desirable for accurate popularity prediction, as with a traditional user-item matrix. Given the location-show matrix \(M\), we aim to find the latent representations \(L\) of locations and \(V\) of shows such that \(M \approx L V^T\). Non-negative matrix factorization (NMF) [10] is a popular method to achieve this goal. Moreover, to incorporate location similarity, we employ graph regularized non-negative matrix factorization (GNMF) [2]. In our proposed method, we aim to minimize

$$\begin{aligned} || M - L V^T||^2 + \lambda \sum _{ij} || L_i - L_j ||^2 {\rm SIM}^L_{ij}, \end{aligned}$$
(3)

where \({\rm SIM}^L_{ij}\) denotes the similarity between locations \(i\) and \(j\) and \(\lambda \) is the regularization weight, so that similar locations tend to have similar latent representations.
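As an illustration of how (3) can be minimized, the following sketch uses multiplicative updates in the style of standard GNMF solvers. The update rules are derived here for the objective in (3) under the assumption that \({\rm SIM}^L\) is symmetric and non-negative; the rank, \(\lambda \), iteration count, and random initialization are assumed values, and the paper does not state which solver was actually used.

```python
import numpy as np

def gnmf_objective(M, L, V, sim_l, lam):
    """Value of Eq. (3): ||M - L V^T||^2 + lam * sum_ij ||L_i - L_j||^2 SIM^L_ij."""
    recon = np.sum((M - L @ V.T) ** 2)
    diff = L[:, None, :] - L[None, :, :]              # all pairwise L_i - L_j
    smooth = np.sum(np.sum(diff ** 2, axis=2) * sim_l)
    return recon + lam * smooth

def gnmf(M, sim_l, rank=2, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update sketch for Eq. (3); all defaults are assumptions."""
    rng = np.random.default_rng(seed)
    n_loc, n_show = M.shape
    L = rng.random((n_loc, rank)) + 0.1               # location factors
    V = rng.random((n_show, rank)) + 0.1              # show factors
    D = np.diag(sim_l.sum(axis=1))                    # degree matrix of the location graph
    for _ in range(n_iter):
        # The factor 2*lam comes from sum_ij ||L_i - L_j||^2 S_ij = 2 tr(L^T (D - S) L).
        L *= (M @ V + 2 * lam * sim_l @ L) / (L @ (V.T @ V) + 2 * lam * D @ L + eps)
        V *= (M.T @ L) / (V @ (L.T @ L) + eps)
    return L, V
```

The predicted popularity matrix is then `L @ V.T`; both factors stay non-negative because every update multiplies by a non-negative ratio.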

The overall process is illustrated in Fig. 1. We extract geographical and topic features for location representation and fuse the two for the computation of \({\rm SIM}^L_{ij}\) used in (3). The TV show features, consisting of a visual part and an audio part, are computed, and the video similarity metric \({\rm SIM}^V_{uv}\) used in (2) is learnt. The feature representations and similarity metrics are then fed into the collaborative framework to predict TV show popularity.

3 Location profile

In this paper, we characterize a location by two factors: its geographical coordinates and the textual comments on TV shows obtained from Sina Weibo.

3.1 Geographical location

In this paper, each location is a province of China. In most cases, two adjacent provinces have similar cultures and customs, which likely leads to similar preferences for TV programs. Several features can be used to calculate location similarity, including shape, orientation, and position [22]. In our case, we use only the geographical distance as the basis of the location similarity metric.

We represent the location of each province by the coordinate \(\mathbf {L}\), given by the longitude and latitude of its capital, which are looked up from Baidu Map. To calculate the spatial distance \(d_{ij}\) between two capitals \(i\) and \(j\), we apply the Haversine formula to their coordinates. We then calculate the similarity of two regions according to

$$\begin{aligned} {\rm SIM}^G(i,j) = \exp \left( -\frac{d_{ij}^2}{\sigma _G}\right) , \end{aligned}$$
(4)

where we choose \(\sigma _G\) empirically.
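For concreteness, the distance and similarity computation can be sketched as follows. The mean Earth radius of 6371 km and the use of kilometres are assumptions made for illustration; the value of \(\sigma _G\) is left as a parameter because it is chosen empirically and not reported.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not specified in the text

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance d_ij between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_similarity(d_km, sigma_g):
    """Eq. (4): SIM^G = exp(-d^2 / sigma_G)."""
    return math.exp(-d_km ** 2 / sigma_g)
```

Because the distance is squared in (4), \(\sigma _G\) has units of squared distance, so a suitable value depends on the unit chosen for \(d_{ij}\).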

3.2 Social network information

Provinces that are far apart may also have similar TV show preferences, which can be reflected by what audiences post on social networks. To explore this phenomenon, we employ information from social networks. Nowadays, after watching a TV show, many people use social network platforms to post related information, including comments and recommendations to friends. Hence, the texts about a TV show in a region can reflect the characteristics of this region. To extract these characteristics, each microblog must first be assigned to a region.

The most frequently used source of location information for microblogs is GPS information. However, GPS information has two drawbacks. First, on some social network platforms such as Sina Weibo, only a small fraction of microblogs contain GPS information, so a massive number of TV show related microblogs would go unused. Second, a GPS position does not mean that the user actually lives in the region; he or she might be a tourist or merely passing through. In this work, different from [18, 19, 25], we take the location of a microblog to be the registration place of its poster, which is almost always available. The registration place of a user is where he or she has stayed or will stay for a rather long term; hence, the views of this user can stand for the place where he or she lives.

Fig. 2 Illustration of how microblogs are assigned to each region

Suppose we have \(M\) regions and \(N\) TV shows altogether. For TV show \(i\), we collect all the microblog texts \(T_i\) related to this show. Then, as depicted in Fig. 2, we divide \(T_i\) by region and obtain \(\left\{ T_i^j\right\} \), where \(T_i^j\) denotes the texts related to TV show \(i\) in region \(j\). Pooling all the TV shows for one region \(j\) gives the text set \(T_j = \left\{ T_i^j\right\} _{i=1}^N\). In the next step, we extract the topics of each region: for each \(T_j\), we compute the topic vector \(l_j = \langle \phi _1(T_j), \phi _2(T_j), \ldots , \phi _{N_{To}}(T_j)\rangle \), where \(N_{To}\) is the number of topics. We employ the Latent Dirichlet Allocation (LDA) model [1] to extract the topics, with the corpus consisting of all the microblog texts related to all the TV shows.
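The partitioning step can be illustrated with a small sketch using only the Python standard library. The input format, a sequence of `(show, region, text)` tuples, and both function names are hypothetical; the topic extraction with LDA is not shown.

```python
from collections import defaultdict

def split_by_region(microblogs):
    """Group texts into the sets T_i^j: {region j: {show i: [texts]}}.

    microblogs: iterable of (show, region, text) tuples, where region is the
    registration place of the poster (a hypothetical input format).
    """
    per_region = defaultdict(lambda: defaultdict(list))
    for show, region, text in microblogs:
        per_region[region][show].append(text)
    return {region: dict(shows) for region, shows in per_region.items()}

def region_corpus(per_region, region):
    """Pool T_i^j over all shows i to obtain the text set T_j of one region."""
    return [text for texts in per_region.get(region, {}).values() for text in texts]
```

The pooled list returned by `region_corpus` plays the role of \(T_j\), the document from which the topic vector of region \(j\) is inferred.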

With this location representation, we calculate the textual similarity between two locations as the cosine similarity of their topic vectors:

$$\begin{aligned} {\rm SIM}^T(l, l') = \frac{l \cdot l'}{\Vert l \Vert \cdot \Vert l' \Vert }. \end{aligned}$$
(5)

3.3 Similarity by feature fusion

Given the geographical similarity and the textual similarity derived from Sina Weibo, we compute the similarity between two locations by fusing them linearly according to

$$\begin{aligned} {\rm SIM}^L = \alpha {\rm SIM}^G + \left( 1 - \alpha \right) {\rm SIM}^T, \end{aligned}$$
(6)

where \(\alpha \) is a coefficient determining the weight of the geographical location feature, which we set to \(0.1\) throughout this paper.
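The two similarity computations of this section can be combined in a short sketch. The function names are illustrative, and the topic vectors are assumed to be non-zero sequences of equal length.

```python
import math

def cosine_similarity(l1, l2):
    """Eq. (5): cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(l1, l2))
    norm1 = math.sqrt(sum(a * a for a in l1))
    norm2 = math.sqrt(sum(b * b for b in l2))
    return dot / (norm1 * norm2)

def fuse_location_similarity(sim_g, sim_t, alpha=0.1):
    """Eq. (6): SIM^L = alpha * SIM^G + (1 - alpha) * SIM^T."""
    return alpha * sim_g + (1.0 - alpha) * sim_t
```

With \(\alpha = 0.1\), the fused similarity is dominated by the textual term, so two distant provinces with similar topic vectors still receive a high \({\rm SIM}^L\).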

4 TV program features

If two videos are similar, we can expect them to achieve similar popularity in the same region. A video consists of consecutive frames and audio. To model a video, we extract visual and audio features, which are then used to compute the similarity between videos. The feature can be represented as \(f = \left( \phi _1, \ldots , \phi _N\right) \).

4.1 Visual features

Visual features are important when measuring the similarity between two videos. When two videos show visually similar shots, they tend to attract similar audiences. To reduce the computation overhead, we sample one frame every 5 s from a video. Two kinds of visual features are extracted on each sampled frame and concatenated to form one visual feature vector per frame; the average of these vectors over all sampled frames is taken as the visual feature vector of the video. In [7], several visual features were tested for interestingness recognition. Here, we employ the two features that achieved the best performance in their two-feature fusion experiments. Our intuition is that interestingness recognition depends on video content and resembles a similarity comparison process. Hence, the visual features that are effective in [7] should also perform well for video similarity computation.

Sparse SIFT (Scale Invariant Feature Transform) feature. The SIFT feature is very popular and effective in applications such as image retrieval and object recognition. In this work, for each sampled frame, we employ the same features used in [17]. In detail, interest points are first detected on each sampled frame using the SIFT detector [11] and the MSER (Maximally Stable Extremal Regions) detector [12], respectively, and SIFT descriptors are then computed on each interest point region. Next, the SIFT descriptors are quantized into a bag-of-words (BOW) representation using a spatial pyramid with a codebook of 500 words. The BOW feature vectors extracted from the two types of interest points are concatenated to form a higher-dimensional feature vector.

HOG2x2 (Histogram of Oriented Gradients) features. The HOG feature was first proposed for pedestrian detection, where it achieved state-of-the-art performance [3]. Unlike sparse SIFT features, which are extracted on sparsely distributed interest points, the HOG feature is computed on densely sampled image patches. Following [23], HOG feature vectors from \(2 \times 2\) neighborhoods are stacked together to form a descriptor with more descriptive power. The new feature vector is quantized into 300 visual words, and a spatial pyramid feature is then computed as in the sparse SIFT feature computation.

4.2 Audio features

Audio is very important for a TV program. Without sound, the audience cannot understand the dialog or catch the tone between the characters. In particular, for programs such as talk shows or music election shows, audio plays a more important part than the visuals. Audio features therefore play an important role when comparing two videos. In this work, we choose the two features that performed best among the two-feature subsets in [7].

MFCC (Mel-Frequency Cepstral Coefficients). MFCC is a frequently used audio feature in applications such as speech recognition [5] and music information retrieval [13], and it works well for audio similarity measurement. We divide the audio into \(32\)-ms-long frames with 50 % overlap and compute MFCC on each audio frame. As with the visual features, the MFCC features are quantized into a BOW representation.

Spectrogram SIFT. We use the feature from [7]. An image visualizing the energy distribution is synthesized from the constant-Q spectrogram of the processed audio. On this energy map, SIFT descriptors are computed and further quantized into BOW.
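The framing used for the MFCC computation, \(32\)-ms frames with 50 % overlap, can be sketched as follows. The sample rate is an input because it is not stated in the text; the function name and the choice to drop a trailing partial frame are assumptions. The MFCC computation itself is not shown.

```python
import numpy as np

def frame_audio(signal, sample_rate, frame_ms=32.0, overlap=0.5):
    """Split a 1-D signal into overlapping frames (one frame per row).

    With frame_ms=32 and overlap=0.5, the hop between frames is 16 ms.
    A trailing partial frame is dropped (an assumed convention).
    """
    signal = np.asarray(signal)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap)))
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row k holds samples k*hop ... k*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]
```

For example, at an assumed sample rate of 16 kHz each frame holds 512 samples and consecutive frames start 256 samples apart.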

4.3 Similarity metric learning for videos

We concatenate the visual feature vector and the audio feature vector to form a \(P \times 1\) feature vector \(\mathbf {v}\) that represents the video. However, how to compute the distance between two such feature vectors remains a problem. The frequently used Euclidean distance, cosine distance, and intersection distance are candidates; however, they work well only for a single feature type. For multiple features, a good distance metric is needed.

Suppose we have a training video set \(\mathbf {S} = \left\{ S_1, S_2, \ldots , S_M \right\} \) consisting of \(M\) subsets that correspond to different categories of videos. Each subset \(S_i\) contains \(N_i\) videos, and there are \(N=\sum _i N_i\) videos in total. We propose to learn the distance metric with a method similar to [15]. Our objective for distance metric learning is to minimize the distance between videos from the same category and maximize the distance between videos from different categories, which can be written as

$$\begin{aligned} \min \left\{ \sum _{\mathbf {v}_i \in \mathbf {S}} \left( \sum _{\mathbf {v} \in S_i^+} d \left( \mathbf {v}_i, \mathbf {v} \right) - \sum _{\mathbf {v} \in S_i^-} d \left( \mathbf {v}_i, \mathbf {v} \right) \right) \right\} , \end{aligned}$$
(7)

where \(S_i^+\) denotes the subset video \(\mathbf {v}_i\) belongs to, and \(S_i^-\) denotes the subset video \(\mathbf {v}_i\) does not belong to. To simplify the computation, formula (7) can be rewritten as

$$\begin{aligned} \min \left\{ \sum _{i=1}^N \left( \sum _{j=1}^{n_+} d \left( \mathbf {v}_i, \mathbf {v}_{j+} \right) - \sum _{j=1}^{n_-} d \left( \mathbf {v}_i, \mathbf {v}_{j-} \right) \right) \right\} , \end{aligned}$$
(8)

where \(\mathbf {v}_{j+}\) is a video in subset \(S_i^+\), \(n_+\) is the number of videos in \(S_i^+\), and \(\mathbf {v}_{j-}\) is the video with the shortest distance to \(\mathbf {v}_i\) within one of the subsets that \(\mathbf {v}_i\) does not belong to.

Here we define the distance between two videos using the Mahalanobis distance metric

$$\begin{aligned} d \left( \mathbf {v}_i, \mathbf {v}_j \right) = \left( \mathbf {v}_i - \mathbf {v}_j \right) ^T \mathbf {W} \left( \mathbf {v}_i - \mathbf {v}_j \right) , \end{aligned}$$
(9)

where \(\mathbf {W}\) is the weight matrix. When \(\mathbf {W}\) is the identity matrix, formula (9) reduces to the squared Euclidean distance. To make the problem well-posed, we add the constraint \(\det \left( \mathbf {W} \right) = 1\). Using the method of Lagrange multipliers, the optimization problem in (8) can be written as

$$\begin{aligned} L = \sum _{i=1}^N \sum \limits _{j = 1}^{{n_ + }} {d\left( {{\mathbf {v}_i},{\mathbf {v}_{j + }}} \right) } - \sum _{i=1}^N \sum \limits _{j = 1}^{{n_ - }} {d\left( {{\mathbf {v}_i},{\mathbf {v}_{j - }}} \right) } + \lambda \left( {\det \left( {{\mathbf {W}}} \right) - 1} \right) \end{aligned}$$
(10)

To optimize (10) we can derive:

$$\begin{aligned} \begin{array}{l} \displaystyle \frac{{\partial L}}{{\partial {w_{{{st}}}}}}= \sum _{i=1}^N \sum \limits _{j = 1}^{{n_ + }} {\displaystyle \frac{{\partial d\left( {{{\mathbf {v}_i}},{\mathbf {v}_{j + }}} \right) }}{{\partial {w_{{{st}}}}}}} - \sum _{i=1}^N \sum \limits _{j = 1}^{{n_ - }} {\displaystyle \frac{{\partial d\left( {{{\mathbf {v}}_i},{\mathbf {v}_{j - }}} \right) }}{{\partial {w_{{{st}}}}}}} + \lambda {\left( { - 1} \right) ^{s + t}}\det \left( {{\mathbf {{\overline{W}}}_{{{st}}}}} \right) , \end{array} \end{aligned}$$
(11)

where \(w_{st}\) is the \((s,t)\)th element of \(\mathbf {W}\), and \(\mathbf {\overline{W}}_{st}\) is the \((P-1)\times (P-1)\) matrix generated by removing the \(s\)th row and the \(t\)th column of \(\mathbf {W}\). We set \(D_{st}^+ = \sum _{i=1}^N \sum _{j = 1}^{{n_ + }} {\displaystyle \frac{{\partial d\left( {{{\mathbf {v}_i}},{\mathbf {v}_{j + }}} \right) }}{{\partial {w_{{{st}}}}}}}\) and \(D_{st}^- = \sum _{i=1}^N \sum _{j = 1}^{{n_ - }} {\displaystyle \frac{{\partial d\left( {{{\mathbf {v}}_i},{\mathbf {v}_{j - }}} \right) }}{{\partial {w_{{{st}}}}}}}\). Setting \(\partial L / \partial w_{st} = 0\), we obtain

$$\begin{aligned} \det \left( {\mathbf {{\overline{W}}}_{{{st}}}^{}} \right) = \displaystyle \frac{{D_{st}^- - D_{st}^+}}{{\lambda {{\left( { - 1} \right) }^{s + t}}}} , \end{aligned}$$
(12)

where \(D_{st}^+\) can be calculated according to (13).

$$\begin{aligned} D_{st}^+&= \sum \limits _{i=1}^N\sum \limits _{j = 1}^{{n_ + }} {\displaystyle \frac{{\partial {{\left( {{{\mathbf {v}}_i} - {\mathbf {v}_{j + }}} \right) }^T}{\mathbf {W}}\left( {{{\mathbf {v}}_i} - {\mathbf {v}_{j + }}} \right) }}{{\partial {{w}_{{{st}}}}}}} \nonumber \\&= \sum \limits _{i=1}^N\sum \limits _{j = 1}^{{n_ + }} {\left( {{\mathbf {v}_{j + }}\left( s \right) - {{\mathbf {v}}_i}\left( s \right) } \right) \left( {{\mathbf {v}_{j + }}\left( t \right) - {{\mathbf {v}}_i}\left( t \right) } \right) }. \end{aligned}$$
(13)

\(D_{st}^-\) can be calculated in the same way as (13).

Let \(\mathbf {A} = \lambda {\mathbf {W}^{ - 1}} = \left[ {a_{st}} \right] \), i.e., \(a_{st}\) is the \(\left( {s,t} \right) \)th element of \(\mathbf { A}\). Then we can calculate the element of \(\mathbf {A}\) with indices of \((s,t)\) as

$$\begin{aligned} a_{st} = \frac{\lambda (-1)^{s+t} \det \left( \mathbf {\overline{W}}_{st}\right) }{\det {\left( \mathbf {W}\right) }} = D_{st}^- - D_{st}^+. \end{aligned}$$
(14)

With \(\mathbf {A}\) we can get \(\lambda \). Since we have \( \det \left( \mathbf {A} \right) = \lambda ^P \det \left( \mathbf {W} ^{-1} \right) = \lambda ^P\), then

$$\begin{aligned} \lambda = \left( \det \left( \mathbf {A} \right) \right) ^ \frac{1}{P}. \end{aligned}$$
(15)

So \(\mathbf {W}\) can be computed as

$$\begin{aligned} \mathbf {W} = \lambda \mathbf {A} ^ {-1}. \end{aligned}$$
(16)
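The closed-form solution in (13) to (16) can be sketched numerically as follows. Several choices here are assumptions rather than details from the text: the nearest video of each other category is found with the plain Euclidean distance, a video is not paired with itself, and the sketch refuses to continue unless \(\mathbf {A}\) is positive definite, since otherwise \(\mathbf {W}\) would not define a valid distance.

```python
import numpy as np

def scatter(pairs):
    """Eq. (13): sum of outer products (v_j - v_i)(v_j - v_i)^T over the pairs."""
    diffs = np.array([vj - vi for vi, vj in pairs])
    return diffs.T @ diffs

def learn_metric(X, labels):
    """Return W = lambda * A^{-1} with det(W) = 1, following Eqs. (14)-(16)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    pos_pairs, neg_pairs = [], []
    for i, (vi, ci) in enumerate(zip(X, labels)):
        # Positive pairs: every other video of the same category.
        for j in np.flatnonzero(labels == ci):
            if j != i:
                pos_pairs.append((vi, X[j]))
        # Negative pairs: the nearest video of each other category
        # (Euclidean nearest neighbour is an assumption of this sketch).
        for c in np.unique(labels):
            if c != ci:
                cand = X[labels == c]
                nearest = cand[np.argmin(np.sum((cand - vi) ** 2, axis=1))]
                neg_pairs.append((vi, nearest))
    A = scatter(neg_pairs) - scatter(pos_pairs)        # Eq. (14): a_st = D^-_st - D^+_st
    if np.any(np.linalg.eigvalsh(A) <= 0):
        raise ValueError("A is not positive definite; no valid metric from this closed form")
    lam = np.linalg.det(A) ** (1.0 / X.shape[1])       # Eq. (15)
    return lam * np.linalg.inv(A)                      # Eq. (16)
```

By construction \(\det (\mathbf {W}) = \lambda ^P / \det (\mathbf {A}) = 1\), which matches the constraint added to (9).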

With the learnt distance metric \(d(\mathbf {v}_i, \mathbf {v}_j)\), the similarity of two videos is defined as

$$\begin{aligned} {\rm SIM}_{ij}^V = \exp \left( - \frac{d(\mathbf {v}_i, \mathbf {v}_j)}{\sigma _v^2} \right) , \end{aligned}$$
(17)

where \(\sigma _v\) is empirically determined.
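Given a weight matrix \(\mathbf {W}\), the distance in (9) and the similarity in (17) reduce to a few lines. The function names are illustrative, and \(\sigma _v\) is passed in as a parameter because its value is not reported.

```python
import numpy as np

def mahalanobis_sq(vi, vj, W):
    """Eq. (9): d(v_i, v_j) = (v_i - v_j)^T W (v_i - v_j)."""
    diff = np.asarray(vi, dtype=float) - np.asarray(vj, dtype=float)
    return float(diff @ W @ diff)

def video_similarity(vi, vj, W, sigma_v):
    """Eq. (17): SIM^V_ij = exp(-d(v_i, v_j) / sigma_v^2)."""
    return float(np.exp(-mahalanobis_sq(vi, vj, W) / sigma_v ** 2))
```

With \(\mathbf {W}\) set to the identity matrix, `mahalanobis_sq` returns the squared Euclidean distance, which gives a convenient sanity check for a learnt metric.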

4.4 Datasets for distance metric learning

To train the distance metric, we use the dataset made public in [7], which has two parts: one acquired from Flickr with 15 categories, each with a specific subject, and the other from YouTube with another 14 categories. We combine the two parts and obtain 29 subsets in total for training. We assume that videos are similar if they are from the same category and different if they are from different categories.

During learning, we treat every video equally; that is, all the videos are used for training, although in [7] the videos are associated with levels of interestingness.

5 Experiments

5.1 Settings

To demonstrate the proposed method, we collect data from multiple sources. We first crawl social network data from Sina Weibo. To cover more tweets, for each TV program we collect keywords that relate accurately to the program without introducing ambiguity. The keywords usually include the name of the TV program and the names of its characters.

To obtain the location information, we extract the registration province of the microblog owner after retrieving the microblog data, so that each microblog is associated with a location. Most of the time, the registration place is where the user currently lives or is active, so the microblog can be taken to reflect the views of this province. After we get all the tweets related to a TV program, we divide the microblogs among provinces according to the registration province. There are 15 TV programs in our dataset.

The videos for each TV program are downloaded from Letv. We assume that a TV program is usually consistent across episodes and can be represented by one of them. Thus, for each program we pick only the first episode.

5.2 Experimental results

To evaluate the performance of the proposed method, we collected a dataset of TV shows and their corresponding discussions on social networks.

For the TV show dataset, we selected 14 shows that are popular on social networks, including “Dad, Where Are We Going”, “Longmen Express”, “I Am Singer”, and “The Voice of China”. The videos were collected from the Internet. Then, using the title of each show as the keyword, we queried Sina Weibo and retrieved the relevant tweets. By further identifying the registered locations of the tweet authors, we obtained the location-aware tweets for the different TV shows.

The experiments are divided into two parts. In the first part, we evaluate the location-aware popularity prediction performance for new TV shows. In the second part, given the popularity of a show in certain regions, we predict its popularity for other regions. The details are given in the following sections.

In the first setup, we use 9 shows and their corresponding location-aware social network discussions as training data, and the remaining 5 shows as testing data. The aim is to predict the popularity of the test shows across different locations. We randomly selected “Beautiful Time”, “The Voice of China”, “Happy Camp”, “I Am Singer”, and “Longmen Express” as the test set and used the remaining shows as the training set.

The quality is measured by looking at the number of hits and their position within the top-\(N\) locations recommended by a particular scheme [4]. The number of hits is the number of top-\(N\) locations in the test set that are also present in the top-\(N\) recommended locations returned for each TV show. We refer to this quality measure as the hit-rate @ \(p\) (HR@\(p\)). Considering the top-\(p\) locations in the test set, the hit-rate @ \(p\) of the recommendation algorithm is computed as:

$$\begin{aligned} {\rm HR}@p = \frac{\text {Number of hits}}{p} \end{aligned}$$
(18)

An HR value of 1.0 indicates that the algorithm was able to always recommend the top-\(p\) popular locations for a TV show, whereas an HR value of 0.0 indicates that the algorithm was not able to recommend any of the popular locations.
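The measure in (18) can be sketched as follows, assuming that the true and predicted popularity of one show are each given as a mapping from location to score. The function name and input format are illustrative, and ties are broken by the order in which locations appear, which is an assumption.

```python
def hit_rate_at_p(true_pop, pred_pop, p):
    """Eq. (18): share of the true top-p locations found in the predicted top-p.

    true_pop, pred_pop: dicts mapping a location to its popularity score.
    """
    top_true = set(sorted(true_pop, key=true_pop.get, reverse=True)[:p])
    top_pred = set(sorted(pred_pop, key=pred_pop.get, reverse=True)[:p])
    return len(top_true & top_pred) / p
```

Note that HR@\(p\) necessarily reaches 1.0 once \(p\) equals the total number of locations, so values at large \(p\) should be read against that baseline.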

The results are shown in Table 1 and Fig. 3.

Table 1 The hit-rate of the proposed method for different TV shows

As the results show, the top-1 hit-rate is 0 for all the test shows, which means that the proposed method can hardly predict the single most popular location for a particular show. This indicates that the popularity of a show is affected not only by its content but also by many other factors. However, the hit-rate increases as \(p\) increases. When \(p=10\), the proposed method predicts around 30 % of the popular locations. This indicates that content and culture do have a strong effect on the popularity of shows across different locations.

Fig. 3 The hit-rate of the proposed method for different TV shows

In the second setup, we randomly remove 20 % of the popularity information and use the proposed method to recover it. The widely used mean squared error (MSE) of prediction is adopted as the performance measure. If \(\hat{X}\) denotes the predictions of the \(n\) missing values and \(X\) the corresponding true values, the MSE of the proposed method is:

$$\begin{aligned} {\rm MSE} = \frac{1}{n} \sum _{i=1}^{n} \left( \hat{X_i} - X_i \right)^2 \end{aligned}$$
(19)

The prediction error in the experiment is 0.0011. It shows that the proposed method can achieve reasonable error. That is, given a show played at certain locations, we may roughly estimate the popularity for locations where the show has not been broadcast yet with the proposed method.
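The evaluation protocol of the second setup can be sketched with the standard library alone. The random seed, the rounding of the 20 % share, and the function names are assumptions; the recovery step itself (the proposed method) is not shown.

```python
import random

def holdout_entries(n_rows, n_cols, fraction=0.2, seed=0):
    """Pick the (location, show) cells whose popularity is hidden for testing."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    k = round(fraction * len(cells))
    return random.Random(seed).sample(cells, k)

def mse(predicted, true):
    """Eq. (19): mean squared error over the n held-out values."""
    if len(predicted) != len(true) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(predicted)
```

Since the MSE depends on the scale of the popularity values, an error figure is easiest to interpret alongside the range of the entries in \(M\).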

6 Conclusion

In this paper, a preliminary location-aware TV show recommendation scheme is proposed. By incorporating the social network information of users from different locations, a location profile is obtained. Location similarity is then calculated by combining the location profile and geographical information. Video similarity is obtained by considering both visual and audio information. Taking the location similarity and the video similarity as regularizers, the scheme predicts TV show popularity for different regions via graph regularized non-negative matrix factorization. A TV show dataset with location-aware social network information is collected, and the proposed method achieves promising results on it. In the future, more information will be investigated for better recommendation performance.