1 Introduction

How to manage personal information based on knowledge is an important research issue. Personal information takes different forms, such as documents, emails and photos. In this paper we study the problem by focusing on how to classify personal photos so that they are easy to re-find. With the wide use of mobile phones, people tend to use them to take photos for different purposes. Taking photos is not only a way to record good moments, but also a way to record important information for later reuse. For example, a person sometimes takes a photo to record the username and password of a website. As a result, the number of photos taken by mobile phones is increasing day by day. How to manage these photos effectively and improve re-finding efficiency are valuable research issues, and the classification of personal photos is a basic topic.

Studying this problem poses several challenges. People differ in their habits, and the number of photos varies from person to person, so it is not easy to find a general method to classify personal photos. Because some personal photos contain private information, obtaining a data set for experiments is also hard.

This paper studied the problem and proposed a solution. The main contributions can be summarized as follows:

  (1) We conducted a survey on this problem and observed some features of how people classify and re-find personal photos. We propose the idea of re-finding personal photos by classifying them based on specific events, and we define the relevant concepts.

  (2) We studied classification based on specific events and propose a photo classification method based on shooting time and location, with high accuracy and low computational cost.

  (3) Because there are no relevant data sets or benchmarks, we built experimental data sets from real personal photo collections and tested the proposed method in two respects: classification accuracy and the efficiency of re-finding personal photos. The results verify the effectiveness of our method.

2 Related Work

Existing photo classification methods can be divided into four categories: the first is based on geographical information; the second is based on face recognition and identity recognition; the third is based on annotations; and the last is based on events.

Some works are based on geographical information. Lo et al. [1] designed a clustering algorithm that clusters geo-tagged photos according to thresholds of different scales. Papadopoulos et al. [2] proposed a way to automatically detect landmarks and events in a set of tagged images using visual features and tag similarity.

For face recognition and identity recognition, Kumar et al. [3] proposed a semi-supervised framework for recognizing faces. Brenner et al. [4] used sparse Markov Random Fields for face recognition. Xu et al. [5] and O’Hare et al. [6] implemented techniques to organize photos based on people’s identity. This kind of classification depends mainly on the accuracy of image recognition technology, and the time complexity of the image matching algorithms is very high.

Some works focus on annotations. Andrade et al. [7] focused on people annotation, location annotation and event annotation. Kim et al. [8] developed a system that collects and stores annotations about photos in order to manage and search them. Ulges et al. [9] and Zigkolis et al. [10] both presented an annotation framework to facilitate event classification. Annotation-based methods require image recognition technology to recognize the image content, which has high computational cost and time complexity, and the added annotations may not accurately express the photo semantics.

There are also works that classify personal photos based on events. Bacha et al. [11] and Cao et al. [12] both use the connection between scenes and events to identify events. Because activities and events in our lives are structural, Bosselut et al. [13] presented a data-driven approach to learning event knowledge. Wang et al. [14] and Tang et al. [15] utilized metadata and visual features for event recognition. Yuan et al. [16] mined features from GPS and visual cues for event recognition. However, there are many kinds of events, and even a single event contains many scenes and objects, so it is difficult to compile complete statistics of the scenes and objects of all events. Moreover, the same scene or object often corresponds to different events; for example, photos of both hiking and the beach may include the sky. Therefore, using scenes and objects for image event recognition is not accurate, and its time complexity and computational cost are very high.

According to our findings, these kinds of classification methods are applicable only under specific conditions. People take many photos in the places where they live, and classification based on geographical information does not handle this situation effectively. Classification methods based on face recognition and identity recognition work well only for photos that contain human faces. The annotations used for classification may not accurately express the photo semantics. Judging events by scenes and objects is inaccurate, and classification based on events is likely to separate photos taken in continuous time. A photo classification method should both take the features of mobile phones into account and be efficient. Therefore, we need a general and efficient method to classify personal photos for re-finding.

3 Conceptual Model

3.1 Survey and Findings

We surveyed 200 users about the problem and made the following observations: (1) Current mobile phone albums use a flat storage structure for the photos taken by the phone and display thumbnails of all photos. However, the screen of a mobile phone is small, and all the thumbnails of the many photos of the same event are displayed, which makes it inconvenient to re-find photos. A suitable method for classifying and re-finding photos on mobile phones is therefore necessary. (2) Most photos are taken when people go out to take part in activities. People often take dozens or even hundreds of photos on such occasions, but these photos are seldom reused. The photos that people often reuse are those recording important information. Such photos are usually taken a few at a time and are mixed in with the photos taken when people go out for fun or take part in activities. (3) The main clues people use to re-find a photo are the approximate shooting time and the specific event related to it. (4) Some mobile phone albums already offer classification based on image content, but this function destroys the time cues that people remember, and its content analysis often performs poorly.

3.2 Definitions

Based on the survey and findings, this paper proposes a photo classification strategy based on specific events. To make our approach clear, we introduce some concepts and define them below.

Definition 1. Specific Event.

A specific event is a series of actions that take place for the same target subject within a relatively continuous period of time and at relatively close locations. For example, attending a friend’s wedding at noon on a particular day is a specific event.

The concept of “specific event” is not the same as the concept of “event”. An event may include two or more different specific events; for example, the “wedding” event may include the weddings of several different people, whereas the wedding of one particular person is regarded as a specific event.

Definition 2. Personal Photo Sequence.

A personal photo sequence is a list of personal photos sorted in ascending order of shooting time.

Definition 3. Time Interval.

Let Pi and Pj be any two photos. Their time interval is the difference between the shooting times of Pi and Pj, measured in seconds (s). Because mobile phones record the shooting time automatically when a picture is taken, computing the time interval of two given photos is not hard.

Definition 4. Photo Distance.

Let Pi and Pj be any two photos. Their photo distance is the distance between the shooting locations of Pi and Pj, measured in meters (m). Most mobile phones now have a GPS function and can easily record the shooting location, so obtaining the distance between two given photos is also not hard.

Definition 5. Rate of Location Change.

For any two photos Pi and Pj, let their time interval be T and their photo distance be L. The rate of location change of Pi and Pj, denoted V, is then V = L / T, measured in m/s.
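Definitions 2, 3 and 5 can be illustrated with a minimal Python sketch. The `Photo` class, its field names and the handling of a zero time interval are assumptions made for illustration; they are not part of the paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Photo:
    """Hypothetical record holding only what the definitions need."""
    name: str
    shot_at: datetime


def photo_sequence(photos):
    """Definition 2: photos sorted in ascending order of shooting time."""
    return sorted(photos, key=lambda p: p.shot_at)


def time_interval(p_i, p_j):
    """Definition 3: time difference between two photos, in seconds."""
    return abs((p_j.shot_at - p_i.shot_at).total_seconds())


def rate_of_location_change(distance_m, interval_s):
    """Definition 5: V = L / T in m/s.

    Assumption: a zero interval with a non-zero distance is treated as
    an infinite rate, and a zero interval with zero distance as 0.
    """
    if interval_s == 0:
        return float("inf") if distance_m > 0 else 0.0
    return distance_m / interval_s
```

With the values quoted for P1 and P2 in Sect. 4.2 (1057.3 m in 39 s), `rate_of_location_change(1057.3, 39)` gives roughly 27.1 m/s.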

4 The Classification Method

Considering the characteristics of specific events, we intend to classify photos by specific events based on shooting time and location. At present, photos taken by mobile phones contain metadata such as shooting time and GPS position (latitude, longitude, and altitude), so it is not difficult to obtain the shooting time and location of such photos.

4.1 A Sample of Personal Photo Sequence

To make our method clear, seven photos taken consecutively by one participant were selected as an example, as shown in Table 1. P1 and P2 are two selfies taken on the train. P3 is a photo recording some information in the laboratory. P4 and P5 are photos of the flowers on the laboratory table. P6 and P7 are two photos of eyes taken while the eyes were having an allergic reaction. Although the time interval between P6 and P7 is long, the content of the two photos is the same, and they belong to the same specific event. Therefore, the correct photo sets of the seven photos based on specific events are: {(P1, P2), P3, (P4, P5), (P6, P7)}.

Table 1. Representative seven photos of personal photo sequence

4.2 Specific Event-Based Classification Method

After manually classifying the experimental training data sets, we found that the shooting times and locations within the same specific event were relatively continuous, and that the changes of shooting time and location within a specific event differed significantly from those between different specific events. Based on these regularities in the relationship between specific events and the changes of shooting time and location, this paper proposes a method to classify photos based on shooting time and location. Algorithm 1 shows the process of our method.

Algorithm 1. Specific event-based classification method

As Algorithm 1 shows, we first obtain the shooting time and location of each photo. The location is represented by a triple (W, J, H), whose three components are latitude, longitude, and altitude. The unit of latitude and longitude is the degree (°), and the unit of altitude is the meter (m). We then arrange the photos as a personal photo sequence and calculate the time interval T and the photo distance L of every two adjacent photos. Because computing the time interval is straightforward, we do not detail it here.

  (1) Calculate photo distance

Considering all three elements (W, J, H) is more accurate than considering only longitude and latitude when calculating distance. For example, two photos may have the same longitude and latitude but different altitudes. The commonly used method to calculate geographical distance is the spherical model, which regards the earth as a standard sphere and takes the distance between two points on the sphere to be the arc length. Any position can be converted into three-dimensional coordinates. Given two locations (W1, J1, H1) and (W2, J2, H2), the distance between them is computed as follows.

  a. Convert the altitudes H1 and H2 into distances from the center of the earth by adding the earth’s radius (6371229 m), as shown in formula (1).

    $$ \left\{ \begin{array}{l} h_{1} = H_{1} + 6371229\,\text{m} \\ h_{2} = H_{2} + 6371229\,\text{m} \end{array} \right. $$
    (1)

  b. Convert the longitudes J1, J2 and latitudes W1, W2 from degrees into radians j1, j2, w1 and w2, as shown in formula (2).

    $$ \left\{ \begin{array}{l} w_{1} = W_{1} * \pi / 180 \\ w_{2} = W_{2} * \pi / 180 \end{array} \right.,\quad \left\{ \begin{array}{l} j_{1} = J_{1} * \pi / 180 \\ j_{2} = J_{2} * \pi / 180 \end{array} \right. $$
    (2)

  c. Convert the positions of the two photos into three-dimensional coordinates, as shown in formula (3).

    $$ \left\{ \begin{array}{l} X_{1} = h_{1} * \cos (w_{1}) * \cos (j_{1}) \\ Y_{1} = h_{1} * \cos (w_{1}) * \sin (j_{1}) \\ Z_{1} = h_{1} * \sin (w_{1}) \end{array} \right.,\quad \left\{ \begin{array}{l} X_{2} = h_{2} * \cos (w_{2}) * \cos (j_{2}) \\ Y_{2} = h_{2} * \cos (w_{2}) * \sin (j_{2}) \\ Z_{2} = h_{2} * \sin (w_{2}) \end{array} \right. $$
    (3)

  d. Compute the distance L between the two given locations with formula (4).

    $$ L = \sqrt{ (X_{2} - X_{1})^{2} + (Y_{2} - Y_{1})^{2} + (Z_{2} - Z_{1})^{2} } $$
    (4)
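Steps a to d can be sketched in Python as follows. The function names and the tuple layout `(latitude, longitude, altitude)` are assumptions chosen for illustration. Note that formula (4) yields the straight-line distance between the two three-dimensional points; for photos taken close together this is practically equal to the arc length.

```python
import math

# Earth radius in meters, the value used in formula (1).
EARTH_RADIUS_M = 6371229.0


def to_cartesian(lat_deg, lon_deg, alt_m):
    """Formulas (1)-(3): (W, J, H) to three-dimensional coordinates."""
    h = alt_m + EARTH_RADIUS_M      # formula (1)
    w = math.radians(lat_deg)       # formula (2), latitude
    j = math.radians(lon_deg)       # formula (2), longitude
    x = h * math.cos(w) * math.cos(j)   # formula (3)
    y = h * math.cos(w) * math.sin(j)
    z = h * math.sin(w)
    return x, y, z


def photo_distance(loc1, loc2):
    """Formula (4): straight-line distance in meters between two locations."""
    x1, y1, z1 = to_cartesian(*loc1)
    x2, y2, z2 = to_cartesian(*loc2)
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2 + (z2 - z1) ** 2)
```

Two locations with the same latitude and longitude but altitudes that differ by 100 m give a photo distance of 100 m, which is the case that a latitude/longitude-only distance would miss.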
  (2) The fusion of time and distance

We now consider the effect of time interval and photo distance on specific events. First, one situation must be handled. Some specific events take place while the person is running or riding in a vehicle, such as (P1, P2) in the sample of Sect. 4.1, which were taken on the train. In that case the rate of location change is high, so the photo distance is relatively large. For example, the time interval between P1 and P2 is just 39 s, the photo distance is 1057.3 m, and the rate of location change is about 27 m/s. This may distort the fusion of time and distance. In practice, when the speed exceeds 2 m/s, it can generally be concluded that the person was running or in a vehicle. Therefore, after calculating the rate of location change V of two adjacent photos, if V > 2 m/s, the photo distance L is scaled down as L = L / V.
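The speed-based adjustment can be sketched as below. The function name and the treatment of a non-positive time interval (distance returned unchanged) are assumptions for illustration.

```python
# Speed above which the photographer is assumed to be running or in a vehicle.
SPEED_THRESHOLD_M_PER_S = 2.0


def adjusted_distance(distance_m, interval_s):
    """Scale the photo distance down by V when V = L / T exceeds 2 m/s."""
    if interval_s <= 0:
        # Assumption: with no usable time interval, leave the distance as is.
        return distance_m
    v = distance_m / interval_s
    if v > SPEED_THRESHOLD_M_PER_S:
        return distance_m / v
    return distance_m
```

Because V = L / T, the adjusted value L / V is numerically equal to T. For P1 and P2, the 1057.3 m distance is therefore reduced to about 39, the same number as their time interval in seconds.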

Next, we fuse time and distance. Because their units are inconsistent, we must first normalize them. We use Min-Max normalization, given by:

$$ X = \frac{x - \min}{\max - \min} $$
(5)

Here x is the value before normalization and X is the value after normalization. The maximum and minimum values are found in the sequence of time intervals and in the sequence of photo distances, respectively. The normalization then converts each time interval T to T’ and each photo distance L to L’, both in [0, 1].
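A minimal sketch of formula (5) follows. Mapping a constant sequence to all zeros is an assumption made here to avoid division by zero; the paper does not discuss that case.

```python
def min_max_normalize(values):
    """Formula (5): rescale a sequence of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant sequence maps to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

The function would be applied once to the sequence of time intervals and once to the sequence of (speed-adjusted) photo distances, so that both become unitless and comparable.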

We then use time and distance to calculate the spatio-temporal distance. Some photos are taken at the same location but with a large time interval; these photos belong to different specific events, such as P3 and P4. Other photos show a large change in location but a small change in time and belong to the same specific event, such as P1 and P2. The weights of time and location for classification are therefore likely to differ. We assign weights to time and location when calculating the spatio-temporal distance ST_D, as in formula (6). We detail how to select the weight in Sect. 5.2.

$$ ST\_D = \lambda * T^{\prime} + (1 - \lambda) * L^{\prime} $$
(6)
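As a worked example of formula (6), suppose two adjacent photos have the hypothetical normalized values T’ = 0.001 and L’ = 0.004 (illustrative numbers, not taken from the data sets), and take the weight λ = 0.7 reported in Sect. 5.2:

$$ ST\_D = 0.7 * 0.001 + (1 - 0.7) * 0.004 = 0.0007 + 0.0012 = 0.0019 $$

A larger λ makes the result depend more on the time interval and less on the photo distance. Compared with the threshold 0.002 selected in Sect. 5.2, this pair would be assigned to the same specific event.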
  (3) Choose the best spatio-temporal distance as a classifier

Once the best weight is selected, we calculate the classification accuracy obtained with different spatio-temporal distances as the threshold. The spatio-temporal distance that gives the highest accuracy is the best spatio-temporal distance BST_D, and we take BST_D as the classifier.

We then calculate the spatio-temporal distance ST_D of every two adjacent photos in the personal photo sequence and compare ST_D with BST_D. If ST_D is not greater than BST_D, the two photos belong to the same specific event; if ST_D is greater than BST_D, they belong to different specific events.
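The comparison rule above amounts to a single pass over the personal photo sequence. The sketch below assumes the ST_D values of adjacent pairs have already been computed; the function and parameter names are hypothetical.

```python
def split_into_events(photos, st_distances, bst_d):
    """Group an ordered photo sequence into specific events.

    photos:       photos in ascending order of shooting time
    st_distances: st_distances[i] is ST_D between photos[i] and photos[i + 1]
    bst_d:        the threshold BST_D
    """
    if not photos:
        return []
    if len(st_distances) != len(photos) - 1:
        raise ValueError("expected one ST_D value per adjacent pair")
    events = [[photos[0]]]
    for photo, d in zip(photos[1:], st_distances):
        if d <= bst_d:
            events[-1].append(photo)   # same specific event as the previous photo
        else:
            events.append([photo])     # ST_D exceeds BST_D: a new specific event
    return events
```

Each photo is visited once, so this step is linear in the number of photos once the sequence is sorted.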

5 Evaluation

5.1 The Training Data Sets

The goal of this study is to classify personal photo collections taken by mobile phones, but there are no relevant public data sets. Public photo data sets, such as Flickr, do not contain all the photos a user takes with a mobile phone. They contain only the photos that the user chose to share, and the time interval between such photos may be long. During this interval, users actually took many other photos with their mobile phones but did not share them. In addition, photos shared by users may not have been taken by their own mobile phones. Because our classification method targets personal photo collections taken by mobile phones, the public photo data sets cannot be used for experiment and evaluation.

To ensure the validity of our classification method, we collected several personal photo collections. Because the time span of a photo collection may affect the experimental results, six representative photo collections with different time spans were selected as the training data sets. The six personal photo collections were manually classified based on specific events; their information is shown in Table 2.

Table 2. Representative users’ information

5.2 Experiments

  (1) Determine the weight in the spatio-temporal distance ST_D

We experimented with the six training data sets, a total of 4126 photos. When we calculated the spatio-temporal distances of these photos, we found that most spatio-temporal distances between two adjacent photos in the same specific event are less than 0.001, and most spatio-temporal distances between different specific events are greater than 0.01. Therefore, the range of spatio-temporal distances we examined for classification is 0.001~0.01. We selected different values of the weight \( \lambda \), from 0.1 to 0.9 with a step size of 0.1, to calculate the spatio-temporal distance. For each value of \( \lambda \), we then plotted a line chart of the accuracy obtained when different spatio-temporal distances are used as the threshold to classify photos.

As can be seen from Fig. 1, the accuracy on the training data sets is highest when the weight \( \lambda \) is 0.7. Therefore, the spatio-temporal distance is calculated as ST_D = 0.7* T’+0.3* L’.

Fig. 1. The accuracy of classification by spatio-temporal distances with the weight \( \lambda \) taking different values

  (2) Determine the best spatio-temporal distance BST_D

From Fig. 1, it can be seen that the data sets have the highest sum of accuracy when photos are classified by the spatio-temporal distance of 0.002. So the best spatio-temporal distance BST_D is set to 0.002.
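The parameter selection in this section can be sketched as a grid search. This is a simplified illustration under stated assumptions, not the exact procedure of the paper: the accuracy measure used here (the share of adjacent pairs whose same/different decision matches the manual label) is an assumption, since the paper does not define its accuracy formula, and the sketch searches λ and the threshold jointly on one labelled sequence, whereas the paper first fixes λ and then chooses BST_D by the summed accuracy over the data sets. All names are hypothetical.

```python
def pairwise_accuracy(st_distances, same_event, threshold):
    """Assumed measure: share of adjacent pairs decided like the manual label."""
    if not st_distances:
        raise ValueError("need at least one adjacent pair")
    hits = sum((d <= threshold) == label
               for d, label in zip(st_distances, same_event))
    return hits / len(st_distances)


def select_parameters(t_norm, l_norm, same_event, lambdas, thresholds):
    """Return (accuracy, lambda, threshold) with the highest accuracy.

    t_norm, l_norm: normalized time intervals and distances of adjacent pairs
    same_event:     manual labels, True if a pair is in the same specific event
    """
    best = (-1.0, None, None)
    for lam in lambdas:
        st = [lam * t + (1 - lam) * l for t, l in zip(t_norm, l_norm)]
        for th in thresholds:
            acc = pairwise_accuracy(st, same_event, th)
            if acc > best[0]:
                best = (acc, lam, th)
    return best
```

A call matching the ranges in the text would pass `lambdas=[i / 10 for i in range(1, 10)]` and `thresholds=[i / 1000 for i in range(1, 11)]`.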

5.3 The Test Data Sets

Three users were selected randomly to test whether the method is effective, and also to test whether an album that classifies photos with this method can increase the speed of re-finding photos. The three users’ information is shown in Table 3.

Table 3. The three test users’ information

5.4 Evaluations of the Classification Method

To verify the accuracy of the method, we classify the three test users’ photo collections with the spatio-temporal distance 0.002. Accuracy is a measure widely used in information retrieval and statistics to evaluate the quality of results, and we use the accuracy of the classification results to evaluate the method. The accuracy is shown in Fig. 2. Since there is no comparable method that classifies photos based on specific events, we cannot conduct comparative experiments with other methods in terms of classification accuracy.

Fig. 2. The accuracy of classification for the three users with the spatio-temporal distance 0.002

As shown in Fig. 2, the accuracy of the classification based on shooting time and location is over 95%. The biggest advantage of the proposed method is its low time complexity, which is close to O(n).

The accuracy of the experimental results is below 100% for two reasons. First, for a few pairs of different specific events, the time interval is very small and the change in location is slight. For example, a person takes a photo of some knowledge related to a paper and then, a minute later, takes a selfie in the same laboratory. Second, some photos have a long time interval and little change in location but still belong to the same specific event, such as (P6, P7) in the sample of Sect. 4.1. This method cannot identify such situations, so it has some limitations.

5.5 Evaluations of the Specific Event-Based Album

To verify whether classifying photos based on specific events can improve the efficiency of re-finding photos, we used Android Studio and the RecyclerView framework to implement a photo album based on this classification approach. To allow comparison with current photo albums in re-finding photos, the layout of the designed album imitates the structure of current photo albums. However, the outermost layer of our album shows only the representative photos of specific events, not all the photos. Each photo represents one specific event, and clicking it jumps to another page with all the photos of that specific event. The characteristics of representative photos have not been studied yet, so we selected the first photo taken in each specific event as its representative photo.
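The rule for the outermost layer (one representative photo per specific event, chosen as the first photo taken) can be sketched in a few lines. This is an illustration in Python, not the Android implementation; the function name is hypothetical, and each event is assumed to be a list of photos in ascending order of shooting time.

```python
def representative_photos(events):
    """Pick the first photo of every non-empty specific event."""
    return [event[0] for event in events if event]
```

For the sample of Sect. 4.1, the events {(P1, P2), P3, (P4, P5), (P6, P7)} would be represented on the outermost layer by P1, P3, P4 and P6.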

At present, only a few mobile phones provide photo classification by geographic information, events or face recognition. However, some photo album applications include relatively complete photo classification, and from these we chose one with a relatively fast classification rate and high classification accuracy: “time album”. We compared the speed of re-finding photos under four ways of organizing the photos taken by mobile phones: classification based on specific events, classification based on geographic information, classification based on events, and the current photo album. Fig. 3 shows the albums under the four ways. Fig. 3(a) shows the layout of the current photo album. Fig. 3(b) and (c) show, respectively, the classification based on geographic information and the classification based on events in the time album. Fig. 3(d) shows the album we designed, which classifies photos based on specific events. We asked the three test users to re-find photos in each of the four ways and recorded the time spent on each search. The re-finding results are shown in Table 4.

Fig. 3. (a) shows the layout of the current mobile phones’ album; (b) shows the classification based on geographic information in the time album; (c) shows the classification based on events in the time album; (d) shows the outermost layer of the album we designed, with each photo representing a specific event

Table 4. The users’ re-finding time

As can be seen from the results, applying our classification method to photo albums can greatly improve the speed of re-finding photos. However, for one photo the re-finding time with our method is longer than with the other methods. By analysis, we find three main reasons:

  (1) During the experiment, each re-finding task was performed first with the classification based on specific events. Because of human memory, users then retained some vague memory of the photos’ information, which gave an advantage to the other three ways when users re-found the same photos.

  (2) This is a new classification method. Users are not accustomed to it, whereas they are skilled at using their own phone’s original album or other existing photo classification methods.

  (3) The representative photos selected for the outermost layer of this album are not representative enough, so the user may not recall the specific event from the photo.

Even though these factors work against the new classification when re-finding photos, the results show that the classification approach based on specific events can greatly improve the speed of re-finding photos.

Thus, this classification strategy for albums can greatly improve the efficiency of re-finding photos. The approach greatly reduces the time spent scrolling up and down the screen to re-find a photo, and the photos people often reuse are those recording information, almost all of which appear on the outermost layer of the album. Moreover, this classification strategy conforms to people’s memory habits, which further saves re-finding time.

6 Conclusions

This paper proposes a photo classification strategy that classifies photos based on specific events and matches human memory habits. We also propose a classification method based on shooting time and location with high accuracy and low computational cost. It can be used to classify photos and greatly improves the efficiency of re-finding photos. However, the classification accuracy cannot reach 100%, and the re-finding time of a few photos is longer than with the original photo album. As a next step, we plan to combine the shooting time, location and semantic content of photos to find a more accurate method of classifying photos based on specific events. We will also do more research on the classification strategy to improve re-finding efficiency.