
1 Introduction

In the process of creating pedestrian dynamics models (or adapting them to new situations), determining model parameters and validating the models are essential steps. This requires measurements of real pedestrian behavior [6]. Typically, to obtain such measurements, researchers plan and conduct experiments with groups of pedestrians. An alternative method, presented in this paper, is to use data from publicly available web cameras (webcams).

This paper presents examples of characteristics for a pedestrian dynamics model extracted by a system we implemented that processes data from publicly available web cameras.

During the pandemic, methods of assessing social distance based on camera images have gained considerable importance. In most cases where pedestrian dynamics is monitored using camera images, we have no additional tools (e.g. sensors providing supplementary positioning) that would enable a more precise assessment of the configuration of people within the camera's range. Several approaches to crowd analysis using video cameras can be distinguished. Tran et al. [12] proposed a graph-based framework for grouped pedestrians built on the Social Distances Model (SDM) [14], a Cellular Automata (CA) crowd model that applies proxemics rules, i.e. social distances around a pedestrian. CA-based methods of crowd dynamics can use a data-driven paradigm and an adaptable lattice [1]. One can also point out continuous crowd dynamics simulations [13] based on the Social Force model [4]. Another trend is the use of bio-inspired methods in crowd analysis [2], while the task of crowd counting and density map estimation from videos [10] is often carried out using convolutional neural networks [5, 9].

2 Application of Web Cameras

Advantages of using publicly available web cameras over conducting dedicated pedestrian experiments include: no costs for organizing the experiment and operating the equipment, and access to video data from many different locations, which makes it possible to acquire data from different cultures and in different social situations (e.g. pandemic conditions).

This approach also has significant drawbacks with respect to both planning and conducting experiments. The most important of these stem from the inability to stage a specific situation. This results in, among other things, an inability to measure parameters for situations that rarely occur in reality (such as evacuation triggered by the appearance of a threat) and an inability to repeat measurements for a given situation. In addition, it is not possible to choose the camera's position and field of view, and the measurement is limited to video data (data from other sensors, and hence sensor fusion techniques, are unavailable).

2.1 Object Identification

The YOLOv3 [7, 8] object detector, trained on the COCO dataset, was used to identify objects in the video frame and determine their positions. An example of the recognition of different types of objects by YOLO for a webcam showing a view of Grodzka Street in Krakow is presented in Fig. 1. As can be seen, most pedestrians were correctly recognized even when they were far from the camera.
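The paper does not detail the inference pipeline; the following is a minimal sketch of how such per-frame person detection could look using OpenCV's dnn module with the standard Darknet files. The file names (yolov3.cfg, yolov3.weights) and the thresholds are illustrative assumptions, not values taken from the paper.

```python
# Sketch: detecting persons in a webcam frame with YOLOv3 via OpenCV's
# dnn module. File names and thresholds are assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()
PERSON_CLASS_ID = 0  # "person" in the COCO class list

def detect_persons(frame, conf_threshold=0.5, nms_threshold=0.4):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences = [], []
    for output in outputs:
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if class_id == PERSON_CLASS_ID and conf >= conf_threshold:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)])
                confidences.append(conf)

    # Non-maximum suppression removes duplicate boxes for the same person.
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    return [boxes[i] for i in np.array(idxs).flatten()]
```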

2.2 Mapping from 2D Camera Space to 3D World Space

Some characteristics related to pedestrian dynamics can be determined directly from the 2D camera image, without knowing metric relations in 3D world space; they include, for example:

  • the number of pedestrians in the space observed by the camera

  • space utilization (how often a given location is occupied by a pedestrian)

  • the location of Points of Interest (POI)

Fig. 1.

The upper picture shows the view from the webcam on Grodzka Street in Krakow. The bottom picture shows the same image with the objects recognized by YOLO, where: p - person (orange frame), b - bicycle (green frame), d - dog (yellow frame), h - handbag (magenta frame), u - umbrella (brown frame), c - car (brown frame). (Color figure online)

Fig. 2.

Grodzka Street relative space utilization. The palette in the upper left part of the image indicates the degree of space utilization: red - the highest space utilization, blue - the lowest. (Color figure online)

Figure 2 shows an example of calculated space utilization for the visible portion of Grodzka Street. Points of Interest can be determined on the basis of the space utilization distribution: they correspond to the areas with the highest space utilization values (i.e. places where people stay most often and for the longest time).
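One possible realization of this computation is sketched below: pedestrian ground points (approximated by the bottom-center of each detection box) are accumulated over many frames into a coarse grid and normalized, and cells with the highest relative values become candidate POIs. The grid resolution and the POI threshold are illustrative assumptions.

```python
# Sketch of the space-utilization idea: count, over many frames, how often
# each cell of a coarse grid contains a pedestrian's ground point, then
# normalize. Grid size and POI threshold are assumed, not from the paper.
import numpy as np

GRID_W, GRID_H = 160, 90  # grid cells over the camera image (assumed)

def accumulate_utilization(detections_per_frame, frame_w, frame_h):
    counts = np.zeros((GRID_H, GRID_W), dtype=np.int64)
    for boxes in detections_per_frame:          # one list of boxes per frame
        for (x, y, w, h) in boxes:
            # Bottom-center of the bounding box approximates the point
            # where the pedestrian touches the ground.
            gx = min(int((x + w / 2) / frame_w * GRID_W), GRID_W - 1)
            gy = min(int((y + h) / frame_h * GRID_H), GRID_H - 1)
            counts[gy, gx] += 1
    return counts / max(counts.max(), 1)        # relative utilization in [0, 1]

def points_of_interest(utilization, threshold=0.8):
    # POIs = cells where people stay most often (highest relative values).
    return np.argwhere(utilization >= threshold)
```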

There are, however, many characteristics that require the determination of metric spatial relationships, such as:

  • density of people in a given area (number of people/area)

  • absolute speed of movement of individuals

  • distribution of absolute and relative pedestrian speeds

  • minimum distances between pedestrians

In order to determine spatial relationships between objects (e.g. people), it is necessary to define the mapping from 2D camera space to 3D world space. This mapping is made possible by camera calibration, i.e. the process of determining the projection matrix from 3D points (in world space) to 2D points (in camera space). Camera calibration methods can be classified into two main categories [11]:

  • methods based on known calibration objects

  • methods that do not depend on prior knowledge of camera scenes, so-called ‘camera self-calibration’

In the context of using public webcams to determine parameters of pedestrian dynamics models, a particularly useful method is camera self-calibration based on the video of walking persons [3, 11].

In the case of the camera showing the view of the AGH Main Street, a procedure was carried out to determine the mapping from 2D camera space to 3D world space. The procedure was based on metric measurements of characteristic elements visible in the image, such as the street width, the height of and distance between street signs, the size of the pedestrian crossings, the sizes of the benches, etc.
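Under the simplifying assumption that pedestrians move on a single flat ground plane, such measurements can be turned into an image-to-ground mapping via a homography. The sketch below illustrates this idea with OpenCV; all point coordinates are illustrative placeholders, not the actual measurements taken for the AGH Main Street camera.

```python
# Sketch of a planar image-to-ground mapping: four (or more) correspondences
# between image pixels and metric ground coordinates (e.g. measured street
# width, bench sizes) determine a homography. Coordinates are placeholders.
import cv2
import numpy as np

image_pts = np.array([[102, 540], [880, 560], [650, 210], [300, 205]],
                     dtype=np.float32)           # pixels in the camera frame
ground_pts = np.array([[0.0, 0.0], [6.0, 0.0], [6.0, 40.0], [0.0, 40.0]],
                      dtype=np.float32)          # meters on the ground plane

H, _ = cv2.findHomography(image_pts, ground_pts)

def to_ground(pixel_xy):
    """Map an image point (e.g. a pedestrian's foot point) to meters."""
    p = np.array([[pixel_xy]], dtype=np.float32)  # shape (1, 1, 2)
    return cv2.perspectiveTransform(p, H)[0, 0]
```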

The method for determining minimum distances between pedestrians is shown in Figs. 3 and 4. Figure 3 shows two pedestrians passing each other on the sidewalk. The pedestrians' successive positions are presented at intervals of 4 s (every 100 frames; the frame rate of this camera is 25 frames per second).

Fig. 3.

Two pedestrians passing each other on the sidewalk of the AGH Main Street. The image shows the frames from a video sequence recorded every 4 s.

Figure 4 shows the moment of greatest proximity between the pedestrians in Fig. 3. The yellow and blue frames mark the areas in which the pedestrians were recognized by YOLO. The red markers indicate the pedestrian center points on the ground plane. The distance in the image between the red markers (pedestrian center points) is 14 pixels, which corresponds to about 150 cm after applying the mapping from 2D camera space to 3D world space, while the distance in the image between the areas occupied by the pedestrians (represented by the yellow and blue frames) is 14 pixels, which corresponds to about 67 cm in 3D space.

Fig. 4.

The closest approximation between two passing pedestrians. The yellow and blue frames were generated by YOLO and show the area in which the pedestrian was identified. The red markers show the calculated center point of the pedestrian on the sidewalk plane. (Color figure online)
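Assuming the homography-based ground mapping sketched earlier, the minimum distance between two pedestrians can be obtained by mapping their foot points (bottom-center of the YOLO boxes) to metric ground coordinates frame by frame and taking the minimum of the Euclidean distances. The helper below is a sketch of that computation, reusing the hypothetical to_ground() function.

```python
# Sketch: minimum ground-plane distance between two tracked pedestrians,
# reusing to_ground() from the homography sketch above.
import numpy as np

def foot_point(box):
    x, y, w, h = box
    return (x + w / 2, y + h)   # bottom-center of the bounding box

def min_distance(track_a, track_b):
    """track_a, track_b: per-frame bounding boxes of the two pedestrians."""
    dists = [np.linalg.norm(to_ground(foot_point(a)) - to_ground(foot_point(b)))
             for a, b in zip(track_a, track_b)]
    return min(dists)
```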

Another characteristic important in the context of calibration and validation of pedestrian dynamics models is the motion paths of individual pedestrians. These paths allow us, among other things, to verify whether the applied model of interaction between pedestrians (based, e.g., on the social distances model) is correctly specified. An example of motion paths determined from webcam images is shown in Fig. 5.

Fig. 5.

Example of the motion paths detected for two pedestrians (highlighted in red and green, respectively). The time between successive positions of the pedestrians presented in the image is 2 s. (Color figure online)
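The paper does not name the tracking method used to build these paths. As an illustration, the sketch below associates detections between consecutive frames by nearest-neighbor matching with a distance gate, which is adequate only for sparse scenes such as the one in Fig. 3; MAX_JUMP_PX is an assumed threshold.

```python
# Sketch: frame-to-frame nearest-neighbor association building motion paths.
# Deliberately simple; suitable only when few pedestrians are visible.
import numpy as np

MAX_JUMP_PX = 50  # reject implausibly large frame-to-frame jumps (assumed)

def extend_paths(paths, detections):
    """paths: list of lists of (x, y) points; detections: current frame's
    foot points. Each path is extended with its nearest unclaimed detection."""
    unclaimed = list(detections)
    for path in paths:
        if not unclaimed:
            break
        last = np.array(path[-1])
        dists = [np.linalg.norm(last - np.array(d)) for d in unclaimed]
        j = int(np.argmin(dists))
        if dists[j] <= MAX_JUMP_PX:
            path.append(unclaimed.pop(j))
    # Any remaining detections start new paths.
    paths.extend([[d] for d in unclaimed])
    return paths
```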

3 Conclusions

As part of the project, we created and tested an application that tracks people's trajectories, determines the distances between people, and indicates space utilization and some configuration patterns in the crowd. Thus, we have an important element of a data-driven modeling scheme based on images from web cameras. Calibrating and validating pedestrian dynamics models using data from public web cameras has many advantages. These undoubtedly include the high availability of data from different geographical locations (and thus different cultures) and social situations (e.g. under pandemic conditions). Additionally, thanks to widely available object detectors such as YOLO (even pre-trained on datasets such as COCO), the cost of implementing such a solution is relatively low. However, this approach also has disadvantages compared with planned experiments involving groups of individuals, the most important of which relate to the inability to stage the observed situations. Therefore, the best approach for model calibration and validation purposes seems to be to combine the two techniques, which allows synergistic effects to be obtained.