1 Introduction

3D scanning is an epoch-making technology for digitally preserving real-world three-dimensional shapes. In the digital archiving of cultural heritage, various projects are underway to digitally preserve historically significant tangible cultural assets via 3D scanning (Parry 2005; Zorich 2003). The resulting digital preservation data are valuable not only for preserving cultural heritage but also for computer-based visualization and visual analysis. However, many culturally valuable tangible cultural assets have highly complex internal three-dimensional structures. Therefore, for visual analysis, high-quality transparent visualization is needed so that the internal structures can be examined in detail.

In this study, we propose a method for "transparent multi-view 3D visualization" to further enhance the effectiveness of transparent visualization. In multi-view 3D visualization, a multi-view 3D display is used, allowing the observation of distinct images from three or more viewpoints arranged horizontally. In this visualization, depth perception is enhanced through binocular disparity and motion parallax, which are described below. In this study, we integrate multi-view 3D visualization with transparent visualization and introduce novel visual guides. Through this approach, we aim to improve the accuracy of the perceived depth of the generated 3D images. We specifically apply our approach to tangible cultural assets with large depth and complex internal structures to examine the effectiveness of our approach.

Transparent multi-view 3D visualization is advantageous for intuitively understanding the complex three-dimensional structures of a 3D shape both externally and internally, owing to the effects of binocular disparity and motion parallax in multi-view 3D visualization. However, in transparent multi-view 3D visualization, overlapping foreground and background objects are both rendered transparently. As a result, the occlusion cue described below is lost, often making it difficult to accurately determine the positions of individual objects. This leads to a perceptual underestimation of depth (Kitaura et al. 2017). This depth underestimation is not limited to transparent multi-view 3D visualization but can also manifest in various styles of virtual reality (VR) environments that incorporate transparent visualization.

Recently, we reported that multi-view 3D visualization effectively reduces the complexity of transparent viewing (Kitaura et al. 2017; Sakano et al. 2018; Aoi et al. 2024). This paper introduces a novel visual guide for enhancing depth perception, especially for visualizing tangible cultural assets. The proposed visual guide consists of 3D edges extracted from the 3D scanning point cloud. Building upon Weinmann et al.’s statistical edge extraction method (Weinmann et al. 2013, 2014), we introduced opacity-based edge highlighting to visualize the extracted 3D edges as sharp lines. In our recent study (Aoi et al. 2024), we employed these sharp lines as a visual guide for multi-view 3D structures and conducted a psychophysical experiment, finding that such edge highlighting mitigates the depth underestimation. In the present study, we further take advantage of the ability of stochastic point-based rendering (SPBR; see Sect. 2) to adjust opacity by implementing an "opacity gradation" along the depth direction when visualizing 3D edges. This opacity gradation manifests as a "luminance gradient" in the generated images, enhancing the effectiveness of the visual guide for depth perception. The contribution of this paper is edge highlighting with depth-dependent opacity gradation as a visual guide that improves the accuracy of depth perception: the gradation clarifies the front-back relationships of the target object through the effects of occlusion and shading, and we found that this method mitigates the depth underestimation.

In this paper, we choose the "Yama-hoko floats," the festival floats used in the Gion Festival in Kyoto, Japan, as the tangible cultural assets for our visualization targets. Constructed by assembling numerous prism-shaped pieces of wood, these floats exhibit many 3D edges. They also have depths large enough for our depth perception experiments. These characteristics make the Yama-hoko floats well suited for our study.

2 Related work

The data obtained through 3D scanning are represented as a point cloud. As a result, visualizations that accurately represent the raw data have traditionally been achieved through point-based rendering, which directly employs points as the fundamental rendering primitives (Guidi et al. 2005; Ikeuchi et al. 2007). However, most of these visualizations are opaque, and transparent visualization is seldom used. The reason is that 3D scanning data are usually massive, ranging from tens of millions to billions of points. Several transparent point rendering methods proposed in the 2000s (Zwicker et al. 2002; Zhang and Pajarola 2006; Zwicker et al. 2007) struggled to visualize point clouds of such sizes at interactive speeds. Subsequently, in 2016, "stochastic point-based rendering (SPBR)" was proposed as a method capable of transparently visualizing point clouds with hundreds of millions of points (Tanaka et al. 2016). SPBR has been successfully applied to 3D scanning data of various cultural heritage sites (Hasegawa et al. 2018; Uchida et al. 2020). In this study, in the transparent visualization of 3D scanning data with SPBR, we also utilize "opacity-based edge highlighting" (Kawakami et al. 2020), which highlights the 3D edges of the scanned object as sharp lines.

The technique of extracting 3D edges from a 3D scanning point cloud has been actively researched recently (see the review in (Rusu 2013) for details). Statistical methods utilizing the eigenvalues of the covariance matrix of local point distributions have gained popularity in recent years (West et al. 2004; Jutzi and Gross 2009; Rusu 2010; Toshev et al. 2010; Demantké et al. 2011; Mallet et al. 2011; Weinmann et al. 2013, 2014; Dittrich et al. 2017; He et al. 2017).

It is well known that humans use various depth cues to perceive depth. Representative depth cues include binocular disparity and motion parallax (Howard and Rogers 2002). Binocular disparity is the difference in the positions of the retinal images in the two eyes when observing an object. Motion parallax is the temporal change in the retinal images of an object caused by changes in the head position. Several studies have reported the effects of binocular disparity and motion parallax on the three-dimensional perception of visualized transparent objects (Marta et al. 2006, 2014; Hsu et al. 1995; Calhoun et al. 1999; Mora and Ebert 2004). For example, when observing an optically pure absorptive cylinder from a viewpoint along its vertical axis, binocular disparity has been reported to assist in determining the direction of rotation (Marta et al. 2006). Another crucial depth cue is occlusion, which occurs when an object is in front of another object and the object in the back is partially hidden by the object in the front (Heine 1905). Linear perspective is yet another depth cue: for instance, parallel lines extending in the depth direction appear to converge as they recede.

The human visual system also tends to interpret objects with low luminance contrast as far from the observer (i.e., depth perception from the aerial perspective cue). Finally, it is well known that depth can be perceived from shading (i.e., shape-from-shading). There are two types of shape-from-shading (Sakano et al. 2018). The first is based on the light-from-above assumption (Mamassian and Goutcher 2001; Ramachandran 1998; Sakano and Ando 2012; Sun and Perona 1998). In that case, for instance, when a circle has a luminance gradient such that the upper part is bright and the lower part is dark, the circle appears convex. The second is based on the dark-is-deep rule, whereby the darker part of an object is seen as farther away (Chen and Tyler 2015; Christou and Koenderink 1997; Langer and Bulthoff 2000; Sakano et al. 2018; Schofield et al. 2006; Sun and Schofield 2012; Tyler 1998). The second type of shape-from-shading and occlusion were utilized in the present study to improve the accuracy of perceived depth, as described in Sect. 3.3. For a more comprehensive description of depth cues, see (Marriott et al. 2018; Howard and Rogers 2002).

3 Visualization method

In this section, we first explain SPBR, the transparent visualization method used in this study. Next, we explain how 3D edges are extracted by means of a feature value and used as a visual guide. We then explain the method of thinning the extracted edges and the method of varying the opacity of the edges according to depth, noting that the opacity is controlled via the point density.

3.1 Stochastic point-based rendering (SPBR)

SPBR is a transparent visualization method applicable to a variety of point clouds (Tanaka et al. 2016; Hasegawa et al. 2018; Uchida et al. 2020). The rendering procedure is as follows: (1) The given point cloud is randomly divided into subgroups with the same number of points. (2) For each subgroup, the points are projected onto the image plane to create an intermediate image. (3) An average image is created by averaging the pixel values over the set of intermediate images. These three steps enable the generation of a high-resolution transparent image from a given point cloud (Fig. 1). When creating a transparent image, the local opacity \(\alpha\) is controllable via the local point density.
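As a concrete illustration of this three-step procedure, the following Python sketch implements the ensemble averaging under the assumption of a user-supplied rasterizer. The function names (spbr_average_image, project_points) are ours for illustration only and are not part of the SPBR implementation.

```python
import numpy as np

def spbr_average_image(points, colors, L, project_points, rng=None):
    """Minimal sketch of the SPBR ensemble-averaging procedure.

    points : (N, 3) array of 3D point positions
    colors : (N, 3) array of RGB colors in [0, 1]
    L      : number of point subgroups (ensemble size)
    project_points : assumed user-supplied opaque rasterizer mapping a
                     point subset to an (H, W, 3) image
    """
    rng = np.random.default_rng() if rng is None else rng

    # (1) Randomly divide the point cloud into L subgroups of (nearly) equal size.
    order = rng.permutation(len(points))
    subgroups = np.array_split(order, L)

    # (2) Project each subgroup onto the image plane to create intermediate images.
    intermediate = [project_points(points[idx], colors[idx]) for idx in subgroups]

    # (3) Average the intermediate images pixel by pixel.
    return np.mean(intermediate, axis=0)
```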

Fig. 1
figure 1

Rendering procedure in SPBR

Let \(s_{ \text{p} }\) be the cross-sectional area of a point, which corresponds to one pixel in the image plane, \(s_{ \text{A} }\) the local surface area over which the opacity is adjusted, n the number of points in \(s_{ \text{A} }\), and L the number of subgroups. Then, the opacity \(\alpha\) of the local surface \(s_{ \text{A} }\) obeys the following opacity formula:

$$\begin{aligned} \alpha = 1 - \left( 1 - \frac{ s_{ \text{p}} }{ s_{\text{A}} } \right) ^ { \frac{ n }{ L } } \ . \end{aligned}$$
(1)
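The following helper functions are a minimal sketch of Eq. (1) and of its inversion, included only to illustrate how a target opacity can be controlled via the local point density; the function names and the inversion helper are ours, not part of SPBR.

```python
import numpy as np

def spbr_opacity(n, L, s_p, s_A):
    """Opacity of a local surface patch according to Eq. (1)."""
    return 1.0 - (1.0 - s_p / s_A) ** (n / L)

def points_for_opacity(alpha, L, s_p, s_A):
    """Invert Eq. (1): number of points in s_A needed for a target opacity alpha."""
    return L * np.log(1.0 - alpha) / np.log(1.0 - s_p / s_A)

# Example (illustrative numbers): how many points yield alpha = 0.3
# in a patch 400 times larger than a point's footprint, with L = 100?
n_required = points_for_opacity(0.3, L=100, s_p=1.0, s_A=400.0)
```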

3.2 Opacity-based edge thinning

The extraction of 3D edges from point-based objects is usually executed based on principal component analysis (PCA) (Weinmann et al. 2013, 2014). We define a local sphere centered at each point and calculate the covariance matrix of the local point distribution inside the sphere. Feature values that extract the 3D edges can be defined using the eigenvalues \(\lambda _1 \ge \lambda _2 \ge \lambda _3\) of the obtained covariance matrix. In our study, we use “linearity” as the feature value:

$$\begin{aligned} L_{\lambda } = \frac{ { {\lambda }_{1} }-{ {\lambda }_{2} }}{ {\lambda }_{1} }, \end{aligned}$$
(2)

which becomes large in high-curvature areas such as the 3D edges and corners of a point-based object. We normalize \({L_{\lambda }}\) such that it has values between 0 and 1. The normalized feature value is denoted by f below.
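A minimal Python sketch of this feature computation is shown below, assuming numpy and scipy are available. The search radius and the brute-force per-point loop are illustrative simplifications, not the implementation used in our experiments.

```python
import numpy as np
from scipy.spatial import cKDTree

def linearity_feature(points, radius):
    """Normalized linearity feature f for each point of a 3D-scanned cloud.

    For every point, the eigenvalues lambda_1 >= lambda_2 >= lambda_3 of the
    covariance matrix of the neighbors inside a sphere of the given radius
    are computed, and the linearity (lambda_1 - lambda_2) / lambda_1 is
    normalized to [0, 1] over the whole cloud.
    """
    tree = cKDTree(points)
    f = np.zeros(len(points))
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p, radius)
        if len(idx) < 3:                     # too few neighbors for a stable estimate
            continue
        cov = np.cov(points[idx].T)
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # descending eigenvalues
        if lam[0] > 0:
            f[i] = (lam[0] - lam[1]) / lam[0]
    # normalize the feature values to [0, 1]
    fmin, fmax = f.min(), f.max()
    return (f - fmin) / (fmax - fmin) if fmax > fmin else f
```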

The simplest feature-region extraction consists of using feature f to extract points with feature values greater than a given threshold \(f_\text{th}\) and saving the obtained points as feature points. Then, feature-highlighting visualization is executed by giving an appropriate highlight color to the feature points. However, the visualized 3D edges tend to be thick when this simple method is used, and their widths are often nonuniform. Therefore, we adopt the opacity-based thinning of 3D edges proposed in (Kawakami et al. 2020). The idea is to assign a higher opacity to the feature points with a high feature value f. The opacity \(\alpha\) is represented as a function \(\alpha (f)\) of the feature f (in this study, "linearity") and is called the opacity function. The relation between f and \(\alpha\) is realized by defining the following function \(\alpha (f)\):

$$\begin{aligned} \alpha (f) = \left( {\alpha _\text{ max }} - {\alpha _\text{ min }} \right) \left( \frac{f-f_\text{th}}{1.0- f_\text{th}}\right) ^ {d} + {\alpha _\text{ min }} \, \end{aligned}$$
(3)

where d is the exponent that controls how quickly the opacity grows. On the extracted 3D edges, the opacity ranges from \({\alpha _\text{ min }}\) on the periphery, where f is close to \(f_\text{th}\), to \({\alpha _\text{ max }}\) at the points with the highest feature values. The high-opacity regions are thus concentrated around the points with higher feature values, which results in the thinning of the 3D edges. This opacity-based thinning makes the 3D edges look thin and sharp during visualization, and the visualized 3D edges can show greater linear perspective effects. When adjusting the opacity according to formula (3), we upsample or downsample the points on the 3D edges to realize the required point densities. Below, the point proliferation ratios corresponding to \({\alpha _\text{ min }}\) and \({\alpha _\text{ max }}\) are denoted by \(a_\text{min}\) and \(a_\text{max}\), respectively.
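The sketch below illustrates Eq. (3). Points below the threshold \(f_\text{th}\) are simply assigned zero opacity, as in the feature-region extraction described above, and the default exponent d = 2 is only an example.

```python
def edge_opacity(f, f_th, alpha_min, alpha_max, d=2.0):
    """Opacity function alpha(f) of Eq. (3) for opacity-based edge thinning.

    Points below the threshold f_th are discarded (opacity 0); on the
    extracted edges the opacity grows from alpha_min near the threshold to
    alpha_max at f = 1, with the exponent d controlling how fast it grows.
    """
    if f < f_th:
        return 0.0
    t = (f - f_th) / (1.0 - f_th)
    return (alpha_max - alpha_min) * t ** d + alpha_min
```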

3.3 Depth-dependent opacity gradation

We found that edge highlighting mitigates the underestimation of depth (Aoi et al. 2024). However, for objects with complex shapes, the front-back relationships of the edges remain unclear. Therefore, the proposed method reduces this ambiguity by utilizing a depth-dependent opacity gradation. Edges are first extracted by the opacity-based edge thinning described in Sect. 3.2, and then the depth-dependent opacity gradation is applied. The depth value z of each point is measured along the viewing direction of the target data. The smallest depth is defined as \({{z}_\text{near}}\), and the largest depth is defined as \({{z}_\text{far}}\). The opacity values of points at the smallest and largest depths are defined as \(\alpha _\text{ near }\) and \(\alpha _\text{ far }\), respectively. The relation between z and \(\alpha\) is obtained by defining the following function \(\alpha ({z})\):

$$\begin{aligned} \alpha (z) = \alpha _\text{ near } + \left( \frac{z-{z_\text{near}}}{{{z}_\text{far}}- {{z}_\text{near}}} \right) ^ q \left( {\alpha _\text{ far }} - {\alpha _\text{ near }} \right) \, \end{aligned}$$
(4)

where q is the exponent that controls the rate of the opacity gradation. In the experiments of Sect. 4, the opacities \(\alpha _\text{ near }\) and \(\alpha _\text{ far }\) are controlled indirectly by proliferating or thinning points at the depths \({{z}_\text{near}}\) and \({{z}_\text{far}}\). The corresponding point proliferation ratios are denoted as \(a_\text{near}\) and \(a_\text{far}\), respectively.
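A minimal sketch of Eq. (4) is given below. Choosing alpha_near > alpha_far yields the decreasing gradation and alpha_near < alpha_far the increasing one, with q = 1 and q = 3 corresponding to the linear and cubic gradations used in Sect. 4.

```python
def depth_opacity(z, z_near, z_far, alpha_near, alpha_far, q=1.0):
    """Depth-dependent opacity gradation alpha(z) of Eq. (4).

    q = 1 gives a linear gradation along the viewing direction and q = 3
    the cubic one; alpha_near > alpha_far makes the opacity decrease with
    depth, while alpha_near < alpha_far makes it increase.
    """
    t = (z - z_near) / (z_far - z_near)
    return alpha_near + t ** q * (alpha_far - alpha_near)
```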

By manipulating the opacity of black objects on a white background, two visual effects are expected: an occlusion effect and a shading effect. Figure 2 shows two examples of black point clouds with a depth-controlled opacity gradation. When the opacity of the foreground object is high, its edges are clearly visible, strengthening the occlusion cue (Heine 1905) and making the front-back relationship easier to understand. The left side of Fig. 2 shows the case where the opacity decreases along the line of sight: the front surface has a higher opacity and thus appears dark, partially occluding the surface behind it and producing the occlusion effect. On the right side of Fig. 2, in contrast, the opacity of the front surface is low, which weakens the occlusion effect; instead, the surfaces become darker with increasing opacity toward the back, producing a shading effect (the dark-is-deep rule) that is also expected to increase the accuracy of perceived depth. In this way, introducing depth-dependent opacity into edge highlighting clarifies the front-back relationships of the target objects through the effects of occlusion and shading. Therefore, we expected that edge highlighting with depth-dependent opacity gradation would improve the accuracy of perceived depth.

Fig. 2
figure 2

Examples of depth-dependent opacity gradation

4 Experiment

4.1 Experimental setup

A 42-inch multi-view autostereoscopic 3D display with a parallax barrier (TRIDELITY Display Solutions LLC, New Jersey, USA) was used to visualize the stimuli. This device provides binocular disparity and motion parallax by utilizing five preset viewpoint images, allowing observers to see 3D images without the need for special glasses (Dodgson et al. 1999; Dodgson 2005; Son and Javidi 2005; Hill and Jacobs 2006; Konrad and Halle 2007; Jain and Konrad 2007). The resolution for each preset image was 1920 \(\times\) 1080 pixels. The display was designed so that the optimal viewing distance for achieving the best image quality was 350 cm. In the experiments, participants were instructed to sit at this distance.

The experimental conditions included the following: (1) monocular without motion parallax, (2) binocular without motion parallax, (3) monocular with motion parallax, and (4) binocular with motion parallax. In the condition with motion parallax, participants were instructed to move their heads horizontally so that they could see all five images of the multi-view 3D display. In the condition without motion parallax, participants were instructed to keep their chin on a chin rest and observe the stimuli without any head movement. Irrespective of the presence of motion parallax, the participants were instructed to keep their eyes at the level of the center of the display. For the condition without binocular disparity, a blinder was used to cover the non-dominant eye, and participants viewed the stimuli with only one eye.

We prepared a test stimulus image and verified that the participants were able to accurately perceive the image in 3D with multiple viewpoints by horizontally moving their heads. The order of the stimulus presentations was random for each participant and was independent among the participants. The exposure time for each experimental image was 15 s. All participants had either normal visual acuity or corrected-to-normal visual acuity and possessed normal stereo vision (Vancleef and Read 2019).

4.2 Experimental data

The 3D point clouds used in the experiment were "Taishi-yama" and "Fune-hoko", which are parade floats of the Kyoto Gion Festival. The Taishi-yama data were used in the experiment as Data A. Data B was obtained by extracting relevant points from the decomposed Fune-hoko data. The transparent visualization results with edge highlighting of uniform opacity overlaid on the entire point cloud are shown in Fig. 3a and b for Data A and Data B, respectively. For Data A and Data B, we set \(f_\text{th}=0.3\), \(a_\text{min}=1\), and \(a_\text{max}=5\). These values were adjusted so that the edges were neither broken nor too thick. Opaque images of Data A and B are shown in Fig. 4. Data A has a complicated shape and is surrounded by a frame, whereas Data B consists of parts combined vertically and horizontally.

Fig. 3
figure 3

Transparent visualization of Data A and B with edge highlighting and uniform opacity

Fig. 4
figure 4

Front, side, and top views of Data A and Data B used in the experiments. Each side of the squares, which surround the visualized objects, is given the same color as the coordinate axis pointing in the same direction

We tested three types of opacity gradation in edge highlighting: the opacity was uniform (Fig. 3), increased (Figs. 5, 6), or decreased (Figs. 7, 8) along the gaze direction. The opacity gradation function was linear (\(q=1\)) or cubic (\(q=3\)). In Figs. 5, 6, 7 and 8, the point proliferation ratios that indirectly control \(\alpha _\text{ near }\) and \(\alpha _\text{ far }\) are set as follows. In Fig. 5, we set \(a_\text{near}=1.0\times 10^{-3}\) and \(a_\text{far}=40\). In Fig. 6, we set \(a_\text{near}=0.5\) and \(a_\text{far}=40\). In Fig. 7, we set \(a_\text{near}=10\) and \(a_\text{far}=1.0\times 10^{-6}\). In Fig. 8, we set \(a_\text{near}=10\) and \(a_\text{far}=1.0\times 10^{-4}\). Note that point proliferation with \(a = 40\) corresponds to \(\alpha \simeq 1.0\), while \(a = 1.0 \times 10^ {-6}\) corresponds to \(\alpha \simeq 0\). In the case of increasing opacity, \(a_\text{near}\) was set so that the horizontal bar in the foreground was just barely visible, and \(a_\text{far}\) was set so that the opacity did not converge to 1 in the middle. In the case of decreasing opacity, \(a_\text{near}\) was set so that the opacity did not converge to 1, allowing the back horizontal bar to be seen, and \(a_\text{far}\) was set so that the back horizontal bar was just barely visible. The value of q was simply set to two patterns: a constant rate of opacity change (\(q=1\)) and a rapidly changing rate (\(q=3\)).

Fig. 5
figure 5

Data A with the edge highlighting with opacity increasing along the line of sight, \(a_\text{near}=1.0\times 10^{-3}\) and \(a_\text{far}=40\)

Fig. 6
figure 6

Data B with the edge highlighting with opacity increasing along the line of sight, \(a_\text{near}=0.5\) and \(a_\text{far}=40\)

Fig. 7
figure 7

Data A with the edge highlighting with opacity decreasing along the line of sight, \(a_\text{near}=10\) and \(a_\text{far}=1.0\times 10^{-6}\)

Fig. 8
figure 8

Data B with the edge highlighting with opacity decreasing along the line of sight, \(a_\text{near}=10\) and \(a_\text{far}=1.0\times 10^{-4}\)

4.3 Experimental conditions

The participants were 31 males and 3 females in their 20s to 40s. A total of 40 experimental cases were examined: two data sets (2), edge highlighting with uniform opacity or with opacity increasing or decreasing along the gaze direction with a linear or cubic function (1+2+2), with and without binocular disparity (2), and with and without motion parallax (2). The participants were instructed to estimate the perceived distance between two objects indicated by a circle and a square, using the length of the right vertical reference line as a unit value (Fig. 9). Note that the circle and the square were presented only on a questionnaire (i.e., instruction) image. The reported values were therefore dimensionless ratios. The depth range was limited to the range in which the 3D image could be presented clearly with the multi-view 3D display used. In addition, the participants were asked to report a positive value when the circle was perceived to be in front of the square and a negative value when the circle was perceived to be behind the square. The correct values (i.e., simulated depths) for Data A and B were 3.45 and 6.00, respectively.
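For reference, the following snippet enumerates the factorial design; the condition labels are ours and serve only to make the count of 2 × 5 × 2 × 2 = 40 cases explicit.

```python
from itertools import product

# Illustrative enumeration of the experimental design (labels are ours):
data_sets  = ["Data A", "Data B"]                                   # 2
gradations = ["uniform", "increase-linear", "increase-cubic",
              "decrease-linear", "decrease-cubic"]                   # 1 + 2 + 2 = 5
disparity  = ["monocular", "binocular"]                              # 2
parallax   = ["static", "motion parallax"]                           # 2

conditions = list(product(data_sets, gradations, disparity, parallax))
print(len(conditions))   # 2 * 5 * 2 * 2 = 40 experimental cases
```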

Fig. 9
figure 9

Examples of the reference line (the vertical line on the right side) presented in the experiment

4.4 Experimental results and discussion

Figure 10 shows the experimental results. The reported depth was lower than the simulated depth in all cases (t-test, \(p<0.05\)). Most importantly, the reported depths of all images with opacity gradation in edge highlighting were larger than those of images with uniform opacity in edge highlighting (paired t-test, Fig. 10). This result suggests that the opacity gradation in edge highlighting mitigates the perceptual depth underestimation in transparent multi-view 3D visualizations from an average of 69.4% to 35.5%.

Effects of the direction of the opacity gradation in edge highlighting were statistically significant (\(p<0.05\)) for Data B with binocular disparity and motion parallax, no matter whether the opacity changed linearly or cubically (Fig. 10). Specifically, perceived depths of the images with opacity decreasing along the gaze direction were larger (i.e., closer to the simulated depth) than those of the images with increasing opacity. Figure 10 shows a similar tendency in the other conditions, although the effects did not reach statistical significance (\(p>0.05\)).

Effects of the function of the opacity gradation in edge highlighting (i.e., linear or cubic) did not reach statistical significance (\(p>0.05\)). However, for Data B, the effects of the combination of the direction and the function of the opacity gradation were statistically significant no matter whether binocular disparity and motion parallax were available (Fig. 10). That is, perceived depths of the images with opacity decreasing linearly along the gaze direction were significantly larger than those of the images with opacity increasing cubically.

Therefore, for Data B, the images that induced perceived depths closest to the simulated depth were those with opacity decreasing linearly along the gaze direction. Similarly, for Data A, the images that induced perceived depths closest to the simulated depth were those with opacity decreasing along the gaze direction.

For Data A, the effects of the function of the opacity gradation in edge highlighting (i.e., linear or cubic) were not clear (Fig. 10). The reason why these effects were observed only for Data B might be that, for Data B, the luminance contrast of the bars lying from left to right was prominent. Specifically, when the opacity increased along the gaze direction (Fig. 6), the luminance contrast of the front bar was lower in the cubic condition (Fig. 6b) than in the linear condition (Fig. 6a). Because the human visual system tends to interpret objects with low luminance contrast as far from the observer (i.e., the aerial perspective depth cue, (Howard and Rogers 2002)), the front bar in the cubic condition might have been seen as farther than that in the linear condition. Similarly, when the opacity decreased along the gaze direction (Fig. 8), the luminance contrast of the back bar was lower in the linear condition (Fig. 8a) than in the cubic condition (Fig. 8b). Hence, the back bar in the linear condition might have been seen as farther than that in the cubic condition. On the other hand, the structure of Data A was much more complex, making it more difficult to see the differences in the luminance contrast of the bars lying from left to right. This might be why the effects of the function of the opacity gradation (i.e., linear or cubic) were not observed for Data A. However, because we tested only these two structures in the present study, further study is required to clarify the effects of the visualized 3D structure on perceived depth.

The perceived depth was larger in the increasing opacity condition than in the uniform opacity condition (Fig. 10). As described in Sect. 3.3, we attribute this result to the effect of the shading depth cue (Sakano et al. 2018; Chen and Tyler 2015; Langer and Bulthoff 2000; Tyler 1998; Schofield et al. 2006; Sun and Schofield 2012). On the other hand, considering the aerial perspective depth cue, one might suppose that under some conditions not employed in the present study, perceived depth could be larger in the uniform opacity condition than in the increasing opacity condition. Since the conditions we examined were limited and the two depth cues (i.e., the aerial perspective and shading cues) are in conflict under the opacity gradation, further study is required to clarify whether such conditions exist.

Binocular disparity and motion parallax also increased the perceived depths (\(p\text{s}<0.05\)). These results suggest that binocular disparity and motion parallax produced by a multi-view 3D display can also mitigate the perceptual underestimation of depths of cultural heritage visualized with edge highlighting and depth-dependent opacity gradation.

As shown in Fig. 3, the contrast of the images of the whole presented objects in the uniform condition was somewhat low. This is because we controlled the opacity of the objects by changing the point density, keeping the opacity of the whole objects uniform and avoiding the convergence of the opacity to one. This somewhat low contrast was therefore inevitable, but it could have induced a smaller perceived depth. Further study is required to clarify the effect of the contrast of the whole object on the perceived depth of 3D structures visualized by point clouds with edge highlighting.

Fig. 10
figure 10

Experimental results for Depth-Dependent Opacity Gradation. The error bars indicate the standard error of the mean (SEM)

5 Conclusions

In our recent study, we proposed edge highlighting as a visual guide for multi-view 3D structures and found that such edge highlighting mitigates the depth underestimation. In the present study, we proposed adding depth-dependent opacity gradation to the edge highlighting and conducted a psychophysical experiment. We found that such opacity gradation in edge highlighting further mitigates the depth underestimation from an average of 69.4% to 35.5%.

Further research is needed to clarify the effects of the shape and color of the structure and the background. The effects of parameter settings (e.g., opacity) and those of participants’ age and sex also need to be clarified. In future work, we also suggest suppressing the display of back edges occluded by front edges when utilizing edge highlighting.