1 Introduction

Akin to the real world, interaction brings realism to Virtual Environments (VEs). Of the four interaction tasks categorized by Gabbard [1], navigation is particularly important because, more often than not, it is the first step in performing any selection and/or manipulation. A far-spreading virtual scene cannot be viewed from a single static point of view; therefore, the user should be able to navigate in the VE and explore the different parts and portions it encloses. Furthermore, to manipulate an object of a 3D world, we have to select the object, and before selection we have to reach it. Thus, navigation becomes a preliminary task even when it is not itself the goal. Although a number of navigation techniques have been proposed to make this frequently used interaction error-free and cost-effective, naturalism and intuitiveness remain open challenges. Low-cost vision-based tracking, on the other hand, provides an applicable platform for devising flexible interfaces for Human Computer Interaction (HCI). This research work is an attempt to make navigation simple and realistic by bridging the real and virtual worlds while keeping accuracy and cost in check. VEN-3DVE tracks the real-world movement of the index finger using an ordinary camera to walk inside the designed synthetic world. At the back-end, the system traces the position of the index finger in each scanned dynamic image. The area of the tip of the index finger is calculated to predict the position of the hand on the z-axis. The position and area of the index finger are then forwarded to the front-end of the system for the actual operation. The front-end renders the scene accordingly, with an avatar representing the user's position in the VE. A fingertip thimble made of a simple piece of paper is used to segment the index finger from the rest of a scanned image frame. The thimble is colored green for unambiguous and fast segmentation. Forward and backward gestures of the index finger along the z-axis perform navigation, while horizontal and vertical movements of the finger perform panning along the x-axis and y-axis, respectively. The system can be used for all three sub-navigation tasks, exploration, searching and inspection, as discussed by Tan et al. [2].

VR-based Computer Aided Design (VR-CAD) is on the rise, making the design process simpler and more natural. Navigation and panning are used in VR-CAD, particularly in the design of large Digital Mock-Ups (DMU) [3]. The proposed approach can easily be extended to gesture-based VR-CAD without using any extra tool or toolkit [4]. Furthermore, the issue of pointing imprecision [5] can be avoided to a satisfactory extent.

This research paper is structured into six sections. Section 2 reviews related work and Sect. 3 explains the details of the proposed system. Section 4 presents implementation and evaluation details, while Sect. 5 discusses the applicability of the approach in VR-CAD. The last section concludes the paper and outlines future work.

2 Related work

The continual advancement in the storage and processing power of computer systems supports the emergence of virtual reality applications. However, the improvements made so far in 3D interaction have failed to keep pace with this ceaseless progress. Being the most frequently used interaction task, navigation is the focus of different research works that aim to make the exploration of VEs feasible and flexible. Most techniques for direct navigation use a traditional mouse/keyboard or an HMD [2]. Where the former lacks intuition in interaction and engineering design [7], the latter remains a second choice because of its high cost. The multi-finger gestural navigation of Malik et al. [8] works on a constrained tabletop surface; moreover, the system is bi-manual and is therefore not suitable for hand-held devices. The NuNav3D navigation approach requires whole-body pose estimation before hand gesture recognition [9]; furthermore, the system is 79% slower than joy-pad based navigation. The technique of Lee et al. [10] consumes a large amount of processing for finger action recognition, as the system has to pass through three heavy stages, skin-color detection, k-cosine based angle detection and contour analysis, for the finger's state, position and direction, respectively. Furthermore, due to variable finger thickness, the accuracy of the system varies from user to user. In Drag'n Go [11], the screen cursor's position casts a ray to the target for navigation. As a straight path must be followed, the technique suits navigation inside a large empty space well but is unsuitable for zigzag navigation. Similarly, in the system of Tan et al. [12], the user has to drag a mouse in a particular direction to move the virtual camera; for a long travel, the user has to repeat the action over and over again. Furthermore, if the ray collides with an object on the way, the system mistakenly inspects the collided object instead of continuing the navigation. With the head-directed navigation approach [13], navigation speed and direction are calculated from the head pose; this estimation may misinterpret casual head movements. The MC (Management Cabin) in the FmF (Follow my Finger) model [14], which projects a view of the 3D world onto a 2D table-top device, suffers from disorientation. Hand gesture recognition using different trackers [15, 16] underlies navigation techniques in which the virtual scene is treated as one big object grabbed and moved with both hands; the techniques are interesting, but the hand-worn overhead and the complexity of use make them a rare option. The walking-in-place approach [17] relates the user's walking pace to the navigation speed in the virtual environment; it cannot be used while sitting on a chair, needs a large number of sensors and is implementable only inside a dedicated lab. Similarly, the foot-based navigation interface suggested by [18] requires a cumbersome setup of waist-mounted magnetic trackers with a conveyor belt. Fiducial markers have also been utilized effectively for navigation: the DeskCube [19] is a passive input device with different markers glued on the different faces of a cube, but the system fails to cope with occlusion and blurring of the markers under gentle movement of the cube. The leap motion based technique [20] is not fully immersive, as users have to activate commands by clicking buttons within a limited space. The finger-based locomotion FWIP (Finger Walking In Place) [21] for mobile VEs requires the fingers to touch the display screen for navigation. The pointing technique of Radkowski et al. [22] uses two fingers, one for viewing and the other for direction; though it successfully avoids the mistakes of gaze-directed navigation, speed control remains its main challenge.

Fig. 1

Schematic of the system: a back-end (image processing) and b front-end (VE)

3 VEN-3DVE

The steering-based navigation metaphor [23, 24] is considered a standard because of its close resemblance to real-world navigation. Following this insight, the proposed system tracks index finger movement to interact dynamically with the designed VE. Since the index is a dominant finger, the system's algorithm works on its dynamic movement. To interact with the VE in this way, the part of the hand on which the interaction routine depends must be extracted precisely. A green paper finger-cap is used for this purpose so that the index finger is detected reliably. Furthermore, to prevent the system from mistaking some other green object for the finger, skin color is detected first to extract the segment of the scanned image containing the finger's pose. A range of the Hue, Saturation and Value (HSV) space for green is set to detect the index finger under balanced lighting conditions. Upon detection of the index-cap, a Central Zone (CZ) is first specified from the 2D position of the index finger in the initial frames. Panning in either direction is performed as long as the finger's movement stays inside the CZ. Like a real camera in hand, navigation is performed by moving the finger along the z-axis beyond the CZ.

3.1 Architecture design

The system starts with a virtual environment comprising different 3D objects. At the back-end, the real-world scanned images are thresholded dynamically for a broad range of green. Once the thimble on the index finger is traced, coordinate mapping for the index-tip is performed to locate the virtual camera and the user's position in the virtual environment. As long as the index cap is visible, the virtual camera can move freely with the index finger to travel or pan. Outside the CZ, forward movement of the index finger navigates the user along the look vector, while backward movement zooms out. Horizontal and vertical finger movement changes the eye coordinates of the virtual camera accordingly. A schematic of the proposed system is shown in Fig. 1.

Fig. 2

Gesture with CZ and navigation regions

3.2 Details of the algorithm

Based on skin color, a ROI_Img (Region Of Interest Image) is extracted from the whole scanned Frame Image (Fr_Img). ROI_Img is then thresholded for green color to trace the tip of the index finger. From the 2D position of the finger in ROI_Img, the Index Central Points (ICPx, ICPy) and the ICA (Index Central Area) are saved, and the CZ is set from ICPx, ICPy and ICA. The CZ spans a limited distance along the look-vector but covers the entire horizontal and vertical range of the real camera, see Fig. 2. Navigation is performed only behind and beyond the CZ. Within the CZ, finger movement in the xy-plane is reserved for horizontal and vertical panning.

3.2.1 Image segmentation

To avoid the possibility of falsely detecting green objects in the background, ROI_Img is extracted first. ROI_Img, the area of Fr_Img most likely to contain the index finger, is segmented out on the basis of skin color. As the YCbCr space provides the best discrimination between skin and non-skin colors [25], skin color is extracted using the YCbCr model. The binary Fr_Img is obtained from the scanned RGB image frame as,

$$\begin{aligned}&Fr\_Img\left[ {{\begin{array}{l} \hbox {Y} \\ {\hbox {Cb}} \\ {\hbox {Cr}} \\ \end{array} }} \right] \nonumber \\&\quad = \left[ {{\begin{array}{l} {16} \\ {128} \\ {128} \\ \end{array} }} \right] + \left[ {{\begin{array}{l@{\quad }l@{\quad }l} {65.1}&{} {128}&{} {24.8} \\ {-37.3}&{} {-74}&{} {110} \\ {110}&{} {-93.2}&{} {-18.2} \\ \end{array} }} \right] \left[ {{\begin{array}{l} \hbox {R} \\ \hbox {G} \\ \hbox {B} \\ \end{array}}} \right] \end{aligned}$$
(1)

After obtaining the binary image of Fr_Img, see Fig. 3, ROI_Img with m rows and n columns is extracted from Fr_Img using our previously designed algorithm [26] as,

$$\begin{aligned}&\hbox {ROI}\_\hbox {Img}\left( {\hbox {m},\hbox {n}} \right) =\nonumber \\&\quad \left( \mathop \bigcup \nolimits _{\mathrm{r}={\mathrm{Dm}}}^{{\mathrm{Fr}}\_{\mathrm{Img}}.{\mathrm{Row}}\left( 0 \right) } \hbox {Fr}\_\hbox {Img},\; \mathop \bigcup \nolimits _{{\mathrm{c}}={\mathrm{Lm}}}^{{\mathrm{Fr}}\_{\mathrm{Img}}.{\mathrm{Column}}\left( {{\mathrm{Rm}}} \right) } \hbox {Fr}\_\hbox {Img} \right) \end{aligned}$$
(2)

where Lm, Rm and Dm represent the left-most, right-most and down-most skin pixels, respectively.
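
A minimal C++/OpenCV sketch of this segmentation step is given below. The helper name extractRoiImg and the YCrCb skin thresholds are illustrative assumptions, not the exact values of the algorithm in [26]; OpenCV stores the channels in the order Y, Cr, Cb, so the thresholds are given in that order.

#include <opencv2/opencv.hpp>

// Sketch of the ROI extraction: threshold skin color in YCbCr and crop Fr_Img
// from the top row down to the lowest skin pixel (Dm) and between the left-most
// (Lm) and right-most (Rm) skin columns, following Eqs. (1) and (2).
cv::Mat extractRoiImg(const cv::Mat& frImgBGR)
{
    cv::Mat ycrcb, skinMask;
    cv::cvtColor(frImgBGR, ycrcb, cv::COLOR_BGR2YCrCb);
    // Commonly used skin range (assumed values): Cr in [133,173], Cb in [77,127].
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), skinMask);

    std::vector<cv::Point> skinPixels;
    cv::findNonZero(skinMask, skinPixels);
    if (skinPixels.empty())
        return cv::Mat();                                  // no hand in view

    cv::Rect extents = cv::boundingRect(skinPixels);
    int Lm = extents.x;                                    // left-most skin column
    int Rm = extents.x + extents.width;                    // right-most skin column
    int Dm = extents.y + extents.height;                   // down-most skin row

    // ROI_Img spans rows 0..Dm and columns Lm..Rm of Fr_Img.
    return frImgBGR(cv::Rect(Lm, 0, Rm - Lm, Dm)).clone();
}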

Fig. 3

Fr_Img a in RGB and b in binary after conversion into YCbCr

The segmented image ROI_Img is then thresholded for green color using the HSV color space as,

$$\begin{aligned}&ROI\_Img\left( {x,y} \right) =\nonumber \\&\quad \left\{ {\begin{array}{ll} 1, &{} \textit{if}\; 47\le ROI\_Img.H\left( {x,y} \right) \le 94\; \wedge \; 100\le ROI\_Img.S\left( {x,y} \right) \le 187 \\ &{} \quad \wedge \; 102\le ROI\_Img.V\left( {x,y} \right) \le 255 \\ 0, &{} \textit{otherwise} \\ \end{array} } \right. \end{aligned}$$
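
For illustration, a short C++/OpenCV sketch of the green-thimble segmentation is shown below; it applies the H, S and V ranges listed above with cv::inRange. The morphological clean-up step and the function name are additions for illustration, not part of the stated algorithm.

#include <opencv2/opencv.hpp>

// Sketch of the green finger-cap segmentation using the H/S/V ranges above
// (H 47-94, S 100-187, V 102-255). The bounds may need tuning per camera and
// lighting condition.
cv::Mat segmentGreenCap(const cv::Mat& roiImgBGR)
{
    cv::Mat hsv, capMask;
    cv::cvtColor(roiImgBGR, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(47, 100, 102), cv::Scalar(94, 187, 255), capMask);

    // Morphological opening so the cap appears as a single clean blob.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(capMask, capMask, cv::MORPH_OPEN, kernel);
    return capMask;   // binary mask of the index-tip thimble
}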

3.2.2 Coordinates mapping

One of the challenging parts of the implementation was to map the image pixels representing the index-tip to the position of the virtual camera in the VE. In OpenCV, the image frame starts with (0,0) at the top left, while in OpenGL (0,0) lies at the center of the virtual environment, so the coordinate systems are entirely different. To synchronize these dissimilar coordinate systems, we devised four mapping functions \(\mathrm{m}_{1}\), \(\mathrm{m}_{2}\), \(\mathrm{m}_{3}\), \(\mathrm{m}_{4}\). The image frame is virtually split into four regions \(\hbox {R}_{1}\) to \(\hbox {R}_{4}\) as shown in Fig. 4, where the mapping for a region \(\hbox {R}_{n}\) is performed by the corresponding \(\hbox {m}_{n}\), taking the x and y of a pixel of \(\hbox {R}_{n}\) as independent variables.

$$\begin{aligned} m_1 \left( {x,y} \right)&=\left( \frac{Px-\left( Tc/2 \right) }{Tc/2},\; \frac{\left( Tr/2 \right) -Py}{Tr/2} \right) \end{aligned}$$
(3)
$$\begin{aligned} m_2 \left( {x,y} \right)&=\left( \frac{Px}{Tc},\; \frac{\left( Tr/2 \right) -Py}{Tr/2} \right) \end{aligned}$$
(4)
$$\begin{aligned} m_3 \left( {x,y} \right)&=\left( \frac{Px-\left( Tc/2 \right) }{Tc/2},\; \frac{Py}{Tr} \right) \end{aligned}$$
(5)
$$\begin{aligned} m_4 \left( {x,y} \right)&=\left( \frac{Px-\left( Tc/2 \right) }{Tc/2},\; \frac{Py}{Tr} \right) \end{aligned}$$
(6)

In the above functions, Px and Py represent the ‘x’ and ‘y’ positions of the traced pixel in the image frame, while ‘Tc’ and ‘Tr’ represent the total number of columns and rows, respectively. Rendering of the virtual scene after the mapping is shown in Fig. 5.
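
As an illustration, the following C++ sketch shows the center-based form of the mapping (the \(\mathrm{m}_{1}\) case) that converts an OpenCV pixel position to OpenGL-style normalized coordinates; for brevity it does not branch over the four regions, and the struct and function names are assumptions.

// Sketch of the pixel-to-camera mapping (the m1 form): OpenCV pixel
// coordinates, origin top-left, are converted to normalized coordinates in
// [-1, 1] with the origin at the frame center, as used by the OpenGL scene.
struct CamPos { float x; float y; };

CamPos mapPixelToCamera(int px, int py, int totalCols, int totalRows)
{
    const float tc = static_cast<float>(totalCols);   // Tc
    const float tr = static_cast<float>(totalRows);   // Tr

    CamPos p;
    p.x = (px - tc / 2.0f) / (tc / 2.0f);   // column -> [-1, 1], left to right
    p.y = (tr / 2.0f - py) / (tr / 2.0f);   // row    -> [-1, 1], flipped so up is positive
    return p;
}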

Fig. 4

Virtual division of the image frame

Fig. 5

a Finger movement in the image frame and b the avatar moving inside the VE

Fig. 6

The forward move of finger beyond CZ for navigation-in

Fig. 7

a The index-tip ICA in an initial image frame and b the IDA after forward hand movement

The position of the finger at the time of initial detection is assumed to be at an appropriate distance from the real camera. To make ICPx and ICPy precise, the arithmetic means of the ‘x’ and ‘y’ positions of the pixels representing the center of the finger-tip are calculated from the first five images. Hence, the zone CZ is set as soon as the index-tip is recognized by the system. CZ represents all positions of the index finger for which the Index Dynamic Area (IDA) does not exceed the ICA.

$$\begin{aligned} CZ=\left( \mathop \bigcup \nolimits _{{\mathrm{i}}=0}^{tr} x_i,\; \mathop \bigcup \nolimits _{{\mathrm{j}}=0}^{\mathrm{{tc}}} y_j \right) \leftrightarrow IDA\le ICA \end{aligned}$$
(7)
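
The calibration of CZ can be sketched in C++ as below: ICPx, ICPy and ICA are accumulated as running means over the first five frames in which the cap is detected, and the CZ membership test is the IDA ≤ ICA comparison of Eq. (7). The struct layout and the use of image moments are assumptions for illustration.

#include <opencv2/opencv.hpp>

// Sketch of the Central Zone (CZ) calibration and membership test.
struct CentralZone {
    float icpX = 0, icpY = 0, ica = 0;   // ICPx, ICPy, ICA
    int   samples = 0;

    bool ready() const { return samples >= 5; }          // five calibration frames

    void addSample(const cv::Mat& capMask)
    {
        cv::Moments m = cv::moments(capMask, /*binaryImage=*/true);
        if (m.m00 <= 0) return;                          // cap not visible
        float cx   = static_cast<float>(m.m10 / m.m00);  // tip-blob center x
        float cy   = static_cast<float>(m.m01 / m.m00);  // tip-blob center y
        float area = static_cast<float>(m.m00);          // tip-blob area in pixels

        // Running means over the calibration frames.
        icpX = (icpX * samples + cx)   / (samples + 1);
        icpY = (icpY * samples + cy)   / (samples + 1);
        ica  = (ica  * samples + area) / (samples + 1);
        ++samples;
    }

    // Eq. (7): the finger is inside CZ while IDA does not exceed ICA.
    bool insideCZ(float ida) const { return ida <= ica; }
};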

3.3 Navigation

Navigation in a 3D VE moves the virtual camera towards or away from a look-at point. In the proposed system, forward movement beyond the CZ shifts the virtual camera towards the look-at point, as shown in Fig. 6. Along the look vector, moving the finger behind the CZ zooms out the virtual scene.

Since the scanned image frames are 2D, forward or backward hand movement along the z-axis is deduced from the variation in the area of the index-tip. The IDA of the index-tip is calculated on the fly for each thresholded frame and compared with the ICA. Beyond the forward CZ limit, the IDA increases by a factor k, see Fig. 7b.

$$\begin{aligned} IDA=\left( k \right) \left( {ICA} \right) \end{aligned}$$
(8)

where \(k>1\)

Fig. 8

First three speed sectors of forward navigation

Fig. 9

CZ for panning and navigation area over the look-vector

The possibility of unintentional increase/decrease is avoided by checking the dynamic positions of the index finger on the x-axis (IDx) and y-axis (IDy) against ICPx and ICPy, along with ICA and IDA. This is clear from the following pseudo-code, where a range of ten pixels is set to deduce navigation accurately.

$$\begin{aligned}&\hbox {if }(\hbox {IDA}>\hbox {ICA})\hbox { AND }\left( \hbox {IDx}\le \hbox {ICPx}+10\hbox { AND IDx}\ge \hbox {ICPx}-10 \right) \\&\quad \hbox {AND }\left( \hbox {IDy}\le \hbox {ICPy}+10\hbox { AND IDy}\ge \hbox {ICPy}-10 \right) \\&\qquad \qquad \qquad \textit{Forward Navigation} \\&\hbox {if }(\hbox {IDA}<\hbox {ICA})\hbox { AND }\left( \hbox {IDx}\le \hbox {ICPx}+10\hbox { AND IDx}\ge \hbox {ICPx}-10 \right) \\&\quad \hbox {AND }\left( \hbox {IDy}\le \hbox {ICPy}+10\hbox { AND IDy}\ge \hbox {ICPy}-10 \right) \\&\qquad \qquad \qquad \textit{Backward Navigation} \end{aligned}$$
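
A compact C++ rendering of this decision logic is sketched below; the enum and function names are assumptions, and the ten-pixel tolerance follows the pseudo-code above.

#include <cmath>

enum class NavAction { None, Forward, Backward };

// Sketch of the navigation decision: the finger must stay within a 10-pixel
// window around (ICPx, ICPy) so that depth (z-axis) motion is distinguished
// from panning gestures, and the IDA/ICA comparison selects the direction.
NavAction decideNavigation(float ida, float ica,
                           float idX, float idY,
                           float icpX, float icpY)
{
    const float tol = 10.0f;   // pixel tolerance from the pseudo-code
    bool centred = std::abs(idX - icpX) <= tol && std::abs(idY - icpY) <= tol;

    if (!centred)   return NavAction::None;      // handled as panning instead
    if (ida > ica)  return NavAction::Forward;   // finger moved towards the camera
    if (ida < ica)  return NavAction::Backward;  // finger moved away from the camera
    return NavAction::None;
}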

3.3.1 Speed control

Depending on the quality of the camera and the distance of the user's hand from it, a finite number of speed sectors Sn can be set, where

$$\begin{aligned} n=1,2,3,\ldots ,k \end{aligned}$$

Navigation speed in sector \(Sn+1\) is double the speed in Sn. A constant speed SPn is retained in a particular sector Sn until the index-tip enters the next sector \(Sn+1\), where SPn is calculated as,

$$\begin{aligned} SPn=\mathop \sum \limits _{k=2}^n SP_{\left( {k-1} \right) } \end{aligned}$$
(9)

To detect the entrance of the index-tip into a particular sector Sn, shown in Fig. 8, the following condition is checked on the fly:

$$\begin{aligned} IDA\ge \left( n \right) ICA \end{aligned}$$
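
The sector logic can be sketched in C++ as follows; it derives the sector index from the IDA/ICA ratio and applies the doubling rule stated above. The base speed value, the sector cap and the function name are assumptions for illustration.

#include <cmath>

// Sketch of the speed-sector logic: the index-tip is in sector Sn when
// IDA >= n * ICA, and the speed doubles from one sector to the next.
float navigationSpeed(float ida, float ica, int maxSectors = 4)
{
    const float baseSpeed = 0.02f;                       // assumed speed in sector S1
    int n = static_cast<int>(std::floor(ida / ica));     // largest n with IDA >= n*ICA
    if (n < 1) n = 1;
    if (n > maxSectors) n = maxSectors;
    return baseSpeed * static_cast<float>(1 << (n - 1)); // SPn doubles per sector
}
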
Fig. 10

The finger’s move over x-axis for right-panning

3.4 Panning

Panning translates the eye position and look-at point of the virtual camera horizontally or vertically. To prevent any possibility of disorientation and confusion, panning is enabled only inside the CZ, see Fig. 9. Panning along the x-axis is performed by horizontal hand movement, and along the y-axis by vertical movement. As in the camera-in-hand metaphor, the virtual camera follows the index finger or hand movement. Movement along the \(+ve\) x-axis shifts the x-coordinate of the camera on the \(+ve\) axis while the point of view (POV) moves towards the \(-ve\) x-axis accordingly. Similarly, vertical hand movement changes the y-coordinate of the virtual camera with the look-at point on the opposite side of the y-axis (Fig. 10).

Fig. 11

Virtual scene after left panning activation

If IP\(_{(x,y)}\) and FP\(_{(x,y)}\) represent the initial and final points of the index-tip in the panning area, then the horizontal change dx and vertical change dy are calculated as;

$$\begin{aligned} dx= & {} \sqrt{\left( {FP_x -{ IP}_x } \right) ^{2}} \end{aligned}$$
(10)
$$\begin{aligned} dy= & {} \sqrt{\left( {FP_y -{ IP}_y } \right) ^{2}} \end{aligned}$$
(11)

The following algorithm is followed for horizontal and vertical panning.

$$\begin{aligned}&\hbox {If } dx>dy\\&\quad \hbox {If }({ IP}\left( x \right)<{ FP}\left( x \right) )\\&\quad \quad \textit{Panning along} -ve~x\hbox {-axis}\\&\quad \hbox {If }({ IP}\left( x \right)>{ FP}\left( x \right) )\\&\quad \quad \textit{Panning along} +ve~x\hbox {-axis}\\&\hbox {If } dy>dx\\&\quad \hbox {If }({ IP}\left( y \right) <{ FP}\left( y \right) )\\&\quad \quad \textit{Panning along} -ve~y\hbox {-axis}\\&\quad \hbox {If }({ IP}\left( y \right) >{ FP}\left( y \right) )\\&\quad \quad \textit{Panning along} +ve~y\hbox {-axis} \end{aligned}$$
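
The same rules are sketched below in C++; the enum values name the axis and sign of the pan exactly as in the algorithm above, and the function name is an assumption.

#include <cmath>

enum class PanAxis { None, NegX, PosX, NegY, PosY };

// Sketch of the panning decision: the dominant axis of finger motion inside
// the CZ selects horizontal or vertical panning (Eqs. (10)-(11)), and the sign
// of the displacement selects the direction, mirroring the rules above.
PanAxis decidePanning(float ipX, float ipY, float fpX, float fpY)
{
    float dx = std::abs(fpX - ipX);   // Eq. (10)
    float dy = std::abs(fpY - ipY);   // Eq. (11)

    if (dx > dy) {
        if (ipX < fpX) return PanAxis::NegX;   // panning along -ve x-axis
        if (ipX > fpX) return PanAxis::PosX;   // panning along +ve x-axis
    } else if (dy > dx) {
        if (ipY < fpY) return PanAxis::NegY;   // panning along -ve y-axis
        if (ipY > fpY) return PanAxis::PosY;   // panning along +ve y-axis
    }
    return PanAxis::None;
}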

If Ix and Fx are the pixels representing the initial and final horizontal positions of the index-tip, then the virtual camera's x (Cx) and z (Cz) coordinates for panning are calculated as,

$$\begin{aligned}&dx=Fx-Ix \end{aligned}$$
(12)
$$\begin{aligned}&\theta =2*\sin ^{-1}\left( {{dx}/{Fy}} \right) \end{aligned}$$
(13)
$$\begin{aligned}&Cx=\sin \left( \theta \right) \end{aligned}$$
(14)
$$\begin{aligned}&Cz=\cos \left( \theta \right) \end{aligned}$$
(15)
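
A small C++ sketch of Eqs. (12)–(15) is given below; the normalisation term Fy of Eq. (13) is taken as a given parameter, and the clamp before asin is a safety addition not stated in the paper.

#include <algorithm>
#include <cmath>

// Sketch of Eqs. (12)-(15): the horizontal finger displacement is converted to
// an angle and the camera is placed on the unit circle around the look-at point.
void panCamera(float ix, float fx, float fy, float& camX, float& camZ)
{
    float dx    = fx - ix;                                   // Eq. (12)
    float ratio = std::max(-1.0f, std::min(1.0f, dx / fy));  // keep asin in range
    float theta = 2.0f * std::asin(ratio);                   // Eq. (13)
    camX = std::sin(theta);                                  // Eq. (14)
    camZ = std::cos(theta);                                  // Eq. (15)
}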

4 System implementation and evaluation

The system was implemented in Visual Studio 2015 on a Core i3 laptop with a 2.30 GHz processor and 4 GB RAM. The resolution of the built-in camera was set to 640 × 480. Tracing of the index finger cap, calculation of its area and detection of its horizontal and vertical coordinates were carried out by OpenCV routines at the back-end. The interactive front-end virtual scene was designed in OpenGL. The system remains active only while the index cap is visible. The user is kept informed of system activation by the text “Detected” displayed in the upper-center part of the scene. Similarly, during left or right panning the user is notified by the text “Turning” at the respective side of the scene, as shown in Fig. 11.

The system was tested by fourteen male participants aged between 25 and 40. Two trials were performed by each participant for each of the four pre-defined tasks. Before the actual trials, participants were introduced to the system, and each tester performed practice trials for both navigation and panning.

4.1 Testing environment

The 3D environment designed for the evaluation of the algorithm contained four routes, as shown in Fig. 12, where an avatar made of cubes represents the user's position in the environment. For easy recognition, the end point of the scene is marked by a board with the text “Stop”; each route leads to this Stop-board. To immerse users and give a clear perception of navigation and panning, the scene was populated with different 3D objects at different positions. For each new trial, users were asked to press the Enter key to reset the system.

  • Route-1: Straight pathway leading to Stop-board.

  • Route-2: Right-Straight-Left pathway.

  • Route-3: Left-Straight-Right pathway.

  • Route-4: Up-Straight-Down pathway giving a flying effect to navigate over the bandstand. To keep the scene uncluttered, this route was intentionally not rendered in the VE.

Fig. 12

2D model of the routes

4.2 Interaction tasks

Participants were asked to perform the following four interaction tasks in the designed 3D environment.

  • Task_1: Touching the Stop-board using Route-1 and then back to starting point following the same route.

  • Task_2: Touching the Stop-board using Route-2.

  • Task_3: Touching the Stop-board using Route-3.

  • Task_4: Touching the Stop-board using Route-4.

Navigation-In is the forward movement and Navigation-Out is the backward movement inside the VE. Left/right panning is turning, with a corresponding virtual camera shift, towards the respective direction. Task-1 tests Navigation-In and Navigation-Out. Task-2 and Task-3 evaluate Navigation-In, right panning and left panning. Task-4 assesses up panning, down panning and navigation. Missed detections or false detections of the system after posing the required gestures were counted as errors. With this setup, the overall accuracy rate for all 336 trials, as shown in Table 1, is 69.9%.

Table 1 Statistics of the evaluation

As shown in Fig. 13, the mean for navigation is comparatively higher than that for panning (Fig. 14). The obvious reason is the crossing of the camera's detection limits.

Fig. 13

Mean with standard deviation of Navigation

Fig. 14

Mean and standard deviation of Panning

A questionnaire measuring the four factors Ease of Use, Suitability in VE, Naturalism and Fatigue was presented to the users at the end of the evaluation session. The percentage of users' responses for the four factors is shown in Fig. 15.

Fig. 15

Participants’ responses about the proposed approach

The learning effect was measured from the error occurrence rate. The graph in Fig. 16 indicates that performance increases while the error rate decreases over subsequent trials.

Fig. 16

Graph showing learning effect of the system

Fig. 17

The required virtual scene

5 Applicability of the approach in interactive designing

Interactive design is, in essence, the design of interactive digital products or environments that offer an effective and expressive interface for interaction. A design artifact based on an interactive approach is not only friendlier and more natural than a traditional GUI [27] but can also assist users in analyzing complex data and functions [28]. The blending of VR technology with CAD promises such natural and multimodal interfaces; hence, VR-CAD is replacing the conventional CAD system [29]. Traditionally, haptic devices were used for interactive designing [30] and Virtual Assembly (VA) [31]; however, the cumbersome setup of wires restricts such systems to a controlled environment. LCM-based systems have been developed to avoid the hindrances of haptics, but the recognition accuracy of the LCM is good only within limited zones [32]. As only an intuitive interface with universal applicability is preferred [28], the NBIT project was enhanced to NBIT-SD (NBIT for Simple Designing) to examine the applicability of the proposed method in the realm of VR-CAD. In NBIT-SD, three interaction tasks, Selection, DeSelection and Translation, were introduced, with a Virtual Hand (VH) representing the finger's position in the VE. Selection and DeSelection were performed by hovering the VH over an object for a threshold hover-time (HT) of 3 s. To make translation of a selected object possible, panning in NBIT-SD was performed only when the VH moved beyond the boundaries of the interface window. Five skilled engineers (two AutoCAD experts and three software engineers) were invited to evaluate the system. Two trials (one pre-trial and one true trial) were performed by each engineer to design the scene shown in Fig. 17 by selecting, translating and deselecting the virtual objects presented in the NBIT-SD application, shown in Fig. 18.

Fig. 18

The NBIT-SD interface with virtual objects

Based on their subjective analysis, shown in Fig. 19, the approach can be enhanced to make it applicable in VR-CAD technology, particularly for semantic zooming [33] and the analysis of widely spanned complex 3D objects [27]. Most of the experts suggested that Selection and DeSelection should be performed by distinct gestures to make the approach applicable.

Fig. 19

Response of the experts about applicability of the approach in VR-CAD

6 Conclusion and future work

Navigation is often required in 3D interactive virtual spaces. The emergence of virtual and augmented reality applications in different fields necessitates a natural and simple way of navigating. With this contribution we have proposed a novel navigation technique that needs no extra device other than an ordinary camera and a piece of paper. Intuitive gestures of the index finger are used for panning and navigation to ensure naturalism. Experimental results show that the proposed approach achieves reliable recognition and accuracy rates. As the system needs only the conversion of scanned images to the HSV color space rather than costly feature extraction, little time is spent on computation. Furthermore, as a single finger is used for interaction, the possibility of occlusion is reduced. The proposed system is equally applicable in a wide spectrum of HCI areas including CAD, 3D gaming, engineering, medicine and simulation. The work also covers the smooth integration of image processing and virtual environments, which can lead to the design of more sophisticated virtual and augmented reality applications.

This research work is part of our broader aim of making interaction possible outside virtual reality labs. Although we have succeeded for navigation, rotation via single-finger gestures is yet to be covered; in the future, we intend to enhance the system for rotation as well. Furthermore, we plan to make the system usable in complex collaborative virtual environments.