1 Introduction

Non-photorealistic rendering (NPR) emerged in the twentieth century as a branch of image processing and computer graphics, as science and technology advanced. Today, NPR techniques are adopted in computer vision, visualization, and human–computer interaction. Non-photorealistic rendering techniques are used to produce abstracted and stylized images [39]. They include image abstraction, artistic stylization, line drawing, pen-and-ink illustration, HDR/aerial image analysis, engraving, image analogy, color enhancement, pencil drawing, dithering, stippling, halftoning, hatching, and mosaicking.

The history of non-photorealistic rendering dates back to 1963, when Johnson [1] developed 3D graphical shapes using a cathode-ray tube (CRT) and a light pen (stylus) via real-time bilateral communication between man and computer. The light pen contained a photodiode that sensed the light emitted by the CRT screen and was used to draw 3D shapes, which were then displayed on the CRT. In 1967, Appel [2] developed the machine rendering of solids using line drawing and a quantitative invisibility approach. These were the two major early attempts in the fields of rendering and computer graphics, and they remained the best methodology for producing 3D solids using line drawing until 1969, when Baecker [3] proposed picture-driven animation. Earlier methods used still images, which represented only static complex information; dynamic animation presents complex information through a sequence of images whose location, color, and scale change throughout the process. Whereas traditional animation pictures were drawn by hand and exhibited on screen, the work of Baecker [3] used the GENESYS animator system, which supported dynamic picture animation with sketch, erase, copy, rotation, and translation facilities, and was based on rhythm description and a dynamic hierarchy. These are the three major remarkable works done in the graphics and rendering domain before the 1990s. Later, a texture synthesis noise model was used to develop the Perlin model, which comprises multi-scale spot noise and thus triggered numerous investigations into procedural noise; Perlin models can be used for realistic texture synthesis [4].

Image abstraction and artistic stylization techniques are also called non-photorealistic rendering [5]. After the 1990s, non-photorealistic rendering gained new momentum and expanded its research area into visual communication and stroke-based rendering. NPR techniques focus on the abstraction and stylization of two-dimensional image content, which we refer to as Image Abstraction and Artistic Stylization (IA–AS). IA–AS is amongst the most frequently discussed topics in non-photorealistic rendering (NPR), computer graphics processing, and other fields of multimedia, and the concept has, indeed, revolutionized the field of NPR. Several NPR filtering techniques, such as anisotropic filtering, the Kuwahara filter, the guided filter, flow-based difference-of-Gaussians (FDoG) filtering, structure tensor flow, the flow-based bilateral filter (FBL), Z-buffering, photographic lighting techniques, shadow techniques, and vivid brush strokes, have further enhanced the field. IA–AS has thus become an almost indispensable component of any NPR and image processing system.

IA–AS addressed the NPR issues associated with early human–computer interaction (HCI), manual brush strokes, semi-automatic approaches, and automatic image abstraction and stylization. The main goal of IA–AS is to develop an effective automatic tool for image abstraction and stylization. IA–AS addresses issues in various interdisciplinary domains, such as defocusing, pose estimation, content-based image retrieval, energy optimization, stereo matching, super-resolution, compression, emotional and visual attention assessment, and effective learning and training in the education domain [39]. The early literature clearly showed that traditional approaches failed to provide promising results when exploring these interdisciplinary applications; in this regard, the best practice is to use NPR or IA–AS techniques.

The survey focused on identifying significant NPR-domain papers published between 1960 and 2017. Haeberli et al. [6] proposed the Paint by Numbers technique. Stylization and abstraction of photographs was first presented by DeCarlo et al. [7]. Winnemoller et al. [8] proposed a real-time video abstraction technique, and Kang et al. [9] proposed a flow-based image abstraction technique. Apart from the above-mentioned works, a few more noteworthy articles from Eurographics, SPIE, Springer, IEEE Transactions, MDPI, and Elsevier journals were also studied.

Non-photorealistic rendering (NPR) techniques are grouped into low-level and global-level approaches. Low-level approaches mainly use non-linear filters, segmentation techniques, edge detection techniques, and the placement of strokes over images, either by the user or by the computer system. However, they produce low-level image artifacts, and results depend on the size of the brush strokes, image color, brush orientation, and image background and foreground features. Global-level approaches principally include linear filters, color correction techniques, depth map calculation, and gradient approaches, and they produce a high-quality rendering effect for the user.

NPR image abstraction makes a major contribution to the computer graphics domain. The role of computer graphics is to create the best rendering and illustrative artifacts [10]. In many cases, computer graphics adopts traditional manual interaction techniques. Manual interaction techniques are not suitable for handling HDR images, as these images have high illumination and complex backgrounds, and computer graphics has failed to produce the best lighting and rendering illustration for HDR images through traditional/manual approaches. This drawback can be overcome using NPR filtering techniques.

Since 2013, the NPR research domain has been tending towards graphics processing unit (GPU)-based IA–AS. For faster processing of images and video, the GPU is the most sought-after device, ahead of the traditional CPU.

1.1 Evolution of NPR interactive system to NPR IA–AS

Figure 1 illustrates the evolution of NPR, covering interactive systems, brush stroke approaches, semi-automatic approaches, and fully automatic abstraction and stylization techniques. The NPR approach proposed here is motivated by the work “State of the ‘Art’: a taxonomy of artistic stylization techniques for images and video” [5], where a chronological hierarchy from 1980 to 2010 was demonstrated, although that work was limited to a few samples.

Fig. 1 Evolution of the IA–AS system

Whereas the taxonomy of artistic stylization techniques for images and video [5] was confined to the period between 1980 and 2010 and involved a few samples, the proposed work examines the work between 1963 and 2017, taking more samples of greater complexity.

We present a more precise evolution timeline from 1963 to 2017, which is significant for work related to the IA–AS framework. In the early 1960s, 3D drawings were simply called graphics; from the 1990s onwards, the term non-photorealistic rendering was used widely in the computer graphics domain; and from 2002 onwards, the terms image abstraction and stylization were widely used in the image processing and visualization domains. Figure 1 also presents significant journal publications on a year-wise basis.

2 Classifications of NPR techniques

NPR techniques have been classified into the following categories, based on the works entitled “State of the ‘Art’: a taxonomy of artistic stylization techniques for images and video” [5], stylization and abstraction of photographs [7], and painterly rendering techniques [11].

  1. Stroke-based rendering.

  2. Image analogy.

  3. Region-based techniques.

  4. Image processing and filtering techniques.

The above four categories include linear filtering and non-linear filtering techniques.

However, a few other approaches, viz., neural style transfer, segmentation, and feature-based filtering approaches, which came into existence after 2013, have also been included in the present work. An overall picture highlighting the classification of NPR techniques is presented in Fig. 2.

Fig. 2 Classification of NPR techniques

The survey identified the different NPR approaches involving mosaic rendering, texture transfer, color correction, stippling, pencil drawing, halftoning, motion video rendering, local and global brush stroke approaches, and image abstraction and stylization.

The present work proposes benchmark guidelines, a data set, and image statistical property assessment techniques to assess images in terms of color [153], image complexity [154], contrast [155], sharpness [156], edge strength [157], and image noise [158].

The present work adopts quality assessment techniques such as SSIM, PSNR, FoM, and MSE, and recommends them to others for assessing algorithm performance. The paper concludes by highlighting NPR applications and the associated challenges.
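
As an illustration of how such full-reference metrics are typically computed, the following minimal sketch evaluates MSE, PSNR, and SSIM between an original image and its stylized counterpart using scikit-image; the file names are placeholders, and both images are assumed to be grayscale and of equal size.

```python
import numpy as np
from skimage import io, img_as_float
from skimage.metrics import structural_similarity

# Placeholder file names; both images must have the same shape.
original = img_as_float(io.imread("original.png", as_gray=True))
stylized = img_as_float(io.imread("stylized.png", as_gray=True))

mse = np.mean((original - stylized) ** 2)                        # mean squared error
psnr = 10.0 * np.log10(1.0 / mse) if mse > 0 else float("inf")   # data range is 1.0 for float images
ssim = structural_similarity(original, stylized, data_range=1.0)

print(f"MSE:  {mse:.5f}")
print(f"PSNR: {psnr:.2f} dB")
print(f"SSIM: {ssim:.4f}")
```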

2.1 Stroke-based rendering (SBR)

IA–AS non-photorealistic rendering mainly involves stroke-based rendering (SBR) algorithms [5, 11]. SBR algorithms include brush strokes of various sizes, colors, textures, and orientations. The brush strokes that are part of an SBR algorithm allow the user to create painting impressions such as line drawing, stippling, mosaicing, halftoning, dithering, image abstraction, and stylization. In real-time interaction, SBR employs small brush strokes to modify small areas, while large brushes are employed to modify background areas. While brush strokes of different sizes render a variety of outputs, it is the recursive application of suitable brush strokes that gives the desired surface-level detail and renders stunning images.

The algorithms proposed for SBR overcome the difficulty of manually selecting the proper style, color, orientation, and position of strokes, a task whose requisite skill varies from person to person and leads to divergent output. Viewed from this context, the development of SBR algorithms is a step towards making image rendering somewhat skill independent.

Mosaicking is a process within the SBR technique, which employs sections of stones, colored glass, and colored grains to decorate works of art. In some cases, a set of small portrait images is embossed over a large original portrait image, making it more attractive. Tonal illustration is another SBR process, assisting the generation of patterns with different degrees of solidity or shade using dots of varying size; this process is called stippling. Stippling traditionally calls for a skilled artist; the SBR technique enables users to create such special effects, which previously had to be emulated by hand.

Hatching is another tonal illustration process, which provides shading effects by adding light and dark strokes to images to create an abstract composition. Tonal illustration mainly includes halftoning and dithering, and has also been used in HDR image analysis.

  1. Halftoning is a reprographic technique that reproduces continuous-tone images using dots of varying size and spacing, producing gradient-like output. The technique has prevailed in the printing industry for more than 100 years and displays the information using black-and-white dots.

  2. Dithering is a technique used to create the illusion of greater color depth in an image with a limited color palette. Dithering is performed to decrease the number of colors and to provide an artistic effect to an image without disturbing the human eye; it converts the original image into a cartoon-like image (a small error-diffusion sketch follows this list).
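
As a concrete, hedged example of the dithering and halftoning family discussed above, the sketch below implements Floyd–Steinberg error diffusion (a scheme also mentioned later in connection with [52]); the input file name and the 0.5 threshold are placeholder assumptions, and the image is assumed to be grayscale.

```python
import numpy as np
from skimage import io, img_as_float

def floyd_steinberg(gray):
    """Binarize a grayscale image by diffusing the quantization error
    to the right and lower neighbors (Floyd-Steinberg weights)."""
    img = img_as_float(gray).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0      # threshold to black or white
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return img

halftone = floyd_steinberg(io.imread("input.png", as_gray=True))  # placeholder file
io.imsave("halftone.png", (halftone * 255).astype(np.uint8))
```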

The main goal of SBR algorithms is to provide a procedural tool that automates parts of the image creation process, not to replace the artist [11]. SBR is further classified into global and local brush stroke approaches, and the local brush approach is again classified into interactive and automatic local rendering approaches. The local brush stroke approach mainly deals with direct interaction with the pixels of the given input image. In this way, the SBR technique attempts to minimize an objective function and thereby render images effectively.

The following narration highlights the gist of work on SBR techniques. Gooch et al. [12] proposed a geograftals editing system, offering facilities for generating geograftals, editing geograftals, shading, and line drawing. The framework renders surface models of complex scenes in an assortment of artistic styles using an interactively editable particle system called geograftals. Salesin et al. [13] proposed a user–computer interactive system. This approach provides the user with a facility to generate the desired texture using a mouse interface. Collections of texture strokes are arranged in different patterns to generate the best texture and tone effect. The user recursively applies curved brush strokes over an image to produce a painting with the desired tone, while the computer system applies the strokes individually to produce the rendered image. The mouse interface, which provides the brush strokes, reduces the burden on the user and is also suitable for creating mosaics and line drawings. Hertzmann [14] proposed curved brush strokes of multiple sizes based on cubic B-splines, which create an image with a hand-painted appearance from a photograph while avoiding aliasing artifacts.
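
To make the idea of multi-size, coarse-to-fine stroke placement more tangible, the following is a greatly simplified, hedged sketch inspired by the layered approach in [14]: circular strokes stand in for the cubic B-spline strokes of the original method, and the file name, brush radii, and error threshold are illustrative assumptions rather than published parameters.

```python
import numpy as np
import cv2

def paint_layers(src, radii=(16, 8, 4, 2), threshold=25.0):
    """Simplified layered painterly rendering: for each brush size, blur the
    source to the brush scale and place circular strokes only where the
    canvas still differs noticeably from that blurred reference."""
    canvas = np.full_like(src, 255)                        # start from a white canvas
    for r in radii:                                        # coarse to fine
        ref = cv2.GaussianBlur(src, (0, 0), sigmaX=r)
        diff = np.sqrt(np.sum((canvas.astype(np.float32)
                               - ref.astype(np.float32)) ** 2, axis=2))
        step = max(1, r)                                   # stroke spacing follows brush size
        for y in range(0, src.shape[0], step):
            for x in range(0, src.shape[1], step):
                patch = diff[y:y + step, x:x + step]
                if patch.mean() > threshold:               # paint only where error is large
                    dy, dx = np.unravel_index(np.argmax(patch), patch.shape)
                    cy, cx = y + dy, x + dx
                    color = tuple(int(c) for c in ref[cy, cx])
                    cv2.circle(canvas, (cx, cy), r, color, -1)
    return canvas

cv2.imwrite("painted.png", paint_layers(cv2.imread("input.jpg")))  # placeholder files
```

Each layer thus refines only the regions that coarser brushes could not capture, which is what gives layered stroke-based rendering its hand-painted look.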

Haeberli [6] proposed the paint-by-numbers system, based on a color and stroke-orientation methodology for image abstraction. The approach allows the user to choose the stroke order, brush shape, color sample, and orientation in the source image. RGB noise is added to the output obtained at this stage to provide the abstracted and stylized image (Fig. 3).

Fig. 3 a Paul Haeberli approach. b Pseudo-random strokes (Haggerty) compared to the Haeberli approach; the Haeberli approach produces better output without losing significant information (courtesy: Haeberli [6])

Salesin et al. [15] proposed orientable textures for image-based pen-and-ink illustration of grayscale images. Their work presents an interactive system that can transfer texture and apply brush strokes over a grayscale image. However, the work is limited to gray images with strong features and is not suitable for color images (Fig. 4).

Fig. 4 a Result of orientable textures for image-based pen-and-ink illustration. b Enhanced eye portion (courtesy: Salisbury et al. [15])

Gooch et al. [16] proposed a system for caricature illustrations of human faces based on an image-warping technique. It highlights and exaggerates representative facial features, and is useful for entertainment, education, low-bandwidth telecommunications, and psychology studies.

Stroke-based rendering (SBR) techniques and a WIMP-interface-based interactive stroke-based NPR system using hand postures on large displays were proposed by Jens Grubert et al. [17]. The approach provides a good editing facility during the rendering process and permits placing new strokes over an image with different orientations.

Santella et al. [19] proposed abstracted painterly renderings using eye-tracking data; it is a human perceptual approach in which the user observes the images using an eye tracker (the interpretation of sensory information to represent and understand the presented information). Interactive sketch generation based on B-splines and the live-wire contour paradigm was proposed by Hyung W. Kang et al. [20]; the B-spline brush stroke model allows users to interact with strokes, giving the user an increased level of detail. Hertzmann and Peter [18] proposed the AniPaint system, which mainly uses a WYSIWYG interface and is helpful for animating individual key frames from a given input video into an animated video sequence.

Zhao and Zhu [21] presented abstract painting with interactive control of perceptual entropy. The proposed system generates abstract art from photographs. Perceptual entropy is a numerical measure of the level of perceptual ambiguity. The work segments an image into regions corresponding to different objects in an interactive manner and represents them in a hierarchical parse tree. Defined statistics measure the levels of perceptual ambiguity, named perceptual entropy, and algorithms are developed to compute the entropy for images and predict their most likely perceptual paths. The method is cumbersome and requires considerable time to produce abstract art from photographic images. When images contain very complex information, the results are difficult to assess, as human perceptual ability is far more diverse than perceptual entropy.

Semmo et al. [22] proposed a user-interactive oil paint filtering system for image abstraction and stylization. In the first step, the work involves color correction and quantization to preserve the global color features; in the next step, non-linear filtering and an adaptive structure tensor flow are applied to preserve the dominant structure. The system gives full control over the visual output through a well-furnished parameter set for color correction. Apart from the above-mentioned works, which may be considered significant local SBR approaches, Matthew Kaplan et al. [23] proposed an interactive watercolor rendering system that dealt with abstraction techniques such as color region segmentation, cartoon shading, and morphology, and with watercolor effects such as paper texture, edge darkening, and pigment density variation.

In addition, Sander et al. [24] presented another approach involving interactive painterly stylization of images, videos, and 3D animations. In this approach, strokes are placed over an image based on local neighborhood information and an optical flow technique. The work converts images, video, and 3D objects into artistic stylizations rapidly with the help of a graphics processing unit (GPU).

An interactive tensor field design and visualization on surfaces was proposed by Eugene Zhang et al. [25]. The very first approach to automatic brush rendering was proposed by Haggerty [26]; it is an important sub-technique of stroke-based rendering (SBR). An algorithm for automatic painterly rendering based on local source image approximation was proposed by Yasushi Yamaguchi and Michio [27]. Optimization of paintbrush rendering of images by dynamic MCMC methods was proposed by Tamás Szirányi et al. [30]. Automatic rendering work for painting images and video with an impressionist effect was done by Peter Litwinowicz [36]. This work converts ordinary images and video into (animated) hand-painted imagery. The work involves generating an intensity map during the drawing process; the intensity map is then blurred using a Gaussian filter, and gradient detection and edge-preserving techniques are applied. Finally, brush strokes are applied randomly to achieve the impressionist effect.

Automatic painterly rendering using image salience was proposed by Collomosse [28] in 2002. Image salience and gradient techniques were used to achieve the painterly rendered image. However, the work lacked cohesive results for all types of images, because the system could not place brush strokes precisely; the placement of strokes over the image canvas affected the image background and thereby the important foreground content.

Empathic painting was proposed by Shugrina et al. [29], based on applied computer vision involving the facial action coding scheme (FACS) and EDISON segmentation. The work allows users to interact in real time by mounting a camera on top of the computer monitor while using the painting tool simultaneously. Segmented regions are encoded using Freeman (chain) codes. By combining the real-time and source image results, a painterly rendered image is obtained. However, due to varying lighting exposure, visually pleasing results could not always be obtained.

Tree-structured vector quantization-based fast texture synthesis was proposed by Wei [31]. The work presents an effective model for realistic texture synthesis based on a Markov random field texture model.

An L-shaped neighborhood algorithm and coherent synthesis approach for fast texture transfer were proposed by Ashikhmin [32] in 2003. The work iteratively applies the L-shaped algorithm to synthesize matched texture and is suitable for image enhancement and style transfer.

Paint by relaxation was proposed by Hertzmann [33], involving an energy function and the placement of brush strokes on the image canvas. A stroke relaxation procedure is applied by adopting snake techniques: new strokes are created, idle strokes are reactivated, strokes are recolored, and strokes are enlarged to produce the rendered images.

Wen et al. [34] proposed a color sketch generation system based on the mean-shift algorithm as a preprocessing technique. In the subsequent steps, a luminance algorithm is applied for boundary editing, and a low-pass FFT is applied for boundary smoothing and shrinking. Finally, a color shift operation is performed by taking the average color of the input image.
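
For readers unfamiliar with this preprocessing step, the fragment below shows the kind of mean-shift region flattening such a pipeline might begin with; it is only a hedged illustration using OpenCV's pyramid mean-shift filter, and the file name and window parameters are placeholders rather than the settings used in [34].

```python
import cv2

# Placeholder input; sp is the spatial window radius, sr the color window radius.
img = cv2.imread("photo.jpg")
flattened = cv2.pyrMeanShiftFiltering(img, sp=15, sr=30)
cv2.imwrite("flattened.png", flattened)
```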

Automatic stained glass rendering was proposed by Setlur et al. [35] in 2006. The work merges and simplifies the segmented regions of the image by applying segmentation and region simplification techniques. A database of glass swatch images is used in tandem with the original image to find the optimal match in terms of color and texture. The swatch images are mapped to the original image, producing a 3D rendering effect.

Tresset and Leymarie [37] presented an automatic generative portrait sketching system based on shadow techniques such as the medial axis, segmentation, and the AIKON tool. Another automatic, emotionally aware portrait painting system was developed by Colton et al. [38] based on short-duration video clips. The work recognizes facial features in video clips and passes the extracted features to the NPR system. The system is mainly responsible for segmenting the image, followed by rendering processes such as rendering the segment layer, rendering curves, and rendering the painted layer. The approach can effectively express emotions such as anger, disgust, fear, happiness, sadness, and surprise.

Mould et al. [39] studied emotional response and visual attention to non-photorealistic images produced by applying non-photorealistic rendering algorithms. A subjective study of emotion and visual attention was conducted based on five NPR filtering techniques (Haeberli 1990 [6], Kang 2007 [131], Orzan 2007 [164], Mingtian Zhao 2010 [114], and Secord 2002 [165]) and two blurring techniques, uniform blur and salience-based blur. The study involved 42 participants, who assessed the results obtained from the NPR algorithms. User visual feedback was also assessed using an eye tracker, which records gaze data for further emotional analysis, and different ANOVA tests were conducted to evaluate the emotional response and visual attention of the users.

An automatic genetic painting system was proposed by Collomosse and Hall [40] based on feature-preserving techniques, classification algorithms, and a genetic algorithm (GA). The genetic algorithm is applied iteratively to synthesize salient detail.

Collomosse et al. [41] proposed temporally coherent artistic animations from video based on segmentation, region matching, and smoothing techniques. In this work, the EDISON segmentation technique [166] was applied, and smoothing was performed using piecewise-bicubic interpolation. The work is not robust to occlusion, and the per-frame snake relaxation procedure can cause the animation to sparkle, which may prove unsuitable.

Stippling techniques for artistic screening were proposed by Ostromoukhov et al. [42] based on predefined dot contours and intensity levels; the intensity levels are obtained via interpolation, and a mapping transformation is applied to obtain various visually pleasing effects. A multi-level color halftoning algorithm was proposed by Ostromoukhov et al. [43] based on elliptic dots and error diffusion, adopted for color conversion to produce multi-level color halftoning. A multi-color and artistic dithering technique was proposed by Ostromoukhov [44] in 1999 based on a multi-color dithering algorithm, extending the previously proposed works [42, 43].

Digital facial engraving was proposed by V. Ostromoukhov in 1999 [45]: an engraving layer is developed and placed above the real photograph, and separate layers are merged according to merging rules and a range shift mask. The facial engraving technique was first introduced in the fifteenth century, when engraving was used in early book printing. Engraving techniques were used for different purposes such as hollow painting, the silk screen process, letterpress, and lithography, and engraving was extremely popular in the seventeenth and eighteenth centuries. This work resembles copperplate engraving (Fig. 5).

Fig. 5 Example of facial engraving (courtesy: Victor Ostromoukhov [45])

Ulichney [46] presented a survey paper on halftoning. The literature addresses different halftoning techniques such as white noise, recursive tessellation, the classical screen, blue noise, neighborhood processes, and point processes. It also covers multi-level dithering, ordered dithering, error diffusion, and bitonal dithering to achieve better rendering output.

The importance of halftoning in industry was discussed in importance-driven halftoning [47] by Streit in 2001. That work was based on a Bayesian pyramid approach to produce pleasing output and to protect image features. Ostromoukhov et al. [48] proposed the rotated dispersed dither technique, a new technique for digital halftoning based on a discrete one-to-one rotation of Bayer's dispersed-dot dither array. The discrete rotation has the significant effect of rotating and splitting the important part of the frequency impulses present in Bayer's halftone array into many low-amplitude distributed impulses. Rotated dispersed dither is most useful for inkjet printers. Halftoning with an image-based dither screen was proposed by Verevka et al. [49] based on ordered dithering and halftone techniques; the work effectively produces artistic and photorealistic rendering output.

Georges Seurat (1859–1891) introduced pointillism. The main aim of the pointillists was for human vision to blend the dots into colors, avoiding the muddiness associated with other post-impressionist styles of art work.

Hausner [50] proposed pointillist halftoning based on Lloyd's Voronoi approach, color quantization, a traversal order of the graph of pixels, and error diffusion techniques. These techniques were applied to suppress aliasing effects and are most useful for color printing applications.
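
The toy sketch below conveys the Lloyd-style relaxation idea underlying many Voronoi-based stippling and pointillist methods: dots are repeatedly moved to the darkness-weighted centroids of their Voronoi regions so that they concentrate in dark areas. It is an assumption-laden simplification (brute-force pixel assignment, placeholder dot and iteration counts), not Hausner's actual pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage import io

def stipple(gray, n_dots=4000, iterations=20, seed=0):
    """Weighted Lloyd relaxation: each dot moves to the darkness-weighted
    centroid of the pixels that are nearest to it."""
    rng = np.random.default_rng(seed)
    h, w = gray.shape
    density = 1.0 - gray                                  # dark pixels attract dots
    pts = rng.uniform([0, 0], [h, w], size=(n_dots, 2))
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    wts = density.ravel()
    for _ in range(iterations):
        owner = cKDTree(pts).query(pix)[1]                # nearest dot for every pixel
        for k in range(n_dots):
            mask = owner == k
            if wts[mask].sum() > 1e-9:
                pts[k] = np.average(pix[mask], axis=0, weights=wts[mask])
    return pts                                            # (row, col) stipple positions

dots = stipple(io.imread("input.png", as_gray=True))      # placeholder file
```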

Non-photorealistic rendering using an adaptive halftoning technique was proposed by Streit et al. [51]. The work addresses the difference between non-photorealistic rendering and halftoning, and extends the multi-level color halftoning work [43].

The Floyd–Steinberg error diffusion technique and a contrast-aware mask for contrast-aware halftoning were presented by Mould et al. [52]. The work preserves image structure and tone and enhances contrast. Sinusoidal and triangle waveform-based amplitude-modulated line-based halftoning was proposed by Ahmed [53]; the approach covers applications such as stylized rendering and halftoning on line-based devices, and is useful for technical education.

Dongyeon et al. [54] proposed automatic feature-guided image stippling based on tone mapping and line drawing techniques to reduce color and produce line drawings. Feature extraction and dot optimization techniques are used in subsequent stages to preserve texture and distribute dots uniformly over the image canvas (Fig. 6). A non-linear error diffusion and contrast-aware halftoning adjustment technique for fast automatic structure-preserving stippling was presented by David Mould et al. [55]; the work effectively preserves high-quality texture in the rendered image (Fig. 7).

Fig. 6 a Input image. b Result of feature-guided image stippling (courtesy: Dongyeon et al. [54])

Fig. 7 a Input. b Result of fast automatic structure-preserving stippling (courtesy: David Mould et al. [55])

Pang et al. [56] presented structure-aware halftoning. The work adopts an objective function that conserves both tone and sensitive texture information. For result analysis, the mean structural similarity index measure (MSSIM) was used to quantify image degradation, and PSNR was applied to quantify tonal consistency distortion.

Lee et al. [57] proposed a structure grid for directional stippling based on a line map and a structure grid technique to protect smoothly deformed local features. Finally, a primitive rendering algorithm is applied for the best stippling. The method fails to render images properly when they contain complex patterns (Fig. 8).

Fig. 8 a Input image. b Directional stippling (courtesy: Seungyong Lee et al. [57])

Another indispensable category in the NPR taxonomy is mosaicking. A few noteworthy works related to mosaicking are presented herewith.

Photo mosaicking forms an advanced mosaicking system. In the mosaicking process, at least two pictures are used: the target photo is divided into tiled zones, each of which is replaced with another photograph that matches the target photo. In most cases, when the viewer looks at low magnification, the individual tiles blend into the essential picture, while close examination reveals that the photo is actually composed of a large number of small pictures.
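
A minimal sketch of this tile-replacement idea follows: the target is divided into square cells, and each cell is replaced by the library tile whose mean color is closest. Real photo-mosaic systems use far richer matching (for example the antipole-tree search mentioned below); the cell size, tile list, and file names here are placeholders.

```python
import numpy as np
import cv2

def photo_mosaic(target, tiles, cell=32):
    """Toy photo mosaic: swap each cell for the tile with the closest mean color."""
    tiles = [cv2.resize(t, (cell, cell)) for t in tiles]
    tile_means = np.array([t.reshape(-1, 3).mean(axis=0) for t in tiles])
    out = target.copy()
    h, w = target.shape[:2]
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            mean = target[y:y + cell, x:x + cell].reshape(-1, 3).mean(axis=0)
            best = np.argmin(np.sum((tile_means - mean) ** 2, axis=1))
            out[y:y + cell, x:x + cell] = tiles[best]
    return out

# Usage sketch (placeholder file names):
# tiles = [cv2.imread(p) for p in ["t1.jpg", "t2.jpg", "t3.jpg"]]
# cv2.imwrite("mosaic.png", photo_mosaic(cv2.imread("portrait.jpg"), tiles))
```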

In 2001, Hausner [58] proposed simulating decorative mosaics based on the centroidal Voronoi diagram (CVD) technique. The CVD is used to position tiles over an image to synthesize the decorative mosaic; the idea of the CVD is based on Lloyd's algorithm. A Z-buffering technique is applied along with the CVD to improve the decorative mosaic result. Elber and Wolberg [59] enhanced the simulating decorative mosaics approach [58] by adopting a tile-enhanced Z-buffering technique. Jigsaw Image Mosaics (JIM) were proposed by Kim and Pellacini [60]; they use arbitrary containers and arbitrary tile shapes. The work employs an energy-minimizing algorithm and the simulated decorative mosaics algorithm, and modifies the features of photo mosaics to produce jigsaw mosaics. Thus, a user can effectively control the result by varying the energy weights in the energy algorithm.

Blasi et al. [61] proposed fast techniques for mosaic rendering. The work consists of three techniques (artificial mosaic, photo mosaic, and puzzle image mosaic) for changing a raster input picture into high-quality mosaics. The computerized/artificial mosaic applies a directional guideline, a distance transform matrix, a gradient matrix, and a level-line matrix algorithm sequentially to obtain a pleasant result. The photo mosaic is based on an antipole tree data structure, feature vector techniques, and a least-separation matching algorithm.

Yet another technique, namely the puzzle image mosaic, is based on the directional guideline algorithm, adopted to preserve the dominant edges and boundaries of the image. A morphological operation is applied to the directional guideline output to ensure a good aesthetic result, and the Voronoi algorithm is applied for clustering. Finally, a merging operation combines the morphological result and the Voronoi result to obtain the best mosaic rendering.

Nishita et al. [62] proposed a method for creating mosaic images using Voronoi diagrams. In similar work, Faustino and Luiz [63] presented simple adaptive mosaic effects. Mosaic effects transform an existing picture, such as an old mosaic or a stained glass window, to give it an artistic touch. These effects are obtained by decomposing the original picture into cells (also called tiles) and painting those cells with a color that approximates the color distribution inside the corresponding cell of the original picture. For effective automatic mosaic creation, a centroidal Voronoi diagram with a density function that emphasizes image features was used (Fig. 9).

Fig. 9 a Original image. b Simple adaptive mosaic effect (courtesy: Faustino and Luiz [63])

Mould [64] presented a stained glass image filter. The work automatically gives a stained glass effect to a given picture. The picture is first segmented automatically using the EDISON segmentation technique. Using morphological operations, the segmented regions are smoothed, and care is taken to filter out undesired shapes. Heraldic tincture techniques provide a set of colors, which is used to shade the image areas in place of the earlier medieval stained glass palette. The strategy works extremely well for pictures that have clear outlines and creates impressive stained glass versions. Finally, a displacement-mapped plane approach is applied to create the stylized image.

Orchard and Kaplan [65] proposed cut-out image mosaics. The fast Fourier transform (FFT) was used to compute a fine-grained similarity matrix to produce the mosaic results, and a matching algorithm was further applied to improve them.

Choi et al. [66] proposed template-based image mosaics. They used arbitrarily sized tiles (of different shapes and rotations) rather than circular or square tiles, implemented a mask technique, and adopted a stacking technique to represent the fine detail of a source image automatically. The process has many steps for mosaic generation and consumes more time than traditional mosaic generation [58,59,60,61].

Modified motion segmentation for animated classic mosaics from video was proposed by Liu and Veksler [67] in 2009. It was the first attempt to make animated classic mosaics from video. The work effectively gives an animated mosaic effect to a real-video stream, and the proposed algorithm handles occlusion and disocclusion with minimal user interaction.

Surface mosaic synthesis with irregular tiles was proposed by Wang et al. [68]. It is among the most recent and effective works in the domain of NPR and image vision. They developed continuous and combinatorial optimization techniques to improve tile arrangement. The continuous optimization iteratively partitions the base surface into estimated Voronoi regions of the tiles (a tight fit of tiles); combinatorial optimization is then carried out to increase surface coverage and bring in more diversified tiles. Tile permutation and hole filling for a clean fit of tiles, together with a shape matching technique, are applied to obtain pleasing mosaic results.

2.2 Image analogy

Image analogy (color and texture transfer-based rendering) is another NPR technique which facilitates the transfer of color and texture from a source image to a target image; transferring color and texture in this way is referred to as image analogy.

According to Chris Eliasmith, image analogy is defined as “a systematic comparison between structures that uses properties and relations between objects of a source structure and properties and relations of a target structure” (Chris Eliasmith, Dictionary of Philosophy of Mind, http://artsci.wustl.edu/philos/MindDict/). Color or texture transfer-based rendering, or image analogy, was pioneered by Hertzmann [69]. The image analogy operation takes three input images: an unfiltered source image A, its filtered counterpart A′, and an unfiltered target image B. The filter relating A to A′ is applied to the target image B to create the analogous filtered image B′, written A : A′ :: B : B′. By supplying different types of images as input, the image analogies framework can be used to learn filters for different types of applications, as follows.

  1. Learning traditional image filters: image analogies can reproduce, and improve on, effects that would otherwise be achieved with simple image processing filters such as the blur and emboss filters in Adobe Photoshop.

  2. Image analogies can be used for super-resolution, effectively hallucinating plausible detail in low-resolution images.

  3. Image analogy is used for high-quality texture synthesis, texture transfer, and texture-by-numbers. Texture transfer can be accomplished through heuristics-based algorithms, analogy-based style transfer, or neural style transfer.

In the following paragraphs, important literature on image analogy is highlighted.

Hertzmann et al. proposed image analogies [69] based on a Gaussian multi-scale representation combined with vector features. For matching, approximate nearest-neighbor (ANN) search and tree-structured vector quantization were used to produce good image analogy output. At present, however, neural network approaches to texture and style transfer are the most sought-after techniques; this emerging field effectively transfers style and texture onto a target image and overcomes the drawbacks of traditional texture and color analogy.
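
To give a flavor of the A : A′ :: B : B′ matching at the heart of the framework, here is a heavily simplified, single-scale sketch: for every pixel of B, the A-neighborhood that best matches its B-neighborhood is found (a KD-tree stands in for the ANN search), and the corresponding A′ pixel is copied. The real framework is multi-scale, uses richer features, and adds a coherence term; the file names, patch size, and grayscale restriction are assumptions, and small images are advisable because the search is expensive.

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage import io, img_as_float
from skimage.util import view_as_windows

def toy_analogy(A, Ap, B, patch=5):
    """Single-scale, coherence-free analogy: copy the A' pixel whose
    A-neighborhood best matches each B-neighborhood."""
    pad = patch // 2
    feats_A = view_as_windows(np.pad(A, pad, mode="edge"),
                              (patch, patch)).reshape(-1, patch * patch)
    feats_B = view_as_windows(np.pad(B, pad, mode="edge"),
                              (patch, patch)).reshape(-1, patch * patch)
    idx = cKDTree(feats_A).query(feats_B)[1]      # nearest-neighborhood match
    return Ap.ravel()[idx].reshape(B.shape)

A  = img_as_float(io.imread("A.png",  as_gray=True))   # unfiltered source (placeholder)
Ap = img_as_float(io.imread("Ap.png", as_gray=True))   # filtered source A'
B  = img_as_float(io.imread("B.png",  as_gray=True))   # unfiltered target
Bp = toy_analogy(A, Ap, B)                              # synthesized B'
```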

Salesin et al. [70] proposed computer-generated watercolor based on the Kubelka–Munk compositing model for simulating the optical effect of superimposed glazes. The work integrates three different applications, namely an interactive watercolor paint system, automatic image watercolorization, and the rendering of 3D scenes, in one framework, which would otherwise have to be done separately.

The survey of color mapping and its applications by Reinhard et al. [71] highlights classification, color transferring, and mapping techniques. Color mapping has received a lot of attention in the computer vision, image processing, and computer graphics communities. In this survey paper, the authors vividly highlight the industrial applications and challenges in the domain of color mapping, which renders effective shapes from HDR or LDR images.

Gooch et al. [72] proposed color transfer between images. The work manipulates RGB images and proposes a sensible method for converting RGB signals to Ruderman's perception-based color space lαβ [73]. This conversion effectively transfers feasible colors between images.
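
A hedged sketch of this statistics-matching idea follows: each channel of the source is shifted and scaled so that its mean and standard deviation match those of the reference in a decorrelated color space. CIELAB is used here as a convenient stand-in for Ruderman's lαβ space, and the file names are placeholders.

```python
import numpy as np
import cv2

def transfer_color(source, reference):
    """Match per-channel mean and standard deviation of the source to the
    reference in a decorrelated color space (CIELAB as a stand-in for lab)."""
    src = cv2.cvtColor(source, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(reference, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mu, r_sd = ref[..., c].mean(), ref[..., c].std()
        src[..., c] = (src[..., c] - s_mu) * (r_sd / s_sd) + r_mu
    out = np.clip(src, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)

result = transfer_color(cv2.imread("source.jpg"), cv2.imread("reference.jpg"))
cv2.imwrite("recolored.jpg", result)
```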

Sykora et al. [74] proposed unsupervised colorization of black-and-white cartoons by adopting an outline detection technique, used to extract the boundary outline. A color-by-example or image analogy technique, namely unsupervised image segmentation, is used to separate the foreground homogeneous regions from the textured background regions; the segmentation approach effectively preserves cartoon outlines and edges. A color prediction algorithm transfers color from the original colored regions to the black-and-white target image. The work is especially suitable for cartoons digitized from classical celluloid films (Fig. 10).

Fig. 10 Color has been applied to the grayscale image in the middle, without user intervention, using the left color image as an example (image courtesy: Komrz, Universal Production Partners & Digital Media Production [74])

Neumann and Neumann [75] proposed color style transfer techniques using hue, lightness, and saturation histogram matching, based on 3D histogram matching and a sequential chain of conditional probability density functions. The gamut of the parameter range is fairly accurate, and basic perceptual attributes (hue, lightness, and saturation) are used to suppress unwanted gradients and artifacts. A diverse histogram algorithm is applied to ensure smoothing, and balanced color spreading and relative lightness are ensured by a sub-linear function. It is an extension of the image analogy work [69] and is useful for scientific visualization.

Li et al. [76] proposed user-guided fast colorization using edge and gradient constraints. The work mainly focuses on boundaries and edges, which are important features for the fast colorization process, since colorizing edges and boundaries takes much computation time. The work uses the YUV color space: the U and V components are calculated, and the Y component is used to calculate the intensity of the grayscale image. The work develops a novel distance algorithm (a 2D dynamic programming graph search) which considers edge, gradient, and boundary information; it suppresses confusion between the color differences of two regions and is used to produce feasible output. The work can also be used for scientific colorization and the colorization of black-and-white movies.

Greenfield and House [77] proposed a simple and fast image recoloring method based on segmentation, color balancing, and a canonical representation of the color palette. Youngha Chang et al. [78] proposed example-based color transformation of images and video using basic color categories, based on a perceptual color category technique. It categorizes pixels into perceptually agreeable color categories, measures the characteristics of the color spread in each category, and finds a matching color value from the same category of the reference image for each pixel in the input image, thus replacing the pixel color value. Color transformation is an important feature of photo-editing and video post-production tools, because even slight modifications of the colors in an image can strongly increase its visual appeal. Their goal was to develop a technique that transfers the colors of an image with less user interaction.

Wang and Wang [79] presented image-based color ink diffusion rendering based on three fundamental approaches: feature extraction, the Kubelka–Munk (KM) shading technique, and ink dispersion shading. The work is a non-SBR approach; it accepts color images and automatically converts them to a diffused ink style with a visually pleasing appearance, thus resolving the drawbacks of conventional Chinese ink simulation.

Bae et al. [80] presented two-scale tone management for photographic look, based on the bilateral filter (BLF) and a tone adjustment technique. The BLF preserves boundary features, and the tone adjustment technique manipulates the luminance channel to produce visually pleasing, stylish photographs. Similar work was done by Lischinski et al. [81].
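
The base/detail split that underlies this kind of tone management can be sketched in a few lines: an edge-preserving bilateral filter supplies the base (large-scale tone) layer, the residual forms the detail layer, and the two are recombined with different gains. The gains, offsets, and file name below are illustrative assumptions, not the parameters of [80].

```python
import numpy as np
import cv2

img = cv2.imread("photo.jpg").astype(np.float32) / 255.0             # placeholder file
base = cv2.bilateralFilter(img, d=-1, sigmaColor=0.1, sigmaSpace=8)  # edge-preserving base layer
detail = img - base                                                   # texture/detail residual
stylized = np.clip(0.7 * base + 1.5 * detail + 0.15, 0, 1)            # compress tone, boost detail
cv2.imwrite("toned.jpg", (stylized * 255).astype(np.uint8))
```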

Pu et al. [82] proposed Yunnan stylized rendering based on a Yunnan heavy color painting algorithm and a texture library. Yunnan heavy color painting is a richly shaded painting style originating from Yunnan, China. They prepared a set of photographic images and applied segmentation to divide the photographic images into constituent regions. The work focuses on extracting the outline of the dominant object boundary using an edge detection technique. Texture synthesis is performed based on the texture library and the image segmentation. Finally, image fusion is performed by combining the several synthesized layers (including the outline extraction), and mean-shift filtering and brush strokes are applied to the fused output, thus rendering an effective stylized image.

Su et al. [83] proposed an optimization framework for color transfer based on multi-scale gradient-aware decomposition and a color distribution mapping technique. First, the gradient-aware decomposition technique is applied to separate the target image into base and detail layers. All mapped base layers are then combined with the corresponding boosted detail layers to obtain the final output, and the color distribution technique is applied to distribute color uniformly over the image. The work can produce artifact-free color mapping and effectively balance the color enhancement details. Its main limitation lies in obtaining a balanced color transfer when the target image contains many areas with similar colors; the color transformation then takes more iterations and requires considerable computational power. The work overcomes the problems caused by the bilateral filter (BLF) [80].

A texture-based approach for hatching color photographs was proposed by Yang et al. [84], based on a smoothing technique and a triangular mesh. The triangular mesh reflects the structure of the input image and is used to construct the drawing direction. The hatching pattern is created while simultaneously choosing a suitable color based on the input image color, combining the hatching texture pattern and color adaptation. Finally, the drawing direction and hatching pattern are merged and applied over the smoothed image to obtain better texture visibility.

Wang et al. [85] presented data-driven image color theme enhancement. The work consists of an offline phase and a runtime phase. In the offline phase, prior knowledge of texture–color relationships is obtained using a probability density function (PDF), which mainly involves texture clustering and building histograms for textures; the data-driven knowledge extraction uses the CIELAB color space. In the runtime phase, segmentation is performed over the input image and a color theme database is chosen. Finally, the obtained results are merged using an optimization solver. In this way, the color theme of the input image is enhanced effectively.

Jeschke et al. [86] presented a method for estimating color and texture parameters for vector graphics. Diffusion curves are a powerful vector graphic representation that stores an image as a set of 2D Bézier curves. The work is based on automatic Diffusion Curve Images (DCI) and an automatic diffusion curve coloring algorithm. DCI preserve the dominant boundary features, color gradient information, and shape features. The newly proposed automatic diffusion curve coloring algorithm smooths the colored regions and protects the sharp border curves. Finally, Gabor noise is applied to represent the texture effectively (Fig. 11).

Fig. 11 Top row, from left to right: original image, coloring result by Orzan et al. [167], new coloring result, automatic noise fitting result. Bottom row: input curves, the 665 color points of Orzan et al. [167], the 283 color points of the new algorithm, final result after manual editing (courtesy: Jeschke et al. [86])

Pouli and Reinhard [87] proposed progressive color transfer for images of arbitrary dynamic range, based on histogram equalization and color transferring techniques. The work is suitable for transferring color from an HDR image to an LDR target image: working in histogram space, it effectively matches the HDR input image colors to the LDR target image colors and produces artistic tone reproduction effectively.
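
As a crude stand-in for the progressive reshaping described above, plain per-channel histogram matching can already pull a target's color statistics towards a reference; the snippet below uses scikit-image (version 0.19 or newer for the channel_axis argument), and the file names are placeholders.

```python
import numpy as np
from skimage import io
from skimage.exposure import match_histograms

source    = io.imread("ldr_target.png")          # placeholder LDR target
reference = io.imread("hdr_tonemapped.png")      # placeholder tone-mapped HDR reference
matched   = match_histograms(source, reference, channel_axis=-1)
io.imsave("color_matched.png", matched.astype(np.uint8))
```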

Kang et al. [88] proposed perceptually based color assignment. In this work, the grayscale segmented image and the image region intensities are kept constant, and colors are applied over the image using a color harmony model. The algorithm allows the user to choose a color palette, which is then applied automatically over the grayscale segmented image. In other color manipulation work, Lin et al. [89] presented palette-based recoloring, which manipulates colors using a generated color palette; shading vector art by color palettes is based on a probabilistic model, and the work predicts the distribution of properties such as saturation, lightness, and contrast for individual regions and neighboring regions.

Rosin et al. [90] presented example-based image colorization using locality-consistent sparse representation. The main aspiration of any image colorization is to produce natural and pleasing color images without human intervention. A sparse color transformation technique was adopted for the first time in image colorization, and super-pixel segmentation is performed based on the color values; the grayscale image is segmented using luminance information. Low-level intensity features and mid-level texture features are extracted, and high-level semantic and color features are extracted from both the grayscale and reference images. These features are used to colorize the target image from the reference image, and for this purpose a sparse matching technique ensures the correct colorization of the target (grayscale) image. If the target and reference images show the same scenery captured under different lighting conditions, effective colorization is possible; if they are captured under very different lighting conditions, the output may not be feasible. Finally, a luminance-guided joint filter is applied to preserve the chrominance image and edge structure features.

Okura et al. [91] proposed unifying color and texture transfer for predictive appearance manipulation. In this work, color transformation is done via a per-patch affine color transformation, and a patch matching algorithm, texture synthesis, and segmentation are applied to produce the best result.

Alvarez et al. [92] presented an exploration of the space of abstract textures by principles and random sampling. In this paper, they apply a multi-layer (ML) texture; the ML texture defines well-defined rules to position the various texels. Layer configuration parameters are defined using a Multi-Layer Texture Sampler (MLTS), and the parameters fully specify the multi-layered object structure. A geometric structure function is applied to render the image by specifying image parameters such as image size, color palette image, transparency value for each object, and light spot distance. The output mainly depends on the texel type. The work is most suitable for the entertainment and film industries.

Hristova et al. [93] presented style-aware robust color transfer, based on two features, light and color. Initially, the CIELab color space is applied to the input and reference images. Gaussian-distributed clustering is applied to the CIELab output to obtain the main features from the given image. The obtained features are merged using four mapping policies for different sets of image pairs, namely light to colors, colors to light, light to light, and colors to colors. Finally, to transfer the style and color from the reference image to the input image, a parametric color transfer and a local chromatic adaptation transform are applied. The work is suitable for visual representation applications (Fig. 12).

Fig. 12 Output of style-aware robust color transfer (courtesy: Hristina Hristova et al. [93])

Risser et al. [94] presented stable and controllable neural texture synthesis and style transfer using histogram losses. The work mainly uses a multi-scale pipeline based on a convolutional neural network (CNN) for texture synthesis, and develops histogram loss techniques to transfer the style to the target image. Deep convolutional neural networks have since demonstrated vivid improvements in performance for texture synthesis, style transfer, and computer vision applications such as object classification, detection, and segmentation.

Hensman and Aizawa [95] proposed cGAN-based manga colorization using a single training image. Manga is a Japanese comic format, and this work automatically colorizes manga images. A monochromatic grayscale image and a colored reference image are taken, and manga images are generated using conditional generative adversarial networks (cGAN). A screen tone technique is applied to the target image for shading and preserving texture, and a segmentation technique is applied to extract the dominant features of the image. Finally, a quantization technique is applied to increase the saturation of the monochromatic image and remove noise, together with a shading technique to smooth the output. The work requires little computational power because of the minimal database size.

Liao et al. [96] presented visual attribute transfer through deep image analogy. The work involves transferring color, texture, tone, and style from one image to another. An input image A and a filtered image B′ are taken, and the VGG-19 deep convolutional neural network is applied to A and B′ for feature extraction. A nearest-neighbor field (NNF) and a reverse nearest-neighbor field are computed from the selected features; based on the matched features obtained via the NNF, the image is reconstructed, and the NNF is upsampled to refine the result. For a few classes of images, such as cartoon dolls, the presented work may not produce good results. The approach facilitates various visual attribute transfer applications, such as photo to style, style to photo, style to style, and photo to photo. Performance evaluation was done through SNR and a user opinion poll.

Elad and Milanfar [97] proposed style transfer via texture synthesis based on Kwatra's algorithm (including color transformation and segmentation), and a CNN was adopted to produce impressive results. The work can fail to produce good results due to poor palette transfer, poor segmentation of the image, a poorly chosen style image, or the absence of segmentation (Fig. 13).

Fig. 13 Result of style transfer via texture synthesis by Elad and Milanfar [97]

Gatys et al. [98] addressed the style transfer problem based on an artistic neural network algorithm. The features are segregated, the image contents are merged, and natural images are stylized using this algorithm. It provides a correlation map that comes close to the desired stylish image, and the output is measured in the VGG feature domain. The work is most useful for image manipulation.
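
The core idea in this family of methods is that style is captured by Gram matrices (channel correlations) of CNN feature maps, while content is captured by the feature maps themselves. The PyTorch sketch below computes such style and content losses with a pretrained VGG-19; the layer indices are commonly used choices rather than values taken from [98], a weights download is assumed to be available, and the inputs are assumed to be normalized (1, 3, H, W) tensors.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

def gram(features):
    """Gram matrix of a (1, C, H, W) feature map: channel-wise correlations."""
    _, c, h, w = features.shape
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)

vgg = vgg19(weights="DEFAULT").features.eval()        # pretrained feature extractor
style_layers = {1, 6, 11, 20, 29}                     # assumed relu1_1 ... relu5_1
content_layer = 22                                    # assumed relu4_2

def style_content_losses(image, style_img, content_img):
    """Accumulate Gram-matrix style losses and a feature-map content loss."""
    s_loss, c_loss = 0.0, 0.0
    x, s, c = image, style_img, content_img
    for i, layer in enumerate(vgg):
        x, s, c = layer(x), layer(s), layer(c)
        if i in style_layers:
            s_loss = s_loss + F.mse_loss(gram(x), gram(s))
        if i == content_layer:
            c_loss = F.mse_loss(x, c)
    return s_loss, c_loss
```

In an optimization-based transfer, the stylized image is then obtained by minimizing a weighted sum of these two losses with respect to the image pixels.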

Artistic stylization of face photos based on a single exemplar was proposed by Yi et al. [99]. It is a fully automatic artistic stylization process. First, they developed a semantics-aware color transfer technique, which effectively transfers the color from the example image to the input image. In the second phase, an edge-preserving texture transfer method is used to preserve the edge and shape features. Although the work is inexpensive, it is suitable only for a single image and takes a very long time to produce the result.

Rosin et al. [100] proposed automatic semantic style transfer using deep convolutional neural networks and soft masks. They adopted semantic segmentation rather than hard segmentation, which preserves more information; the proposed work is more robust when image regions have similar chances of belonging to multiple object categories. The soft mask is developed using semantic segmentation. The semantic style transfer algorithm mainly includes an optimization function which combines a Markov random field (MRF) and a deep CNN model, and semantic image prediction is done using an RNN-CRF to produce a better artistic style transfer result. The method produces the best results for human faces and struggles to produce the best visual result when the image consists of a natural scene.

Liu et al. [101] proposed depth-aware neural style transfer. In this work, they adopted a deep neural network with a certain ground truth. The deep neural network preserves the features and transfers the style from the style image to the content image effectively. Earlier approaches, namely neural networks, semantic segmentation, and edge-preserving exemplar approaches, suffered from increased time consumption and a lack of ground truth.

Gatys et al. [168] presented texture synthesis using convolutional neural networks. The work presents a texture model based on the VGG-19 convolutional neural network (CNN). In the first stage, texture analysis is done using the VGG-19 CNN and Gram matrices; the second stage works on the total loss function and gradient descent. The input texture is passed to the first stage for feature analysis, a Gram matrix is generated from every CNN layer, and white noise is passed to the CNN and the loss function. Finally, gradient descent on the total loss with respect to the pixel values is carried out, and a new image is rendered that reproduces the same Gram matrices as the original texture. This work is useful in the domains of visual information processing in biological systems and physiological investigation.

Ulyanov et al. [169] presented texture networks: feed-forward synthesis of textures and stylized images. Before this proposal, synthesis and stylization of comparable quality required costly optimization through deep neural networks. The texture network comprises two phases. In the first phase, a texture synthesis generator network is built from convolution blocks (each containing multiple convolution layers and non-linear activations) and join blocks (upsampling and channel-wise concatenation); texture synthesis uses statistical features of the reference image, which are passed to a descriptive network. Stylization is carried out in the descriptive network phase using a deep learning network. The purpose of stylization is to simultaneously match the visual style of a first image, captured by low-level statistics, and the visual content of a second image, captured by higher level statistics. Compared to Gatys et al. [168], the work requires less computational power and renders more attractive results.

Huang and Belongie [170] presented arbitrary style transfer in real time with adaptive instance normalization. The work introduces an adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Initially, the input image and the style image are fed to a VGG-19 encoder to extract features; these features are then passed to the AdaIN layer (which performs the style transfer in feature space), and the AdaIN output is fed to a decoder that produces the final stylized image. The VGG encoder is reused to measure the content loss of the stylized result. The method aligns feature statistics directly without changing image pixels in the first pass and then inverts the features back to pixel space. Compared to the approach of Gatys et al. [168], this work is much faster and produces better results.
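
The AdaIN operation itself is small: it normalizes each content feature channel and rescales it with the style statistics. A minimal sketch, assuming feature maps of shape (H, W, C) (shapes and names are illustrative, not the authors' implementation):

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization: align channel-wise mean/std of the
    content features with those of the style features."""
    c_mean = content_feat.mean(axis=(0, 1), keepdims=True)
    c_std = content_feat.std(axis=(0, 1), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(0, 1), keepdims=True)
    s_std = style_feat.std(axis=(0, 1), keepdims=True) + eps
    # normalize content statistics, then impose the style statistics
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

In the full pipeline, this output is decoded back to an image, and the degree of stylization can be controlled by blending the AdaIN output with the original content features.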

Kudlur et al. [171] presented a learned representation for artistic style based on conditional instance normalization and a VGG-16 style transfer network. Conditional instance normalization is a simple, efficient modification of the style transfer network that allows multiple styles to be modelled at the same time. This work was further extended by Postma and Van Noord [172], who presented a learned representation for artist-specific colorization based on a CNN with the same conditional normalization scheme. Their system takes a gray-scale image as input and automatically produces a colorized image as output. The CNN architecture comprises convolution layers that encode the gray-level features and a decoder that upsamples and integrates lower level features at a higher spatial resolution with the upsampled higher level features. The work is most suitable for cultural heritage applications.

Wang et al. [173] presented multimodal transfer: a hierarchical deep convolutional neural network for fast artistic style transfer. The multimodal transfer architecture is composed of a style subnet, an enhance subnet, and a refine subnet; after these subnets produce their results, the total loss of the individual subnets is calculated to assess quality. The multimodal network mainly operates on RGB color and CIE luminance spaces. The style subnet performs joint luminance-color learning to preserve small intricate textures, extract dominant features, and transfer the style onto the input image. The style subnet result is passed to the enhance subnet to upsample the image features and further strengthen the stylization. The output of the enhance subnet is then passed to the refine subnet, which minimizes the losses and further improves the earlier stage output.

Bala et al. [174] proposed a deep photo style transfer technique that transfers the style of a reference image to an input image to produce photographic style transfer. The approach is broadly motivated by Gatys et al. [168]. The work includes a Matting Laplacian that constrains the transformation from input to output to be locally affine in color space, avoiding distortion. A VGG-19 deep convolutional network is used for feature extraction and labeling. Semantic segmentation further drives more meaningful style transfer, yielding satisfying photorealistic results in a broad variety of scenarios, including transfer of time of day, weather, and season, as well as artistic edits. When the input image and reference image are too complex and divergent, the work may not produce reliable results.

2.3 Region-based techniques

Region-based techniques mainly focus on image abstraction. Abstraction opens paths in stylization and rendering such as stained glass rendering, Voronoi methods, tessellation, pseudo-cubism, and visual stylization through segmentation and other NPR techniques. It is a low-level image processing approach that started in the 1990s and came to the forefront around 2002. Region-based techniques include segmentation, intensity gradients, moments, optical flow, etc. The approach is effective for cartoon and artistically stylized images, as reported in the empirical literature [5,6,7,8,9]. High-level abstraction was adopted to achieve effective rendering of an image; it sits at the top of the image hierarchy or on a particularly emphasized region of the image. The hierarchical levels of the image are obtained through a segmentation process.

Eye-tracking gaze data were also used in these works to identify the important regions viewed by users and to compare them with the abstraction results produced by the NPR algorithm, thereby assessing the algorithm's effectiveness throughout the research. Additional techniques such as image pyramids, interactive user guidance [7], and significance maps [11] are utilized to obtain better results from low-resolution images. Region-based techniques are also involved in video stylization, 3D video stylization, and converting the appearance of images into cartoon- and animation-like video. Video stylization principally adopts segmentation and SBR approaches to obtain the best abstracted output. Region-based techniques for image abstraction and stylization effectively preserve features within image boundaries while retrieving the shape, form, and composition of regions. The following paragraphs highlight significant literature on region-based techniques.

Agarwala [102] presented SnakeToonz, a semi-automatic approach for creating cel animation from video. The traditional animation procedure requires highly skilled artists, but the SnakeToonz interactive system enables children and untrained users to create their own 2D cartoons from images and video streams. The work adopts gradients as a preprocessing technique: each image frame is extracted and de-interlaced, and the SUSAN structure-preserving noise reduction filter is applied to each frame so that strong edges are preserved. The user sketches the cartoon on the first frame of the video; this sketch is obtained by connecting piecewise cubic Bezier splines and aligning adjacent edges, curves, and end points, a procedure called snapping. Cel animation consumes less bandwidth when streaming over the Internet. The proposed work may not be suitable when the image contains crowd scenes and is not suitable for realistic caricature representation.

Agarwala et al. [103] presented keyframe-based tracking for rotoscoping and animation, a semi-automatic approach to create animation from video. Rotoscoping is an animation process in which the animator traces motion picture footage frame by frame to produce realistic action and 2D animation from captured video. They used linear interpolation to generate roto-curves between the key frames. After rotoscoping the first frame, the user traces the subsequent frames. A piecewise cubic Bezier curve approach is used to place the control points for rotation, translation, and scaling. Optimization takes place over a Gaussian pyramid, and user guidance is required to build the best system. Finally, colors are filled into the rotoscoped regions by applying a stylistic filter. The method requires minimum energy to produce the best result; however, it may fail to preserve curved boundaries.

Bousseau et al. [104] proposed interactive watercolor rendering with temporal coherence and abstraction. Watercolor is a human-made medium, significantly different from traditional painted color systems, and lends itself to exploring different interactive watercolor rendering applications. The work comprises two steps: first, image abstraction techniques such as mean-shift segmentation, toon shading, and average smoothing; second, watercolor techniques such as morphological filtering, color correction, and pigment dispersion to synthesize a watercolor rendering effect. The proposed work ensures temporal coherence and avoids unwanted flickering. The method may not cope well when the image is too complex.

Bousseau et al. [105] proposed video watercolorization using bidirectional texture advection, extending their earlier work [104]. The previous work was extended by combining texture advection with mathematical morphology, and an optical-flow technique was applied to reduce the degree of distortion. Unlike the previous method, this method is suitable for both images and video. Nevertheless, it consumes more time to produce the watercolor temporal coherence effect (Fig. 14).

Fig. 14

a The original photographs. b The Watercolor-like result (courtesy: Adrien Bousseau et al. [104])

Wang et al. [106] presented video stylization for digital ambient display of home movies. The work takes small clips as input and creates an ambient display using video stylization techniques. A novel mean-shift segmentation algorithm segments the video frames into temporally coherent colored regions. Motion flow is calculated from the previous and current frame information. Frame propagation is computed from the current and previous frames using an affine transform and a RANSAC search based on SIFT features. A Gaussian filter is applied for noise removal, and simple linear blending is used for transitions. Finally, a cartoon effect is achieved using a novel mask and a Laplacian of Gaussian (LoG) filter. The advantage of this work is an automatic digital ambient display framework. The work may not be suitable for very long videos.

Kagaya et al. [107] proposed video painting with space-time varying style parameters, based on a painterly rendering algorithm that makes use of segmentation, style parameters, optical flow, and a tensor field. The segmentation technique separates the image into foreground and background regions. A brush stroke from the previous frame is reused in the next frame after being moved according to the optical flow. The tensor field corresponds to a pattern that can be specified by a radial basis function. An advantage of this work is that certain objects in the scene can be emphasized or de-emphasized.

Region-based techniques focus on retaining shape; extraction of the salient features is carried out using hierarchical region segmentation techniques.

Salisbury et al. [108] presented scale-dependent reproduction of pen-and-ink illustrations, based on image reconstruction interpolation, an attenuating function, and weighted-average convolution. The work converts low-resolution gray-scale images of varying sizes into rendered images of the desired size in a short time span. The image reconstruction interpolation method enlarges the image while improving visual quality, the attenuating function reduces noise, and the weighted-average convolution preserves the shape of objects in the image.

Collomosse et al. [109] proposed Arty Shapes. In this work, multi-scale image segmentation and shape fitting techniques are applied to create abstract art. A simple shape fitting algorithm fits geometric shapes such as triangles, rectangles, circles, superellipses, and convex hulls, transforms them into canonical form, and applies the inverse transformation. A decision tree algorithm automates the process, and the quality of the output depends on the number of segmented regions obtained.

DeCarlo et al. [7] proposed stylization and abstraction of photographs, based on mean-shift color image segmentation. The technique transforms images into a line-drawing style. The segmentation algorithm represents the segmented regions in a meaningful hierarchy and also suppresses noise. Gaussian filtering is applied for further smoothing of the edges of the resultant image. It is a novel approach that identifies meaningful elements of the image structure using a model of human perception and recording the user's eye movements with an eye tracker (Fig. 15).

Fig. 15

a A source image (1024 × 688) and fixations gathered by the eye tracker. b The resulting abstracted line drawing after smoothing [7]

Bangham et al. [110] presented the art of scale space. The work carefully uses anisotropic diffusion filters, morphological filters, and Gaussian filters to suppress unwanted details and to explore image-conversion applications in computer vision, such as converting 3D imagery into 2D renderings. The morphological filter preserves edges and also provides good connectivity between edges. Anisotropic diffusion filtering precisely suppresses unwanted detail and preserves dominant features. A Gaussian blur scale-space filter is used to explore defocus applications with NPR filtering.

Mould and Grant [111] presented automatic stylized halftone black-and-white images from photographs, based on halftoning, adaptive thresholding, and graph-cut segmentation. Halftoning converts a rich graphical image into a low-level black-and-white solid pattern. Adaptive thresholding converts the gray-scale image into a binary image. Finally, graph-cut segmentation is used to remove isolated elements and to build the composition layers. The work fails to produce the best results when the image consists of rich texture and irregular patterns.

Zeng et al. [112] presented an image parsing to painterly rendering system, based on graph-cut segmentation. It splits the given color image into foreground and background regions. SIFT feature extraction is applied to classify the segmented regions, and SIFT regions are assigned to layers of different depth. Rendering sketches are produced by a primal sketch algorithm, and sketch orientation is obtained by line integral convolution. They developed a large brush stroke library with a variety of textures, sizes, shapes, shades, shadows, and opacities, and stroke placement algorithms are applied over the image. Finally, color enhancement is done by an image-level color transfer algorithm [113]. In the future, this work should adopt a more expressive brush dictionary to build the best interactive rendering system. Similar work has been done by Zhu and Zhao [114], who presented an interactive system called Sisley based on Zeng et al. [112]. Although it similarly represents images in a hierarchical manner and produces artistically rendered images, it is computationally faster, and within a short time (not more than 60 s) an amateur user can create good stylized paintings from given photographs (Fig. 16).

Fig. 16

From image parsing to painterly rendering. a Artistic photo of a young lady. b Painting image with color enhancement (courtesy: Zeng et al. [112])

2.4 Image processing and filtering

Image processing and filtering techniques are used to develop image abstraction and stylization. Image abstraction is a common technique for applications where it is necessary to reduce unwanted detail and keep only relevant information for storage. Image abstraction and stylization are a vehicle for expressing the nature of image content in an effective manner. Image abstraction is the task of producing simplified images from given images by removing unwanted details while retaining the meaningful information. Removal of irrelevant information from a scene image reduces the size of the image and enhances its clarity as well. Using NPR filters, abstracted images are further refined into coherently stylized images. In this approach, the user can combine the filters serially and/or in parallel to produce the best coherent abstraction and stylization results.

In a few cases, image enhancement and abstraction make use of shock filters, as adopted in the work by Osher and his team [115], which is based on a non-linear partial differential equation.

Doellner and Nienhaus [116] proposed an edge-enhancement NPR approach based on the G-buffer; it constructs edge maps and extracts edge discontinuities from 3D shapes. The work is most useful for technical illustration, edge enhancement, and cartoon rendering.

Kang et al. [9] proposed a flow-based image abstraction technique, based on a flow-based bilateral filter (FBL) and a flow-based difference-of-Gaussians (FDoG) filter guided by the edge tangent flow (ETF), which describes the flow of salient features in the image. The FBL filter conveys apparent and enhanced shape boundaries, and the FDoG filter preserves strong lines and edges. The method may fail to produce the best results when the image contains a large number of irregularities and random noise, or has poor intensity.

Kang et al. [117] proposed image and video abstraction using the anisotropic Kuwahara filter. This filter effectively removes detail in high-contrast regions while preserving shape boundaries in low-contrast regions. However, the Kuwahara filter is unstable in the presence of high noise in the source image and suffers from block artifacts.

Raskar et al. [118] proposed image fusion for context enhancement and video surrealism. The work effectively suppresses aliasing, ghosting, and haloing by adopting a gradient-domain image fusion approach to reconstruct low-quality image regions. The method improves low-quality pictures or video by combining and manipulating the appropriate pixels.

Winnemoller et al. [8] proposed real-time video abstraction. Initially, a bilateral filter, which approximates anisotropic diffusion, is applied iteratively during the abstraction process to smooth the image and extract boundary information, and the color space is converted from RGB to CIELab. This feature space is essential for simplifying low-contrast regions and enhancing rigid high-contrast regions so as to protect the image features. A difference-of-Gaussians (DoG) filter is used to extract dominant lines, and luminance quantization is applied to reduce high illumination contrast and noise in the image. An edge overlay step merges low-dominance edges with high-dominance edges, which improves the overall sharpness of the image and preserves the important edges. The method has shown good performance for images which do not have complex backgrounds or poor illumination.
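
The two components that give this pipeline its cartoon look, DoG edge extraction and luminance quantization, are easy to sketch. The snippet below is a minimal illustration (not the authors' implementation): it omits the iterated bilateral smoothing step, uses a hard rather than soft quantizer, and the input filename is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import color, io

def dog_edges(lum, sigma=1.0, k=1.6, thresh=0.0):
    """Difference-of-Gaussians edge map on a luminance channel.
    Returns 1 for non-edge pixels and 0 for edge pixels (a sign convention)."""
    d = gaussian_filter(lum, sigma) - gaussian_filter(lum, k * sigma)
    return (d > thresh).astype(float)

def quantize_luminance(lum, n_bins=8):
    """Hard luminance quantization into n_bins levels."""
    lo, hi = lum.min(), lum.max()
    step = (hi - lo) / n_bins + 1e-8
    bins = np.clip(np.floor((lum - lo) / step), 0, n_bins - 1)
    return lo + (bins + 0.5) * step

img = io.imread('photo.png')[:, :, :3] / 255.0   # hypothetical input image
lab = color.rgb2lab(img)
lab[:, :, 0] = quantize_luminance(lab[:, :, 0])   # quantize the L channel
edges = dog_edges(lab[:, :, 0])
cartoon = color.lab2rgb(lab) * edges[:, :, None]  # darken DoG edge pixels
```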

Redmond et al. [119] proposed adaptive abstraction of 3D scenes in real time for producing stylistic abstractions. The method uses the Kuwahara filter in conjunction with a difference-of-Gaussians (DoG) filter and flat shading. The Kuwahara filter simplifies shape, edges, and color simultaneously while preserving important features, and the DoG filter preserves edge information in the image. The limitation of the method is that the Kuwahara filter's sampled regions may fail to preserve important features when the image was captured in poor lighting. The flat shading technique highlights the salient region of interest, renders each object with a distinct color in the abstracted image, and blurs unwanted background information and content. Finally, the DoG edge image is overlaid on the Kuwahara-filtered image to produce the final abstracted 3D scene.

Mould [120] proposed texture-preserving abstraction, based on a variant of geodesic image filtering that preserves the locally strongest edges and retains weak edges depending on the surrounding context.

Kyprianidis [121] proposed image and video abstraction by multi-scale anisotropic Kuwahara filtering. The anisotropic Kuwahara filter is an edge-preserving filter useful for creating abstracted and stylized images or videos. In this work, two limitations of the anisotropic Kuwahara filter are addressed. First, it is shown that adding thresholding to the weighting-term computation of the sectors avoids artifacts and yields smooth results in noise-corrupted regions. Second, a multi-scale computation scheme is proposed that simultaneously propagates local orientation estimates and filtering results up a low-pass filtered pyramid. The work allows for a strong image abstraction effect and avoids artifacts in large low-contrast regions. However, the approach needs more computational power.

Kyprianidis et al. [122] proposed anisotropic Kuwahara filtering with polynomial weight functions for image and video abstraction. The method is based on an anisotropic filter, the Kuwahara filter, and a polynomial weight function. The anisotropic filter is applied during the abstraction process to suppress unwanted information and preserve local structure-oriented information. The cartoon and stylization effect is accomplished through the Kuwahara filter. The polynomial weight function is adopted to highlight discontinuous weak edges and simplify complex texture. The technique improves abstraction performance in terms of stylization and feature enhancement.

Jung et al. [123] proposed a method that reproduces an oil-painting-like image from a source picture based on a virtual light source and NPR techniques. To generate the oil-painting-like output, the system first performs stroke distribution to determine where the brush should be placed. An intermediate image is then constructed by choosing a suitable color, orientation, and brush size at each stroke point, using edge detection and image segmentation on the input picture. Finally, a lighting effect is rendered on the intermediate image, producing the oil-paint-like result. The approach may not produce the best abstracted output for all kinds of image sets, specifically GIF-format and noisy images.

Kolliopoulos et al. [124] proposed segmentation-based 3D artistic rendering. The method was based on normalized cut segmentation, temporal coherence, and toon model. These methods were applied to produce the 3D artistic rendering effect.

Kang et al. [125] proposed image and video abstraction by coherence-enhancing filtering. The approach is based on adaptive line integral convolution and structure tensor smoothing in combination with directional shock filtering. The smoothing process regularizes directional image features, while the shock filter provides a sharpening effect; both operations are guided by a flow field derived from the structure tensor. To obtain a high-quality flow field, the work presents a novel smoothing scheme for the structure tensor based on Poisson's equation. While preserving the overall image structure, the approach effectively regularizes anisotropic image regions and achieves a consistent level of abstraction. Moreover, it is suitable for per-frame filtering of video and can be efficiently implemented to process content in real time, but it fails to produce the best output for HDR images.

Williams and Green [126] proposed the Mr. Painting Robot artistic system for producing stylistic abstractions of photographs based on brush strokes. The method uses a Laplacian edge detection algorithm in conjunction with the snakes algorithm and artistic interpretation. The proposed paradigm produces a visually pleasing rendering effect, but fails to address rigid images and images with low-contrast regions.

Kang and Lee [127] proposed shape simplifying image abstraction method for producing stylistic abstraction of a photograph. The method used mean curvature flow in conjunction with shock filter to simplify both shape and color simultaneously. However, an obvious limitation of this method is that the curvature flow contracts small circular shapes very quickly. If the circular shape is of high importance, it needs to be masked before running this algorithm.

Papari et al. [128] presented artistic edge and corner enhancing smoothing. They adopted adaptive filtering for preserving edges and corners, and noise suppression was done using a modified Gaussian Kuwahara filter. The work reliably generates artistic images.

Yang [163] presented recursive bilateral filtering based on a modified first-order recursive bilateral filter. The filter preserves dominant edges and is most useful for exploring stylization, tone mapping, and structure enhancement applications.

Winnemoller [129] presented the XDoG filter and claimed to have enhanced the performance of the previously proposed DoG filter [8]. XDoG is used for synthesizing line drawings and cartoons. The work was further extended and refined by Kyprianidis et al. [175].
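
One commonly cited formulation of XDoG combines a sharpened difference of Gaussians with a soft tanh threshold. The sketch below illustrates this formulation under that assumption (parameter values are illustrative, not taken from [129]):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def xdog(lum, sigma=0.8, k=1.6, p=20.0, eps=0.01, phi=10.0):
    """Sketch of an XDoG-style operator on a luminance image in [0, 1]:
    sharpened DoG response followed by a soft thresholding step."""
    g1 = gaussian_filter(lum, sigma)
    g2 = gaussian_filter(lum, k * sigma)
    s = (1.0 + p) * g1 - p * g2                 # sharpened DoG response
    # soft threshold: white where response is high, dark ramp elsewhere
    out = np.where(s >= eps, 1.0, 1.0 + np.tanh(phi * (s - eps)))
    return np.clip(out, 0.0, 1.0)
```

Varying p, eps, and phi moves the output between pencil-like hatching, ink-like line drawings, and posterized looks.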

Swamy and Pavan Kumar [130] proposed an integrated filter-based approach for image abstraction and stylization. The method integrates 2D anisotropic filtering, DoG filtering, modified shock filtering, mean curvature flow, and dithering with a fixed deviation to produce the abstracted and stylized image.

Kang et al. [131] proposed a coherent line drawing system based on the edge tangent flow and a flow-based difference-of-Gaussians filter (FDoG). The edge tangent flow (ETF) preserves and estimates the local orientation of the image, and the FDoG filter is applied iteratively to effectively extract lines from the image and produce the best line drawing. The method may not produce effective line drawings when the image consists of low-contrast regions and low-contrast edges. Similar work, "Depth-Aware Coherent Line Drawings", was done by Spicker et al. [132] in 2015. They extend the coherent line drawing work of Kang et al. [131] and develop adaptive ETF and FDoG filters for line drawing. A particular aspect of their ETF lies in adopting a large radius r, which enables preservation of important details in image regions.

Swamy and Kumar [133] proposed line drawing for conveying shape in HDR images, based on a global tone mapping operator, a 2D anisotropic filter, a coherence-modified shock filter, weight parsing, and bilateral filtering. The global tone mapping operator suppresses high-luminance image regions and helps convert HDR images to LDR images. The 2D anisotropic filter preserves local structure-orientation features and suppresses severe noise. The weight parsing technique is used to highlight outline boundaries, and the bilateral filter clusters the image background. The main drawback is that it consumes more time to produce the result.

Mould [134] proposed image and video abstraction using cumulative range geodesic filtering. He designed a dedicated mask for each pixel by considering its nearest n pixels under the cumulative range variation of geodesic filtering. The system accepts an input image and produces an abstracted image that preserves small- and medium-scale details; it also preserves texture in the abstracted image wherever the original image is textured. The quality of the output depends on the mask size n and the weights. However, the technique takes more time to produce the result (Fig. 17).

Fig. 17

a Input image. b Output of cumulative range geodesic filtering (courtesy: Mould [134])

Gomes et al. [135] proposed non-photorealistic neural sketching. The work involves three major stages. Initially, retinex and gray-world algorithms are applied to achieve color constancy in two color spaces, HSI and YCbCr, and a smoothing effect is obtained using mean-shift segmentation. In the second stage, edge and non-edge pixels are detected using a multi-layer perceptron neural network. Finally, brightness enhancement and histogram transformation are applied for coherent neural sketching. Quality assessment can be done using PSNR, SSIM, and FoM, and the authors compared their output with other algorithms using these quality assessment techniques. The major drawback of this work is that it takes a very long time to produce the result.

Kyprianidis et al. [5] presented a literature survey, State of the 'Art': a taxonomy of artistic stylization techniques for images and video. The survey provides information on the field of non-photorealistic rendering, focusing on techniques for transforming 2D images and video into artistic stylizations. It covers a taxonomy of 2D NPR algorithms and filters developed over the past two decades, and then describes the chronology of development from semi-automatic paint systems to automatic painterly rendering systems. The survey elaborates on image analysis through edge-preserving stylization techniques. The paper does not address the difficulties of evaluating abstraction and stylization or precise applications. Nevertheless, that survey [5] is the inspiration for our paper.

User-guided line abstraction using coherence and structure analysis was proposed by Lee et al. [136]. To facilitate the line abstraction process, the user provides predefined strokes and rough scribbles on the regions of interest. An FDoG filter and a Canny edge detector are applied to obtain a detailed line drawing. In the second stage, a stroke classification technique is applied that distinguishes coherence strokes from structure strokes. Coherence strokes reproduce the main shape of an object, and structure strokes indicate regions of interest that contain repeated texture patterns, which are otherwise difficult to trace by hand. Finally, a line matching algorithm based on graph construction with vertex-wise and edge-wise energy terms is applied. The algorithm takes the coherence and structure strokes from the classification stage and matches them to the detailed line drawing, producing a coherent line drawing. The work fails to produce coherent results for rigid and curved patterns.

Zhang et al. [137] proposed online video stream abstraction and stylization. The approach detects the importance map and dominant edges using a Canny detector. In subsequent stages, the optical flow is computed to track the motion of edges, and mean-shift segmentation and HSV color space conversion are applied. Finally, a difference-of-Gaussians (DoG) filter is applied for smoothing and noise removal. The proposed work is not suitable for occluded regions, repeated texture patterns, and high-contrast features.

Kang et al. [138] presented art-photographic detail enhancement. The work effectively enhances detail in digital photographs. They proposed a tonal model which significantly boosts the structured detail of a picture and elaborates synthetic photographs. The tonal model decomposes the image into a base layer and a detail layer: a tonal shift operation reconstructs the base-layer features and brings out details in dark or bright regions, while a tone scaling operation refines the detail-layer features. An L0 smoothing filter is applied to smooth the image, and finally an objective function is applied to minimize or maximize the image details. The work is most useful for medical image enhancement and analysis.
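
The general base/detail idea behind such tonal models can be sketched in a few lines. The snippet below is a generic illustration, not the exact tonal model of [138]; in particular, a Gaussian filter stands in for the edge-preserving (e.g., L0) smoothing used there, and the parameter names are assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def detail_boost(lum, sigma=3.0, detail_gain=2.0, base_shift=0.0):
    """Generic base/detail decomposition sketch on a luminance image in [0, 1]."""
    base = gaussian_filter(lum, sigma)   # base layer: large-scale tonal structure
    detail = lum - base                  # detail layer: fine structure
    # shift the base tones and scale the detail layer, then recombine
    return np.clip(base + base_shift + detail_gain * detail, 0.0, 1.0)
```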

Kang et al. [139] presented bilateral texture filtering. The work is most useful for cartooning, computational photography, compression of image artifacts, and halftoning. They apply joint bilateral texture filtering iteratively to suppress texture while preserving the structure of the image. The bilateral texture filtering process can be summarized as follows: first, the image is smoothed with a predefined kernel; second, dominant edges are preserved by applying a tonal range. A modified relative total variation (mRTV) measure is adopted to preserve gradient and magnitude features in situations where the tonal range cannot be used and fails to smooth the texture features in the image. The filter is applied iteratively until the user obtains a satisfactory result.

See et al. [140] presented image abstraction using anisotropic diffusion and a symmetric nearest-neighbor (SNN) filter. The work converts the image color space from RGB to CIE. Anisotropic diffusion is applied to suppress discontinuous edges and to sharpen irregular gradient edges. The SNN filter uses a 3 × 3 mask, taking the center pixel and the average of neighboring pixels to preserve dominant features. Quantization is done in a subsequent step. Finally, a multi-image gradient and an FDoG filter are applied to preserve dominant edges.

Mould and Li [141] presented image warping for a painterly effect. They adopted the simple linear iterative clustering (SLIC) technique, which combines spatial and color information and measures distances between pixels with a localized k-means clustering to create superpixels. The resulting clusters are standardized in size, and distinct boundaries between clusters are effectively visualized. Spring parameters are assigned to individual pixels to improve quality and minimize processing time, and rest-length smoothing is applied to further enhance the output quality. A mass-spring simulation is applied iteratively to alter the plane in an arbitrary way, and a triangular warping technique further exaggerates the mass-spring simulated image-warping result by changing pixel locations. Barycentric interpolation is applied to change the color pixels. Finally, non-linear filters such as anisotropic and box filters are applied to obtain a coherent structure-preserving artistic photograph. In this work, the main challenge lies in setting the spring parameters and deciding how many times the mass-spring simulation should be applied iteratively to the image; the parameter setting process is entirely trial and error.

Lee et al. [142] proposed scale-aware structure-preserving texture filtering. The proposed filter combines an adaptive kernel, relative total variation (RTV), a flatness value, and guidance imaging, followed by a joint bilateral filter with adaptive weights. Initially, a Gaussian kernel is applied to preserve sharp structural edges. Subsequently, directional RTV is applied for kernel-scale calculation and texture smoothing. The work calculates an image flatness value to differentiate texture from structure and passes the result to guidance filtering to smooth and preserve structural regions. Finally, a joint bilateral filter is applied repeatedly to preserve boundaries and dominant edges. The work is most suitable for removing unwanted dots in halftone images and for gradient-preserving applications.

Lee et al. [143] presented structure-texture decomposition of images with an interval gradient. They proposed an interval gradient technique based on different types of kernels, particularly a DoG kernel, to discriminate texture from structure. The resulting gradient-based image filtering conserves sharp edges and removes redundant texture patterns. The system fails to produce the best results when the image contains irregular texture patterns. The work is most useful for exploring applications such as HDR compression, halftoning, photographic structure enhancement, and composition.

Jing et al. [144] presented a second-order variation-based bilateral filter for image stylization and texture removal. The work consists of two steps. Initially, Gaussian blurring is applied over the image to suppress texture and high-frequency surface areas; in the next step, a modified bilateral filter with a predefined weight function is applied to produce coherent structure-preserving edges and color tone. To smooth and suppress the texture over the image surface, a general bilateral filter is applied to the output of the modified bilateral filter. Very consistent integration of filtering is achieved, and the parameter settings of the filters are found to be very important for obtaining the best result; however, the parameter setting process is entirely trial and error.

Deshpande and Raman [145] presented adaptive artistic stylization of images. They note that artistic stylization is essentially NPR image abstraction and stylization [7]. In this work, they first define a threshold saliency algorithm that produces foreground (FG) and background (BG) masks to differentiate foreground and background objects; before this, the OneCut segmentation algorithm is applied to obtain automatic FG and BG scribbles. The approach iteratively applies a guided filter over the obtained FG and BG scribbles; the guided filter protects and preserves structural edge information, though proper filter parameters must be chosen to produce the best artistic stylization. The guided filtering is non-uniform in nature, and foreground objects are preserved more effectively than background objects. They also conducted a subjective analysis of the proposed work based on users' visual feedback, average user rating, and standard deviation. When the image contains embossed graphical foreground and background, or when the separation between foreground and background is not clear, the work may not give the best result.

Sadreazami et al. [146] proposed iterative graph-based filtering for image abstraction and stylization. The work consists of two stages. In the first stage, vertex-domain filtering such as a graph-cut filter is applied iteratively to preserve the dominant structure and to remove noise effectively. Spectral-domain filtering is then applied to the previous result to smooth the whole region; the applied filter leaves high-contrast edges unaffected.

Mould and Azami [147] proposed detail and color enhancement in photo stylization. The proposed framework contributes in two respects. First, brightness correction is performed using a histogram algorithm. Next, structure preservation is carried out: a cumulative range geodesic filter is applied to smooth the image, a residual map is created by subtracting the filtered image from the original image, ridges are computed from the distance map, and a Canny detector is used to extract edges. Using a Poisson solver, the best boundary features are selected and elaborated precisely, and a color shift operation joins the filtered image with the Poisson solution output. At the post-processing level, a stick filter is applied to extend linear structures. The filter output is very bright and shiny, so the filter result is combined with the detail map to ensure the best abstracted and detail-enhanced image. This level of structure preservation had not been achieved in [142, 143]; the obtained results are very fair, and even curved edges and sharply bending structures are accentuated. Parameters were set on a trial-and-error basis, and the data set images were taken from Flickr. For an image of size 512 × 512, the proposed work takes 3 min to display the output, which is the major drawback of the work.

Shakeri et al. [148] proposed saliency-based artistic abstraction with deep learning and regression trees. In this work, the image is passed to SALNET, which is based on a random forest regressor and serves as an alternative to a neural network. It behaves like a deep learning network and splits the image into many small parts: initially into three parts, with brighter regions considered highest saliency, lighter regions lowest saliency, and gray-tone information mid-level saliency. SALNET also identifies the three tone-range outlines, and the obtained outline images are passed to the potrace program to convert bitmapped images into vector graphics. In general, the random forest regressor takes small image parts at different depths and produces a heat map, which is then used for image segmentation. After obtaining the saliency heat map, a non-linear filter is applied to smooth the dominant information, followed by a color correction technique. The combined results are passed to the ePainterly system, which produces cubism-style artistic abstraction. The novelty of the work lies in the regression-tree approach rather than a neural-network approach, which minimizes execution time. The ePainterly system produces cubism-like output, which may not always appeal to end users, and features may overlap during the cubism warping.

Wu et al. [149] presented tangent-based binary image abstraction. The work initially converts the color space from RGB to Lab. A bilateral filter is applied to preserve the colored regions, the filtered output is converted to a gray-scale image, and in parallel the tangent flow is calculated. Subsequently, line integral convolution (LIC) is applied to smooth regions and preserve edge boundaries, and a DoG filter extracts edge-oriented information. A global threshold value yields the binary image abstraction, and finally a non-linear Gaussian smoothing filter is applied to smooth the image. If the image contains highly complex content, the best output cannot be expected.

3 Benchmark guidelines and quality assessment techniques

A benchmark consists of a standard set of images. These images are used as input to NPR algorithms, which then produce their outputs, and the benchmark provides standard guidelines for defining good data sets. Benchmark images are not restricted to any particular subject matter. Preparation of benchmark image sets and quality assessment techniques is most useful for the NPR community to evaluate the performance of proposed algorithms; without them, algorithm assessment becomes too complex. In the early days, there was no standardized benchmark image set for subjective evaluation and no quality assessment techniques. As technology advanced, it became possible to prepare benchmark image sets with various subject matters. The images considered should be very general in nature and include a wide variety of subject matter such as texture, shape, color, noise, complexity, vivid contrast, and sharpness.

In the image processing domain, specifically in classification and recognition problems, hundreds of thousands of standard images are required. For NPR research such as image abstraction, stylization, and structure-preserving problems, however, the proposed benchmark should consist of only a few images covering a broad range of subject matter. The proposed benchmark consists of aesthetically rich pictures and more challenging pictures carrying a large amount of information. The image data sets should include animals, moving objects, ancient buildings, statues, HDR images, modern buildings, birds, nature pictures, pictures of deities, persons with complex backgrounds, embossed graphical images, vehicles, etc. The benchmark should also contain photographic images of real-world scenes with high complexity. A very important point in benchmark preparation is to use copyright-free images, and all images should have standard sizes: for large images, the resolution should not exceed 1024 × 768, and for small images it should not be below 500 × 400, with a suitable aspect ratio. Collected benchmark images should have high-frequency dominant features, a mixture of high- and low-contrast information, distinguished and discontinuous edge gradients, rich color information, and rich texture patterns, including both regular structures and irregular textures. Whereas image processing domains such as classification and recognition have established ground truth, the NPR community has no such ground truth to date.

In most cases, the NPR community judges results by human visual judgment rather than statistical assessment. The visual output may differ from investigator to investigator for the same set of images used for image abstraction and stylization, so it is very difficult to categorize the best visualization output. There is thus no standard procedure to compare the obtained abstraction and stylization, which makes it difficult to judge which of several approaches renders the best abstraction and stylization output. To address this issue, David Mould and Paul Rosin published "A Benchmark Image Set for Evaluating Stylization" [150]. The proposed benchmark guidelines, taking that publication as a basis, seek minimal changes to the image set and guidelines. In the presented work, images were taken under various illuminations, vivid colors, different backgrounds, and different lighting conditions, which indeed helps to assess algorithm performance. The proposed benchmark data set involves a sample of 32 images, as shown in Fig. 18. The 32 images compare against the Mould and Rosin data set of 20 images. The additional 12 images are of a highly complex type: the complexity of architectural buildings, embossed graphical images, HDR images, nature images, and ancient imagery is considered, which was not present in the earlier Mould benchmark.

Fig. 18

The set of 32 benchmark images (benchmark images are publicly available at: https://sites.google.com/site/pavankumarjnnce/google-form/benchmark-images)

We also quantify the image properties of our benchmark images based on the principal guidelines set forth by Mould and Rosin [150,151,152]; the statistical measurements of the benchmark image properties are obtained as follows.

  • Natural image colorfulness was calculated from the distribution of the image pixels in the CIELab color space with singular value decomposition (SVD), as given by Hasler and Süsstrunk [153] (a sketch of the simpler opponent-color variant from the same paper is given after this list).

  • Image complexity was calculated based on the ratio of compression error to compression ratio, following Machado and Cardoso [154].

  • Image contrast was calculated as a weighted average of local contrast, taking the absolute difference with neighboring pixels, as presented by Neumann et al. [155].

  • Sharpness was calculated from the maximum local variation (MLV) of each pixel, i.e., the maximum intensity variation of the pixel with respect to its 8-neighbors, followed by the standard deviation, as proposed by Bahrami and Kot [156].

  • Edge strength was calculated based on the SUSAN model set forth by Smith and Brady [157].

  • Noise was calculated based on local noise estimation and the Gaussian mean algorithm given by Immerkær [158].

  • For calculating the standard deviation and mean, we adopted the std2() and mean2() MATLAB functions. Table 1 lists the statistical measurements of the image properties for the benchmark data set.
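
As an illustration of how such properties can be scripted, the sketch below computes the simple opponent-color colorfulness metric proposed by Hasler and Süsstrunk [153]. Note that it is the RGB approximation from that paper, not the CIELab/SVD variant described in the list above, and is offered only as an assumption-labelled example:

```python
import numpy as np

def colorfulness(img_rgb):
    """Opponent-color colorfulness metric of Hasler and Süsstrunk (sketch).
    `img_rgb` is a float array of shape (H, W, 3)."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    rg = r - g                      # red-green opponent channel
    yb = 0.5 * (r + g) - b          # yellow-blue opponent channel
    sigma = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mu = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return sigma + 0.3 * mu
```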

Table 1 Statistical measurement of image properties for each of the images in the benchmark

Image quality assessment is particularly important in the NPR research domain. In this regard, we adopted five different techniques to assess image quality. The image quality assessment techniques [135, 159, 160] include human visual perceptual analysis (HVPA), peak signal-to-noise ratio (PSNR), the structural similarity metric (SSIM), the figure of merit (FOM), and the mean absolute error (MAE); their outcomes can further be analyzed with an ANOVA test. Human visual perceptual analysis involves people with expertise in media arts and graphics, while the remaining participants, being amateurs, give feedback on the abstracted and stylized output image. HVPA extracts auxiliary data from the scene to estimate the picture quality. After observing the NPR image abstraction and stylization results, feedback on the output is obtained from the people concerned, on a scale from 1 (lowest) to 5 (highest). Finally, the standard deviation is computed to check the consistency of the user ratings.

PSNR is calculated with the help of the mean square error (MSE). The MSE is measured from the squared intensity differences between the input image and the resultant image pixels. PSNR is measured in decibels (dB); the higher the PSNR value, the higher the resemblance (i.e., the better the image quality is conserved). PSNR computes the approximate quality of the reconstructed/filtered/resultant image compared with the original input image. It can be expressed mathematically as follows:

$$ {\text{MSE}} = \frac{1}{P \times Q}\mathop \sum \limits_{p = 1}^{P} \mathop \sum \limits_{q = 1}^{Q} \left[ {I_{1} \left( {p,q} \right) - I_{2} \left( {p,q} \right)} \right]^{2} , $$
(1)

where P and Q are the numbers of rows and columns, I1(p, q) is the resultant image pixel at coordinates (p, q), and I2(p, q) is the source image pixel at (p, q):

$$ {\text{PSNR}} = 10\log_{10} \left( {\frac{{R^{2} }}{\text{MSE}}} \right), $$
(2)

where R is the maximum possible pixel value of the input image; for an 8-bit image, R = 255.
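
A minimal implementation of Eqs. (1) and (2) (assuming 8-bit images by default) is:

```python
import numpy as np

def psnr(reference, result, max_val=255.0):
    """PSNR in dB between a source image and a filtered/stylized result."""
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```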

The structural similarity index metric (SSIM) [159] measures luminance, contrast, and structural correlation terms. SSIM values lie between 0 and 1. SSIM is a perception-based technique that treats image degradation as a perceived change in structural information, where structural information captures the idea that pixels have strong interdependencies, particularly when they are spatially close:

$$ {\text{SSIM}}\left( {x,y} \right) = \frac{{\left( {2\mu_{x} \mu_{y} + C_{1} } \right)\left( {2{\text{Cov}}_{xy} + C_{2} } \right)}}{{\left( {\mu_{x}^{2} + \mu_{y}^{2} + C_{1} } \right)\left( {\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2} } \right)}} , $$
(3)

where µx and µy are the means of x and y, σx² and σy² are the variances of x and y, Covxy is the covariance of x and y (a measure of the similarity of the two data samples), and C1 and C2 are stabilization factors.
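
In practice, SSIM does not need to be re-implemented; scikit-image provides it directly. The sketch below assumes a recent scikit-image release (the channel_axis argument was introduced around version 0.19):

```python
from skimage.metrics import structural_similarity

def ssim_score(reference, result):
    """SSIM (Eq. 3) between two images of identical shape."""
    return structural_similarity(
        reference, result,
        data_range=float(reference.max() - reference.min()),
        channel_axis=-1 if reference.ndim == 3 else None,  # color vs. gray-scale
    )
```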

The figure of merit (FOM) quantifies the statistical properties of an image [135, 160] by considering image intensity, color, and texture. FOM is developed by considering the product of the image resolution and PSNR. Initially, FOM finds the best match between the source edge image and the ground-truth edge image. The FOM ranges from 0 to 1; a value of 1 indicates high similarity between the source and resultant images. FOM can be expressed mathematically as follows:

$$ {\text{FOM}} = \frac{1}{{I_{t} }}\mathop \sum \limits_{i = 1}^{{I_{e} }} \frac{1}{{1 + e\,d^{2} (i)}} , $$
(4)

where It is the number of edge pixels in the ground-truth image and Ie is the number of edge pixels in the resultant edge image; the normalization term It = max(It, Ie) is used for the qualitative assessment. d(i) is the distance between the i-th detected edge pixel and the nearest ground-truth edge pixel, and e is a constant scaling factor whose value is typically 1/9 or 1/10.
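
A Pratt-style figure of merit of this form can be evaluated with a distance transform. The following sketch (boolean edge maps and the parameter name alpha, standing in for the constant e above, are assumptions) illustrates Eq. (4):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(gt_edges, detected_edges, alpha=1.0 / 9.0):
    """Figure of merit (Eq. 4, sketch) between two boolean edge maps."""
    # Distance from every pixel to the nearest ground-truth edge pixel.
    dist = distance_transform_edt(~gt_edges)
    d = dist[detected_edges]                        # distances at detected edge pixels
    n = max(gt_edges.sum(), detected_edges.sum())   # normalization: max edge count
    if n == 0:
        return 0.0
    return float(np.sum(1.0 / (1.0 + alpha * d ** 2)) / n)
```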

The mean absolute error (MAE) is a traditional way to assess the quality of the resultant image. MAE is capable of detecting blurring in the resultant image. In practice, blurring occurs in medical and satellite image processing due to improper aperture and focus settings of the camera lens; for this reason, MAE is widely used in color, medical, and satellite image processing. The higher the MAE value of the resultant image, the higher the image quality degradation; a lower MAE value indicates better image quality [159]:

$$ {\text{MAE}} = \frac{1}{\text{PQ}}\mathop \sum \limits_{i = 1}^{P} \mathop \sum \limits_{j = 1}^{Q} \left| {F\left( {i,j} \right) - I\left( {i,j} \right)} \right| , $$
(5)

where P and Q are the numbers of rows and columns of pixels, I(i, j) is the original image, and F(i, j) is the resultant image.
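
Eq. (5) is a one-liner in NumPy:

```python
import numpy as np

def mae(reference, result):
    """Mean absolute error (Eq. 5) between the original and the resultant image."""
    return float(np.mean(np.abs(reference.astype(np.float64) -
                                result.astype(np.float64))))
```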

In NPR research, Tukey's test [135] can also be adopted; it facilitates pairwise comparison of the results obtained by different algorithms and is most useful for one-to-one quality assessment. For NPR algorithm development and quality assessment, MATLAB and the Anaconda toolchain can be used.

4 Application of non-photorealistic rendering (NPR)

  • NPR techniques are useful in the domain of image processing and computer graphics applications. The gamut of NPR applications is not restricted to image abstraction, which could otherwise be done using traditional methods as is performed for 3D scenes.

  • NPR image abstraction overcomes complications during image analysis, such as eliminating unwanted features while preserving dominant ones. Nevertheless, it encounters many challenges during HDR image abstraction, such as line drawing to convey shape from HDR/LDR images. NPR techniques facilitate better image compression without using conventional compression approaches.

  • NPR techniques can serve as a well-organized and effective tool in the process of artistic abstraction when unwanted features need to be suppressed. They effectively represent and render images for scientific and technical illustration, and the resultant images resemble hand-crafted drawings. NPR techniques can be used in instruction manuals, presentation of reading material, reference book illustration, and outlines of mechanical parts and specialized data. Advances in NPR and the WWW have inspired the development of GIS client-server and web mapping applications such as Google Maps.

  • The NPR approach has opened new avenues in the realm of image processing. For example, in the development of e-books, numerous current books feature hand-drawn illustrations, and scanning them for digital copies is not recommended because the copies cannot be subjected to further editing.

  • NPR techniques can be used effectively for abstraction when the aim is an artistic appearance or an output optimized to convey the required need-based information. For example, a typical NPR-rendered medical X-ray or MRI image would help the doctor convincingly explain specific aspects of the report to patients and other persons concerned.

  • NPR techniques are being used for image and video stylization, and in a few cases medical image processing has also been carried out using these techniques. If images contain noise or low contrast, segmenting the edges becomes difficult. Under such circumstances, NPR stands as the most effective technique for segmenting dominant features, which otherwise cannot be done effectively using conventional approaches.

  • NPR image abstraction has entered pose estimation (a thrust area in the field of image processing), image defocusing, and text extraction from embossed graphics, which were not handled effectively and accurately by traditional approaches.

  • Optimized NPR image abstraction and stylization methodologies find their most useful applications in clinical psychology, psychophysics, and neuroscience. A survey conducted by Ramachandra and associates strongly emphasizes the importance and effectiveness of image abstraction in children's education as well as interactive technical education. In the survey, children aged between 6 and 15 years were shown a real apple image and an abstracted apple image on a wall; according to the findings, 93% of the children were attracted to the abstracted and stylized image.

  • This survey may be viewed as a preamble; many other issues such as stroke-based techniques, image analogy, region-based techniques, text extraction from embossed graphics, structure-preserving techniques, and defocusing techniques need to be brought together to make the approach more effective. Furthermore, NPR image abstraction techniques are relevant to the film industry for style transfer and the creation of augmented and virtual reality.

  • Image abstraction and stylization are well suited to primary education and judicial applications. For example, in developed countries such as the United States, courtroom proceedings are conveyed through abstracted and stylized imagery. In the education sector, as demonstrated by Gooch in the work titled “Human facial illustrations: creation and psychophysical evaluation” [16], image abstraction reduces presentation complexity and facilitates easy conveyance of educational information.

  • Abstraction and stylization are useful for monitoring government development schemes, such as reporting the progress of construction work; they are most valuable when a specific localized boundary must be conveyed without a broadband Internet connection. The technique is also used for tourism sign boards and for signalling traffic congestion. With the advent of NPR image abstraction, architectural visualization is being carried out with this technique: draftsmen deliver their drafts on the status of work completion to customers over the Internet. Because abstraction compresses images from megabytes to kilobytes while retaining the required details, considerable bandwidth is saved and progress is conveyed concisely.

  • Finally, the adequacy and expressiveness of non-photorealistic image abstraction depend especially on the context in which it is intended to be used and conveyed. Medical faculties can use NPR techniques for effective surgical planning and training, as shown by the works “Combining silhouettes, shading, and volume rendering for surgery education and planning” and “Reducing affective responses to surgical images through color manipulation and stylization” [161, 162].

However, certain applications require non-NPR strategies, which may also be considered in conjunction with NPR techniques for effective results.

5 Challenges and future work in NPR research

  • The literature survey makes it evident that, to date, there is only one standard data set in the NPR research domain [150, 151], and no effective structure-preserving NPR image abstraction algorithm is available. The lack of a generalized approach to image abstraction and its downstream applications is also felt. After an NPR abstraction algorithm is applied for beautification or stylization, no clear standards exist for judging whether the output is visually gratifying. It may further be inferred that the visualization process itself is not yet clearly understood.

  • The survey suggests that research problems such as structure and texture preservation, pose estimation, style transfer, and image defocusing could be undertaken using NPR approaches. However, guidelines for generalized parameter settings, in terms of kernel size, standard deviation, smoothing factor, tone value, and so on, need to be established before NPR can be adopted in these domains (a minimal parameterized sketch is given after this list). Yet another possibility is applying NPR as a preprocessing step before CBIR-based image retrieval, text extraction from embossed graphics, and image defocusing; because image abstraction preserves the relevant information and removes unwanted detail, it may outperform conventional preprocessing approaches.

  • The survey also brings to light the need for an automatic structure-preserving framework for image abstraction, artistic stylization, line drawing, and the handling of high/low dynamic range images.

  • With respect to this need, no complete framework or pipeline is available to date that preserves dominant structures while enhancing the scene. Manipulating and preserving global structures without affecting low- and high-level features remains a major challenge in NPR research.

  • In the future, NPR image abstraction techniques may become indispensable in educational research for motivating learners and improving their zeal for learning. Although a few attempts have been made in this direction by Ramachandra and associates, there is ample scope for further work on the subject.

  • Effective style transfer over images in the film industry is yet another application that could be considered, and the development of effective algorithms for it is much desired.

  • Many challenges exist in the image processing domain, such as stereo matching, sign-language interpretation, image defocusing, medicinal-leaf classification, text extraction from embossed graphics, and agricultural crop grading; NPR techniques may be considered for the pre- and post-processing requirements of these tasks.

  • NPR techniques could reach new horizons if attempts are made to assess people's mindsets during shopping sessions at malls.
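To make the parameter-setting issue raised above concrete, the sketch below (Python with OpenCV and NumPy; the function name dog_abstraction and all default values are illustrative assumptions, not taken from any surveyed work) shows a minimal difference-of-Gaussians abstraction in which the kernel scale (standard deviation), smoothing factor, and tone/sensitivity value must currently be tuned by hand:

    import cv2
    import numpy as np

    def dog_abstraction(gray, sigma=1.0, k=1.6, smoothing=3, tau=0.98):
        """Difference-of-Gaussians abstraction with bilateral pre-smoothing.

        sigma     : standard deviation of the fine Gaussian (edge scale)
        k         : ratio between the coarse and fine Gaussian scales
        smoothing : number of bilateral-filter passes used to flatten regions
        tau       : tone/sensitivity weight on the coarse Gaussian (XDoG-style)
        """
        img = gray.astype(np.float32) / 255.0
        # Region flattening: repeated bilateral filtering is a common abstraction step.
        for _ in range(smoothing):
            img = cv2.bilateralFilter(img, d=9, sigmaColor=0.1, sigmaSpace=3)
        # Difference of Gaussians: a fine blur minus a weighted coarse blur.
        fine = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma)
        coarse = cv2.GaussianBlur(img, (0, 0), sigmaX=k * sigma)
        dog = fine - tau * coarse
        # Hard tone mapping: positive response stays white, negative becomes a dark line.
        return np.where(dog > 0, 255, 0).astype(np.uint8)

    # Hypothetical usage:
    # gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
    # cv2.imwrite("abstracted.png", dog_abstraction(gray, sigma=1.2, smoothing=4))

Generalized guidelines for NPR parameter setting would fix or adapt such values automatically, rather than leaving them to per-image trial and error.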

6 Conclusion

From the survey, we infer that no significant prior work has traced the evolution of NPR. We have endeavored to extract the significant literature on NPR from 1963 to 2017. This survey is the result of collecting about 1000 papers related to NPR image abstraction and stylization and subsequently narrowing them down to 200 papers that pertain to NPR technology, algorithms, design, and parameter settings. Furthermore, the distinct advantages offered by NPR, together with the challenges ahead of its effective adoption, are highlighted.

A chronological hierarchy from 1963 to 2017 is presented lucidly in this paper through significant illustrative pictorial examples. An attempt has also been made to present a detailed nomenclature of NPR techniques. During the literature survey, it was observed that recent articles on neural networks and deep learning mainly address image abstraction, image style transfer, video stylization, and video style transfer for animated movies.

Finally, benchmark guidelines are recommended for the preparation of a data set. The benchmark is derived from 32 sample images with varied subject matter and image properties, taken from the publications of David Mould [50, 51], and MATLAB code was developed for calculating image properties such as colorfulness, complexity, sharpness, contrast, edge strength, noise, standard deviation, and mean. The chosen image properties are listed in Table 1. For image quality assessment, metrics such as PSNR, SSIM, FoM, and MSE evaluated with an analysis of variance (ANOVA) test are recommended, together with a Tukey test for one-to-one comparison of algorithms. The applications of NPR are presented in this paper, and the challenges and research gaps are clearly addressed.
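As a hedged illustration of how the recommended benchmark statistics could be scripted (a minimal Python sketch assuming OpenCV and NumPy, not the MATLAB code referred to above; all function names are illustrative), per-image properties and fidelity metrics might be computed as follows:

    import cv2
    import numpy as np

    def mse(ref, test):
        ref, test = ref.astype(np.float64), test.astype(np.float64)
        return np.mean((ref - test) ** 2)

    def psnr(ref, test, peak=255.0):
        err = mse(ref, test)
        return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)

    def colorfulness(bgr):
        # Hasler and Suesstrunk's measure on the opponent channels rg and yb.
        b, g, r = cv2.split(bgr.astype(np.float64))
        rg, yb = r - g, 0.5 * (r + g) - b
        return np.sqrt(rg.std() ** 2 + yb.std() ** 2) + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)

    def basic_properties(bgr):
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
        grayf = gray.astype(np.float64)
        return {
            "mean": grayf.mean(),
            "std": grayf.std(),                              # doubles as RMS contrast
            "sharpness": cv2.Laplacian(grayf, cv2.CV_64F).var(),
            "edge_strength": float(np.mean(cv2.Canny(gray, 100, 200) > 0)),
            "colorfulness": colorfulness(bgr),
        }

    # Per-image scores (plus SSIM, e.g. from scikit-image) can then be compared
    # across algorithms with ANOVA and Tukey HSD tests, e.g. via scipy/statsmodels.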