Keywords

1 Introduction

1.1 Overview

This paper substantially covers contributions made by the author (PhD thesis, 2011 at the INRIA Nancy Grand Est, Université de Lorraine [52]). Starting with the lineal and/or obvious isolated symbol, the paper reaches the research standpoints that were contributed by focussing on Fresh FP-6 Strep European project. Therefore, this view may not cover the whole literature, we have in graphics recognition community. At this moment, a thorough review can be found in [37, 63, 64].

1.2 What Are Graphical Symbols?

Graphical symbols are referred to as visual images or designs, interpreting information about the context. They are generally 2D-graphical shapes, including their composition in the highest level of conceptual information. Overall, it plays a crucial role in a variety of applications such as automatic interpretation and recognition of circuit diagrams [22, 44], engineering drawings and architectural drawings [16, 20, 35, 79], line drawings [81], musical notations [46], maps [51], mathematical expressions [6], and optical characters [28, 59, 60, 84]. To avoid possible confusions, in this paper, other works – in the framework of graphics recognition – such as logo detection/spotting and music scores are not taken in this study.

1.3 Organization of the Paper

The remainder of the paper is organized as follows. Section 2 deals with the position of graphical symbol recognition in framework of document processing. It also includes regular contests (primarily organized together with the international workshops) and the real-world issues whether they have covered. Section 3 reviews research standpoints by categorizing them into three different approaches: statistical, structural and syntactic, and personal views based on more than a decade of experience. It also includes possible use of hybrid approaches (taking two or more approaches to develop) for graphics recognition. Section 4 concludes the paper by highlighting remarks.

2 Document Image Processing (DIP)

2.1 Graphical Symbol Recognition: Where Does It Lie in DIP?

Document analysis or processing is mainly related to texts and graphics. It includes text and or graphics separation, localization and recognition [24]. According to [42], document analysis is related to document image analysis (DIA) since both research works have been concerned with document image interpretation. In a similar manner, Kasturi et al. [31] categorize document image analysis into two domains:

  1. (1)

    textual processing and

  2. (2)

    graphics processing.

In both articles [31, 42], the basic tasks are image segmentation, layout understanding and graphics recognition. Graphical symbol recognition, in particular, has a long history since the 70’s, and it is considered as the core part of graphical document image analysis and recognition systems. In 1998 [70, 71], a prominent researcher has made a statement: ‘none of these methods works’ in general. Since then, it has been actively extended [17, 36, 71, 73, 74]. Very recently, the importance and the usage of graphics recognition have been reported [17, 37, 63, 64].

Graphics are often combined with text, illustration, and color. Therefore, in document image processing, graphical symbols, for instance, convey crucial cues about the context in comparison to texts. Beside generic approaches, text recognition is distinct from symbol recognition, even though their boundaries are not obvious. Thus, their solutions complement each other [7, 24].

The recognition of graphical symbols or any meaningful shapes has been the subject of numerous reviewed research articles [7, 18, 29, 30, 35, 38, 40]. Most of these proposed systems are roughly described using the following two major units: (1) data acquisition and preprocessing; and (2) data representation and recognition. The techniques used in data acquisition and preprocessing vary since they are problem dependent. Text/graphics separation aims at segmenting document into two layers so that one can focus on regions-of-interest, where graphical symbols lie. The usefulness of text/graphics separation can be found in [67]. Graphical symbols are represented either in the form of feature vectors by estimating the overall shape or in more structured forms (i.e., graphs) by using meaningful primitives that are extracted from the whole symbol. Again, primitive selection tools are application dependent. As a consequence, matching techniques follow the way we represent symbols, to be used in the decision process. In general, a good data representation is assumed to be compact and discriminant, and minimizes the intra-class distance and maximizes the inter-class distance [36]. Existing approaches, specifically those based on feature based matching, can mainly be split into three different categories: statistical, structural and syntactic (see Sect. 3).

Fig. 1.
figure 1

Graphical symbols. An example illustrating lineal and fully isolated symbols [26].

Fig. 2.
figure 2

Several different graphical symbols, appearing in the floor plans [15]. An example of how one can go for symbol spotting.

Fig. 3.
figure 3

An example of (a) a query and (b–d) graphical symbol/element spotting. This also illustrates the complexity of the FRESH dataset [52, 76], starting with an isolated graphical symbol (moving from left to right). This example shows how the known and significant part can be spotted based on the applied query.

2.2 Do Regular Contests Hit Real-World Issues?

This section aims to include how far regular contests cover real-world issues (or problems). Since 1915, international association of pattern recognition (IAPR) sponsored graphics recognition (GREC) workshops, supported by technical committee 10 (TC-10: http://iapr-tc10.univ-lr.fr/) have been organized together with several contests: graphical symbol recognition, retrieval and spotting. The primary objective of the GREC contests is to evaluate the state-of-the-art of graphics recognition techniques and to generate performance evaluation tools and datasets for future research [13,14,15, 48]. Figure 1 shows a few model symbols [26, 78]. Beside several other contests, in recent years, researchers figured out the significance of ‘end-to-end document analysis benchmarking’ and ‘open resource sharing repository’ to advance as well as to facilitate fair comparison [33, 34].

Considering a real-world problem, symbol recognition is not straightforward as shown in Fig. 1. In general, common applications are recognition and localization (in some cases, we call it - spotting) of graphical symbols in electronic documents, in architectural floor plans (see Fig. 2), wiring diagrams and network drawings [15, 36, 52]. More specifically, in this paper, a challenging problem has been addressed (see Fig. 3), where the dataset is composed of a variety of symbols such as lineal, complex and composite. These samples (entitled as FRESH dataset) are taken from [76]. The symbols may look either very similar in shape – and only differ by slight details – or completely different from a visual point of view [25, 50, 75]. Symbols may also be composed of other known and significant symbols and need not necessarily be connected. For such complex and composite symbols, as before, an isolated query symbol is applied not only to recognize similar symbols from the dataset, but also to detect known and significant parts (graphical elements) which are associated with the query symbol. Therefore, we are not just limited to symbol recognition problem. We also need to spot the visual elements (i.e., meaningful parts or regions). Besides, it is expected to see how two different symbols are similar to what extent. In the literature, the latter issue has been considered as the most challenging problem. On the whole, we refer to this task as either the parts or the whole graphical symbol recognition [15, 36, 39, 43, 49, 52]. Such a problem requires strong knowledge about how we can represent graphical symbols and recognize, detect or spot them.

3 Research Standpoints

In general, the whole symbol recognition process is based on either

  1. (1)

    matching features between a query and dataset symbols or

  2. (2)

    comparing decomposed parts (or meaningful regions/primitives) such as lines and arcs as well as the relations between them.

These are commonly categorized into three different approaches: statistical, structural and syntactic. In most of the cases, methods are particularly suited for isolated line symbols, not for composite symbols connected to a complex environment [10, 11, 35, 36].

3.1 Statistical Approaches

The techniques used in statistical approaches fairly computes the differences between two feature vectors [10, 41]. An overview of the performance of the most commonly used shape descriptors (refer to the statistical approaches) for symbol representation is provided in [80]. Global shape representation is widely used because of its implementation simplicity since it does not require extra pre-processing and segmentation, in contrast to local pattern representation [1, 85]. For more details about shape and symbol recognition, we refer to the works presented in [10, 11, 69], where usefulness of the shape descriptors for document analysis and a collection of techniques employed for graphical symbols recognition have been reported. Most methods are particularly suited for isolated line symbols, not for composed symbols connected to a complex environment [11, 36]. In statistical approaches, global signal-based descriptors [4, 32, 66, 84,85,86] are usually quite fault tolerant to image distortions, since they tend to filter out small change in details.

For a thorough shape analysis, in [82], authors computed a histogram for every pixel to figure out the distribution of constraints among the other pixels. These histograms are then statistically integrated to form a feature vector. In [87], authors proposed a similarity assessment of graphical symbols based on Kullback-Leibler divergence, where symbols are represented as 2D kernel densities. In a similar way, in [21], the authors deal with the changes in appearance (i.e., shape) from which these types of symbols differ. In [3], authors describe another framework to learn a model of shape variability in a set of patterns. Further, the Radon transform (RT) has also been widely used to globally describe the shape of any pattern [12, 66]. Motivated by this, the histogram of the RT has been used instead of compressing them (i.e., profiles) into a single vector [65], assuming that the studied patterns are equal in size. We remind the readers that the RTs are essentially a set of parametrized histograms. Therefore, in contrast to [65], in [54, 57], authors used dynamic time warping (DTW) to align every histogram for each projecting angle to absorb varying histogram sizes resulting from image signal variations. In a recent PhD thesis [27], author developed the bridge between the literature of sparse representation and the visual vocabulary construction by apply the learned dictionary algorithm for learning a visual vocabulary based on local descriptors of symbols. These technique are, unfortunately, inappropriate in case where symbols are composed of other known and significant symbols (need not necessarily be connected) as well as texts (see Fig. 3).

3.2 Structural Approaches and Possible Integration with Statistical Features

Structural approaches are particularly used when symbols are decomposed into several meaningful graphical elements [20, 47, 52], for instance. Their interpretations, however, depend on the studied application as well as on their specific local context. Therefore, primitive extraction and symbol recognition, on the whole, are the key steps toward understanding and interpreting content within a document. They mainly include embedded graph based classification problems like attributed relational graphs (ARG) [39], region adjacency graphs (RAG) [35] and constraint networks [2], in general. Structural techniques are able to handle all types of symbols (e.g., isolated, composite), and have a powerful relational representation. However, they suffer from intense computational complexity due to the general NP-hard problem of sub-graph matching resulting from the variation of graph structure with the level of noise, occlusion, distortion etc.

In case of composite documents (wiring diagram, a real-world industrial problem, see Fig. 3) that contain textual and graphical elements, one needs to be able to extract and formalize the links that exist between the images and the surrounding text, in order to exploit the information embedded in those documents. Therefore, correct extraction, representation of both visual data, textual and graphical structures, and organization are the first steps towards further automated knowledge, information discovery and information retrieval or data mining on more complex data than just text. Within this context, we primarily focus on three main items:

  1. (1)

    the extraction of visual elements (vocabulary) that compose an image [47];

  2. (2)

    the expression of visual relations between the elements;

  3. (3)

    knowledge discovery, formal learning techniques and classification using the vocabulary and relations mentioned above, including vocabulary shape analysis.

Having the aforementioned framework, we have tested the use spatial relational signatures between the possible pairs of labeled vocabulary types such as circles and corners. These are basically used as a basis for building an attributed relational graph (ARG) that fully describes the symbol [56, 61]. Thanks to our labeling of attribute types, corresponding relation alignments are possible between the two graphs while avoiding the general NP-hard graph matching problem. Further, very recently, we have introduced a new concept named bag-of-relations (BoRs) for symbol recognition, retrieval and spotting [62]. The key characteristic of the technique is to use topological relation information to categorize them in terms of bags and to guide directional relations. The method has been extended to a variety of datasets (symbols) in the domain. Further, usefulness of the method for symbol spotting and for user-friendly symbol retrieval applications has been attested.

Again, recognizing isolated symbols does not solve the real-world problem (wiring diagram, for instance), since symbols may appear either very similar in shape – and only differ by slight details – or completely different from a visual point of view [25, 50, 75]. In such a case, statistical signatures using shape descriptors, for instance, do not perform well because they take global appearance into account, and on the other side, usability of structural approaches may be limited due to intense computational complexity. In such a context, addressing the interest of integrating two different worlds would be a new scope of the work. Considering both approaches: structural and statistical, we have addressed the use of their best possible efficient combinations [52, 58], which has been highlighted in the GREC-2010 workshop [74]:

‘... the recurring wish for methods capable of efficiently combining structural and statistical methods’ and ‘the very structural and spatial nature of the information we work with makes structural methods quite natural in the community.’

In [56], the method is primarily based on the spatio-structural description of visual ‘vocabulary’. But, it lacks the information about their shape features. Therefore, keeping the ARG based symbol description as reported in [56], shape signatures are integrated in two different ways to improve the performance. First, shape signatures are for labeling vertices [55]. Second, shape features are applied only to the vocabulary which show significant shape variations, and are grouped them via unsupervised clustering [58]. In both cases, major set of well-known state-of-the-shape descriptors are integrated with spatial relations. Overall, we bring an attention to the use of a hybrid approach in symbol recognition, and try to avoid the shortcomings of each of them: structural and statistical.

The problem can further be extended to symbol spotting, but one can view this as a kind of retrieval [15, 45, 49, 62, 68], which is guided by user queries. Additionally, the recognition/retrieval process can be made with the help of local descriptors like scale invariant feature transform (SIFT) and with the use of techniques like bag-of-features so that either primitive or region extraction (segmentation) can be avoided. The question always remains the same, ‘what approach does what (i.e., performance) in which context?’.

3.3 Syntactic Approaches

The techniques based on graph grammar will be more suitable to search symbols in technical documents where information is close to a feature vector description that follows composition rules of primitives [8, 9, 19, 23, 53, 77, 83].

In the earlier framework (see Sect. 3.2 in second paragraph), very interestingly, we have presented the use of formal learning techniques to automatically learn non-trivial descriptions of symbols [53]. It means that we have transformed the vocabulary sets and their possible relations that exist between them into a first-order logic (FOL) description for a complete image. This representation is then used as an input to an inductive logic programming (ILP) solver, in order to deduce non obvious characteristics that may lead to a more semantic related recognition process. Considering the experience we have so far, the idea is appropriate for classifying a set of symbols (images) characterizing common behavior with respect to another set of symbols (images) or counter samples. But by definition, even though the reported concept in [53] can be extended to any image synthesis problem (for different application domain). It, however, is challenging to transform statistical signatures (quantified values) to the closer semantics.

4 Remarks

No doubt, graphics recognition has an extremely rich state-of-the-art literature in symbol recognition and localization [5, 11, 17, 36]. They are limited to solve specific problems, motivated by the limited posed by industrial partners. This means that major state-of-the-art methods for symbol recognition, do not conclude on the existence of a set of generic methods that can yield the best results, even though they are easy to implement (with fewer parameters) and are reproducible. This has been seen on three different approaches: statistical, structural and syntactic. Besides, very similar statement has been reported in [74], where the author pointed out: ‘which features distinguish graphics recognition from general pattern recognition problems?’. Therefore, there still exists growing interest in graphics recognition domain.