2.1 Graphical Symbols

Graphical symbols are visual cues or designs that convey information within a specific context. In general, they are two-dimensional (geometric) shapes, and their composition carries information at the highest contextual level. Automatic interpretation and recognition of graphics is needed in a variety of applications, such as

  (a) engineering and architectural drawings [1,2,3,4,5,6,7],
  (b) electrical circuit diagrams [8,9,10,11,12,13,14,15,16,17],
  (c) line drawings [18,19,20,21],
  (d) musical notation [22, 23],
  (e) historical maps and road signs [24,25,26,27,28,29,30],
  (f) mathematical expressions [31],
  (g) logos [32,33,34], and
  (h) graphics-rich optical characters [35,36,37,38,39,40].

This book does not cover all of the topics listed above, even though they all fall within the graphics recognition framework. It focuses on the graphical symbols used in electrical circuit diagrams, engineering and architectural drawings, and line drawings, whether handwritten or machine-printed.

As discussed in Chap. 1, graphics recognition has been an intensive research topic in the pattern recognition (PR) and document image analysis (DIA) communities since the 1970s [41,42,43,44]. In 1998, the statement "none of these methods works in general" prompted researchers to ask what had been done so far and where the field stood [45, 46]. The statement helped the field move forward [46, 50]. The usefulness of graphics recognition was reported in 2015 [50], and a survey was published in the same year [16].

2.2 Basics of Graphics Recognition

Not surprisingly, graphics are combined with text as well as color, so they carry more information: a picture is worth a thousand words. Apart from a few generic techniques within the DIA framework, text recognition can be regarded as the counterpart of graphical symbol recognition in DIA work. However, the boundary between the two is neither straightforward nor cleanly separable, and researchers have often observed that their solutions complement each other [41, 44, 51]. Text analysis in graphics therefore requires special attention [35]. The importance of graphics recognition is reflected in the fact that graphical symbol recognition (or, more generally, the recognition of meaningful shapes, parts, or regions) has been the subject of several different projects (see Sect. 2.1) [2, 51,52,53,54,55,56]. Broadly, the proposed approaches can be divided into the following stages:

  (a) data acquisition,
  (b) data preprocessing, and
  (c) data representation/description and recognition/classification.

The first two stages, data acquisition and preprocessing, can broadly be considered as a unit and are application dependent. In some cases, where the data are clean, preprocessing may not be required. Text/graphics separation is a form of document image segmentation [57]: it decomposes a document image into two layers so that the layer containing the graphics can be considered on its own. A more detailed study of text/graphics separation can be found in [58].

In the framework of data description, graphical symbols are described either by a set of numbers, i.e., a feature vector that takes the overall shape into account (statistical representation), or by a structured form (graph representation) built from the visual cues/words that compose the whole symbol. Rule-based representations can also describe the overall shape of a pattern. In either case, the choice of visual cues/words is application dependent. In the decision process, the matching technique usually follows from how the graphical symbols are represented. In general, a data description (or representation) is said to be good if it maximizes the interclass distance and minimizes the intraclass distance [47]; here, good refers to how compact the feature vector is and how well feature vectors from different classes can be discriminated. Broadly speaking, existing approaches can be divided into three categories: (i) statistical, (ii) structural, and (iii) syntactic, all of which rest on the concept of feature-based matching. It is worth noting, before proceeding to the upcoming chapters, that no single technique achieves the expected performance on its own. A common trend in the literature is therefore to combine techniques from different categories (statistical and structural, for instance) in order to take advantage of both [11, 12, 15,16,17]. Such an integration is worthwhile if the techniques complement each other and satisfy the utility functions required to reach the goal. More detailed information can be found in Chap. 3.
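To make the criterion for a good description concrete, the following minimal sketch computes, for a labelled set of feature vectors, the mean intraclass distance (pairs with the same label), the mean interclass distance (pairs with different labels), and their ratio. The use of Euclidean distance, plain averaging over all pairs, and the function names are assumptions made for illustration; this is not the specific measure used in [47].

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def class_separation(vectors, labels):
    """Return (mean intraclass distance, mean interclass distance, ratio).

    The ratio is interclass / intraclass: larger values mean that samples of
    the same class lie close together while different classes lie far apart.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        d = euclidean(vectors[i], vectors[j])
        # Same label -> intraclass pair, different label -> interclass pair.
        (intra if labels[i] == labels[j] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need at least one same-class and one different-class pair")
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    ratio = math.inf if mean_intra == 0 else mean_inter / mean_intra
    return mean_intra, mean_inter, ratio


# Two hypothetical descriptions of the same four symbols.
labels = ["resistor", "resistor", "diode", "diode"]
good = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
poor = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0)]
print(class_separation(good, labels)[2])  # ratio well above 1
print(class_separation(poor, labels)[2])  # ratio below 1
```

In the "good" description the two resistors and the two diodes cluster tightly and the clusters are far apart; the "poor" one swaps that geometry, so the same symbols become hard to separate.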

2.3 Contests and Real-World Challenges in Graphics Recognition

Chapter 1 outlined the importance of graphics processing within the DIA framework. Building on that, this section reviews graphics recognition contests and examines whether they address real-world problems. Since 1995, the graphics recognition (GREC) workshops, sponsored by the International Association for Pattern Recognition (IAPR) and supported by Technical Committee 10 (TC-10: http://iapr-tc10.univ-lr.fr/), have organized several contests in the framework of graphics recognition. The contests are not limited to graphical symbol recognition, retrieval, and spotting; they have also covered other tasks, such as arc and line segmentation.

Taken together, the contests can be summarized as follows. The primary objectives of the GREC contests are to evaluate the state of the art in graphics recognition (and related tasks), to develop performance evaluation tools and techniques, and to provide datasets for future work [5, 59,60,61]. Beyond summarizing the results of the participating institutions and researchers, the contests provide datasets and guidelines for evaluating tools, i.e., a comprehensive protocol.

The contests are listed below, from the most recent to the oldest:

  (a) GREC’13: Arc and line segmentation contest [64]

    Since geometric primitives such as lines and arcs (see Fig. 2.1) help in the automatic conversion of line-drawing document images into electronic form, their detection and recognition are important. As the title indicates, two challenges were proposed: arc segmentation and line segmentation. Engineering drawings were used for the arc segmentation challenge and cadastral maps for the line segmentation challenge. The highest reported segmentation accuracies were 54.10% for arcs and 66% for lines.

  (b) GREC’11: Arc segmentation contest: performance evaluation on multi-resolution scanned documents [65]

    The sixth edition of the arc segmentation contest addressed document images scanned at different resolutions. Nine document images were each scanned at three resolutions, and ground-truth images annotated by experts were provided. It was observed that the tool based on vectorization algorithms produced better results, even on low-resolution scans.

  (c) GREC’11: Symbol recognition and spotting contest [66]

    This contest continued the series that began at the GREC’03 workshop (see item (j) below). Unlike the previous editions, it included a symbol spotting task in addition to isolated symbol recognition.

  (d) GREC’09: Arc segmentation contest: performance evaluation on old documents [67]

    This contest focused on the empirical performance evaluation of raster-to-vector algorithms in graphics recognition. Old document images were used, and a few commercial software packages participated. This helped check whether automatic vectorization methods (prototypes) had reached the maturity of commercial software.

  (e) GREC’07: Third contest on symbol recognition [68]

    This contest continued the series that began at the GREC’03 workshop (see item (j) below). The main difference from the previous contests lay in the test data.

  (f) GREC’07: Arc segmentation contest [69]

    As expected, the aim was to compare different state-of-the-art arc segmentation systems. Four algorithms were tested.

  (g) GREC’05: Arc segmentation contest [DBLP:conf/grec/Wenyin05]

    At the sixth graphics recognition workshop organized by IAPR TC10, this was the third arc segmentation contest, in which three tools participated. In addition, a second evaluation of RANVEC and the arc segmentation contest was reported [70]. The latter recalls important facts and details the changes made to the system since GREC’01.

  (h) GREC’05: Symbol recognition contest [71]

    This was the second symbol recognition contest, and the organizers presented the general principles of both contests, GREC’03 and GREC’05.

  (i) GREC’03: Arc segmentation contest [72]

    At the fifth graphics recognition workshop organized by IAPR TC10, the arc segmentation contest provided rules, performance metrics, and data.

  (j) GREC’03: Symbol recognition contest [63]

    This was the first international symbol recognition contest, in which the organizers described the contest framework: goals, symbol types, and evaluation protocol. As stated in their report, the idea was to prepare participants for the upcoming contest. The organizers described how they built the database and the methods they used to add noise, which helped researchers evaluate the robustness of their methods.

  (k) GREC’01: Arc segmentation contest [73,74,75]

    As the fourth in the series of graphics recognition contests organized by IAPR TC10, the first arc segmentation contest was held in association with the GREC’01 workshop. In addition to general rules, the organizers provided the test data: arcs and circles in engineering drawings and other scanned images containing line work. The best results were obtained by the tool whose algorithm vectorizes binary images and smooths the vectors into sequences of short, nearly straight lines. Engineering drawings made up most of the test data.

  (l) GREC’97: International graphics recognition contest: raster-to-vector conversion [76, 77]

    It is important to note that vectorization techniques can improve the performance of subsequent processes, such as arc segmentation. Based on this experience, the GREC team introduced raster-to-vector conversion at the second graphics recognition workshop.

    They also defined a computational protocol for evaluating the performance of systems that convert raster data into vectors. In this contest, continuous and dashed lines, arcs, circles, and text regions were considered as the graphical entities.

  (m) GREC’95: Dashed line detection [78,79,80]

    The first graphics recognition contest addressed dashed line detection, for which a test image generator created random line patterns under a few constraints.

    Visual cues such as dashed lines are essential for high-level understanding of technical drawings, provided they can be detected or segmented. Because large amounts of data have to be processed, the aim was to segment them automatically. The contest therefore concerned the automatic detection of dashed lines in test drawings at three difficulty levels: simple, medium, and complex. The drawings contained dashed and dash-dotted lines, both straight and curved, including interwoven text.

In 2007 (GREC’07), Prof. Tombre raised an important question: is graphics recognition an unidentified scientific object [81]? In this discussion, he recalled the following. Since Prof. Kasturi gave a new start to an IAPR technical committee, TC10 on line drawing interpretation, researchers have focused on graphics-rich documents and on more specific issues, such as raster-to-graphics conversion, text/graphics separation, and symbol recognition/localization. To emphasize this new focus, TC10 was renamed the technical committee on graphics recognition, and the GREC workshops, with their series of LNCS volumes, date from then.Footnote 1 Graphics recognition contests undoubtedly provide a clear benchmark for researchers and help them progress with reference to what has been done in the past.

There is little doubt about the growing interest in, and importance of, graphics recognition. Specialized areas such as telephone and power companies, which hold huge numbers of drawings with the same syntax, format, and/or appearance, are interesting applications. Automatic data conversion makes processing tools cost-effective, since these data are graphics-rich and querying by graphical symbol is possible. In other words, paper documents that contain graphics need to be converted into electronic formats, which is increasingly useful in a variety of applications.

In recent years, the significance of “end-to-end document analysis benchmarking” and “open resource sharing repositories” for advancing the field and facilitating fair comparison has also become evident [82, 83]. More information is available from the project called “Document Analysis and Exploitation” (DAE).Footnote 2

Returning to real-world problems, symbol recognition is not straightforward, as shown in Fig. 2.6. In general, the common problems are the recognition and localization (more often called spotting) of graphical symbols in electronic documents, such as architectural floor plans (see Figs. 2.2 and 2.3) and wiring diagrams and network drawings (see Figs. 2.4 and 2.5) [5, 12, 47, 66].

Fig. 2.1 A few test images from GREC’11: arc segmentation contest [65]

Fig. 2.2 A few test images from GREC’11: symbol segmentation contest [66]

Fig. 2.3 An example of graphical symbol spotting/localization in an architectural floor plan [5, 66]

Fig. 2.4 A few test images from GREC’11: symbol segmentation contest (electrical symbols) [66]

Fig. 2.5 A few test images (electrical circuit diagrams) from GREC’11: symbol segmentation contest [66]. An interesting problem for symbol spotting/localization

Fig. 2.6 GREC’03: illustrating linear and fully isolated graphical symbols [62]

Besides linear and fully isolated graphical symbol recognition (see Fig. 2.6), this book highlights a new, challenging problem (see Fig. 2.7), where the dataset is composed of a variety of symbols: linear (fully isolated), complex, and composite (containing text). The characteristics of the problem do not differ from those addressed in the series of graphics recognition contests and workshops; the difference lies primarily in the dataset. These samples (called the FRESH dataset) are taken from the book [84]. Two symbols from different classes can look very similar in shape, differing only slightly [12, 85,86,87]. Graphical cues and/or text may also be present; they are not always connected to the graphical symbols of interest and can also appear in isolation in the same image. In such a case, an isolated graphical symbol (or a known part of it) can be used as a query for two purposes: (i) to recognize similar symbols, and (ii) to detect known, meaningful parts/regions [17]. Detecting meaningful parts/regions with respect to the query symbol is referred to as symbol spotting. To avoid confusion, then, the task is not limited to symbol recognition: meaningful parts/regions that convey contextual information about the graphical document must also be spotted. It is also interesting to measure the similarity between two symbols taken from different contexts, which remains one of the open challenges in the literature. On the whole, the task has been referred to as the recognition of either parts/regions or complete symbols [5, 12, 47, 88,89,90]. A priori knowledge about graphical symbols can help in choosing techniques for data representation and recognition.
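To illustrate, on a toy scale, what applying a query to spot a region means, the following sketch slides a small binary query pattern over a binary document image and reports every position where the fraction of agreeing pixels reaches a threshold. This naive pixel-level template matching is an assumption chosen for brevity; it is not the method proposed in this book, and it ignores the scale, rotation, noise, and clutter that practical spotting systems must handle.

```python
def spot(image, query, threshold=0.9):
    """Return (row, col, score) for every window that matches the query.

    image, query: lists of equal-length strings of '0'/'1' (binary pixels).
    score: fraction of pixels in the window that agree with the query.
    """
    qh, qw = len(query), len(query[0])
    ih, iw = len(image), len(image[0])
    hits = []
    for r in range(ih - qh + 1):
        for c in range(iw - qw + 1):
            # Count pixel-wise agreements between the window and the query.
            agree = sum(
                image[r + i][c + j] == query[i][j]
                for i in range(qh)
                for j in range(qw)
            )
            score = agree / (qh * qw)
            if score >= threshold:
                hits.append((r, c, score))
    return hits


# Hypothetical query: a small closed square; the page contains one copy of it.
query = ["111", "101", "111"]
page = ["000000", "011100", "010100", "011100", "000000"]
print(spot(page, query))
```

Lowering `threshold` tolerates damaged instances (e.g. a missing pixel) at the cost of more false detections, which mirrors the recognition/spotting trade-off discussed above.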

Fig. 2.7 An example of a query and the spotting of graphical symbols or meaningful parts/regions. It also illustrates the complexity of the dataset [12, 84]. Graphical elements in the red boxes are the regions detected in response to the query

2.4 Graphical Symbol Recognition, Retrieval, and Spotting

Within the scope of pattern recognition, symbol recognition is a particular application in which test input patterns are classified into one of many predefined symbol classes (ground truths) of a particular application domain. Graphical symbols need not be complete symbols, as shown in Figs. 2.2 and 2.4; they can also be other visual cues or primitives, such as arcs, lines, and circles, that can be used to interpret complete document images. In a broad sense, following [88], symbols can be defined as graphical entities that hold a semantic meaning in a specific domain; logos, silhouettes, musical notes, and simple groups of line segments with an engineering, electronics, or architectural flair are examples of symbols recently investigated by the graphics recognition community (see Sect. 2.1). Extracting or retrieving similar documents based on visual cues (graphical primitives) can be considered graphical symbol retrieval, which in turn requires a clear understanding of symbol spotting.

In what follows, research standpoints on graphics recognition are briefly summarized. More detailed information can be found in [16, 17].

2.5 Research Standpoints: A Quick Overview

Before moving to Chap. 3, note that, generally speaking, graphical symbol recognition is based on either

  (a) alignment of features between the query and template symbols, i.e., computing the distance between two feature vectors; or
  (b) comparison of decomposed parts, i.e., meaningful visual cues/words such as lines, arcs, and circles, and the (spatial) relations between them.
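Option (a) can be sketched as nearest-template classification: the query's feature vector is compared with the feature vector of each template symbol, and the query receives the label of the closest template. The three-dimensional vectors and symbol names below are hypothetical placeholders; a real system would compute them with a shape descriptor, and the Euclidean distance is only one possible choice.

```python
import math


def nearest_template(query, templates):
    """Label and distance of the template whose feature vector is closest.

    templates: dict mapping a symbol label to its feature vector.
    """
    best_label, best_dist = None, math.inf
    for label, vector in templates.items():
        d = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical 3-dimensional shape descriptors for three symbol classes.
templates = {
    "resistor": [0.9, 0.1, 0.3],
    "capacitor": [0.2, 0.8, 0.5],
    "diode": [0.5, 0.5, 0.9],
}
label, distance = nearest_template([0.85, 0.15, 0.35], templates)
print(label, round(distance, 3))
```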

These are commonly described within the frameworks of statistical and structural approaches, respectively; a quick overview can be found in the previous work [17]. In statistical approaches, shape descriptors are widely used, and an overview of the shape descriptors most commonly used for graphical symbol recognition is provided in [91]. Structural approaches, on the other hand, analyze low-level primitives or visual cues, which makes it possible to recognize graphical symbols and/or localize known visual parts. As in other domains, the concept falls within the scope of region-of-interest (ROI) analysis and labeling, where ROIs refer to meaningful parts. A graphical symbol can thus be viewed as a set of visual cues or meaningful parts, such as arcs, lines, triangles, and rectangles [3, 12, 92]; the set also includes higher-level visual cues like loops. Their interpretation, however, depends on the dataset and the context, which can be either local or global. Visual cues in graphical symbol recognition can therefore be considered one of the key steps toward document image understanding and content interpretation. Taking both approaches into account, researchers have sought the best possible combination of the two [12, 15]. A clear statement to this effect was made at GREC’10 [49], part of which is quoted as follows:

“... the recurring wish for methods capable of efficiently combining structural and statistical methods” and “the very structural and spatial nature of the information we work with makes structural methods quite natural in the community.”
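Option (b), and the view of a symbol as a set of visual cues with relations between them, can be illustrated with a deliberately simple structural sketch. Each symbol is described by labelled primitives (e.g. line, circle) and by relation triples such as (line, inside, circle), and similarity is the Jaccard overlap of these descriptions treated as multisets. The symbol descriptions, relation names, and equal weighting are hypothetical; this toy is not a graph matching algorithm, and practical structural methods, which match attributed relational graphs, are considerably more involved.

```python
from collections import Counter


def multiset_jaccard(a, b):
    """Jaccard similarity of two multisets (Counters): |a & b| / |a | b|."""
    union = sum((a | b).values())  # element-wise max of counts
    if union == 0:
        return 1.0  # two empty descriptions are treated as identical
    return sum((a & b).values()) / union  # element-wise min of counts


def structural_similarity(sym1, sym2, w_parts=0.5):
    """Weighted mix of primitive overlap and relation overlap, in [0, 1].

    Each symbol is a dict with 'parts' (primitive labels) and 'relations'
    ((label, relation, label) triples; order inside a triple matters).
    """
    parts = multiset_jaccard(Counter(sym1["parts"]), Counter(sym2["parts"]))
    rels = multiset_jaccard(Counter(sym1["relations"]), Counter(sym2["relations"]))
    return w_parts * parts + (1 - w_parts) * rels


# Hypothetical descriptions: a circle containing two crossing lines,
# and three parallel lines.
lamp = {
    "parts": ["circle", "line", "line"],
    "relations": [("line", "crosses", "line"),
                  ("line", "inside", "circle"), ("line", "inside", "circle")],
}
ground = {
    "parts": ["line", "line", "line"],
    "relations": [("line", "parallel", "line"), ("line", "parallel", "line")],
}
print(structural_similarity(lamp, lamp))    # identical descriptions
print(structural_similarity(lamp, ground))  # some primitives shared, no relations
```

The two symbols share line primitives but no relations, so the relation term is what separates them; this is the intuition behind preferring structural descriptions when spatial organization matters.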

Symbol spotting is a possible extension, but it can be viewed as a kind of graphical symbol retrieval problem [5, 14, 88, 93, 94] that is essentially user guided. In addition, recognition/retrieval can be accomplished with local descriptors such as the scale-invariant feature transform (SIFT) and with techniques such as bag-of-features (BOF). In both cases, the segmentation process, i.e., primitive and/or region extraction, can be avoided. Questions such as “which technique performs how well, and in which context?” have not yet been well answered.
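The bag-of-features idea can be sketched as follows: each local descriptor extracted from an image is assigned to its nearest visual word in a codebook, the image is represented by the normalized histogram of word counts, and two images are compared through their histograms, with no primitive or region segmentation. In this minimal sketch the two-dimensional descriptors and the three-word codebook are hypothetical; in practice the descriptors would come from a detector such as SIFT and the codebook from clustering (e.g. k-means), neither of which is shown. Histogram intersection is one assumed choice of similarity among several.

```python
import math


def assign_word(descriptor, codebook):
    """Index of the visual word (codebook entry) nearest to a descriptor."""
    return min(range(len(codebook)),
               key=lambda k: math.dist(descriptor, codebook[k]))


def bof_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word occurrences."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[assign_word(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 2-D descriptors and a three-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
query = bof_histogram([(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (5.0, 5.2)], codebook)
page = bof_histogram([(0.0, 0.2), (1.2, 1.0), (5.1, 4.9), (4.8, 5.0)], codebook)
print(query, page, histogram_intersection(query, page))
```

Because the histogram discards where each descriptor occurred, this representation is robust to missing segmentation but loses the spatial relations that structural methods exploit.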

Fig. 2.8 Handwritten electrical circuit diagram

There is no doubt (see Sect. 2.3) that graphics recognition has a rich literature covering several different techniques [47, 50, 95, 96]. More often than not, symbol recognition methods are not generic enough to be used for different purposes and/or datasets; that is, they are data dependent. On the other hand, these methods do not require a large set of parameters and are sometimes parameter-free, which makes them easy to implement. Another reason for this data dependence could be the constraints imposed by industrial needs. Industrial projects require automated systems with high accuracy so that the cost of human intervention is reduced, which also ensures their effectiveness. As a result, graphical symbol recognition techniques may be tuned to process data under several different circumstances. Industrial projects are typically related to information retrieval and/or document reverse engineering, and they require powerful computers (high-performance computing (HPC) machines) as well as huge storage capacity. Within this framework, the scientific community has paid serious attention to recognizing symbols in document images [96,97,98,99]. Note that the processed images are not necessarily technical documents. Graphics recognition needs consistent research advances so that scalability can be addressed, and scalability in turn helps meet industrial needs and expectations. This also explains why well-known approaches were very specific and guided by a priori knowledge, which can concern the context, the source/complexity of the data, or both. Such knowledge will help in moving on to other, similar problems, such as the digitization of handwritten electrical circuit diagrams (see Fig. 2.8). Digitizing handwritten electrical circuit diagrams together with the corresponding floor plan can help automate the full set of residential needs (which depend on regional variation, i.e., geography).

2.6 Summary

In this chapter, we started with the conventional definition of graphical symbols, then covered the place of graphics recognition in DIA and its major processing units, several international contests related to graphics recognition and their importance, and a quick overview of research standpoints (from the author’s perspective). On the whole, we have discussed the importance of graphics recognition in the DIA framework. The next chapter discusses graphics recognition systems and validation/evaluation protocols.