Keywords

1 Introduction

An automatic full transcriptions of historical handwritten documents is often negatively affected by both the degenerative conservation state of scanned documents and different writing styles. Thus, Keyword Spotting (KWS) as a more error-tolerant, flexible, and suitable approach has been proposed [1,2,3,4]. KWS refers to the task of retrieving any instance of a given query word in a document. This task is of high relevance due to a global trend towards digitalisation of paper-based archives and libraries. Similar to handwriting recognition, textual KWS can be divided into two different approaches online and offline KWS, respectively. The former has access to temporal information, while the latter is limited to spatial information only. The focus of this paper is on historical documents, and thus, offline KWS, referred to as KWS from now on, can be applied only.

KWS approaches can be divided into template-based or learning-based algorithms. Template-based matching algorithms such as for example Dynamic Time Warping (DTW) [2, 5, 6] directly match sample images of the keyword with document images. Learning-based algorithms [3, 4, 7], on the other hand, derive character or word models from learning samples. The latter typically achieve higher accuracies than template-based approaches but are limited by the need for a considerable amount of learning samples. Template-based approaches, in contrast, require only one or a few keyword instances and are thus more flexible. In this paper, we focus on template-based KWS using different graph representations of handwritten words.

Even though graphs gained noticeable attention in diverse applications [8, 9], we observe only limited attempts where graphs are used to represent handwriting for KWS [10,11,12,13,14]. This is particularly interesting as graphs are, in contrast with feature vectors, flexible enough to adapt their size to the size and complexity of the underlying handwriting. Moreover, graphs are capable to represent binary relationships in the handwriting (e.g. strokes between two keypoints). Overall, graphs offer a more natural and comprehensive way to represent handwritten characters or words when compared to feature vectors. Additionally, various procedure for efficiently evaluating the dissimilarity of graphs, commonly known as graph matching, have been proposed in the last decade [9].

Yet, in the case of searching n keywords in a certain document (represented by a set of graphs G), we need to match \(n\times |G|\) pairs of graphs. Even when a fast graph matching procedure is employed, this large amount of matchings can substantially slow down the complete KWS process. To speed up the KWS procedure the number of graph matchings actually carried out, can be reduced by efficiently filtering graphs from G with a low similarity to the current query graph q. This approach is known as fast rejection [3, 5, 7] and the focus of the present paper. That is, we introduce a novel heuristic for fast and accurate filtering of irrelevant document graphs given a certain query graph.

The remainder of this paper is organised as follows. In Sect. 2, the proposed fast rejection approach to speed up graph-based KWS is introduced. The datasets employed as well as the different graph representations are reviewed in Sect. 3. An experimental evaluation and comparison with the original framework is given in Sect. 4. Finally, Sect. 5 concludes the paper and outlines possible future research activities.

2 Fast Rejection of Document Graphs

Given a set of document graphs \(G=\{g_1,\ldots ,g_N\}\) as well as query graph q (used to represent a certain keyword), the process of KWS performs a matching of q with all graphs from G. We employ the Bipartite Graph Edit Distance (BP) [15], and thus observe cubic time complexity for these pairwise dissimilarity computations. The present paper introduces a fast rejection approach in order to substantially reduce the number of document graphs needed to be matched with q. The motivation is to filter document graphs without relevance to the given keyword and thus speeding up the KWS procedure without negatively affecting the retrieval accuracy.

The basic idea of our approach is as follows. Before actually carrying out the graph matching, we first measure the dissimilarity between histograms based on a specific segmentation of the graphs by means of a polar coordinate system. We denote this fast graph dissimilarity computation by Polar Graph Dissimilarity (PGD) from now on. If the PGD is below a certain threshold D for a pair of graphs \((q,g_i)\), we carry out the computationally more expensive BP matching procedure [13]. Otherwise, we define the distance between q and \(g_i\) to be \(\infty \). Formally,

$$\begin{aligned} d(q,g_i) {\left\{ \begin{array}{ll} \infty , &{} \text {if } PGD(q,g_i) > D \\ BP(q,g_i), &{} \text {otherwise} \end{array}\right. }, \end{aligned}$$
(1)

where q and \(g_i\) denotes the query and document graph, respectively. Increasing the threshold D generally reduces the number of filtered document graphs. Likewise, the number of filtered graphs is increased when D is decreased. The overall aim is to find a good tradeoff between low matching time (due to many filterings) and high KWS accuracy.

Our novel dissimilarity model PGD has been inspired by the scale-invariant shape descriptor Contour Points Distribution Histogram (CPDH) for 2D-shape matching [16]. The basic idea behind this shape descriptor is to segment equidistant contour points by a specific polar coordinate system. A given shape image is formally described by a histogram \(\text {CPDH}=\{h_1,\ldots ,h_i,\ldots ,h_n\}\) where \(h_i\) basically consists of the number of contour points \(n_i\) in the corresponding segment.

We adopt this procedure in order to measure the dissimilarity between graphs in linear time. Rather than contour points, however, we make use of nodes as shown in Fig. 1. For all of our graphs that represent segmented words, nodes are labelled with two-dimensional numerical labels, while edges remain unlabelled (see Sect. 3 for details).

Fig. 1.
figure 1

Construction of the polar graph dissimilarity.

To create a histogram for a given graph g, we first calculate the centre of mass \((x_m,y_m)\) of g and then transform the (xy)-coordinates of each node label \(\mu (v)=(x,y) \in \mathbb {R}^2\) into polar coordinates (see Fig. 1a)Footnote 1. Formally,

$$\begin{aligned} \rho = \sqrt{(x-x_m)^2+(y-y_m)^2} \text { and } \theta _i = {{\mathrm{atan2}}}((y-y_m)/(x-x_m)), \end{aligned}$$

where \(\rho \) denotes the radius from the centre of g to the node position and \(-\pi \le \theta _i < \pi \) refers to the angle from the x-axis to the node position (computed via arctangent function with two arguments in order to return the correct quadrant). Next, we define a bounding circle C given by the maximum radius \(\rho _{\max }\) that surrounds all nodes of graph g. We segment C based on the number of different radii \(u_{\max }\) and angles \(v_{\max }\) into \(u_{\max } \times v_{\max }\) bins (in Fig. 1b \(u_{\max }=3\) and \(v_{\max }=8\) resulting in 24 bins). Every bin \(b_i\) is defined by two radii \(\rho _{i_{\min }}\) and \(\rho _{i_{\max }}\), and two angles \(\theta _{i_{\min }}\) and \(\theta _{i_{\max }}\), and thus every node \(v \in V\) with coordinates \((\rho ,\theta )\) can be assigned to the corresponding bin \(b_i\) with \(\rho _{i_{\min }} \le \rho < \rho _{i_{\max }}\) and \(\theta _{i_{\min }} \le \theta < \theta _{i_{\max }}\). Finally, we count the number of nodes of g in each bin and build a corresponding histogram \(H=\{h_1,\ldots ,h_n\}\) for graph g (see Fig. 1c). To measure the dissimilarity between two histograms \(H_1\) and \(H_2\), an arsenal of different distance measures have been proposed [17]. In the present paper, we make use of the \(\chi ^2\) distance.

We further refine the computation of our fast graph dissimilarity computation by implementing a recursive quadtree segmentation. The idea is formalised in Algorithm 1. First, the procedure is initialised by an external call with \(l=1\) (i.e. \(PGD(1,g_1,g_2)\)). On the basis of two graphs \(g_1\) and \(g_2\), the histograms \(H_1\) and \(H_2\) are created with respect to \(u_{\max }\) and \(v_{\max }\) (see line 2 of Algorithm 1)Footnote 2. Next, the \(\chi ^2\)-distance between the two histograms is measured (see line 2). If the current recursion level l is equal to the maximal recursion depth r, the distance is returned (see lines 4 and 5). Otherwise, both graphs \(g_1\) and \(g_2\) are segmented into four independent subgraphs. Each of these subgraphs represent the nodes and edges in one of the four quadrants in circle C (see line 6). Eventually, for each subgraph pair, the PGD is measured by means of a recursive function call (see line 7). This procedure is repeated until the current recursion level l is equal to the user-defined maximum depth r.

figure a

3 Handwriting Graphs

Our novel algorithm for fast rejection is evaluated in the context of KWS on two different manuscripts. First, the George Washington (GW) letters that are written in English and consist of twenty pages with a total of 4,894 words stemming from handwritten letters with only minor writing variations and signs of degradationFootnote 3. Second, the Parzival (PAR) manuscript that is written in Middle High German and consists of 45 pages with a total of 23,478 words stemming from handwritten letters with low writing variations but markable signs of degradationFootnote 4.

We extract graphs from segmented words of both documents by means of the following four graph extraction algorithms (originally presented in [14]).

  • Keypoint: The first graph extraction algorithm makes use of keypoints in the word images such as start, end, and junction points. These keypoints are represented as nodes that are labelled with the corresponding (xy)-coordinates. Between pairs of keypoints further intermediate points are converted to nodes and added to the graph in equidistant intervals. Finally, undirected edges are inserted into the graph for each pair of nodes that is directly connected by a stroke.

  • Grid: The second graph extraction algorithm is based on a grid-wise segmentation of the word images. For every segment, a node is inserted into the graph and labelled by the (xy)-coordinates of the centre of mass of this segment. Undirected edges are inserted between two neighbouring segments that are actually represented by a node. Eventually, the inserted edges are reduced by means of a Minimal Spanning Tree algorithm.

  • Projection: The next graph extraction algorithm works very similar to Grid. However, this methods is based on an adaptive segmentation of word images by means of projection profiles (using horizontal and vertical projection profiles).

  • Split: The last graph extraction algorithm is based on an iterative segmentation of word images. Segments are iteratively split into smaller subsegments until the width and height of all segments is below a certain threshold.

The dynamic range of the (xy)-coordinates of each node label \(\mu (v)\) is normalised with a z-score. Formally,

$$\begin{aligned} \hat{x} = \frac{x - \mu _x}{\sigma _x} \text { and } \hat{y} = \frac{y - \mu _y}{\sigma _y}, \end{aligned}$$
(2)

where \((\mu _x,\mu _y)\) and \((\sigma _x,\sigma _y)\) represent the mean and standard deviation of all (xy)-coordinates in the graph under consideration.

On the resulting sets of word graphs, ten different keywords are manually selected on both datasets to optimise several system parameters (see Sect. 4.2). For validation these keywords are matched against a validation set that consists of 1,000 different random words including at least 10 instances of all 10 keywords. The optimised systems are eventually evaluated on the same training and test sets as used in [4]. All templates of a keyword present in the training set are used for KWS. In Table 1 a summary of the datasets is given.

Table 1. The number of keywords as well as the size of the training and test sets for both documents.

4 Experimental Evaluation

4.1 Basic KWS Systems

For evaluating our proposed fast rejection heuristic, we consider the graph-based KWS system introduced in [13] and the four types of handwriting graphs described in Sect. 3. The original KWS system [13] is termed BP from now on, while our extended model with fast rejection is termed BP-FR.

To evaluate the KWS performance, two different metrics are used for global and local thresholds. In the case of global thresholds, the Average Precision (AP) is measured, which is the area under the Recall-Precision curve for all keywords given a single (global) threshold. In the case of local thresholds, we compute the Mean Average Precision (MAP), that is the mean of all APs for each individual keyword query. To measure the effects of our fast rejection filter, we compute the relative amount of pairwise matchings that is filtered by BP-FR (termed Filter Rate (FR) from now on).

4.2 Optimisation of the Parameters

For the basic KWS system BP and the four graph representations, we adopt parameters from previous work [13, 14]. For our extension BP-FR the following parameters are additionally optimised on the validation set.

First, the parameters of PGD are optimised with respect to MAP. That is, we employ PGD (rather than BP) as basic matching procedure in our KWS framework. On the validation set different polar segmentations (defined via \(u_{\max }\) and \(v_{\max }\)) are tested for two recursion levels (i.e. we define the maximal recursion depth to \(r=2\)). For \(l=1\), the parameter combinations \(u_{\max }=\{1,2,3,4,5,6\} \times v_{\max }=\{4,8,12,16,20,24,28,32,36,40\}\) are evaluated, while for \(l=2\) the parameter combinations \(u_{\max }=\{1,2,3,4\} \times v_{\max }=\{2,4,6,8,10\}\) are tested. Hence, we evaluate \(6\times 10\times 4\times 5={1,200}\) parameter combinations for every graph extraction method. In Table 2 the best performing parameters are presented for every graph extraction method and both datasets.

Table 2. Optimal \(u_{\max }\) and \(v_{\max }\) for PGD on both recursion levels l.
Fig. 2.
figure 2

Mean average precision (MAP) and filter rate (FR) as function of the threshold D.

For fast rejection in our extension BP-FR we evaluate different thresholds \(D=\{5,10,\ldots ,195,200\}\). In Fig. 2, the MAP and FR are shown for every tested threshold D. By increasing D we observe that the KWS performance is improved in general. Simultaneously, the number of filtered graphs is decreasing (making the KWS process slower in general). Threshold D is finally determined such that the MAP is maximal (or not further improved, when D is increased). In Table 3 the selected threshold D is given for each graph extraction method and both datasets.

Table 3. Optimal D for BP-FR and corresponding filter rate (FR).

4.3 Results and Discussion

We compare the optimised system BP-FR on the independent test sets with the original KWS framework BP [13] (without fast rejection). In Table 4 the mean average precision (MAP) for local thresholds, the average precision (AP) for global thresholds, as well as the filter rate (FR) is given for both BP and BP-FR. On the GW dataset we observe a filter rate between 50% and 70% (i.e. only 50% to 30% of all comparisons have to be carried out by the bipartite matching algorithm). Due to this filtering, we decrease the computation time of the complete KWS experiment by about 80 to 150 h on the different graph representations. Similar (or even better) filter rates can be observed on the second datasetFootnote 5.

Table 4. Mean average precision (MAP) using local thresholds, average precision (AP) using a global threshold, and filter rate (FR) for KWS using the original bipartite graph matching without rejection (BP) and with the proposed fast rejection (BP-FR). With ± we indicate the relative percental gain or loss in the accuracy of BP-FR when compared with BP.

Regarding the effects of our fast filtering on the KWS performance, we observe that the MAP is not negatively affected on both datasets. On the contrary, the filtering of irrelevant documents via PGD actually improves the MAP by about 5% and 10% on the GW and PAR dataset, respectively.

Regarding the AP (employed for global rather than local thresholds), we observe both deteriorations and improvements of BP-FR when compared with the original framework. Yet, most of the deviations are negligible. In particular on the GW dataset only small differences are observed on the resulting APs. On PAR we observe two substantial deteriorations of the AP. Yet, in these two cases we observe very high filter rates of about 60% and 70%.

Regarding the results in Table 4 the question arises whether the novel graph dissimilarity PGD would be able to achieve a competitive KWS accuracy. In order to answer this question, we employ the optimised PGD (rather than the bipartite matching) in the original KWS framework. In Table 5, the MAP and AP of this particular KWS system is shown on the Keypoint graphs (for the other graphs similar results are obtained). We observe that this system achieves worse results than BP on both datasets (regarding both MAP and AP). Hence, we conclude that PGD itself is not powerful enough to serve as basic dissimilarity model for graph-based KWS. Yet, as seen in the previous evaluation in Table 4, the PGD as fast rejection criterion in conjunction with BP is clearly beneficial.

Table 5. Mean average precision (MAP) using local thresholds, average precision (AP) using a global threshold for KWS using the original bipartite graph matching (BP), and the polar graph dissimilarity (PGD) on the Keypoint graphs.

5 Conclusion and Outlook

In the present paper a fast rejection approach for graph-based KWS is introduced. The rejection is based on a novel graph dissimilarity model, which compares the histograms of the node distributions in a polar coordinate system.

We compare our extended model with the original KWS framework without rejection ability on two benchmark datasets. We observe that our novel rejection approach reduces the amount of graph matchings by 50% or more on both datasets (in fact, filter rates of up to 80% are observed). Our rejection criterion is computed in linear time, while the actual graph matching needs cubic time. Hence, a dramatic speed up of the complete KWS process is achieved. Moreover, we can conclude that our novel extension for speeding up the existing KWS framework does not negatively influence the spotting accuracy.

In future work we aim at extending our novel graph dissimilarity model. For instance, we could consider not only nodes but also edges in the histograms.