
1 Introduction

The aim of the project was as follows:

  (a) A surface analysis of the RWE gas network in the Czech Republic (CR), i.e. classification of the types of terrain surface beneath which gas facilities are laid. The analysis was carried out to determine the reproduction (replacement) values of the gas facilities (pipelines), i.e. to estimate the costs that building new networks would require.

  (b) The surface analysis was carried out in three variants of the gas line route: high-pressure gas lines, main lines and service pipes. For each variant, the surface had to be classified into three separate groupings:

    • 2 groups of surface type: paved and unpaved

    • 5 groups of surface type: asphalt, main road, local road, unbound, unknown

    • 10 groups of surface type: forest, grassland, bare soil, asphalt, roof-tile, roof-straight, shadow, main road, local road, path.

  (c) The output was delivered in graphic form (*.shp format) and in text form as a structured table in XLSX format with classified data on the surface types above the gas line route.

  (d) The analysis covered 12 regions of the administrative division of the CR: Karlovarský, Plzeňský, Středočeský, Ústecký, Liberecký, Královéhradecký, Pardubický, Vysočina, Moravskoslezský, Olomoucký, Zlínský and Jihomoravský. In total, 188 municipalities with extended state administration (ORP) were processed. The classification outputs were required in both graphic and text form, down to the territorial detail of a part of a municipality within the territorial identification register of the CR.

The solution was based on a pilot project and on a technology developed for this purpose, called “Classification of data about storage of utilities and facilities below certain types of terrain surfaces” (Bureš et al. 2013).

2 Related Works

Works dealing with the results of image classification can be divided by application domain and by the methods used. In terms of application, remote sensing is represented most frequently (Pupin et al. 2014; da Silva et al. 2014; Debes et al. 2014; Iiames et al. 2013; Puertas et al. 2013; Mouelhi et al. 2013; Landgrebe 2003; Liang 2004). Image classification is also used in other fields: medicine and biology (Gray and Song 2013), the food industry (food quality testing) (Johnson 2013), agriculture (crop forecasting for a given field), ecology (Makarau et al. 2013) and geodesy (classification of orthophotos) (Zhong et al. 2012).

A number of methods are used to improve the results of image classification. The most common approaches in the literature are:

  • works dealing with the basic usage of image classification (Landgrebe 2003; Liang 2004),

  • classical methods of image enhancement (filtering, transformation of brightness, discrete convolution) (Pupin et al. 2014),

  • the use of the support vector machine (SVM) method (Iiames et al. 2013),

  • a combination of different methods of image capturing of the Earth’s surface (da Silva et al. 2014; Makarau et al. 2013),

  • processing of time sequence of images (Debes et al. 2014),

  • the use of graph theory (factors of graph) (Mouelhi et al. 2013),

  • application of correlation analysis with a matrix of errors (Puertas et al. 2013),

  • the use of metric spaces (Searcóid 2006) for evaluation of the results of image classification,

  • the use of image classification in geodesy and surveying (Černota et al. 2011; Dandoš et al. 2013).

3 Theoretical Foundations

Let U be a non-empty set, the universe of discourse, and let X be a subset of it (X ⊆ U). In our case, U represents the entire area under study and X is a part of that area. The elements of X are the geographic objects of the given locality. Further, let there be a system of equivalence relations \(\{R_{i}\}\) (i = 1, 2, …, m), where each relation \(R_{i}\) divides X into n subsets \(X/R_{i} = \left\{ {X_{i1} ,X_{i2} , \ldots ,X_{in} } \right\}\) such that the following holds for every i and j:

$$\begin{aligned} & X_{ij} \subseteq U,\; X_{ij} \ne \emptyset \quad {\text{for}}\; i = 1,2, \ldots ,m,\; j = 1,2, \ldots ,n \\ & ({\text{all subsets are non-empty}}) \\ \end{aligned}$$
(1)
$$\begin{aligned} & X_{ij_{1}} \cap X_{ij_{2}} = \emptyset \quad {\text{for}}\; i = 1,2, \ldots ,m,\; j_{1} ,j_{2} = 1,2, \ldots ,n,\; j_{1} \ne j_{2} \\ & ({\text{the subsets within each decomposition defined by}}\; R_{i} \;{\text{are pairwise disjoint}}) \\ \end{aligned}$$
(2)
$$\begin{aligned} & \bigcup\nolimits_{j = 1}^{n} {X_{ij} } = X \quad {\text{for}}\; i = 1,2, \ldots ,m \\ & ({\text{the union of all subsets within each decomposition defined by}}\; R_{i} \;{\text{is the whole set}}\; X) \\ \end{aligned}$$
(3)

The equivalence relation \(R_{1}\) can be regarded as the territorial division of the CR according to its administrative arrangement, which divides the area into lower administrative units (regions, districts, etc.). If conditions (1)–(3) are satisfied, each sub-area has, in terms of set theory, the character of a class \(X_{ij}\).

Suppose there is another relation \(R_{2}\) that defines a decomposition according to the subject (thematic) principle, i.e. according to the geographical objects that are elements of X. In our case, the geographical objects are gas facilities laid in the ground.

Then the intersection of the relations, \(S = R_{1} \cap R_{2}\), S ⊆ X, defines the modeled area in which the spatial analysis is conducted. Because this area is generally smaller than the original area under study, combining the decompositions optimizes the whole analysis and saves processing time.
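As an illustration only, conditions (1)–(3) and the combination of two decompositions can be sketched in a few lines of Python. The object identifiers and the helper names `is_partition` and `common_refinement` are hypothetical and do not come from the project; the sketch relies on the standard fact that the classes of the intersection of two equivalence relations are the non-empty pairwise intersections of their classes.

```python
from itertools import combinations


def is_partition(blocks, whole):
    """Check conditions (1)-(3) for a proposed decomposition of `whole`."""
    blocks = [set(b) for b in blocks]
    non_empty = all(blocks)                                   # condition (1)
    disjoint = all(a.isdisjoint(b)
                   for a, b in combinations(blocks, 2))       # condition (2)
    covers = set().union(*blocks) == set(whole)               # condition (3)
    return non_empty and disjoint and covers


def common_refinement(blocks_1, blocks_2):
    """Classes of R1 ∩ R2: non-empty intersections of classes of R1 and R2."""
    refined = []
    for a in blocks_1:
        for b in blocks_2:
            common = set(a) & set(b)
            if common:
                refined.append(common)
    return refined


# Hypothetical objects: two pipelines (p1, p2), a road (r1) and a house (h1).
X = {"p1", "p2", "r1", "h1"}
by_region = [{"p1", "r1"}, {"p2", "h1"}]   # administrative decomposition (R1)
by_theme = [{"p1", "p2"}, {"r1", "h1"}]    # thematic decomposition (R2)

refined = common_refinement(by_region, by_theme)
print(is_partition(refined, X))
```

In this toy case the class `{"p1"}` corresponds to "gas facilities in the first region", which is the kind of reduced area on which the spatial analysis would then run.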

4 Method of Solution

The solution was carried out in the following stages:

  1. Data preparation: generation of data sets for each ORP for the entire territory of the CR (creation of linking database tables and code lists, data sorting, repair of topological errors in the input data, building of data sets, debugging of the generating scripts)

  2. Determination of parameters: measurement of road widths on the orthophoto for setting road buffers, creation of classification keys, data reclassification according to the classification keys

  3. Data classification: classification of pipeline sections lying beneath various surface types, based on the orthophoto image, the classification keys and the parameterized roads

  4. Editing of errors: checking the completeness of the image classification, semi-automated editing of image classification errors, batch editing of changes in the database tables

  5. Final inspection: batch sorting of outputs and files, checking the completeness of the outputs and the checksums

  6. Completion: completion of the data sets for transfer, creation of the XLSX files, creation of the SHP files for the regions, and a completeness check.
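The reclassification step in stage 2 can be pictured as a lookup from the detailed classes to the coarser groupings. The paper does not publish its classification keys, so the two dictionaries below are purely hypothetical assumptions made for illustration (in particular, where "shadow", the roof classes and "unknown" end up is a guess); only the class names themselves are taken from the text.

```python
from collections import defaultdict

# HYPOTHETICAL classification keys: the real keys are not given in the paper.
KEY_10_TO_5 = {
    "forest": "unbound",
    "grassland": "unbound",
    "bare soil": "unbound",
    "path": "unbound",
    "asphalt": "asphalt",
    "main road": "main road",
    "local road": "local road",
    "roof-tile": "unknown",
    "roof-straight": "unknown",
    "shadow": "unknown",
}

# HYPOTHETICAL: treating "unknown" as unpaved is an assumption, not a finding.
KEY_5_TO_2 = {
    "asphalt": "paved",
    "main road": "paved",
    "local road": "paved",
    "unbound": "unpaved",
    "unknown": "unpaved",
}


def reclassify(lengths_by_class, key):
    """Sum section lengths (in metres) after mapping each class through `key`."""
    result = defaultdict(float)
    for surface_class, length in lengths_by_class.items():
        result[key[surface_class]] += length
    return dict(result)


# Illustrative lengths in metres for one pipeline, by 10-group class.
lengths_10 = {"forest": 120.0, "asphalt": 300.0, "shadow": 15.0, "path": 65.0}
lengths_5 = reclassify(lengths_10, KEY_10_TO_5)
lengths_2 = reclassify(lengths_5, KEY_5_TO_2)
print(lengths_5)
print(lengths_2)
```

Whatever the actual keys are, a reclassification of this kind preserves the total length, which gives a simple consistency check between the 10-, 5- and 2-group outputs.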

The scheme of the data analysis technology is illustrated in Fig. 1.

Fig. 1 The scheme of the data analysis technology

The data analysis was carried out in the ESRI ArcGIS 10 environment. In total, 3 powerful desktop PCs and 5 notebooks, connected in a computer network (Fig. 2), were used for processing. Access to the data was restricted to the researchers, so that the copyright of the data was protected. The automated part of the processing ran on the desktop PCs with the following parameters: CPU Intel Core i5 3.8 GHz, 16 GB RAM, NVIDIA GTX650 2 GB, SSD and VelociRaptor hard drives, Microsoft Windows 7 (64-bit). The processing software was ArcGIS 10.0 and Python 2.6 (both 32-bit).

Fig. 2 The scheme of connection and parameters in the computer network

5 Experimental Results

The project was implemented on territory of the CR with a total area of 64,350 km². The input data for the classification were orthophoto sheets with a resolution of 25 cm per pixel. The results of the image classification were improved by filtering with the road layers of the ZABAGED database. Processing followed the methodology described in the previous section. The results of the data analysis were produced in text form (a pivot table in Excel, see Fig. 3) and in graphical form (SHP format, see Fig. 4), divided from two standpoints:

  1. Object standpoint, i.e. gas lines by type: high-pressure lines, main lines and service pipes

  2. Territorial standpoint, i.e. according to the administrative division of the territory of the CR (regions, districts, ORP, municipalities, parts of municipalities).

Fig. 3 An example of the output pivot table in XLSX format

Fig. 4 Graphical output of the structured classified data for the gas line route in SHP format

The structured table in Fig. 3 is a row-structured list of about 13,000 rows (one row contains the data for one part of a municipality). The list is arranged as a pivot table with the usual filtering and sorting options by part of municipality, municipality, ORP, district and region (yellow parts in the figure). The surface analysis was carried out in three variants, for high-pressure gas lines, main lines and service pipes, and for 2, 5 or 10 groups of surface type. The result is the cumulative length of high-pressure gas lines, main lines and service pipes under the individual surface classes for the selection defined by the filter. In terms of data volume, this is a very detailed breakdown of an extensive territory-bound geodatabase. The results comprise a total of 3 × (2 + 5 + 10) × 13,000 = 663,000 items.
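The item count and the filtered cumulative lengths can be reproduced with a short sketch. The record layout, the place names and the function names `items_total` and `cumulative_lengths` are assumptions made for illustration; the actual output was an Excel pivot table, not Python code.

```python
from collections import defaultdict

LINE_TYPES = ("high-pressure", "main line", "service pipe")
GROUP_SIZES = (2, 5, 10)  # number of surface classes in each grouping


def items_total(n_rows):
    """Number of result items: line types x all surface classes x table rows."""
    return len(LINE_TYPES) * sum(GROUP_SIZES) * n_rows


def cumulative_lengths(records, selected_parts):
    """Sum lengths per (line type, surface class) for the selected parts.

    `records` is an iterable of hypothetical tuples
    (part_of_municipality, line_type, surface_class, length_m).
    """
    totals = defaultdict(float)
    for part, line_type, surface_class, length_m in records:
        if part in selected_parts:
            totals[(line_type, surface_class)] += length_m
    return dict(totals)


# Made-up records for two parts of a municipality (2-group classification).
records = [
    ("Part A", "main line", "paved", 410.0),
    ("Part A", "main line", "unpaved", 90.0),
    ("Part A", "service pipe", "paved", 35.5),
    ("Part B", "main line", "paved", 250.0),
]

print(items_total(13_000))
print(cumulative_lengths(records, {"Part A", "Part B"}))
```

Selecting a different set of parts plays the role of the pivot-table filter: the same records yield cumulative lengths for a municipality, an ORP, a district or a region, depending on which parts are included.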

The graphical output of the structured classified data in SHP format for the 3 types of gas lines is shown in Fig. 4. It is a segmented gas line route with an attribute table for 10 classes, which is georeferenced and can be displayed over any graphic base (orthophotos, cadastral map, etc.) in the GIS of the RWE company.

GIS tools can then sum the lengths of each gas line type under the individual surface classes, in given categories (defined by an alphanumeric attribute), over any area defined graphically, e.g. by a fence or a closed polygon.

6 Analysis of Error Rate of Processing

The error rate F was evaluated as the ratio of the total length of erroneously classified sections of the gas line to the total length of the gas line, expressed as a percentage. It was determined on a sample of 20 % of the territory (40 ORP) by locating the erroneous sections of S in the classified image and measuring their lengths. The orthophoto image was taken as the reference (ground truth). The error rate was assessed by a human operator according to the following equation:

$$F = \frac{A}{C} \cdot 100\;\left[ \% \right]$$
(4)

where

A is the length of incorrectly classified sections of the gas line,

C is the total length of the gas line.
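Equation (4) is simple enough to state directly in code. The following is a minimal sketch; the section lengths in the example are invented for illustration and are not measurements from the project.

```python
def error_rate(wrong_section_lengths, total_length):
    """Error rate F in percent: F = A / C * 100 (Eq. 4).

    A is the summed length of incorrectly classified sections,
    C is the total length of the gas line (same length unit).
    """
    if total_length <= 0:
        raise ValueError("total length C must be positive")
    a = sum(wrong_section_lengths)
    if a > total_length:
        raise ValueError("erroneous length A cannot exceed total length C")
    return a / total_length * 100.0


# Invented example: three erroneous sections on a 5,000 m line.
f = error_rate([120.0, 80.0, 70.0], 5000.0)
print(round(f, 2))  # 5.4
```

Because A and C are both lengths along the same line, F is independent of the unit used, as long as both are measured consistently.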

An overview of the error rate for high-pressure gas lines, main lines and service pipes is given in Table 1. The maximum error rate in the sample does not exceed 5.4 %. The main line shows the largest average error rate.

Table 1 Error rate of processing

7 Conclusions

The following conclusions can be drawn from our solution:

  (a) The technology makes it possible to analyze, with high informative value (error rate below 5.5 %) and relatively quickly (within 16 weeks), the lengths of gas line sections that lie beneath surfaces of various types:

    • 2 groups of surface type: paved and unpaved

    • 5 groups of surface type: asphalt, main road, local road, unbound, unknown

    • 10 groups of surface type: forest, grassland, bare soil, asphalt, roof-tile, roof-straight, shadow, main road, local road, path.

  (b) The results of the data analysis for 2 groups (paved and unpaved surface) have the highest informative value.

  (c) The technology of data analysis is repeatable practically without the influence of the human factor.

  (d) The data analysis has very high informative value in terms of its low error rate, and it can be used repeatedly. The error rate is mainly due to the limited quality of the underlying source data.

  (e) The technology of data analysis has the potential for further qualitative refinement through the future use of new, higher-quality source datasets.

  (f) The entire technological process requires about 54 % skilled manual work, especially for data preparation and for qualified decisions between processes (see Fig. 5).

    Fig. 5 Share of time taken by the automated part of the data analysis and by manual work

The results showed that a single additional dataset for filtering the intermediate results is sufficient to achieve the classification quality required for the given purpose; this layer refines the results of the automated process. The subsequent visual inspection and editing then slightly improves the classification result, so that the absolute error rate relative to the reference, represented by the orthophoto image, does not exceed 2–3 %.

The process described in this paper is general in character and can be used to classify the surface above other utilities, such as water supply, sewerage, electric power distribution, media distribution, etc.