1 Introduction

The advantages of using various forms of cooperation in organizing information activities were pointed out as early as the 1990s [1,2,3,4]. Interest in this topic was also stimulated by the transition to a market economy in the countries of the former Soviet Union [5]. Currently, the topic remains relevant because innovative activities have to be developed with limited financial resources.

Cluster policy is one of the effective tools for developing territories and regions [6,7,8,9]. The cluster approach is most widely used in the European Union, where it has been elevated to the rank of public policy. The Russian Federation has actively supported innovative territorial clusters since 2011.

In the Republic of Kazakhstan, the cluster approach is laid down in the State Program of Industrial Innovative Development for 2015-2019. The program provides active financial support for clusters with the highest development potential. They are selected on a competitive basis.

This support is provided in several directions, including the formation of supplier bases and the creation of information platforms. Information systems for innovation management are an important component of the infrastructure of cluster organizations. Creating such systems involves working with a wide range of commercial information resources. These resources are often used to form thematic databases designed to meet the information needs of cluster organizations. It is therefore important to find a topology of the information processing system that minimizes the overall operational costs of information support for innovative cluster activities. There are many approaches to this class of problems, with data processing models ranging from the simplest set-theoretic models to complex simulation models. This article shows that the optimal topology of the information support system for a cluster organization can be obtained by solving an integer linear programming problem.

2 Mathematical Statement of the Problem

The mathematical model proposed in this article describes the process of creating an information product in cooperation. It also makes it possible to solve the problem of optimally distributing technological operations between the members of an innovation cluster so as to minimize the overall cost of creating the product. The model is constructed as follows. Suppose that market analysis and the choice of the database subject show that the information flow for database formation is the union of R subjects (rubrics):

$$\begin{aligned} \varPhi =\bigcup _{r=1}^{R}M_{r} \end{aligned}$$
(1)

and the volume of each rubric can be estimated by the number of documents belonging to it:

$$ \left| M_{r}\right| =V_{r}. $$

The level of interest of each of the partners \(U_{1}\), \(U_{2}\),..., \(U_{N}\) in the processing of each of the R rubrics can be estimated by the vector

$$\begin{aligned} \overline{p}_{n}=\left( p_{n1},p_{n2},\ldots ,p_{nR}\right) ,\,\,\,\left( n=\overline{1,N}\right) . \end{aligned}$$
(2)

In order not only to reflect the participant’s interest in processing documents for each of the R rubrics, but also to compare the level of their interest towards any of the rubrics, the following condition must be met:

$$\begin{aligned} \sum _{r=1}^{R}p_{nr}=1,\quad \left( 0\le p_{nr}\le 1\right) . \end{aligned}$$
(3)

The values \(p_{nr}\) can be determined in several ways. In particular, the level of interest of each of the partners in the documents of the \(r^{th}\) rubric can be estimated based on the predicted number of queries to each of the \(M_{r}\) arrays from each participant. Then we denote by \(b_{nr}\) the number of queries from the partner \(U_{n}\) to the array \(M_{r}\) and get:

$$\begin{aligned} p_{nr}=b_{nr}/\sum _{r=1}^{R}b_{nr}. \end{aligned}$$
(4)
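The normalization in (4) can be illustrated with a minimal sketch. The query counts below are hypothetical values chosen for the example, and the function name `interest_levels` is not from the source.

```python
def interest_levels(query_counts):
    """Normalize one partner's query counts b_n1..b_nR into p_n1..p_nR, as in Eq. (4)."""
    total = sum(query_counts)
    if total == 0:
        # Eq. (4) is undefined when the partner issues no queries at all.
        raise ValueError("partner has no predicted queries; p_nr is undefined")
    return [b / total for b in query_counts]


# Hypothetical predicted query counts: rows are partners U_n, columns are rubrics M_r.
b = [
    [30, 10, 10],
    [5, 5, 40],
]
p = [interest_levels(row) for row in b]
```

Each row of `p` sums to one, so condition (3) holds by construction and the rows of different partners can be compared directly.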

More accurate estimates can be obtained using the notion of completeness of the answer in the database system. Here the criterion is the number of documents returned in response to each of the queries when all \(M_{r}\) arrays are searched. Then, denoting by \(d_{nr}\) the total number of documents obtained by searching the array \(M_{r}\) for the entire set of queries \(B_{n}\) of the participant \(U_{n}\), we have:

$$\begin{aligned} B_{n}=\sum _{r=1}^{R}b_{nr},\quad p_{nr}=d_{nr}/\sum _{r=1}^{R}d_{nr}. \end{aligned}$$
(5)

Introducing a system of weighting coefficients \(\left( \beta _{n1},\,\beta _{n2},\ldots ,\,\beta _{nR}\right) \) to account for subjective factors that determine the interest of the participant \(U_{n}\) in the array \(M_{r}\), we get:

$$\begin{aligned} p_{nr}=\beta _{nr}d_{nr}/\sum _{r=1}^{R}d_{nr}. \end{aligned}$$
(6)
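A minimal sketch of (6) follows, with hypothetical document counts and weights. The final renormalization step is an assumption added here so that condition (3) still holds when the weights differ from one; the text itself does not state how the weights are scaled.

```python
def weighted_interest(doc_counts, weights, renormalize=True):
    """Apply Eq. (6): p_nr = beta_nr * d_nr / sum_r d_nr for one partner U_n."""
    total = sum(doc_counts)
    if total == 0:
        raise ValueError("no documents retrieved; p_nr is undefined")
    raw = [beta * d / total for beta, d in zip(weights, doc_counts)]
    if not renormalize:
        return raw
    # Assumption (not in the source): rescale so that the values sum to 1, as (3) requires.
    scale = sum(raw)
    return [x / scale for x in raw]


# Hypothetical data for one partner and three rubrics.
d_n = [120, 60, 20]
beta_n = [1.0, 0.5, 1.0]
raw_p = weighted_interest(d_n, beta_n, renormalize=False)
p_n = weighted_interest(d_n, beta_n)
```

With these numbers the raw values from (6) sum to less than one, which is why the sketch offers the optional rescaling.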

To simplify the time-consuming procedure of formulating a large number of queries, it is possible to use an approach based on the connection between the frequencies of terms in a database and the number of documents returned in response to a query. Then, using the results of a previously conducted survey of future customers (subscribers), and assuming that each query includes only one term, we get a list of terms (normalized lexical units) for each of the \(U_{n}\) participants:

$$ L_{n}=\left( l_{n}^{1},\,l_{n}^{2},\ldots ,\,l_{n}^{K}\right) . $$

Comparing each term with frequency dictionaries of each \(M_{r}\) array, we get:

$$ \tilde{d}_{nr}=F_{nr}, $$

where

$$ F_{nr}=\sum _{k=1}^{K}f_{nr}^{k}, $$

and \(f_{nr}^{k}\) is the frequency of the \(k^{th}\) term from the list of partner \(U_{n}\) in the array \(M_{r}\).
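The term-frequency estimate can be sketched as follows, assuming that the estimate for a rubric \(M_{r}\) is the sum of the frequencies \(f_{nr}^{k}\) over the partner's K terms and that the result is then normalized over the rubrics as in (5). The term list and the frequency dictionaries are hypothetical.

```python
def estimated_documents(terms, frequency_dictionaries):
    """For each rubric M_r, sum the frequencies of the partner's terms in that rubric."""
    return [
        sum(freqs.get(term, 0) for term in terms)  # a term absent from M_r contributes 0
        for freqs in frequency_dictionaries
    ]


# Hypothetical term list L_n of one partner and frequency dictionaries of two rubrics.
terms_n = ["catalyst", "polymer", "enzyme"]
freq = [
    {"catalyst": 40, "polymer": 25},
    {"enzyme": 30, "catalyst": 5},
]
d_tilde = estimated_documents(terms_n, freq)
total = sum(d_tilde)
p_n = [d / total for d in d_tilde]  # normalization over rubrics, as in Eq. (5)
```

The sketch replaces the formulation of many queries with a single pass over the frequency dictionaries, which is the simplification the paragraph describes.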

The technological process of information processing by the partners can be represented as an ordered sequence of simple or aggregated operations, the same for any \(M_{r}\) array:

$$\begin{aligned} O=\left( O^{1},\,O^{2},\ldots ,O^{Q}\right) . \end{aligned}$$
(7)

Let us denote by \(t_{n}^{qi}\) the unit cost of the \(i^{th}\) resource incurred by the partner \(U_{n}\) when performing the operation \(O^{q}\). This value takes into account the characteristics of software and hardware resources, as well as other factors affecting the real cost of data processing operations in the \(U_{n}\) center. We also take into account that the overall cost of the \(i^{th}\) resource for each of the partners cannot exceed a certain limit value \(\mu _{n}^{i}\). Taking into account the volumes of the \(M_{r}\) arrays, let us introduce the indicator:

$$\begin{aligned} \tau _{nr}^{qi}=V_{r}t_{n}^{qi}, \end{aligned}$$
(8)

which characterizes the cost of the \(i^{th}\) resource required by the partner \(U_{n}\) to perform the operation \(O^{q}\) on the array \(M_{r}\). Then the overall cost can be described as:

$$\begin{aligned} H^{1}=\sum _{i=1}^{I}\sum _{n=1}^{N}\sum _{r=1}^{R}\sum _{q=1}^{Q}\omega _{nr}^{q}\tau _{nr}^{qi}, \end{aligned}$$
(9)

where

$$ \omega _{nr}^{q}=\left\{ \begin{array}{l} 1-\mathrm {if\,\,the\,\,}U_{n}\,\,\mathrm {participant\,\,performs\,\,the\,\,}O^{q}\,\mathrm {operation\,\,on\,\,the}\,\,M_{r}\,\,\mathrm {array};\\ 0-\mathrm {otherwise}; \end{array}\right. $$

and the equality holds:

$$\begin{aligned} \sum _{n=1}^{N}\omega _{nr}^{q}=1 \quad \left( q=\overline{1,Q};\,\,r=\overline{1,R}\right) . \end{aligned}$$
(10)

The latter means that each of the \(O^{q}\) operations on any \(M_{r}\) array is necessarily performed, and that it is performed by exactly one of the \(U_{n}\) partners, i.e. the principle of one-time processing of information is respected. Thus, the problem is to minimize the functional (9) subject to the equality (10) and the following constraints:

$$\begin{aligned} \omega _{nr}^{q}\in \left\{ 0,1\right\} ; \end{aligned}$$
(11)
$$\begin{aligned} \sum _{r=1}^{R}\sum _{q=1}^{Q}\omega _{nr}^{q}\tau _{nr}^{qi}\le \mu _{n}^{i} \quad \left( n=\overline{1,N};\,\,i=\overline{1,I}\right) . \end{aligned}$$
(12)
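For very small instances, the problem (9) to (12) can be checked by exhaustive enumeration. The sketch below is illustrative only: it is not the solution method of the article, the function name and the data are hypothetical, and a practical instance would need an integer programming solver because the number of assignments grows as \(N^{RQ}\).

```python
from itertools import product


def solve_by_enumeration(V, t, mu):
    """
    Minimize the functional (9) subject to (10), (11), (12) by brute force.

    V[r]        volume of rubric M_r
    t[n][q][i]  unit cost of resource i for partner U_n on operation O^q
    mu[n][i]    limit on resource i for partner U_n
    Returns (best_cost, assignment), where assignment[(r, q)] = n,
    or (None, None) if no assignment satisfies the resource limits.
    """
    N, R = len(t), len(V)
    Q, I = len(t[0]), len(t[0][0])
    tasks = [(r, q) for r in range(R) for q in range(Q)]
    best_cost, best = None, None
    # Choosing exactly one partner per (r, q) pair enforces (10) and (11).
    for choice in product(range(N), repeat=len(tasks)):
        used = [[0.0] * I for _ in range(N)]
        for (r, q), n in zip(tasks, choice):
            for i in range(I):
                used[n][i] += V[r] * t[n][q][i]  # tau from Eq. (8)
        # Resource limits, constraint (12).
        if any(used[n][i] > mu[n][i] for n in range(N) for i in range(I)):
            continue
        cost = sum(map(sum, used))  # value of the functional (9)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, dict(zip(tasks, choice))
    return best_cost, best


# Hypothetical instance: two partners, two rubrics, two operations, one resource.
V = [100, 50]
t = [[[1.0], [3.0]],   # partner 0 is cheap on operation 0
     [[2.0], [1.0]]]   # partner 1 is cheap on operation 1
mu = [[200.0], [200.0]]
cost, plan = solve_by_enumeration(V, t, mu)
```

Tightening a limit in `mu` forces work away from the cheapest partner, which shows how constraint (12) shapes the resulting topology.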

Solving this problem yields a distribution of work between partners that minimizes the overall cost of creating an information product. However, this is true only if there are no subjective factors that affect the distribution of work between partners and require some operations to be performed centrally, whereas others (for example, information service operations themselves) are performed in a decentralized way. Accordingly, the set of O operations can be divided into three disjoint subsets: \(O^{\prime }\) denotes centralized operations, \(O^{\prime \prime }\) denotes distributed operations, and \(O^{\prime \prime \prime }\) denotes operations performed by each of the \(U_{n}\) partners. The further discussion focuses on distributed operations.

We impose a penalty on the execution of operations from \(O^{\prime \prime }\) whenever a participant \(U_{n}\) that is interested in the results of processing documents of the \(r^{th}\) rubric does not itself perform the operations on that rubric. Logically, the amount of the penalty should be proportional to the level of interest of the participant and to the cost of performing the operation \(O^{q}\) on the array \(M_{r}\). In this case, the functional (9) takes the following form:

$$\begin{aligned} H^{2}=H^{1}+\sum _{i=1}^{I}\sum _{n=1}^{N}\sum _{r=1}^{R}\sum _{q:\,O^{q}\in O^{\prime \prime }}\left( 1-\omega _{nr}^{q}\right) p_{nr}\tau _{nr}^{qi}. \end{aligned}$$
(13)

To simplify the functional, we introduce the notation:

$$\begin{aligned} \theta _{nr}^{q}=\sum _{i=1}^{I}\tau _{nr}^{qi}. \end{aligned}$$
(14)

Then the minimized functional (13) can be rewritten in the form:

$$\begin{aligned} H^{2}=\sum _{n=1}^{N}\sum _{r=1}^{R}\sum _{q:\,O^{q}\in O^{\prime \prime }}\omega _{nr}^{q}\theta _{nr}^{q}+\sum _{n=1}^{N}\sum _{r=1}^{R}\sum _{q:\,O^{q}\in O^{\prime \prime }}\left( 1-\omega _{nr}^{q}\right) p_{nr}\theta _{nr}^{q}. \end{aligned}$$
(15)
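The functional (15) can be evaluated for a given assignment as in the following minimal sketch. The data layout, the function name, and the numbers are assumptions made for the illustration.

```python
def penalized_cost(assignment, p, theta):
    """
    Evaluate the functional (15) for one assignment of distributed operations.

    assignment[(r, q)] = n  partner performing operation O^q on rubric M_r
    p[n][r]                 interest level of partner U_n in rubric M_r
    theta[n][r][q]          cost summed over resources, Eq. (14)
    """
    total = 0.0
    for (r, q), performer in assignment.items():
        for n in range(len(p)):
            if n == performer:
                total += theta[n][r][q]            # omega = 1: direct cost
            else:
                total += p[n][r] * theta[n][r][q]  # omega = 0: penalty term
    return total


# Hypothetical instance: two partners, two rubrics, one distributed operation.
p = [[0.8, 0.2],
     [0.5, 0.5]]
theta = [[[100.0], [50.0]],
         [[200.0], [50.0]]]
h2 = penalized_cost({(0, 0): 0, (1, 0): 1}, p, theta)
h2_swapped = penalized_cost({(0, 0): 1, (1, 0): 0}, p, theta)
```

Comparing `h2` with `h2_swapped` shows how the penalty favors assigning a rubric to the partner that is both cheaper and more interested in it.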

Thus, the task of obtaining the optimal topology of the information management system for a cluster organization is described by the mathematical model (15), (10), (11), (12), which belongs to the class of integer linear programming models.
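A brief rearrangement of (15), given here as an illustrative check rather than as part of the original derivation, makes the linearity in \(\omega _{nr}^{q}\) explicit. Since \(\omega \theta +\left( 1-\omega \right) p\theta =\omega \left( 1-p\right) \theta +p\theta \), grouping the terms gives:

$$\begin{aligned} H^{2}=\sum _{n=1}^{N}\sum _{r=1}^{R}\sum _{q:\,O^{q}\in O^{\prime \prime }}\omega _{nr}^{q}\left( 1-p_{nr}\right) \theta _{nr}^{q}+\sum _{n=1}^{N}\sum _{r=1}^{R}\sum _{q:\,O^{q}\in O^{\prime \prime }}p_{nr}\theta _{nr}^{q}. \end{aligned}$$

The second sum does not depend on \(\omega _{nr}^{q}\), so minimizing \(H^{2}\) is equivalent to minimizing the first sum, in which each cost \(\theta _{nr}^{q}\) is discounted by the factor \(\left( 1-p_{nr}\right) \). Under this reading, a partner with a higher interest in a rubric has a lower effective cost for processing it.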

3 Results and Discussion

The results of numerical calculations for the proposed model determine the distribution of work between cluster members. This distribution minimizes the overall cost of operating the information support system of the cluster organization under the given resource constraints. Figure 1 shows a graphical interpretation of the solution of problem (15) with constraints (10), (11), (12), which yields the network topology of distributed information processing in a cluster organization. In this solution, every participant performs at least one operation on the documents of at least one thematic rubric, and no operation is duplicated.

Fig. 1. Graphical interpretation of the results obtained

Practical testing of the proposed model was carried out during the creation and implementation of an automated scientific and technical information system of the Siberian Branch of the Academy of Sciences. The system covers research institutes of the Novosibirsk Scientific Center and information files in the fields of chemistry, biology, information sciences, environmental protection, etc. The results obtained made it possible to reduce the financial costs of creating and operating the system almost twofold [4].

4 Conclusion

Public-private partnership in developing cluster forms of innovation activity in the Republic of Kazakhstan calls for new technologies of interaction between the members of cluster organizations. In some cases, this makes it possible to significantly reduce the cost of creating components of the cluster organization infrastructure.

When forming the information infrastructure of a cluster, it is possible to organize a system of distributed information processing which would minimize the total cost of system operation.