1 Introduction

Image understanding is important in fields such as medicine, aerospace, security, and the semantic web. Classic approaches rely on stochastic methods involving feature classification and clustering, as in [1], where protein subcellular distributions are interpreted using various sets of subcellular location features (SLF), combined with supervised classification and unsupervised clustering methods. Earlier approaches use artificial neural networks [2]. Another approach is case-based reasoning, as described in [3], where two Creek-type case-based reasoners operate within a propose-critique-modify task structure to combine low-level structure analysis with high-level interpretation of image content. Case-based reasoning (CBR) has been steadily expanding in the last 20 years and is widely applied in health sciences [4]. The subject is extensively covered in Perner’s 2008 book [5].

Apart from health sciences, image understanding has generated extensive work in security areas such as iris biometrics, as described in [6], where techniques are based on statistical analysis, starting with the work of Flom and Safir [7], Daugman [8] and Wildes [9], with subsequently inspired models such as neural networks, Gaussian mixture models, wavelets, fusing quality scores, etc. We also find it necessary to mention the model-based approach of the DARPA image understanding benchmark for parallel computers [10]. Among the probabilistic approaches, we also note Bayesian reasoning on qualitative descriptions for images [11].

However, more recent work tends toward a higher-level abstraction layer for image reasoning, using syntactic reasoning models such as [12], which employ an LALR-type grammar and description languages [13]. Drawing from this and Knauff’s article [14] on a neuro-cognitive theory of deductive relational reasoning with mental models and visual images, we notice a direction in image understanding that can be successfully further expanded, namely providing a fully quantifiable model with a high degree of expressiveness for human-like reasoning.

This means providing a formal language (or equivalent structure) that can capture abstract concepts, actions, and complex syllogisms about images, without necessarily mentioning the detected objects, but rather the semantics of the relationships between them.

This prompted us to return to our work in brain-computer interfacing that led us to model human thought processes using concept algebra. The recent work performed by Feldman [15], Wang [16–19], Hu [20], and Tien [21] strengthened our assumption that the model would fit the need to express abstract relations between objects well, as well as allow learning. This means that the model can use previous knowledge in addition to current observations in order to make deductions, resulting in better image comprehension across different image streams.

Using concept algebra produces the basis for a framework that allows reasoning on images, and can be further combined with epistemic logic to produce a reasoning and communication framework that can be used by heterogeneous (mobile) sensor agents.

2 Concept Algebra for Formal Ontology and Semantic Manipulation

Let us assume that we are provided with an image input stream, on which application-specific object detection and recognition have been performed, resulting in mapping of the objects to an informal static ontology. Our goal is to provide a denotational mathematical structure that is formal, dynamic, and general in order to rigorously model and process knowledge, thus obtaining a formal ontology fit for semantic manipulation and further use for inference and machine learning. Operational semantics for the calculus of concept algebra are formally elaborated using a set of computational processes in real-time process algebra (RTPA), as proposed by [19]. According to [19], we have the following definitions:

Definition 1

Denotational mathematics is a category of expressive mathematical structures that deals with high-level mathematical entities, that is, hyperstructures on HS beyond numbers on \( \mathbb{R} \), with a series of embedded dynamic processes (functions).

Definition 2

A hyperstructure, HS, is a type of mathematical entity that is a complex n-tuple with multiple fields of attributes and constraints, as well as their interrelations.

Wang employs the OAR (object-attribute-relation) model in order to extend classic ontologies such as WordNet (which is purely lexical) and ConceptNet (which adds complex concepts and higher-order concepts that compose verbs with arguments, such as events and processes), so as to distinguish between concept relations and attribute relations, thus facilitating machine learning and causal reasoning. On this basis, he defines his language knowledge base, LKBUDM (UDM being a type suffix of RTPA).

Wang’s model heavily relies on RTPA, and views concepts as sets of IDs, attributes, objects, internal relations, and external input and output relations, whereas knowledge in general adds synonym and antonym relations to concepts.

In the following section, we propose an alternative to Wang’s model. We simplify by removing the RTPA notation, producing our own definition of concepts. From Wang’s model, we maintain the semantic environment Θ and the sets of relational and compositional operators \( OP = \{ \bullet_{r}, \bullet_{c} \} \).

Namely, we use the compositional operators as described by Wang [17]: inheritance, tailor, extension, substitute, composition, decomposition, aggregation, instantiation, and specification. We maintain the same semantics, replacing only the concept representation. Thus, it remains possible to derive new concepts from previous ones. In our approach, however, the relational operators differ, as alternative definitions of concepts co-exist, and translation from a syntactic representation to any semantic representation leads to the same abstract concept. Thus, instead of relating synonyms, antonyms, etc., the relational operator links concrete concepts to abstract ones, be they pure abstractions (“good”, “beautiful”) or verbs.

3 Our View of Knowledge Representation

Remark 1

A concept may represent a concrete object (table, tree), a measurable phenomenon (wind, pressure), an abstract notion (task, gain, self), or an action (return, take off, beacon).

Remark 2

Auditory stimulation using words results in brain activation patterns that consist of simultaneously increased activity in a subset of the monitored brain locations. The set of monitored brain locations is finite, but can be arbitrarily chosen. A potentially infinite number of concepts can be defined on such a set if PSDs are included (location and intensity, the latter being a real number).

Remark 3

One such brain location corresponds to a semantic dimension of the concept described by the stimulus word. The set of all semantic dimensions, B, forms our working alphabet.

Definition 3

The definition of a concept C is the disjunction of the definitions of all its known synonyms \( S_{i} \): \( C = S_{1} \vee S_{2} \vee \ldots \vee S_{n} \) (Building = House ∨ Hut ∨ Tower ∨ Shed).

Remark 4

Two distinct concepts, \( C_{1} \) and \( C_{2} \), may share semantic dimensions, and one synonym may belong to one or more concepts (e.g., “castle” may be in “house” or “fortification”).

Definition 4

A syntactic definition SintD of a concept synonym is a conjunction of free variables \( X_{1} \ldots X_{k} \), each variable \( X_{i} \) corresponding to a feature in the agent-specific data model. The set of all syntactic definitions is denoted \( SintD \).

Definition 5

A semantic definition SemD of a concept synonym is a conjunction of free variables \( Y_{1} \ldots Y_{m} \), each variable \( Y_{j} \) corresponding to a semantic dimension in the abstract layer representation of the concept. The set of all semantic definitions is denoted \( SemD \).

Definition 6

Translation between syntactic and semantic definitions of concept synonyms is performed by applying a bijective, invertible, and non-commutative function \( {\text{tsl}}: SintD \to SemD \), where \( {\text{tsl}}(X_{1} \ldots X_{k}) = Y_{1} \ldots Y_{m} \). Its inverse \( {\text{tsl}}^{-1}: SemD \to SintD \), \( {\text{tsl}}^{-1}(Y_{1} \ldots Y_{m}) = X_{1} \ldots X_{k} \), performs semantic-to-syntactic translation.

Definition 7

The translation function \( {\text{tsl}} \):\( SintD \to \,SemD \) can be extended to function \( {\text{Tsl:}}SintD^{n} \to \,SemD^{n} \), \( {\text{Tsl}} \) having variable arity. Function \( {\text{Tsl}} \) allows translating concepts, as \( {\text{tsl}} \) allows translating concept synonyms.
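As an illustration of Definitions 3 to 7, the following minimal Python sketch models definitions as sets of variable names and the translation as a lookup table. The feature names, the dimension labels (b1, b4, b7), and the table itself are hypothetical; the paper does not prescribe a concrete data structure, and `matches_concept` is our own helper for evaluating the disjunction of Definition 3.

```python
from typing import Dict, FrozenSet, Tuple

SyntacticDef = FrozenSet[str]  # conjunction of agent-specific features X_1 ... X_k
SemanticDef = FrozenSet[str]   # conjunction of semantic dimensions Y_1 ... Y_m

# Hypothetical translation table for one agent (all names are invented).
TSL_TABLE: Dict[SyntacticDef, SemanticDef] = {
    frozenset({"vertical_edges", "flat_roof"}): frozenset({"b1", "b4"}),
    frozenset({"vertical_edges", "pointed_roof"}): frozenset({"b1", "b7"}),
}

# Build the inverse and check that the table is one-to-one (bijective on its image).
TSL_INVERSE: Dict[SemanticDef, SyntacticDef] = {
    sem: syn for syn, sem in TSL_TABLE.items()
}
assert len(TSL_INVERSE) == len(TSL_TABLE), "tsl must be one-to-one"


def tsl(syn: SyntacticDef) -> SemanticDef:
    """Syntactic-to-semantic translation of one concept synonym."""
    return TSL_TABLE[syn]


def tsl_inv(sem: SemanticDef) -> SyntacticDef:
    """Semantic-to-syntactic translation (the inverse of tsl)."""
    return TSL_INVERSE[sem]


def Tsl(*syns: SyntacticDef) -> Tuple[SemanticDef, ...]:
    """Variable-arity extension: translate every synonym of a concept."""
    return tuple(tsl(s) for s in syns)


def matches_concept(observed: SemanticDef, concept: Tuple[SemanticDef, ...]) -> bool:
    """C = S_1 v ... v S_n holds if all dimensions of any one synonym are observed."""
    return any(synonym <= observed for synonym in concept)
```

Under these assumptions, a concept is simply the tuple returned by `Tsl`, and an observation matches the concept as soon as one of its synonyms is fully present among the observed dimensions.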

Definition 8

A basic sentence \( F_{i} \) is a conjunction of concepts occurring simultaneously at a given moment t.

Definition 9

Basic inference is obtained by applying rules over basic sentences: \( I = F_{1} \ldots F_{n} \), \( R_{1} \leftarrow R_{1}^{1} \ldots R_{1}^{k_{1}} \), where the rules R are fixed for a given domain and take the form of Horn clauses.
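Basic inference over Horn clauses can be sketched as naive forward chaining. The snippet below is only an illustration under our own assumptions: concepts are represented as plain strings, a rule is a (head, body) pair, and the example rules are invented rather than taken from a concrete domain.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A Horn clause R <- R^1 ... R^k, stored as (head, body).
Rule = Tuple[str, FrozenSet[str]]


def infer(sentence: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Apply the rules to a basic sentence until no new concept can be derived."""
    facts = set(sentence)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for head, body in rule_list:
            # Fire the rule only if every concept in its body is already known.
            if head not in facts and body <= facts:
                facts.add(head)
                changed = True
    return facts


# Hypothetical domain rules (invented for illustration).
example_rules = [
    ("danger", frozenset({"storm", "tree"})),
    ("storm", frozenset({"wind", "dark_clouds"})),
]
derived = infer({"wind", "dark_clouds", "tree"}, example_rules)
```

Here `derived` contains the original sentence plus "storm" and "danger"; the second rule has to fire before the first one becomes applicable, which is why the loop repeats until a fixed point is reached.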

Definition 10

An agent is a mobile entity equipped with sensory input (denoted SI), such as a UAV, radiosonde, or ground vehicle with a thermal camera. The set of agents is Agents = {agent | agent = {self, resources, vocabulary, concept representation mechanism, inference mechanism, epistemic logic, learning mechanism, querying mechanism, game theory strategies, group, trusted agents, friends, enemies, task, current action, intention, gain, visible universe, invisible universe, knowledge}}.

Remark 5

The agent’s definition of self is a unique identifier (Self, rank), where Self is a word over an alphabet \( A \) with \( A \cap B = \varnothing \), \( B = \{ \cup b_{i} \} \), each \( b_{i} \) is a semantic dimension, and rank is an indicator of the agent’s position in the group hierarchy.

Remark 6

The resource set R is formed by the semantic definitions of the concepts denoting the resources, to which weights are attached: \( R = \{ (SemD(C), w) \} \).

Remark 7

The vocabulary D is initially a predefined set \( D_{0} \) of semantic concept definitions, which is further extended by learning or deduction. Thus, when an agent needs to learn a new concept C, it can ask several trusted agents and select the most frequent definition, query through a question/answer mechanism, or deduce the definition itself from web queries and ontology. The vocabulary becomes \( D_{i} = D_{0} \cup \{ C \} \).
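The first learning option, asking several trusted agents and keeping the most frequent definition, can be sketched as a simple majority vote. This is an illustration only: the function name `learn_concept`, the dictionary-based vocabulary, and the handling of ties (the answer received first wins) are our assumptions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

SemanticDef = FrozenSet[str]


def learn_concept(
    name: str,
    answers: List[SemanticDef],
    vocabulary: Dict[str, SemanticDef],
) -> Dict[str, SemanticDef]:
    """Return a vocabulary extended with the most frequent definition of `name`.

    `answers` holds one semantic definition per trusted agent that replied.
    If nobody replied, the vocabulary is returned unchanged.
    """
    updated = dict(vocabulary)  # do not mutate the caller's vocabulary
    if answers:
        definition, _count = Counter(answers).most_common(1)[0]
        updated[name] = definition
    return updated
```

A usage example: if two trusted agents answer {b1, b2} and one answers {b1, b3}, the agent stores {b1, b2} as its definition of the new concept.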

Remark 8

The sets for group G, trusted agents TA, friends Fr, and enemies E are defined a priori, and can be updated by learning, reasoning, or communication with trusted sources. They consist of agent definitions as in 3.

Definition 11

An observation O is obtained by an agent by translating the sensory input (SI), which it receives as syntactic definitions, into the corresponding semantic definitions and applying the inference rules to them: \( F_{1} = {\text{tsl}}(SI_{1}) \ldots F_{n} = {\text{tsl}}(SI_{n}),\,R_{1} \leftarrow R_{1}^{1} \ldots R_{1}^{k_{1}} \ldots \).

Remark 9

The knowledge K of an agent is initially the a priori knowledge \( K_{0} \), and is further updated by adding to it the validated inferences from observations and the knowledge shared by trusted agents: \( K_{i + 1} = K_{i} \cup VI \cup TF \), where \( TF = \{ \cup F, F \in \{ {\text{TrustedAgent}} \} \} \) and \( VI = \cup I, I = \{ \{ \cup O_{i} \}, K_{i}, R_{1} \leftarrow R_{1}^{1} \ldots R_{1}^{k_{1}} \ldots \} \wedge I = {\text{True}} \).

Remark 10

Common knowledge K is the intersection of the knowledge of the agents in a given group: \( K = \cap K_{i} \), where \( i \in {\text{Agents}} \).
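The knowledge update of Remark 9 and the common knowledge of Remark 10 reduce to set union and set intersection. The sketch below assumes that each piece of knowledge is a hashable item (here a string) and that inferences have already been validated; both function names are ours.

```python
from typing import AbstractSet, Dict, FrozenSet


def update_knowledge(
    knowledge: AbstractSet[str],
    validated_inferences: AbstractSet[str],
    trusted_facts: AbstractSet[str],
) -> FrozenSet[str]:
    """K_{i+1} = K_i U VI U TF."""
    return frozenset(knowledge) | frozenset(validated_inferences) | frozenset(trusted_facts)


def common_knowledge(group: Dict[str, AbstractSet[str]]) -> FrozenSet[str]:
    """Intersection of the knowledge sets of all agents in the group."""
    sets = [frozenset(k) for k in group.values()]
    if not sets:
        return frozenset()  # an empty group shares nothing
    return sets[0].intersection(*sets[1:])
```

For example, if one agent knows {a, b} and another knows {b, c}, their common knowledge is {b}.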

Definition 12

The visible universe V is the union of the agent’s current observations via sensory input and the inferences obtained by applying rules to the observations and to its a priori knowledge restricted to the current setting: \( V = \{ \cup O_{i} \} \cup I \), \( I = \{ \{ \cup O_{i} \}, K_{set}, R_{1} \leftarrow R_{1}^{1} \ldots R_{1}^{k_{1}} \ldots \} \), where \( K_{set} \subseteq K \) (\( K_{set} = K \cap \{ \cup O_{i} \} \)).

Definition 13

The invisible universe Inv represents everything that cannot be inferred from knowledge K and current observations \( O_{i} \).

4 Reasoning Mechanism

Thus, let us assume an agent that has acquired an image input stream. First, the agent will produce a syntactic representation of the objects it detects, which it translates to the semantic definition over the space of semantic dimensions (Fig. 1).

Fig. 1 Sensory input is expressed syntactically

Once objects are detected, sentences are formed by applying rules to sequences of observations and extracting the abstract concepts. For instance, if a tree is standing in one image and is down in the following one, the inference rule applied should introduce the abstract concept “fell” in the sentence, which is related through a relational operator to the concept “tree”. Each agent will have a set of abstract concepts to operate with, as described in the previous section (Fig. 2).
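The tree example can be sketched as a rule over consecutive observations. In this hypothetical encoding, each frame maps a detected object to its observed state, and a transition table maps a pair of states to the abstract concept (verb) to introduce; neither the encoding nor the state names come from a defined rule set.

```python
from typing import Dict, List, Optional, Tuple

Frame = Dict[str, str]  # detected object -> observed state
TransitionRules = Dict[Tuple[str, Optional[str]], str]  # (before, after) -> verb


def detect_transitions(frames: List[Frame], rules: TransitionRules) -> List[Tuple[str, str]]:
    """Return (verb, object) pairs inferred from consecutive frames."""
    events = []
    for before, after in zip(frames, frames[1:]):
        for obj, state in before.items():
            new_state = after.get(obj)  # None if the object is no longer detected
            verb = rules.get((state, new_state))
            if verb is not None:
                events.append((verb, obj))
    return events


# Hypothetical rule: an object that was standing and is now down "fell".
example_rules: TransitionRules = {("standing", "down"): "fell"}
example_frames = [{"tree": "standing"}, {"tree": "down"}]
example_events = detect_transitions(example_frames, example_rules)
```

With these inputs, `example_events` is `[("fell", "tree")]`, that is, the abstract concept “fell” linked to the concrete concept “tree”.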

Fig. 2 Agents will process visual input and produce agent-specific syntactic representations of the detected objects. These are further translated to algebraic semantic representations

Cascading application of inference rules should eventually describe the action that occurs in the image stream. The inference rules are applied recursively to the basic sentence and the agent’s knowledge, with the deductions of each step being added to the knowledge until the targeted level of deduction is reached. In the case of a multiagent system, the inference rules will also apply to knowledge from trusted agents. All new sentences are added to the agent’s knowledge, which it can share with other agents. In other words, if another agent does not possess the semantic definition of “tree”, it can obtain it and the related abstract concepts from another agent, along with sample syntactic definitions. Learning means adding semantic definitions to a concept definition or changing their order within it. The most common, and thus most likely, definition will be first. Rules can also be shared, since they are at the same abstraction level.
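Learning as adding or reordering semantic definitions can be sketched by keeping a usage count per definition and sorting by it. The counting scheme is our assumption; the text only states that the most common definition comes first.

```python
from typing import Dict, FrozenSet, List

SemanticDef = FrozenSet[str]


def record_definition(
    definitions: List[SemanticDef],
    counts: Dict[SemanticDef, int],
    new_def: SemanticDef,
) -> None:
    """Register one more occurrence of `new_def` and keep the most common definition first."""
    counts[new_def] = counts.get(new_def, 0) + 1
    if new_def not in definitions:
        definitions.append(new_def)  # learning by adding a definition
    # Learning by reordering: stable sort, so ties keep their previous order.
    definitions.sort(key=lambda d: -counts[d])
```

After the calls with definitions a, b, b (in that order), the list reads [b, a], so the agent tries the more frequently confirmed definition first.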

Thus, we have so far outlined a mechanism for inference on sensory input that allows action understanding and is representation-independent. It becomes possible for agents that have different input, in terms of dimensionality and significance, to learn concepts from one another and to corroborate their knowledge (Fig. 3).

Fig. 3 Basic sentence: object identification and first-level abstractions and verb

Also, having formal representations for abstract concepts (such as gain, intention, trust, etc.) makes way for communication and cooperation in groups of agents, with the possibility to negotiate group strategies that are adequate in the given context.

The agents complete their vision of the visible universe by communicating, and use an epistemic logic on top of the algebraic formalism. Strategies are decided via a game-theoretic approach, whereas queries for learning new information are made by extending concept algebra with query algebra. Each agent is aware of both self and group interests, and will act according to what the situation requires (Figs. 4 and 5).

Fig. 4 Inference loop with multiagent knowledge

Fig. 5 Requesting semantic definition of unknown concept from another agent using query algebra

Finally, because the formalism is compatible with natural language, it becomes possible to send voice queries or commands simultaneously to different members of heterogeneous mobile agent groups, as each agent is aware of its identity and role (see Fig. 6).

Fig. 6 Group decisions with human interaction

5 Conclusions and Future Work

In this paper, we have outlined the building blocks for a framework based on concept algebra that can provide an abstraction layer for reasoning on images from different sources, regardless of specific data representations. Our approach to knowledge representation makes way for extensions to various logics, and for combining with query algebra for rigorously formalized searches in the input. The result is the possibility for agents with heterogeneous knowledge and representations to communicate and learn, develop strategies, define intentions within a multiagent group, and interface with natural language. Future work involves the concrete definition of the inference rules and the deduction of abstract concepts and of actions. Also, implementation is required in order to assess the practical efficiency of the proposed reasoning mechanism. Finally, once the formalism is completely implemented and a high-level reasoning mechanism is thoroughly defined, extending the concept algebra with epistemic logic and implementing game theory decision-making strategies would allow for intelligent sensing agents that can cooperate.