Introduction

Melanogenesis is a physiological process that results in the synthesis of melanin pigments, which play a crucial protective role against skin photocarcinogenesis. In humans and other mammals, melanin biosynthesis takes place in a lineage of cells known as melanocytes, which contain the enzyme tyrosinase [1]. Tyrosinase (phenoloxidase) is a key enzyme in melanin biosynthesis. It is mainly involved in the initial steps of the pathway: the hydroxylation of l-tyrosine (monophenolase activity) and the oxidation of the product of this reaction, l-DOPA (diphenolase activity), to o-dopaquinone [2]. This o-quinone is then transformed into melanins through a series of divergent steps that give rise to a predominantly indolic pigment (eumelanin) and a closely related pigment containing benzothiazine subunits (phaeomelanin). The current view is that most human pigmentation involves a combination of these pathways, giving rise to mixtures of varying composition [3, 4].

Many approaches are based on the use of analogue substrates of tyrosinase designed to maximize the generation of reactive o-quinone oxidation products and to increase their diffusion range by preventing the spontaneous, self-extinguishing cyclization reaction [5]. If released into the cytosol through the defective melanosomal membranes of malignant melanocytes, these products can react with vital cellular components and cause irreversible damage [6]. Therefore, tyrosinase inhibitors should be useful as therapeutic agents for the treatment of melanin hyperpigmentation and as cosmetic agents for skin whitening after sunburn [7, 8].

On the other hand, computational methods have become a suitable alternative in drug design and have recently been applied to QSAR studies of tyrosinase inhibitors [9–11], using congeneric or heterogeneous datasets of compounds. In this sense, QSAR methods can reduce costly failures of drug candidates in clinical trials by filtering virtual libraries of chemicals.

Our research group has carried out QSAR/QSPR studies of chemical, physicochemical and biological properties of different chemicals and drugs [12–16], including studies of nucleic acid–drug interactions [17, 18] and the discovery of antimalarial compounds [19]. The in-house TOpological MOlecular COMputer Design-Computer Aided ‘Rational’ Drug Design (TOMOCOMD-CARDD) software [20], a novel computer-aided molecular design scheme based on graph theory and linear algebra, has been used in all of these works and many others.

Here we propose a new set of molecular descriptors (MDs), namely non-stochastic and stochastic bond-based bilinear indices, and show their application to discriminating tyrosinase inhibitors (actives) from inactive compounds using QSAR models. Furthermore, a virtual screening of a small library of chemicals is carried out, and finally we present the in silico identification, synthesis and in vitro assays of a new set of tetraketones. This procedure illustrates the potential of these new MDs in a real-world application and could help to speed up the discovery of new lead compounds for treating hyperpigmentation and related skin disorders.

Theoretical framework

The extension of bilinear indices given here is based on the edge-adjacency matrix, which has been considered and explicitly defined in the chemical graph-theory literature [21, 22] and rediscovered by Estrada as an important source of new MDs [23–28]. In this section, we first define the nomenclature used in this work; then the atom-based molecular vector \(({\bar{x}})\) is redefined for bond characterization using the same approach as previously reported; and finally, new definitions of bond-based non-stochastic and stochastic bilinear indices are given.

Background on the edge-adjacency matrix and new edge relations: the stochastic edge-adjacency matrix

Let G = (V, E) be a simple graph, with \({V=\{v_{1}, v_{2},\ldots, v_{n}\}}\) and \({E=\{e_{1}, e_{2}, \ldots, e_{m}\}}\) being the vertex and edge sets of G, respectively. Then G represents a molecular graph having n vertices and m edges (bonds). The edge-adjacency matrix E of G (also called the bond-adjacency matrix, B) is a square, symmetric matrix whose elements \(e_{ij}\) are 1 if and only if edge i is adjacent to edge j, and 0 otherwise [25, 28–30]. Two edges are adjacent if they are incident to a common vertex. This matrix corresponds to the vertex-adjacency matrix of the associated line graph. Finally, the sum of the ith row (or column) of E is called the edge degree of bond i, \(\delta(e_{i})\) [23, 26, 27, 29, 30].

Using the edge (bond) adjacency relationships, we can derive a new relation for a molecular graph, which is introduced here. The kth stochastic edge-adjacency matrix, \({{\bf ES}^{\varvec k}}\), can be obtained directly from \({{\bf E}^{\varvec k}}\). Here, \({{\bf ES}^{\varvec k}=[{}^{k}es_{ij}]}\) is a square matrix of order m (m = number of bonds), and its elements \({^{k}es_{ij}}\) are defined as follows:

$$ {}^{k}es_{ij}=\frac{{}^ke_{ij}}{{}^k\hbox{SUM}(E^k)_i}=\frac{{}^ke_{ij}}{{}^k\delta (e)_i}$$
(1)

where \({{}^{k}e_{ij}}\) are the elements of the kth power of E, and the sum of the ith row of \({{\bf E}^{\varvec k}}\) is called the k-order edge degree of bond i, \({{}^{k}\delta(e)_{i}}\). Note that the matrix \({{\bf ES}^{k}}\) in Eq. 1 has the property that the elements in each row sum to 1. An m × m matrix with nonnegative entries having this property is called a stochastic matrix [31].
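The construction of E, its powers and the row normalization of Eq. 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the TOMOCOMD-CARDD implementation: the bond-list input format, the atom numbering and the helper names (`edge_adjacency`, `mat_pow`, `stochastic`) are hypothetical. The example uses the H-suppressed graph of 3-hydroxy-2-butenenitrile from the sample calculation.

```python
# Minimal sketch (plain Python). Assumption: a molecule is a list of bonds, each a pair of
# atom indices in the H-suppressed graph; helper names are hypothetical, not TOMOCOMD-CARDD's API.

def edge_adjacency(bonds):
    """E: E[i][j] = 1 if bonds i and j (i != j) share an atom, else 0."""
    m = len(bonds)
    return [[1 if i != j and set(bonds[i]) & set(bonds[j]) else 0 for j in range(m)]
            for i in range(m)]


def mat_mul(A, B):
    """Plain matrix product A*B for lists of lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(A, k):
    """k-th power of a square matrix; k = 0 gives the identity (E^0 = ES^0)."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = mat_mul(R, A)
    return R


def stochastic(Ek):
    """ES^k as in Eq. 1: each row of E^k divided by its row sum (the k-order edge degree)."""
    out = []
    for row in Ek:
        s = sum(row)
        if s == 0:
            raise ValueError("zero row sum: a bond with no adjacent bond")
        out.append([x / s for x in row])
    return out


# 3-hydroxy-2-butenenitrile (H-suppressed). Assumed atom numbering:
# 0 = CH3 carbon, 1 = C(OH), 2 = CH, 3 = nitrile C, 4 = N, 5 = O
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]   # bonds 1..5 of the sample calculation
E = edge_adjacency(bonds)
print(E)                              # [[0, 1, 0, 0, 1], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 0]]
print(stochastic(mat_pow(E, 2))[0])   # [0.4, 0.2, 0.2, 0.0, 0.2]
```

Note that, following Eq. 1, ES^k is obtained by normalizing E^k itself rather than by raising ES^1 to the kth power; for this molecule the two already differ at k = 2 (the first row of (ES^1)^2 is approximately [0.417, 0.25, 0.167, 0, 0.167]).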

Chemical information and bond-based molecular vector

The atom-based molecular vector (\({\bar{x}}\)) used to represent small-to-medium-sized organic chemicals has been explained in some detail elsewhere [12–14, 16, 17, 32–44]. In a manner parallel to the development of \({\bar{x}}\), we present the bond-based molecular vector (\({\bar{w}}\)). The components (w) of \({\bar{w}}\) are numeric values that represent a certain standard bond property (bond label); that is, these weights correspond to different bond properties of organic molecules. Thus, a molecule having \({5, 10, 15,\ldots,m}\) bonds can be represented by vectors with \({5, 10, 15,\ldots,m}\) components, belonging to the spaces \({\Re^{5}}\), \({\Re^{10}}\), \({\Re^{15},\ldots}\), \({\Re^{m}}\), respectively, where m is the dimension of the real space \({\Re^{m}}\). This approach allows us to encode organic molecules such as 3-hydroxy-2-butenenitrile through the molecular vector \({\bar{w}}\) = [\({w_{\rm Csp3-Csp2}}\), \({w_{\rm Csp2=Csp2}}\), \({w_{\rm Csp2-Osp3}}\), \({w_{\rm H-Osp3}}\), \({w_{\rm Csp2-Csp}}\), \({w_{\rm Csp\equiv Nsp}}\)]. This vector belongs to the product space \({\Re^{6}}\).

These properties characterize each kind of bond (and bond type) within the molecule. Diverse kinds of bond weights (w) can be used to codify information related to each bond in the molecule. These bond labels are chemically meaningful numbers, such as standard bond distance [45–48] or standard bond dipole [45–48], or even mathematical expressions involving atomic weights such as atomic log P [49], surface contributions of polar atoms [50], atomic molar refractivity [51], atomic hybrid polarizabilities [52], Gasteiger–Marsili atomic charges [53], atomic electronegativity on the Pauling scale [54], and so on. Here, we characterized each bond with the following parameter:

$$ w=x_{i}/\delta_{i}+ x_{j}/\delta_{j}\ $$
(2)

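where \(x_{i}\) can be any standard weight (atom label) of atom i, which is bonded to atom j, and \(\delta_{i}\) is the vertex (atom) degree of atom i. In the sample calculation below, this degree is taken in the H-suppressed molecular pseudograph, so that a double or triple bond contributes two or three, respectively. Each atomic scale (bond property) thus defines an alternative molecular vector, \({\bar{w}}\).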
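Eq. 2 can be sketched in Python as follows. Assumptions, flagged here and in the code: atoms and bonds are given as index lists (a hypothetical format, as are the helper names), and δ is the vertex degree of the H-suppressed pseudograph with each bond counted with its multiplicity, which is what reproduces the sample-calculation values (for example, δ = 4 for the C(OH) carbon of 3-hydroxy-2-butenenitrile).

```python
# Sketch of Eq. 2: w = x_i/delta_i + x_j/delta_j.
# Assumption (inferred from the sample calculation): delta counts each bond with its
# multiplicity in the H-suppressed pseudograph. Helper names are hypothetical.

def vertex_degrees(n_atoms, bonds, orders):
    """Degree of every atom, a double bond counting 2 and a triple bond 3."""
    deg = [0] * n_atoms
    for (a, b), order in zip(bonds, orders):
        deg[a] += order
        deg[b] += order
    return deg


def bond_weights(atom_labels, bonds, orders):
    """Bond labels of Eq. 2 for every bond (a, b), given one atom label per atom."""
    deg = vertex_degrees(len(atom_labels), bonds, orders)
    return [atom_labels[a] / deg[a] + atom_labels[b] / deg[b] for a, b in bonds]


# 3-hydroxy-2-butenenitrile; assumed atoms: 0 CH3 C, 1 C(OH), 2 CH, 3 nitrile C, 4 N, 5 O
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]
orders = [1, 2, 1, 3, 1]                          # C-C, C=C, C-C, C#N, C-O
pauling = [2.55, 2.55, 2.55, 2.55, 3.04, 3.44]    # Pauling electronegativity of C, C, C, C, N, O
w = bond_weights(pauling, bonds, orders)
print([round(x, 6) for x in w])                   # [3.1875, 1.4875, 1.4875, 1.650833, 4.0775]
```

Passing atomic masses instead (`[12.01, 12.01, 12.01, 12.01, 14.01, 16.00]`) to the same function gives the ū vector of the sample calculation.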

The chemical information can also be codified by means of two different molecular vectors, for instance, \({\bar{w}=[w_{1}, \ldots,w_{m}]}\) and \({\bar{u}=[u_{1}, \ldots ,u_{m}]}\); different combinations of molecular vectors (\({\bar{w}\ne \bar{u}}\)) are then possible when a weighting scheme is used. In the present report, we characterized each bond with mathematical expressions involving the following parameters: atomic masses (M) [55], van der Waals volumes (V) [55], atomic polarizabilities (P) [55], and atomic electronegativity on the Mulliken scale (K) [55]. The values of these atomic labels are shown in Table 1. From this weighting scheme, six (or 12, if \({\bar{w}_{M}\hbox{-}\bar{u}_{V} \neq \bar{w}_{V}\hbox{-}\bar{u}_{M}}\)) combinations (pairs) of molecular vectors (\({\bar{w},\bar{u};\bar{w}\neq \bar{u}}\)) can be computed: \({\bar{w}_{M}\hbox{-}\bar{u}_{V}}\), \({\bar{w}_{M}\hbox{-}\bar{u}_{P}}\), \({\bar{w}_{M}\hbox{-}\bar{u}_{K}}\), \({\bar{w}_{V}\hbox{-}\bar{u}_{P}}\), \({\bar{w}_{V}\hbox{-}\bar{u}_{K}}\), and \({\bar{w}_{P}\hbox{-}\bar{u}_{K}}\). Here, we use the symbol \({\bar{w}_{X}\hbox{-}\bar{u}_{Z}}\), where the subscripts X and Z denote two mathematical expressions involving atomic properties from our weighting scheme, and the hyphen (-) expresses the combination (pair) of the two selected bond-label properties. This is illustrated with a worked example in the Sample calculation section of this work.
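For the four atomic labels used here, the six unordered pairs (enough for the symmetric, non-stochastic forms) and the twelve ordered pairs (needed when \({\bar{w}_{X}\hbox{-}\bar{u}_{Z}}\) and \({\bar{w}_{Z}\hbox{-}\bar{u}_{X}}\) differ) can be enumerated directly. A short sketch; the `w_X-u_Z` strings simply mirror the notation above and are only for display.

```python
from itertools import combinations, permutations

# Atomic labels: mass, van der Waals volume, polarizability, Mulliken electronegativity.
labels = ["M", "V", "P", "K"]

unordered = [f"w_{a}-u_{b}" for a, b in combinations(labels, 2)]   # order of w and u ignored
ordered = [f"w_{a}-u_{b}" for a, b in permutations(labels, 2)]     # w_M-u_V and w_V-u_M distinct

print(unordered)     # ['w_M-u_V', 'w_M-u_P', 'w_M-u_K', 'w_V-u_P', 'w_V-u_K', 'w_P-u_K']
print(len(ordered))  # 12
```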

Table 1 Values of the atom weights used for the bilinear indices calculation [54–57]

Definition of mathematical bilinear forms

In mathematics, a bilinear form on a real vector space V is a mapping \({b:V\times V\to \Re}\) that is linear in both arguments [58–60]. That is, it satisfies the following axioms for any scalar α and any choice of vectors \({\bar{v},\bar{w},\bar{v}_1,\bar{v}_2 ,\bar{w}_1}\) and \({\bar{w}_2}\) in V:

i. \({b(\alpha \bar{v},\bar{w})=b(\bar{v},\alpha \bar{w})=\alpha b(\bar{v},\bar{w})}\)

ii. \({b(\bar{v}_1 +\bar{v}_2 ,\bar{w})=b(\bar{v}_1 ,\bar{w})+b(\bar{v}_2 ,\bar{w})}\)

iii. \({b(\bar{v},\bar{w}_1 +\bar{w}_2 )=b(\bar{v},\bar{w}_1 )+b(\bar{v},\bar{w}_2)}\)

That is, b is bilinear if it is linear in each parameter, taken separately.

Let V = \({\Re^n}\) be a real vector space, and consider that the vector set \({\left\{ {\bar{e}_1 ,\bar{e}_2 ,\ldots,\bar{e}_n} \right\}}\) is a basis of \({\Re^n}\). This basis allows us to write any vectors \({\bar{w}}\) and \({\bar{u}}\) of V in unambiguous form, where \({(w^1,w^2,\ldots,w^n)\in \Re^n}\) and \({(u^1,u^2,\ldots,u^n)\in \Re^n}\) are the coordinates of the vectors \({\bar{w}}\) and \({\bar{u}}\), respectively. That is,

$$ \bar{w}=\sum\limits_{i=1}^n {w^i\bar{e}_i } $$
(3)

and,

$$ \bar{u}=\sum\limits_{j=1}^n {u^j\bar{e}_j } $$
(4)

Subsequently,

$$ b(\bar{w},\bar{u})=b\left(\sum\limits_{i=1}^n w^i\bar{e}_i ,\sum\limits_{j=1}^n u^j\bar{e}_j \right)=\sum\limits_{i=1}^n\sum\limits_{j=1}^n w^iu^jb(\bar{e}_i ,\bar{e}_j ) $$
(5)

If we denote by \(a_{ij}\) the n × n scalars \({b(\bar{e}_i ,\bar{e}_j)}\), that is,

$$ a_{ij} =b(\bar{e}_i ,\bar{e}_j ),\quad \hbox{ for }i=1,2,\ldots,n\hbox{ and }j=1,2,\ldots,n $$
(6)

Then,

$$ b(\bar{w},\bar{u})=\sum\limits_{i,j=1}^n a_{ij} w^iu^j=\left[ W \right]^TA\left[ U \right] =\left[ \begin{array}{lll} {w^1} & \ldots & {w^n} \\ \end{array} \right]\left[ \begin{array}{lll} {a_{11}} & \ldots & {a_{1n}} \\ \vdots & \ddots & \vdots \\ {a_{n1}} & \ldots & {a_{nn}} \\ \end{array} \right]\left[ \begin{array}{l} {u^1} \\ \vdots \\ {u^n} \\ \end{array} \right] $$
(7)

As can be seen, b may be written as the single matrix equation in Eq. 7, where [U] is a column vector (an n × 1 matrix) of the coordinates of \({\bar{u}}\) in a basis of \({\Re^{n}}\), A = [\(a_{ij}\)] is the n × n matrix of b in that basis, and [W]\(^T\) (a 1 × n matrix) is the transpose of [W], the column vector (an n × 1 matrix) of the coordinates of \({\bar{w}}\) in the same basis of \({\Re^{n}}\).

Finally, we introduce the formal definition of a symmetric bilinear form. Let V be a real vector space and b a bilinear function on V × V. The bilinear function b is called symmetric if \({b(\bar{w},\bar{u})=b(\bar{u},\bar{w}),\forall \bar{w},\bar{u}\in V}\) [58–60]; equivalently, its matrix is symmetric, \(a_{ij}=a_{ji}\). Then,

$$ b(\bar{w},\bar{u})=\sum\limits_{i,j}^n {a_{ij} w^iu^j} =\sum\limits_{i,j}^n {a_{ji} w^ju^i} =b(\bar{u},\bar{w}) $$
(8)
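A toy check of Eqs. 7 and 8 with small integer matrices (values chosen arbitrarily for illustration; helper names are hypothetical): the double sum and the matrix product [W]\(^T\)A[U] agree, a symmetric A gives b(w̄, ū) = b(ū, w̄), and a non-symmetric A generally does not.

```python
def b_double_sum(A, w, u):
    """sum over i, j of a_ij * w^i * u^j (left-hand form of Eq. 7)."""
    n = len(w)
    return sum(A[i][j] * w[i] * u[j] for i in range(n) for j in range(n))


def b_matrix(A, w, u):
    """[W]^T A [U] (matrix form of Eq. 7): first A[U], then the dot product with [W]."""
    Au = [sum(a * x for a, x in zip(row, u)) for row in A]
    return sum(wi * y for wi, y in zip(w, Au))


S = [[2, 1], [1, 3]]   # symmetric matrix (a_ij = a_ji)
N = [[2, 1], [0, 3]]   # non-symmetric matrix
w, u = [1, 2], [3, -1]

assert b_double_sum(S, w, u) == b_matrix(S, w, u) == 5
print(b_matrix(S, w, u), b_matrix(S, u, w))  # 5 5   (symmetric form: order irrelevant)
print(b_matrix(N, w, u), b_matrix(N, u, w))  # -1 6  (non-symmetric form: order matters)
```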

The total non-stochastic and stochastic bond-based bilinear indices

If a molecule consists of m bonds (so that its bond-based vectors belong to \({\Re^{m}}\)), then the kth total bilinear indices are calculated as bilinear forms on \({\Re^{m}}\) in the canonical basis. Specifically, the kth total non-stochastic and stochastic bond bilinear indices, \({b_{k}(\bar{w},\bar{u})}\) and \({{}^{s}b_{k}(\bar{w},\bar{u})}\), are computed from the kth non-stochastic and stochastic edge-adjacency matrices, \({{\bf E}^{\varvec k}}\) and \({{\bf ES}^{\varvec k}}\), as shown in Eqs. 9 and 10, respectively:

$$ b_k (\bar{w},\bar{u})=\sum\limits_{i=1} ^m \sum\limits_{j=1}^m {{}^ke_{ij} w^iu^j}=[{W}]^{t}{\bf E}^{\varvec k}[{U}] $$
(9)
$$ {}^sb_k (\bar{w},\bar{u})=\sum\limits_{i=1}^m \sum\limits_{j=1}^m {{}^kes_{ij} w^iu^j}=[{W}]^{t} {\bf ES}^{\varvec k}[{U}] $$
(10)

where m is the number of bonds of the molecule, and \({w^{1}, \ldots ,w^{m}}\) and \({u^{1} ,\ldots, u^{m}}\) are the coordinates of the bond-based molecular vectors \({\bar{w}}\) and \({\bar{u}}\) in the canonical basis of \({\Re^{m}}\). In the canonical basis, the coordinates [\({(w^{1},\ldots ,w^{m})}\) and \({(u^{1},\ldots ,u^{m})}\)] of any molecular vectors (\({\bar{w}}\) and \({\bar{u}}\)) coincide with the components of those vectors [\({(w_{1},\ldots ,w_{m})}\) and \({(u_{1},\ldots ,u_{m})}\)] [28, 45, 46]. For that reason, these coordinates can be considered as weights (bond labels) of the edges of the molecular graph. The coefficients \({{}^{k}e_{ij}}\) and \({{}^{k}es_{ij}}\) are the elements of the kth non-stochastic and stochastic edge-adjacency matrices of the molecular pseudograph, \({{\bf E}^{\varvec k}}\) (the kth power of E(G)) and \({{\bf ES}^{\varvec k}}\) (obtained from \({{\bf E}^{\varvec k}}\) by Eq. 1), respectively. In the matrix forms of Eqs. 9 and 10, [U] is a column vector (an m × 1 matrix) of the coordinates of \({\bar{u}}\) in the canonical basis of \({\Re^{m}}\), and [W]\(^t\) is the transpose of [W], the column vector (an m × 1 matrix) of the coordinates of \({\bar{w}}\) in the same basis. Here, \({{\bf E}^{\varvec k}}\) and \({{\bf ES}^{\varvec k}}\) are the matrices of the bilinear maps with respect to the natural (canonical) basis.

It should be remarked that the non-stochastic and stochastic bilinear indices are symmetric and non-symmetric bilinear forms, respectively. Therefore, if M and V are used as weights to compute these MDs, two different sets of stochastic bilinear indices, \({^{{M{\rm -}V} {\varvec s}}{\varvec b}_{\varvec k}^{\bf H}(\bar{w},\bar{u})}\) and \({^{{V{\rm -}M} {\varvec s}}{\varvec b}_{\varvec k}^{\bf H}(\bar{w},\bar{u})}\), can be obtained [because \({\bar{w}_{M}\hbox{-}\bar{u}_{V} \neq \bar{w}_{V}\hbox{-}\bar{u}_{M}}\)], whereas only one set of non-stochastic bilinear indices can be calculated, \({{}^{M{\rm -}V}{\varvec b}_{\varvec k}^{\bf H}(\bar{w},\bar{u})={}^{V{\rm -}M}{\varvec b}_{\varvec k}^{\bf H}(\bar{w},\bar{u})}\), because in this case \({\bar{w}_{M}\hbox{-}\bar{u}_{V}=\bar{w}_{V}\hbox{-}\bar{u}_{M}}\).
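As a check on Eqs. 9 and 10, the total indices of the sample molecule can be computed directly. This is a minimal sketch with hypothetical helper names; the matrix E\(^1\) and the vectors w̄ (Pauling electronegativity) and ū (atomic mass) are copied from the sample calculation. It also illustrates the remark above: swapping w̄ and ū leaves the non-stochastic indices unchanged, whereas the stochastic ones change for k ≥ 1.

```python
# Data copied from the sample calculation (bond order: C-C, C=C, C-C, C#N, C-O).
E1 = [[0, 1, 0, 0, 1], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 0]]
w = [3.1875, 1.4875, 1.4875, 1.650833, 4.0775]      # Eq. 2 with Pauling electronegativity
u = [15.0125, 7.005833, 7.005833, 7.6725, 19.0025]  # Eq. 2 with atomic mass


def mat_pow(A, k):
    """k-th power of a square matrix (k = 0 gives the identity)."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][t] * A[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    return R


def row_normalize(M):
    """ES^k from E^k (Eq. 1)."""
    return [[x / sum(row) for x in row] for row in M]


def bilinear(M, a, b):
    """[A]^t M [B] = sum_i sum_j M[i][j] * a[i] * b[j] (Eqs. 9 and 10)."""
    return sum(M[i][j] * a[i] * b[j] for i in range(len(a)) for j in range(len(b)))


b = [bilinear(mat_pow(E1, k), w, u) for k in range(4)]
sb_wu = [bilinear(row_normalize(mat_pow(E1, k)), w, u) for k in range(4)]
sb_uw = [bilinear(row_normalize(mat_pow(E1, k)), u, w) for k in range(4)]
print([round(x, 4) for x in b])                          # non-stochastic totals, k = 0..3
print([round(x - y, 4) for x, y in zip(sb_wu, sb_uw)])  # 0.0 for k = 0, non-zero afterwards
```

Up to rounding of the input vectors, the non-stochastic totals agree with the values 158.8434, 267.09929, 663.1936 and 1385.5575 obtained in the sample calculation.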

The local non-stochastic and stochastic bond-based bilinear indices

Finally, in addition to the total bond-based bilinear indices computed for the whole molecule, a local-fragment (bond and bond-type) formalism can be developed. These descriptors are termed local non-stochastic and stochastic bilinear indices, \({b_{kL}(\bar{w},\bar{u})}\) and \({{}^{s}b_{kL}(\bar{w},\bar{u})}\), respectively, and are defined as follows:

$$b_{kL} (\bar{w},\bar{u})=\sum\limits_{i=1}^m \sum\limits_{j=1}^m {{}^ke_{ijL} w^iu^j}=[{W}]^{t}{\bf E}^{\varvec k}_{\bf L}[{U}] $$
(11)
$$ {}^sb_{kL} (\bar{w},\bar{u})=\sum\limits_{i=1}^m \sum\limits_{j=1}^m {{}^kes_{ijL} w^iu^j}=[{W}]^{t}{\bf ES}^{\varvec k}_{\bf L}[{U}] $$
(12)

where m is the number of bonds and \({{}^{k}e_{ijL} [{}^{k}es_{ijL}]}\) is the element in row i and column j of the local matrix \({{\bf E}^{\varvec k}_{\bf L}[{\bf ES}^{\varvec k}_{\bf L}]}\). This matrix is extracted from the \({{\bf E}^{\varvec k}[{\bf ES}^{\varvec k}]}\) matrix and contains information about the edges (bonds) of the specific molecular fragment and also about their molecular environment within k steps. The matrix \({{\bf E}^{\varvec k}_{\bf L}[{\bf ES}^{\varvec k}_{\bf L}]}\) with elements \({{}^{k}e_{ijL} [{}^{k}es_{ijL}]}\) is defined as follows:

$$ \begin{aligned} {}^{k}e_{ijL}[{}^{k}es_{ijL}] &={}^{k}e_{ij}[{}^{k}es_{ij}] \hbox{ if both }e_{i}\hbox{ and }e_{j}\hbox{ are edges (bonds) contained within the molecular fragment}\\ &=\frac{1}{2}\,{}^{k}e_{ij}[{}^{k}es_{ij}] \hbox{ if either }e_{i}\hbox{ or }e_{j}\hbox{, but not both, is contained within the molecular fragment}\\ &= 0\hbox{ otherwise} \end{aligned} $$
(13)

It is important to highlight that this scheme follows the spirit of a Mulliken population analysis [61]. It should also be remarked that, for every partitioning of a molecule into Z molecular fragments, there will be Z local molecular-fragment matrices. That is, if a molecule is partitioned into Z molecular fragments, the matrix \({{\bf E}^{\varvec k} [{\bf ES}^{\varvec k}]}\) can be correspondingly partitioned into Z local matrices \({{\bf E}^{\varvec k}_{\bf L}[{\bf ES}^{\varvec k}_{\bf L}]}\), \({L\,=\,1,\ldots,Z}\), and \({{\bf E}^{\varvec k} [{\bf ES}^{\varvec k}]}\) is exactly the sum of these Z local matrices (an element linking bonds of two different fragments is split into two halves, one for each fragment). In this way, the total (both non-stochastic and stochastic) bond-based bilinear indices are the sum of the non-stochastic and stochastic bond-based bilinear indices, respectively, of the Z molecular fragments:

$$ b_k (\bar{w},\bar{u})=\sum\limits_{L=1}^Z {b_{kL}} (\bar{w},\bar{u}) $$
(14)
$$ {}^sb_k (\bar{w},\bar{u})=\sum\limits_{L=1}^Z {{}^sb_{kL}} (\bar{w},\bar{u}) $$
(15)
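The partition property behind Eqs. 14 and 15 can be checked numerically with a small sketch of Eq. 13. Assumptions: the fragments form a partition of the bonds (each bond in exactly one fragment), the two-fragment split used below is hypothetical, E\(^2\) is copied from the sample calculation, and the helper names are illustrative only.

```python
def local_matrix(M, fragment):
    """Eq. 13: full weight if bonds i and j are both in the fragment, half if exactly one is, else 0."""
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            inside = (i in fragment) + (j in fragment)   # 0, 1 or 2 of the two bonds inside
            if inside == 2:
                L[i][j] = M[i][j]
            elif inside == 1:
                L[i][j] = 0.5 * M[i][j]
    return L


def sums_to(M, fragments):
    """True if the local matrices of the given fragments add up to M element by element."""
    locs = [local_matrix(M, frag) for frag in fragments]
    n = len(M)
    return all(abs(sum(L[i][j] for L in locs) - M[i][j]) < 1e-12
               for i in range(n) for j in range(n))


# E^2 of the sample molecule; bonds indexed 0-4 in the order of the sample calculation.
E2 = [[2, 1, 1, 0, 1], [1, 3, 0, 1, 1], [1, 0, 2, 0, 1], [0, 1, 0, 1, 0], [1, 1, 1, 0, 2]]
ES2 = [[x / sum(row) for x in row] for row in E2]     # ES^2 by Eq. 1
partition = [{0, 1, 4}, {2, 3}]                       # hypothetical split of the five bonds
single_bonds = [{i} for i in range(5)]                # one fragment per bond
print(sums_to(E2, partition), sums_to(ES2, partition), sums_to(E2, single_bonds))  # True True True
print(sums_to(E2, [{0, 1}, {1, 2, 3, 4}]))            # False: bond 1 belongs to two fragments
```

The last line shows why the fragments must not overlap: a bond shared by two fragments would be counted twice.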

Bond and bond-type bilinear fingerprints are specific cases of local bond-based bilinear indices. The kth bond-type bilinear indices of the edge-adjacency matrix are calculated by summing the kth bond bilinear indices of all bonds of the same type in the molecule. This extension of the bond bilinear index is thus similar to group-additive schemes, in which an index appears for each bond type in the molecule together with its contribution based on the bond bilinear index.

In the bond-type bilinear indices formalism, each bond in the molecule is classified into a bond type (fragment). Bonds may be classified into bond types in terms of the characteristics of the two atoms that define the bond. For data sets with a common molecular scaffold as well as for structurally diverse ones, the kth fragment (bond-type) bilinear indices can provide useful information. Thus, the bond-type bilinear indices provide a basis for application to a wider range of biological problems, because the local formalism is applicable without the need for superposition or a closely related set of structures.
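Bond-type indices can be sketched by summing the local (single-bond) indices of all bonds of the same type. The sketch also checks that this sum equals the local index obtained by treating all bonds of that type as a single fragment in Eq. 13, which holds because elements shared by two bonds of the same type are split into halves. Assumptions: the data are copied from the sample calculation, bond types are written as plain strings (`C#N` for the triple bond), and the helper names are hypothetical.

```python
from collections import defaultdict

# Sample-calculation data (bond order: C-C, C=C, C-C, C#N, C-O).
E1 = [[0, 1, 0, 0, 1], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 0]]
w = [3.1875, 1.4875, 1.4875, 1.650833, 4.0775]
u = [15.0125, 7.005833, 7.005833, 7.6725, 19.0025]
types = ["C-C", "C=C", "C-C", "C#N", "C-O"]   # bond type of each bond, in label order


def mat_pow(A, k):
    """k-th power of a square matrix (k = 0 gives the identity)."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][t] * A[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    return R


def local_matrix(M, fragment):
    """Eq. 13: full weight if both bonds are in the fragment, half if exactly one is, else 0."""
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            inside = (i in fragment) + (j in fragment)
            if inside == 2:
                L[i][j] = M[i][j]
            elif inside == 1:
                L[i][j] = 0.5 * M[i][j]
    return L


def bilinear(M, a, b):
    """sum_i sum_j M[i][j] * a[i] * b[j]."""
    return sum(M[i][j] * a[i] * b[j] for i in range(len(a)) for j in range(len(b)))


def bond_type_indices(Mk):
    """Sum of the single-bond local indices over all bonds of each type."""
    out = defaultdict(float)
    for i, t in enumerate(types):
        out[t] += bilinear(local_matrix(Mk, {i}), w, u)
    return dict(out)


for k in (1, 2):
    Mk = mat_pow(E1, k)
    by_type = bond_type_indices(Mk)
    direct = bilinear(local_matrix(Mk, {0, 2}), w, u)   # both C-C bonds as one fragment
    print(k, {t: round(v, 4) for t, v in by_type.items()}, round(direct, 4))
```

For k = 1, the C-C entry equals the sum of the local indices of bonds 1 and 3 listed in the sample calculation (83.22306 + 21.91033).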

To illustrate the steps of the procedure, the next section presents a worked calculation of the non-stochastic and stochastic bilinear indices of the bond matrix (both total and local) for a simple chemical example.

Sample calculation

The bilinear indices of the bond matrix are calculated as follows. Consider 3-hydroxy-2-butenenitrile as a simple example. In its labeled H-suppressed molecular graph, bond 1 is the CH3–C bond, bond 2 is C=C, bond 3 is the C–C bond to the nitrile carbon, bond 4 is C≡N and bond 5 is C–O. The corresponding bond-based adjacency matrices, \({{\bf E}^{\varvec k}}\) and \({{\bf ES}^{\varvec k}}\), for k = 0–3 are:

$$\begin{array}{l} E^0=ES^0=\left[ \begin{array}{lllll} 1 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & 1 & \\ & & & & 1 \\ \end{array} \right] E^1=\left[ \begin{array}{lllll} 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ \end{array} \right] E^2=\left[ \begin{array}{lllll} 2 & 1 & 1 & 0 & 1 \\ 1 & 3 & 0 & 1 & 1 \\ 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 2 \\ \end{array} \right] E^3=\left[ \begin{array}{lllll} 2 & 4 & 1 & 1 & 3 \\ 4 & 2 & 4 & 0 & 4 \\ 1 & 4 & 0 & 2 & 1 \\ 1 & 0 & 2 & 0 & 1 \\ 3 & 4 & 1 & 1 & 2 \\ \end{array} \right] \\ \\ ES^1=\left[ \begin{array}{lllll} 0 & 0.5 & 0 & 0 & 0.5 \\ 0.33 & 0 & 0.33 & 0 & 0.33 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0.5 & 0.5 & 0 & 0 & 0 \\ \end{array} \right] ES^2=\left[ \begin{array}{lllll} 0.4 & 0.2 & 0.2 & 0 & 0.2 \\ 0.16 & 0.5 & 0 & 0.16 & 0.16 \\ 0.25 & 0 & 0.5 & 0 & 0.25 \\ 0 & 0.5 & 0 & 0.5 & 0 \\ 0.2 & 0.2 & 0.2 & 0 & 0.4 \\ \end{array} \right] ES^3=\left[ \begin{array}{lllll} 0.18 & 0.36 & 0.090 & 0.090 & 0.27 \\ 0.28 & 0.14 & 0.28 & 0 & 0.28 \\ 0.12 & 0.5 & 0 & 0.25 & 0.12 \\ 0.25 & 0 & 0.5 & 0 & 0.25 \\ 0.27 & 0.36 & 0.090 & 0.090 & 0.18 \\ \end{array} \right] \\ \end{array} $$

The molecule contains five localized bonds (corresponding to the five edges of the H-suppressed molecular graph). To these we associate the five “bond orbitals” \({w_{1}, w_{2}, w_{3}, w_{4}}\) and \(w_{5}\). Thus, \({\bar{w}=[w_{1}, w_{2}, w_{3}, w_{4}, w_{5}] = [w_{(\rm C-C)}, w_{(\rm C=C)}, w_{(\rm C-C)}, w_{(\rm C\equiv N)}, w_{(\rm C-O)}]}\), and each “bond orbital” can be computed by Eq. 2 using, for instance, the atomic electronegativity on the Pauling scale (x) [54] as atomic weight (atom label):

$$ \begin{array}{l} w_{1}=x_{C} /1 + x_{C} /4=2.55/1 + 2.55/4=3.1875\\ w_{2}=x_{C} /4 + x_{C }/3=2.55/4 + 2.55/3=1.4875\\ w_{3}=x_{C} /3 + x_{C} /4=2.55/3 + 2.55/4=1.4875\\ w_{4}=x_{C} /4 + x_{N} /3=2.55/4 + 3.04/3=1.650833\\ w_{5}=x_{C} /4 + x_{O }/1=2.55/4 + 3.44/1=4.0775 \end{array} $$

and therefore, \({\bar{w}}\) = [3.1875, 1.4875, 1.4875, 1.650833, 4.0775].

The second vector, \({\bar{u}}\), is calculated in the same way as \({\bar{w}}\) but using another property, for example the atomic masses [55], as atomic weight (atom label):

$$ \begin{array}{l} u_{1}=y_{C} /1 + y_{C} /4=12.01/1 + 12.01/4=15.0125\\ u_{2}=y_{C} /4 + y_{C }/3=12.01/4 + 12.01/3=7.005833\\ u_{3}=y_{C} /3 + y_{C} /4=12.01/3 + 12.01/4=7.005833\\ u_{4}=y_{C} /4 + y_{N} /3=12.01/4 + 14.01/3=7.6725\\ u_{5}=y_{C} /4 + y_{O }/1=12.01/4 + 16.00/1=19.0025 \end{array} $$

and therefore, \({\bar{u}}\) = [15.0125, 7.005833, 7.005833, 7.6725, 19.0025].

Each non-stochastic and stochastic total bilinear index will have the form:

$$ \begin{aligned} {\varvec b}_{k}(\bar{w},\bar{u})=&{}^{k}e_{11}w^{1}u^{1}+{}^{k}e_{12}w^{1}u^{2}+{}^{k}e_{13}w^{1}u^{3}+{}^{k}e_{14}w^{1}u^{4}+{}^{k}e_{15}w^{1}u^{5}\\ &+{}^{k}e_{21}w^{2}u^{1}+{}^{k}e_{22}w^{2}u^{2}+{}^{k}e_{23}w^{2}u^{3}+{}^{k}e_{24}w^{2}u^{4}+{}^{k}e_{25}w^{2}u^{5}\\ &+{}^{k}e_{31}w^{3}u^{1}+{}^{k}e_{32}w^{3}u^{2}+{}^{k}e_{33}w^{3}u^{3}+{}^{k}e_{34}w^{3}u^{4}+{}^{k}e_{35}w^{3}u^{5}\\ &+{}^{k}e_{41}w^{4}u^{1}+{}^{k}e_{42}w^{4}u^{2}+{}^{k}e_{43}w^{4}u^{3}+{}^{k}e_{44}w^{4}u^{4}+{}^{k}e_{45}w^{4}u^{5}\\ &+{}^{k}e_{51}w^{5}u^{1}+{}^{k}e_{52}w^{5}u^{2}+{}^{k}e_{53}w^{5}u^{3}+{}^{k}e_{54}w^{5}u^{4}+{}^{k}e_{55}w^{5}u^{5}\\ =&\sum\limits_{i=1}^{5}{}^{k}e_{ii}w^{i}u^{i}+\sum\limits_{i<j}{}^{k}e_{ij}\left(w^{i}u^{j}+w^{j}u^{i}\right) \end{aligned} $$
(16)
$$ \begin{aligned} {}^{s}{\varvec b}_{k}(\bar{w},\bar{u})=&{}^{k}es_{11}w^{1}u^{1}+{}^{k}es_{12}w^{1}u^{2}+{}^{k}es_{13}w^{1}u^{3}+{}^{k}es_{14}w^{1}u^{4}+{}^{k}es_{15}w^{1}u^{5}\\ &+{}^{k}es_{21}w^{2}u^{1}+{}^{k}es_{22}w^{2}u^{2}+{}^{k}es_{23}w^{2}u^{3}+{}^{k}es_{24}w^{2}u^{4}+{}^{k}es_{25}w^{2}u^{5}\\ &+{}^{k}es_{31}w^{3}u^{1}+{}^{k}es_{32}w^{3}u^{2}+{}^{k}es_{33}w^{3}u^{3}+{}^{k}es_{34}w^{3}u^{4}+{}^{k}es_{35}w^{3}u^{5}\\ &+{}^{k}es_{41}w^{4}u^{1}+{}^{k}es_{42}w^{4}u^{2}+{}^{k}es_{43}w^{4}u^{3}+{}^{k}es_{44}w^{4}u^{4}+{}^{k}es_{45}w^{4}u^{5}\\ &+{}^{k}es_{51}w^{5}u^{1}+{}^{k}es_{52}w^{5}u^{2}+{}^{k}es_{53}w^{5}u^{3}+{}^{k}es_{54}w^{5}u^{4}+{}^{k}es_{55}w^{5}u^{5}\\ =&\sum\limits_{i=1}^{5}{}^{k}es_{ii}w^{i}u^{i}+\sum\limits_{i\neq j}{}^{k}es_{ij}w^{i}u^{j} \end{aligned} $$
(17)

The \({{}^{k}e_{ii}}\) and \({{}^{k}es_{ii}}\) terms can be considered a measure of the attraction of an electron for a bond at step k, whereas the \({{}^{k}e_{ij}}\) and \({{}^{k}es_{ij}}\) terms (i ≠ j) describe the interaction between two bonds at step k. The \({{}^{k}e_{ij}}\) and \({{}^{k}e_{ji}}\) terms are equal by symmetry (the molecular graph is non-oriented). However, in general \({{}^{k}es_{ij}\neq {}^{k}es_{ji}}\); for instance, in the matrix \({{\bf ES}^{1}}\) given above, \({}^{1}es_{12}\) = 0.5, whereas \({}^{1}es_{21}\) = 0.33. This is a logical result, because the \({}^{k}es_{ij}\) elements can be read as transition probabilities of the ‘electrons’ moving from bond i to bond j at discrete time periods \(t_k\), and these need not be the same in both directions. This asymmetry agrees with chemical intuition when the electronegativities of the atom types in the two bonds are taken into account.

In this way, \({{\bf E}^{\varvec k}}\) and \({{\bf ES}^{\varvec k}}\) can be seen as graph-theoretic electronic-structure models [62]. In fact, quantum chemistry starts from the fact that a molecule is made up of electrons and nuclei, so the distinction between bonded and non-bonded atoms is difficult to justify: any two nuclei of a molecule interact directly and indirectly through the electrons present in the molecule, and only the intensity of this interaction varies from one pair of nuclei to another. In this sense, an electron in an arbitrary bond i can move (step by step) to other bonds at different discrete time periods \(t_k\) \({(k=0, 1, 2, 3,\ldots)}\) through the chemical-bonding network. That is, the \({{\bf E}^{1}}\) and \({{\bf ES}^{1}}\) matrices consider the valence-bond electrons in one step, and their kth-order counterparts \({{\bf E}^{\varvec k}}\) and \({{\bf ES}^{\varvec k}}\) \({(k=0, 1, 2, 3,\ldots)}\) can be considered an interacting-electron chemical-network model in k steps. This model can be seen as intermediate between the quantitative quantum-mechanical Schrödinger equation and classical chemical-bonding ideas [62].

Moreover, the kth (k = 0–3) non-stochastic total bilinear indices can be expressed as the sum of the local (bond) bilinear indices of this molecule, as follows:

$$ \begin{aligned} {\varvec b}_{0}(\bar{w},\bar{u})=&b_{0L}(\bar{w},\bar{u})_{1}+b_{0L}(\bar{w},\bar{u})_{2}+b_{0L}(\bar{w},\bar{u})_{3}+b_{0L}(\bar{w},\bar{u})_{4}+b_{0L}(\bar{w},\bar{u})_{5}\\ =&47.85234+10.42118+10.42118+12.66602+77.48269=158.8434\\ {\varvec b}_{1}(\bar{w},\bar{u})=&b_{1L}(\bar{w},\bar{u})_{1}+b_{1L}(\bar{w},\bar{u})_{2}+b_{1L}(\bar{w},\bar{u})_{3}+b_{1L}(\bar{w},\bar{u})_{4}+b_{1L}(\bar{w},\bar{u})_{5}\\ =&83.22306+61.16852+21.91033+11.48915+89.30822=267.09929\\ {\varvec b}_{2}(\bar{w},\bar{u})=&b_{2L}(\bar{w},\bar{u})_{1}+b_{2L}(\bar{w},\bar{u})_{2}+b_{2L}(\bar{w},\bar{u})_{3}+b_{2L}(\bar{w},\bar{u})_{4}+b_{2L}(\bar{w},\bar{u})_{5}\\ =&201.2588+93.50003+71.5897+24.15517+272.6899=663.1936\\ {\varvec b}_{3}(\bar{w},\bar{u})=&b_{3L}(\bar{w},\bar{u})_{1}+b_{3L}(\bar{w},\bar{u})_{2}+b_{3L}(\bar{w},\bar{u})_{3}+b_{3L}(\bar{w},\bar{u})_{4}+b_{3L}(\bar{w},\bar{u})_{5}\\ =&414.6557+265.5164+115.4104+78.92521+511.0498=1385.5575 \end{aligned} $$

The terms in the summations for the total bilinear indices are the so-called local (bond) bilinear indices, written in the consecutive order of the bond labels in the graph. For instance, the non-stochastic bond bilinear indices of order 0, 1, 2 and 3 for bond 1 are 47.85234, 83.22306, 201.2588 and 414.6557, respectively.

Likewise, the kth total stochastic bilinear indices are the sum of the kth local (bond) stochastic bilinear indices of all bonds in the molecule:

$$ \begin{aligned} {}^{s}{\varvec b}_{0}(\bar{w},\bar{u})=&{}^{s}b_{0L}(\bar{w},\bar{u})_{1}+{}^{s}b_{0L}(\bar{w},\bar{u})_{2}+{}^{s}b_{0L}(\bar{w},\bar{u})_{3}+{}^{s}b_{0L}(\bar{w},\bar{u})_{4}+{}^{s}b_{0L}(\bar{w},\bar{u})_{5}\\ =&47.85234+10.42118+10.42118+12.66602+77.48269=158.8434\\ {}^{s}{\varvec b}_{1}(\bar{w},\bar{u})=&{}^{s}b_{1L}(\bar{w},\bar{u})_{1}+{}^{s}b_{1L}(\bar{w},\bar{u})_{2}+{}^{s}b_{1L}(\bar{w},\bar{u})_{3}+{}^{s}b_{1L}(\bar{w},\bar{u})_{4}+{}^{s}b_{1L}(\bar{w},\bar{u})_{5}\\ =&39.75061+25.47438+12.93994+8.597788+42.27359=129.0363\\ {}^{s}{\varvec b}_{2}(\bar{w},\bar{u})=&{}^{s}b_{2L}(\bar{w},\bar{u})_{1}+{}^{s}b_{2L}(\bar{w},\bar{u})_{2}+{}^{s}b_{2L}(\bar{w},\bar{u})_{3}+{}^{s}b_{2L}(\bar{w},\bar{u})_{4}+{}^{s}b_{2L}(\bar{w},\bar{u})_{5}\\ =&40.43786+18.32877+16.63249+10.15001+54.77602=140.3252\\ {}^{s}{\varvec b}_{3}(\bar{w},\bar{u})=&{}^{s}b_{3L}(\bar{w},\bar{u})_{1}+{}^{s}b_{3L}(\bar{w},\bar{u})_{2}+{}^{s}b_{3L}(\bar{w},\bar{u})_{3}+{}^{s}b_{3L}(\bar{w},\bar{u})_{4}+{}^{s}b_{3L}(\bar{w},\bar{u})_{5}\\ =&39.15194+22.05334+13.87389+13.8189+48.32158=137.2196 \end{aligned} $$
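The local values listed above can be recomputed with a short, dependency-free sketch (hypothetical helper names; each bond is treated as a single-bond fragment and Eq. 13 is applied to it). One detail is worth flagging: with \({{\bf ES}^{\varvec k}}\) built by row normalization as in Eq. 1, the listed stochastic values are reproduced when ū is placed on the row side and w̄ on the column side, that is, [U]\(^t\)\({{\bf ES}^{\varvec k}_{\bf L}}\)[W]. The literal ordering of Eq. 10, [W]\(^t\)\({{\bf ES}^{\varvec k}}\)[U], gives slightly different values for k ≥ 1 (a total of about 129.16 instead of 129.0363 for k = 1). The sketch uses the ordering that matches the listed numbers; for the symmetric \({{\bf E}^{\varvec k}}\) the ordering makes no difference.

```python
# Sample-calculation data (bond order: C-C, C=C, C-C, C#N, C-O).
E1 = [[0, 1, 0, 0, 1], [1, 0, 1, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 0]]
w = [3.1875, 1.4875, 1.4875, 1.650833, 4.0775]      # Pauling electronegativity labels
u = [15.0125, 7.005833, 7.005833, 7.6725, 19.0025]  # atomic mass labels


def mat_pow(A, k):
    """k-th power of a square matrix (k = 0 gives the identity)."""
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][t] * A[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    return R


def row_normalize(M):
    """ES^k from E^k (Eq. 1)."""
    return [[x / sum(row) for x in row] for row in M]


def local_matrix(M, fragment):
    """Eq. 13: full weight if both bonds are in the fragment, half if exactly one is, else 0."""
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            inside = (i in fragment) + (j in fragment)
            if inside == 2:
                L[i][j] = M[i][j]
            elif inside == 1:
                L[i][j] = 0.5 * M[i][j]
    return L


def bilinear(M, rows, cols):
    """sum_i sum_j M[i][j] * rows[i] * cols[j]."""
    return sum(M[i][j] * rows[i] * cols[j] for i in range(len(rows)) for j in range(len(cols)))


def bond_local_indices(k, stochastic):
    """Local (bond) indices of order k for bonds 1..5, one single-bond fragment each."""
    Mk = mat_pow(E1, k)
    if stochastic:
        Mk = row_normalize(Mk)
        rows, cols = u, w   # ordering that reproduces the listed stochastic values
    else:
        rows, cols = w, u   # E^k is symmetric, so the ordering does not matter
    return [bilinear(local_matrix(Mk, {i}), rows, cols) for i in range(len(E1))]


for k in range(4):
    nonstoch = bond_local_indices(k, stochastic=False)
    stoch = bond_local_indices(k, stochastic=True)
    print(k, round(sum(nonstoch), 4), [round(x, 4) for x in stoch], round(sum(stoch), 4))
```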

Material and methods

TOMOCOMD-CARDD approach

The total and local (bond-type) bond-based bilinear indices were calculated with TOMOCOMD-CARDD [20], an interactive program for molecular design and bioinformatics research. The software was developed following a user-friendly philosophy: it can be used efficiently by users without programming skills (e.g., practicing pharmaceutical and organic chemists, teachers, university students, and so on). The CARDD subprogram allows drawing structures (drawing mode) and calculating 2D (topological), 3D-chiral (2.5D) and 3D (geometric and topographic) non-stochastic and stochastic MDs (calculation mode).

The main steps for the application of this method in QSAR/QSPR and for drug design can be briefly summarized as follows:

1. Drawing of the molecular pseudograph of each molecule in the data set, using the drawing mode.

2. Selection of appropriate weights to differentiate the atoms in the molecule. The weights used in this work are those previously proposed for the calculation of the DRAGON descriptors [55–57], i.e., atomic mass (M), atomic polarizability (P), atomic Mulliken electronegativity (K) and van der Waals atomic volume (V). The values of these atomic labels are shown in Table 1 [54–57].

3. Computation of the total and local (bond and bond-type) bond bilinear indices of the bond adjacency matrix in the software calculation mode, where one can select the atomic properties and the descriptor family before calculating the molecular indices. The software generates a table in which the rows correspond to the compounds and the columns correspond to the bond-based (both total and local) bilinear maps or to other MD families implemented in the program.

4. Development of a QSPR/QSAR equation by using multivariate analytical techniques, for instance, linear discriminant analysis. That is, one can find a quantitative relationship between an activity A and the bond-based bilinear fingerprints having, for instance, the following form:

    $$ {\bf A}=a_{0}{\varvec b}_{0}(\bar{w},\bar{u})+a_{1}{\varvec b}_{1}(\bar{w},\bar{u}) + a_{2}{\varvec b}_{2}(\bar{w},\bar{u}) +\cdots+ a_{k}{\varvec b}_{k}(\bar{w},\bar{u}) + c $$
    (18)

    where A is the measured activity, \({{\varvec b}_{k}(\bar{w},\bar{u})}\) are the kth non-stochastic total bond-based bilinear indices, c is a constant term, and the \(a_{k}\) are the coefficients obtained by the regression (or discriminant) analysis.

5. Testing of the robustness and predictive power of the QSPR/QSAR equation by internal [leave-one-out (LOO)] and external (using a test set and an external prediction set) validation techniques.
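Steps 4 and 5 can be illustrated with a deliberately small, dependency-free sketch. Assumptions, stated loudly: the descriptor values and class labels below are hypothetical toy numbers, not data from this study; the classifier is a least-squares linear discriminant on class labels coded as +1/−1, used here as a simple stand-in for LDA (for two classes its weight vector is proportional to Fisher's discriminant direction); and LOO validation refits the model with each compound left out in turn. All function names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit(X, y):
    """Least-squares coefficients [c, a_0, a_1, ...] of y ~ c + sum_k a_k * x_k (cf. Eq. 18)."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def classify(coef, row):
    """+1 (active) if the fitted score is positive, else -1 (inactive)."""
    score = coef[0] + sum(a * x for a, x in zip(coef[1:], row))
    return 1 if score > 0 else -1


def loo_accuracy(X, y):
    """Leave-one-out: refit without compound i, then classify compound i."""
    hits = 0
    for i in range(len(X)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += classify(coef, X[i]) == y[i]
    return hits / len(X)


# Hypothetical toy data: one descriptor column (standing in for a b_k value) per compound.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [-1, -1, -1, 1, 1, 1]
print(fit(X, y))           # intercept c and coefficient a_0
print(loo_accuracy(X, y))  # 1.0
```

In practice, a dedicated statistics package with a proper LDA implementation and separate test and external prediction sets (step 5) would be used; the sketch only shows the shape of the fit-and-validate loop.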

The bond-based TOMOCOMD-CARDD descriptors computed in this study were the following:

  1. (1)

    kth (k = 15) total non-stochastic bond-based bilinear indices not considering and considering H-atoms in the molecular graph (G) [\({{\varvec b}_{\varvec b}(\bar{w},\bar{u})}\) and \({{\varvec b}_{\varvec k}^{ H}(\bar{w},\bar{u})}\), respectively].

  (2)

    kth (k = 15) total stochastic bond-based bilinear indices not considering and considering H-atoms in the molecular graph (G) [\({{}^{\varvec s}{\varvec b}_{\varvec k}(\bar{w},\bar{u})}\) and \({{}^{\varvec s}{\varvec b}_{\varvec k}^{ H}(\bar{w},\bar{u})}\), respectively].

  (3)

    kth (k = 15) bond-type local (group = heteroatoms: S, N, O) non-stochastic bilinear indices not considering and considering H-atoms in the molecular graph (G) [\({{\varvec b}_{{\varvec k}{ L}}(\bar{w}_E ,\bar{u}_E)}\) and \({{\varvec b}_{{\varvec k}{ L}}^{ H}(\bar{w}_E ,\bar{u}_E)}\), respectively]. These local descriptors are putatively related to molecular charge, dipole moment, and H-bond acceptor capacity.

  (4)

    kth (k = 15) bond-type local (group = heteroatoms: S, N, O) stochastic bilinear indices not considering and considering H-atoms in the molecular graph (G) [\({{}^{\varvec s}{\varvec b}_{{\varvec k}{ L}}(\bar{w}_E ,\bar{u}_E)}\) and \({{}^{\varvec s}{\varvec b}_{{\varvec k}{ L}}^{ H}(\bar{w}_E ,\bar{u}_E)}\), respectively]. These local descriptors are putatively related to molecular charge, dipole moment, and H-bond acceptor capacity.
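The four descriptor families above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the TOMOCOMD-CARDD implementation. It assumes that (i) the kth total index is b_k(w, u) = Σ_i Σ_j a^k_ij w_i u_j, where a^k_ij are the entries of the kth power of the edge (bond) adjacency matrix E and E^0 is the identity; (ii) the stochastic variant divides each row of E^k by its row sum; (iii) a bond's weight is derived from its two atoms' weights x as x_a/δ_a + x_b/δ_b, with δ the vertex degree; and (iv) a local (bond-type) index keeps the full coefficient when both bonds belong to the fragment, half when only one does, and zero otherwise. These definitions should be checked against the original TOMOCOMD papers. The H-considering variants would simply include the hydrogen atoms and their bonds in `bonds`. All function names are hypothetical.

```python
from itertools import combinations


def edge_adjacency(bonds):
    """E[i][j] = 1 if bonds i and j share an atom (i != j), else 0."""
    n = len(bonds)
    E = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if set(bonds[i]) & set(bonds[j]):
            E[i][j] = E[j][i] = 1.0
    return E


def matrix_power(E, k):
    """E^k by repeated multiplication; E^0 is the identity."""
    n = len(E)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = [[sum(P[i][t] * E[t][j] for t in range(n)) for j in range(n)]
             for i in range(n)]
    return P


def bond_weights(bonds, atom_weight):
    """ASSUMPTION: w(bond a-b) = x_a / deg(a) + x_b / deg(b)."""
    deg = {}
    for a, b in bonds:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return [atom_weight[a] / deg[a] + atom_weight[b] / deg[b] for a, b in bonds]


def bilinear_index(bonds, w, u, k, stochastic=False, fragment=None):
    """kth bond-based bilinear index b_k(w, u) = sum_ij a^k_ij * w_i * u_j.

    stochastic=True divides each row of E^k by its row sum.
    fragment (a set of bond indices) gives a local index: full coefficient if
    both bonds are in the fragment, half if only one is, zero otherwise (ASSUMPTION).
    """
    P = matrix_power(edge_adjacency(bonds), k)
    if stochastic:
        sums = [sum(row) for row in P]
        P = [[v / s if s else 0.0 for v in row] for row, s in zip(P, sums)]
    total = 0.0
    for i, row in enumerate(P):
        for j, a_ij in enumerate(row):
            if fragment is not None:
                a_ij *= ((i in fragment) + (j in fragment)) / 2.0
            total += a_ij * w[i] * u[j]
    return total


# Toy H-depleted graph C0-C1-O2 (ethanol skeleton) with hypothetical atomic weights.
bonds = [(0, 1), (1, 2)]
w = bond_weights(bonds, {0: 1.0, 1: 1.0, 2: 2.0})        # [1.5, 2.5]
u = bond_weights(bonds, {0: 3.0, 1: 3.0, 2: 1.0})        # [4.5, 2.5]
b0 = bilinear_index(bonds, w, u, 0)                       # 13.0
b1 = bilinear_index(bonds, w, u, 1)                       # 15.0
sb1 = bilinear_index(bonds, w, u, 1, stochastic=True)     # 15.0 (each row of E already sums to 1)
b1_local = bilinear_index(bonds, w, u, 1, fragment={1})   # 7.5, heteroatom (C-O) bond only
```

Each molecule then contributes one row of such values, one column per (k, weight pair, total/local, stochastic/non-stochastic) combination, which is the table described in step 3.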

Database construction

The database collected for our study of tyrosinase inhibitory activity consists of 658 compounds in total. Of these, 246 compounds had reported activity against the enzyme tyrosinase, and the remaining 412 organic chemicals were chosen as inactive compounds. In both cases (active and inactive), we considered structural variability an important goal to ensure the quality of our QSAR study.

Among the tyrosinase inhibitors (actives), many different structural subsystems were included. Fig. 1 illustrates some of the most representative tyrosinase reference drugs, together with tyrosinase inhibitors from different families.

Fig. 1

Random, but not exhaustive, sample of the molecular families of tyrosinase inhibitors studied here and some reference drugs

The names of the compounds in the active database, together with their experimental data taken from the literature, are shown in Table 1 of the Supporting Information. Likewise, Table 2 (Supporting Information) depicts the molecular structures of these 246 tyrosinase inhibitors. This dataset provides a helpful tool for research in many chemistry fields related to the tyrosinase enzyme and its inhibitors.

Table 2 Main results of the k-MCAs for tyrosinase inhibitors and inactive drug-like compounds

The remaining 412 compounds, which have diverse pharmacological uses, were selected for the inactive set. All of these chemicals were taken from the Negwer Handbook [63], where their names, synonyms, and structural formulas can be found.

Statistical techniques

The STATISTICA software [64] was used to apply the different statistical methods in this report. First, we employed cluster analysis, a method that recognizes similarities among cases and groups them according to these criteria [65]. In our case, k-MCA (k-means cluster analysis) and k-NNCA (k-nearest neighbors cluster analysis) algorithms were used to design the training and prediction series [64–67]. The dendrograms were obtained using the Euclidean distance and complete linkage; they show the distances between compounds inside the clusters, which are grouped according to the chemical similarity encoded by the MDs used as variables. Linear discriminant analysis (LDA), a simple and very useful technique in drug design, was carried out to find the QSAR models [13, 16, 17, 19, 34, 35, 37, 38, 42–47, 68–73]. Here, forward stepwise selection was used as the variable-selection strategy, and the principle of parsimony (Occam’s razor) was taken into account for model selection.
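The two-class LDA with forward stepwise variable selection described above can be sketched in pure Python as follows. This is not STATISTICA's procedure: it greedily adds the descriptor that most lowers Wilks' λ = det(W)/det(T) (W and T being the within-group and total sums-of-squares-and-cross-products matrices), stops at a fixed maximum number of descriptors as a stand-in for the F-to-enter style criteria used by statistical packages, and assumes equal prior probabilities when placing the decision threshold. All names are hypothetical and the data are toy values.

```python
def col_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp(rows):
    """Sums of squares and cross-products about the rows' own mean."""
    m = col_means(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) for j in range(p)]
            for i in range(p)]


def eliminate(A, b=None):
    """Gaussian elimination with partial pivoting; returns (det(A), solution of A x = b or None)."""
    n = len(A)
    M = [list(row) + ([b[i]] if b is not None else []) for i, row in enumerate(A)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            return 0.0, None  # (numerically) singular matrix
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, len(M[r])):
                M[r][k] -= f * M[c][k]
    if b is None:
        return det, None
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return det, x


def pick(rows, cols):
    return [[r[c] for c in cols] for r in rows]


def wilks_lambda(active, inactive, cols):
    """Wilks' lambda = det(W) / det(T) for the descriptors in cols."""
    a, b = pick(active, cols), pick(inactive, cols)
    W = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sscp(a), sscp(b))]
    det_w, _ = eliminate(W)
    det_t, _ = eliminate(sscp(a + b))
    return det_w / det_t if det_t else 1.0  # degenerate total matrix: no discrimination


def forward_stepwise(active, inactive, max_vars):
    """Add, one at a time, the descriptor that most lowers Wilks' lambda."""
    chosen, remaining = [], list(range(len(active[0])))
    while remaining and len(chosen) < max_vars:
        best = min(remaining, key=lambda c: wilks_lambda(active, inactive, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def fisher_classifier(active, inactive, cols):
    """Fisher LDA with equal priors; returns a function row -> Class (1 or -1)."""
    a, b = pick(active, cols), pick(inactive, cols)
    ma, mb = col_means(a), col_means(b)
    dof = len(a) + len(b) - 2
    Sw = [[(x + y) / dof for x, y in zip(ra, rb)] for ra, rb in zip(sscp(a), sscp(b))]
    _, coef = eliminate(Sw, [p - q for p, q in zip(ma, mb)])
    cut = sum(c * (p + q) / 2 for c, p, q in zip(coef, ma, mb))  # midpoint of class means

    def classify(row):
        score = sum(c * row[j] for c, j in zip(coef, cols)) - cut
        return 1 if score > 0 else -1
    return classify


# Toy data (not from this study): descriptor 0 separates the classes, descriptor 1 does not.
active = [[5, 1], [6, 2], [5, 3], [6, 0]]
inactive = [[1, 1], [2, 2], [1, 3], [2, 0]]
selected = forward_stepwise(active, inactive, max_vars=1)   # [0]
classify = fisher_classifier(active, inactive, selected)
```

Keeping `max_vars` small is a crude way of honouring the parsimony principle mentioned in the text.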

The classification of cases was carried out by means of posterior classification probabilities. Tyrosinase inhibitory activity was codified by a dummy variable “Class”, which indicates either an active compound (Class = 1) or an inactive compound (Class = −1). Using the models, a compound is classified as active if \({\Delta P\% > 0}\), where \({\Delta P\%=[P\hbox{(Active)} - P\hbox{(Inactive)}]\times 100}\), and as inactive otherwise. P(Active) and P(Inactive) are the probabilities with which the equations classify a compound as active or inactive, respectively.
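The ΔP% decision rule above is simple enough to write down directly. The sketch below also shows one common way posterior probabilities arise in LDA. It assumes equal priors and a pooled covariance matrix, so that P(class | x) is proportional to exp(−D²/2), with D² the squared Mahalanobis distance to the class centroid; the text does not state how the software computes its posteriors, so that part is an assumption. Function names are hypothetical.

```python
import math


def posteriors_from_distances(d2_active, d2_inactive):
    """ASSUMPTION: equal priors and pooled covariance (standard LDA), so
    P(class | x) is proportional to exp(-D^2 / 2), with D^2 the squared
    Mahalanobis distance from x to that class centroid."""
    ea, ei = math.exp(-d2_active / 2.0), math.exp(-d2_inactive / 2.0)
    return ea / (ea + ei), ei / (ea + ei)


def delta_p_percent(p_active, p_inactive):
    """Delta P% = [P(Active) - P(Inactive)] x 100."""
    return (p_active - p_inactive) * 100.0


def predicted_class(p_active, p_inactive):
    """Class = 1 (active) if Delta P% > 0, otherwise Class = -1 (inactive)."""
    return 1 if delta_p_percent(p_active, p_inactive) > 0 else -1


# A compound closer to the active centroid (D^2 = 1) than to the inactive one (D^2 = 3).
p_act, p_inact = posteriors_from_distances(1.0, 3.0)
dp = delta_p_percent(p_act, p_inact)       # about 46.2
label = predicted_class(p_act, p_inact)    # 1
```

A tie (ΔP% = 0) falls under "otherwise" and is therefore labelled inactive.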

Randić’s method of orthogonalization was used in this study to avoid the interrelation among the molecular fingerprints [45, 74–79]. This allows a better statistical interpretation of the correlation coefficient and an evaluation of the role of individual MDs in the QSAR model.

The data set was standardized before the orthogonalization process because the MDs included here use entirely different scales. Standardization transforms each variable so that it has a mean of 0 and a standard deviation of 1.
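Standardization followed by Randić-style orthogonalization can be sketched as a sequential Gram–Schmidt procedure. The first descriptor (in order of importance) is kept after standardizing; each later descriptor is replaced by the residual left after removing its projection onto all previously orthogonalized descriptors. Because every standardized column has zero mean, this projection is equivalent to regressing on the earlier orthogonal descriptors with an intercept. The use of the sample standard deviation and the helper names are assumptions for illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale one descriptor to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def randic_orthogonalize(descriptors):
    """descriptors: value vectors ordered by importance (1st, 2nd, ...).

    ^1O is the first standardized descriptor; each later ^mO is the residual of
    the m-th standardized descriptor after removing its projection onto all
    previously orthogonalized descriptors (a Gram-Schmidt step).
    """
    ortho = []
    for values in descriptors:
        resid = standardize(values)
        for o in ortho:
            coef = dot(resid, o) / dot(o, o)
            resid = [r - coef * v for r, v in zip(resid, o)]
        ortho.append(resid)
    return ortho


# Toy descriptor columns for four compounds (not from this study).
D = [[1, 2, 3, 4], [2, 4, 6, 9], [1, 0, 1, 0]]
O = randic_orthogonalize(D)
```

The resulting columns are mutually uncorrelated, so each coefficient in an orthogonalized model reflects the contribution of a single new piece of information.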

Experimental methods

The synthesis and characterization of the 24 tetraketones, their biological studies, and cross-references have been reported elsewhere by other members of our research team [80].

The tyrosinase inhibition assay was performed with kojic acid and l-mimosine as standard inhibitors in a 96-well microplate format using a SpectraMax 340 microplate reader (Molecular Devices, CA, USA), according to the method developed by Hearing [81]. Briefly, the compounds were first screened for o-diphenolase inhibitory activity of tyrosinase using l-DOPA as substrate, and all the active inhibitors from this preliminary screening were subjected to IC50 studies. Compounds were dissolved in methanol to a concentration of 2.5%. Thirty units of mushroom tyrosinase (28 nM; Sigma Chemical Co., USA) were first preincubated with the test compounds in 50 mM Na-phosphate buffer (pH 6.8) for 10 min at 25 °C. Then l-DOPA (0.5 mM) was added to the reaction mixture, and the enzymatic reaction was monitored for 10 min by measuring the change in absorbance at 475 nm (at 37 °C) due to the formation of DOPAchrome. The percentage of inhibition of the enzyme was calculated as follows, using an MS Excel® 2000 (Microsoft Corp., USA)-based program developed for this purpose:

$$ \hbox{Percent inhibition}=[({B}-{S})/{B}]\times 100 $$
(19)

Here, B and S are the absorbances of the blank and the sample, respectively. After the screening, the 50% inhibitory concentrations (IC50) were also calculated. Kojic acid and l-mimosine, used as standard tyrosinase inhibitors, were both purchased from Sigma Chem. Co., USA.
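Eq. 19 translates directly into code. The text does not say how the IC50 values were derived from the inhibition data, so the sketch below assumes one simple option: linear interpolation in log10(concentration) between the two tested concentrations whose inhibitions bracket 50%. Fitting a full dose–response curve would be a common alternative. The readings are hypothetical, and the function names are illustrative only.

```python
import math


def percent_inhibition(blank_abs, sample_abs):
    """Eq. 19: [(B - S) / B] x 100, with absorbances read at 475 nm."""
    return (blank_abs - sample_abs) / blank_abs * 100.0


def ic50_log_interpolation(concentrations, inhibitions):
    """ASSUMPTION: IC50 estimated by linear interpolation in log10(concentration)
    between the two tested concentrations whose inhibitions bracket 50%."""
    points = sorted(zip(concentrations, inhibitions))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        if i1 != i2 and (i1 - 50.0) * (i2 - 50.0) <= 0:
            frac = (50.0 - i1) / (i2 - i1)
            log_ic50 = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    return None  # 50% inhibition not reached within the tested range


# Hypothetical readings (not data from this study).
inhibition = percent_inhibition(0.80, 0.20)                             # 75%
ic50 = ic50_log_interpolation([1.0, 10.0, 100.0], [20.0, 40.0, 60.0])  # about 31.6 (uM)
```

Returning `None` when 50% inhibition is never bracketed avoids extrapolating beyond the tested concentrations.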

Results and discussion

Dividing the training and prediction series through cluster analysis

In the section above we described the database selection process; next, the structural variability of this set must be verified. This is a crucial aspect of any QSAR study because the reliability of the models depends on it. For this reason, different cluster analysis techniques were carried out. First, a k-NNCA was used to verify the structural diversity of the families present in the data. Two dendrograms, one for the active series and one for the inactive series, were obtained through hierarchical cluster analysis (Figs. 2, 3). They show different structural patterns that demonstrate the chemical variability of the database.
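The dendrograms summarize an agglomerative (hierarchical) clustering. The sketch below implements a generic agglomerative procedure with the Euclidean distance and complete linkage stated in the Statistical techniques section: at every step, the two clusters whose farthest members are closest are merged, and the merge history with its linkage distances is exactly what a dendrogram draws. It is not STATISTICA's implementation, the function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def complete_linkage(points):
    """Agglomerative clustering, Euclidean distance, complete linkage.

    Returns the merge history as (members_a, members_b, linkage_distance),
    which is the information a dendrogram draws.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Complete linkage: distance between the two farthest members.
            d = max(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Toy 2-D descriptor vectors for four compounds (not from this study).
history = complete_linkage([(0, 0), (0, 1), (5, 5), (5, 6)])
```

In the toy example, the two close pairs merge first at distance 1, and the final merge happens at the largest cross-pair distance, √61, which is where a dendrogram would join its two main branches.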

Fig. 2

A dendrogram illustrating the results of the hierarchical k-NNCA of the set of tyrosinase inhibitors used in the training and prediction set of the present work

Fig. 3

A dendrogram illustrating the results of the hierarchical k-NNCA of the set of inactive compounds (non-tyrosinase inhibitors) used in the training and prediction set of the present work

Next, the dataset had to be partitioned into training and prediction sets in order to derive the discriminant functions. Because the output dendrograms are difficult to evaluate for this purpose, another kind of CA was needed to select the compounds in a ‘rational’ way.

Therefore, we chose k-MCA to solve this problem and applied it separately to the active and inactive subsets. The first k-MCA (k-MCA I) divided the tyrosinase inhibitors into 10 clusters, whereas k-MCA II split the inactive set into 12 clusters. The variables used were the kth non-stochastic bond-based bilinear indices, and the analyses of variance for these k-MCAs are given in Table 2.

The procedure used to divide the entire database into training and prediction series by cluster analysis is shown in Fig. 4. As the diagram shows, the training set contains 183 active and 295 inactive compounds (478 organic chemicals), whereas the prediction series of 180 compounds contains 63 tyrosinase inhibitors and 117 non-inhibitors of tyrosinase.
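A cluster-based split like the one in Fig. 4 can be sketched as k-means on the descriptor vectors followed by per-cluster selection. The text does not state how compounds were chosen within each cluster, so this sketch assumes a random ~75% of every cluster goes to training and the rest to prediction. For comparison, the reported split puts 183/246 ≈ 74% of the actives and 295/412 ≈ 72% of the inactives in training. The farthest-first initialization is a choice made here to keep the example deterministic, and all names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Lloyd's k-means on descriptor vectors; returns one cluster label per point.

    Farthest-first initialisation keeps this toy example deterministic.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def cluster_based_split(points, k, train_fraction=0.75, seed=0):
    """ASSUMPTION: from every k-means cluster, a random ~train_fraction of the
    members goes to the training set and the rest to the prediction set."""
    labels = kmeans(points, k)
    rng = random.Random(seed)
    train, test = [], []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train += members[:n_train]
        test += members[n_train:]
    return sorted(train), sorted(test)


# Toy example: two well-separated groups of four compounds each (not from this study).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
train_idx, test_idx = cluster_based_split(pts, k=2)
```

Sampling inside each cluster, rather than from the whole set, ensures that every structural family is represented in both series. Applied separately to the actives and inactives, this mirrors the two k-MCAs described in the text.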

Fig. 4

General algorithm used to design the training and test sets by means of k-MCA

Developing the discriminant functions

With a representative training set selected, we proceeded to the next step: finding classification functions that discriminate between active and inactive compounds. For this, we selected LDA as the statistical technique because of its broad use and simplicity.

In total, fourteen models were obtained. Six were developed with the non-stochastic bond-based bilinear indices and six with the stochastic ones, and these equations are given in Table 3. The seventh (last) model of each family (non-stochastic and stochastic molecular fingerprints), which results from combining all pairs of atomic weights (atomic labels), is shown below as Eqs. 32 and 33, respectively:

$$ \begin{aligned} {\bf Class}= &-0.636 -8.422\times 10^{-2}\,{}^{MP}{\varvec b}_{0L}^{H}(\bar{w}_E ,\bar{u}_E ) +0.107\,{}^{MP}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E )\\ &+1.792\times 10^{-2}\,{}^{MK}{\varvec b}_{1L}^{H}(\bar{w}_E ,\bar{u}_E ) -2.373\times 10^{-2}\,{}^{MK}{\varvec b}_{1L}(\bar{w}_E ,\bar{u}_E ) +3.287\times 10^{-5}\,{}^{VP}{\varvec b}_{5}^{H}(\bar{w},\bar{u})\\ &-9.590\times 10^{-2}\,{}^{VP}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E ) +1.166\times 10^{-2}\,{}^{VP}{\varvec b}_{1L}(\bar{w}_E ,\bar{u}_E ) +2.277\times 10^{-2}\,{}^{VK}{\varvec b}_{0}^{H}(\bar{w},\bar{u})\\ &+5.4\times 10^{-3}\,{}^{VK}{\varvec b}_{1}^{H}(\bar{w},\bar{u}) -4.04\times 10^{-3}\,{}^{VK}{\varvec b}_{2}^{H}(\bar{w},\bar{u}) +2.34\times 10^{-2}\,{}^{VK}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E ) \end{aligned} $$
(32)

N = 478, λ = 0.45, D² = 5.13, F = 51.6, canonical R = 0.74, χ² = 374.8, \(Q_{\rm Total}\) = 91.00%, C = 0.81

$$ \begin{aligned} {\bf Class}=& -0.302 +5.290\times 10^{-3}\,{}^{MV}{\varvec b}_{5L}^{H}(\bar{w}_E ,\bar{u}_E ) +6.267\times 10^{-3}\,{}^{MP}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E)\\ &+1.262\times 10^{-2}\,{}^{MK}{\varvec b}_{0}(\bar{w},\bar{u}) -3.458\times 10^{-2}\,{}^{MK}{\varvec b}_{0L}^{H}(\bar{w}_E ,\bar{u}_E) -1.734\times 10^{-2}\,{}^{VP}{\varvec b}_{0}(\bar{w},\bar{u})\\ &+1.286\times 10^{-2}\,{}^{VP}{\varvec b}_{14L}(\bar{w},\bar{u}) -4.840\times 10^{-2}\,{}^{VP}{\varvec b}_{4L}(\bar{w}_E ,\bar{u}_E) +0.129\,{}^{VK}{\varvec b}_{2L}^{H}(\bar{w}_E ,\bar{u}_E)\\ &-0.133\,{}^{VK}{\varvec b}_{3L}^{H}(\bar{w}_E ,\bar{u}_E ) +1.067\times 10^{-2}\,{}^{VK}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E) \end{aligned} $$
(33)

N = 478, λ = 0.46, D² = 5.00, F = 55.4, canonical R = 0.74, χ² = 368.5, \(Q_{\rm Total}\) = 90.17%, C = 0.79

Table 3 Discriminant models obtained with total and local non-stochastic and stochastic bond-based bilinear indices used in this study

The prediction performances of all the models obtained, including these last two equations, are given in Table 4, together with Wilks’ statistic (λ), the squared Mahalanobis distance (D²), and the Fisher ratio (F). The selected models were statistically significant at p < 0.0001.
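For a two-group problem, LDA yields a single canonical discriminant function, so Wilks’ λ and the canonical correlation coefficient \(R_c\) are related by λ = 1 − \(R_c^2\). As a rough consistency check on the statistics reported for Eq. 32, assume that χ² comes from Bartlett’s usual approximation (the text does not say which formula the software applies). With p = 11 descriptors and g = 2 groups:

$$ \lambda = 1 - R_c^{2} \approx 1 - 0.74^{2} \approx 0.45, \qquad \chi^{2} \approx -\left[N - 1 - \frac{p+g}{2}\right]\ln\lambda = -\left[478 - 1 - \frac{11+2}{2}\right]\ln(0.45) \approx 470.5 \times 0.7985 \approx 375.7 $$

The small gap to the reported χ² = 374.8, which has p(g − 1) = 11 degrees of freedom, is what one would expect from λ having been rounded to two decimals.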

Table 4 Prediction performances and statistical parameters for LDA-based QSAR models in the training set

The fitted models 32 and 33, obtained by combining weighting schemes for the non-stochastic and stochastic bond-level bilinear indices, respectively, exhibit the best results, as Table 4 shows. These two equations correctly classified 91.00% and 90.17% of the training set and showed Matthews correlation coefficients (C) of 0.81 and 0.79, respectively. The parameters most commonly used in medical statistics are also given for all the models in Table 4.
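The statistics used to compare the models can all be computed from a two-class confusion matrix, with actives as positives. Table 4's exact definitions are not restated in the text, so the sketch below uses textbook definitions: accuracy (\(Q_{\rm Total}\)), sensitivity, specificity, false-positive rate, and the Matthews correlation coefficient C. The counts are hypothetical and are not taken from Table 4.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Two-class statistics from a confusion matrix (actives = positives)."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)          # Q_Total
    sensitivity = tp / (tp + fn)                         # actives correctly recovered
    specificity = tn / (tn + fp)                         # inactives correctly recovered
    false_positive_rate = fp / (fp + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews C
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "false_positive_rate": false_positive_rate,
            "mcc": mcc}


# Hypothetical confusion matrix (not taken from Table 4).
m = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

C is the more informative of these when the classes are unbalanced, as here (more inactives than actives). It stays near 0 for a model that simply predicts the majority class, whereas accuracy can still look high for such a model.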

Although these two best models exhibited good results, interpreting the individual role of each index in the model can be difficult because of the interrelation among them (data not shown). This prompted us to use Randić’s orthogonalization process to avoid this problem and eliminate the collinearity between the variables [74–78].

Eqs. 34 and 35 show the results of the orthogonalization process for the best non-stochastic and stochastic bilinear-index models, respectively.

$$ \begin{aligned} {\bf Class} =&-0.331 -1.515\,{}^{1}O({}^{VP}{\varvec b}_{1L}^{H}(\bar{w}_E ,\bar{u}_E )) +2.037\,{}^{2}O({}^{VK}{\varvec b}_{1}^{H}(\bar{w},\bar{u}))\; 2.406\,{}^{3}O({}^{VK}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E ))\\ &-3.176\,{}^{4}O({}^{VK}{\varvec b}_{0}^{H}(\bar{w},\bar{u})) +0.805\,{}^{5}O({}^{VP}{\varvec b}_{5}^{H}(\bar{w},\bar{u})) -6.540\,{}^{6}O({}^{VK}{\varvec b}_{2}^{H}(\bar{w},\bar{u}))\\ &-1.959\,{}^{7}O({}^{VP}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E )) -1.015\,{}^{8}O({}^{MK}{\varvec b}_{1L}^{H}(\bar{w}_E ,\bar{u}_E )) +1.913\,{}^{9}O({}^{MP}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E ))\\ &-5.997\,{}^{10}O({}^{MK}{\varvec b}_{1L}(\bar{w}_E ,\bar{u}_E )) -16.337\,{}^{11}O({}^{MP}{\varvec b}_{0L}^{H}(\bar{w}_E ,\bar{u}_E )) \end{aligned} $$
(34)

N = 478, λ = 0.45, D² = 5.13, F = 51.6, canonical R = 0.74, χ² = 374.8, \(Q_{\rm Total}\) = 91.00%, C = 0.81.

$$ \begin{aligned} {\bf Class} =& -2.414\times 10^{-2} -1.197\,{}^{1}O({}^{VP}{\varvec b}_{4L}(\bar{w}_E ,\bar{u}_E )) +4.346\,{}^{2}O({}^{VK}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E ))\\ &+1.075\,{}^{3}O({}^{VP}{\varvec b}_{14}^{H}(\bar{w},\bar{u})) -3.196\,{}^{4}O({}^{VP}{\varvec b}_{0}(\bar{w},\bar{u})) +0.899\,{}^{5}O({}^{MP}{\varvec b}_{0L}(\bar{w}_E ,\bar{u}_E ))\\ &-1.197\,{}^{6}O({}^{MK}{\varvec b}_{0L}^{H}(\bar{w}_E ,\bar{u}_E )) +5.474\,{}^{7}O({}^{MK}{\varvec b}_{0}(\bar{w},\bar{u})) +2.093\,{}^{8}O({}^{VK}{\varvec b}_{2L}^{H}(\bar{w}_E ,\bar{u}_E ))\\ &-7.371\,{}^{9}O({}^{VK}{\varvec b}_{3L}^{H}(\bar{w}_E ,\bar{u}_E )) +3.461\,{}^{10}O({}^{MV}{\varvec b}_{5L}^{H}(\bar{w}_E ,\bar{u}_E )) \end{aligned} $$
(35)

N = 478, λ = 0.46, D² = 5.00, F = 55.4, canonical R = 0.74, χ² = 368.5, \(Q_{\rm Total}\) = 90.17%, C = 0.79.

Here, the symbol \({{}^{m}O[{\varvec b}_{k}(\bar{w},\bar{u})]}\) is used, where the superscript m expresses the order of importance of the variable \({{\varvec b}_{k}(\bar{w},\bar{u})}\) after a preliminary forward-stepwise analysis and O means orthogonal. Comparing the statistical parameters of each model before and after orthogonalization shows that they are the same for the non-orthogonal and orthogonal descriptors. This is expected, because the orthogonal descriptors are linear combinations of the original ones and span the same descriptor space.

Assessing the predictive power of the models

External validation, most commonly with a test set, is necessary to ensure the quality and extrapolation power of the QSAR models found in this report [82, 83]. With this aim, all the equations were evaluated, and the results are shown in Table 5. The two best discriminant functions, Eqs. 32 and 33, showed overall accuracies of 93.33% (C = 0.85) and 88.89% (C = 0.77), respectively. Plots of ΔP% for the entire dataset using models (32) and (33) are shown in Figs. 5 and 6.

Table 5 Prediction performances for LDA-based QSAR models in the test set
Fig. 5

Plot of the ΔP% from Eq. 32 (using non-stochastic bond-based bilinear indices) for each compound in the training and test sets. Compounds 1–183 and 184–246 are active (tyrosinase inhibitors) in the training and test sets, respectively; chemicals 247–541 and 542–658 are inactive (non-tyrosinase inhibitors) in the training and test sets, respectively

Fig. 6

Plot of the ΔP% from Eq. 33 (using stochastic bond-based bilinear indices) for each compound in the training and test sets. Compounds 1–183 and 184–246 are active (tyrosinase inhibitors) in the training and test sets, respectively; chemicals 247–541 and 542–658 are inactive (non-tyrosinase inhibitors) in the training and test sets, respectively

The classification results of all fourteen models for all the active and inactive organic chemicals in the training and external series are shown in Tables 3–10 of the Supporting Information.

Simulated virtual screening of new tyrosinase inhibitors

The good results obtained above encouraged us to further extend this novel approach to the in silico discovery of new tyrosinase inhibitors. Virtual high-throughput screening (HTS) can become an important tool for querying databases of thousands of compounds, and it has the potential to transform early-stage drug discovery. To test the ability of our models, a simulated virtual screening was carried out on a set of 104 organic chemicals (Table 6) reported in the literature as active or inactive (see the last column of Table 6: Ref). The molecular structures of these compounds are shown in Table 11 of the Supporting Information.

Table 6 Results of the virtual screening

In addition, to confirm that our models can identify several classes of compounds, a k-NNCA was carried out on this dataset; the resulting dendrogram (Fig. 7) reveals great molecular diversity. Table 6 gives the classification results for the 104 compounds, and the posterior classification probabilities (including canonical scores) for all the equations are summarized in Table 12 (Supporting Information). The overall percentages of correct classification were 85.57% and 84.61% for the non-stochastic and stochastic molecular descriptors, respectively.

Fig. 7

A dendrogram illustrating the results of the hierarchical k-NNCA of the set of active/inactive chemicals used for evaluating the predictive ability of the QSAR models for ligand-based virtual screening

This method could be very useful because it can be applied to large databases of drug-like compounds, and some of the compounds identified could then be reported with the new biological activity. Moreover, such chemicals usually have well-established synthetic methods, and their toxicological, pharmacodynamic, and pharmaceutical properties are well known.

Biosilico identification of novel tyrosinase inhibitors and experimental corroboration

The entire algorithm described in the sections above was designed mainly to explore the possibilities of the current in silico approach for identifying hits in large databases. In this sense, an in silico screening of novel compounds for the biological activity of interest in this work was performed. To do this, a pool of compounds never described in the literature as tyrosinase inhibitors was chosen. The in silico assays were then carried out with all the models developed in this report to find chemicals with tyrosinase inhibitory activity.

Here, 24 tetraketones were evaluated with the LDA-based QSAR models, and in vitro assays of the synthesized compounds were carried out to corroborate the in silico predictions. The posterior classification probabilities (ΔP%) obtained with all the equations are shown in Table 7. The theoretical predictions and the experimental results agree well for all the organic chemicals, all of which were active against the tyrosinase enzyme in the in vitro assays. Notably, most compounds were more potent (lower IC50) than kojic acid (standard tyrosinase inhibitor: IC50 = 16.67 μM), with the exceptions of TK2 (IC50 = 26.63 μM), TK4 (IC50 = 16.99 μM), TK7 (IC50 = 19.73 μM), and TK19 (IC50 = 71.47 μM). Moreover, four chemicals, TK10 (IC50 = 2.09 μM), TK11 (IC50 = 2.61 μM), TK21 (IC50 = 2.06 μM), and TK23 (IC50 = 3.19 μM), were more potent than the reference drug l-mimosine (IC50 = 3.68 μM). Table 8 depicts the molecular structures of these and the other tetraketones used in this study.

Table 7 Results of ligand-based in silico screening and tyrosinase inhibitory activities of new tetraketones
Table 8 Molecular structure of the new tetraketones

As a final point, a hierarchical cluster analysis was performed on all the active compounds of the training, test, and virtual screening sets together with the new tetraketones (Fig. 8). The aim of this k-NNCA was to determine whether the novel bioactive chemicals are similar to any subsystems in the rest of the active database. After an exhaustive analysis of each cluster, we observed that the tetraketones were distributed among many clusters, which is reasonable because this class of compounds shares no common structural features with the compounds in the active database. Therefore, they can be selected for structural optimization aimed at more potent tyrosinase inhibitory activity. Afterward, a complete study of their ADMET properties should be carried out before these newly discovered organic chemicals can enter the drug development pipeline.

Fig. 8

A dendrogram illustrating the results of the hierarchical k-NNCA of the set of all active chemicals (tyrosinase inhibitors) included in the training, test, and virtual screening sets, together with the new active tetraketones discovered in the present work

Summary and outlook

Many studies on tyrosinase inhibition focus on finding novel inhibitors from different sources, owing to their wide applications as food additives and depigmentation agents, in the treatment of melanogenesis disorders, in the control of insect pests, and so on. The pharmaceutical, cosmetic, and agricultural sciences are interested in this kind of chemical because of its broad spectrum of applications and the wide distribution of tyrosinase throughout the phylogenetic scale.

The advent of virtual high-throughput screening (vHTS), which brings in silico techniques into drug discovery, offers solutions that enable research to proceed faster and more efficiently. These new algorithms, arising from the convergence of information technology and drug discovery, can help accelerate drug discovery and the identification of higher-quality compounds. Nevertheless, the search for new tyrosinase inhibitors has so far relied on traditional trial-and-error methods [84, 85].

Taking all this into consideration, we used the non-stochastic and stochastic bond-based bilinear indices, a new set of MDs, together with pattern recognition techniques to discriminate active compounds from inactive ones. The QSAR models found here were then used in a virtual screening to move from in silico to ‘real-world’ applications. In addition, we report the biosilico identification of a novel tetraketone family of tyrosinase inhibitors using the new molecular fingerprints. In vitro assays were also carried out to demonstrate the usefulness of the TOMOCOMD-CARDD descriptors for the rational design of new bioactive agents.

Work of this kind addresses new challenges for the pharmaceutical industry, because modern drug discovery research requires training and experience in multiple life-science domains as well as in computer science [86]. Finally, we expect the present report to open the way to many exciting new insights in tyrosinase inhibitor research for the treatment of hyperpigmentation and melanogenesis disorders in the years ahead.

Supporting information available

The complete list of compounds used in the training and prediction sets, together with their structures, their posterior classifications and scores according to the LDA-based QSAR models, and the chemistry and data analysis of the obtained chemicals, is available free of charge via the Internet at...