Abstract
A methodology is proposed to build statistical test procedures for models with incomplete information; the lack of information corresponds to a nuisance parameter in the description of the model. The supremal approach based on the dual representation of CASM divergences (or \(f\)-divergences) is fruitful: it leads to M-estimators with a simple and standard limit distribution, and it is versatile with respect to the choice of the divergence. Duality approaches to divergence-based optimisation are widely considered in statistics, data analysis and machine learning; indeed, they avoid any smoothing or grouping technique which would be necessary for a more direct divergence minimisation approach to the same problem.
We are interested in a widely considered but still open problem which consists in testing the number of components in a parametric mixture. Although common, this is still a challenging problem since the corresponding model is non-regular particularly because of the true parameter lying on the boundary of the parameter space. This range of problems has been considered by many authors who tried to derive the asymptotic distribution of some statistic under boundary conditions. The present approach based on supremal divergence M-estimators makes the true parameter an interior point of the parameter space, providing a simple solution for a difficult question. To build a composite test, we aggregate simple tests.
Keywords
- Non-regular models
- Dual form of f-divergences
- Statistical test aggregation
- Number of components in mixture models
1 Dual Representation of the \(\varphi \)-Divergences and Tests
We consider CASM divergences (see [15] for definitions and properties):
where Q and P are probability measures on the same probability space. Extensions to divergences between probability measures and signed measures can be found in [22]. Dual formulations of divergences can be found in [7, 16]. Another interpretation of these formulations can be found in [9, Section 4.6]. They are widely considered in statistics, data analysis and machine learning (see e.g. [4, 20]).
As in [1], let \(\mathcal F\) be some class of \(\mathcal B\)-measurable (Borel) real-valued functions and let \(\mathcal M_{\mathcal F} = \{P\in \mathcal M : \int |f| dP < \infty , \forall f\in \mathcal F\}\) where \(\mathcal M\) is the space of probability measures. Let \(P^*\in \mathcal M\) be arbitrary; in the statistical context of the following sections it will be the underlying true unknown probability law. Assume that \(\varphi \) is differentiable and strictly convex. Then, for all \(P\in \mathcal M_{\mathcal F}\) such that \(D_\varphi (P,P^*)\) is finite and \(\varphi '(dP/dP^*)\) belongs to \(\mathcal F\), \(D_\varphi \) admits the dual representation (see Theorem 4.4 in [6]):
where \(\varphi ^\#(x) = \sup _{t\in \mathbb R} tx-\varphi (t)\) is the Fenchel-Legendre convex conjugate. Moreover, the supremum is uniquely attained at \(f = \varphi '(dP/dP^*)\).
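The duality statement can be checked numerically on a finite sample space. The sketch below assumes the usual forms \(D_\varphi (P,P^*) = \int \varphi (dP/dP^*)\, dP^*\) and \(D_\varphi (P,P^*) = \sup _{f} \{\int f\, dP - \int \varphi ^\#(f)\, dP^*\}\), and uses the KLm generator \(\varphi (t) = -\log t + t - 1\), whose conjugate is \(\varphi ^\#(s) = -\log (1-s)\) for \(s < 1\). The two distributions and all function names are hypothetical choices made for illustration only.

```python
import math
import random

def phi(t):
    # KLm generator: phi(t) = -log t + t - 1, for t > 0
    return -math.log(t) + t - 1.0

def phi_prime(t):
    return 1.0 - 1.0 / t

def phi_conj(s):
    # Conjugate of the KLm generator: sup_t {t*s - phi(t)} is attained
    # at t = 1/(1-s) and equals -log(1-s), for s < 1
    return -math.log(1.0 - s)

def divergence(p, p_star):
    # D_phi(P, P*) = sum_i p*_i phi(p_i / p*_i) on a finite space
    return sum(q * phi(pi / q) for pi, q in zip(p, p_star))

def dual_objective(f, p, p_star):
    # f -> int f dP - int phi^#(f) dP*
    return (sum(fi * pi for fi, pi in zip(f, p))
            - sum(phi_conj(fi) * q for fi, q in zip(f, p_star)))

# Two arbitrary distributions on three points (illustrative values)
p = [0.2, 0.5, 0.3]
p_star = [0.4, 0.4, 0.2]

# Candidate optimiser f = phi'(dP/dP*)
f_opt = [phi_prime(pi / q) for pi, q in zip(p, p_star)]
d = divergence(p, p_star)
j_opt = dual_objective(f_opt, p, p_star)

# Random perturbations of f_opt (kept below 1, where phi_conj is finite)
rng = random.Random(0)
perturbed = []
for _ in range(1000):
    f = [min(fi + rng.uniform(-0.3, 0.3), 0.99) for fi in f_opt]
    perturbed.append(dual_objective(f, p, p_star))

print(d, j_opt, max(perturbed))
```

In this toy setting the dual objective evaluated at \(f = \varphi '(dP/dP^*)\) coincides with the divergence, and none of the perturbed functions exceeds it, which is what uniqueness of the optimiser suggests.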
This result can be used in two directions. First, a statistical model, e.g. a parametric model \(\{P_\theta : \theta \in \varTheta \}\) with \(P_\theta \) absolutely continuous with respect to some dominating measure \(\mu \) for any \(\theta \), naturally induces a family \(\mathcal F = \{\varphi '(p_\theta /p_{\theta '}) : \theta ,\theta '\in \varTheta \}\). This is the main framework of this paper.
Conversely, a class of functions \(\mathcal F\) defines the distribution pairs P and Q that can be compared, namely those such that \(\varphi '(dP/dQ)\in \mathcal F\). Furthermore, it induces a divergence \(D_\varphi \) on these pairs. A typical example is the logistic model.
The KLm divergence is defined by the generator \(\varphi : x\in \mathbb R^{+*} \mapsto -\log x + x -1\) and leads to the maximum likelihood estimator for both forms of the supremal estimator, one of which is defined below (see Remark 3.2 in [7]).
We consider in this paper the problem of testing the number of components in a mixture model. This question has been considered by various authors. [2, 10, 12, 14, 17] considered likelihood ratio tests and showed some difficulties with them, due to the fact that the likelihood ratio statistic is unbounded with respect to n. [17] prove that its distribution is driven by a \(\log \log n\) term in a specific simple Gaussian mixture model. The test statistic needs to be calibrated in accordance with this result. But first, as stated by [17], the convergence to the limit distribution is extremely slow, which makes this result impractical. And second, it seems very difficult to derive the corresponding term for a different model, and even more so for a general situation.
Our approach to this problem is suggested by the dual representation of the divergence. For the KLm divergence, it amounts to considering the maximum likelihood estimator itself as a test statistic instead of the usual maximum value of the likelihood function. This leads to a well-defined limiting distribution for the test statistic under the null. This holds for a class of estimators obtained by substituting KLm by any regular divergence. This approach also eliminates the curse of irregularity encountered by many authors for the problem of testing the number of components in a mixture.
Since we are interested in composite hypotheses, there is no justification in this context for regarding the likelihood ratio test as the best one (in terms of uniform power), as is usually done (e.g. [8, 18]); moreover, [13] showed what difficulties likelihood ratio tests can encounter in this context.
[8] considered tests based on an estimation of the minimum divergence between the true distribution and the null model. Here we make use of the uniqueness of the optimiser of the dual representation of the divergence in (1) and of the supremal divergence estimator introduced by [7]. An immediate practical advantage of this choice, as compared to estimating the minimum divergence, is that one fewer optimisation is needed. Moreover, [23] showed that this estimator is robust for several choices of the divergence.
Our procedure for composite hypotheses consists in the aggregation of simple tests in the spirit of [11]. [5] used a similar aggregation procedure for testing between two distributions under noisy data and obtained some control of the resulting test power.
2 Notation and Hypotheses
Let \(\{f_1(\,.\,;\theta _1):\theta _1\in \varTheta _1\}\), \(\varTheta _1\subset \mathbb R^p\), and \(\{f_2(\,.\,;\theta _2):\theta _2\in \varTheta _2\}\), \(\varTheta _2\subset \mathbb R^q\), be probability density families with respect to a \(\sigma \)-finite measure \(\lambda \) on \((\mathcal X, \mathcal B)\). For some fixed open interval \(]a,b[ \ni 0\), let \(\varTheta \subset ]a,b[ \times \varTheta _1 \times \varTheta _2\), and
for any \((\pi ,\theta )\in \varTheta \) with \(\theta = (\theta _1,\theta _2)\).
Assume that \(x_1,\dots ,x_n\in \mathbb R\) have been observed and are modelled as a realisation of the i.i.d. sample \(X_1,\dots ,X_n\) whose distribution \(\mathbb P^* := g_{\pi ^*,\theta ^*}.\lambda \) is known up to the parameters \((\pi ^*,\theta ^*)\in \varTheta \). Our aim is to test the hypothesis \(H_0 : \pi ^* = 0\).
Assume that \(g_{\pi , \theta } = g_{\pi ^*, \theta ^*} \Rightarrow \pi = \pi ^*, \theta _1 = \theta _1^* \text { and, if } \pi ^* \ne 0 \text {, } \theta _2 = \theta _2^*\).
Let g be a probability density with respect to \(\lambda \) such that \(Supp(g) \subset Supp(g_{\pi ,\theta })\) for any \((\pi ,\theta )\in \varTheta \) such that
Let us define for any \((\pi ,\theta ) \in \varTheta \),
and assume that \((\pi ,\theta )\mapsto m_{\pi ,\theta }(x)\) is continuous for any \(x\in \mathcal X \). Let us also assume that
where \(d(\cdot ,\cdot )\) denotes the Euclidean distance and where, as usual, the operator-type notation \(\mathbb P^*Y\) denotes the expectation—with respect to the probability measure \(\mathbb P^*\)—of the random variable Y.
Theorem 1
For any \((\pi ^*,\theta ^*)\in \varTheta \)
which we call the supremal form of the divergence. Moreover attainment holds uniquely at \((\pi ,\theta ) = (\pi ^*,\theta ^*)\).
Definition 1
Let \(\mathbb {P}_{n}\) denote the empirical measure pertaining to the sample \(X_{1},\dots ,X_{n}\). Define
the supremal estimator of \(\left( \pi ^{*},\theta ^{*}\right) \).
The existence of \(( \hat{\pi },\hat{\theta })\) can be guaranteed by assuming that \(\varTheta \) is compact. When uniqueness does not hold, consider any maximizer. This class of estimators was introduced in [7].
3 Consistency of the Supremal Divergence Estimator
Let us first state the consistency of the supremal divergence estimator of the proportion and the parameters of the existing component, when the non-existing component parameters are fixed, uniformly over the latter.
Here and below, by abuse of notation, we let \(\varphi '\bigl (\frac{g}{g_{\pi ,\theta }}\bigr )\) stand for \(x\mapsto \varphi '\bigl (\frac{g(x)}{g_{\pi ,\theta }(x)}\bigr )\), and so on.
Note that, for \(\pi ^* = 0\) and any \(\theta _1^*\in \varTheta _1\) and \(\theta _2\in \varTheta _2\), we can unambiguously write \(m_{\pi ^*,\theta _1^*}\) for \(m_{\pi ^*,\theta _1^*,\theta _2}\) since the parameter \(\theta _2\) is not involved in the expression of \(m_{0,\theta _1^*,\theta _2}\).
Theorem 2
Assume that \(\pi ^* = 0\) and let for any \(\theta _2\in \varTheta _2\), \((\hat{\pi }(\theta _2),\hat{\theta }_1(\theta _2))\in ]a,b[ \times \varTheta _1\) such that
Then
The convergence holds a.s. in the particular case of (2) when, a.s.,
4 Asymptotic Distribution of the Supremal Divergence Estimator
Under \(H_0\) (\(\pi ^* = 0\)), the joint asymptotic distribution of \((\hat{\pi }(\theta _2),\hat{\pi }(\theta _2'))\) is provided by the following theorem. The interior of \(\varTheta \) will be denoted by \(\mathring{\varTheta }\).
Theorem 3
Let \(\theta _2\in \varTheta _2\) and \(\theta _2'\in \varTheta _2\) such that \((\pi ^*,\theta _1^*,\theta _2)\in \mathring{\varTheta }\) and \((\pi ^*,\theta _1^*,\theta _2')\in \mathring{\varTheta }\). Write
and let \((\hat{\pi },\hat{\theta }_1)\) and \((\hat{\pi }',\hat{\theta }_1')\) be such that
Assume that \(\pi ^* = 0\).
Moreover, assume that:
- \((\pi ,\theta _1)\in \varTheta (\theta _2)\mapsto m_{\pi ,\theta _1,\theta _2}(x)\) (resp. \((\pi ,\theta _1)\in \varTheta (\theta _2')\mapsto m_{\pi ,\theta _1,\theta _2'}(x)\)) is differentiable \(\lambda \)-a.e. with derivative \(\psi _{\pi ,\theta _1} = \begin{pmatrix} \frac{\partial }{\partial \pi }m_{\pi ,\theta _1,\theta _2} \\ \frac{\partial }{\partial \theta _1}m_{\pi ,\theta _1,\theta _2} \end{pmatrix}_{|(\pi ,\theta _1)}\) (resp. \(\psi _{\pi ,\theta _1}' = \begin{pmatrix} \frac{\partial }{\partial \pi }m_{\pi ,\theta _1,\theta _2'} \\ \frac{\partial }{\partial \theta _1}m_{\pi ,\theta _1,\theta _2'} \end{pmatrix}_{|(\pi ,\theta _1)}\)) such that \(P^* \psi _{\pi ^*,\theta _1^*} = 0\) (resp. \(P^* \psi _{\pi ^*,\theta _1^*}' = 0\)).
- \((\pi ,\theta _1)\in \varTheta (\theta _2) \mapsto P^* \psi _{\pi ,\theta _1}\) (resp. \((\pi ,\theta _1)\in \varTheta (\theta _2') \mapsto P^* \psi _{\pi ,\theta _1}'\)) is differentiable at \({\pi ^*,\theta _1^*}\) with invertible derivative matrix \(H = D{\left( P^*\psi \right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \) (resp. \(H' = D{\left( P^*\psi '\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \)).
- \(\{\psi _{\pi ,\theta _1} : (\pi ,\theta _1)\in \varTheta (\theta _2) \}\) and \(\{\psi _{\pi ,\theta _1}' : (\pi ,\theta _1)\in \varTheta (\theta _2') \}\) are \(P^*\)-Donsker.
- \(\int (\psi _{\hat{\pi },\hat{\theta }_1}(x) - \psi _{\pi ^*,\theta _1^*}(x))^2 dP^*(x) \xrightarrow []{P^*}0\) and \(\int (\psi _{\hat{\pi },\hat{\theta }_1}'(x) - \psi _{\pi ^*,\theta _1^*}'(x))^2 dP^*(x) \xrightarrow []{P^*}0\).
Assume that \(H {=} D{\left( P^*\psi \right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } {=} P^*D^2{\left( h\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \) (resp. \(H' {=} D{\left( P^*\psi '\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } {=}\) \( P^*D^2{\left( h'\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } \)) with \(P^* |D^2{\left( h\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } | < \infty \) (resp. \(P^* |D^2{\left( h'\right) }_{\left| {\tiny \begin{pmatrix} \pi ^*\\ \theta _1^*\end{pmatrix}}\right. } | < \infty \)) where \(h : (\pi ,\theta _1)\in \varTheta (\theta _2)\mapsto m_{\pi ,\theta _1,\theta _2}(x)\) (resp. \(h' : (\pi ,\theta _1)\in \varTheta (\theta _2')\mapsto m_{\pi ,\theta _1,\theta _2'}(x)\)).
Then with \(a_n\) (resp. \(a_n'\)) being the (1, 1)-entry of the matrix \(H_n^{-1} \cdot \bigl (P_n\psi _{\hat{\pi },\hat{\theta }_1}\psi _{\hat{\pi },\hat{\theta }_1}^T\bigr ) \cdot H_n^{-1}\) (resp. of \(H_n'^{-1} \cdot \bigl ( P_n\psi _{\hat{\pi },\hat{\theta }_1}'\psi _{\hat{\pi },\hat{\theta }_1}'^T\bigr ) \cdot H_n'^{-1}\))—where \(H_n\) (resp. \(H_n'\)) denotes the Hessian matrix of \((\pi ,\theta _1) \mapsto P_nm_{\pi ,\theta _1,\theta _2}\) (resp. of \((\pi ,\theta _1) \mapsto P_nm_{\pi ,\theta _1,\theta _2'}\)) at the point \((\hat{\pi },\hat{\theta }_1)\), which is supposed to be invertible with high probability—one gets
with
where b is the (1, 1)-entry of the matrix \(H^{-1} \cdot \bigl (\mathbb {P}\psi _{\pi ^*,\theta _1^*}\psi _{\pi ^*,\theta _1^*}'^T\bigr ) \cdot H'^{-1}\), and a (resp. \(a'\)) the (1, 1)-entry of the matrix \(H^{-1} \cdot \bigl (\mathbb {P}\psi _{\pi ^*,\theta _1^*}\psi _{\pi ^*,\theta _1^*}^T\bigr ) \cdot H^{-1}\) (resp. of \(H'^{-1} \cdot \bigl (\mathbb {P}\psi _{\pi ^*,\theta _1^*}'\psi _{\pi ^*,\theta _1^*}'^T\bigr ) \cdot H'^{-1}\)).
This result naturally generalises to k-tuples. The marginal result for \(\theta \) actually also holds when \(\pi ^* > 0\) and \(\theta _2 = \theta _2^*\), which is useful to control the power of the test procedure to be defined.
Let us consider as a test statistic \(T_n = \sup _{\theta _2} \sqrt{\frac{n}{a_n}} \hat{\pi }\) and let us reject \(H_0\) when \(T_n\) takes large values. It seems sensible to standardise \(\hat{\pi }\) for each value of \(\theta _2\) so that, under \(H_0\), it is asymptotically distributed as \(\mathcal N(0,1)\) and the standardised values of \(\hat{\pi }\) for different values of \(\theta _2\) can be compared. In practice, the asymptotic variance has to be estimated, hence the substitution of \(\hat{\pi }\), \(\hat{\theta }_1\), and \(P_n\) for \(\pi ^*\), \(\theta _1^*\), and \(P^*\) in \(H^{-1} P^* \psi _{\pi ^*,\theta _1^*}\psi _{\pi ^*,\theta _1^*}^T H^{-1}\). This choice is justified in [19].
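As an illustration of the standardisation \(\sqrt{n/a_n}\,\hat{\pi }\) for one fixed value of \(\theta _2\), here is a minimal sketch under several assumptions that are not part of the results above: the divergence is KLm, so that the estimator is the maximum likelihood estimator and \(\psi \) is the score; both components are unit-variance Gaussians, \(g_{\pi ,\theta } = (1-\pi )\mathcal N(\theta _1,1) + \pi \mathcal N(\theta _2,1)\); and the maximisation uses a BHHH-type ascent with step halving, chosen here only for convenience. All function names are hypothetical.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)

def norm_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / SQRT_2PI

def loglik(xs, pi, mu, theta2):
    """Mean log-density of g = (1-pi) N(mu,1) + pi N(theta2,1); -inf if g <= 0."""
    total = 0.0
    for x in xs:
        g = (1.0 - pi) * norm_pdf(x, mu) + pi * norm_pdf(x, theta2)
        if g <= 0.0:
            return -math.inf
        total += math.log(g)
    return total / len(xs)

def moments(xs, pi, mu, theta2):
    """Return P_n psi, P_n psi psi^T and the Hessian H_n of (pi, mu) -> P_n log g."""
    n = len(xs)
    G = [0.0, 0.0]
    V = [[0.0, 0.0], [0.0, 0.0]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        f1, f2 = norm_pdf(x, mu), norm_pdf(x, theta2)
        g = (1.0 - pi) * f1 + pi * f2
        s = [(f2 - f1) / g, (1.0 - pi) * f1 * (x - mu) / g]   # score in (pi, mu)
        h01 = -f1 * (x - mu) / g - s[0] * s[1]
        h11 = (1.0 - pi) * f1 * ((x - mu) ** 2 - 1.0) / g - s[1] ** 2
        h = [[-s[0] ** 2, h01], [h01, h11]]
        for i in range(2):
            G[i] += s[i] / n
            for j in range(2):
                V[i][j] += s[i] * s[j] / n
                H[i][j] += h[i][j] / n
    return G, V, H

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def standardised_pi(xs, theta2, iters=200, tol=1e-12):
    """Return (pi_hat, mu_hat, a_n, sqrt(n / a_n) * pi_hat) for a fixed theta2."""
    pi, mu = 0.0, sum(xs) / len(xs)
    for _ in range(iters):
        G, V, _ = moments(xs, pi, mu, theta2)
        Vi = inv2(V)
        step = [Vi[0][0] * G[0] + Vi[0][1] * G[1],
                Vi[1][0] * G[0] + Vi[1][1] * G[1]]   # BHHH ascent direction
        if G[0] * step[0] + G[1] * step[1] < tol:
            break
        t, base = 1.0, loglik(xs, pi, mu, theta2)
        while t > 1e-8 and loglik(xs, pi + t * step[0], mu + t * step[1], theta2) < base:
            t *= 0.5                                  # step halving
        if t <= 1e-8:
            break
        pi, mu = pi + t * step[0], mu + t * step[1]
    _, V, H = moments(xs, pi, mu, theta2)
    Hi = inv2(H)
    a_n = matmul(matmul(Hi, V), Hi)[0][0]             # (1, 1)-entry of the sandwich
    return pi, mu, a_n, math.sqrt(len(xs) / a_n) * pi

# Data simulated under H0 (a single N(0, 1) component), with theta2 fixed at 1.5
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]
pi_hat, mu_hat, a_n, stat = standardised_pi(sample, theta2=1.5)
print(pi_hat, mu_hat, a_n, stat)
```

Note that \(\hat{\pi }\) is allowed to be negative here, in line with \(\pi \) ranging over an open interval containing 0; the log-likelihood guard simply rejects parameter values for which the fitted density is not positive at some observation.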
The Bonferroni aggregation rule is not sensible here: the tests for different values of \(\theta _2\) are obviously not independent, so that such a procedure would lead to a conservative test. Hence the need in Theorem 3 for the joint asymptotic distribution, which takes into account the dependence between the values of \(\hat{\pi }\) for different values of \(\theta _2\). This leads to the study of the asymptotic distribution of \(T_n\), which should be the distribution of \(\sup W\) where W is a Gaussian process whose covariance structure is given by Theorem 3. This is addressed in the next section.
5 Asymptotic Distribution of the Supremum of Supremal Divergence Estimators
\(H_0\) is assumed to hold in this section.
We first state that the asymptotic distribution of \(T_n\) is that of the supremum of a Gaussian process with covariance \(\frac{b}{\sqrt{aa'}}\), as in (4).
We then state that the distribution of the latter can be approximated by maximising, over a finite grid of values of \(\theta _2\), the Gaussian process with covariance \(\frac{b_n}{\sqrt{a_na_n'}}\), where \(a_n\), \(a_n'\), and \(b_n\) are estimates of the corresponding quantities.
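The grid approximation can be sketched by Monte Carlo: draw centred Gaussian vectors with a given correlation matrix over the grid and take the empirical quantile of their maximum. In the sketch below the correlation matrix is a hypothetical placeholder, \(\exp (-|\theta _2-\theta _2'|)\), standing in for the estimated \(\frac{b_n}{\sqrt{a_na_n'}}\); the grid, the level and the function names are likewise assumptions made for illustration. The one-sided Bonferroni threshold is computed alongside for comparison.

```python
import math
import random
from statistics import NormalDist

def cholesky(c):
    """Lower-triangular L with L L^T = c, for a symmetric positive definite c."""
    k = len(c)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = c[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sup_quantile(corr, level, draws=20000, seed=0):
    """Monte Carlo (1 - level)-quantile of max_i W_i, where W ~ N(0, corr)."""
    rng = random.Random(seed)
    L = cholesky(corr)
    k = len(corr)
    maxima = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        w = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        maxima.append(max(w))
    maxima.sort()
    return maxima[int(math.ceil((1.0 - level) * draws)) - 1]

# Hypothetical grid for theta_2 and placeholder correlation structure
grid = [0.2 * i for i in range(1, 7)]
corr = [[math.exp(-abs(s - t)) for t in grid] for s in grid]

level = 0.05
crit = sup_quantile(corr, level)                        # threshold for the sup statistic
bonf = NormalDist().inv_cdf(1.0 - level / len(grid))    # one-sided Bonferroni threshold
print(crit, bonf)
```

With positively correlated coordinates, the simulated threshold for the maximum falls below the Bonferroni threshold, which illustrates why a Bonferroni-type aggregation would be conservative in this setting.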
Let X be the centred Gaussian process over \(\varTheta _2\) with
where a and b are defined in Theorem 3.
Theorem 4
Under general regularity conditions pertaining to the class of derivatives of m (Glivenko-Cantelli classes), we have
This results from [21] when \(dim(\varTheta _2) = 1\) and [24] when \(dim(\varTheta _2) > 1\).
Theorem 5
Under the same general regularity conditions as above, we have
The proof of the last result when \(dim(\varTheta _2) = 1\) makes use of the fact that \(\theta _2 \mapsto \hat{\pi }(\theta _2)\) is càdlàg ([21]). This is a reasonable assumption, which holds in the examples which we considered. We are eager for counter-examples! When \(dim(\varTheta _2) > 1\), the result also holds by [3].
Let now \(X^n\) be the centred Gaussian process over \(\varTheta _2\) with
where \(a_n\), \(a_n'\) are defined in Theorem 3 and \(b_n\) is defined analogously.
Theorem 6
Let, for any \(\delta > 0\), \(\varTheta _2^\delta \) be a finite set such that \(\forall \theta _2\in \varTheta _2, \exists \tilde{\theta }_2\in \varTheta _2^\delta / \Vert \theta _2-\tilde{\theta }_2\Vert \le \delta \). Then
6 Algorithm
Our algorithm for testing that the data was sampled from a single-component mixture (\(H_0 : \pi ^* = 0\)) against a two-component mixture (\(H_1 : \pi ^* > 0\)) is presented in Algorithm 1.
In this algorithm, \(\hat{\pi }(\theta _2)\) is defined in (2) and (3). It depends on g. The theorems hold as long as g fulfils \(Supp(g) \subset Supp(g_{\pi ,\theta })\) for any \((\pi ,\theta )\in \varTheta \). However, g has to be chosen with care: the constants in the asymptotic distribution in Theorem 3 depend on it. Moreover, [23] argue that the choice of g can influence the robustness properties of the procedure.
The choice of \(\varphi \) is also obviously crucial (see also [23] for the induced robustness properties).
The choices of \(\varphi \) and g are important practical questions, which are the subject of ongoing work.
As already stated, the supremal estimator for the modified Kullback-Leibler divergence \(\varphi : x \in \mathbb R^{+*} \mapsto -\log x + x - 1\) is the usual maximum likelihood estimator. In this instance the estimator does not depend on g.
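This can be checked by a short computation. The sketch below assumes that \(m_{\pi ,\theta }\) has the form used for the dual estimators of [7], namely \(m_{\pi ,\theta }(x) = \int \varphi '\bigl (\frac{g}{g_{\pi ,\theta }}\bigr )\, g\, d\lambda - \varphi ^\#\bigl (\varphi '\bigl (\frac{g(x)}{g_{\pi ,\theta }(x)}\bigr )\bigr )\), and that \(Supp(g) = Supp(g_{\pi ,\theta })\); both are assumptions made for this illustration. For the modified Kullback-Leibler generator,

\[
\varphi '(t) = 1 - \frac{1}{t}, \qquad \varphi ^\#(\varphi '(t)) = t\varphi '(t) - \varphi (t) = \log t ,
\]

so that

\[
m_{\pi ,\theta }(x) = \int \Bigl (1 - \frac{g_{\pi ,\theta }}{g}\Bigr )\, g\, d\lambda - \log \frac{g(x)}{g_{\pi ,\theta }(x)} = \log g_{\pi ,\theta }(x) - \log g(x),
\]

and therefore

\[
\mathbb P_n m_{\pi ,\theta } = \frac{1}{n}\sum _{i=1}^n \log g_{\pi ,\theta }(X_i) - \frac{1}{n}\sum _{i=1}^n \log g(X_i).
\]

The second sum does not involve \((\pi ,\theta )\), so under these assumptions maximising \(\mathbb P_n m_{\pi ,\theta }\) amounts to maximising the log-likelihood, whatever the choice of g.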
References
1. Al Mohamad, D.: Towards a better understanding of the dual representation of phi divergences. Stat. Pap. 59(3), 1205–1253 (2016). https://doi.org/10.1007/s00362-016-0812-5
2. Bickel, P.J., Chernoff, H.: Asymptotic distribution of the likelihood ratio statistic in a prototypical non regular problem. In: Statistics and Probability: A Raghu Raj Bahadur Festschrift, pp. 83–96 (1993)
3. Bickel, P.J., Wichura, M.J.: Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Stat. 42(5), 1656–1670 (1971)
4. Birrell, J., Dupuis, P., Katsoulakis, M.A., Pantazis, Y., Rey-Bellet, L.: (f, \(\gamma \))-divergences: interpolating between f-divergences and integral probability metrics. J. Mach. Learn. Res. 23(1), 1816–1885 (2022)
5. Broniatowski, M., Jurečková, J., Moses, A.K., Miranda, E.: Composite tests under corrupted data. Entropy 21(1), 63 (2019)
6. Broniatowski, M., Keziou, A.: Minimization of \(\varphi \)-divergences on sets of signed measures. Studia Scientiarum Mathematicarum Hungarica 43(4), 403–442 (2006)
7. Broniatowski, M., Keziou, A.: Parametric estimation and tests through divergences and the duality technique. J. Multivar. Anal. 100(1), 16–36 (2009)
8. Broniatowski, M., Miranda, E., Stummer, W.: Testing the number and the nature of the components in a mixture distribution. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2019. LNCS, vol. 11712, pp. 309–318. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26980-7_32
9. Broniatowski, M., Stummer, W.: Some universal insights on divergences for statistics, machine learning and artificial intelligence. In: Geometric Structures of Information, pp. 149–211 (2019)
10. Feng, Z., McCulloch, C.E.: Statistical inference using maximum likelihood estimation and the generalized likelihood ratio when the true parameter is on the boundary of the parameter space. Stat. Probabil. Lett. 13(4), 325–332 (1992)
11. Garel, B.: Asymptotic theory of the likelihood ratio test for the identification of a mixture. J. Stat. Plan. Infer. 131(2), 271–296 (2005)
12. Ghosh, J.K., Sen, P.K.: On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. Technical report, North Carolina State University. Department of Statistics (1984)
13. Hall, P., Stewart, M.: Theoretical analysis of power in a two-component normal mixture model. J. Stat. Plan. Infer. 134(1), 158–179 (2005)
14. Hartigan, J.A.: Statistical theory in clustering. J. Classif. 2(1), 63–76 (1985)
15. Liese, F., Vajda, I.: Convex statistical distances, vol. 95. Teubner (1987)
16. Liese, F., Vajda, I.: On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 52(10), 4394–4412 (2006)
17. Liu, X., Shao, Y.: Asymptotics for the likelihood ratio test in a two-component normal mixture model. J. Stat. Plan. Infer. 123(1), 61–81 (2004)
18. Lo, Y., Mendell, N.R., Rubin, D.B.: Testing the number of components in a normal mixture. Biometrika 88(3), 767–778 (2001)
19. McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
20. Nguyen, X., Wainwright, M.J., Jordan, M.I.: Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Trans. Inf. Theory 56(11), 5847–5861 (2010)
21. Pollard, D.: Convergence of Stochastic Processes. Springer, Heidelberg (2012). https://doi.org/10.1007/978-1-4612-5254-2
22. Rüschendorf, L.: On the minimum discrimination information theorem. In: Statistical Decisions, pp. 263–283 (1984)
23. Toma, A., Broniatowski, M.: Dual divergence estimators and tests: robustness results. J. Multivar. Anal. 102(1), 20–36 (2011)
24. van der Vaart, A., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, Heidelberg (1996). https://doi.org/10.1007/978-1-4757-2545-2
Acknowledgements
The authors gratefully acknowledge the reviewers for their valuable comments and suggestions, which helped improve the article.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Baudry, JP., Broniatowski, M., Thommeret, C. (2023). Aggregated Tests Based on Supremal Divergence Estimators for Non-regular Statistical Models. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2023. Lecture Notes in Computer Science, vol 14071. Springer, Cham. https://doi.org/10.1007/978-3-031-38271-0_14
DOI: https://doi.org/10.1007/978-3-031-38271-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38270-3
Online ISBN: 978-3-031-38271-0