
1 Introduction

Learning from positive and unlabeled (PU) data is an approach in which the training data contain only positive and unlabeled examples. This means that the true labels \(Y\in \left\{ 0,1\right\} \) are not observed directly; only a surrogate variable \(S\in \left\{ 0,1\right\} \) is observable, which equals 1 if an example is labeled and 0 otherwise. PU datasets appear in a large number of applications. For example, they often arise when dealing with the so-called under-reporting data from medical surveys, fraud detection and ecological modeling (see, e.g., Hastie and Fithian [6]). Some other interesting examples of under-reported survey data may be found in Bekker and Davis [1] and Teisseyre et al. [12].

Suppose that X is a feature vector and, as mentioned earlier, \(Y\in \left\{ 0,1\right\} \) denotes the true class label, while \(S\in \left\{ 0,1\right\} \) indicates whether an example is labeled (\(S=1\)) or not (\(S=0\)). We adopt a commonly used assumption, called the Selected Completely At Random (SCAR) condition, which states that the labeled examples are selected at random from the set of positive examples, independently of X, i.e. \(P(S=1|Y=1,X)=P(S=1|Y=1).\) Let \(c=P(S=1|Y=1)\). The parameter c is called the label frequency and plays a key role in the PU learning problem. The primary objective of our note is to introduce a new PU learning classification procedure leading to the estimation of the posterior probability \(f(x)=P(Y=1|X=x)\).
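To see why the label frequency c is central, note that under the SCAR assumption (and the fact that only positive examples can be labeled) the observed and the true posterior probabilities differ only by a multiplicative constant; this identity reappears in the Remark in Sect. 2:

$$\begin{aligned} P(S=1\mid X=x)=P(S=1\mid Y=1,X=x)\,P(Y=1\mid X=x)=c\,f(x), \quad \text {so that } f(x)=\frac{P(S=1\mid X=x)}{c}. \end{aligned}$$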

Three basic methods of this estimation have been proposed so far. They consist in minimizing the empirical risk based on the logistic loss function and are known as the naive method, the weighted method, and the joint method (the last one was recently introduced in the paper of Teisseyre et al. [12]). All of these approaches have been thoroughly described in [12]. As the joint method is applied in the construction of our procedure, some details regarding this method are presented in the next section of our article. We have named our proposed classification method the LassoJoint procedure, since it is a three-step approach combining the thresholded Lasso procedure with the joint method from Teisseyre et al. [12]. Namely, in its first two steps we perform, for some prespecified level, the thresholded Lasso procedure in order to obtain the support of the coefficients of the feature vector X, while in the third step we apply the joint method on the previously determined support.

Apart from the works in which different learning methods applying logistic regression to PU data have been proposed, there are also other interesting articles where various machine learning tools have been used for PU learning problems. In this context, it is worthwhile to mention the papers of Hou [7] and Guo [5], where generative adversarial networks (GANs) are employed for the PU problem, the work of Mordelet and Vert [10], where a bagging Support Vector Machine (SVM) approach is applied to PU data, and the article of Song and Raskutti [11], where the multidimensional PU problem is investigated with regard to feature selection and the so-called PUlasso design is established. It turns out that the LassoJoint procedure proposed in our work is computationally simple and efficient in comparison to other existing methods for the PU problem. The simplicity and efficiency of our approach have been confirmed by the conducted simulation study.

The remainder of the paper is structured as follows. In Sect. 2 we describe our classification procedure in detail; in particular, we prove that the introduced method is a so-called screening procedure (i.e., with high probability it selects the most significant predictors and the number of selected features is not greater than the sample size), as this screening property is necessary to apply the joint method in the final step of the procedure. In Sect. 3 we carry out a numerical study in order to check the efficiency of the proposed approach, while in Sect. 4 we summarize and conclude our research. The results of numerical experiments on real data are given in the Supplement (Footnote 1).

2 The Proposed LassoJoint Algorithm

In our considerations, we assume that we have a random vector (Y, X), where \(Y\in \left\{ 0,1\right\} \) and \(X\in \mathbb {R}^{p}\) is a feature vector, and that a random sample \(\left( Y_{1},X_{1}\right) ,\ldots ,\left( Y_{n},X_{n}\right) \) is distributed as \(\left( Y,X\right) \) and independent of it. In addition, we suppose that the coordinates \(X_{ji}\) of \(X_{i}\), \(i=1,\ldots ,n\), \(j=1,\ldots ,p\), are subgaussian with a parameter \(\sigma _{jn}^{2}\), i.e. \(E\exp \left( uX_{ji}\right) \le \exp \left( u^{2}\sigma _{jn}^{2}/2\right) \) for all \(u\in \mathbb {R}.\)
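For instance (a standard fact recalled here only for illustration), a centered Gaussian coordinate satisfies this condition with equality,

$$\begin{aligned} X_{ji}\sim N(0,\sigma _{jn}^{2})\;\Longrightarrow \;E\exp \left( uX_{ji}\right) =\exp \left( u^{2}\sigma _{jn}^{2}/2\right) \text { for all }u\in \mathbb {R}, \end{aligned}$$

so Gaussian features are subgaussian with parameter \(\sigma _{jn}^{2}\); centered bounded features are subgaussian as well (by Hoeffding's lemma).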

Let: \(s_{n}^{2}=\max _{1\le j\le p}\sigma _{jn}^{2}\), \(\lim \sup _{n}s_{n}^{2}<\infty \), and \(P\left( Y=1\mid X=x\right) =q(x^{T}\beta )\) for some function \(0<q(x)<1\) and all \(x\in \mathbb {R}^{p}\), where p may depend on n and \(p>n\). Put: \(I_{0}=\left\{ j:\beta _{j}\ne 0\right\} \), \( I_{1}=\{1,\ldots ,p\}\backslash I_{0}\) and \(\left| I_{0}\right| =p_{0}.\) We shall assume - as in Kubkowski and Mielniczuk [9] - that the distribution of X satisfies the linear regression condition (LRC), which means that

$$\begin{aligned} E(X\mid X^{T}\beta )=u_{0}+u_{1}X^{T}\beta \text { for some } u_{0},u_{1}\in \mathbb {R}^{p}. \end{aligned}$$

This condition is fulfilled (for all \(\beta \)) by the class of elliptical distributions (such as, e.g., the normal distribution or the multivariate Student's t distribution). Reasoning as in Kubkowski and Mielniczuk [9], we obtain that under (LRC) there exists \(\eta \) satisfying \(\beta ^{*}=\eta \beta ,\) where \(\eta \ne 0\) if \(\mathrm{cov}(Y,X^{T}\beta )\ne 0\), and where \(\beta ^{*}=\arg \min _{\beta }R(\beta ),\) with R standing for the risk function given by \(R\left( \beta \right) =-E_{\left( X,Y\right) }l\left( \beta ,X,Y\right) ,\) where in turn \(l\left( \beta ,X,Y\right) =Y\log \sigma (X^{T}\beta )+\left( 1-Y\right) \log \left( 1-\sigma (X^{T}\beta )\right) ,\) with \(\sigma \) denoting the logistic function of the form \(\sigma (X^{T}\beta )=\exp (X^{T}\beta )/\left[ 1+\exp (X^{T}\beta )\right] .\) Put: \(I_{0}^{*}=\left\{ j:\beta _{j}^{*}\ne 0\right\} \), \( I_{1}^{*}=\{1,\ldots ,p\}\backslash I_{0}^{*}\). It may be observed that under (LRC) we have \(I_{0}=I_{0}^{*}\), i.e. the supports of \(\beta \) and \(\beta ^{*}\) coincide. In addition, put \(H(b)=E(X^{T}X\sigma ^{\prime }(X^{T}b))\) and define a cone \( C(d,w)=\{\varDelta \in \mathbb {R}^{p}:\left\| \varDelta _{w^{c}}\right\| _{1}\le d\left\| \varDelta _{w}\right\| _{1}\}\), where: \(w\subseteq \{1,\ldots ,p\}\), \(w^{c}=\{1,\ldots ,p\}\backslash w\), \(\varDelta _{w}=(\varDelta _{w_{1}},\ldots ,\varDelta _{w_{k}})\), for \(w=(w_{1},\ldots ,w_{k})\). Furthermore, let \(\kappa \) be the generalized minimal eigenvalue of the matrix \(H(\beta ^{*})\) given by \(\kappa =\inf _{\varDelta \in C(3,I_{0}^{*})}\frac{\varDelta ^{T}H(\beta ^{*})\varDelta }{\varDelta ^{T}\varDelta }\text {.}\) Moreover, we also define \(\beta _{\min }^{*}\) and \(\beta _{\min }\) as \( \beta _{\min }^{*}:=\min _{j\in I_{0}^{*}}\left| \beta _{j}^{*}\right| \) and \(\beta _{\min }:=\min _{j\in I_{0}}\left| \beta _{j}\right| \), respectively.

After these preliminaries, we are now in a position to describe the proposed method. Our procedure, called the LassoJoint approach, is a three-step method (a minimal code sketch is given after the steps below):

(1) For the available PU dataset \((s_{i},x_{i})\), \(i=1,\ldots ,n\), we perform the ordinary Lasso procedure (see Tibshirani [14]) with some tuning parameter \(\lambda >0\), i.e. we compute the following Lasso estimator of \(\beta ^{*}\): \(\hat{\beta }^{(L)}=\arg \min _{\beta \in \mathbb {R}^{p+1}}\hat{R}(\beta )+\lambda \sum _{j=1}^{p}\left| \beta _{j}\right| ,\) where \(\hat{R}(\beta )=-\frac{1}{n}\sum _{i=1}^{n}\left[ s_{i}\log \left( \sigma (x_{i}^{T}\beta )\right) +\left( 1-s_{i}\right) \log \left( 1-\sigma (x_{i}^{T}\beta )\right) \right] \) and subsequently obtain the corresponding support \(\mathrm{Supp}^{(L)}=\{1\le j\le p:\hat{\beta }_{j}^{(L)}\ne 0\}\);

(2) We perform the thresholded Lasso for some prespecified level \(\delta \) and obtain the support \(\mathrm{Supp}^{(TL)}=\{1\le j\le p:\left| \hat{\beta } _{j}^{(L)}\right| \ge \delta \}\);

(3) We apply the joint method from Teisseyre et al. [12] for the predictors from \(\mathrm{Supp}^{(TL)}\).
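The first two (screening) steps can be sketched in R as follows; this is only an illustration under the tuning used later in Sect. 3, with the function name lassojoint_screen introduced solely for this sketch, and it is not the authors' reference implementation:

```r
library(glmnet)

# Step 1: ordinary Lasso fit of the PU labels s on the features x (logistic loss)
lassojoint_screen <- function(x, s, lambda, delta) {
  fit <- glmnet(x, s, family = "binomial", lambda = lambda)
  beta_hat <- as.matrix(coef(fit))[-1, 1]   # estimated coefficients, intercept dropped

  # Step 2: thresholded Lasso -- keep the indices j with |beta_hat_j| >= delta
  which(abs(beta_hat) >= delta)
}

# Example call with the tuning of Sect. 3: lambda = ((log p)/n)^(1/3), delta = lambda/2
# p <- ncol(x); n <- nrow(x)
# supp_tl <- lassojoint_screen(x, s, (log(p)/n)^(1/3), 0.5 * (log(p)/n)^(1/3))
# Step 3: apply the joint method of [12] to x[, supp_tl] (see the Remark and the sketch below).
```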

Remark. The PU problem is related to incorrect specification of the logistic model. Under the SCAR assumption, we obtain that \(P(S=1|X=x)=cq(x^T\beta )\) and consequently, if \(cq(\cdot )\ne \sigma (\cdot )\), then in step (1) we fit a misspecified logistic model to (S, X). Generally speaking, the joint method from [12] consists in fitting a logistic model to the PU data and minimizing, with respect to \(\beta \) and \(c=P(S=1\mid Y=1)\), the following empirical risk: \(\hat{R}\left( \beta ,c\right) =-\frac{1}{n}\sum _{i=1}^{n}\left[ s_{i}\log \left( c\sigma (x_{i}^{T}\beta )\right) +\left( 1-s_{i}\right) \log \left( 1-c\sigma (x_{i}^{T}\beta )\right) \right] ,\) where \(\sigma \) stands for the logistic function and \(\left\{ \left( s_{i},x_{i}\right) \right\} \) is a sample of observations from the distribution of the random vector \(\left( S,X\right) .\)
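For illustration, the minimization of \(\hat{R}(\beta ,c)\) can be carried out with a generic optimizer, as in the sketch below; the constant c is kept in (0, 1) through a logit reparametrization. This is only a hedged sketch of the objective from the Remark, not the implementation of [12]:

```r
# Joint method: minimize the empirical PU risk R_hat(beta, c) over (beta, c)
joint_risk <- function(theta, x, s) {
  p <- ncol(x)
  beta0 <- theta[1]                          # intercept
  beta  <- theta[2:(p + 1)]                  # slope coefficients
  c_hat <- plogis(theta[p + 2])              # c = P(S=1 | Y=1), via logit to stay in (0,1)
  sig   <- plogis(beta0 + as.vector(x %*% beta))
  prob  <- pmin(pmax(c_hat * sig, 1e-12), 1 - 1e-12)   # numerical safeguard
  -mean(s * log(prob) + (1 - s) * log(1 - prob))
}

fit_joint <- function(x, s) {
  p <- ncol(x)
  init <- c(rep(0, p + 1), 0)                # start at beta = 0, c = 0.5
  opt <- optim(init, joint_risk, x = x, s = s, method = "BFGS")
  list(beta = opt$par[1:(p + 1)], c = plogis(opt$par[p + 2]))
}

# Usage after screening (step 3): fit_joint(x[, supp_tl, drop = FALSE], s)
```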

The newly proposed LassoJoint procedure is similar to the LassoSD approach introduced in Furmańczyk and Rejchel [3]. The only difference between these two methods is that we apply the joint method from [12] in the last step of our procedure, whereas the authors of [3] use multiple hypothesis testing in the final stage. The introduced LassoJoint procedure is determined by the two parameters \( \lambda \) and \(\delta \), which may depend on n. The selection of \(\lambda \) and \(\delta \) is possible if we impose the following conditions, denoted as assumptions (A1)–(A4):

  1. (A1)

    The generalized eigenvalue \(\kappa \), of the matrix \(H(\beta ^{*})\), is such that \(m\le \kappa \le M\), for some \(0<m<M;\)

  2. (A2)

    \(p_{0}^{2}\log (p)=o(n)\), \(\log (p)=o(n\lambda ^{2})\), \(\lambda ^{2}p_{0}^{2}\log (np)=o(1),\) as \(n\rightarrow \infty ;\)

  3. (A3)

    \(p_{0}+\frac{c_{n}^{2}}{\delta ^{2}}\le n\), where \(c_{n}=10 \frac{\sqrt{p_{0}}}{\kappa }\lambda ;\)

  4. (A4)

    \(\beta _{\min }\ge \left( \delta +c_{n}/\sqrt{p_0}\right) /\eta .\)

Clearly, in view of (LRC), the condition from (A4) is equivalent to the constraint that \(\beta _{\min }^{*}\ge \delta +c_{n}/\sqrt{p_0}\). In addition, due to (A2), we get that \(c_{n}\rightarrow 0.\)
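As an illustration (our own quick check, not part of the original assumptions), consider the tuning used later in Sect. 3, namely \(\lambda =((\log p)/n)^{1/3}\) and \(\delta =\lambda /2\). The second condition in (A2) then reduces to a mild requirement on p,

$$\begin{aligned} \log (p)=o(n\lambda ^{2})=o\bigl (n^{1/3}(\log p)^{2/3}\bigr )\;\Longleftrightarrow \;\log p=o(n), \end{aligned}$$

which already follows from \(p_{0}^{2}\log (p)=o(n)\), while the third condition in (A2) amounts to \(p_{0}^{2}(\log p)^{2/3}\log (np)=o(n^{2/3})\).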

The main theoretical result of our work is the following assertion.

Theorem 1

(Screening property) Under the conditions (LRC) and (A1)–(A4), we have that with probability at least \(1-\epsilon _{n}\), where \(\epsilon _{n}\rightarrow 0\):

  1. (a)

    \(\left| \mathrm{Supp}^{(TL)}\right| \le p_{0}+\frac{c_{n}^{2}}{ \delta ^{2}}\le n\),

  2. (b)

    \(I_{0}\subset \mathrm{Supp}^{(TL)}\).

The presented theorem states that the proposed LassoJoint procedure is a so-called screening procedure, which means that (in the first two steps) this method selects, with high probability, the most significant predictors of the model and that the number of selected features is not greater than the sample size n. This screening property guarantees that, with high probability, we may apply the joint procedure from Teisseyre et al. [12], based on fitting logistic regression to PU data. The proof of Theorem 1 uses the following lemma, which follows straightforwardly from Theorem 4.9 in Kubkowski [8].

Lemma 1

Under the assumptions (A1)–(A2), we obtain that with probability at least \( 1-\epsilon _{n}\), where \(\epsilon _{n}\rightarrow 0 \), the following property holds: \(\left\| \hat{\beta }^{(L)}-\beta ^{*}\right\| _{2}\le c_{n},\) where \(c_{n}\rightarrow 0\) and \(\left\| x\right\| _{2}=\sqrt{ \sum _{j=1}^{p}x_{j}^{2}}\) for \(x\in \mathbb {R}^{p}\).

Proof

First, we prove the relation stated in (a). By the definition of \(\mathrm{Supp}^{(TL)}\), we get \(\sum _{j\in I_{1}\cap \mathrm{Supp}^{(TL)}}\left( \hat{\beta }_{j}^{(L)}\right) ^{2}\ge \delta ^{2}\left| I_{1}\cap \mathrm{Supp}^{(TL)}\right| \text {.}\) Hence,

$$\begin{aligned} \left| I_{1}\cap \mathrm{Supp}^{(TL)}\right| &\le \frac{1}{\delta ^{2}} \sum _{j\in I_{1}\cap \mathrm{Supp}^{(TL)}}\left( \hat{\beta }_{j}^{(L)}\right) ^{2} \\ &= \frac{1}{\delta ^{2}}\sum _{j\in I_{1}\cap \mathrm{Supp}^{(TL)}}\left( \hat{\beta }_{j}^{(L)}-\beta _{j}^{*}\right) ^{2}\le \frac{1}{\delta ^{2}} \left\| \hat{\beta }^{(L)}-\beta ^{*}\right\| _{2}^{2}, \end{aligned}$$

and

\(\left| \mathrm{Supp}^{(TL)}\right| \le \left| I_{0}\cap \mathrm{Supp}^{(TL)}\right| +\left| I_{1}\cap \mathrm{Supp}^{(TL)}\right| \le p_{0}+\frac{1}{\delta ^{2}}\left\| \hat{\beta }^{(L)}-\beta ^{*}\right\| _{2}^{2}\text {.}\)

It follows from Lemma 1 that with probability at least \(1-\epsilon _{n}\), where \(\epsilon _{n}\rightarrow 0\), we have \(\left| \mathrm{Supp}^{(TL)}\right| \le p_{0}+\frac{c_{n}^{2}}{\delta ^{2}} \text {.}\) Combining this inequality with (A3), we obtain (a). Thus, it remains to prove the property in (b).

Since \(\left\{ \min _{j\in I_{0}}\left( \hat{\beta }_{j}^{(L)}\right) ^{2}\ge \delta ^{2}\right\} \subseteq \left\{ I_{0}\subset \mathrm{Supp}^{(TL)}\right\} \) , it is sufficient to show that

$$\begin{aligned} P\left( \min _{j\in I_{0}}\left( \hat{\beta }_{j}^{(L)}\right) ^{2}\ge \delta ^{2}\right) \ge 1-\epsilon _{n} \text {.} \end{aligned}$$
(1)

Let:

\(\hat{\beta }_{I_{0}}^{(L)}:=\left\{ \hat{\beta }_{j}^{(L)}:j\in I_{0}\right\} \), \(\beta _{I_{0}}^{*}:=\left\{ \beta _{j}^{*}:j\in I_{0}\right\} \). As \(\left\| \hat{\beta }^{(L)}-\beta ^{*}\right\| _{2}^{2}\ge \left\| \hat{\beta }_{I_{0}}^{(L)}-\beta _{I_{0}}^{*}\right\| _{2}^{2}\), we have from Lemma 1 that with probability at least \(1-\epsilon _{n}\),

$$p_0\min _{j\in I_{0}}\left( \hat{\beta }_{j}^{(L)}-\beta _j^*\right) ^{2} \le \left\| \hat{\beta }_{I_{0}}^{(L)}-\beta _{I_{0}}^{*}\right\| _{2}^{2}=\sum _{j\in I_{0}}\left( \hat{\beta }_{j}^{(L)}-\beta _{j}^{*}\right) ^{2}\le c_{n}^{2}.$$

Hence, \(\min _{j\in I_{0}}\left| \hat{\beta }_{j}^{(L)}-\beta _j^*\right| \le c_n/\sqrt{p_0}.\) In addition, by the triangle inequality, we obtain that for \(j\in I_{0}\), \(\left| \hat{\beta }_{j}^{(L)}\right| \ge \left| \beta _{j}^{*}\right| -\left| \hat{\beta }_{j}^{(L)}-\beta _{j}^{*}\right| \) and therefore \(\min _{j \in I_0}\left| \hat{\beta }_{j}^{(L)}\right| \ge \min _{j \in I_0}\left| \beta _{j}^{*}\right| -c_n/\sqrt{p_0}. \) This and (A4) imply that with probability at least \(1-\epsilon _{n}\), \(\min _{j\in I_{0}}\left( \hat{\beta }_{j}^{(L)}\right) ^{2}\ge \delta ^2,\) which yields (1) and consequently (b).

3 Numerical Study

Suppose that \(X_{1},\ldots ,X_{p}\) are generated independently from N(0, 1), and \(Y_{i}\), \(i=1,\ldots ,n\), are generated from the \(\mathrm{binom}(1,p_{i})\) distribution, where \(p_{i}=\sigma (\beta _{0}+\beta _{1}X_{1i}+\ldots +\beta _{p}X_{pi})\) and \(\beta _{0}=1\). The following high-dimensional models were simulated (a data-generating sketch in R is given after the list):

  1. (M1)

    \(p_{0}=5,p=1.2\cdot 10^3,n=10^3, \beta _{1}=\ldots =\beta _{p_0}=1,\beta _{p_{0}+1}=\ldots =\beta _{p}=0\);

  2. (M2)

    \(p_{0}=5,p=1.2\cdot 10^3,n=10^3, \beta _{1}=\ldots =\beta _{p_0}=2,\beta _{p_{0}+1}=\ldots =\beta _{p}=0\);

  3. (M3)

    \(p_{0}=5,p=10^3,n=10^3, \beta _{1}=\ldots =\beta _{p_0}=2,\beta _{p_{0}+1}=\ldots =\beta _{p}=0\);

  4. (M4)

    \(p_{0}=20,p=10^3,n=10^3, \beta _{1}=\ldots =\beta _{p_0}=2,\beta _{p_{0}+1}=\ldots =\beta _{p}=0\);

  5. (M5)

    \(p_{0}=5,p=2\cdot 10^3,n=2\cdot 10^3, \beta _{1}=\ldots =\beta _{p_0}=2,\beta _{p_{0}+1}=\ldots =\beta _{p}=0\);

  6. (M6)

    \(p_{0}=5,p=2\cdot 10^3,n=2\cdot 10^3,\beta _{1}=\ldots =\beta _{p_0}=3,\beta _{p_{0}+1}=\ldots =\beta _{p}=0\).
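A minimal R sketch of this data-generating scheme, including the SCAR labeling of S used in the experiments below, could look as follows (function and argument names are introduced only for this illustration):

```r
# Generate one PU dataset of the type (M1)-(M6)
simulate_pu <- function(n, p, p0, signal, c, beta0 = 1) {
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)     # X_1, ..., X_p ~ N(0, 1), independent
  beta <- c(rep(signal, p0), rep(0, p - p0))        # p0 nonzero coefficients
  prob <- plogis(beta0 + as.vector(x %*% beta))     # p_i = sigma(beta_0 + x_i^T beta)
  y <- rbinom(n, size = 1, prob = prob)             # true labels Y_i
  s <- rbinom(n, size = 1, prob = c * y)            # SCAR: P(S=1 | Y=1) = c, P(S=1 | Y=0) = 0
  list(x = x, y = y, s = s)
}

# Example: model (M2) with label frequency c = 0.5
# dat <- simulate_pu(n = 1000, p = 1200, p0 = 5, signal = 2, c = 0.5)
```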

For all of the specified models, the LassoJoint method was implemented. In its first step, the Lasso method was used with a tuning parameter \( \lambda \) chosen either on the basis of a 10-fold cross-validation scheme (the first scenario) or by putting \(\lambda =((\log p)/n)^{1/3}\) (the second scenario). In the second step, we applied the thresholded Lasso with \(\delta =0.5\cdot ((\log p)/n)^{1/3}\).
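In R, these two tuning scenarios could be realized, for instance, as below (an illustrative sketch, with x, s, p and n as in the earlier sketches; cv.glmnet comes from the 'glmnet' package [2]):

```r
library(glmnet)

# Scenario 1: lambda chosen by 10-fold cross-validation on the PU labels s
cv_fit    <- cv.glmnet(x, s, family = "binomial", nfolds = 10)
lambda_cv <- cv_fit$lambda.min

# Scenario 2: fixed-rate lambda, with the threshold delta tied to the same rate
lambda_fix <- (log(p) / n)^(1/3)
delta      <- 0.5 * (log(p) / n)^(1/3)
```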

In the third and final step of our procedure, the variables selected by the thresholded Lasso were passed to the joint method from [12] for the PU data classification problem. For each of the listed models, we randomly labeled \(c\cdot 100\%\) of the positive observations (i.e., set \(S=1\) for them), for \(c=0.1, 0.3, 0.5, 0.7, 0.9\). Next, we generated a test sample of size 1000 from our models and determined the classification accuracy based on 100 Monte Carlo replications of the experiment. The idea of our procedure's accuracy assessment is similar to that of Furmańczyk and Rejchel [4]. We applied the 'glmnet' package [2] in the R software [13] in our computations. The results of our simulation study are collected in Table 1 (the column 'oracle' shows the accuracy of the classifier that uses only the significant predictors and the true parameters of the logistic models).

Table 1. Results for M1–M6

Real data experiments and all R codes are presented in the Supplement, available at https://github.com/kfurmanczyk/ICCS21.

4 Conclusions

The results of our simulation study show that as c increases, the percentage of correct classifications increases as well. They also show that the classifications obtained with the proposed LassoJoint method display smaller classification errors (and thus better classification accuracy) for the models with larger signals (i.e., for the M5 and M6 models). Comparing the M3 and M4 models, we can see that as the number of significant predictors (\(p_{0}\)) increases, the classification accuracy slightly decreases. Furthermore, in all cases except \(c=0.1\), selecting the tuning parameter \(\lambda \) by cross-validation results in better classification accuracy. In addition, we may observe that for \(c=0.7\) or \(c=0.9\), our LassoJoint approach is nearly as good as the 'oracle' method. In turn, for \( c=0.1\) the classification accuracy was low, ranging from 0.41 to 0.60, while in the 'easiest' case, i.e. \(c=0.9\), it ranged from 0.7 to 0.82.

Furthermore, the results of our experiments conducted on real datasets show that the percentage of correct classifications likewise increases with c. These results also show similar classification accuracy among all of the considered classification methods (see Supplement). The proposed LassoJoint classification method for PU data allows, at relatively low computational cost, for the analysis of data in the high-dimensional case, i.e. when the number of predictors exceeds the size of the available sample (\(p>n\)). We aim to devote our further research to a more detailed analysis of the introduced procedure, in particular to the optimal selection of the model parameters.