Abstract
When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e., of the situations that are unsafe, fewer than an \(\epsilon \) fraction will occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an \(\epsilon \) false negative rate using as few as \(1/\epsilon \) data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate the guaranteed false negative rate while also observing a low false detection (positive) rate.
The NASA University Leadership Initiative (grant #80NSSC20M0163) provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not any NASA entity.
References
Lyft motion prediction dataset. https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/data (2020).
Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: a multimodal dataset for autonomous driving (2019). arXiv:1903.11027
Cai, F., Koutsoukos, X.: Real-time out-of-distribution detection in learning-enabled cyber-physical systems. In: 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), pp. 174–183 (2020)
Calafiore, G., Campi, M.: The scenario approach to robust control design. IEEE Trans. Autom. Control 51(5), 742–753 (2006)
Chen, Y., Rosolia, U., Fan, C., Ames, A., Murray, R.: Reactive motion planning with probabilistic safety guarantees. In: Conference on Robot Learning (2020)
Correll, N., Bekris, K.E., Berenson, D., Brock, O., Causo, A., Hauser, K., Okada, K., Rodriguez, A., Romano, J.M., Wurman, P.R.: Analysis and observations from the first amazon picking challenge. IEEE Trans. Autom. Sci. Eng. 15(1), 172–188 (2016)
Crestani, D., Godary-Dejean, K., Lapierre, L.: Enhancing fault tolerance of autonomous mobile robots. Robot. Auton. Syst. 68, 140–155 (2015)
Ding, S.X.: Model-Based Fault Diagnosis Techniques: Design Schemes, Algorithms and Tools, pp. 3–11. Springer, London (2013)
Eppner, C., Höfer, S., Jonschkowski, R., Martín-Martín, R., Sieverling, A., Wall, V., Brock, O.: Lessons from the amazon picking challenge: Four aspects of building robotic systems. In: Robotics: Science and Systems (2016)
Feldman, S., Bates, S., Romano, Y.: Improving conditional coverage via orthogonal quantile regression (2021). arXiv:2106.00394
Foody, G.M.: Sample size determination for image classification accuracy assessment and comparison. Int. J. Remote Sens. 30(20), 5273–5291 (2009)
Gammerman, A., Nouretdinov, I., Burford, B., Chervonenkis, A., Vovk, V., Luo, Z.: Clinical mass spectrometry proteomic diagnosis by conformal predictors. Stat. Appl. Genet. Mol. Biol. 7(2), 1–12 (2008)
Harirchi, F., Ozay, N.: Model invalidation for switched affine systems with applications to fault and anomaly detection. Anal. Design Hybrid Syst. (ADHS) 48(27), 260–266 (2015)
Harirchi, F., Ozay, N.: Guaranteed model-based fault detection in cyber-physical systems: a model invalidation approach (2017). arXiv:1609.05921
Hernandez, C., Bharatheesha, M., Ko, W., Gaiser, H., Tan, J., van Deurzen, K., de Vries, M., Van Mil, B., van Egmond, J., Burger, R., et al.: Team delft’s robot winner of the amazon picking challenge 2016. In: Robot World Cup, pp. 613–624. Springer (2016)
Khalastchi, E., Kalech, M.: On fault detection and diagnosis in robotic systems. ACM Comput. Surv. (CSUR) 51(1), 1–24 (2018)
von Luxburg, U., Schölkopf, B.: Statistical learning theory: models, concepts, and results. In: Gabbay, D.M., Hartmann, S., Woods, J. (eds.) Inductive Logic, Handbook of the History of Logic, vol. 10, pp. 651–706. North-Holland (2011)
Mahler, J., Liang, J., Niyaz, S., Laskey, M., Doan, R., Liu, X., Ojea, J.A., Goldberg, K.: Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. Robotics: Science and Systems (RSS) (2017)
Mahler, J., Matl, M., Liu, X., Li, A., Gealy, D., Goldberg, K.: Dex-net 3.0: Computing robust robot suction grasp targets in point clouds using a new analytic model and deep learning (2017). arXiv:1709.06670
Mahler, J., Matl, M., Satish, V., Danielczuk, M., DeRose, B., McKinley, S., Goldberg, K.: Learning ambidextrous robot grasping policies. Sci. Robot. 4(26), eaau4984 (2019)
Muradore, R., Fiorini, P.: A pls-based statistical approach for fault detection and isolation of robotic manipulators. IEEE Trans. Ind. Electron. 59(8), 3167–3175 (2011)
Nouretdinov, I., Costafreda, S.G., Gammerman, A., Chervonenkis, A., Vovk, V., Vapnik, V., Fu, C.H.: Machine learning classification with confidence: application of transductive conformal predictors to mri-based diagnostic and prognostic markers in depression. NeuroImage 56(2), 809–813 (2011)
Patton, R., Chen, J.: Observer-based fault detection and isolation: Robustness and applications. Control Eng. Pract. 5(5), 671–682 (1997)
Perdomo, J., Zrnic, T., Mendler-Dünner, C., Hardt, M.: Performative prediction. In: International Conference on Machine Learning, pp. 7599–7609. PMLR (2020)
Salzmann, T., Ivanovic, B., Chakravarty, P., Pavone, M.: Trajectron++: dynamically-feasible trajectory forecasting with heterogeneous data (2020)
Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. (JMLR). https://jmlr.csail.mit.edu/papers/volume9/shafer08a/shafer08a.pdf (2008)
Tibshirani, R.J., Barber, R.F., Candès, E.J., Ramdas, A.: Conformal prediction under covariate shift (2019). arXiv:1904.06019
Vemuri, A.T., Polycarpou, M.M., Diakourtis, S.A.: Neural network based fault detection in robotic manipulators. IEEE Trans. Robot. Autom. 14(2), 342–348 (1998)
Visinsky, M.L., Cavallaro, J.R., Walker, I.D.: Expert system framework for fault detection and fault tolerance in robotics. Comput. & Electr. Eng. 20(5), 421–435 (1994)
Visinsky, M.L., Cavallaro, J.R., Walker, I.D.: Robotic fault detection and fault tolerance: a survey. Reliab. Eng. & Syst. Safety 46(2), 139–158 (1994)
Visinsky, M.L., Cavallaro, J.R., Walker, I.D.: A dynamic fault tolerance framework for remote robots. IEEE Trans. Robot. Autom. 11(4), 477–490 (1995)
Vovk, V., Gammerman, A., Shafer, G.: Algorithmic learning in a random world (2005)
Vovk, V., Lindsay, D., Nouretdinov, I.: Mondrian confidence machine (2003)
Yu, K.T., Fazeli, N., Chavan-Dafle, N., Taylor, O., Donlon, E., Lankenau, G.D., Rodriguez, A.: A summary of team mit’s approach to the amazon picking challenge 2015 (2016). arXiv:1604.03639
Zeng, A., Yu, K.T., Song, S., Suo, D., Walker, E., Rodriguez, A., Xiao, J.: Multi-view self-supervised deep learning for 6d pose estimation in the amazon picking challenge. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1386–1383. IEEE (2017)
Appendix
1.1 Proofs
Proposition 1: Under Assumption 1, Algorithm 1 is \(\epsilon + 1/(1+|{\mathcal {A}}|)\)-safe (with respect to \({\hat{Y}}, {\hat{Z}}\)).
Proof
Given a sequence of data points \((Y_1, Z_1), \cdots , (Y_T, Z_T)\), denote the subsequence of “unsafe” data as \((Y_{c_1}, Z_{c_1}), \cdots , (Y_{c_M}, Z_{c_M})\), where \(Z_{c_t}\) is the t-th unsafe example (i.e. \(f(Z_{c_t}) < f_0\)), so \(M = |{\mathcal {A}}|\). Suppose that \(\hat{Z}\) is also unsafe, i.e. \(f(\hat{Z}) < f_0\). Let \(\lbrace \!\lbrace \cdot \rbrace \!\rbrace \) denote an unordered bag (i.e. a set that can have repeated elements). We can bound the safety by
By the assumption of exchangeability, we are equally likely to observe any permutation of \(\lbrace \!\lbrace (Y_{c_1}, Z_{c_1}), \cdots , (Y_{c_M}, Z_{c_M}), (\hat{Y}, \hat{Z}) \rbrace \!\rbrace \). Intuitively, it is equally likely for \(g(\hat{Y})\) to be the largest, 2nd largest, etc., among \(g(Y_{c_1}), \cdots , g(Y_{c_M}), g(\hat{Y})\). Formally, the random variable \(|\lbrace t \mid g(Y_{c_t}) < g(\hat{Y}) \rbrace | + U\) takes on all values \(\lbrace 0, 1, \cdots , M \rbrace \) with equal probability. Therefore,
We can combine this with the original result to get
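The uniform-rank step used in the proof can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `rank_histogram` and the use of i.i.d. uniform scores are assumptions made for illustration (i.i.d. draws are one special case of exchangeability). Because the simulated scores are continuous, ties occur with probability zero, so the tie-breaking term \(U\) is omitted.

```python
import numpy as np


def rank_histogram(m: int, trials: int, seed: int = 0) -> np.ndarray:
    """Empirical distribution of the rank of a test score among m calibration scores.

    Hypothetical helper: each row holds m calibration scores followed by one
    test score, all drawn i.i.d. from Uniform(0, 1).
    """
    rng = np.random.default_rng(seed)
    scores = rng.random((trials, m + 1))
    # Rank = number of calibration scores strictly below the test score (0..m).
    ranks = (scores[:, :m] < scores[:, m:]).sum(axis=1)
    return np.bincount(ranks, minlength=m + 1) / trials


if __name__ == "__main__":
    # With m = 4, each of the 5 possible ranks should have frequency near 1/5.
    print(rank_histogram(m=4, trials=200_000))
```

With \(M = 4\), each of the five possible ranks should appear with empirical frequency close to \(1/(M+1) = 1/5\).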
1.2 Lower Bound on the False Positive Rate
Consider a function w that maps a dataset \({\mathcal {D}}= (g(X_1), Y_1), \cdots , (g(X_T), Y_T)\) of unsafe examples, and a new data point \(g({\hat{X}})\), to \(\lbrace 0, 1\rbrace \). We argue that any w that gives a distribution-free false negative rate guarantee should depend only on the ordering between \(g(X_1), \cdots , g(X_T), g({\hat{X}})\), and not on their specific values. In other words, w should take the form defined by
for some deterministic function \(\phi \) and real number \(\gamma \). We know that when the data is exchangeable, \(\# \lbrace t, g({\hat{X}}) < g(X_t) \rbrace \) is uniformly distributed on \(\lbrace 0, 1, \cdots , T \rbrace \).
Case 1 Suppose \(\phi \) takes the value 0 for at least one possible input; then the false negative rate is given by
and the false positive rate is given by
so combined we have
Case 2 Suppose \(\phi \) takes the value 0 for none of the inputs; then the false negative rate is given by
so we would still (trivially) have \(\text {FPR} \ge 1-(1+T)\epsilon \).
So far we have shown that if w takes the specific form of Eq. (6), then the false positive rate is lower bounded by \(1-(1+T)\epsilon \). In other words, when \(\epsilon = o(1/T)\), the false positive rate tends to 1 as T grows.
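To get a feel for the bound \(\text {FPR} \ge 1-(1+T)\epsilon \), it can be evaluated for a few values of T. This is only an arithmetic sketch: the helper name `fpr_lower_bound` is hypothetical, and clamping at zero is an addition made here (a rate cannot be negative). The bound itself applies only to warning rules of the rank-based form discussed above.

```python
def fpr_lower_bound(num_unsafe: int, epsilon: float) -> float:
    """Evaluate 1 - (1 + T) * epsilon, clamped at 0.

    num_unsafe plays the role of T (number of unsafe calibration examples).
    The clamp is an assumption added for readability, not part of the bound.
    """
    return max(0.0, 1.0 - (1 + num_unsafe) * epsilon)


if __name__ == "__main__":
    epsilon = 0.01
    for num_unsafe in (9, 49, 99, 999):
        bound = fpr_lower_bound(num_unsafe, epsilon)
        print(f"T={num_unsafe:4d}  FPR lower bound = {bound:.2f}")
```

For \(\epsilon = 0.01\), the bound stays close to 1 for small T and only reaches zero once T is about \(1/\epsilon \), which is consistent with the abstract's statement that on the order of \(1/\epsilon \) data points are needed.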
1.3 Additional Experimental Details: Driver Alert System
Safety score: We define the safety score as the Mahalanobis distance between the ego-vehicle and the agent, where the first eigenvector is aligned with the ego-vehicle’s velocity vector and the second eigenvector is orthogonal to it; the magnitude of the first eigenvector is the magnitude of the velocity, and the magnitude of the second eigenvector is approximately half of a car width (we use 1 m). Intuitively, this means that agents along the ego-vehicle’s velocity vector appear closer than agents at the same distance in the perpendicular direction. This metric is similar to time to collision (TTC), but it is continuous whereas TTC is not: TTC is infinite unless two vehicles are exactly on a collision course.
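A velocity-aligned Mahalanobis distance of this kind can be sketched in a few lines. The code below is an illustration under assumptions, not the implementation used in the experiments: the function name `safety_score`, the 2D positions, the `min_speed` floor for a nearly stationary ego-vehicle, and the reading of each "eigenvector magnitude" as the length scale along that axis (rather than a variance) are all choices made here.

```python
import numpy as np


def safety_score(ego_pos, ego_vel, agent_pos, lateral_scale=1.0, min_speed=1e-3):
    """Velocity-aligned Mahalanobis distance between ego-vehicle and agent (2D).

    Assumptions (hypothetical): the longitudinal axis is scaled by the ego
    speed and the lateral axis by `lateral_scale` metres (1 m in the text).
    """
    offset = np.asarray(agent_pos, dtype=float) - np.asarray(ego_pos, dtype=float)
    vel = np.asarray(ego_vel, dtype=float)
    speed = float(np.linalg.norm(vel))
    if speed < min_speed:
        # Direction is ill-defined when nearly stationary: fall back to the
        # x-axis and floor the speed to avoid division by zero.
        heading = np.array([1.0, 0.0])
        speed = min_speed
    else:
        heading = vel / speed
    normal = np.array([-heading[1], heading[0]])  # perpendicular to heading
    along = float(offset @ heading)    # component along the velocity
    across = float(offset @ normal)    # component perpendicular to it
    return float(np.hypot(along / speed, across / lateral_scale))


if __name__ == "__main__":
    # Ego at the origin moving at 10 m/s along x; two agents 10 m away.
    print(safety_score([0, 0], [10, 0], [10, 0]))  # agent directly ahead
    print(safety_score([0, 0], [10, 0], [0, 10]))  # agent directly to the side
```

Under these assumptions, an agent 10 m ahead of a vehicle travelling at 10 m/s receives a smaller (less safe) score than an agent 10 m to the side, matching the intuition stated above.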
Dataset details: The nuScenes dataset includes 952 scenes collected across Boston and Singapore, divided into a 697/105/150 train/val/test split (the same split used for the original Trajectron++). Each scene is 20 s long. The Kaggle Lyft Motion Prediction dataset is a subset of the full Lyft Level 5 dataset (chosen over the full dataset for computational reasons). It includes approximately 16k scenes, divided into a 70%/15%/15% train/val/test split. Each scene is 25 s long. Both datasets include labeled ego-vehicle trajectories as well as labeled detections and trajectories for other agents in the scene. Note that for both of these datasets, because the training split was used to train the Trajectron++ model, we used the validation split as the input training data for Algorithm 1.
Additional experimental results: We demonstrate empirically on the nuScenes dataset that the sum of \(\epsilon \) and the false positive rate must be high when there are few (e.g. \(T < 1/\epsilon \)) samples, which is consistent with what our theory from Sect. 3.2 would predict. Figure 4 plots the \(\epsilon \) bound as well as the false negative and false positive rates vs. the number of unsafe samples in the validation dataset; we see that when \(\epsilon \) decreases as 1/T, the false positive rate is relatively flat and low.
We also demonstrate empirically on the Kaggle Lyft dataset that the variance of the false negative rate over different train/test splits is low. Table 1 displays the variance of the false negative rate calculated over the 100 trials at each \(\epsilon \) value. All of the variances are well below 0.003, suggesting that the test sequence false negative rates are clustered around \(\epsilon \) (rather than some sequences failing on zero examples and others failing catastrophically). As further evidence, in Fig. 5, we provide a representative box plot of the false negative rates over the 100 trials with \(\epsilon = 0.04\). The false negative rates are indeed clustered around 0.04.
1.4 Additional Experimental Details: Robotic Grasping Experiments
Model and dataset details: The Grasp Quality Convolutional Neural Network (GQ-CNN) from [18] is a model that classifies whether a candidate robotic grasp will be successful. The inputs to a GQ-CNN are a point cloud representation of an object, \(\textbf{y}\), and a candidate grasp, \(\textbf{u}\). A GQ-CNN outputs the predicted probability, \(Q_{\theta }(\textbf{y}, \textbf{u})\), that the candidate grasp will be able to successfully pick and transport the object. We use this predicted probability as the safety score, \(g = Q_{\theta }(\textbf{y}, \textbf{u})\). We consider a candidate grasp “unsafe” if it will not be able to successfully pick the object (i.e. the true label is \(Z = 0\)). Note that this is exactly the ROC curve threshold tuning setup. We use the DexNet dataset of synthetic objects grasped with a parallel jaw gripper [18], which includes approximately 500k pick attempts not used in training the GQ-CNN model. These are divided into a 50%/50% train/test split. Each example is labeled a success if the robot successfully picks and places the object, and a failure otherwise.
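Threshold tuning on the GQ-CNN score can be sketched with a standard split-conformal quantile rule. This is an illustration under assumptions, not a reproduction of Algorithm 1: the function names `warning_threshold` and `should_warn`, and the convention that a warning is issued when the score is at or below the threshold, are hypothetical. The calibration scores are assumed to be the predicted success probabilities of held-out grasps that actually failed, and the guarantee relies on those scores being exchangeable with the score of a new failed grasp.

```python
import math

import numpy as np


def warning_threshold(unsafe_scores, epsilon: float) -> float:
    """Pick a score threshold from calibration scores of unsafe examples.

    Standard split-conformal rule (an assumption, not the paper's algorithm):
    use the k-th smallest unsafe score with k = ceil((1 - epsilon) * (M + 1)).
    If k exceeds M there is too little data, so return +inf (always warn).
    """
    scores = np.sort(np.asarray(unsafe_scores, dtype=float))
    m = scores.size
    k = math.ceil((1.0 - epsilon) * (m + 1))
    if k > m:
        return math.inf
    return float(scores[k - 1])


def should_warn(score: float, threshold: float) -> bool:
    """Warn when the predicted success probability is at or below the threshold."""
    return score <= threshold


if __name__ == "__main__":
    # Toy calibration set: predicted success probabilities of failed grasps.
    failed_grasp_scores = [0.30, 0.10, 0.20]
    tau = warning_threshold(failed_grasp_scores, epsilon=0.25)
    print(tau, should_warn(0.25, tau), should_warn(0.80, tau))
```

Under this rule, an unsafe test example escapes the warning only if its score exceeds the chosen order statistic, which happens with probability at most \(\epsilon \) when the scores are exchangeable. When fewer than \(1/\epsilon - 1\) failed grasps are available, the rule returns an infinite threshold and always warns.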
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Luo, R. et al. (2023). Sample-Efficient Safety Assurances Using Conformal Prediction. In: LaValle, S.M., O’Kane, J.M., Otte, M., Sadigh, D., Tokekar, P. (eds) Algorithmic Foundations of Robotics XV. WAFR 2022. Springer Proceedings in Advanced Robotics, vol 25. Springer, Cham. https://doi.org/10.1007/978-3-031-21090-7_10
DOI: https://doi.org/10.1007/978-3-031-21090-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21089-1
Online ISBN: 978-3-031-21090-7
eBook Packages: Intelligent Technologies and Robotics (R0)