
Sample-Efficient Safety Assurances Using Conformal Prediction

Conference paper in Algorithmic Foundations of Robotics XV (WAFR 2022)

Abstract

When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e. of the situations that are unsafe, fewer than a fraction \(\epsilon \) should occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an \(\epsilon \) false negative rate using as few as \(1/\epsilon \) data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate the guaranteed false negative rate while also observing a low false detection (positive) rate.

The NASA University Leadership Initiative (grant #80NSSC20M0163) provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not any NASA entity.


References

  1. Lyft motion prediction dataset. https://www.kaggle.com/c/lyft-motion-prediction-autonomous-vehicles/data (2020).

  2. Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuScenes: a multimodal dataset for autonomous driving (2019). arXiv:1903.11027

  3. Cai, F., Koutsoukos, X.: Real-time out-of-distribution detection in learning-enabled cyber-physical systems. In: 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS), pp. 174–183 (2020)

  4. Calafiore, G., Campi, M.: The scenario approach to robust control design. IEEE Trans. Autom. Control 51(5), 742–753 (2006)

  5. Chen, Y., Rosolia, U., Fan, C., Ames, A., Murray, R.: Reactive motion planning with probabilistic safety guarantees. In: Conference on Robot Learning (2020)

  6. Correll, N., Bekris, K.E., Berenson, D., Brock, O., Causo, A., Hauser, K., Okada, K., Rodriguez, A., Romano, J.M., Wurman, P.R.: Analysis and observations from the first Amazon Picking Challenge. IEEE Trans. Autom. Sci. Eng. 15(1), 172–188 (2016)

  7. Crestani, D., Godary-Dejean, K., Lapierre, L.: Enhancing fault tolerance of autonomous mobile robots. Robot. Auton. Syst. 68, 140–155 (2015)

  8. Ding, S.X.: Model-Based Fault Diagnosis Techniques: Design Schemes, Algorithms and Tools, pp. 3–11. Springer, London (2013)

  9. Eppner, C., Höfer, S., Jonschkowski, R., Martín-Martín, R., Sieverling, A., Wall, V., Brock, O.: Lessons from the Amazon Picking Challenge: four aspects of building robotic systems. In: Robotics: Science and Systems (2016)

  10. Feldman, S., Bates, S., Romano, Y.: Improving conditional coverage via orthogonal quantile regression (2021). arXiv:2106.00394

  11. Foody, G.M.: Sample size determination for image classification accuracy assessment and comparison. Int. J. Remote Sens. 30(20), 5273–5291 (2009)

  12. Gammerman, A., Nouretdinov, I., Burford, B., Chervonenkis, A., Vovk, V., Luo, Z.: Clinical mass spectrometry proteomic diagnosis by conformal predictors. Stat. Appl. Genet. Mol. Biol. 7(2), 1–12 (2008)

  13. Harirchi, F., Ozay, N.: Model invalidation for switched affine systems with applications to fault and anomaly detection. Anal. Design Hybrid Syst. (ADHS) 48(27), 260–266 (2015)

  14. Harirchi, F., Ozay, N.: Guaranteed model-based fault detection in cyber-physical systems: a model invalidation approach (2017). arXiv:1609.05921

  15. Hernandez, C., Bharatheesha, M., Ko, W., Gaiser, H., Tan, J., van Deurzen, K., de Vries, M., Van Mil, B., van Egmond, J., Burger, R., et al.: Team Delft’s robot winner of the Amazon Picking Challenge 2016. In: Robot World Cup, pp. 613–624. Springer (2016)

  16. Khalastchi, E., Kalech, M.: On fault detection and diagnosis in robotic systems. ACM Comput. Surv. (CSUR) 51(1), 1–24 (2018)

  17. von Luxburg, U., Schölkopf, B.: Statistical learning theory: models, concepts, and results. In: Gabbay, D.M., Hartmann, S., Woods, J. (eds.) Inductive Logic, Handbook of the History of Logic, vol. 10, pp. 651–706. North-Holland (2011)

  18. Mahler, J., Liang, J., Niyaz, S., Laskey, M., Doan, R., Liu, X., Ojea, J.A., Goldberg, K.: Dex-Net 2.0: deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In: Robotics: Science and Systems (RSS) (2017)

  19. Mahler, J., Matl, M., Liu, X., Li, A., Gealy, D., Goldberg, K.: Dex-Net 3.0: computing robust robot suction grasp targets in point clouds using a new analytic model and deep learning (2017). arXiv:1709.06670

  20. Mahler, J., Matl, M., Satish, V., Danielczuk, M., DeRose, B., McKinley, S., Goldberg, K.: Learning ambidextrous robot grasping policies. Sci. Robot. 4(26), eaau4984 (2019)

  21. Muradore, R., Fiorini, P.: A PLS-based statistical approach for fault detection and isolation of robotic manipulators. IEEE Trans. Ind. Electron. 59(8), 3167–3175 (2011)

  22. Nouretdinov, I., Costafreda, S.G., Gammerman, A., Chervonenkis, A., Vovk, V., Vapnik, V., Fu, C.H.: Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression. NeuroImage 56(2), 809–813 (2011)

  23. Patton, R., Chen, J.: Observer-based fault detection and isolation: Robustness and applications. Control Eng. Pract. 5(5), 671–682 (1997)

  24. Perdomo, J., Zrnic, T., Mendler-Dünner, C., Hardt, M.: Performative prediction. In: International Conference on Machine Learning, pp. 7599–7609. PMLR (2020)

  25. Salzmann, T., Ivanovic, B., Chakravarty, P., Pavone, M.: Trajectron++: dynamically-feasible trajectory forecasting with heterogeneous data (2020)

  26. Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res. (JMLR). https://jmlr.csail.mit.edu/papers/volume9/shafer08a/shafer08a.pdf (2008)

  27. Tibshirani, R.J., Barber, R.F., Candès, E.J., Ramdas, A.: Conformal prediction under covariate shift (2019). arXiv:1904.06019

  28. Vemuri, A.T., Polycarpou, M.M., Diakourtis, S.A.: Neural network based fault detection in robotic manipulators. IEEE Trans. Robot. Autom. 14(2), 342–348 (1998)

  29. Visinsky, M.L., Cavallaro, J.R., Walker, I.D.: Expert system framework for fault detection and fault tolerance in robotics. Comput. & Electr. Eng. 20(5), 421–435 (1994)

  30. Visinsky, M.L., Cavallaro, J.R., Walker, I.D.: Robotic fault detection and fault tolerance: a survey. Reliab. Eng. & Syst. Safety 46(2), 139–158 (1994)

  31. Visinsky, M.L., Cavallaro, J.R., Walker, I.D.: A dynamic fault tolerance framework for remote robots. IEEE Trans. Robot. Autom. 11(4), 477–490 (1995)

  32. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic learning in a random world (2005)

  33. Vovk, V., Lindsay, D., Nouretdinov, I.: Mondrian confidence machine (2003)

  34. Yu, K.T., Fazeli, N., Chavan-Dafle, N., Taylor, O., Donlon, E., Lankenau, G.D., Rodriguez, A.: A summary of Team MIT’s approach to the Amazon Picking Challenge 2015 (2016). arXiv:1604.03639

  35. Zeng, A., Yu, K.T., Song, S., Suo, D., Walker, E., Rodriguez, A., Xiao, J.: Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1386–1393. IEEE (2017)


Author information

Correspondence to Marco Pavone.

Appendix

1.1 Proofs

Proposition 1: Under Assumption 1, Algorithm 1 is \(\epsilon + 1/(1+|{\mathcal {A}}|)\)-safe (with respect to \({\hat{Y}}, {\hat{Z}}\)).

Proof

Given a sequence of data points \((Y_1, Z_1), \cdots , (Y_T, Z_T)\), denote the subsequence of “unsafe” data as \((Y_{c_1}, Z_{c_1}), \cdots , (Y_{c_M}, Z_{c_M})\), where \(Z_{c_t}\) is the t-th unsafe example (i.e. \(f(Z_{c_t}) < f_0\)), so \(M = |{\mathcal {A}}|\). Suppose that \(\hat{Z}\) is also unsafe, i.e. \(f(\hat{Z}) < f_0\). Let \(\lbrace \!| \cdot |\!\rbrace \) denote an unordered bag (i.e. a set that can have repeated elements). We can bound the probability that no alert is issued for the unsafe \(\hat{Z}\) by comparing \(g(\hat{Y})\) against the scores \(g(Y_{c_1}), \cdots , g(Y_{c_M})\) from which Algorithm 1 computes its alert threshold.

By the assumption of exchangeability, we are equally likely to observe any permutation of the bag \(\lbrace \!| g(Y_{c_1}), \cdots , g(Y_{c_M}), g(\hat{Y}) |\!\rbrace \). Intuitively, it is equally likely for \(g(\hat{Y})\) to be the largest, 2nd largest, etc., among \(g(Y_{c_1}), \cdots , g(Y_{c_M}), g(\hat{Y})\). Formally, the random variable \(|\lbrace t \mid g(Y_{c_t}) < g(\hat{Y}) \rbrace | + U\) takes on all values in \(\lbrace 0, 1, \cdots , M \rbrace \) with equal probability. Therefore, for every \(k \in \lbrace 0, 1, \cdots , M \rbrace \),

$$\begin{aligned} \Pr \left[ \, |\lbrace t \mid g(Y_{c_t}) < g(\hat{Y}) \rbrace | + U \le k \, \right] = \frac{k+1}{M+1}. \end{aligned}$$

Combining this with the bound above, the probability of observing the unsafe \(\hat{Z}\) without an alert is at most \(\epsilon + 1/(M+1) = \epsilon + 1/(1+|{\mathcal {A}}|)\), which is the claimed guarantee.
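
To make the mechanics concrete, the following is a minimal Python sketch of this style of conformal calibration (a sketch only: the helper names calibrate_threshold and should_alert, the convention that larger scores \(g\) indicate more danger, and the exact quantile index are our illustrative assumptions, not a transcription of Algorithm 1).

```python
import numpy as np

def calibrate_threshold(unsafe_scores, epsilon):
    """Choose an alert threshold from the scores g(Y) of M unsafe calibration examples.

    Under exchangeability, alerting whenever a new score reaches this threshold
    misses at most roughly a fraction epsilon + 1/(M + 1) of unsafe cases,
    mirroring the epsilon + 1/(1 + |A|) guarantee of Proposition 1.
    """
    scores = np.sort(np.asarray(unsafe_scores, dtype=float))
    m = len(scores)
    # Index of an (approximately) epsilon-quantile of the unsafe scores.
    k = int(np.floor(epsilon * (m + 1)))
    k = max(0, min(k, m - 1))
    return scores[k]

def should_alert(score, threshold):
    """Raise a warning when the danger score reaches the calibrated threshold."""
    return score >= threshold

# Usage sketch with hypothetical scores: about 1/epsilon unsafe examples suffice.
rng = np.random.default_rng(0)
unsafe_scores = rng.normal(loc=2.0, scale=1.0, size=100)
tau = calibrate_threshold(unsafe_scores, epsilon=0.05)
print(tau, should_alert(2.5, tau))
```

Note that the guarantee comes entirely from the rank argument above, so the score \(g\) can be any heuristic (for example, the output of a learned model) without affecting validity; only the false positive rate depends on how informative \(g\) is.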

1.2 Lower Bound on the False Positive Rate

Consider a function w that maps a dataset \({\mathcal {D}}= (g(X_1), Y_1), \cdots , (g(X_T), Y_T)\) of unsafe examples, and a new data point \(g({\hat{X}})\), to \(\lbrace 0, 1\rbrace \). We argue that any w that gives a distribution-free false negative rate guarantee should depend only on the ordering between \(g(X_1), \cdots , g(X_T), g({\hat{X}})\), and not on their specific values. In other words, w should take the form defined by

$$\begin{aligned} w({\mathcal {D}}, g({\hat{X}})) = \left\{ \begin{array}{ll} \phi \left( \# \lbrace t : g({\hat{X}}) < g(X_t) \rbrace \right) &{} \text { with probability } \gamma \\ 1 &{} \text { with probability } 1-\gamma \end{array} \right. \end{aligned}$$
(6)

for some deterministic function \(\phi \) and real number \(\gamma \in [0, 1]\). We know that when the data is exchangeable, \(\# \lbrace t : g({\hat{X}}) < g(X_t) \rbrace \) is uniformly distributed on \(\lbrace 0, 1, \cdots , T \rbrace \).
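
As a quick sanity check on this uniformity claim, the following small Monte Carlo sketch (our own illustration, assuming i.i.d. continuous scores, which are in particular exchangeable) verifies that the rank statistic is uniform on \(\lbrace 0, 1, \cdots , T \rbrace \).

```python
import numpy as np

# Empirically check that #{t : g(X_hat) < g(X_t)} is uniform on {0, ..., T}
# when all T + 1 scores are exchangeable (here: i.i.d. standard normal).
rng = np.random.default_rng(0)
T, trials = 9, 200_000
scores = rng.normal(size=(trials, T + 1))
rank = np.sum(scores[:, :T] > scores[:, T:], axis=1)  # count of g(X_t) above g(X_hat)
freq = np.bincount(rank, minlength=T + 1) / trials
print(np.round(freq, 3))  # every entry is close to 1 / (T + 1) = 0.1
```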

Case 1 Suppose \(\phi \) takes the value 0 for at least one possible input. Since the rank statistic \(\# \lbrace t : g({\hat{X}}) < g(X_t) \rbrace \) is uniform over its \(1+T\) possible values and the \(\phi \) branch is taken with probability \(\gamma \), the false negative rate satisfies

$$\begin{aligned} \text {FNR} \ge \gamma /(1+T) \end{aligned}$$
(7)

and, since the system alerts with probability at least \(1-\gamma \) regardless of the input, the false positive rate satisfies

$$\begin{aligned} \text {FPR} \ge 1-\gamma \end{aligned}$$
(8)

so combined we have

$$\begin{aligned} \text {FPR} \ge 1-\gamma \ge 1 - (1+T) \text {FNR} \ge 1 - (1+T) \epsilon \end{aligned}$$
(9)

Case 2 Suppose \(\phi \) takes the value 0 for none of its inputs; then the warning system always alerts, so

$$\begin{aligned} \text {FNR} = 0, \text {FPR} = 1 \end{aligned}$$
(10)

so we would still (trivially) have \(\text {FPR} \ge 1-(1+T)\epsilon \).

So far we have shown that if w takes the form of Eq. (6) and guarantees a false negative rate of at most \(\epsilon \), then its false positive rate must be lower bounded by \(1-(1+T)\epsilon \). In other words, if \(\epsilon = o(1/T)\), the false positive rate tends to 1 as T grows.
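
The snippet below simply tabulates this floor on the false positive rate for a few calibration-set sizes (a purely numerical illustration of Eq. (9); the particular sizes and the value of \(\epsilon \) are arbitrary choices of ours).

```python
# Illustration of the lower bound FPR >= 1 - (1 + T) * epsilon from Eq. (9):
# with too few unsafe samples T relative to 1/epsilon, any distribution-free
# warning system of the form of Eq. (6) is forced to alert almost always.
epsilon = 0.01
for T in [10, 50, 100, 500, 1000]:
    fpr_floor = max(0.0, 1.0 - (1 + T) * epsilon)
    print(f"T = {T:4d}  ->  FPR >= {fpr_floor:.2f}")
# The bound becomes vacuous once T is on the order of 1/epsilon, consistent
# with needing roughly 1/epsilon unsafe samples to avoid constant alerts.
```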

1.3 Additional Experimental Details: Driver Alert System

Safety score: We define the safety score as the Mahalanobis distance between the ego-vehicle and the agent, where the first eigenvector is aligned with the ego-vehicle’s velocity vector and the second eigenvector is orthogonal to it; the scale along the first eigenvector is the magnitude of the velocity, and the scale along the second eigenvector is approximately half of a car width (we use 1 m). Intuitively, this means that agents lying along the ego-vehicle’s velocity vector appear closer than agents at the same Euclidean distance in the perpendicular direction. This metric is similar to time to collision (TTC), but it is continuous, whereas TTC is not: TTC is infinite unless two vehicles are exactly on a collision course.
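
A small Python sketch of one way to compute such a score is given below (the function name safety_score, the planar 2-D setting, and the numerical guard on the speed are our illustrative assumptions; the scales mirror the description above, namely the speed along the velocity direction and 1 m laterally).

```python
import numpy as np

def safety_score(ego_pos, ego_vel, agent_pos, lateral_scale=1.0):
    """Mahalanobis-style distance between the ego-vehicle and another agent.

    The first axis is aligned with the ego velocity and scaled by the speed;
    the second axis is perpendicular and scaled by ~half a car width (1 m).
    Agents ahead along the direction of travel therefore appear closer
    (less safe) than agents the same Euclidean distance away to the side.
    """
    ego_pos, ego_vel, agent_pos = map(np.asarray, (ego_pos, ego_vel, agent_pos))
    speed = max(float(np.linalg.norm(ego_vel)), 1e-6)
    e_long = ego_vel / speed                       # unit vector along velocity
    e_lat = np.array([-e_long[1], e_long[0]])      # perpendicular unit vector
    delta = agent_pos - ego_pos
    d_long = np.dot(delta, e_long) / speed         # whitened longitudinal offset
    d_lat = np.dot(delta, e_lat) / lateral_scale   # whitened lateral offset
    return float(np.hypot(d_long, d_lat))

# Example: an agent 10 m ahead scores 1.0, while one 10 m to the side scores 10.0,
# so the agent ahead "appears closer" even though the Euclidean distances match.
print(safety_score([0.0, 0.0], [10.0, 0.0], [10.0, 0.0]))
print(safety_score([0.0, 0.0], [10.0, 0.0], [0.0, 10.0]))
```

In this setup an alert would presumably be raised when the score drops below a conformally calibrated threshold, since smaller distances correspond to more dangerous configurations.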

Dataset details: The nuScenes dataset includes 952 scenes collected across Boston and Singapore, divided into a 697/105/150 train/val/test split (the same split used for the original Trajectron++). Each scene is 20 s long. The Kaggle Lyft Motion Prediction dataset is a subset of the full Lyft Level 5 dataset (chosen over the full dataset for computational reasons). It includes approximately 16k scenes, divided into a 70%/15%/15% train/val/test split. Each scene is 25 s long. Both datasets include labeled ego-vehicle trajectories as well as labeled detections and trajectories for other agents in the scene. Note that for both of these datasets, because the training split was used to train the Trajectron++ model, we used the validation split as the input training data for Algorithm 1.

Additional experimental results: We demonstrate empirically on the nuScenes dataset that the sum of \(\epsilon \) and the false positive rate must be high when there are few (e.g. fewer than \(1/\epsilon \)) samples, which is consistent with what our theory from Sect. 3.2 would predict. Figure 4 plots the epsilon bound as well as the false negative and false positive rates vs. the number of unsafe samples in the validation dataset; we see that when \(\epsilon \) decreases as 1/T, the false positive rate remains relatively flat and low.

Fig. 4. Epsilon bound, false negative rate, and false positive rate on the nuScenes dataset while varying the number of unsafe samples. Consistent with our theory from Sect. 3.2, the sum of \(\epsilon \) and the false positive rate is high when there are few samples.

We also demonstrate empirically on the Kaggle Lyft dataset that the variance of the false negative rate over different train/test splits is low. Table 1 displays the variance of the false negative rate calculated over the 100 trials at each \(\epsilon \) value. All of the variances are well below 0.003, suggesting that the test sequence false negative rates are clustered around \(\epsilon \) (rather than having some sequences that fail on zero examples and others with catastrophic failures). As further evidence, in Fig. 5 we provide a representative box plot of the false negative rates over the 100 trials with \(\epsilon = 0.04\); the rates are indeed clustered around 0.04.

Table 1. Variance on the test sequence false negative rates at different \(\epsilon \).
Fig. 5. Box plot of the 100 false negative rates calculated over randomized train/test splits with \(\epsilon = 0.04\).

1.4 Additional Experimental Details: Robotic Grasping Experiments

Model and dataset details: The Grasp Quality Convolutional Neural Network (GQ-CNN) from [18] is a model that classifies whether a candidate robotic grasp will be successful. The inputs to a GQ-CNN are a point cloud representation of an object, \(\textbf{y}\), and a candidate grasp, \(\textbf{u}\). A GQ-CNN outputs the predicted probability, \(Q_{\theta }(\textbf{y}, \textbf{u})\), that the candidate grasp will be able to successfully pick and transport the object. We use this predicted probability as the safety score, \(g = Q_{\theta }(\textbf{y}, \textbf{u})\). We consider a candidate grasp “unsafe” if it will not be able to successfully pick the object (i.e. the true label is \(Z = 0\)). Note that this is exactly the ROC curve threshold tuning setup. We use the Dex-Net dataset of synthetic objects grasped with a parallel jaw gripper [18], which includes approximately 500k pick attempts not used in training the GQ-CNN model. These are divided into a 50%/50% train/test split. Each example is labeled a success if the robot successfully picks and places the object, and a failure otherwise.
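
To illustrate how this threshold tuning could look in code, here is a hedged sketch (the function conformal_grasp_threshold, the synthetic Beta-distributed scores, and the random labels are our own stand-ins for the GQ-CNN outputs \(Q_{\theta }\) and the Dex-Net labels; the calibration rule mirrors the conformal quantile idea rather than reproducing the authors' exact implementation).

```python
import numpy as np

def conformal_grasp_threshold(q_unsafe, epsilon):
    """Threshold on the predicted success probability g = Q_theta(y, u).

    q_unsafe holds predicted probabilities for held-out grasps that actually
    failed (label Z = 0). Flagging any grasp whose score is at most the
    returned threshold keeps the false negative rate (failed grasps that are
    not flagged) near epsilon under exchangeability.
    """
    scores = np.sort(np.asarray(q_unsafe, dtype=float))[::-1]  # descending
    k = int(np.floor(epsilon * (len(scores) + 1)))
    k = max(0, min(k, len(scores) - 1))
    return scores[k]                                           # ~(1 - eps) quantile

rng = np.random.default_rng(1)

def synthetic_split(n):
    """Hypothetical (score, label) pairs: successful grasps tend to score higher."""
    labels = rng.integers(0, 2, size=n)            # 1 = successful pick
    scores = np.where(labels == 1,
                      rng.beta(5, 2, size=n),      # successes: high Q
                      rng.beta(2, 5, size=n))      # failures: low Q
    return scores, labels

cal_scores, cal_labels = synthetic_split(5000)
tau = conformal_grasp_threshold(cal_scores[cal_labels == 0], epsilon=0.05)

test_scores, test_labels = synthetic_split(5000)
flagged = test_scores <= tau                       # alert: grasp predicted unsafe
fnr = np.mean(~flagged[test_labels == 0])          # failed grasps not flagged
fpr = np.mean(flagged[test_labels == 1])           # successful grasps flagged
print(f"threshold={tau:.3f}  FNR={fnr:.3f}  FPR={fpr:.3f}")
```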


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, R. et al. (2023). Sample-Efficient Safety Assurances Using Conformal Prediction. In: LaValle, S.M., O’Kane, J.M., Otte, M., Sadigh, D., Tokekar, P. (eds) Algorithmic Foundations of Robotics XV. WAFR 2022. Springer Proceedings in Advanced Robotics, vol 25. Springer, Cham. https://doi.org/10.1007/978-3-031-21090-7_10
