Learning Unknown Groundings for Natural Language Interaction with Mobile Robots

Tucker, Mycal; Aksaray, Derya; Paul, Rohan; Stein, Gregory J.; Roy, Nicholas

doi:10.1007/978-3-030-28619-4_27

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 10)

Abstract

Our goal is to enable robots to understand or “ground” natural language instructions in the context of their perceived workspace. Contemporary models learn a probabilistic correspondence between input phrases and semantic concepts (or groundings) such as objects, regions or goals for robot motion derived from the robot’s world model. Crucially, these models assume a fixed, a priori known set of object types and phrases, and train probable correspondences offline using static language-workspace corpora. Hence, model inference fails when an input command contains unknown phrases or references to novel object types that were not seen during training. We introduce a probabilistic model that incorporates a notion of unknown groundings and learns a correspondence between an unknown phrase and an unknown object that cannot be classified into known visual categories. Further, we extend the model to “hypothesize” known or unknown object groundings when the language utterance references an object that exists beyond the robot’s partial view of its workspace. When the grounding for an instruction is unknown or hypothetical, the robot performs exploratory actions to gather new observations and find the referenced objects beyond the current view. Once an unknown grounding is associated with percepts of a new object, the model is adapted and trained online using accrued visual-linguistic observations to reflect the new knowledge gained for interpreting future utterances. We evaluate the model quantitatively using a corpus from a user study and report experiments on a mobile platform in a workspace populated with objects from a standardized dataset. A video of the experimental demonstration is available at: https://youtu.be/XFLNdaUKgW0.

The authors gratefully acknowledge support in part from the U.S. Army Research Laboratory under the RCTA program and from the National Science Foundation. G. J. Stein acknowledges support from an NDSEG Graduate Fellowship.

Notes

1. If $\lambda _i$ is a noun phrase, the corresponding grounding set $\varGamma ^i$ contains the objects in the world (i.e., $\varGamma ^O$). If $\lambda _i$ is a prepositional phrase (e.g., “front of a box”), then $\varGamma ^i$ contains symbols denoting the discretized spatial regions (i.e., front, behind, left, etc.) with respect to the objects under consideration (i.e., $\varGamma ^{RO}$). If $\lambda _i$ is a verb phrase referring to the actions that the robot can take (e.g., “move towards a box”, “pick up the block”), then $\varGamma ^i$ contains the set of constraints defined with respect to pairs of regions (e.g., picking up a block can be implicitly expressed as an intersection constraint between the robot’s end-effector and the region occupied by the object).
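
The case analysis in this note can be illustrated with a small sketch that picks a candidate grounding set from the phrase type. This is not the authors' implementation: the names `Region`, `Constraint`, and `grounding_set`, the phrase-type tags "NP", "PP", and "VP", the region label "right", and the single constraint kind "intersect" are all assumptions made for illustration, and regions are simplified to (label, object) pairs.

```python
from dataclasses import dataclass
from itertools import product

# Assumed region labels: the note lists front, behind, left, "etc.";
# "right" is added here only for illustration.
REGION_LABELS = ("front", "behind", "left", "right")
# Assumed constraint kinds: the note only gives an intersection example.
CONSTRAINT_KINDS = ("intersect",)


@dataclass(frozen=True)
class Region:
    """A discretized spatial region defined relative to an object."""
    label: str
    obj: str


@dataclass(frozen=True)
class Constraint:
    """A constraint defined over an ordered pair of distinct regions."""
    kind: str
    first: Region
    second: Region


def grounding_set(phrase_type, objects):
    """Return the candidate groundings for one phrase (hypothetical helper)."""
    if phrase_type == "NP":
        # Noun phrase: groundings are the objects in the world.
        return list(objects)
    regions = [Region(label, obj) for obj, label in product(objects, REGION_LABELS)]
    if phrase_type == "PP":
        # Prepositional phrase: regions relative to each object.
        return regions
    if phrase_type == "VP":
        # Verb phrase: constraints over ordered pairs of distinct regions.
        return [
            Constraint(kind, a, b)
            for kind in CONSTRAINT_KINDS
            for a, b in product(regions, repeat=2)
            if a != b
        ]
    raise ValueError(f"unknown phrase type: {phrase_type!r}")


if __name__ == "__main__":
    world = ["box", "block"]
    for tag in ("NP", "PP", "VP"):
        print(tag, len(grounding_set(tag, world)))
```

Even in this toy setting the verb-phrase set grows quadratically with the number of regions, which is one reason the phrase type is used to restrict which symbols are considered at all.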

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Mycal Tucker, Derya Aksaray, Rohan Paul, Gregory J. Stein & Nicholas Roy

Corresponding author

Correspondence to Mycal Tucker.

Editor information

Editors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Nancy M. Amato
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Greg Hager
Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
Shawna Thomas
Department of Electrical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
Miguel Torres-Torriti

About this paper

Cite this paper

Tucker, M., Aksaray, D., Paul, R., Stein, G.J., Roy, N. (2020). Learning Unknown Groundings for Natural Language Interaction with Mobile Robots. In: Amato, N., Hager, G., Thomas, S., Torres-Torriti, M. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-28619-4_27

DOI: https://doi.org/10.1007/978-3-030-28619-4_27
Published: 28 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28618-7
Online ISBN: 978-3-030-28619-4
eBook Packages: Intelligent Technologies and Robotics (R0)
