
Thread: Robots following voice commands

  1. #1
    Join Date
    Feb 2016
    Location
    NE Iowa
    Posts
    1,245

    Robots following voice commands

    Part of my part-time post-retirement employment is to help my former employer take advantage of new kinds of automation, including large language models and similar AI, and robotics. The advances in bringing AI and automation together continue to astound me. I'm not saying you should fear for your job yet, but the things being done today that were unimaginable a decade ago are truly amazing.

    The three-minute "wow" version on YouTube: https://www.youtube.com/watch?v=Vq_DcZ_xc_E

    (And, yes, I can tell you all the reasons that's still a long way from being useful, although Amazon has deployed some of these robots in a trial run at a warehouse, but that's not the point. The last mile may be as hard as the first 99 all put together, but nonetheless, we are clearly getting into the 90s ....)

  2. #2
    Join Date
    May 2008
    Location
    Peshtigo,WI
    Posts
    1,412
    Is that robot using vision to determine which bin to place the different types of trash in? I noticed that when each pile of trash was moved, there were marks on the floor where the trash had been located. Can Digit find the trash, differentiate the types of trash, and determine which bin each belongs in, or were the locations predetermined for illustrative purposes?

    I noticed the bottom of the box opened when Digit picked it up. If the contents fell out, would Digit be able to recognize that and go back to clean up the contents that may be scattered around?

    This type of automation is at a whole different level than what I worked with at my job before I retired. I find it very interesting, but I agree that we don't have to totally fear for our jobs yet, though it's coming.
    Confidence: The feeling you experience before you fully understand the situation

  3. #3
    Join Date
    Mar 2003
    Location
    SE PA - Central Bucks County
    Posts
    65,901
    Voice recognition is truly benefiting from AI learning techniques, which will continue to drive the adoption of voice control in user interfaces. One thing I like is that a human can keep their eyes on what's happening (whether driving or running some process) while using their voice to create content or control that process. I use it a lot for messaging, both when my hands and eyes are busy and when I'm just not up to typing on a small screen.

    Those are simple things, but as voice recognition gets better, one won't have to be as particular about phrasing to avoid misinterpretation. An example: when I pick up an order from Mickey D's for our disabled daughter and message her that I'll be at her apartment soon, there is a difference between saying "Clown food in three minutes" and "Food from the clown in three minutes" if I want a proper transcription. The former currently gets interpreted as if the recipient is the Clown, while the latter correctly identifies that the food is coming from the Clown. (I kid you not.) The system will be able to learn better than it does now.
    --

    The most expensive tool is the one you buy "cheaply" and often...

  4. #4
    Join Date
    Feb 2016
    Location
    NE Iowa
    Posts
    1,245
    (To both Jim and Jerry)

    What is remarkable to me is the chain of causality in this demo. First, as Jim says, there is AI-based voice recognition and transcription to text. Then the vision processing parses the immediate area into a collection of objects and uses AI to name those objects. Next, an LLM processes the command text along with the parsed, labeled description of the scene to generate a new string of text that, in essence, translates "clean up this mess" into "put object A into bin 1, object B into bin 2," and so on. LLM/AI processing then translates that into robot commands, and finally the robot executes those commands. (I know that not all of what I just wrote is explained in the video, but I'm deeply enough into doing this at the hospital to know that some version of that is what is going on.)
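
    Just to make that chain concrete, here is a rough sketch in Python of the hand-offs I'm describing. Every function in it is a hypothetical stand-in (no actual vendor API or robot SDK is being quoted); the point is the sequence of steps, not the implementations.

        # Rough sketch of the command pipeline described above.
        # All functions are hypothetical stand-ins with canned outputs.

        def transcribe_audio(audio):
            # Step 1: AI speech recognition turns audio into command text.
            return "clean up this mess"

        def detect_objects(frame):
            # Step 2: a vision model parses the scene into labeled objects.
            return [{"label": "cardboard box", "position": (1.2, 0.4)},
                    {"label": "soda can", "position": (0.8, 1.1)}]

        def plan_with_llm(command_text, objects):
            # Step 3: an LLM combines the command text with the labeled
            # scene and emits concrete steps ("put object A into bin 1", ...).
            bins = {"cardboard box": "recycling bin", "soda can": "can bin"}
            return [f"put {o['label']} into {bins[o['label']]}" for o in objects]

        def execute(step):
            # Step 4: each step is translated into low-level robot motions.
            print("robot executes:", step)

        command = transcribe_audio("mic input")
        scene = detect_objects("camera frame")
        for step in plan_with_llm(command, scene):
            execute(step)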

    Now, that's remarkable in two ways: first, that it actually works; second, that very little of what I described is programmed into the robot or its control system in the sense that most people understand computer programming. There is, to be sure, traditional programming in the drivers and basic control systems for the machines, but the things I described are done primarily through training of models, where the programming is meta-programming (building the teaching-learning systems) and the actual operational software is the trained models. So this robot has been taught to do its job, not programmed to do it.
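
    A toy illustration of that distinction, again with made-up names: the first function below is "programmed" behavior, rules spelled out by a person; in the second, the code only loads and queries a trained model, and the behavior itself lives in the learned weights rather than in anything a programmer typed.

        # Traditional programming: a person writes the rule for every case.
        def sort_item_rules(label):
            if label == "cardboard box":
                return "recycling bin"
            if label == "soda can":
                return "can bin"
            return "trash bin"

        # Trained-model approach: the "program" is a file of learned weights.
        # TrainedPolicy is a hypothetical stand-in for whatever the
        # teaching-learning system produced; this code only queries it.
        class TrainedPolicy:
            def __init__(self, weights_path):
                self.weights_path = weights_path  # learned behavior lives here

            def decide(self, observation):
                # A real system would run the trained network on the observation.
                raise NotImplementedError("stand-in for a learned model")

        print(sort_item_rules("soda can"))               # behavior written by a programmer
        policy = TrainedPolicy("digit_cleanup.weights")  # behavior learned through training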

    Mostly I'm ok with being old and close to being finally put completely out to pasture, but some of this stuff is so darned exciting I get rather cranky about the fact that I'll never really be part of it.

    Sigh....

  5. #5
    Join Date
    Mar 2003
    Location
    SE PA - Central Bucks County
    Posts
    65,901
    The reason things are not programmed in like "back in the dark ages" is that the systems become self-learning...a major factor with AI. They "program" themselves from data and context and a whole bunch of other things. It's pretty remarkable. But, of course, that doesn't bode well long term for human coders who like to type...
    --

    The most expensive tool is the one you buy "cheaply" and often...

  6. #6
    Join Date
    Feb 2016
    Location
    NE Iowa
    Posts
    1,245
    Yup. (Building AI for medicine was part of my job before I retired. It's not the techniques that amaze me; it's the pace of progress in applying them to complex, multi-modal problems like making a smart, bipedal robot box mover.)
