Learning Common Sense from Video

Common sense makes humans very efficient learners, so machine learning researchers have been working on ways to imbue machines with at least some ‘common sense’. In a previous blog post we discussed using pictures to train natural language processing systems, which gives those systems partial ‘knowledge’ of what words represent in the physical world. ML systems can get even closer to common sense with a little help from video ML models and human teachers.

In my latest iMerit blog post, I discuss an innovative deep learning architecture that applies the concept of attention, commonly used in sequence models for language processing, to the analysis of motion patterns in video. It does so using only 30 percent of the computations required by previous approaches.
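To make the idea of attention over video a little more concrete, here is a minimal sketch of generic scaled dot-product self-attention applied to a short sequence of per-frame feature vectors. It is not the architecture discussed in the iMerit post: the function names, the toy two-dimensional frame features, and the absence of learned query, key, and value projections are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames):
    """Self-attention across frames (hypothetical helper).

    Each frame's feature vector serves as query, key, and value;
    a real model would apply learned projections first.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in frames]
        w = softmax(scores)
        # Weighted mix of all frame features.
        out = [sum(w[j] * frames[j][d] for j in range(len(frames)))
               for d in range(dim)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy clip: three frames with made-up 2-D features.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = temporal_attention(frames)
```

In this toy clip the first frame places more weight on the second frame, whose features are similar, than on the third, whose features are not. That is the basic mechanism by which attention lets a model relate frames to one another across time.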

Next, I discuss training such a video analysis system to learn the basic language of movement. For this training, the human teacher goes beyond typical training data annotation, drawing on knowledge of the physical world to improvise representative examples of the basic concepts of movement. The hope is that this will give the ML system a bit of ‘common sense’, allowing it to learn new video analysis tasks more easily.

Author: Tom Robertson

Tom Robertson, Ph.D., is an organizational and engineering consultant specializing in harmonizing human and artificial intelligence. He has been an AI researcher, an aerospace executive, and a consultant in Organizational Development. An international speaker and teacher, he has presented in a dozen countries and has served as visiting faculty at École des Mines d’Alès in France and Portland State University.
