Learning Words with Pictures

Natural language processing (NLP) machines have made great progress by learning to recognize complex statistical patterns in sentences and paragraphs. Work with modern deep learning models such as the transformer has shown that sufficiently large networks (hundreds of millions of parameters) can perform language tasks such as translation well, without having any information about what the words mean.

We humans make good use of meaning when we process language. We understand how the things, actions, and ideas described by language relate to each other. This gives us a big advantage over NLP machines – we don’t need the billions of examples these machines require to learn language.

NLP researchers have asked the question, “Is there some way to teach machines something about the meaning of words, and will that improve their performance?” This has led to the development of NLP systems that learn not just from samples of text, but also from digital images associated with the text, such as the one above from the COCO dataset. In my latest iMerit blog I describe such a system – the Vokenizer!
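To make the idea concrete, here is a minimal sketch of the kind of token-image matching the Vokenizer performs – not its actual implementation. Random vectors stand in for learned token and image embeddings, and each token is assigned its most similar image (its “voken”) by cosine similarity. All names, dimensions, and data here are illustrative assumptions.

```python
# Minimal vokenization-style sketch: pair each text token with its most
# relevant image ("voken") by comparing embeddings. The random vectors
# below are illustrative stand-ins; a real system learns these embeddings
# from captioned images (e.g., the COCO dataset).
import numpy as np

rng = np.random.default_rng(0)

tokens = ["a", "dog", "catches", "a", "frisbee"]   # example sentence
num_images = 4                                     # tiny stand-in image pool
dim = 8                                            # illustrative embedding size

token_vecs = rng.normal(size=(len(tokens), dim))   # hypothetical token embeddings
image_vecs = rng.normal(size=(num_images, dim))    # hypothetical image embeddings

def normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity between every token and every image.
sim = normalize(token_vecs) @ normalize(image_vecs).T

# Each token's voken is its highest-scoring image; such token-image pairs
# can then serve as extra supervision when pretraining a language model.
vokens = sim.argmax(axis=1)
for tok, img in zip(tokens, vokens):
    print(f"token {tok!r} -> image #{img}")
```

The real system learns this matching from sentence-image pairs and uses the resulting vokens as an additional supervision signal alongside the usual text-only pretraining objectives.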

The Road to Human-Level Natural Language Processing

Language is a hallmark of human intelligence, and Natural Language Processing (NLP) has long been a goal of Artificial Intelligence. The ability of early computers to process rules and look up definitions made machine translation seem right around the corner. However, language proved to be more complicated than rules and definitions.

The observation that humans use practical knowledge of the world to interpret language set off a quest to create vast databases of human knowledge to apply to NLP. But it wasn’t until deep learning became available that human-level NLP was achieved, using an approach quite unlike human language understanding.

In my latest iMerit blog I trace the path that led to modern NLP systems, which leave meaning to humans and let machines do what they are good at – finding patterns in data.