Learning Without a Teacher

Machine learning applications generally rely on supervised learning, learning from training samples that have been labeled by a human ‘teacher’. Unsupervised learning learns what it can from unlabeled training samples. What can be learned this way are basic structural characteristics of the training data, and this information can be a useful aid to supervised learning.

In my latest iMerit blog I describe how the long-used technique of clustering has been incorporated into deep learning systems, to provide a useful starting point for supervised learning and to extrapolate what is learned from labeled training data.

The Road to Human-Level Natural Language Processing

Language is a hallmark of human intelligence, and Natural Language Processing (NLP) has long been a goal of Artificial Intelligence. The ability of early computers to process rules and look up definitions made machine translation seem right around the corner. However, language proved to be more complicated than rules and definitions.

The observation that humans use practical knowledge of the world to interpret language set off a quest to create vast databases of human knowledge to apply to NLP. But it wasn’t until deep learning became available that human-level NLP was achieved, using an approach quite unlike human language understanding.

In my latest iMerit blog I trace the path that led to modern NLP systems, which leave meaning to humans and let machines do what they are good at – finding patterns in data.

Encoding Human and Machine Knowledge for Machine Learning

iMerit is a remarkable company of over 4000 people that specializes in annotating the data needed to train machine learning systems.

I am writing a series of blogs for them on various aspects of machine learning. In my latest blog I explain how ML systems embody both human intelligence and a form of machine ‘intelligence’.

Just as our biology provides the basis for human learning, human-provided ML system designs provide frameworks that enable machine learning. Through human engineering, these designs bring ML systems to the point where everything they need to ‘know’ about the world can be reflected in their parameters.

Analogous to the role of our parents and teachers, training data annotation drives the learning process toward competent action. Annotation is the crucial link between the ML system and its operational world, and accurate and complete annotation is the only way an ML system can learn to perform well.

The Three Edge Case Culprits: Bias, Variance, and Unpredictability

In my latest iMerit blog I explain how ML systems can be fooled by being too ‘simple’, too ‘inexperienced’, or faced with too many surprises.

How Does Mislabeled Training Data Affect ML System Performance?

In my latest iMerit blog I explain how inaccuracies in training data labels (‘label noise’) affect ML system performance. It turns out that what matters is not so much the number of errors as how those errors are structured.

Thinking About Thinking Machines

AIs have been developed that respond to human language, drive cars, and play masterful chess. As these feats traditionally require human intelligence, it might be said that AIs possess a form of intelligence.

What do we humans make of this artificial ‘intelligence’? Are AIs intelligent entities in the same sense as we humans? Can machines think?

Thinking about thinking is second nature to us humans. What first evolved as an ability to guess what others are thinking, to better compete and collaborate, further evolved into self-reflection. While other animals self-reflect, humans have a unique ability to conceptualize thinking. While self-reflection is key to human intelligence in general, it shows up particularly in philosophy (Descartes’s famous ‘I think therefore I am’), and it has of course driven the invention of AI itself.

For a long time, philosophers have been thinking about whether machines can (ever) think. The philosophical debate centers on whether there is something inherent in human intelligence that can never be duplicated by a machine.

In a 1980 paper, the philosopher John Searle argues that machines can never achieve human-like intelligence. In his famous ‘Chinese room’ thought experiment, a non-Chinese-speaking man is able to respond, in Chinese, to Chinese messages passed through a slot in the door. He is able to do this without understanding any Chinese, simply by referring to Chinese-to-Chinese correspondence tables.

Searle says that like computers, the Chinese room only simulates thinking, which is clearly different from a person communicating from an understanding of the Chinese language. Thus, AIs do not understand as humans do, and cannot really be thinking. AIs only simulate thinking.

Hubert Dreyfus also believed that machines will never think like humans. In ‘What Computers Can’t Do’ he argues that human intelligence requires the context of a human body and a human life, which can never be reduced to machine algorithms.

On the other hand, in a 1950 paper Alan Turing concluded that there is no reason a machine might not eventually be judged as ‘thinking’, to the extent we are able to come up with a suitable test. He proposed his famous ‘Imitation Game’ (now called the Turing Test) as a criterion: if an AI can carry on an open-ended conversation with a person and not reveal itself as a non-person, we have no justification to say it is not thinking.

More recently, AI pioneer Geoff Hinton made another argument for the possibility of machines thinking. He believes deep neural networks may eventually achieve human-level intelligence. He points out that our brains work with patterns of billions of elementary elements (electrical and chemical signals) in a way not fundamentally different from the way deep neural networks encode patterns in their billions of parameters.

AI practitioners have tended to view the question of whether machines will ever think as more of a practical issue than a philosophical one. They point out the difficulty in pinning down what constitutes human intelligence, and the difficulty in predicting or ruling out technical breakthroughs. They prefer to look at what AI has accomplished so far, and speculate when continuing progress might produce AIs that match human-level performance.

It seems these speculations have often underestimated how far we have to go. For example, in the 1960s, AI pioneer Herbert Simon predicted human-level intelligence by the 1980s. In 2006, Ray Kurzweil predicted that a computer with the power of a human brain would be available around 2020 (for $1,000).

We are still waiting!

Inside an AI

Artificial intelligence gets its name from the fact that AIs perform tasks associated with human intelligence, such as recognizing faces or understanding language or playing chess. For these tasks, we can measure AI performance and compare it to human performance, using a single ‘yardstick’, such as accuracy or word error rate or games won.

But can artificial intelligence and human intelligence be compared in a general way, using a single yardstick? Is there a general intelligence scale upon which, for example, humans might average 500, today’s best AIs 275, and future superintelligent AIs 1000?

Of course, it is difficult to measure even human intelligence on a single scale. It is generally acknowledged that measures like IQ tests, while useful as predictors of particular capabilities, do not capture the breadth of human intelligence.

However, putting aside the fundamental difficulty of quantifying intelligence, human or otherwise, can we compare human and artificial intelligence, beyond performance on specific tasks? Should we talk about humans being smarter than AIs, or vice versa? I would say ‘No’. Today human and artificial intelligence are so different that it doesn’t make sense to try to compare them along a single scale.

One striking difference between AIs and humans shows up in the way deep neural networks work. These networks, at the heart of today’s most advanced AIs, learn patterns from huge masses of data, and use these patterns to ‘understand’ things like visual images or language. However, the way these networks ‘perceive’, ‘learn’, and ‘understand’ the world is decidedly non-human.

Let’s consider machine translation as an example. First, a little history. In 1954 an IBM 701 computer was programmed with a dictionary and rules that allowed translation of Russian sentences into English. The results were so encouraging that it was predicted that the problem of automatic machine translation would be completely solved in 3 to 5 years.

However, in the next 10 years little progress was made. Research in machine translation came to be considered such a long shot that funding was drastically curtailed. Critics at the time pointed out that human translation requires complex cognitive processing that would be extremely difficult or impossible to program into computers. When humans interpret language, we don’t just hear it as sounds or see it as symbols; we understand it as objects, actions, ideas, and relationships, which is key to our understanding of language.

In the next decades, researchers in machine translation tried to get closer to human understanding by developing complex models of linguistic structure and meaning. While this enabled machine translators to gradually improve, human translators performed much better.

In more recent years deep neural networks began to be applied to machine translation, greatly improving performance. In 2016, Google announced the GNMT system, a deep neural network that reduced translation errors by 60% compared to previous methods.

How did GNMT achieve this quantum jump in performance? Did Google engineers finally figure out how to program the kind of understanding into their computers that humans need to make good translations?

The answer to this last question is: “No, quite the opposite!” The designers of GNMT did away with any attempt to incorporate human-like knowledge. GNMT relies on none of the complex models of language structure and meaning used by previous methods.

Instead, GNMT uses a type of neural network called Long Short-Term Memory (LSTM). Basically, LSTM [Note 1] allows sequences of output numbers (translated sentences) to be calculated from sequences of input numbers (sentences to be translated). The calculations in GNMT are controlled by hundreds of millions of parameters. Millions of examples are used to set these parameters, through a trial-and-error adjustment procedure.

As an illustration of how different a deep neural network translator is from a human translator, consider how such a system typically represents a word to be translated. A translator with a 10,000-word vocabulary, for example, might represent each word by a string of 10,000 ones and zeroes, with a one corresponding to the word’s position in the vocabulary list, and all the rest of the string zeroes. This way of representing words is called ‘one-hot’ encoding [Note 2]. Experimentation has shown it works well with deep neural networks doing language processing [Note 3].

For example, if the word ‘elephant’ is the 2897th word in the translator’s vocabulary, what the machine translator ‘sees’ when presented with the word is a string with 9,999 zeroes, with a single one in position 2897 of the string. All it ‘knows’ about ‘elephant’ at this point, is that it is word number 2897 in its vocabulary.
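To make the ‘elephant’ example concrete, here is a minimal sketch of one-hot encoding in Python. The function name one_hot and the constants are my own illustration, not anything taken from GNMT (which, as Note 3 explains, actually encodes word segments).

```python
def one_hot(position, vocab_size):
    """Return vocab_size zeroes with a single one at a 1-based position."""
    if not 1 <= position <= vocab_size:
        raise ValueError("position must be between 1 and vocab_size")
    vector = [0] * vocab_size
    vector[position - 1] = 1  # the text counts vocabulary positions from 1
    return vector


VOCAB_SIZE = 10_000        # vocabulary size used in the example above
ELEPHANT_POSITION = 2897   # position of 'elephant' in the example above

elephant = one_hot(ELEPHANT_POSITION, VOCAB_SIZE)

# Everything the translator 'knows' about the word at this point:
print(len(elephant), elephant.count(0), elephant.index(1) + 1)
```

The vector carries no information about elephants at all. It only marks which slot in the vocabulary list the word occupies.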

Contrast this with a human translator, who probably remembers many things about elephants, as soon as the word is encountered.

The power of a deep neural network comes from its ability to find patterns in word occurrence by analyzing millions of translated documents. Machine translation has always used patterns of word occurrence, for example, which words are more likely to follow other words. However, deep neural networks take this to a whole new level, recognizing extremely complex patterns that link and relate many words to many other words.
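The simplest kind of word-occurrence pattern, which words tend to follow which, can be sketched in a few lines of Python. This is only an illustration of the older counting idea, not of how a deep neural network works, and the tiny corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict


def follower_counts(sentences):
    """Count, for each word, how often each other word comes right after it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of word, or None if word was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A made-up toy corpus; real systems analyze millions of documents.
corpus = [
    "where is my car",
    "where is my house",
    "my car is red",
    "my car is here",
]

counts = follower_counts(corpus)
print(most_likely_next(counts, "my"))
```

A deep neural network goes far beyond this kind of table, relating many words to many other words at once, but the raw material is the same: which words occur near which other words.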

That these huge deep neural network translators can be built and perform so well is a tribute to years of creative engineering and systematic experimentation in AI. Decades ago, no one really knew that it would be possible to train a network with hundreds of parameters, let alone hundreds of millions. And it was equally unexpected that good machine translation could be done by using only patterns of word occurrence, without any reference to word meaning.

GNMT is an engineering marvel, to be sure. However, its mechanistic translation incorporates nothing about what words refer to in the real world. It ‘knows’ the Japanese sentence ‘Watashi no kuruma wa doko desu ka?’ translates to ‘Where is my car?’, but it has no idea what a car is (other than the words that ‘car’ is associated with) or that the question refers to a location on planet Earth that the questioner is likely to walk to.

Today, we compare machine and human translation, and the machines are looking very good. But what does this tell us about how artificial and human intelligence compare? Is this an example of AI catching up to human intelligence? No, it is only machine translation catching up to human translation.

Note 1: Four years is a long time in AI, and further progress has been made since GNMT. Transformer architectures have replaced LSTM as the architecture of choice for many applications in language processing. The evolution from LSTM to Transformer is an example of a fascinating aspect of AI deep learning progress: simpler architectures often perform better, when more compute power becomes available. GNMT’s LSTM is an example of a ‘bi-directional recurrent neural network with memory states and attention’, which is as complicated as it sounds – sentences are sequentially processed through a neural network that updates states that represent how words depend on other words that come before and after, and how far ahead or backwards words make a difference. Transformers do away with a lot of that, and just take in whole sentences at once.

Note 2: Generally, encoding just means transforming one representation to another, according to a set of rules, like converting ‘elephant’ to a string of ones and zeroes. Encoding is used in a couple of other senses in GNMT. The diagram at the start of this blog shows that an LSTM-type neural network can be divided into a front-end encoder and a back-end decoder. The encoder and decoder in this case describe mapping from the input language to the neural network’s internal representation (encoding), and mapping from the internal representation to the output language (decoding). These mappings are what the neural network learns by training on millions of examples. Also note that language translation itself is a form of encoding – a transformation of the input language to the output language.

Note 3: Although one-hot encoding is frequently used in LSTM neural networks, GNMT actually uses a more sophisticated technique that encodes word segments instead of complete words. The network learns to break up words in ways that maximize its ability to make good guesses for word translations outside its vocabulary.

Reason, Emotion, and AI

Human intelligence has always been an inspiration for artificial intelligence. For example, early work in artificial neural networks was inspired by the interconnected axons and dendrites found in biological brains. 

Human intelligence also inspires the tasks AI researchers use to benchmark their machines: tasks are defined that require human intelligence, then researchers attempt to build machines that can perform those tasks. AI has progressively matched or outperformed humans in tasks such as chess, Go, and language translation.

Does this progress in AI mean machines are getting ‘smarter’, in the sense of being closer to having human intelligence? I would say ‘No’. Others would say skeptics like me just keep moving the goalposts by saying “Anything a machine can do can’t be intelligence, it’s just code!”

But let’s look at human intelligence. It involves a lot more than the kind of skills demonstrated by a chess grandmaster in a championship game. Although chess mastery is a demonstration of exceptional human intelligence, this skill represents a narrow slice of the grandmaster’s intelligence, the totality of which relies on a complex cognitive architecture shared by all humans. Emotions are part of that architecture.

Emotions are often framed as the antithesis of intelligence and a human weakness. In the 1951 science fiction movie classic ‘The Thing from Another World’, a scientist, Dr. Carrington, marvels at the superiority of an alien mind: ‘No pleasure, no pain… no emotion, no heart. Our superior in every way’. Modern commentators cast our society as minds manipulated by social media, embracing conspiracy theories in the service of anger and resentment, at the expense of reason.

It is true that immense human progress has been made through science and reason, and emotions can stir up real troubles. However, it is clear that reason and emotion work hand-in-hand. Humans wouldn’t have evolved that way if emotions weren’t an essential part of our survival.

So, what role does emotion play in human intelligence? It provides essential context and motivation behind conscious analytical problem solving. It is the reason the chess grandmaster acquired her skill to begin with, and the architect of the path that led her there. The grandmaster’s skilled game play is just the tip of the iceberg. The intelligence required to create that highly specialized, analytical skill, in a brain evolved to survive in the broad context of human life, is truly awesome, and far beyond the rote learning of a deep neural network.

What role does emotion play in AI? There is a branch of AI called Emotion AI, which seeks to develop AIs that recognize and respond to human emotion. While this line of work benefits human-AI collaboration (and unfortunately, manipulation), in my view it doesn’t get at the essential role of emotions in human intelligence.

It’s not that AIs need to be able to ‘feel’ emotions to have human-like intelligence. Instead, AI problem solving would need to incorporate the immense informational context represented by human emotions. Emotions represent lifetimes of experience living embodied in the real world, incorporating a comprehensiveness and appreciation of causality and common sense that has been unmatched so far in AI.

AI and Emergency Management

Artificial Intelligence has found application in many areas, where its particular ability to find patterns in data makes it a useful tool. Let’s look at AI’s application to Emergency Management, a critical activity in today’s world of climate change, pandemics, and social unrest.

Emergency Management seeks to minimize the impact, on people and property, of emergencies such as earthquakes, hurricanes, floods, disease, and terrorism. EM involves four kinds of interrelated and sometimes overlapping activities:

  1. Mitigation (also called Disaster Risk Reduction) – steps taken to reduce the likelihood (e.g., forest management to reduce wildfire risk) or reduce the impact (e.g., flood protection levees) of emergencies
  2. Preparation – equipping responders and the public with tools and knowledge that will minimize emergency impacts. Examples include stockpiling personal protective equipment for pandemics, and training Community Emergency Response Teams
  3. Response – actions taken during an emergency or in its aftermath, to prevent further suffering or financial loss. International relief efforts after a devastating earthquake are an example of this, as is medical care for the victims of a pandemic
  4. Recovery – work to return communities to ‘normal’ after an emergency, for example rebuilding destroyed structures or reopening a shut-down economy.

Deep neural networks, currently a leading edge of AI, map patterns in data to useful interpretations of the data, such as the condition of a building, the likelihood of a flood, or the best evacuation route. This kind of information can be very useful for the planning, prediction, situation assessment, and decision making that are at the heart of Emergency Management.

Here are some examples of AI’s use in the four kinds of EM activity:


Mitigation

Mitigation seeks to reduce the risks associated with emergencies and disasters. Two ways this can be done are by recognizing human-made dangers and reducing them, or by predicting dangerous natural phenomena in time for actions to be taken to reduce their impact.

Poor urban areas are especially vulnerable to disasters, and poverty data is in scarce supply and difficult to collect. Researchers at Oak Ridge National Laboratory in the US have developed an AI-based technique to identify poor, informal settlements from high-resolution satellite imagery. Their approach uses a variety of spatial, structural, and contextual features to classify areas as formal settlement, informal settlement, or non-settlement. The method was tested in Caracas, Kabul, Kandahar, and La Paz, and demonstrated that good accuracy could be obtained using the same features in these diverse areas.

El Niño is a climate phenomenon that disrupts normal weather, leading to intense storms in some areas and droughts in others. It happens at irregular intervals of two to seven years, and lasts nine months to two years. The farther in advance an El Niño event can be predicted, the better a region can prepare for it. Recently, deep neural networks have been able to forecast El Niño 18 months in advance, which is an improvement of 6 months over previously used methods.

In California, two high school students invented a device to predict the probability of a forest fire occurring. The device is placed in the forest and takes real-time photographs together with measurements of humidity, temperature, carbon monoxide/dioxide, and wind. This data is then used with a deep neural network to predict the probability of a fire.


Preparation

A primary responsibility of emergency managers is to develop good plans to execute when disaster strikes. Such plans must deal with patterns of natural and social phenomena, and AI can help analyze these patterns and guide effective planning.

For example, Google has been partnering with India’s Central Water Commission to develop AI-enabled flood forecasting and early warning. Google uses a variety of inputs, such as historical events, river level readings, terrain, and elevation, to run hundreds of thousands of simulations for each location. The resulting river flood forecasting models can more accurately predict where and when a flood might occur, and how severe it will be.


Response

Emergency response must provide aid where it is needed. Knowing where and what sort of aid is needed is a challenge, especially in large-scale disasters. Our modern world is flooded with situational information from social media, surveillance cameras (fixed, drones, satellites), and internet-of-things sensors. However, it is very challenging for emergency managers to sort through and interpret this data. This is an ideal application for AI.

A system called Artificial Intelligence for Disaster Response (AIDR) has been developed to help analyze tweets during emergencies and disasters. The system is available as free, open-source software, and it is designed to be tailored to responder needs. The responder first identifies keywords and/or hashtags that are used as a preliminary filter for tweets. Next, responders identify topics of interest such as “Medical Needs” or “Sheltering”, and manually tag example tweets in each category. A deep neural network then learns to classify relevant tweets in each category, and automatically streams relevant information to responders.
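The three steps described above (keyword filter, manually tagged examples, automatic classification) can be sketched as follows. This is not AIDR’s code or API. The function names and sample tweets are invented for illustration, and a simple word-count score stands in for the deep neural network, so treat it only as a picture of the workflow.

```python
from collections import Counter


def keyword_filter(tweets, keywords):
    """Keep tweets that contain at least one keyword or hashtag."""
    wanted = {k.lower() for k in keywords}
    return [t for t in tweets if wanted & set(t.lower().split())]


def train(tagged_examples):
    """Count how often each word appears under each responder-defined topic."""
    word_counts = {}
    for text, topic in tagged_examples:
        word_counts.setdefault(topic, Counter()).update(text.lower().split())
    return word_counts


def classify(word_counts, tweet):
    """Pick the topic whose tagged examples share the most words with the tweet."""
    words = tweet.lower().split()
    scores = {
        topic: sum(counts[w] for w in words)
        for topic, counts in word_counts.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Invented sample data, standing in for a live tweet stream.
tweets = [
    "#quake need insulin and a doctor near the bridge",
    "#quake school gym open as shelter tonight",
    "great concert last night",
]
tagged = [
    ("injured people need doctor", "Medical Needs"),
    ("hospital out of insulin", "Medical Needs"),
    ("shelter open at church", "Sheltering"),
    ("families looking for shelter tonight", "Sheltering"),
]

model = train(tagged)
relevant = keyword_filter(tweets, ["#quake"])
results = [(tweet, classify(model, tweet)) for tweet in relevant]
for tweet, topic in results:
    print(topic, "|", tweet)
```

A trained neural network would generalize far better than this word-overlap score, but the division of labor is the same: responders supply the keywords, topics, and tagged examples, and the machine does the sorting.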

AI is being used in the fight against the ongoing COVID-19 pandemic. Deep neural networks are being used to identify patterns in medical imagery of the lungs and heart that will allow early detection and personalized therapies. AI is also being used to identify research and drugs most likely to lead to COVID-19 treatments and vaccines, and to track the disease by monitoring the deluge of data on social media and the internet.


Recovery

During disaster recovery a wide range of activities are undertaken to attend to casualties and survivors, restore buildings and infrastructure, and re-establish social systems and businesses. When international aid is involved, complex interactions among multiple organizations must be coordinated. Situation assessment, resource allocation, and planning can all be supported by AI’s ability to recognize patterns in data.

For example, Google, in collaboration with the United Nations World Food Programme Innovation Accelerator, has developed a system for automatic damage assessment using very high-resolution satellite imagery. The system uses a deep neural network to identify buildings and compare their condition before and after the disaster. This automated damage assessment can greatly improve the timeliness and effectiveness of recovery efforts for disasters that damage large numbers of structures, such as the 2010 Haiti earthquake, which required assessment of over 90,000 buildings in the Port-au-Prince area alone.

AI and EM

AI’s pattern recognition capability can be an invaluable asset for planning, prediction, situation assessment, and decision making. These activities are critical to many lines of work, especially Emergency Management.


AI’s Superpower

Just putting ‘artificial’ and ‘intelligence’ together in the same term is enough to get people pretty excited.

For some, ‘Artificial Intelligence’ can only be a misnomer. True intelligence is uniquely human, biologically evolved, embodied, necessarily shaped by environment and social relationships, non-algorithmic, and unknowable by mere human consciousness. Anything that becomes possible for human-built computers is by definition not really artificial intelligence.

For others, natural intelligence is simply computation performed on a relatively slow biological computer that took hundreds of thousands of years to evolve. It is only a matter of time before the exponential improvement in computing technology will allow AI to surpass the power of human brains.

The loaded nature of the term AI has also led to a variety of definitions, and identification of subcategories such as Narrow AI and Artificial General Intelligence. Sometimes AI is differentiated from terms such as ‘machine learning’ or ‘automation’.

I prefer a simple and pragmatic definition for AI – technology that can perform tasks previously requiring human intelligence. This definition does not address the limits or scope of AI, it simply acknowledges that we have developed and will continue to develop systems that perform tasks previously requiring human intelligence.

This definition will be too broad for some people’s taste. After all, electronic calculators fit the definition, and nobody considers them AI. However, I think of AI as a pursuit rather than a destination, with a leading edge that continues to advance. In practice, when we talk about AI, we are usually talking about technology near the leading edge.

In a previous post, I addressed why I think AI is neither comparable to human intelligence, nor a threat to humans. But I also think the leading edge of AI, deep neural networks, is very impressive.

Deep neural networks map patterns in data to outputs that represent some useful interpretation of the data, such as the identity of a face or the translation of a spoken sentence. In a sense, this capability is pretty simple; these AIs can be dismissed as mere ‘curve fitters’. What makes deep neural networks so useful?

Here are three things that give these AIs ‘superpowers’:

  • Patterns are everywhere
  • Data is abundant
  • AI learning extends human programming.

Patterns are everywhere

Recognizing patterns is central to the way we humans live, work, and play. For example:

  • Patterns in our environment tell us what we can eat, where we can find food, when we need to take shelter, and how to turn the wheel of our car
  • Social patterns bond children to mothers, attract mates, expose cheaters
  • Humans impose patterns on their environment – constellations in the stars, orbital mechanics – to enrich understanding and guide exploration
  • Patterns of language – spoken, written, schematic – communicate ideas and directions, and preserve the growing body of human knowledge
  • Patterns are used by detectives to fight crime, and by financial analysts to make money
  • We amuse and enrich ourselves through patterns in music and art, and in puzzles and games.

Data is abundant

Much of our reality these days is represented digitally, on the web or in databases. This gives unprecedented access to information about the patterns central to our lives. If only we had enough eyes and brains to examine and digest this huge volume of data! But this task is a perfect fit for deep neural networks: feed them enough data and they can discover extremely complex patterns.

For example, automatic speech-to-text recognition has been revolutionized by deep neural networks. One such network with 5 billion connections is possible only because it could be trained with lots of data: 3 million audio samples, together with 220 million text samples from a 495,000-word vocabulary.

AI learning extends human programming

Obviously, it takes humans to program deep neural networks. But these networks are programmed to ‘learn’, in the sense that they adjust their own parameters during the training process.
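Here is a minimal sketch of what ‘adjusting their own parameters’ means, using a model with a single parameter instead of billions. The human writes the update rule; the value the parameter ends up with comes from the examples. The data, learning rate, and function name are assumptions chosen for illustration.

```python
def train(examples, learning_rate=0.05, epochs=200):
    """Fit prediction = weight * x by repeatedly nudging weight to reduce error."""
    weight = 0.0  # the single learnable parameter, starting from a blank slate
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x
            error = prediction - target
            # Move the weight a small step in the direction that shrinks the error.
            weight -= learning_rate * error * x
    return weight


# Made-up training examples that follow the hidden pattern target = 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

weight = train(examples)
print(round(weight, 3))
```

Nobody typed the number 3 into the program. The training loop found it from the examples, and a deep neural network does the same thing at a vastly larger scale.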

The fact that very large deep neural networks can be trained and give good results is a relatively recent discovery in AI. Why these networks work so well is not well understood theoretically, but extensive experimentation has led to innovative designs and good results. This work has been carried out by a thriving, innovative community of AI researchers and engineers, who are building and extending a shared body of open-source software, datasets, and results.

One of the things observed in these experiments is that as deeper neural networks have become feasible, human engineers have needed to do less preprocessing of the inputs to the networks, to identify important features in the data. By letting the networks ‘learn’ what features are important, better results are obtained with less human programming.

An example is automatic speech-to-text recognition, mentioned above. For decades engineers developed these systems using approaches that drew on linguistic analysis of human vocalization and language: speech as composed of elemental sounds, phonemes, which are then built up into words and sentences, all governed by language syntax and semantics. Up through the early 2000s, systems mirrored this analysis: sounds were mapped to phonemes and possible words, sometimes using neural networks, then symbolic or statistical models of language were used to predict, correct, and make sense of words and sentences.

As effective deep neural networks became available, engineers put more and more of the linguistic analysis burden on the networks. Eventually, networks were trained to directly map sound (digital time-frequency plots) to words, resulting in a dramatic improvement in accuracy.

Artificial Intelligence?

Whether or not AI is really ‘intelligent’, AI research and development continues to move the limit of machine capability.