Cognitive computing is all the rage these days. But what is it, really? I’ve been thinking about it quite a bit lately, and I believe I have come to a few novel conclusions.
Wikipedia has a nice long article about Cognition. It expansively covers a great many things that I would agree are “cognitive”, but not (yet) “cognitive computing”. I’m interested in writing cognitive software, not in constructing a full, artificially intelligent “faux human”. So I’ll focus only on cognitive computing.
Rob High, CTO of IBM’s Watson Group, defines cognitive computing as four “E”s.
- Cognitive systems learn their behavior through education;
- they support forms of expression that are more natural for human interaction;
- their primary value is their expertise;
- they continue to evolve as they experience new information, new scenarios, and new responses;
- and they do so at enormous scale.
I agree with education and evolve, although I see these two as similar concepts. To make these ideas fit my somewhat artificial classification system below, I rename this idea to adaptive.
However, I disagree with limiting the definition to human expression. Many processes that I believe require cognitive skills involve signals that humans cannot naturally interpret. Dolphin and bat echolocation are good examples: they are a kind of “seeing”, yet humans can’t do it. An application that monitors network communication into and out of an organization and correctly identifies data leakage gets my vote for “cognitive”, even though humans can’t do that either.
Ambiguity is a better criterion than human expression.
Many human expressions are difficult to interpret because they are ambiguous. I offer the following two examples.
- Natural language is highly ambiguous. The classic sentence “Time flies like an arrow; fruit flies like a banana” has many possible interpretations. Sentences can have ambiguous parses (is “time” a noun, or an adjective modifying “flies”? is “flies” a verb or a noun?). Individual words can be ambiguous, too; choosing among their senses is the task of word sense disambiguation (WSD). Is “bass” a fish or a kind of musical instrument?
- Human emotions are ambiguous. Interpreting them requires reading facial expressions, body language, sarcasm, and so on. And people often disagree on the proper interpretation of a person’s emotion: “Is Bob angry at me?” “No, you know how he is. He was just making a joke.”
To be more precise, an input is ambiguous when there are multiple output interpretations consistent with that input. The goal is to determine which output interpretation(s) are, in some sense, most appropriate. Many elements (surrounding context, background knowledge, common sense, etc.) help decide which interpretations are most appropriate.
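To make this concrete, here is a toy sketch of context-driven disambiguation for the “bass” example above. The clue-word lists and scoring are invented for illustration; this is not a real WSD system.

```python
# Toy sketch of ambiguity resolution: the ambiguous input "bass" has
# multiple candidate interpretations, and the surrounding context selects
# the most appropriate one(s). The clue-word lists are invented for
# illustration, not drawn from any real lexical resource.

SENSES = {
    "bass": {
        "fish":       {"river", "lake", "catch", "fishing"},
        "instrument": {"play", "band", "guitar", "music"},
    }
}

def disambiguate(word, context_words):
    """Score each sense by its overlap with the surrounding context."""
    scores = {
        sense: len(clues & set(context_words))
        for sense, clues in SENSES[word].items()
    }
    # Return every sense tied for the best score: with no helpful
    # context, the ambiguity may not resolve to a single interpretation.
    best = max(scores.values())
    return sorted(s for s, v in scores.items() if v == best)

print(disambiguate("bass", ["he", "will", "play", "bass", "in", "a", "band"]))
```

Note that with an empty context both senses tie, and both are returned: the surrounding context is doing all of the resolution work.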
Pushing this idea further, we should stop framing discussions as structured data vs. unstructured data and start framing them as unambiguous data vs. ambiguous data.
From: structured data vs. unstructured data
To: unambiguous data vs. ambiguous data
There are many cases where structured vs. unstructured misses the point. A row of structured data is easy to process not because it is physically separated into fields; it is easy to process because there is only one way to interpret that row. Structured data can even be ambiguous, in which case we need to “clean the data” (remove the ambiguity). Conversely, Java code has grammatical structure just as natural language does, yet compilers are not “cognitive”, because the Java programming language is unambiguous.
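As a hypothetical illustration of ambiguous structured data, consider a date field like “01/02/03”: it is perfectly structured, yet it admits several interpretations depending on the convention assumed.

```python
from datetime import date

# Hypothetical illustration: a perfectly "structured" field that is still
# ambiguous. Without knowing the convention, "01/02/03" admits several
# interpretations; "cleaning the data" means removing this ambiguity.

def interpretations(field):
    a, b, c = (int(part) for part in field.split("/"))
    conventions = {
        "MM/DD/YY": (2000 + c, a, b),
        "DD/MM/YY": (2000 + c, b, a),
        "YY/MM/DD": (2000 + a, b, c),
    }
    valid = {}
    for name, (year, month, day) in conventions.items():
        try:
            valid[name] = date(year, month, day).isoformat()
        except ValueError:
            pass  # not a legal calendar date under this convention
    return valid

print(interpretations("01/02/03"))  # three consistent interpretations
```

A field like “25/12/03”, by contrast, rules out the MM/DD/YY reading on its own; the data itself carries less ambiguity.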
The fundamental problem is to accept an ambiguous input plus its available context, and search through the space of all possible interpretations for the most appropriate output(s). That is, a cognitive process is a search process.
In the Programmable Era, programmers were able to resolve low levels of ambiguity by the seat of their pants, either because there were few possible interpretations or because interpretation resolution could be “factored” into a sequence of more or less independent resolution steps. As the amount of ambiguity increases, however, seat-of-the-pants resolution breaks down. In the Cognitive Era, programmers need Ambiguity Resolution Frameworks (ARFs) to help them process large amounts of ambiguity. Machine learning is one kind of ARF: it takes as input multiple features (each of which can be understood by the programmer) and combines them all to narrow the space down to a few interpretations. (Note that I’m not requiring ARFs to perfectly resolve all ambiguity to a single interpretation.) The Cognitive Era is largely populated by cases where imperfect resolution of large interpretation spaces is an unavoidable consequence of the input’s irreducible ambiguity.
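A minimal sketch of what such an ARF might look like, applied to two parses of “time flies like an arrow”. The features, candidate parses, and weights here are all invented for illustration; a machine-learning ARF would learn the weights from data.

```python
# Minimal sketch of an Ambiguity Resolution Framework (ARF) in the sense
# described above: each feature is individually understandable by the
# programmer, and the framework combines them to narrow (not necessarily
# eliminate) the interpretation space. Everything below is illustrative.

def rank_interpretations(candidates, features, weights):
    """Score every candidate interpretation by a weighted feature sum."""
    scored = [
        (sum(w * f(cand) for f, w in zip(features, weights)), cand)
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Two hypothetical parses of "time flies like an arrow":
candidates = [
    {"name": "time-passes-quickly",        "verb": "flies", "subject": "time"},
    {"name": "command-to-time-the-flies",  "verb": "time",  "subject": None},
]
# Each feature captures one understandable piece of evidence:
features = [
    lambda parse: 1.0 if parse["verb"] == "flies" else 0.0,   # plausible verb
    lambda parse: 1.0 if parse["subject"] == "time" else 0.0,  # plausible subject
]
print(rank_interpretations(candidates, features, weights=[0.6, 0.4]))
```

The output is a ranked list rather than a single answer, matching the point above that an ARF need not resolve all ambiguity down to one interpretation.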
I also disagree that expertise is a defining criterion for cognitive computing. A better, more inclusive, criterion is action.
Only humans can accept and interpret expertise. Requiring a cognitive system to output expertise necessarily forces a human “into the loop”. While appropriate in some cases, it is wrong to require a human in the loop of every cognitive system. Rather, we should encourage the development of autonomous systems that are able to act on their own. The distinction between expertise and action is not completely black and white: Watson’s Jeopardy! system did both, ringing a buzzer (action) and providing a response (expertise).
Many years ago, IBM defined the “autonomic MAPE loop” consisting of four steps: M: monitor (sense), A: analyze, P: plan, and E: execute (act, via effectors). Not all cognitive systems must contain a MAPE loop, but I see it as more inclusive than the 4 E’s above. Expertise is best characterized as the output of the Analyze step, leaving a human to perform the Plan step. The Observe-Interpret-Evaluate-Decide loop is similar to the MAPE loop, with Observe=M, Interpret & Evaluate=A, and Decide=P & E. Both loops end with an action.
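A minimal sketch of a MAPE loop, using an invented thermostat scenario; all names and thresholds are illustrative, not from any IBM specification.

```python
# Minimal sketch of the MAPE loop described above (Monitor, Analyze, Plan,
# Execute), as plain functions over a toy thermostat scenario. Names and
# thresholds are invented for illustration.

def monitor(sensor):                 # M: sense the environment
    return sensor()

def analyze(reading):                # A: interpret; this output is the "expertise"
    return "too_hot" if reading > 75 else "ok"

def plan(assessment):                # P: decide what to do about it
    return "cool" if assessment == "too_hot" else "idle"

def execute(action, actuator):       # E: act autonomously, no human in the loop
    actuator(action)

log = []
execute(plan(analyze(monitor(lambda: 80))), log.append)
print(log)  # the loop ends with an action
```

A system that stops after `analyze` hands its expertise to a human, who must then do the planning; a system that runs the whole loop acts on its own.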
So instead of the 4 E’s, I suggest we define cognitive computing by the 3 A’s: Adaptive, Ambiguous, and Action.