I believe that we are now witnessing what Gartner calls the peak of inflated expectations for Machine Learning (ML), Deep Learning (DL) and Cognitive computing. The hype has caused confusion in the market and is leaving behind many disillusioned early adopters. The irony is that ML/DL has proved to be an extremely effective technology in many aspects of AI automation, such as speech and image analytics, yet the hype risks damaging the wider adoption of the technology.

    At the heart of the hype being propagated by many vendors, journalists, consultants and ‘thought leaders’ is the myth that Machine Learning is the ultimate free lunch: that you can rapidly deploy an AI solution which automatically learns decision automation tasks from historic or real-time data, thereby removing the need for any rules/decision coding effort. The reality is that in most intelligent automation applications ML must be deployed in conjunction with mature rules/decision automation technology. My aim in this article is to cut through the ML/DL hype, outline the building blocks of intelligent automation, review where DL/ML has proved very effective, and describe the basket of other AI technologies that are required to deliver successful automation solutions.

     

    Decision Automation Sensing

    This component provides the interface to both enterprise information systems and to the real physical world through devices and sensors. The important recent technology advances in this area are:

    • RPA/API: The maturity of Robotic Process Automation (RPA) and the increased availability of APIs and REST web services now make it easy for an intelligent automation solution to interface with enterprise information systems and consume their data and app services.
    • Live video: The availability of live-streaming video data.
    • IoT: The increasing availability of low-cost sensors on physical assets and the ease of connecting to the sensed data using Internet of Things (IoT) connectivity protocols.

     

    Decision Automation Perception

    Perception is the process of interpreting the unstructured data that is captured through Sensing. This covers documents, videos and speech data. There have been major advances in AI perception, and Deep Learning has been a real success story here, playing a key role in them.

    • Image / video processing. Deep Learning has now made it easy to analyse and identify patterns in images and videos (e.g. face recognition, analysing medical scan images) and can exceed human abilities in certain domains of pattern recognition. Cognitive web services are readily available from IBM, Microsoft and others and can easily be consumed by intelligent automation applications.
    • Speech recognition and translation. Again DL has been a real success story here and can achieve accuracies of more than 95%, as can be seen from the success of Amazon Echo, Siri, Google Home and Skype Translator.

    Natural Language Understanding & Processing (NLU/NLP) is an area of Perception where there has been both hype and some success stories. Successes include the ability to accurately classify/categorise documents (spam filters are a good example), to measure the sentiment of a document, to measure similarity between documents, and to parse structured data out of documents. However, true Natural Language Understanding remains beyond the reach of ML/DL. Take the simple example of a one-page document describing a trouble-shooting procedure for a piece of equipment. A human user can easily interpret and use this document, but there is no Machine Learning technology that can convert it into an automated intelligent task.
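
    As a rough illustration of the document-similarity capability listed above, here is a minimal Python sketch using a bag-of-words cosine measure. It is an assumption made purely for illustration: the function names are hypothetical, and real systems typically use richer text representations than raw word counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic words in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(doc_a), bag_of_words(doc_b)
    # Only words present in both documents contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

    Documents that share no words score 0.0 and identical documents score (approximately) 1.0. Note that this measures word overlap only; it says nothing about whether the procedure described in a document has been understood.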

    Finally it is worth noting that Cognitive Services typically provide easily accessible DL algorithms that have already been trained on large volumes of data for use in image/speech/text processing.

    Decision Automation Reasoning

    Reasoning is the engine room that enables the intelligent automation system to make decisions, solve problems and take actions by processing the sensed and perceived information obtained from the environment. Reasoning covers monitoring the environment for events, looking for problems / anomalies, diagnosing and trouble-shooting problems, identifying performance improvement opportunities, checking for regulatory compliance, predicting events, configuration, planning etc.

    For most intelligent automation solutions, reasoning is the area currently least suited to Deep Learning / Machine Learning, contrary to the media / vendor hype. This is because the ‘Intelligence’ required for reasoning comes from 4 sources:

    i- Common-sense knowledge about the world, the domain of so-called General AI. This is the most difficult kind of knowledge to capture/automate using any currently available machine learning technology.

    ii- Vertical domain-specific expertise gained through studying textbooks, instruction manuals and regulations, and through following advice/guidelines/training from other domain experts. Such expertise is best suited to automation using decision / rules automation technology; it is currently impossible for DL to understand/process it and turn it into decision models.

    iii- Vertical domain-specific expertise gained through past experience of real-world events. Past experience is represented by historic data which, if sufficient in volume and event coverage, can be used by Machine Learning to learn decision and predictive models. In some domains, such as retailing and high-volume discrete manufacturing, large volumes of data are available on events / problems / outcomes, and ML can therefore be used effectively to learn decision models. However, in other domains, such as complex manufacturing / equipment, there may be only a small amount of data relating to historic major problems. There, Machine Learning cannot be used to model decisions, and decision automation is required to automate human expertise.

    iv- Problem-solving expertise. Human experts have strategies for solving problems using the available evidence/information. Such strategies can be represented as structured Decision Flows that orchestrate the invocation of a number of sub-decision tasks, and/or as a set of next-best-action rules. It is very rare in real-world domains for data to be available covering a large number of strategy scenarios, and therefore ML cannot be used to learn problem-solving strategies. Google DeepMind’s Go-playing AI worked very well in a very large but finite / bounded game domain, where DL can simulate any number of strategy scenarios in order to learn problem-solving strategies. So whilst DeepMind’s result is a very impressive achievement, it has limited application in non-finite real-world domains that cannot be fully simulated. Again, decision automation can be used to automate problem-solving strategies.
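
    To make the idea of a Decision Flow that orchestrates sub-decisions and next-best-action rules more concrete, here is a minimal Python sketch. The equipment readings, thresholds, rule names and actions are all hypothetical and chosen purely for illustration; this is not a description of any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Facts = Dict[str, float]


@dataclass
class Rule:
    """A next-best-action rule: if the condition holds, recommend the action."""
    name: str
    condition: Callable[[Facts], bool]
    action: str


# Hypothetical rules for a piece of rotating equipment, in priority order.
RULES: List[Rule] = [
    Rule("overheat", lambda f: f["temperature_c"] > 90,
         "shut down and inspect cooling"),
    Rule("vibration", lambda f: f["vibration_mm_s"] > 7,
         "schedule bearing check"),
]

REQUIRED_READINGS = ("temperature_c", "vibration_mm_s")


def next_best_action(facts: Facts, rules: List[Rule]) -> Optional[str]:
    """Sub-decision: return the action of the first rule that fires, if any."""
    for rule in rules:
        if rule.condition(facts):
            return rule.action
    return None


def decision_flow(facts: Facts) -> str:
    """Orchestrate the sub-decisions in a fixed, expert-defined order."""
    # Step 1: check that the evidence needed by the rules is available.
    missing = [name for name in REQUIRED_READINGS if name not in facts]
    if missing:
        return "request readings: " + ", ".join(missing)
    # Step 2: apply the next-best-action rules to the complete evidence.
    return next_best_action(facts, RULES) or "no action needed"
```

    For example, `decision_flow({"temperature_c": 95, "vibration_mm_s": 2})` recommends shutting down and inspecting the cooling. The point of the sketch is that the strategy is written down by an expert rather than learned from historic data.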

     

    Human Interface

    Recent major advances in speech recognition and natural language processing, powered by Machine Learning, have created the potential for a conversational / chat-based computer-human interface. However, in order to achieve truly conversational interactions between the automation system and the human user, we need to overcome 2 technological challenges:

    i- Converting human speech into text and vice versa.

    ii- The ability of the automation system to hold a focussed, goal-driven, back-and-forth conversation with the human user.

    The first challenge has already been overcome, as recent advances in DL have enabled many speech recognition technologies, such as the Google Speech API, to achieve accuracies exceeding 95%. The conversion of text to speech is of course comparatively easy to achieve. So the current state of the art allows voice interactions to be as viable as text chat interactions.

    The second challenge is where most attempts at a conversational interface fail. There are two sub-challenges here. Firstly, the Reasoning module of the automation system has to ‘understand’ the intent or meaning of the human text response. Secondly, the Reasoning module has to maintain an intelligent thread for a goal/purpose-driven back-and-forth interaction with the human. For these reasons, conversational chat-bots (such as Amazon Echo) have so far been limited to users making simple queries to find relevant content or placing simple orders, with the automation system responding with intelligent results. Detailed conversational consultations and the ordering of complex products / services remain difficult to achieve.

    I believe that detailed / deep conversational interactions have been difficult to achieve because most chat-bot platforms are focussing on the wrong Reasoning technologies. Generally speaking, there are 2 approaches to building conversational chat-bots:

    i- Scripted conversations: Conversations are driven by scripts, either coded in if-then-else script syntax or drawn as simple decision trees. Text responses from users are parsed for keywords, which supply the answers that drive the scripts.

    ii- Machine-learning-driven conversations: This is the holy grail of automating computer-human interactions, whereby the machine learning algorithm learns from its interactions with human users to improve its conversational abilities, both in understanding human requests / answers and in deciding how best to respond. This is such a complex machine learning task that it is likely to remain limited to conversations that deliver content to users in response to simple queries. Don’t expect a machine learning algorithm to learn any time soon, purely from interacting with users, how to deliver best-practice regulated financial advice or how to diagnose and trouble-shoot complex machinery!
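
    The scripted approach (i) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the script nodes, prompts and keywords below are hypothetical, and real chat-bot platforms use their own script formats.

```python
import re
from typing import Optional

# Hypothetical script drawn as a simple decision tree: each node has a prompt
# and a mapping from keyword to the next node.
SCRIPT = {
    "start": {
        "prompt": "Do you want to check an order or make a return?",
        "keywords": {"order": "order", "return": "return", "refund": "return"},
    },
    "order": {"prompt": "Please give me your order number.", "keywords": {}},
    "return": {"prompt": "Which item would you like to return?", "keywords": {}},
}


def next_node(node: str, user_text: str) -> Optional[str]:
    """Parse the user's text for keywords and return the next script node.

    Returns None when no keyword is found, in which case the caller would
    typically repeat the current prompt.
    """
    words = set(re.findall(r"[a-z]+", user_text.lower()))
    for keyword, target in SCRIPT[node]["keywords"].items():
        if keyword in words:
            return target
    return None
```

    Here `next_node("start", "I'd like a refund please")` moves the script to the `"return"` node, while a reply containing none of the keywords leaves the conversation where it is. The brittleness of keyword matching across a whole open-ended conversation is one reason scripted bots tend to stay shallow.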

    At XpertRule, we have adopted a different approach to developing complex conversational chat-bots. Rather than attempting to reinvent the wheel from scratch, we have built upon an established technology! Expert System technology, which is based on decision / rules automation, is very good at generating detailed question-and-answer-based consultations. The user can select the initial topic of consultation, but thereafter the thread of questioning is maintained by the decision engine. By adding a ‘chat layer’ to the decision engine we have managed to achieve impressively complex conversational chat-bots. The ‘chat layer’ can apply state-of-the-art text parsing and text similarity measures to derive, from the chat text, the answers to the expert system’s questions. Because the expert system is driving the conversation, the scope of ‘chat ability’ for any individual question is highly localised and focused, and we can therefore achieve excellent domain-specific parsing / understanding of natural language text for any question. We have successfully used this approach to develop fully voice-driven complex chat-bots for financial advice, local government building regulation advice, diagnostics and trouble-shooting of complex powder processing equipment, and others.
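
    The general shape of a ‘chat layer’ sitting on top of a decision engine can be sketched as follows. This is only an illustrative Python sketch under stated assumptions, not XpertRule’s implementation: the questions, example phrasings, advice strings, the word-overlap (Jaccard) similarity measure and the 0.5 threshold are all hypothetical stand-ins.

```python
import re
from typing import Dict, Iterable, Optional, Set

# Hypothetical decision tree. The engine owns the thread of questioning;
# "next" maps each allowed answer to another question or to final advice.
QUESTIONS: Dict[str, dict] = {
    "motor": {
        "text": "Is the motor running?",
        "answers": {
            "yes": ["yes", "it is running", "the motor is on"],
            "no": ["no", "it is not running", "the motor is off",
                   "it has stopped"],
        },
        "next": {"yes": "flow", "no": "Advice: check the power supply."},
    },
    "flow": {
        "text": "Is powder flowing from the outlet?",
        "answers": {
            "yes": ["yes", "powder is flowing"],
            "no": ["no", "nothing is coming out", "the outlet is blocked"],
        },
        "next": {"yes": "Advice: no fault found.",
                 "no": "Advice: clear the outlet blockage."},
    },
}


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_answer(question: dict, chat_text: str,
                 threshold: float = 0.5) -> Optional[str]:
    """Chat layer: map free text to one of this question's allowed answers."""
    user = tokens(chat_text)
    best, best_score = None, 0.0
    for answer, phrasings in question["answers"].items():
        for phrasing in phrasings:
            score = jaccard(user, tokens(phrasing))
            if score > best_score:
                best, best_score = answer, score
    return best if best_score >= threshold else None


def consult(replies: Iterable[str], start: str = "motor") -> str:
    """Decision engine: ask questions until a piece of advice is reached."""
    node = start
    pending = iter(replies)
    while node in QUESTIONS:
        question = QUESTIONS[node]
        reply = next(pending, None)
        if reply is None:
            return "Waiting for: " + question["text"]
        answer = match_answer(question, reply)
        if answer is None:
            continue  # not understood: the same question is asked again
        node = question["next"][answer]
    return node
```

    Because each reply is only ever matched against the handful of answers valid for the current question, the language problem stays small and local, which is the design point made above. For instance, `consult(["Yes, it is running", "nothing is coming out"])` ends at the advice to clear the outlet blockage.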