Next IT software engineers present paper on novel machine-learning tactics

Whether you’re reentering the departure city you use for nearly every flight you take, or once again picking which of the many people named Cathy in your contacts list you want to text, you’ve no doubt endured time-wasting, repetitive requests for information when using your PC, tablet or smartphone.

What if your smartphone could save you time by learning your preferences for a wide variety of inputs, automatically applying them the next time they’re called for? Next IT software engineers Ian Beaver and Joe Dumoulin have discovered a novel way to repurpose a Google-developed technology, allowing them to efficiently extend this data-intensive personalization technique to the customers of the world’s largest companies.

Learning from the past

A leading technique for programming computers to adapt “memories” of past actions to similar situations is called Case Based Reasoning, or CBR. CBR-enabled systems are designed to learn how to accomplish a task by saving information that’s specific enough to solve it, but that’s also generic enough to apply to situations that are similar. Constructing such a system begins with programming it to recognize which variables are considered to be the identifying features of a given task, and then testing those assumptions.

Let’s say that an airline wants to streamline its customers’ online ticket-purchase experience. Some aspects of a frequent traveler’s airline-ticket purchases (time of purchase, destination, etc.) are likely to differ from one trip to the next. A CBR-enabled system is designed to ignore these differences while detecting that the traveler’s departure city and preference for an aisle seat are nearly always the same.

Once these significant variables are identified, the system saves “cases,” or snapshots of what happens in the conversation before and after these pieces of information are called for. (In the CBR systems that Ian and Joe develop, information is gathered over the course of a dialog between the user and an intelligent virtual assistant.) If there are enough cases where the user chooses “Seattle” as a departure city, the system asks the user if they’d like to use it automatically in the future. Once verified by the user, “departure city = Seattle” becomes a rule that is applied in each subsequent interaction until the user decides to modify their preferences.

In the system developed by Next IT’s Beaver and Dumoulin, users can even control how many repetitions it takes for a custom preference to be validated, and a statistical element is also introduced to account for one-off irregularities.
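
To make the idea concrete, here is a minimal Python sketch of how a repetition threshold and a simple statistical guard might work together. It is an illustration only, not the system Ian and Joe built: the function name suggest_preference, the parameters min_repetitions and min_share, and the sample history are all assumptions made for this example.

```python
from collections import Counter


def suggest_preference(cases, slot, min_repetitions=3, min_share=0.75):
    """Return a value worth proposing as a default for `slot`, or None.

    cases: saved interactions, one dict per case (hypothetical format).
    min_repetitions: assumed user-controlled number of repeats required.
    min_share: assumed statistical guard; the value must also account for
        this fraction of the cases, so a one-off choice does not block it.
    """
    values = [case[slot] for case in cases if slot in case]
    if not values:
        return None
    # Most frequent value for this slot and how often it occurred.
    value, count = Counter(values).most_common(1)[0]
    if count >= min_repetitions and count / len(values) >= min_share:
        return value
    return None


# Hypothetical booking history for one traveler.
history = [
    {"departure_city": "Seattle", "destination": "Denver"},
    {"departure_city": "Seattle", "destination": "Boston"},
    {"departure_city": "Seattle", "destination": "Austin"},
    {"departure_city": "Portland", "destination": "Chicago"},
    {"departure_city": "Seattle", "destination": "Miami"},
]

print(suggest_preference(history, "departure_city"))
print(suggest_preference(history, "destination"))
```

In this sketch the single Portland booking does not prevent “Seattle” from being proposed, while the destination, which changes on every trip, never qualifies. The proposed value would still need to be confirmed by the user before it becomes a rule.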

Dealing with data overload

For CBR to be effective, every task-specific user interaction must be stored, and for an enterprise client like an airline, this means the system must cope with millions upon millions of cases and a high demand for service. (In the CBR community, dealing with even tens of thousands of cases in a short time frame is considered impressive.) To provide value to the user, analysis of these cases must be performed in near real-time, as a traveler would expect the system to remember their preferences the next time they book a flight. Using the techniques currently employed by the CBR community, this would be impossible.

Ian and Joe saw a solution in MapReduce, a technology introduced in a 2004 Google research paper but never before applied to the field of CBR.

Rather than relying on large, expensive servers and storage, MapReduce and its related technologies deal with large amounts of data through a technique called parallel, distributed processing: data is split equally over a large number of low-cost commodity servers, each of which performs the same tasks on its piece of the data simultaneously. The technology is also unique in that it allows for horizontal scaling (the ability to distribute both the data and the load so that performance remains constant as storage demands increase) simply by adding more servers.
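
The split-then-combine pattern described above can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not a real MapReduce cluster and not the implementation from the paper: the function names, the (user, slot, value) case format, and the sample partitions are hypothetical. Each inner list stands in for the slice of data held by one server.

```python
from collections import Counter
from functools import reduce


def map_partition(cases):
    """Map step: count (user, slot, value) triples within one partition."""
    counts = Counter()
    for user, slot, value in cases:
        counts[(user, slot, value)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_cases(partitions):
    # On a cluster, each partition would be mapped on its own server in
    # parallel; here the map calls simply run one after another.
    partials = [map_partition(partition) for partition in partitions]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical case data, already split across three "servers".
partitions = [
    [("u1", "departure_city", "Seattle"), ("u2", "seat", "aisle")],
    [("u1", "departure_city", "Seattle"), ("u1", "departure_city", "Portland")],
    [("u1", "departure_city", "Seattle"), ("u2", "seat", "aisle")],
]

totals = count_cases(partitions)
print(totals[("u1", "departure_city", "Seattle")])
```

Because every partition is processed by the same function and the partial results are merged afterward, adding more partitions (and more servers to hold them) does not change the logic, which is the property that makes horizontal scaling possible.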

In testing, the team found that their personalization service had more than enough speed to satisfy the real-time needs of a large enterprise client, and work is currently underway to make the technology available commercially.

International Conference on Case Based Reasoning

At the International Conference on Case Based Reasoning, which took place July 8 to 11 in Saratoga Springs, NY, a gathering of academics, usually skeptical of presenters from the business world, was impressed by a poster session presented by Ian. The session described how MapReduce technology, originally developed for free-form text processing and indexing, could be applied to the field of CBR.

When asked why Next IT would make its discoveries available to the world, Ian replied, “We benefitted from another company’s technology, and we believe in making our insights available so others can benefit from them. Plus, the paper is a great stamp in time.”

To learn more about the technology, read Ian and Joe’s paper, “Applying MapReduce to Learning User Preferences in Near Real-Time.”