Since the late 1960s and the groundbreaking movie “2001: A Space Odyssey,” the idea of learning machines has slowly crept into our consciousness, almost always in an ominous context: machines are getting too smart for our own good.
Fictional smart machines, from HAL in the 1960s to The Gunslinger in the 1970s, The Terminator in the 1980s, and into the new century with The Matrix, all play on the same theme: one day we humans will invent an artificial intelligence that will replace us as the dominant life on the planet.
These stories always include one very crucial element: machine learning, or more to the point, machine self-learning. The idea that machines can teach themselves has fascinated and terrified audiences for more than 50 years. Entertaining (if far-fetched) as that discussion is, there are far more relevant and highly impactful business applications that require at least a basic understanding of artificial intelligence.
AI: The Basics
In the simplest definition, Artificial Intelligence is any machine action or perception that displays characteristics we associate with natural intelligence. Most often this takes the form of pattern matching (categorizing things based on similar things), problem-solving from incomplete or fuzzy data, and learning to solve a problem without an explicitly programmed answer.
Artificial Intelligence and Machine Learning
Over the last ten years or so the trend has been to interchange “AI” with “machine learning” and to assume that AI is all about the learning process. That is only partially true.
AI refers to a very broad set of algorithms and heuristics (we’ll get to those in a moment). This includes everything “smart,” from facial recognition to expert systems to neural networks and, of course, machine learning. In other words, machine learning is the subset of AI concerned with how a machine gets smart (or smarter), as opposed to it actually being smart.
For example, at some point you learned addition (human learning). At a later point, you used addition. Both are “human intelligence” in action, but only the former is “learning,” while the latter is intelligent application.
Human learning is a subset of human intelligence, just as machine learning is a subset of artificial intelligence.
Algorithms and Heuristics
In the simplest terms, AI is a process that mimics human decision-making and/or pattern matching. Like natural intelligence, AI sacrifices some measure of accuracy in exchange for speed and broader applicability. Processes that make this trade-off are known as heuristics.
What we think of as a typical algorithm is a process guaranteed to produce the correct (and presumably optimal) answer from a given input. A heuristic, in contrast, is a process with no such guarantee. A heuristic looks for the “good enough” answer and, depending on its parameters, can even produce different results from the same or similar input.
So why use a heuristic? Speed and generalization. Heuristics give up a little accuracy for a lot of gains in speed, efficiency, and generalizability. Heuristics can even find answers we don’t know exist. This is also the basis for natural intelligence.
For example, the simple process of addition is an algorithm. We have math laws that tell us how to perform addition and they always work exactly the same. Two plus two is always four.
Now consider facial recognition. If you see a friend at a distance partially turned away from you, 99% of the time you can correctly identify your friend, but sometimes you get it wrong and maybe it’s just someone that looks like your friend. Your brain is using heuristics.
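The contrast can be sketched in a few lines of Python. Addition is an algorithm with exactly one guaranteed answer; matching a half-glimpsed name is a heuristic that returns the best available guess, which may be wrong. (The names and the use of string similarity as a stand-in for face similarity are illustrative assumptions, not anything from this post.)

```python
from difflib import SequenceMatcher

def add(a, b):
    # Algorithm: always produces the one correct answer.
    return a + b

def guess_friend(glimpsed_name, known_friends):
    # Heuristic: return the closest match. Usually right, but it can
    # misfire when two candidates look very similar.
    return max(
        known_friends,
        key=lambda name: SequenceMatcher(None, glimpsed_name, name).ratio(),
    )

print(add(2, 2))                                 # always 4
print(guess_friend("Jen", ["Jenna", "Marcus"]))  # best guess: Jenna
```

The heuristic never promises correctness; it promises a fast, usually useful answer, which is exactly the trade-off described above.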
Natural intelligence is all about this trade-off. As we go about our day, we can’t afford the time to be perfectly accurate for every data point and decision. Our brains take in millions of data points per second and we can’t afford to process them all perfectly. We take what we need, make a good enough choice, and move on.
Artificial intelligence uses the same concept. Being correct 99.8% of the time, thousands of times a day, can be more important and more useful than being correct 100% of the time but only twice a day.
In addition, heuristics offer the ability to generalize. If you give an algorithm something it wasn’t designed to handle, it doesn’t work. It may give an answer, but it is likely wrong.
Heuristics are not so rigid. They are designed to be able to handle fuzzy data and return the best answer that is likely to be at least useful, if not perfectly correct.
Going back to the example of recognizing your friend from across a room: once you see it’s not your friend, but this person looks strikingly similar, you might say “You two could be sisters!” Maybe it turns out they are actually sisters. The facial recognition heuristic in your brain returned useful information based on unexpected input and led you to a new conclusion that was also useful.
It doesn’t take much to see how important this is for data science and business applications. Imagine predicting business trends or online sales potential second by second and being correct 99.8% of the time. Now imagine being able to take action on those predictions including being able to handle new situations on the fly without reprogramming the entire system. This is what AI offers the business world (not to mention science, medicine, computing, etc.).
Much of the focus today is on machine learning: the process by which a machine gets smart (or smarter), not the process by which a machine uses that intelligence.
Consider this e-commerce application: while shoppers are browsing, I want to predict an individual customer’s buying potential so I can influence it.
Without machine learning, I could try to come up with every conceivable customer trait and action for every product, and then program all of that into a complex proprietary algorithm. What is the likelihood that I get all of that correct, account for every detail, stay correct 99.8% of the time, and that the algorithm keeps working well in an ever-changing business world?
Now imagine instead of trying to program an algorithm explicitly to tell us what we want to know, we instead use machine learning techniques and let the machine learn about our problem.
In other words, instead of programming the answers ahead of time, we present the machine with data about our problem, we define the solution space (the answer we think we’re looking for), and let the machine draw its own conclusions. In this way, the machine learns much like a human would, through observation.
Take our e-commerce example again. We can gather hundreds of data points per second over the course of days, weeks, or months. All the information we need to predict sales is in that data somewhere, it’s just hard to see and even harder to process (for a human).
Now we give that data to a machine learning algorithm and let it process the data and draw conclusions based on what we tell it we need to know. Once this learning is complete, the machine is ready to apply what it learned.
When a customer comes to the site, the machine can look at the current data, make inferences based on historical data in a matter of milliseconds, and produce a sales prediction and recommendation. Not only is that prediction very likely to be correct, but if never-before-seen information is presented, our machine can very likely extrapolate from the new data and still come up with the right answer, or at least a useful one.
The process of learning described above is known as “guided” or “supervised” learning: for the machine to get smart, a human presents it with data to learn from.
The key point is that a human collects, or at least defines, the data to be used for learning, and tells the machine how to use that data in learning. This can create a bias. My machine is only as smart as the data I choose to give it, which can be a good or bad thing, depending on the problem and the person gathering the data.
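A minimal sketch of supervised learning in Python makes the point concrete. The shopper features (pages viewed, minutes on site) and labels here are invented for illustration, and the “model” is a deliberately tiny nearest-neighbour rule; a real system would use far more data and a proper learning algorithm:

```python
import math

# Human-labeled training data: (pages_viewed, minutes_on_site) -> outcome.
# These numbers are made up for illustration; the human's choice of
# examples is exactly where bias can creep in.
training_data = [
    ((12, 9.0), "buy"),
    ((10, 7.5), "buy"),
    ((2, 1.0), "browse"),
    ((3, 2.5), "browse"),
]

def predict(shopper):
    # 1-nearest-neighbour: label the new shopper like the most similar
    # shopper already seen. The "learning" is simply memorizing the
    # labeled examples a human supplied.
    _, label = min(
        training_data,
        key=lambda example: math.dist(shopper, example[0]),
    )
    return label

print(predict((11, 8.0)))  # near the "buy" examples
print(predict((1, 0.5)))   # near the "browse" examples
```

Notice that the machine is only as smart as the labeled examples it was given: change the training data and the predictions change with it.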
Now consider a case where we let the machine make decisions by itself and learn by analyzing the feedback and impact of those decisions, still according to parameters we give it. This trial-and-error approach is known as “reinforcement” learning (it is often grouped with “unsupervised” learning, in which the machine finds structure in data without human-labeled answers). The idea is simple and not unlike how humans learn by trial and error. Our smart machine starts out dumb, takes an action, observes the results of that action, assesses the outcome, adjusts its “thinking,” and tries again, each time getting a little smarter.
In our e-commerce example, the computer would initially try to categorize shoppers in some fashion and would likely be wrong most of the time. But occasionally it would get it right, even if only by luck. The learning heuristic then looks at the input data, the prediction made, and the outcome, and adjusts itself slowly over time to become more and more correct.
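That trial-and-error loop can be sketched as a simple “epsilon-greedy” learner in Python. The three hypothetical sales pitches and their true conversion rates are invented for illustration; the point is only that the machine’s estimates, which start out completely wrong, improve from feedback alone:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Three hypothetical pitches with conversion rates the machine cannot see.
true_rates = {"discount": 0.30, "free_shipping": 0.55, "bundle": 0.10}
estimates = {action: 0.0 for action in true_rates}  # starts out "dumb"
counts = {action: 0 for action in true_rates}

for trial in range(5000):
    # Mostly exploit the best-looking pitch, but sometimes explore.
    if random.random() < 0.1:
        action = random.choice(list(true_rates))
    else:
        action = max(estimates, key=estimates.get)
    # Observe the outcome: did this shopper convert?
    reward = 1 if random.random() < true_rates[action] else 0
    # Adjust our estimate a little after every outcome.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(estimates, key=estimates.get))  # settles on "free_shipping"
```

Each pass through the loop is exactly the cycle described above: act, observe, assess, adjust, and try again, getting a little smarter each time.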
While self-learning sounds better, it isn’t necessarily. It all depends on your problem, your learning process, how data is gathered, what data is available, and many other factors, all of which humans still need to decide.
At Granify, we leverage machine learning to help many of the most successful large retailers in the world optimize the shopper experience. In 2018, we optimized over 1.1 billion shoppers and in 2019 we’re on pace to double that. We’d love to help you too.
Editor's Note: The original post was updated in 2019 to provide more concise and up-to-date material.