Newsletter 9 - Navigating the AI Maze: A Simple Language Guide to AI, Machine Learning and Beyond
As I have mentioned in prior editions, I do a fair bit of advising for private equity firms, career aspirants, founders and business owners on the implications of digital technology for their businesses and sectors. One expression I have heard often, especially since the fall of last year, is that a given business will use "AI/ML" to fundamentally change the economic realities of its incumbent model. This way of thinking is a bit naive, because companies are business models first, not AI or technology first. We decide what our project is before selecting the tools; we don't select the tools first and then figure out what to build afterwards.
I have no quarrel with businesses throwing out "AI/ML", but I wasn't sure what "AI/ML" actually meant. I had read about Artificial Intelligence (AI) over the years and had an idea of what Machine Learning (ML) was. However, in many conversations, I couldn't discern which aspects of the technology would be utilized and how they would yield benefits. I assumed this was because I didn't understand the topic well enough, and having never worked in the space, I didn't want to mistake the "map for the territory".
The Stoics say, "If you wish to improve, be content to appear clueless or stupid in extraneous matters—don’t wish to seem knowledgeable. And if some regard you as important, distrust yourself."
Then, late last year, the global phenomenon of Generative AI (Gen AI) and Large Language Models (LLMs) was upon us, and again I didn't know what that meant. Clearly, "ChatGPT" portended a truly interesting future, but what exactly was an LLM, how did it fit within Generative AI, and what else might be possible?
In Newsletter 6, I worked to attain a common-language understanding of LLMs and Gen AI, which I then articulated to you. It's hard to keep up, but now I feel like I at least know what is being talked about in this space. I realize that people are conflating concepts, commingling fact and opinion and, as is our propensity as humans, purporting to be experts when the technology is really recent and only a very small group of people has hands-on experience building it. So I decided to summarize and explain, at the highest level and in common language, the terrain that constitutes AI and where ML fits in. I also decided to clarify ML itself and Deep Learning within it.
The Point: A simple English-language overview of AI and its components in FAQ form, and how Machine Learning fits in.
Let me start with this symbolic representation to show how we can traverse from broader Computer Science all the way down to Gen AI.
Computer Science (CS)
|
|---> Artificial Intelligence (AI)
|
|---> Types of AI Systems:
| |
| |---> Reactive Machines AI
| |---> Limited Memory AI
| |---> Theory Of Mind AI
| |---> Self-aware AI (theoretical)
|
|---> Evolutionary Stages of AI:
| |
| |---> Artificial Narrow Intelligence (ANI)
| |---> Artificial General Intelligence (AGI)
| |---> Artificial Super Intelligence (ASI) (theoretical)
|
|---> Specific Applications within AI (Non-ML):
| |
| |---> AI Robotics
| |---> Expert Systems
| |---> Fuzzy Logic
|
|---> Machine Learning (ML)
|
|---> Learning Techniques in ML:
| |
| |---> Supervised Learning
| |---> Unsupervised Learning
| |---> Semi-Supervised Learning
| |---> Reinforcement Learning
|
|---> Neural Networks (NN) / Artificial Neural Networks (ANN)
|
|---> Deep Learning (DL) (a specialized form of ANN)
|
|---> Large Language Models (LLMs) (e.g., ChatGPT)
|---> Other Generative AI (Gen AI) Applications
|---> Discriminative Models
|
|---> Specific Applications within ML:
|
|---> Natural Language Processing (NLP)
CS > AI
AI, in its simplest form, is a branch of Computer Science (CS) that deals with creating intelligent agents via software. These agents can reason, learn, and act autonomously. There are many formal definitions of AI, but this is how I describe it to myself. If you wanted to classify AI, the research on the subject would lead you into a catacomb of attributes, facets and nuance, which would be entirely unhelpful to a typical professional.
TL;DR: If you don't care about the theoretical classifications and want to delve straight into applied AI, scroll down to the third subsection below, focused on using AI to build outcomes. If you want to go straight to Machine Learning, scroll down to the AI > ML section below.
How do we think AI will evolve?
Considering AI's evolution, it broadly progresses through three stages: Artificial Narrow Intelligence (ANI), focusing on specific tasks; Artificial General Intelligence (AGI) (see Newsletter 6), surpassing human capabilities in most tasks; and Artificial Super Intelligence (ASI), a theoretical stage where AI's abilities exceed all human intelligence.
Artificial Narrow Intelligence - AI that does a predefined set of tasks really well. Most AI applications so far in human history fall into this category: Alexa, Siri, auto-complete, self-driving cars, chess programs that beat humans, Go programs that beat humans - fascinating achievements that continue to add value to human endeavor.
Artificial General Intelligence - highly autonomous systems that outperform humans at most economically valuable work. OpenAI's mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. I am not aware of any systems that meet this criterion yet. Here is a link to OpenAI's charter, so you can read it for yourself: https://openai.com/charter
Artificial Super Intelligence - the stage of Artificial Intelligence at which the capability of computers will surpass human beings. I am not going to worry about it, for now.
How do we classify the types of AI systems?
Now that we know the evolutionary stages, what "types" of AI systems can we have? This is primarily pedantic, but for completeness there are four at the highest level:
Reactive Machines AI - This is the most common type (please note this is typically not "ML", no matter what you are hearing in the marketplace). These systems operate solely on present data, considering only the current situation - like the software powering a decision system typically found in industry.
Limited Memory AI - can make informed and improved decisions by studying past data from its memory. It uses temporary memory to store past experiences and evaluate future actions, as in self-driving cars.
Theory Of Mind AI - as previously discussed in Newsletter 6, is cognitively aware of the emotional state of other entities including humans.
Self-aware AI - as the name suggests. It's science fiction at the moment. I don't worry about it.
What are ways we can use AI to build useful outcomes?
I find this the most useful way to classify AI and to then understand how ML fits in. There are applications and implications of each one of these techniques that we can internalize to be better informed as professionals in the context of current corporate jargon. Most importantly this understanding will help us discern when we are being subjected to facile inaccuracies or something more authentic and useful. Let's take six high level classifications that are in use in industry at the moment.
Machine Learning - systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data. This is what has most people excited and I will further elaborate on this in the essay.
Deep Learning - a type of machine learning where computers learn to think using structures modeled on the human brain. In Newsletter 6 I explained the layered architecture, where a "neuron" is modeled as a mathematical function and, when arranged in layers, the network can be very "deep". Any such "Neural Network" with three or more layers is called "Deep Learning". This is the type of ML that is showing the most promise, from Gen AI to autonomous mobility.
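To make "layers of neurons" concrete, here is a minimal sketch in plain Python: a tiny network with two inputs, one hidden layer, and an output layer. All the weights here are made up purely for illustration; a real network would learn them from data.

```python
import math

def sigmoid(x):
    # Squash any real number into (0, 1) -- a common "neuron" activation.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One layer: every neuron takes a weighted sum of all inputs,
    # adds a bias, and applies the activation function.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(inputs, network):
    # A "deep" network is just layers applied one after another:
    # each layer's output becomes the next layer's input.
    for weights, biases in network:
        inputs = layer(inputs, weights, biases)
    return inputs

# A made-up 2 -> 3 -> 1 network (arbitrary weights, not trained).
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output layer
]

print(forward([1.0, 2.0], network))  # a single value in (0, 1)
```

Stacking more of these layers is all that makes a network "deep"; the mechanics of each layer do not change.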
Natural Language Processing - this has been around for a while and was a key part of the initial work done by IBM Watson. It helps machines process and understand human language so that they can automatically perform repetitive tasks. Examples include machine translation, summarization, ticket classification, and spell check.
AI Robotics - Robots are old technology, separate and distinct from AI. However, in the context of using AI, robots are nearing the stage where self-learning software can power physical objects. We see examples in restaurants where repetitive food prep such as frying is conducted by robotic appendages powered by ML, which can use vision and thermal sensors to independently prepare fried foods. You've undoubtedly seen videos of the Boston Dynamics robots doing amazing things, all the while being slightly creepy.
Expert Systems - designed to emulate the decision-making ability of a human expert. Expert systems are typically rule-based and are built for specific, narrow tasks. They use a set of if-then rules to mimic the logic and reasoning of human experts in a particular field, such as medicine or engineering.
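The if-then idea above can be sketched in a few lines of Python. The "medical" rules here are invented purely for illustration; a real expert system would encode hundreds of rules elicited from human experts.

```python
# Each rule: if all conditions are known facts, conclude something new.
RULES = [
    ({"fever", "cough"}, "possible_flu"),
    ({"possible_flu", "shortness_of_breath"}, "refer_to_doctor"),
    ({"sneezing", "itchy_eyes"}, "possible_allergy"),
]

def infer(facts):
    # Forward chaining: keep firing rules until no new facts appear.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"fever", "cough", "shortness_of_breath"}))
```

Note that nothing here "learns": the system's intelligence is entirely hand-authored rules, which is exactly why expert systems are AI but not ML.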
Fuzzy Logic - Fuzzy logic is used to handle the concept of partial truth, where the truth value may range between completely true and completely false. For example, in binary logic, values are limited to two states: 0 (false) and 1 (true). Fuzzy logic can be used in environmental control systems, such as air conditioners and heaters, to determine output based on factors such as current temperature and target temperature.
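As a sketch of the temperature-control example, here is how a fuzzy controller might blend partial truths into a heater setting. The membership thresholds (10, 20, 30 degrees) are arbitrary choices for illustration.

```python
def cold(temp_c):
    # Membership in "cold": fully true at <= 10C, fully false at >= 20C,
    # and partially true in between -- the hallmark of fuzzy logic.
    if temp_c <= 10: return 1.0
    if temp_c >= 20: return 0.0
    return (20 - temp_c) / 10.0

def hot(temp_c):
    if temp_c >= 30: return 1.0
    if temp_c <= 20: return 0.0
    return (temp_c - 20) / 10.0

def heater_power(temp_c):
    # Blend the partial truths into a smooth output
    # instead of a binary on/off switch.
    return 100 * cold(temp_c) + 0 * hot(temp_c)

print(heater_power(15))  # 50.0 -- "half cold", so half power
```

Contrast this with binary logic, where 15 degrees would have to be classified as entirely cold or entirely not cold.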
And there are more...
As you can see ML is NOT synonymous with AI and there are many useful applications of AI that do not fit within the ML category. Within ML there are many applications using different techniques.
AI > ML
In AI, we have techniques that allow us to build agents that can learn without extensive programming to teach them. We broadly call this ML; the level of intervention or supervision creates the various types of ML methods.
Within ML, computer scientists have been inspired by the human brain and created approaches referred to as Neural Networks (NNs) or Artificial Neural Networks (ANNs). Even though other "classical ML" techniques exist, ANNs are of particular interest because they allow us to build something called Deep Learning (DL), which underpins Gen AI technology.
A quick word on "classical ML": in general, these techniques are dependent on human intervention. They need experts to identify and label features of the data in question; ANNs (defined below), less so.
In general, if there isn't a lot of data available for the machine to learn from, ML is typically not a good solution.
What are the various ways that an ML model learns?
Even though in Newsletter 6, I explained how Gen AI LLMs learn and respond, there are four high level ways ML models can learn. These are:
Supervised Learning - As the name suggests, this is a form of learning where humans have to intervene and ensure labeled data is available to the ML model. These data are then used to appropriately "fit" the model, without "overfitting" or "underfitting". A key benefit of this approach is that it is possible to measure performance (i.e., accuracy) during training to determine how well the model has learned from the data. It's very good for classification problems and is used extensively in spam filtering, image and speech recognition, recommendation systems, and fraud detection. It is highly likely that in your digital interactions you have encountered such learning models. Some key techniques in use are Naive Bayes, Linear Regression, Logistic Regression, Random Forest, etc.
Unsupervised Learning - This is what has the world excited, because it does not require labeling of the data or much human intervention; it is therefore referred to as "scalable AI". As explained in Newsletter 6, this kind of learning does self-discovery and classification of the data. These methods can reveal patterns that humans might easily miss due to the sheer abundance of data or bias in our thinking. They explore raw data with an unknown structure, discovering patterns and structures that data scientists would otherwise have no idea about, and can therefore do things that were previously not possible. However, as previously mentioned, the pitfalls are that the results may be unpredictable or difficult to understand, and it is difficult to measure accuracy or effectiveness due to the lack of predefined answers during training. In addition to Gen AI, these methods are very effective for developing cross-selling strategies, customer segmentation, and image and pattern recognition in vast unlabeled datasets. Some key algorithms used are Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Neural Networks, K-Means Clustering and Probabilistic Clustering.
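To see self-discovery in action, here is a minimal one-dimensional K-Means, one of the algorithms named above. The data points and starting centers are invented for illustration; no labels are given, and the algorithm finds the two groups on its own.

```python
# Six unlabeled points that happen to form two clumps.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]

def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans(data, centers=[0.0, 5.0])
print(sorted(centers))  # roughly [1.0, 9.1] -- two groups found, unsupervised
```

Note there is no accuracy score here: with no labels, we can only judge whether the discovered structure looks useful, which is exactly the measurement difficulty described above.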
Reinforcement Learning - This technique trains software to make decisions that achieve the most optimal results and maximize rewards. It does not necessarily rely upon training data; rather, it learns by trial and error. Reinforcement learning is particularly well suited to problems that involve a long-term versus short-term reward trade-off. Text summarization, question answering, machine translation, and predictive text are all NLP applications that can use reinforcement learning. In robotics, deep learning and reinforcement learning can be used to train robots to grasp various objects, even objects they have never encountered before. As an example, the IBM system that won Jeopardy relied upon RL. Techniques used to train reinforcement learning agents include Q-learning, policy gradient methods, and actor-critic methods.
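Here is a minimal sketch of tabular Q-learning, one of the techniques named above, on an invented five-state "corridor" world. The reward scheme and hyperparameters are arbitrary choices; the point is that the agent learns purely from trial and error, with no training data at all.

```python
import random

random.seed(0)

# A tiny corridor: states 0..4, actions -1 (left) or +1 (right).
# Reaching state 4 pays a reward of 1; everything else pays 0.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: nudge the estimate toward
        # reward + discounted best future value.
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy heads right toward the goal.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # the learned policy: go right (+1) in every state
```

The discount factor gamma is what encodes the long-term versus short-term trade-off: it makes a distant reward worth slightly less than an immediate one, so the agent learns the shortest path.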
Semi-Supervised Learning - Semi-supervised learning uses labeled data to ground predictions, and unlabeled data to learn the shape of the larger data distribution. Practitioners can achieve strong results with a fraction of the labeled data, and as a result can save time and money. Facebook has successfully applied semi-supervised learning to improve its speech recognition models. Reputedly, many search engines, including Google, apply SSL to their ranking components to better understand human language and the relevance of candidate search results to queries. SSL can make use of pretty much any supervised algorithm with some modifications - for example, merging the clustering and classification algorithms mentioned above.
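A sketch of one common semi-supervised recipe, self-training: fit on the few labeled points, confidently pseudo-label the unlabeled pool, then refit on everything. The tiny 1-D dataset and the nearest-centroid "model" are invented for illustration.

```python
# Only two labeled examples, plus a pool of unlabeled points.
labeled = [(1.0, 0), (9.0, 1)]
unlabeled = [1.2, 0.8, 1.1, 8.8, 9.2, 9.1]

def centroids(pairs):
    # "Training": the model is just the mean of each class.
    means = {}
    for c in (0, 1):
        pts = [x for x, y in pairs if y == c]
        means[c] = sum(pts) / len(pts)
    return means

def classify(x, means):
    # Predict the class whose center is nearest.
    return min(means, key=lambda c: abs(x - means[c]))

means = centroids(labeled)
# Pseudo-label the unlabeled pool with the model's own predictions...
pseudo = [(x, classify(x, means)) for x in unlabeled]
# ...then retrain on labeled + pseudo-labeled data together.
means = centroids(labeled + pseudo)

print(means)  # class centers refined using mostly unlabeled data
```

Two hand-labeled points plus six unlabeled ones produced a model fitted on eight, which is the whole economic appeal of SSL.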
ML > ANN > DL > Gen AI
Ok, back to ANNs: imagine them as horizontal layers of neurons stacked on top of each other. Remember the Transformers in Newsletter 6? They are a type of ANN and, as you may recall, it is useful to imagine them as a line of humans receiving something, performing an operation on it, then passing it to the next line for another operation, and so on and so forth. They are used a lot in unsupervised learning.
Deep Learning just refers to ANNs that have three or more layers - hence learning from a "deep" network. Through the forward pass and backward propagation steps, such a network learns on its own, assuming it has a large corpus of data to learn from and very powerful chips to run on. This architecture is then used to create Gen AI systems, including the ones based on LLMs, as discussed in Newsletter 6. In addition to generating information, DL can also be used for "discriminative" work - that is, to predict or classify data. Typically reliant on some labeled data (supervised or semi-supervised learning), discriminative models learn the relationship between the features (the raw data points) and the labels.
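To make the forward pass and backward propagation concrete, here is the smallest possible version: one neuron, one weight, trained by gradient descent. The input, target and learning rate are arbitrary; real deep networks repeat exactly this loop across many layers and millions of weights.

```python
import math

x, target = 2.0, 1.0  # one training example, invented for illustration
w = 0.0               # the single weight we will learn

for step in range(100):
    # Forward pass: input -> weighted sum -> activation -> prediction.
    pred = 1.0 / (1.0 + math.exp(-w * x))
    # Backward pass: the chain rule gives the gradient of the
    # squared error with respect to the weight.
    grad = 2 * (pred - target) * pred * (1 - pred) * x
    # Gradient descent: nudge the weight downhill on the error surface.
    w -= 0.5 * grad

print(pred)  # approaches the target of 1.0 after training
```

That forward/backward/nudge cycle, scaled up enormously, is the "learning on its own" described above.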
That's it, that's the highest level, simplified, lay of the AI space and Machine Learning in particular. Next time someone throws out “AI/ML”, you will know how many things just got smushed together!
The counterpoint: A new arms race is ignited, exploitation and intrigue are sure to follow
Given the utility and possibility of using Machine Learning techniques for both generative and discriminative work, the companies and nations that control an advantage will dominate their rivals. Militaries see this, clandestine agencies see this (both institutions are probably already using this stuff extensively) and technology providers see this. This will create an ecosystem around the discovery, building and supplying of two things: chips designed for AI use, and data to feed these hungry models.
Given this, countries with the rare earth materials required for semiconductor manufacturing, the fab technology, and indeed the supply chains are most likely to see the same kind of geopolitical machinations that we saw with oil in the 20th century. This will undoubtedly make for some strange bedfellows, and the geopolitics of suppliers and consumers are always dynamic.
Then there is data. We all know the obvious concerns: privacy, bias and discrimination, data security, and misuse of data. There is also monopolization of data, which is possible not only for big tech but also for governments and public institutions. OPEC is a cartel; do we now face the prospect of international data cartels? It's not inconceivable that countries with a preponderance of disease-related data could sequester it and start monetizing its use for ML work.
The ownership and use of data for machine learning (ML) training in impoverished nations poses significant geopolitical risks, including exploitation, dependency, and increased vulnerability to external influence. Wealthier nations or corporations might exploit these countries' data resources for their own benefit - a form of digital colonialism - without fair compensation, exacerbating economic disparities. For instance, multinational corporations harvesting local data for global AI applications without adequate benefit to local communities exemplify this risk. We've already seen exploitation of workers in Africa (Kenya, for example) who were paid a pittance to do data-labeling work for US companies' AI efforts. Additionally, these nations often become dependent on foreign AI technologies, compromising their sovereignty and leaving them susceptible to external manipulation.
So, despite all of the value being generated, the impending global and societal impacts are just emerging. Given the history of humans, it's going to get messy.
The Aside
In our journey of unraveling AI and ML, it's pivotal to recognize the role of standard benchmarks like those provided by MLCommons. Growing out of the MLPerf effort launched in 2018 by a consortium of leading tech companies and researchers, MLCommons focuses on fostering innovation in machine learning. It's an initiative that brings together industry giants like Google, Facebook, and Alibaba, along with academic institutions, to develop open-source benchmarks, datasets, and best practices in machine learning.
One of its significant projects, MLPerf, provides standardized tests that evaluate the performance of machine learning hardware, software, and systems. This benchmark suite has become a critical tool in measuring and understanding the capabilities of emerging AI technologies.
Highlighting the rapid advancement in the field, Nvidia's recent achievement in the MLPerf benchmarks is noteworthy. Nvidia's Eos, a 10,752-GPU AI supercomputer, dramatically accelerated the GPT-3 training benchmark, completing it in under four minutes - a task that took 34 days on 1,024 GPUs two years ago. This not only demonstrates Nvidia's technological prowess but also the accelerating pace of AI development. Such benchmarks serve as vital indicators of progress, helping us track and understand the rapid advancements in AI and machine learning technologies.
Take care of yourself,
-abhi