AI Glossary & Learning Hub
A
AI (Artificial Intelligence): The broad name for creating systems that can perform tasks that have historically required human intelligence (e.g. reasoning/analysis, perception, decision-making).
AI Agent: An autonomous or semi-autonomous AI entity that can perform tasks, make decisions, and call tools or APIs based on goals. In academic and enterprise settings, agents are often used to automate processes such as task routing, document triage, or multi-step reasoning.
Algorithm: A set of rules or instructions a computer follows to solve a problem, perform a task, or learn from data.
Anomaly Detection: The process of identifying outlier data points that significantly deviate from most of the dataset.
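A minimal sketch of one common approach, z-score thresholding: a point is flagged as an anomaly when it lies more than a chosen number of standard deviations from the mean. The sensor readings and threshold here are invented for illustration.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [x for x in values if abs(x - mu) / sigma > threshold]

# Six normal sensor readings and one obvious outlier
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 55.0]
print(find_anomalies(readings, threshold=2.0))  # [55.0]
```

Real systems often use more robust methods (isolation forests, autoencoders), but the core idea is the same: score each point by how far it deviates from the bulk of the data.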
Artificial Neural Network (ANN): A data processing model inspired by the structure of the human brain, used to detect patterns and learn from data.
B
Bias: Systematic error in AI predictions due to flawed assumptions or poor data.
C
Chatbot/Virtual Assistant: An interactive AI system that simulates human conversation, often powered by Large Language Models (LLMs), and provides answers to user queries. (See LLM).
Classification: A supervised learning task where the output is a category label, e.g., classifying whether the risks identified in a workplace incident report indicate potential for a serious injury or fatality (PSIF).
Clustering: An unsupervised learning technique that groups similar data points together (e.g. customer segmentation, the process of dividing a customer base into different groups based on shared characteristics).
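A toy sketch of k-means clustering on one-dimensional data (assuming k ≥ 2): points are repeatedly assigned to their nearest centroid, and each centroid moves to the mean of its group. The numbers below are invented to show two clear groups forming.

```python
def kmeans_1d(points, k, iterations=20):
    """Naive 1-D k-means; assumes k >= 2 and at least k points."""
    pts = sorted(points)
    # Spread initial centroids across the sorted data
    centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

print(kmeans_1d([1, 1.5, 2, 10, 10.5, 11], k=2))
# [[1, 1.5, 2], [10, 10.5, 11]]
```

No labels are provided anywhere; the grouping emerges purely from the distances between points, which is what makes this unsupervised.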
Computer Vision: AI that enables computers to interpret and analyze visual information from the world (e.g. facial recognition software, or 3D motion capture ergonomics software).
Confusion Matrix: A table showing the number of true positives, true negatives, false positives, and false negatives produced by an AI model; it is used to evaluate the model’s performance.
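A small sketch of how the four counts are tallied from a binary classifier’s predictions; the truth/prediction lists are invented for illustration.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classifier's predictions."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == positive and p == positive),
        "FP": sum(1 for t, p in pairs if t != positive and p == positive),
        "FN": sum(1 for t, p in pairs if t == positive and p != positive),
        "TN": sum(1 for t, p in pairs if t != positive and p != positive),
    }

truth = [1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(truth, preds))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

Metrics such as accuracy, precision, and recall are all derived from these four cells.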
Context window: The context window is the maximum number of tokens (words or parts of words) that an AI model can process and consider simultaneously when generating a response. It is essentially the “memory” capacity of the model during an interaction or task. Models with larger context windows can handle larger attachments/prompts/inputs and sustain “memory” of a conversation for longer.
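A rough sketch of why older messages “fall out of memory”: when the conversation exceeds the window, only the most recent messages that fit are kept. Word count stands in here for a real tokenizer, and the messages are invented examples.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined token total fits the
    window; older messages drop out of the model's 'memory'."""
    kept, total = [], 0
    for msg in reversed(messages):
        n = len(msg.split())  # crude word-count proxy for real tokenization
        if total + n > max_tokens:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))

history = ["hello there friend", "how are you today", "fine thanks"]
print(fit_to_window(history, max_tokens=7))
# ['how are you today', 'fine thanks']
```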
D
Data Labeling: The process of annotating data (e.g. tagging photos with objects) to train supervised AI models.
Data Mining: The process of analyzing large datasets to identify patterns and glean insights that reveal the larger story behind the data.
Deep Learning: A subset of AI/ML that uses complex neural networks with many layers for tasks like speech recognition or image processing.
E
Embedding: The process of representing non-numerical data (e.g. words or images) as vectors of numbers, making it machine-readable and usable for modeling in vector space.
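A toy sketch of the idea using simple word-count vectors over a fixed vocabulary: texts become vectors, and similar texts end up close together (here measured by cosine similarity). Real embeddings are learned and dense, but the vector-space intuition is the same. The vocabulary and sentences are invented.

```python
import math

def embed(text, vocabulary):
    """Represent text as a count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def cosine(u, v):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocab = ["cat", "dog", "sat", "ran"]
v1 = embed("the cat sat", vocab)  # [1, 0, 1, 0]
v2 = embed("the dog sat", vocab)  # [0, 1, 1, 0]
print(cosine(v1, v2))             # 0.5 -- the shared word makes them similar
```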
Explainability (XAI): The ability to understand and interpret how an AI model makes decisions.
F
G
Generative AI: AI that creates new content (text, images, audio, etc.), often using models like GANs or transformers (e.g. ChatGPT, DALL·E).
Generative Pretrained Transformer (GPT): A large language model architecture designed to generate text that reads like something written by humans. ChatGPT’s output is a common example.
Ground Truth: A term for the accurate, real-world data used as a benchmark to train or evaluate AI models.
H
Hyperparameter: A configuration value (like learning rate or batch size) that you set before training a model and that shapes how the training proceeds. In this way, hyperparameters differ from parameters, which the AI model learns during training.
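A minimal sketch of the distinction: in the toy gradient-descent routine below, `learning_rate` and `steps` are hyperparameters chosen before training, while `estimate` is the parameter the training loop learns (here it converges to the data’s mean). All names and numbers are invented for illustration.

```python
def fit_mean(data, learning_rate, steps):
    """Gradient descent on squared error; the estimate converges to the mean.
    learning_rate and steps are hyperparameters set before training;
    estimate is the parameter learned during training."""
    estimate = 0.0
    for _ in range(steps):
        gradient = sum(2 * (estimate - x) for x in data) / len(data)
        estimate -= learning_rate * gradient
    return estimate

print(fit_mean([2.0, 4.0, 6.0], learning_rate=0.1, steps=100))  # ~4.0
```

Set the learning rate too high and the estimate oscillates or diverges; too low and it converges slowly; this is why hyperparameter tuning matters.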
I
Inference: The process of a trained AI model making predictions on new data.
Interpretability: Some machine learning models, particularly those trained with deep learning, are so complex that it may be difficult or impossible to know how the model produced the output. Interpretability often describes the ability to present or explain a machine learning system’s decision-making process in terms that can be understood by humans. Interpretability is sometimes referred to as transparency or explainability (see transparency and explainability).
J
K
L
Label: The target variable used in supervised learning (e.g. “dog” in an image of a dog).
Large Language Model (LLM): A deep learning model trained on massive text corpora (i.e., an extremely large collection of written text) to understand and generate natural language.
M
ML (Machine Learning): A subset of AI focused on training algorithms to learn from data and improve their performance over time, without being explicitly programmed to do so.
Model: A mathematical system or algorithm trained to recognize patterns in data and use those recognized patterns to make predictions or generate new content.
Model Drift: Degradation of an AI model’s performance over time due to changing data patterns.
Multimodal Model: A multimodal model is an AI model capable of processing and generating multiple types of input/output — such as text, images, audio, and video. Multimodal tools (e.g., GPT-4o with vision) can, for example, describe an image and generate captions or code from a diagram.
N
Natural Language Processing (NLP): A field of AI focused on enabling machines to understand and generate human language (text or speech).
O
Overfitting: The name given to a situation in which an AI model learns the training data too well—including noise or irrelevant patterns—and therefore performs poorly on new data.
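An exaggerated sketch of the failure mode: a “model” that simply memorizes its training pairs scores perfectly on the training data but cannot generalize to anything new. The example inputs and labels are invented.

```python
def memorizing_model(train_x, train_y):
    """An extreme overfit: memorize training pairs with no generalization."""
    table = dict(zip(train_x, train_y))
    return lambda x: table.get(x, "unknown")

model = memorizing_model(["spam offer", "meeting notes"], ["spam", "ham"])
print(model("spam offer"))  # 'spam' -- perfect on training data
print(model("spam deal"))   # 'unknown' -- fails on unseen data
```

Real overfitting is subtler (the model fits noise rather than literally memorizing rows), but the symptom is the same: high training accuracy, poor performance on new data.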
P
Prescriptive Analytics: The use of analytics to recommend actions, analyzing past and present performance and modeling different scenarios to help organizations make better strategic decisions.
Prompt: Input that a user provides to an AI generative model to get certain types of output.
Prompt Engineering: The practice of designing effective inputs to guide the output of generative models like GPT.
Q
R
RAG (Retrieval-Augmented Generation): RAG is a method that combines a language model with external sources added by the user, such as documents, PDFs, or other materials. While language models can generate clear and human-like responses, they don’t automatically have access to this added content. RAG retrieves relevant information from those sources, allowing the model to give more accurate and grounded answers.
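A toy sketch of the retrieval half of RAG: documents are scored by word overlap with the query, and the best match is stuffed into the prompt that goes to the language model. Production systems retrieve with embeddings rather than word overlap; the documents and query here are invented.

```python
def retrieve(query, documents, top_k=1):
    """Score documents by shared-word count with the query; return the best."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_rag_prompt(query, documents):
    """Stuff the retrieved context into the prompt sent to the model."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The warehouse safety audit is scheduled for March.",
    "Quarterly revenue grew by eight percent.",
]
print(build_rag_prompt("When is the safety audit?", docs))
```

Because the relevant passage travels inside the prompt, the model can ground its answer in material it was never trained on.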
Regression: In AI, a regression model is trained on data in which inputs and outputs are known so that it can predict a continuous numeric output (e.g. a price or a temperature) for new, previously unseen inputs.
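A minimal sketch of the simplest case, ordinary least squares for a line `y = slope * x + intercept`; the data points are invented and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0 -- recovers y = 2x + 1
```

Once fitted, the model predicts an output for any new input, e.g. `slope * 10 + intercept` gives 21 for x = 10.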
Reinforcement Learning: A type of AI/ML where agents learn optimal behavior through rewards and penalties in an environment. Reinforcement learning is commonly used in robotics and modern video games.
S
Structured data: Data that is defined, formatted and searchable, e.g. data arranged into rows and columns.
Supervised Learning: An AI approach in which you train the model on labeled input-output pairs. One example would be training an AI model to scan email text (input) and provide a classification (output), so that it would correctly classify an email supposedly from the IRS that says, “click this link to pay your tax penalty immediately!” as phishing.
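A toy sketch of learning from labeled pairs, using a 1-nearest-neighbor rule: a new email is given the label of the most similar training example (similarity measured here by shared-word count). The labeled examples are invented, and real spam filters use far richer features.

```python
def nearest_neighbor(labeled_examples, text):
    """Classify text by the label of the most similar training example
    (1-nearest-neighbor on shared-word count)."""
    words = set(text.lower().split())
    best_label, best_score = None, -1
    for example, label in labeled_examples:
        score = len(words & set(example.lower().split()))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

labeled = [
    ("click this link to pay your tax penalty", "phishing"),
    ("agenda for tomorrow's team meeting", "legitimate"),
]
print(nearest_neighbor(labeled, "pay the penalty via this link"))  # 'phishing'
```

The labeled input-output pairs are what make this supervised: the model never decides what the categories are, only which one a new input resembles.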
T
Token: In NLP, a basic unit of text (a word, a part of a word, or character) processed by a model.
Training: The process of feeding data into an AI model so it can learn relationships and patterns and provide useful output when given new data.
Transfer Learning: Reusing a pretrained model on a new, related task (e.g., adapting a model trained on general text to legal documents).
Transparency: Another term for explainability or interpretability (see explainability and interpretability).
Turing test: A test created by computer scientist Alan Turing to determine whether a machine or artificial intelligence can demonstrate intelligence comparable to that of humans, especially in language and behavior. Generally, a human evaluator assesses a conversation between a human and AI, and if the evaluator can’t distinguish which is which, the AI has passed the Turing test.
U
Unsupervised Learning: An ML approach where the model tries to find patterns or groupings in unlabeled data.
V
Validation Set: A portion of the dataset used to evaluate a model’s performance during training.
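A minimal sketch of carving out a validation set: hold back a fraction of the (pre-shuffled) data so the model can be evaluated on examples it never trained on. The sample data is invented.

```python
def train_validation_split(data, validation_fraction=0.2):
    """Hold out the last portion of pre-shuffled data for validation."""
    cut = int(len(data) * (1 - validation_fraction))
    return data[:cut], data[cut:]

samples = list(range(10))
train, val = train_validation_split(samples, validation_fraction=0.2)
print(len(train), len(val))  # 8 2
```

A large gap between training and validation performance is the standard warning sign of overfitting.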