AI Glossary

A comprehensive glossary of essential AI terms, curated for academic and professional clarity.

A


Agentic AI refers to artificial intelligence systems that exhibit goal-directed behavior, autonomy in decision-making, and the capacity to initiate actions to achieve specific objectives. These systems operate as agents: they perceive their environment, evaluate options, and act on that environment without requiring constant human intervention.

Agentic AI is distinguished from reactive or purely assistive systems by its ability to pursue goals over time, often using internal models, planning, and adaptive strategies. This raises critical questions about alignment, accountability, and safety, particularly as agentic systems gain influence in complex real-world settings.

A finite sequence of well-defined, unambiguous instructions or rules designed to solve a specific problem or perform a particular task. In the context of artificial intelligence, algorithms serve as the fundamental computational procedures that enable machines to process data, make decisions, and learn from experience. AI algorithms can range from simple rule-based systems to complex mathematical models such as neural networks, genetic algorithms, and reinforcement learning protocols. The effectiveness of an AI system is largely determined by the sophistication and appropriateness of its underlying algorithms, which must be designed to handle uncertainty, adapt to new data, and optimize performance metrics relevant to the specific application domain.

Systematic and unfair discrimination that occurs when artificial intelligence systems produce results that are prejudiced against certain individuals or groups due to flawed assumptions, incomplete data, or biased training processes. Algorithmic bias can manifest in various forms including historical bias (perpetuating past discrimination present in training data), representation bias (underrepresentation of certain groups in datasets), measurement bias (differences in data quality across groups), and evaluation bias (inappropriate benchmarks or metrics). This phenomenon is particularly concerning in high-stakes applications such as hiring, lending, criminal justice, and healthcare, where biased algorithms can reinforce or amplify existing social inequalities. Addressing algorithmic bias requires comprehensive approaches including diverse and representative training data, fairness-aware machine learning techniques, regular bias auditing, and interdisciplinary collaboration between computer scientists, domain experts, and social scientists.

Alignment in artificial intelligence refers to the degree to which an AI system’s goals, behaviors, and outputs are consistent with human values, intentions, and ethical principles. It is a central concern in the development of safe and trustworthy AI, especially for autonomous or highly capable systems.

Ensuring alignment involves designing systems that not only perform tasks accurately but also act in ways that are beneficial and non-harmful to humans. This includes addressing challenges such as value misalignment, reward hacking, and unintended consequences.

Annotation in artificial intelligence refers to the process of labeling data—such as text, images, audio, or video—with relevant metadata that enables machine learning models to understand and learn from the information. These labels can indicate categories, entities, sentiments, object boundaries, or other attributes depending on the task (e.g., classification, segmentation, named entity recognition).

Accurate annotation is critical for supervised learning, as the quality and consistency of annotated data directly impact model performance. Annotation can be manual, automated, or semi-automated, and often involves specialized tools and guidelines to ensure reliability and reproducibility.

A multidisciplinary field of computer science focused on creating systems capable of performing tasks that typically require human intelligence, including learning, reasoning, problem-solving, perception, language understanding, and decision-making. AI encompasses both the theoretical study of computational models of intelligence and the practical development of systems that can simulate or replicate intelligent behavior. The field is broadly categorized into narrow AI (systems designed for specific tasks such as image recognition or language translation) and general AI (hypothetical systems with human-level cognitive abilities across diverse domains). Modern AI relies heavily on machine learning techniques, particularly deep learning, statistical inference, and large-scale data processing. AI systems are characterized by their ability to adapt to new situations, generalize from training data, and improve performance through experience, making them valuable tools across numerous sectors including healthcare, finance, transportation, and scientific research.

A hypothetical form of artificial intelligence that possesses the ability to understand, learn, and apply intelligence across a wide range of tasks and domains at a level comparable to or exceeding human cognitive capabilities. Unlike narrow AI systems that are designed for specific applications, AGI would demonstrate flexible reasoning, abstract thinking, creative problem-solving, and the capacity to transfer knowledge between disparate domains without explicit programming for each task. AGI systems would theoretically exhibit human-level performance in cognitive tasks such as natural language understanding, mathematical reasoning, scientific discovery, artistic creation, and social interaction. The development of AGI remains a significant challenge in computer science, requiring breakthroughs in areas such as knowledge representation, causal reasoning, meta-learning, and consciousness modeling. The timeline and feasibility of achieving AGI remain subjects of considerable debate among researchers, with implications for society, economics, and the future of human-machine interaction that continue to drive both scientific inquiry and policy discussions.


B


A systematic deviation from accuracy, fairness, or representativeness in artificial intelligence systems that can occur at multiple stages of the AI development lifecycle. In AI contexts, bias refers to the tendency of algorithms or models to produce results that consistently favor certain outcomes, groups, or perspectives over others in ways that may be inappropriate or discriminatory. Bias can originate from various sources including training data that is unrepresentative or reflects historical prejudices, feature selection that inadvertently encodes discriminatory patterns, model architectures that amplify certain signals, and evaluation metrics that fail to account for fairness considerations. Common types include selection bias (non-representative sampling), confirmation bias (preferential treatment of information that confirms existing beliefs), and measurement bias (systematic errors in data collection or labeling). Understanding and mitigating bias is crucial for developing trustworthy AI systems, particularly in applications affecting human welfare such as hiring, lending, criminal justice, and healthcare, where biased outcomes can perpetuate or exacerbate social inequalities.

Benchmarking in artificial intelligence refers to the process of evaluating and comparing the performance of AI models or systems against standardized datasets, tasks, or metrics. It serves as a means to assess accuracy, efficiency, robustness, fairness, and other key properties in a reproducible and objective manner.

Benchmarks provide a common ground for measuring progress across different approaches and are essential for guiding research, validating claims, and identifying state-of-the-art methods. Examples include ImageNet for image classification, GLUE for natural language understanding, and HELM for large language models.


C


In artificial intelligence (AI), controllability refers to the degree to which a system’s behavior can be guided or constrained by human operators or predefined rules, especially in complex or autonomous settings. It encompasses the ability to steer the system toward desired outcomes, prevent unsafe actions, and ensure compliance with external constraints.

Controllability is a key concept in the design of AI systems that are reliable, safe, and aligned with human goals. It is particularly relevant in high-stakes domains such as autonomous vehicles, healthcare, and decision-making systems, where the consequences of AI actions must remain predictable and manageable.

D


A model compression technique that involves training a smaller, more efficient “student” model to replicate the behavior and performance of a larger, more complex “teacher” model. Knowledge distillation works by having the student model learn not only from the original training data but also from the soft predictions (probability distributions) generated by the teacher model, which contain richer information than hard labels alone. This process allows the student model to capture the nuanced decision-making patterns of the teacher while maintaining significantly reduced computational requirements, memory footprint, and inference time.

The technique is particularly valuable for deploying large foundation models in resource-constrained environments such as mobile devices, edge computing systems, or real-time applications. Knowledge distillation can be applied in various forms including offline distillation (where the teacher model is pre-trained), online distillation (where teacher and student models are trained simultaneously), and self-distillation (where a model serves as its own teacher). This approach has become essential for making advanced AI capabilities accessible across diverse deployment scenarios while maintaining acceptable performance levels, enabling the practical application of sophisticated AI models in environments where computational resources are limited.
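
As a concrete illustration, the sketch below shows the soft-label objective described above in PyTorch: the student is trained against a mix of the teacher's temperature-softened predictions and the hard labels. The toy linear models, temperature, and mixing weight are illustrative assumptions, not a reference implementation.

    # Minimal sketch of offline knowledge distillation in PyTorch.
    # The temperature T, mixing weight alpha, and toy models are assumptions.
    import torch
    import torch.nn.functional as F

    teacher = torch.nn.Linear(32, 10)   # stands in for a large pre-trained model
    student = torch.nn.Linear(32, 10)   # smaller model being trained
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

    T, alpha = 2.0, 0.5                 # temperature and loss mixing weight (assumed)
    x = torch.randn(16, 32)             # a batch of inputs
    y = torch.randint(0, 10, (16,))     # hard labels

    with torch.no_grad():
        teacher_logits = teacher(x)     # soft targets from the frozen teacher

    student_logits = student(x)

    # Distillation term: KL divergence between temperature-softened distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Supervised term on the hard labels, mixed with the distillation term.
    loss = alpha * kd_loss + (1 - alpha) * F.cross_entropy(student_logits, y)
    loss.backward()
    optimizer.step()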

E


Embeddings are dense, low-dimensional vector representations of data—such as words, sentences, images, or structured inputs—that capture semantic or structural relationships in a format suitable for machine learning models. Unlike one-hot encodings or raw features, embeddings encode similarities such that semantically or functionally related inputs are mapped to nearby points in the vector space.

Embeddings are essential in natural language processing (e.g., word2vec, GloVe, BERT), recommendation systems, and computer vision. They enable efficient comparisons, clustering, retrieval, and downstream learning tasks.

Well-learned embeddings serve as compact, information-rich features that enhance generalization and performance across a wide range of AI applications.
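
To make the "nearby points" idea concrete, the toy example below compares hand-made vectors with cosine similarity; real embeddings are produced by trained models (word2vec, BERT, and the like) and have far more dimensions.

    # Toy illustration of embedding similarity with cosine distance.
    # The 4-dimensional vectors are invented for illustration only.
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    cat = np.array([0.9, 0.1, 0.3, 0.0])
    dog = np.array([0.8, 0.2, 0.4, 0.1])
    car = np.array([0.1, 0.9, 0.0, 0.7])

    print(cosine_similarity(cat, dog))  # high: semantically related words
    print(cosine_similarity(cat, car))  # lower: unrelated concepts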

F


Few-shot learning (FSL) is a machine learning paradigm in which a model learns a new task or recognizes new classes from only a small number of labeled examples—typically one to a few dozen. It aims to mimic the human ability to generalize from limited data.

FSL is particularly important in applications where labeled data is scarce, expensive, or time-consuming to obtain. It is commonly addressed using techniques such as meta-learning, metric learning, prompt-based learning (in the context of large language models), and transfer learning.

In natural language processing, few-shot learning often involves providing a model with a few example prompts and responses within the input, enabling it to infer the task structure and generate appropriate outputs without fine-tuning.
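
The sketch below illustrates this in-context pattern with a hypothetical sentiment task; the example reviews are invented and send_to_model is a placeholder for an actual LLM call.

    # Few-shot prompting sketch: the task is demonstrated with a few labelled
    # examples inside the prompt itself. send_to_model() is a placeholder.
    few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

    Review: "The battery lasts all day."
    Sentiment: Positive

    Review: "The screen cracked within a week."
    Sentiment: Negative

    Review: "Setup was quick and painless."
    Sentiment:"""

    # response = send_to_model(few_shot_prompt)  # hypothetical LLM call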

A machine learning technique that involves taking a pre-trained model and further training it on a smaller, task-specific dataset to adapt its capabilities for a particular application or domain. Fine-tuning leverages the general knowledge and representations learned during the initial training phase of a foundation model, allowing practitioners to achieve high performance on specialized tasks with significantly less computational resources and data than would be required for training from scratch. The process typically involves adjusting the model’s parameters using a lower learning rate to preserve the valuable pre-trained features while incorporating task-specific knowledge. Fine-tuning can be applied at various levels, from updating all model parameters (full fine-tuning) to modifying only specific layers or using parameter-efficient methods such as adapters or low-rank adaptation (LoRA). This technique has become essential in the deployment of foundation models, enabling customization for specific domains such as medical diagnosis, legal document analysis, or specialized language translation, while maintaining the broad capabilities of the original model. Fine-tuning represents a cost-effective approach to model specialization that has made advanced AI capabilities accessible to organizations with limited computational resources.
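
As a minimal sketch of this recipe, the PyTorch snippet below freezes a stand-in "pre-trained" backbone and updates only a new task head with a small learning rate; the toy model, data, and hyperparameters are assumptions for illustration.

    # Fine-tuning sketch: freeze pre-trained features, train a new head with a low LR.
    import torch
    import torch.nn as nn

    backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stands in for a pre-trained model
    head = nn.Linear(64, 3)                                  # new task-specific layer

    for p in backbone.parameters():   # freeze the pre-trained parameters
        p.requires_grad = False

    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)  # small learning rate

    x = torch.randn(32, 128)                 # task-specific batch (assumed)
    y = torch.randint(0, 3, (32,))

    logits = head(backbone(x))
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()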

Large-scale artificial intelligence models trained on broad datasets that serve as the basis for a wide range of downstream applications through adaptation techniques such as fine-tuning, prompt engineering, or transfer learning. Foundation models are characterized by their substantial computational requirements, typically involving billions or trillions of parameters, and their ability to exhibit emergent capabilities that were not explicitly programmed. These models are pre-trained on diverse, heterogeneous data sources using self-supervised learning objectives, enabling them to develop generalizable representations of language, vision, or other modalities. Examples include large language models like GPT and BERT for natural language processing, vision transformers for computer vision, and multimodal models that can process multiple types of input simultaneously. Foundation models represent a paradigm shift in AI development, moving from task-specific model training to a more efficient approach where a single powerful model can be adapted for numerous applications. This approach has democratized access to advanced AI capabilities while raising important questions about computational resource concentration, model governance, bias propagation, and the environmental impact of training such large-scale systems.

Frontier AI refers to the most advanced and capable artificial intelligence systems at the cutting edge of current research and development. These models, often large-scale and general-purpose (such as foundation models and large language models), are distinguished by their potential to perform a broad range of tasks with high levels of autonomy, reasoning, and generalization.

The term is typically used in policy, governance, and safety discussions to highlight AI systems that pose both transformative benefits and significant risks, including societal, economic, and existential concerns. Managing the development and deployment of frontier AI involves rigorous safety evaluation, robust alignment, transparency, and international coordination.

G


A class of artificial intelligence systems designed to create new content, including text, images, audio, video, code, and other forms of media, by learning patterns and structures from large datasets during training. Generative AI models utilize sophisticated neural network architectures, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and transformer-based models, to generate outputs that are statistically similar to their training data while exhibiting novelty and creativity. These systems operate by learning probabilistic distributions over data and sampling from these distributions to produce new instances. Notable applications include large language models for text generation, diffusion models for image synthesis, and multimodal systems capable of cross-domain generation. Generative AI has revolutionized content creation, enabling applications in creative industries, software development, scientific research, and education. However, it also raises important considerations regarding intellectual property, authenticity, misinformation, and the potential for generating harmful or biased content, necessitating careful governance frameworks and responsible deployment practices.

A class of large language models based on the transformer architecture that are trained using unsupervised learning to generate human-like text by predicting the next word in a sequence. GPT models employ a decoder-only transformer architecture with self-attention mechanisms that enable them to capture long-range dependencies and contextual relationships within text. The training process involves two main phases: pre-training on vast amounts of text data using a next-token prediction objective, which allows the model to learn general language patterns and world knowledge, followed by optional fine-tuning on specific tasks or domains. GPT models demonstrate remarkable capabilities in text generation, completion, translation, summarization, question-answering, and various other natural language processing tasks through in-context learning and few-shot prompting. The architecture’s autoregressive nature means that each token is generated based on all preceding tokens, enabling coherent and contextually appropriate text generation. GPT models have evolved through multiple iterations, with each generation featuring increased model size, improved training techniques, and enhanced capabilities. These models have significantly advanced the field of natural language processing and have become foundational tools for numerous applications including chatbots, content creation, code generation, and educational assistance, while also raising important questions about AI safety, alignment, and societal impact.
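
The placeholder loop below sketches the autoregressive process described above: each step conditions on all tokens generated so far and appends the most probable next token. The tiny vocabulary and the next_token_distribution function stand in for a real trained model.

    # Schematic autoregressive (greedy) decoding loop.
    # next_token_distribution() is a placeholder for a transformer decoder.
    import numpy as np

    vocab = ["<eos>", "the", "cat", "sat", "on", "mat"]

    def next_token_distribution(tokens):
        # Placeholder: a real model would run a decoder over `tokens`.
        rng = np.random.default_rng(len(tokens))
        logits = rng.normal(size=len(vocab))
        e = np.exp(logits - logits.max())
        return e / e.sum()

    tokens = ["the"]
    for _ in range(10):                              # generate up to 10 tokens
        probs = next_token_distribution(tokens)
        next_token = vocab[int(np.argmax(probs))]    # greedy choice
        if next_token == "<eos>":
            break
        tokens.append(next_token)

    print(" ".join(tokens))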

H


The phenomenon in artificial intelligence systems, particularly large language models and generative AI, where models produce outputs that are factually incorrect, nonsensical, or entirely fabricated while presenting them with apparent confidence and coherence. Hallucinations occur when AI models generate content that is not grounded in their training data or real-world facts, often filling knowledge gaps with plausible-sounding but inaccurate information. This behavior stems from the probabilistic nature of generative models, which predict the most likely next tokens or elements based on learned patterns rather than accessing verified knowledge databases. Hallucinations can manifest in various forms including factual errors, fictional citations, non-existent entities, or logically inconsistent statements. The problem is particularly challenging because hallucinated content often appears authoritative and well-structured, making it difficult for users to distinguish from accurate information. Addressing hallucinations requires multiple approaches including improved training methodologies, retrieval-augmented generation, fact-checking mechanisms, uncertainty quantification, and user education about model limitations. Understanding and mitigating hallucinations is crucial for the responsible deployment of AI systems in high-stakes applications where accuracy and reliability are paramount.

I


Inference latency refers to the amount of time it takes for a trained AI model to produce an output (or prediction) after receiving an input. It begins when the input is provided to the model and ends when the output is returned, excluding any time spent on model training or data preprocessing.

Inference latency is a key consideration for deploying AI systems in real-time or user-facing applications, where responsiveness is critical. It depends on model architecture, hardware (e.g., CPU, GPU, or specialized accelerators), optimization techniques (e.g., quantization, pruning), and batch size. Balancing inference latency with model accuracy and throughput is central to efficient AI system design.
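
A simple way to measure it, assuming a PyTorch model, is to time only the forward pass after a warm-up call, as sketched below; the stand-in model and single-example batch are illustrative.

    # Timing a single forward pass; training and preprocessing are excluded.
    import time
    import torch

    model = torch.nn.Linear(512, 10).eval()   # stand-in for a deployed model
    x = torch.randn(1, 512)

    with torch.no_grad():
        model(x)                              # warm-up call (not measured)
        start = time.perf_counter()
        model(x)                              # the timed inference call
        latency_ms = (time.perf_counter() - start) * 1000

    print(f"inference latency: {latency_ms:.2f} ms")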

The degree to which a human can understand and explain the decision-making process, internal mechanisms, and reasoning behind an artificial intelligence system’s outputs or predictions. Interpretability encompasses both the ability to comprehend how a model arrives at specific decisions and the capacity to predict how the model will behave in different scenarios. This concept is crucial for building trust in AI systems, ensuring regulatory compliance, debugging model performance, and identifying potential biases or errors. Interpretability exists on a spectrum from inherently interpretable models (such as linear regression or decision trees) where the decision process is transparent, to post-hoc interpretability methods that attempt to explain complex black-box models after training. Common interpretability techniques include feature importance analysis, attention visualization, gradient-based methods, surrogate models, counterfactual explanations, and model-agnostic approaches such as LIME and SHAP. The field distinguishes between local interpretability (explaining individual predictions) and global interpretability (understanding overall model behavior), as well as between mechanistic interpretability (understanding internal model representations) and functional interpretability (understanding input-output relationships). Interpretability requirements vary significantly across applications, with high-stakes domains such as healthcare, finance, and criminal justice demanding greater transparency than less critical applications, creating ongoing tensions between model performance and explainability in AI system design.
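
As one concrete, model-agnostic example of the feature-importance methods mentioned above, the sketch below uses scikit-learn's permutation importance on synthetic data: each feature is shuffled in turn, and the resulting drop in held-out accuracy indicates how much the model relies on it.

    # Permutation feature importance with scikit-learn on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=400, n_features=6, n_informative=3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    print(result.importances_mean)   # larger values = features the model relies on more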

J


K


L


A type of artificial intelligence system based on neural networks, typically transformer architectures, that has been trained on vast amounts of text data to understand and generate human language at scale. LLMs are characterized by their substantial size, often containing billions or trillions of parameters, and their ability to perform a wide range of language-related tasks without task-specific training through techniques such as few-shot learning, zero-shot learning, and in-context learning. These models are trained using self-supervised learning objectives, primarily next-token prediction, which enables them to learn statistical patterns in language, acquire factual knowledge, and develop reasoning capabilities. LLMs demonstrate emergent abilities that arise from their scale, including complex reasoning, code generation, mathematical problem-solving, creative writing, and multilingual communication. The training process involves pre-training on diverse text corpora followed by optional fine-tuning stages such as supervised fine-tuning and reinforcement learning from human feedback to improve alignment with human preferences and reduce harmful outputs. LLMs have become foundational tools in natural language processing, powering applications such as chatbots, content generation, translation services, and educational assistants. However, they also present challenges including computational requirements, potential biases, hallucinations, and questions about data privacy, intellectual property, and societal impact that require careful consideration in their development and deployment.

Latency in artificial intelligence refers to the time delay between an input being received by a system and the corresponding output being produced. It is a critical performance metric, particularly in real-time or interactive applications such as conversational agents, autonomous vehicles, or edge AI systems.

Latency is influenced by factors such as model size, computational complexity, hardware efficiency, and network communication. Reducing latency is essential for improving user experience, ensuring timely decisions, and enabling deployment in resource-constrained or safety-critical environments.

A parameter-efficient fine-tuning technique that enables the adaptation of large pre-trained models by learning low-rank decompositions of weight updates rather than modifying all model parameters. LoRA operates on the principle that the weight updates during fine-tuning have low intrinsic dimensionality, allowing the adaptation process to be represented through much smaller matrices. The technique introduces trainable low-rank matrices (typically with ranks between 1-64) that are added to the frozen pre-trained weights, dramatically reducing the number of parameters that need to be updated during fine-tuning. This approach can reduce trainable parameters by several orders of magnitude while maintaining competitive performance with full fine-tuning. LoRA offers significant practical advantages including reduced memory requirements, faster training times, lower computational costs, and the ability to store multiple task-specific adaptations as lightweight modules that can be easily swapped or combined. The technique has become particularly valuable for customizing large language models and other foundation models for specific applications, enabling organizations with limited computational resources to effectively adapt state-of-the-art models. LoRA represents a key advancement in making large-scale AI model customization more accessible and economically viable.
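
The sketch below captures the core idea in PyTorch: the pre-trained weight matrix stays frozen while two small matrices A and B parameterize the update. The layer sizes, rank, and scaling are illustrative assumptions.

    # LoRA sketch: y = x W^T + scale * x (B A)^T, with only A and B trainable.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, in_features, out_features, rank=8, alpha=16.0):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                       requires_grad=False)              # frozen pre-trained W
            self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # trainable
            self.B = nn.Parameter(torch.zeros(out_features, rank))        # trainable
            self.scale = alpha / rank

        def forward(self, x):
            return x @ self.weight.T + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(1024, 1024, rank=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(trainable)  # 2 * 8 * 1024 parameters instead of 1024 * 1024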

M


A subset of artificial intelligence that focuses on developing algorithms and statistical models that enable computer systems to automatically improve their performance on specific tasks through experience and data, without being explicitly programmed for each scenario. Machine learning systems identify patterns, relationships, and structures in data to make predictions, classifications, or decisions about new, unseen examples. The field encompasses three primary paradigms: supervised learning (learning from labeled examples to make predictions), unsupervised learning (discovering hidden patterns in unlabeled data), and reinforcement learning (learning optimal actions through trial and error with reward feedback). Machine learning algorithms range from traditional statistical methods such as linear regression and decision trees to sophisticated neural networks including deep learning architectures. The machine learning pipeline typically involves data collection and preprocessing, feature engineering, model selection and training, validation and testing, and deployment with ongoing monitoring. ML has become fundamental to modern AI applications, powering systems for image recognition, natural language processing, recommendation engines, fraud detection, autonomous vehicles, and scientific discovery. The success of machine learning depends critically on data quality, appropriate algorithm selection, proper evaluation methodologies, and consideration of ethical implications including fairness, transparency, and accountability in automated decision-making systems.

MLOps (short for “Machine Learning Operations”) refers to a set of practices, tools, and principles that aim to streamline and automate the end-to-end lifecycle of machine learning models—from development and training to deployment, monitoring, and maintenance in production environments.

Inspired by DevOps practices in software engineering, MLOps bridges the gap between data science and IT operations by enabling reproducibility, scalability, collaboration, and continuous integration and delivery (CI/CD) of ML systems. It addresses challenges such as model versioning, data drift, pipeline orchestration, testing, and compliance.

MLOps is essential for building robust, reliable, and maintainable AI solutions, particularly in enterprise and regulated contexts.

N


A multidisciplinary field that combines computational linguistics, machine learning, and artificial intelligence to enable computers to understand, interpret, and generate human language in a meaningful and useful way. NLP encompasses both the theoretical understanding of how language works and the practical development of systems that can process text and speech data for various applications. The field addresses fundamental challenges including syntactic parsing (understanding grammatical structure), semantic analysis (extracting meaning), pragmatics (understanding context and intent), and discourse processing (managing coherence across longer texts). Traditional NLP approaches relied heavily on rule-based systems and statistical methods, but the field has been revolutionized by deep learning techniques, particularly transformer architectures and large language models. Key NLP tasks include machine translation, sentiment analysis, named entity recognition, question answering, text summarization, information extraction, and dialogue systems. Modern NLP systems demonstrate remarkable capabilities in understanding context, handling ambiguity, and generating coherent text, though challenges remain in areas such as common sense reasoning, cultural nuance, and maintaining consistency across long documents. NLP applications span numerous domains including search engines, virtual assistants, content moderation, automated customer service, and language education, making it one of the most practically impactful areas of artificial intelligence research.

A neural network is a computational model inspired by the structure and functioning of the human brain. It consists of layers of interconnected nodes (called neurons) that process data by applying weights, biases, and activation functions to transform inputs into outputs.

Neural networks are the foundation of many modern AI systems and are particularly effective at learning patterns from large datasets. They are used in tasks such as image recognition, natural language processing, speech recognition, and game playing. Common types include feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs).

Training a neural network involves adjusting its parameters using optimization algorithms (e.g., gradient descent) to minimize a loss function based on the difference between predicted and actual outcomes.
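
The short PyTorch example below shows this loop on the classic XOR problem: a small feedforward network, a loss function, backpropagation, and gradient descent updates. The task and layer sizes are illustrative choices.

    # Training a tiny feedforward network with gradient descent.
    import torch
    import torch.nn as nn

    x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])       # XOR targets

    model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for step in range(2000):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # error between predicted and actual outputs
        loss.backward()               # backpropagation computes the gradients
        optimizer.step()              # gradient descent updates weights and biases

    print(model(x).detach().round())  # should approximate [[0], [1], [1], [0]]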

O


P


The practice of designing, refining, and optimizing input prompts to effectively communicate with and elicit desired responses from large language models and other AI systems. Prompt engineering involves crafting precise instructions, examples, and context to guide AI models toward producing accurate, relevant, and useful outputs for specific tasks or applications. This discipline encompasses various techniques including few-shot learning (providing examples within the prompt), chain-of-thought prompting (encouraging step-by-step reasoning), role-playing (instructing the model to adopt specific personas or expertise), and template-based approaches for consistent formatting. Effective prompt engineering requires understanding the model’s capabilities and limitations, the structure of effective instructions, and the nuances of natural language that influence model behavior. The field has emerged as a critical skill for maximizing the utility of foundation models, enabling users to achieve sophisticated results without fine-tuning or additional training. Prompt engineering techniques can significantly impact model performance on tasks such as reasoning, creative writing, code generation, and domain-specific analysis. As AI systems become more capable and widely deployed, prompt engineering has evolved into both an art and a science, requiring iterative experimentation, systematic evaluation, and deep understanding of how language models process and respond to different types of input structures and content.
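
As a small illustration, the snippet below combines two of the techniques mentioned above, a role instruction and a chain-of-thought cue, in a single prompt string; the wording is invented and the model call is a placeholder.

    # Prompt-engineering sketch: role instruction + chain-of-thought cue.
    prompt = (
        "You are a careful math tutor.\n"                         # role-playing instruction
        "Solve the problem step by step, showing your reasoning, "  # chain-of-thought cue
        "then state the final answer on its own line.\n\n"
        "Problem: A train travels 120 km in 1.5 hours. What is its average speed?\n"
    )
    # answer = llm(prompt)   # hypothetical call to a language model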

Q


R


Reinforcement learning (RL) is a type of machine learning in which an agent learns to make sequential decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. The agent’s goal is to learn a policy that maximizes cumulative reward over time.

RL is formally grounded in the framework of Markov Decision Processes (MDPs) and involves key concepts such as states, actions, rewards, policies, and value functions. It is widely used in domains where optimal behavior must be learned through trial and error, such as robotics, game playing, and resource management.

Variants include model-free methods (e.g., Q-learning, policy gradients), model-based methods, and deep reinforcement learning, where deep neural networks are used to approximate policies or value functions.
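
The sketch below implements tabular Q-learning on a made-up five-state chain environment, showing states, actions, rewards, and the value update in code; the environment and hyperparameters are illustrative assumptions.

    # Tabular Q-learning on a toy chain environment.
    import random

    n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(state, action):
        """Move left/right on a chain; reward 1 only for reaching the last state."""
        next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
        reward = 1.0 if next_state == n_states - 1 else 0.0
        return next_state, reward

    for episode in range(500):
        s = 0
        for _ in range(20):
            # Epsilon-greedy action selection: explore sometimes, exploit otherwise.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: Q[s][a_])
            s_next, r = step(s, a)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next

    print(Q)  # the learned values favor moving right toward the rewarding state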

Retrieval-Augmented Generation (RAG) is a hybrid architecture that combines information retrieval with text generation to enhance the performance of language models. In RAG systems, an external knowledge source (such as a document database) is queried to retrieve relevant context, which is then used as input to a generative model to produce more accurate, grounded, and up-to-date responses.

This approach addresses limitations of purely generative models by enabling access to external information beyond the model’s training data. RAG is particularly useful in tasks requiring factual accuracy, domain-specific knowledge, or dynamic content, such as question answering, customer support, and legal or medical AI systems.

Typical RAG systems include components for query formulation, document retrieval (e.g., using vector search), and conditional generation based on the retrieved context.
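
The schematic sketch below follows those components: embed the query, retrieve the most similar documents, and build a grounded prompt for a generator. The embed and generate functions are placeholders for real encoder and generator models, and the document snippets are invented.

    # Schematic RAG pipeline: retrieve by vector similarity, then generate.
    import numpy as np

    documents = [
        "The warranty covers manufacturing defects for two years.",
        "Returns are accepted within 30 days of purchase.",
        "Shipping normally takes three to five business days.",
    ]

    def embed(text):
        # Placeholder embedding: a real system would use a trained encoder.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.normal(size=64)

    doc_vectors = np.stack([embed(d) for d in documents])

    def retrieve(query, k=2):
        q = embed(query)
        scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
        return [documents[i] for i in np.argsort(-scores)[:k]]

    query = "How long do I have to return an item?"
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    # answer = generate(prompt)   # hypothetical call to a generative model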

S


A Small Language Model (SLM) is a language model with a relatively low number of parameters—typically in the range of millions to a few billion—designed to perform natural language understanding or generation tasks efficiently, often on edge devices or with limited computational resources.

SLMs aim to offer faster inference, lower memory usage, and greater deployability compared to large language models (LLMs), while maintaining sufficient task-specific performance. They are well-suited for applications requiring low latency, privacy preservation, or offline functionality.

Recent research focuses on optimizing SLMs through techniques such as knowledge distillation, quantization, parameter-efficient adaptation (e.g., LoRA), and efficient architectures (e.g., distilled transformers or small-scale Mixture-of-Experts models).
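
As one concrete example of these optimizations, the sketch below applies symmetric 8-bit post-training quantization to a weight matrix, trading a small amount of precision for a 4x memory reduction relative to float32; the shapes and values are illustrative.

    # Symmetric 8-bit post-training quantization of a weight matrix.
    import numpy as np

    weights = np.random.randn(256, 256).astype(np.float32)

    scale = np.abs(weights).max() / 127.0          # map the observed range to int8
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    dequantized = quantized.astype(np.float32) * scale

    print(weights.nbytes, quantized.nbytes)        # 262144 vs 65536 bytes
    print(np.abs(weights - dequantized).max())     # small quantization error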

Supervised learning is a machine learning paradigm in which a model is trained on a labeled dataset, where each input example is paired with a corresponding target output. The goal is to learn a mapping from inputs to outputs that generalizes well to unseen data.

Supervised learning is used for tasks such as classification (e.g., spam detection, image recognition) and regression (e.g., predicting housing prices, temperature forecasting). The learning process involves minimizing a loss function that quantifies the error between the model’s predictions and the true labels.

High-quality labeled data, appropriate model selection, and proper evaluation (e.g., using training/validation/test splits) are critical to the success of supervised learning systems.
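
The scikit-learn example below walks through that pipeline on a synthetic dataset: labeled data, a train/test split, model fitting, and evaluation on held-out examples.

    # Supervised classification with a train/test split and held-out evaluation.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(accuracy_score(y_test, model.predict(X_test)))  # generalization to unseen data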

T


In artificial intelligence, particularly in natural language processing (NLP), a token is a discrete unit of text used as input to a model. Tokens typically correspond to words, subwords, characters, or symbols, depending on the tokenization strategy applied (e.g., whitespace tokenization, Byte Pair Encoding, WordPiece).

Language models process sequences of tokens rather than raw text. For example, the sentence “Machine learning is powerful.” may be split into tokens such as [“Machine”, “learning”, “is”, “power”, “##ful”, “.”] depending on the tokenizer used.

Tokens are fundamental to model input length, vocabulary size, and performance. They also affect inference cost and are often the basis for usage-based pricing in large language models.

Tokenization is the process of breaking down raw input data—typically text—into smaller units called tokens, which serve as the basic inputs for natural language processing (NLP) models. Tokens can be words, subwords, characters, or symbols, depending on the tokenization strategy employed.

Different tokenization methods affect how efficiently models represent and understand language. Common approaches include:

  • Whitespace tokenization (splitting on spaces),

  • Subword tokenization (e.g., Byte Pair Encoding, WordPiece, SentencePiece),

  • Character-level tokenization.

Advanced models like BERT or GPT use subword tokenization to handle out-of-vocabulary words and balance vocabulary size with representational power. Tokenization plays a crucial role in model performance, generalization, and computational efficiency.
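
The snippet below contrasts two of these strategies on the example sentence used earlier; the subword portion assumes the Hugging Face transformers package and a downloadable vocabulary, so it is left commented out, and the exact subword output depends on the tokenizer.

    # Comparing tokenization strategies on the same sentence.
    text = "Machine learning is powerful."

    whitespace_tokens = text.split()   # ['Machine', 'learning', 'is', 'powerful.']
    char_tokens = list(text)           # character-level tokenization

    # Subword tokenization with a pre-trained WordPiece vocabulary (assumes the
    # Hugging Face `transformers` package is installed and the model is available):
    # from transformers import AutoTokenizer
    # tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # print(tokenizer.tokenize(text))

    print(whitespace_tokens)
    print(char_tokens[:10])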

A neural network architecture introduced in 2017 that has become the dominant paradigm for natural language processing and many other AI applications. Transformers are built around the self-attention mechanism, which allows the model to weigh the importance of different parts of the input sequence when processing each element, enabling efficient parallel computation and the capture of long-range dependencies without the sequential processing limitations of recurrent neural networks. The architecture consists of encoder and decoder components, each containing multiple layers of multi-head attention and feed-forward networks, along with residual connections and layer normalization. The attention mechanism computes relationships between all pairs of positions in a sequence simultaneously, making transformers highly effective at understanding context and relationships in data. Key innovations include positional encoding to handle sequence order, multi-head attention to capture different types of relationships, and the ability to process sequences in parallel rather than sequentially. Transformers have enabled breakthrough performance in language modeling, machine translation, text summarization, and have been successfully adapted for computer vision (Vision Transformers), audio processing, and multimodal applications. The architecture’s scalability and effectiveness have made it the foundation for most modern large language models and have fundamentally transformed the landscape of artificial intelligence research and applications.
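
The NumPy sketch below shows the scaled dot-product self-attention at the heart of the architecture: every position attends to every other position in parallel. The sequence length, dimensions, and random weights are arbitrary illustrative choices.

    # Single-head scaled dot-product self-attention in NumPy.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    seq_len, d_model = 4, 8
    x = np.random.randn(seq_len, d_model)          # token representations

    Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    scores = Q @ K.T / np.sqrt(d_model)            # pairwise attention scores
    weights = softmax(scores, axis=-1)             # each row sums to 1
    output = weights @ V                           # context-mixed representations

    print(weights.shape, output.shape)             # (4, 4) (4, 8)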

U


A machine learning paradigm where algorithms identify patterns, structures, and relationships in data without access to labeled examples or explicit target outputs. Unlike supervised learning, unsupervised learning systems must discover hidden structures in data through statistical analysis and pattern recognition, making it particularly valuable for exploratory data analysis and knowledge discovery. Common unsupervised learning tasks include clustering (grouping similar data points), dimensionality reduction (finding lower-dimensional representations while preserving important information), density estimation (modeling the probability distribution of data), and anomaly detection (identifying unusual or outlying observations). Key algorithms include k-means clustering, hierarchical clustering, principal component analysis (PCA), independent component analysis (ICA), autoencoders, and generative models such as variational autoencoders and generative adversarial networks. Unsupervised learning is fundamental to many AI applications, including data preprocessing, feature learning, customer segmentation, recommendation systems, and the pre-training phase of large language models where systems learn language representations from vast amounts of unlabeled text. The approach is particularly valuable when labeled data is scarce, expensive to obtain, or when the goal is to understand the underlying structure of complex datasets. Evaluation of unsupervised learning models presents unique challenges since there are no ground truth labels, requiring alternative metrics and validation approaches.
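
The scikit-learn example below illustrates the clustering case: k-means groups unlabeled points into clusters without any target outputs. The synthetic data and the choice of three clusters are assumptions supplied for illustration.

    # K-means clustering on unlabeled synthetic data.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)   # discovered group centers
    print(kmeans.labels_[:10])       # cluster assignments for the first points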

V


Value alignment refers to the challenge of ensuring that an AI system’s objectives and decision-making processes are aligned with human values, norms, and ethical principles. It is a subset of the broader alignment problem, focusing specifically on aligning machine behavior with complex, often implicit human preferences.

Achieving value alignment requires techniques from machine learning, human-computer interaction, and ethics, such as preference learning, inverse reinforcement learning, and participatory design. It is especially critical in advanced AI systems with high autonomy or open-ended goals, where misalignment can lead to harmful or unintended outcomes.

W


X


Y


Z

Zero-shot learning (ZSL) is a machine learning paradigm in which a model performs a task or recognizes classes it has never seen during training by leveraging auxiliary information such as semantic descriptions, natural language prompts, or knowledge transfer from related tasks.

ZSL enables generalization to novel categories without requiring labeled examples for each specific class. It is especially useful in scenarios where data collection is costly or impractical. In natural language processing, large language models (LLMs) often demonstrate zero-shot capabilities by responding to new instructions or queries without additional fine-tuning.

This approach relies on the model’s ability to understand relationships between known and unknown concepts, often using embedding spaces, ontologies, or pretrained knowledge representations.
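
The sketch below illustrates the embedding-space version of this idea: an input is assigned to whichever natural-language class description has the closest embedding. The embed function is a placeholder for a real pre-trained encoder, and the labels and texts are invented.

    # Zero-shot classification via similarity to class descriptions.
    import numpy as np

    def embed(text):
        # Placeholder: a real zero-shot system would use a pre-trained text encoder.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.normal(size=128)

    class_descriptions = {
        "sports": "an article about athletes, games, and competitions",
        "finance": "an article about markets, investing, and the economy",
    }

    def zero_shot_classify(text):
        v = embed(text)
        best_label, best_score = None, -2.0
        for label, description in class_descriptions.items():
            d = embed(description)
            score = float(np.dot(v, d) / (np.linalg.norm(v) * np.linalg.norm(d)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    print(zero_shot_classify("The striker scored twice in the final."))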