AWS AI Pracioner

Index: Ai Subsets > Machine Learning > Foundation Models > GenAI > Prompt Engineering > Responsible AI > Security, Compliance, Governance > Badrock > SageMaker

This post focuse in AWS AI. If you need more detail about cloud infra you can see the AWS Conteps post.

AI Subsets

AI > Machine Learning > Deep Learning > Generative AI

Artificial Intelligence (AI)

Artificial intelligence (AI) is technology that enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity and autonomy. [1]
Techniques:
- Computer Vision: based on the computer video analysis of images in real time. Whereas computer vision involves interpreting and understanding the content of images to make decisions, image processing focuses on enhancing and manipulating images for visual quality. CNNs are used for single image analysis and RNNs are used for video analysis. ResNet is a deep neural network architecture used mainly in computer vision tasks, such as image classification and object detection
- Deep Learning: uses multilayered neural networks, called deep neural networks, to simulate the complex decision-making power of the human brain. CNNs are a type of deep learning model
- Natural Language Processing (NLP): uses machine learning to enable computers to understand and communicate with human language

Machine Learning (ML)

Machine Learning creates models from training an algorithm to make predictions or decisions based on data.
The performance is improved when they are exposed to more data
Machine Learning models can be deterministic (e.g Decision Trees) or probabilistic (e.g Bayesian Networks) or a mix of both (e.g neural networks and random forests)
Use cases: spam detection, recommendation systems, predictive analytics

Deep Learning (DL)

Subset of ML
It utilizes artificial neural networks with multiple layers to model complex patterns in data. Deep neural networks can automatically discover representations needed for future detection
Use cases: image, speech recognition, NLP, autonomous vehicles
Involves using large datasets to adjust the weights and biases of a neural network through multiple iterations

Generative AI (GenAI)

GenAI: generating new content similar to the data they were trained on. They rely on the Transformer Architecture
Unlabeled Data > Foundation Model > used for general tasks (chatbot, text generation, etc)
Large Language Models (LLM): Type of AI designed to generate coherent human-like text (GPT-4)
Foundation Models: trained on a wide variety of input data (GPT-4o)
Generative Language Models: interact with the LLM by a prompt. The model generate new content. Non-deterministic.
Example of Models: GPT for text generation and DALL-E for image creation

Machine Learning

Types:

Supervised Learning (labeled) [1][2]: involves training models on labeled data to make predictions. It can be Classification, Regression (e.g: Linear regression - predicting a price; spam detection), Neural Network
Unsupervised Learning (groups without label): clustering, Association rule learning, Probability density, Anomaly detection, dymnesionality regression, dymnesionality reduction
Semi-supervised (mix) (e.g speech recognition, Document classification, Fraud Detection, Sentiment analysis)
Self-supervised (try by itself): create its own label - Predict and infer (e.g Gmail smart compose)
Reinforcement Learning (try fit - learn by success and failure): focuses on an agent learning optimal actions through interactions with the environment and feedback (robotics, game playing, and industrial automation)

Algorithms:

Decision Trees are highly interpretable models that provide a clear and straightforward visualization of the decision-making process (deterministic).
Logistic Regression is primarily designed for binary classification problems. It can be adapted for multiclass classification but it may not perform effectively with a large number of categories or a complex dataset.
Neural Networks involves multiple layers of neurons and nonlinear transformations.
Support Vector Machines (SVMs) are effective for classification tasks, especially in high-dimensional spaces. SVMs create a hyperplane to separate classes.
K-Means is an unsupervised learning algorithm used for clustering data points into groups
KNN is a supervised learning algorithm used for classifying data points based on their proximity to labeled examples

Process

Pipeline: Fetch -> Clean -> Prepare -> Train and tune model -> Evaluate model -> Deploy to prod

Data:

The training set is used to train the model
The validation set (optional) is used for periodically measure model performance as training is happening and also tune any hyperparameters of the model, selecting the best model during the training process
The test set is used for evaluating the final performance of the model on unseen data (how well the model generalizes)

Process: Collect data > [Pre-process data + Data Preparation] (repeat ETL operation until data is prepared) -> [Apply ML Algorithm to ETL Data + Potential Models] (repeat to find the best model) > Deploy selected model

Data Collection:

Gathering the necessary data from various sources (Fetch)

Data Preprocessing:

Cleaning and preparing the data for training
Handle missing values(filling or removing), Inconsistent data (standardize data types, units of measurement), Duplicate (removing or merging)
Feature Enginnering can help to transform insufficient raw data into a format that is easier for a ML model to learn from
- selecting, modifying, or creating features from raw data to improve the performance of machine learning models
- for structured data often involves tasks such as normalization and handling missing values, while for unstructured data, it involves tasks such as tokenization and vectorization
Techniques:
- Feature selection: selecting a subset of the most relevant features from the original dataset
- Feature Extraction: transforming the data into a new feature space
- Dimension Reduction: reduce the number of input variables by transforming them into a smaller subset of features (simplify the model, prevent overfitting, and improve performance)
- Category Enconding;
- Data augmentation: artificially increase the size and variability of the training dataset by creating modified versions of the existing data
Generating Data:
- Exploratory Data Analysis (EDA):
  - Purpose: Identify patterns, correlations, and anomalies in data before any model training or analysis; formulate potential hypothesis
  - Techniques: creating visual charts; summarizing the main features
  - This phase is crucial for understanding the dataset’s structure and characteristics.
- Correlation Matrix: quantify relationships between variables

Model Training:

Using the preprocessed data to train a machine learning algorithm, resulting in a trained model.
stage where the data is split into training and validation sets, and the model is fine-tuned to optimize its performance.
Parameters: used to represent relationship between the data
Hyperparameter[1][2]
- Allows adjust the settings that control the learning process of the model.
- Configured before training and they remain fixed during training; they can impact the speed and quality of the learning process;
- Common hyperparameters among algorithms: Learning Rate; Batch Size; Number of Epochs
- Tuning a Model: adjusting a machine learning model's configurations to improve its performance on a specific task
- Poor ML models may be caused by hyperparameters: the optimal values can be find by Grid Search (test all possible combination) or Random Search (random combination)

Model Evaluation:

stage where the model's performance is assessed
Assessing the performance of the trained model using a separate test set to ensure it generalizes well to new, unseen data.
Metrics used to evaluate the effectiveness of a classification system: accuracy, precision, recall, or F1 score.

Metrics

Metrics for classification Models: [1][2]

Accuracy; Precision; Recall; F1 score -> measure using Confusion Matrix
Confusion matrix: designed to evaluate the performance of classification models by displaying the number of true positives, true negatives, false positives, and false negatives (correctly or incorrectly classification).
Precision: Measures the accuracy of the positive predictions, calculated as the ratio of true positives to the sum of true positives and false positives. It measures the exact matches between the candidate text (generated by a machine) and the reference text (written by a human)
Recall (Sensitivity): Measures the ability of the classifier to identify all positive instances, calculated as the ratio of true positives to the sum of true positives and false negatives.
F1-Score: The harmonic mean of Precision and Recall, providing a single metric that balances both concerns.

Metrics of Regression Models:

Mean Absolute Error (MAE): measures the average magnitude of errors in a set of predictions (accuracy of a continuous variable's predictions)
Mean Square Error (MSE)
Root Mean Square Error (RMSE): calculating the square root of the average squared differences between predicted and actual values.
Rˆ2 (R Squared)

For both:

Average Response Time: measures how long it takes for the model to process input data and generate a prediction
Number of Training Sessions
Customer Feedbacks
Return on Investment (ROI)

AWS Managed AI/ML Services and Applications

AWS AI Services: Support real-time and batch. Pay by use

Vision:

Rekognition: images and videos using ML
- Use cases: Labeling, Content Moderation, Text Detection, Face Detection
- Rekognition Analysis: tracking people, analyzing Faces, Facial Emotions
- Rekognition Detection: objects, scenes, text, brands, activities, inappropriate content
- Rekognition Custom Moderation Adaptor to tailor the moderation process to your needs -> Providers training dataset with labelled images
- Custom Labels: Label your training images and upload them to Amazon Rekognition
Textract[1]:
- designed for extracting text, handwriting, and structured data from scanned documents (scanned forms, images, tables and grids).
- Data Process: Real-time Analysis (single doc in a synchronous fashion); Async Analysis (multiple docs in batch process)
- Form extraction: extract data from forms and documents with structured layouts
- Document analysis: extract text, tables, and other elements from documents. This feature provides a comprehensive analysis of the document, including identifying the layout and structure
- Key-value pairs: extract structured data such as key-value pairs from documents like invoices and receipts
- Table detection: identify and extract tables from documents

Language:

Comprehend [1][2] : Analyze Text Data
- Use Natural Language processing (NPL)
- It is designed to analyze unstructured text and identify entities (name, date, etc)
- Amazon Comprehend can classify documents based on the content
- Get insights by classifying data
- Break down text via: tokenization; Parts of Speech (PoS)
- Custom Classification providing a training dataset with labelled categories
- Custom Entity Recognition
- Data Process: Real-time analysis
- Amazon Comprehend Medical detects and returns useful information in unstructured clinical text. Comprehend Medical is HIPAA-eligible and can quickly identify protected health information (PHI)
- The real-time API provided by Amazon Comprehend is specifically designed for applications that require on-the-fly analysis (e.g live feedback monitoring)
Translate: Natural and accurate language translation

Speech:

Polly [1][2]: text-to-speech
- use cases: Audiobook
- Data Process: Real-time, bach
- Use deep learning
- used to deploy high-quality, natural-sounding human voices in dozens of languages
- Interactive voice response (IVR) system that dynamically adjusts speech output based on user inputs
- Speech Synthesis Markup Language (SSML) allows the developer to control various aspects of speech
Transcribe: Speech-to-text
- Data Process: real-time; batch
- Automized by Automatic Speech Recognition (ASR) Service
- Improving Accuracy: custom vocabulary
- Custom Language Models (for context)
- use cases: closed caption and subtitles; transcribe customer service calls
- Amazon Transcribe Medical is an automatic speech recognition (ASR) service that makes it easy for you to add medical speech-to-text capabilities to your voice-enabled applications

Chatbots

Lex: converts speech to text to build chat bots
- The essence of a Bot Conversation: Intents ; Slots
- Key integration: Lambda, Connect, Comprehend

Forecasting:

use historical data to predict future trends
Labelled training dataset
Uses statistical and machine learning algorithms to deliver highly accurate time-series forecasts.
Use case: Retail demand planning; Supply chain planning; Resource planning; Operational planning
Kendra: powerful search service
- NLP; Contextual relevance
- Data sources: MS SharePoint, Google Drive, S3, RDS
- Semantic search: It provides accurate and relevant search results from a variety of document types.

Recommendation [1][2]

Personalize: uses your data to generate product and content recommendations for your users
Recipes (algorithms of Personalize): USER_PERSONALIZATION (based on activities); PERSONALIZED_RANKING; PERSONALIZED_ACTIONS; POPULAR_ITEMS; USER_SEGMENTATION
Use cases: retail stores, media and entertainment

Others ML application

SageMaker
Bedrock

AWS chip:

AWS Trainium: AWS purpose-built for deep learning (DL) training of 100B+ parameter models.
AWS Inferentia: deliver high-performance inference at a low cost.
Accelerated Computing P type instances - powered by high-end GPUs like NVIDIA Tesla, are optimized for maximum computational throughput, particularly for machine learning and HPC tasks. However, they consume significant amounts of power and are not specifically designed with energy efficiency in mind
Accelerated Computing G type instances - designed for graphics-heavy applications like gaming, rendering, or video processing. While they offer high computational power for specific tasks, they are not specifically optimized for energy efficiency or low environmental impact
Compute Optimized C type instances - designed to maximize compute performance for applications such as web servers, gaming, and scientific modeling. While they provide excellent compute power, they are not optimized for energy efficiency

Foundation Models

Foundation Models:

Large, general-purpose pre-trained models
They can be adapted for various tasks
ChatGPT: trained model
A multimodal model can accept a mix of input types such as audio/text and create a mix of output types such as video/image

Large language models (LLM)[1][2]

It is a subset of foundation models that can understand and generate human language
Focused on language-based tasks (summarization, text generation, classification, open-ended conversation, and information extraction)
They are very large deep learning models that are pre-trained on vast amounts of data
It uses DL to analyze and predict word sequences
Examples: OpenAI's generative pre-trained transformer (GPT) models
Services for LLM: AWS Bedrock and Amazon SageMaker JumpStart

Data:

Training Data: Inputs variables (e.g image) and Target Variables (e.g label that identify the image)
Type of data: Structured data (Tabular) and Unstructured data (text, images, audio)

Goals of training:

Training data --> Model --> Trained Model
New Data --> Trained model --> Correct label (prediction/inference)
Goals of training for GenAI: input --> Trained Model --> Content Generation

Model Fit: How accurate do predictions (Probability) [1][2][3]

Underfitting occurs when a model cannot capture the underlying patterns in the data, resulting in poor performance on both the training data and new data. When underfit models experience high bias they give inaccurate results.
Overfitting happens when a model learns the training data too well, including noise and outliers, leading to excellent performance on the training data but poor generalization to new, unseen data. When overfit models experience high variance they give accurate results. Prevention: cross-validation, regularization (L1 and L2 - penalization), and pruning to simplify the model and improve its generalization. A model with unnecessarily high complexity (e.g too many parameters) is a common indicator of overfitting
It's necessary a balance

Design Considerations for Foundation Model Applications

Criteria used to select Foundation Model:

Modality: type of data a model is trained to handle (text, images, audios)
Latency: real-time, batch
Multilingual support
Complexity
Customization
Input, output length

Inference Parameter:

Temperature: regulates the creativity of the model's responses (0 - more deterministic | 1 - creative)
Top-P: represents the percentage of most likely candidates that the model considers for the next token.
Top-K: represents the number of most likely candidates that the model considers for the next token. helps introduce controlled diversity in the generated text while avoiding low-probability and nonsensical outputs
Stop sequences specify the sequences of characters that stop the model from generating further tokens.
Input/output length: how much information the model can process and generate
Length: limit the length of the response (Response length, Penalties, Stop sequences)

Retrieval-augmented Generation (RAG)[1][2]

It is the process of optimizing the output of a large language model (LLM)
External knowledge source (e.g database, documents)
RAG extends the capabilities of LLMs to specific domains without retrain the model
Combines retrieval generation

Vector Storage Solution

Foundation models process inputs and convert them into vectors embeddings, or mathematical representations of data
The embeddings capture the semantic meaning of inputs
Embeddings are how AI models represent data in a way that machines can understand
Vectors are the numerical array of embeddings
Vector databases store and manage embeddings
Vector databases uses vector seach algorithms to index and query vector embeddings based on their similarity

AWS Services:

Amazon OpenSearch: Combines vector and text search. Use case: search engines, recommendation systems
Amazon Aurora: Scalable, relational database capabilities. Use case: E-commerce, real-time recommendations
Amazon Neptune: Graph-based queries for embedding data. Use case: Knowledge graphs, social networks
Amazon Document DB: Schema-less storage for varied embeddings. Use case: Chatbots, personalization
Amazon RDS for PostgreSQL: PGVector extension for similarity searches. Use case: Multimedia search, AI applications.

Cost trad-offs for customization:

Pre-training: high costs, full controll over model behavior
Fine-tuning: moderate costs, balances control and efficiency
in-context learning: low costs, great for flexibility
RAG: cost-effective, uses external data for specialized tasks

Agents:

A way to extend the functionality of foundation models by enabling them to automate multi-step workflow
They follow predefined instructions, interact with data sources and generate outputs based on goals.
Ideal for tasks with multi-step automation
Integrate with RAG
Dynamically generate code
Orchestrate tasks via API calls

Training and Fine-tuning Foundation Models

Pre-training:

initial stage where the model learns from unstructured data
goal: understand patterns or generating coherent responses
usually done by large company like AWS
Continuous pre-trained is a process that allows LLM to learn new information while retaining what they've already learned

Fine-tuning:

Customizing pre-trained model with specific data (particular use cases)
Methods: Instruction Tuning, Transfer learning (allows a model to utilize the knowledge learned from one task or dataset to improve its performance on a new, but related task)
Preparing Data: Data curation; Data Governance; Data size and Representativeness; Data Labeling; Reinforcement Learning From Human Feedback (RLHF)

Evaluating Foundation Model Performance

Desired performance metrics: accuracy, fairness, usability

Methods:

Human Evaluation:
- access outputs based on specific criteria
- time-consuming and subjective
Benchmark Datasets
- prebuilt collections of labeled data used to test model performance against industry standards
- objective and require less administrator effort

Metrics: [1][2]

Accuracy is a broad metric typically used to evaluate classification tasks where the model's output is compared against the correct label.
ROUGE (Recall-oriented Understudy for Gisting Evaluation): measures the overlap between generated and reference texts (quality of text summaries based on exact word matches). It is testing for recall ability
BLEU (Bilingual Evaluation Understudy): primarily used for machine translation (based on exact word matches). How closely a generated translations matches a reference by comparing word sequence.
BERTScore: evaluates the quality of the text (based on the meaning of words in context) generator by leveraging embbeding. It used pre-trained Burt models or bi-directional encoder representations from transformers.

How well Foundation models meet business objectives:

productivity: high output quality with minimum human intervention
user engagement: how often and deeply users interact with the model
task engineering: how effective the model can complete specific tasks

GenAI

Tokens:

building blocks of language models.
Tokens are the individual units that generative AI modes work with
It’s a process of converting raw text into a sequence of tokens
Ex: Words, part of words
Context windows: number of tokens an LLM can take in when generating text

Chancking:

break larger inputs into smaller, manageable sections
helps manage larger datasets by breaking them down into smaller pieces

How AI models represent and understand data:

Embeddings: Embedding models are algorithms trained to encapsulate information into dense representations in a multi-dimensional space. Ex: BERT, Word2Vec (use static embeddings)
Vectors: Numerical arrays of embeddings that indicate where a value or object is located
Tokens + Embeddings: Input Text > Tokens ("Create", "an", "image") > Each token is converted into a vector embeddings ("0.56","0.3","0.87") > Model Processing

Different GenAI Models:

Foundation models
- large scale pre-trained models designeds to be adaptable across a wide range of tasks
- provide a robust starting point
- Expensive
- Examples: Titan (Amazon), Cloude (Anthropic), Stable Diffusion (Stability AI), Llama (Meta)
- These models undergo a multi-stage process:
  - Pre-trained on massive datasets to learn patterns, structures and relatioships between the data
  - Fine tunned for specific tasks
Multimodal models [1][2]
- extension of foundation models
- can process and understand multiple types of input data and create a mix of output types
- typically used for generating new content rather than interpreting and responding to queries
- handle and integrate multi types of data
- Chatbot (understand voice command and respond visually)
- Dall-E, Titan
Diffusion models
- generate high quality images
- Take a noise image and gradually refine it
- Steps:
  - Forward diffusion: propagating information from an initial source to other nodes or layers in the model, allowing for the flow of data in a specific direction.
  - Backward diffusion (reverse process): propagating information from the output or final layer back to the input or initial source, allowing for the flow of data in the opposite direction to refine and adjust the model
- work by corrupting data with noise through a forward diffusion process and then learning to reverse this process to denoise the data.
- use neural networks to predict and remove the noise step by step, ultimately generating new, structured data from random noise.
- Example: DALL-E, Adobe Firefly, Stable Diffusion (produces unique photorealistic images from text and image prompts)
Transformer-based LLMs
- LLMs that are transformer-based has a transformer architechture that basically transforms the input, words or tokens, to achieve a desired output, words, images or videos
- It uses self-attention: it allows the model to weigh the importance of different words in a sentence when encoding a particular word, identifing relationships and dependencies between words.
- The transformer-based generative model employs multiple encoder layers called attention heads to capture different types of relationships between words.
- The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities.
- This models play a key role in generative AI by enabling systems to process and generating human-Like text
- Examples:
  - GPT-4
  - BERT (Bidirectional Encoder Representations from Transformers): free foundation modal. It is specifically designed to capture the contextual meaning of words by looking at both the words that come before and after them (bidirectional context). BERT creates dynamic word embeddings that change depending on the surrounding text, allowing it to understand the different meanings of the same word in various contexts.

Foundation Model Lifecycle [Best Practices]

Data Selection: choose relevant and high quality data to train the model
Model Selection: chosen base on requirements
Pre-training: Epoch (complete iteration through a training dataset during model training); The number of epoch is a hyperparameter (underfitting, overfitting). Increasing the number of epochs allows the model to learn from the training data for a longer period. Multiple epochs are run until the accuracy of the model reaches an acceptable level, or when the error rate drops below an acceptable level.
Fine-tuning: where pretreined model is customized with task-specific data. Involves training an LLM on a smaller, specialized, labeled dataset
Evaluation: test the accuracy
Deployment: put the model in production
Feedback: get real word return

Advantages:

Adaptability: different types of tasks
Responsiveness: produce real-time response
Simplicity: frameworks and platforms for this

Disadvantages:

Hallucinations: the output seems correct but it’s not (reasons: overfitting, bias, high model complexity )
Lack of interpretability: complexity
Nondeterminism
Bias

Model Selection Factors:

Model types: Foundation models, transformer-based LLMs, diffusion model, multimodal models
Performance and Capabilities Requirements: latency, response time, throughput
Compliance: regulations
Constraints: hardware requirement, scalability, model size

Metrics:

Efficiency: how well perform task with minimal resources. How cost-effectively and quickly the AI model can be deployed, focusing on resource utilization and time to market. (Latency,Throughput, Resource Utilization). Often evaluated using benchmark datasets
Accuracy (percentage of correct prediction). Often evaluated using benchmark datasets
Conversion rate: percentage of users who take a desired action (purchases, sign-ups, or task completion ). Effectiveness of driving desired outcomes.
Revenue (average amount of money)
Customer lifetime value
Cross-domain performance: measure a model’s ability to perform well across various tasks or domain
Average Revenue Per User (ARPU): used to measure how much revenue is generated per user over a given period. ARPU is calculated by dividing the total revenue generated by the total number of users, providing insights into the average revenue generated by each user.
Scalability: measure how well a model can handle an increasing amount of data or workload. commonly evaluated using benchmark datasets

AWS Services:

SageMaker: key feature is JumpStart
Badrock: strong layer that supports everything above it
- solid foundation for building generative AI applications
- fully manage service that allows build and scale AI applications using foundation models
- access models from multiple providers
PartyRock: interactive environment where developers can experiment with and deploy generative AI models. Collaborative space for testing different models and configurations
Amazon Q: service that helps users generate visual insights from complex business data
- generative AI–powered assistant for accelerating software development and leveraging companies' internal data. Amazon Q generates code, tests, and debugs
- Amazon Q with QuickSight: allows non-technical users to query data using natural language. Natural Language interface, fast insights, scalability and security. QuickSight is specifically designed for creating interactive visualizations and dashboards for a wide range of data sources
- Amazon Q Business: dashboard generation, Executive summaries, data stories. The chat responses can be generated using model knowledge and enterprise data. To maintain controler over data and ensure security: Encryption and Permissions, Guardrails
- Amazon Q Developer is powered by Amazon Bedrock. It can get answers to AWS cost-related questions using natural language (use AWS Cost Explorer for it). It helps you understand and manage your cloud infrastructure on AWS

Generative Adversarial Network (GAN)

generating synthetic data that is statistically similar to real data.
evaluate and classify data as real or fake.
two primary parts: the generator and the discriminator. The discriminator has as fundamental role revolves around evaluating the authenticity of data.
Primary Role: Evaluating and Classifying Data as Real or Fake
Techniques: Generator (creating fake data) and Discriminator (neural network that evaluates and classifies data as either real (from the training set) or fake (generated by the generator)
Encoder and Coder are not standar components but can be integrated

Prompt Engineering

Concepts

Give instruction to AI
Clear, spefific, and contextually relevant
Techniques:
- Instructions – a task for the model to do (description, how the model should perform)
- Context – external information to guide the model
- Input data – the input for which you want a response
- Output Indicator – the output type or format
negative prompts, model latent space (model use to connect concepts)

Tecniques:

zero-shot prompting
few-shot prompt
chain-of-thought prompting: breaks down a complex question into smaller
prompt tempates

Benefits:

clear, consise prompts lead to high-quality outputs
experimentation uncovers new insights and discoveries
guardrails ensure safe and relevant answers
multiple comment improve depth and structure

Risks and limitations [1]

exposure: risk of exposing sensitive or confidential information to a model during training or inference
poisoning: intentional introduction of malicious or biased data into the training dataset of a model which leads to the model producing biased, offensive, or harmful outputs (intentionally or unintentionally)
hijacking: involves manipulating an AI system to serve malicious purposes or to misbehave in unintended ways.
jailbreaking: bypassing the built-in restrictions and safety measures of AI systems to unlock restricted functionalities or generate prohibited content.
Prompt Injection: influencing the outputs by embedding specific instructions within the prompts themselves
Prompt Leaking: unintentional disclosure or leakage of the prompts or inputs used within a model. It can expose protected data or other data used by the model, such as how the model works.

Prevention:

Creating a prompt template that specifically guides the LLM to detect and respond appropriately to potential attack patterns is an effective way to mitigate prompt engineering attacks. (Best Practices)
Validating (input formats, lengths, and types) and sanitizing (Cleans the input by removing or encoding harmful elements) user input before processing it in the model is a primary method for mitigating the risk of prompt injection attacks in generative AI models. (injection)

Responsible AI

Responsible AI is a set of principles that help guide the design, development, deployment and use of AI—building trust in AI solutions that have the potential to empower organizations and their stakeholders. It ensures that the ML algorithms are ethical, transparent, and trustworthy.

Dimensions:

Fairness and Bias Mitigation: Inclusivity; Diversity in data; Criteria for Curating Data (Balanced Representation, Multiple high-quality sources, Ethical Data Labeling)
Explainability (1): focuses on providing understandable reasons for the model’s predictions and behaviors to stakeholders. It goes a step further by providing insights into why a model made a specific prediction, especially when the model itself is complex and not inherently interpretable
Interpretability: understanding the internal mechanisms of a machine learning model. How easily a human can understand the reasoning behind a model’s predictions or decisions
Transperency
Robustness (adapt)
Veracity (reliable and truthful)
Controllability
Model selection and Environmental Susteinability
Safety
Privacy and Security

Trade-offs:

Bias vs Variance trade-off: challenge of balancing the error due to the model’s complexity (variance) and the error due to incorrect assumptions in the model (bias)
Controllability vs. Complexity: trade-off between the level of control a user has over the model’s behavior and the complexity of the model itself
Safety vs. Transparency: trade-off between ensuring the model’s predictions are safe and ethical while also being transparent and understandable to users
Interpretability vs. Performance: trade-off between the model’s ability to be easily interpreted and understood by humans versus its overall performance in terms of accuracy and efficiency.
Transparency vs interpretability vs Performance: High transparency = Hight Interpretability = Poor Performance

Poor Model:

Bias: difference between the predictive and actual values (level of error)
High bias (underfitting): the model doesn’t learn enough from the training set and performs poorly on both the training and test data
High Variance (overfitting): the model learns the data too well, including the noise. It does well on training set, but poorly on inseen data
Variance: extent to which the model’s predictions change when trained on different data

Types of Bias:

Measurement Bias: faulty data. it involves inaccuracies in data collection, such as faulty equipment or inconsistent measurement processes.
Sampling Bias: data is not representative of the population as a whole (train the model does not accurately reflect the diversity of the real-world population)
Confirmation Bias: try to confirm what you believe. it involves selectively searching for or interpreting information to confirm existing beliefs.
Observer Bias: collecting or labeling the data has their own subjective opinions of preference. it relates to human errors or subjectivity during data analysis or observation.

Risks:

Hallucination
prompt misuses
Intellectual property

Governance:

Policies, practices, and tools
Amazon Augmented AI (A21): human judgement, allowing for reviews and corrections of model predictions Amazon Augmented AI (Amazon A2I) is a service that makes it easy to build the workflows required for human review of ML predictions. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers
Amazon Badrock Guardrails evaluate the input and output of foundation models
Data residency is concerned with the physical location of data storage, whereas data retention defines the policies for how long data should be stored and maintained

Security, Compliance and Governance

Features:

Roles and Policies
Services and features (Encryption, Macie, AWS PrivateLink)
Data History: Origin, Source Citation, Lineage, Catalog, SageMaker Model Card
Secure data Engineering: Data Quality and Integrity, Data Access Control, Compliance (PII, NIST, HIPAA, GDPR)

Threat Detection

Training Data Poisoning, Misuse, Misconfiguration
GuardDuty, Inspector, Detective
Incident response: Preparation; Detection and Analysis; Containment, Eradication and Recovery; Post-incident

OWASP (nonprofit organization dedicated to researching application security)

top 10 for LLM Application security risk
Control Tower Fuardrail for Badrock AI

AWS Services:

WAF - AWS Web Application Firewall
AWS Shield (DDos)
AWS Cognito
AWS Artifact provides on-demand access to AWS’ compliance reports and online agreements.
AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources.
AWS Audit Manager helps you continuously audit your AWS usage to simplify how you assess risk and compliance with regulations and industry standards
Amazon Inspector: assesses applications for exposure, vulnerabilities, and deviations from best practices
AWS Security Services: AWS Config + Inspector + Detective + Audit Manager + Artifact + Trusted Advisor
Algorithm Accountability Laws

SOC (System and Organization Control)

Financial report
trust services
publuc summary

Data governance (1):

Primarily focuses on ensuring data quality, integrity, and security.
Implementing data validation and cleansing techniques

Bedrock

Amazon Badrock is a Fully-managed service
Charged for model inference and customization.
Model customization methods:
- Continued pre-training uses unlabeled data to pre-train a model, whereas, fine-tuning uses labeled data to train a model
- Involves further training and changing the weights of the model to enhance its performance.
Model evaluation
- preparing data, training models, selecting appropriate metrics, testing and analyzing results, ensuring fairness and bias detection, tuning performance, and continuous monitoring.
- helps you to incorporate Generative AI into your application by giving you the power to select the foundation model
Pay-per-use
Pricing [1][2]:
- On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
- Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.
Build Generative AI (Gen-AI) applications
Keep control of your data used to train the model
Unified APIs
Access to a wide range of Foundation Models (FM): Meta, amazon, anthropic, etc
Copy of the FM, available only to you, which you can further fine-tune with your own data
None of your data is used to train the FM
RAG, LLM Agents…
Security, Privacy, Governance and Responsible AI features
Knowledge Bases: supports popular databases for vector storage, including vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon). If you do not have an existing vector database, Amazon Bedrock creates an OpenSearch Serverless vector store for you.
OpenSearch supports full-text search, vector search, and advanced data indexing, which are essential for the Retrieval-Augmented Generation (RAG) framework.

Guardrails[1][2]

Control the interaction between users and Foundation Models (FMs)
Filter undesirable and harmful content
Detects and remove sensitive information (Personally Identifiable Information (PII)) in input prompts or model responses
Enhanced privacy
Reduce hallucinations
Guardrails for Amazon Bedrock enables you to implement safeguards for your generative AI applications based on your use cases and responsible AI policies.
You can use guardrails with text-based user inputs and model responses.

Agents: (1)(2)

AI agents act as intermediaries, translating user inputs or model outputs into actionable operations
fully managed capabilities that make it easier for developers to create generative AI-based applications that can complete complex tasks for a wide range of use cases and deliver up-to-date answers based on proprietary knowledge sources
Manage and carry out various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities
Task coordination: perform tasks in the correct order and ensure information is passed correctly between tasks
Agents are configured to perform specific pre-defined action groups
Integrate with other systems, services, databases and API to exchange data or initiate actions
Leverage RAG to retrieve information when necessary

Encryption:

Always encrypted in transit and at rest
can encrypt the data using an own keys.
can use AWS PrivateLink to establish private connectivity between your FMs and your Amazon Virtual Private Cloud (Amazon VPC) without exposing your traffic to the Internet.

Playground:

testing and fine-tuning prompts and parameters before using them in an application
provides a safe and controlled environment to experiment with different inputs and configurations
interactive environment

SageMaker

Concepts

General View:

Fully managed service for developers / data scientists to build ML models
Build > Train > Deploy
Support version control
Supports: supervised, unsupervised, reinforcement, deep learning
Comprehensive Toolbox for ML: IDE, One-stop shop for building, training and deploying ML models (Notebooks, Canvas, Datapreparation and visualization, collaboration tools)

Some features:

Built-in algorithms for common machine learning tasks
One-click deployment of models to scalable endpoints
Automatic model tuning using hyperparameter optimization

Prepare Data:

import: s3, Athena, Redshift
clean: identify missing values, deplicates, and outliers
feature engineer: create new features from existing data using built-in transformations
Visualize (EDA): visualize distributions, summary statistics, and relationships between variables

MLFlow

Open-source tool which helps ML teams manage the entire ML lifecycle
Manage ML Experiments: track, organize, view, analyze, and compare iterative ML experimentation.
Part of SageMaker Studio

Automatic Model Tuning (AMT)

Define the Object Metric
AMT automatically chooses hyperparameter ranges, search strategy, maximum runtime of a tuning job, and early stop condition

AutoML(Autopilot)

Automates the model selection and hyperparameter tuning process
Allows you to quickly generate a model with minimal manual interaction
Part of SageMaker Studio

Model Deployment and Inference (1)(2)

Deploy with one click, automatic scaling, no servers to manage
creates endpoints > provides access > users and Applications send data or receive predictions
Managed solution: reduced overhead
Inference types:
- Real-time (sync):
  - One prediction at a time.
  - ideal for inference workloads where you have real-time, interactive, low latency requirements
  - Use case: Fast, near-instant predictions for web/mobile apps; Chatbots
- Serverless
  - good choice for workloads with unpredictable traffic or sporadic requests
  - Tolerate more latency
  - automatically scales to accommodate varying loads
  - Used for workloads that have idle periods between traffic spikes and can tolerate cold starts
- Asynchronous
  - submit request and then check later the result
  - For large payload sizes up to 1GB
  - Long processing times
  - near real-time latency requirements
  - Use case: Images and videos analysis
- Batch (async)
  - Prediction for an entire dataset (multiple predictions)
  - Request and responses are in Amazon S3
  - Use case: Bulk processing for large datasets Concurrent processing; Processing large datasets

Tools

Sage Maker Studio:

Prepare: data Wrangger, processing, Data Sources, Feature store, Clarify
Build: Studio Notebooks, Algorithms, Autopilot, JumpStart
Train & Tune: OnClick Training, Experiments, Automatic Model Tuning, Debugger, Managed Spot Training
Deploy and Manage: One-click Deploy, Multi-model endpoint, Model Monitor, Pipelines

Responsible AI:

Amazon SageMaker Clarify[1][2]
- helps developers detect biases and explain the predictions made by machine learning models.
- Explain how machine learning (ML) models make predictions.
- Uses a model-agnostic feature attribution approach
- Produces partial dependence plots (PDPs) that show the marginal effect features have on the predicted outcome of a machine learning model
- Evaluate Foundation Models
- Evaluating human-factors such as friendliness or humor
- Leverage an AWS-managed team or bring your own employees
- Use built-in datasets or bring your own dataset
- Built-in metrics and algorithms
- Part of SageMaker Studio
- Ability to detect and explain biases in your datasets and models
- Measure bias using statistical metrics
- Specify input features and bias will be automatically detected
- Use Shapley values to explain individual predictions and PDP to understand the model's behavior at a dataset level
SageMaker Ground Truth:
- SageMaker Ground Truth enables the creation of high-quality labeled datasets by incorporating human feedback in the labeling process, which can be used to improve reinforcement learning models
- RLHF, humans for model grading and data labeling
- Data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.
- you can use workers, a vendor company, or an internal, private workforce along with machine learning to enable you to create a labeled dataset. You can use the labeled dataset output from Ground Truth to train your models. You can also use the output as a training dataset for an Amazon SageMaker model.
- SageMaker GroundTruth Plus: fully managed data labeling service. It uses a combination of human labelers and machine learning-assisted labeling to ensure accuracy and consistency in the labels.
SageMaker Model Cards: transparency and information about the intended use, limitations, and potential impacts of AWS AI services

Governance:

SageMaker Model Cards: document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting. Data History.
SageMaker Model Dashboard: Centralized portal where you can view, search, and explore all of your models
SageMaker Role Manager
SageMaker Model Monitor
- Monitor the quality of your model in production: continuous or on-schedule
- Alerts for deviations in the model quality: fix data & retrain model
- Support Responsible AI

SageMaker Mechanical Turk:

provides a marketplace for outsourcing various tasks to a distributed workforce
provides an on-demand, scalable, human workforce to complete jobs that humans can do better than computers
formalizes job offers to the thousands of Workers willing to do piecemeal work at their convenience.

SageMaker JumpStart

ML model hub & pre-built ML solutions
ML Hub to find pre-trained Foundation Model (FM), computer vision models, or natural language processing models
Large collection of models from Hugging Face, Databricks, Meta, Stability AI…
Models can be fully customized for your data and use-case
Models are deployed on SageMaker directly
Pre-built ML solutions for demand forecasting, credit rate prediction, fraud detection and computer vision
You can evaluate, compare, and select Foundation Models quickly based on pre-defined quality and responsibility metrics

SageMaker Canvas

offers a no-code interface that can be used to create highly accurate machine learning models.
Build ML models using a visual interface
Access to ready-to-use models from Bedrock or JumpStart
Build your own custom model using AutoML powered by SageMaker Autopilot
Part of SageMaker Studio
Leverage Data Wrangler for data preparation

SageMaker Data Wrangler

reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes
supports tabular, time-series, and image data, offering 300+ pre-configured data transformations to prepare these different data modalities.
Designg for the first part of the ML process (Generating Data)
Messy data -> Data Wrangling -> Clean, transformed, and organized data
simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface

SageMaker Feature Store

fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models
Features are inputs to ML models used during training and inference
centralized hub for storing and retrieve ML features
Ingests features from a variety of sources
Define the transformation of data
Can publish directly from SageMaker Data Wrangler into SageMaker Feature Store
Features are discoverable within SageMaker Studio
Fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models.

SageMaker Model Registry

Centralized repository allows you to track, manage, and version ML models
Catalog models, manage model versions, associate metadata with a model
Manage approval status of a model, automate model deployment, share models

SageMaker Pipelines:

workflow that automates the process of building, training, and deploying a ML model (CI/CD)
Steps: Processing > Training > Tuning > AutoML > Model > ClarifyCheck >QualityCheck

AI Subsets

Machine Learning

Process

Metrics

AWS Managed AI/ML Services and Applications

Foundation Models

Design Considerations for Foundation Model Applications

Training and Fine-tuning Foundation Models

Evaluating Foundation Model Performance

GenAI

Prompt Engineering

Responsible AI

Security, Compliance and Governance

Bedrock

SageMaker

Concepts

Tools

References