AWS AI Pracioner
Index: Ai Subsets > Machine Learning > Foundation Models > GenAI > Prompt Engineering > Responsible AI > Security, Compliance, Governance > Badrock > SageMaker
This post focuse in AWS AI. If you need more detail about cloud infra you can see the AWS Conteps post.

AI Subsets
Artificial Intelligence (AI)
- Artificial intelligence (AI) is technology that enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity and autonomy. [1]
- Techniques:
- Computer Vision: based on the computer video analysis of images in real time. Whereas computer vision involves interpreting and understanding the content of images to make decisions, image processing focuses on enhancing and manipulating images for visual quality. CNNs are used for single image analysis and RNNs are used for video analysis. ResNet is a deep neural network architecture used mainly in computer vision tasks, such as image classification and object detection
- Deep Learning: uses multilayered neural networks, called deep neural networks, to simulate the complex decision-making power of the human brain. CNNs are a type of deep learning model
- Natural Language Processing (NLP): uses machine learning to enable computers to understand and communicate with human language
Machine Learning (ML)
- Machine Learning creates models from training an algorithm to make predictions or decisions based on data.
- The performance is improved when they are exposed to more data
- Machine Learning models can be deterministic (e.g Decision Trees) or probabilistic (e.g Bayesian Networks) or a mix of both (e.g neural networks and random forests)
- Use cases: spam detection, recommendation systems, predictive analytics
Deep Learning (DL)
- Subset of ML
- It utilizes artificial neural networks with multiple layers to model complex patterns in data. Deep neural networks can automatically discover representations needed for future detection
- Use cases: image, speech recognition, NLP, autonomous vehicles
- Involves using large datasets to adjust the weights and biases of a neural network through multiple iterations
Generative AI (GenAI)
- GenAI: generating new content similar to the data they were trained on. They rely on the Transformer Architecture
- Unlabeled Data > Foundation Model > used for general tasks (chatbot, text generation, etc)
- Large Language Models (LLM): Type of AI designed to generate coherent human-like text (GPT-4)
- Foundation Models: trained on a wide variety of input data (GPT-4o)
- Generative Language Models: interact with the LLM by a prompt. The model generate new content. Non-deterministic.
- Example of Models: GPT for text generation and DALL-E for image creation
Machine Learning
Types:
- Supervised Learning (labeled) [1][2]: involves training models on labeled data to make predictions. It can be Classification, Regression (e.g: Linear regression - predicting a price; spam detection), Neural Network
- Unsupervised Learning (groups without label): clustering, Association rule learning, Probability density, Anomaly detection, dymnesionality regression, dymnesionality reduction
- Semi-supervised (mix) (e.g speech recognition, Document classification, Fraud Detection, Sentiment analysis)
- Self-supervised (try by itself): create its own label - Predict and infer (e.g Gmail smart compose)
- Reinforcement Learning (try fit - learn by success and failure): focuses on an agent learning optimal actions through interactions with the environment and feedback (robotics, game playing, and industrial automation)
- Decision Trees are highly interpretable models that provide a clear and straightforward visualization of the decision-making process (deterministic).
- Logistic Regression is primarily designed for binary classification problems. It can be adapted for multiclass classification but it may not perform effectively with a large number of categories or a complex dataset.
- Neural Networks involves multiple layers of neurons and nonlinear transformations.
- Support Vector Machines (SVMs) are effective for classification tasks, especially in high-dimensional spaces. SVMs create a hyperplane to separate classes.
- K-Means is an unsupervised learning algorithm used for clustering data points into groups
- KNN is a supervised learning algorithm used for classifying data points based on their proximity to labeled examples
Process
Pipeline: Fetch -> Clean -> Prepare -> Train and tune model -> Evaluate model -> Deploy to prod
- The training set is used to train the model
- The validation set (optional) is used for periodically measure model performance as training is happening and also tune any hyperparameters of the model, selecting the best model during the training process
- The test set is used for evaluating the final performance of the model on unseen data (how well the model generalizes)
Process: Collect data > [Pre-process data + Data Preparation] (repeat ETL operation until data is prepared) -> [Apply ML Algorithm to ETL Data + Potential Models] (repeat to find the best model) > Deploy selected model
Data Collection:
- Gathering the necessary data from various sources (Fetch)
Data Preprocessing:
- Cleaning and preparing the data for training
- Handle missing values(filling or removing), Inconsistent data (standardize data types, units of measurement), Duplicate (removing or merging)
- Feature Enginnering can help to transform insufficient raw data into a format that is easier for a ML model to learn from
- selecting, modifying, or creating features from raw data to improve the performance of machine learning models
- for structured data often involves tasks such as normalization and handling missing values, while for unstructured data, it involves tasks such as tokenization and vectorization
- Techniques:
- Feature selection: selecting a subset of the most relevant features from the original dataset
- Feature Extraction: transforming the data into a new feature space
- Dimension Reduction: reduce the number of input variables by transforming them into a smaller subset of features (simplify the model, prevent overfitting, and improve performance)
- Category Enconding;
- Data augmentation: artificially increase the size and variability of the training dataset by creating modified versions of the existing data
- Generating Data:
- Exploratory Data Analysis (EDA):
- Purpose: Identify patterns, correlations, and anomalies in data before any model training or analysis; formulate potential hypothesis
- Techniques: creating visual charts; summarizing the main features
- This phase is crucial for understanding the dataset’s structure and characteristics.
- Correlation Matrix: quantify relationships between variables
- Exploratory Data Analysis (EDA):
Model Training:
- Using the preprocessed data to train a machine learning algorithm, resulting in a trained model.
- stage where the data is split into training and validation sets, and the model is fine-tuned to optimize its performance.
- Parameters: used to represent relationship between the data
- Hyperparameter[1][2]
- Allows adjust the settings that control the learning process of the model.
- Configured before training and they remain fixed during training; they can impact the speed and quality of the learning process;
- Common hyperparameters among algorithms: Learning Rate; Batch Size; Number of Epochs
- Tuning a Model: adjusting a machine learning model's configurations to improve its performance on a specific task
- Poor ML models may be caused by hyperparameters: the optimal values can be find by Grid Search (test all possible combination) or Random Search (random combination)
Model Evaluation:
- stage where the model's performance is assessed
- Assessing the performance of the trained model using a separate test set to ensure it generalizes well to new, unseen data.
- Metrics used to evaluate the effectiveness of a classification system: accuracy, precision, recall, or F1 score.
Metrics
Metrics for classification Models: [1][2]
- Accuracy; Precision; Recall; F1 score -> measure using Confusion Matrix
- Confusion matrix: designed to evaluate the performance of classification models by displaying the number of true positives, true negatives, false positives, and false negatives (correctly or incorrectly classification).
- Precision: Measures the accuracy of the positive predictions, calculated as the ratio of true positives to the sum of true positives and false positives. It measures the exact matches between the candidate text (generated by a machine) and the reference text (written by a human)
- Recall (Sensitivity): Measures the ability of the classifier to identify all positive instances, calculated as the ratio of true positives to the sum of true positives and false negatives.
- F1-Score: The harmonic mean of Precision and Recall, providing a single metric that balances both concerns.
- Mean Absolute Error (MAE): measures the average magnitude of errors in a set of predictions (accuracy of a continuous variable's predictions)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE): calculating the square root of the average squared differences between predicted and actual values.
- Rˆ2 (R Squared)
For both:
- Average Response Time: measures how long it takes for the model to process input data and generate a prediction
- Number of Training Sessions
- Customer Feedbacks
- Return on Investment (ROI)
AWS Managed AI/ML Services and Applications
AWS AI Services: Support real-time and batch. Pay by use
Vision:
- Rekognition: images and videos using ML
- Use cases: Labeling, Content Moderation, Text Detection, Face Detection
- Rekognition Analysis: tracking people, analyzing Faces, Facial Emotions
- Rekognition Detection: objects, scenes, text, brands, activities, inappropriate content
- Rekognition Custom Moderation Adaptor to tailor the moderation process to your needs -> Providers training dataset with labelled images
- Custom Labels: Label your training images and upload them to Amazon Rekognition
- Textract[1]:
- designed for extracting text, handwriting, and structured data from scanned documents (scanned forms, images, tables and grids).
- Data Process: Real-time Analysis (single doc in a synchronous fashion); Async Analysis (multiple docs in batch process)
- Form extraction: extract data from forms and documents with structured layouts
- Document analysis: extract text, tables, and other elements from documents. This feature provides a comprehensive analysis of the document, including identifying the layout and structure
- Key-value pairs: extract structured data such as key-value pairs from documents like invoices and receipts
- Table detection: identify and extract tables from documents
Language:
- Comprehend [1][2] : Analyze Text Data
- Use Natural Language processing (NPL)
- It is designed to analyze unstructured text and identify entities (name, date, etc)
- Amazon Comprehend can classify documents based on the content
- Get insights by classifying data
- Break down text via: tokenization; Parts of Speech (PoS)
- Custom Classification providing a training dataset with labelled categories
- Custom Entity Recognition
- Data Process: Real-time analysis
- Amazon Comprehend Medical detects and returns useful information in unstructured clinical text. Comprehend Medical is HIPAA-eligible and can quickly identify protected health information (PHI)
- The real-time API provided by Amazon Comprehend is specifically designed for applications that require on-the-fly analysis (e.g live feedback monitoring)
- Translate: Natural and accurate language translation
Speech:
- Polly [1][2]: text-to-speech
- use cases: Audiobook
- Data Process: Real-time, bach
- Use deep learning
- used to deploy high-quality, natural-sounding human voices in dozens of languages
- Interactive voice response (IVR) system that dynamically adjusts speech output based on user inputs
- Speech Synthesis Markup Language (SSML) allows the developer to control various aspects of speech
- Transcribe: Speech-to-text
- Data Process: real-time; batch
- Automized by Automatic Speech Recognition (ASR) Service
- Improving Accuracy: custom vocabulary
- Custom Language Models (for context)
- use cases: closed caption and subtitles; transcribe customer service calls
- Amazon Transcribe Medical is an automatic speech recognition (ASR) service that makes it easy for you to add medical speech-to-text capabilities to your voice-enabled applications
Chatbots
- Lex: converts speech to text to build chat bots
- The essence of a Bot Conversation: Intents ; Slots
- Key integration: Lambda, Connect, Comprehend
- use historical data to predict future trends
- Labelled training dataset
- Uses statistical and machine learning algorithms to deliver highly accurate time-series forecasts.
- Use case: Retail demand planning; Supply chain planning; Resource planning; Operational planning
- Kendra: powerful search service
- NLP; Contextual relevance
- Data sources: MS SharePoint, Google Drive, S3, RDS
- Semantic search: It provides accurate and relevant search results from a variety of document types.
- Personalize: uses your data to generate product and content recommendations for your users
- Recipes (algorithms of Personalize): USER_PERSONALIZATION (based on activities); PERSONALIZED_RANKING; PERSONALIZED_ACTIONS; POPULAR_ITEMS; USER_SEGMENTATION
- Use cases: retail stores, media and entertainment
Others ML application
- SageMaker
- Bedrock
AWS chip:
- AWS Trainium: AWS purpose-built for deep learning (DL) training of 100B+ parameter models.
- AWS Inferentia: deliver high-performance inference at a low cost.
- Accelerated Computing P type instances - powered by high-end GPUs like NVIDIA Tesla, are optimized for maximum computational throughput, particularly for machine learning and HPC tasks. However, they consume significant amounts of power and are not specifically designed with energy efficiency in mind
- Accelerated Computing G type instances - designed for graphics-heavy applications like gaming, rendering, or video processing. While they offer high computational power for specific tasks, they are not specifically optimized for energy efficiency or low environmental impact
- Compute Optimized C type instances - designed to maximize compute performance for applications such as web servers, gaming, and scientific modeling. While they provide excellent compute power, they are not optimized for energy efficiency
Foundation Models
- Large, general-purpose pre-trained models
- They can be adapted for various tasks
- ChatGPT: trained model
- A multimodal model can accept a mix of input types such as audio/text and create a mix of output types such as video/image
Large language models (LLM)[1][2]
- It is a subset of foundation models that can understand and generate human language
- Focused on language-based tasks (summarization, text generation, classification, open-ended conversation, and information extraction)
- They are very large deep learning models that are pre-trained on vast amounts of data
- It uses DL to analyze and predict word sequences
- Examples: OpenAI's generative pre-trained transformer (GPT) models
- Services for LLM: AWS Bedrock and Amazon SageMaker JumpStart
Data:
- Training Data: Inputs variables (e.g image) and Target Variables (e.g label that identify the image)
- Type of data: Structured data (Tabular) and Unstructured data (text, images, audio)
Goals of training:
- Training data --> Model --> Trained Model
- New Data --> Trained model --> Correct label (prediction/inference)
- Goals of training for GenAI: input --> Trained Model --> Content Generation
Model Fit: How accurate do predictions (Probability) [1][2][3]
- Underfitting occurs when a model cannot capture the underlying patterns in the data, resulting in poor performance on both the training data and new data. When underfit models experience high bias they give inaccurate results.
- Overfitting happens when a model learns the training data too well, including noise and outliers, leading to excellent performance on the training data but poor generalization to new, unseen data. When overfit models experience high variance they give accurate results. Prevention: cross-validation, regularization (L1 and L2 - penalization), and pruning to simplify the model and improve its generalization. A model with unnecessarily high complexity (e.g too many parameters) is a common indicator of overfitting
- It's necessary a balance
Design Considerations for Foundation Model Applications
Criteria used to select Foundation Model:
- Modality: type of data a model is trained to handle (text, images, audios)
- Latency: real-time, batch
- Multilingual support
- Complexity
- Customization
- Input, output length
- Temperature: regulates the creativity of the model's responses (0 - more deterministic | 1 - creative)
- Top-P: represents the percentage of most likely candidates that the model considers for the next token.
- Top-K: represents the number of most likely candidates that the model considers for the next token. helps introduce controlled diversity in the generated text while avoiding low-probability and nonsensical outputs
- Stop sequences specify the sequences of characters that stop the model from generating further tokens.
- Input/output length: how much information the model can process and generate
- Length: limit the length of the response (Response length, Penalties, Stop sequences)
Retrieval-augmented Generation (RAG)[1][2]
- It is the process of optimizing the output of a large language model (LLM)
- External knowledge source (e.g database, documents)
- RAG extends the capabilities of LLMs to specific domains without retrain the model
- Combines retrieval generation
- Foundation models process inputs and convert them into vectors embeddings, or mathematical representations of data
- The embeddings capture the semantic meaning of inputs
- Embeddings are how AI models represent data in a way that machines can understand
- Vectors are the numerical array of embeddings
- Vector databases store and manage embeddings
- Vector databases uses vector seach algorithms to index and query vector embeddings based on their similarity
AWS Services:
- Amazon OpenSearch: Combines vector and text search. Use case: search engines, recommendation systems
- Amazon Aurora: Scalable, relational database capabilities. Use case: E-commerce, real-time recommendations
- Amazon Neptune: Graph-based queries for embedding data. Use case: Knowledge graphs, social networks
- Amazon Document DB: Schema-less storage for varied embeddings. Use case: Chatbots, personalization
- Amazon RDS for PostgreSQL: PGVector extension for similarity searches. Use case: Multimedia search, AI applications.
Cost trad-offs for customization:
- Pre-training: high costs, full controll over model behavior
- Fine-tuning: moderate costs, balances control and efficiency
- in-context learning: low costs, great for flexibility
- RAG: cost-effective, uses external data for specialized tasks
Agents:
- A way to extend the functionality of foundation models by enabling them to automate multi-step workflow
- They follow predefined instructions, interact with data sources and generate outputs based on goals.
- Ideal for tasks with multi-step automation
- Integrate with RAG
- Dynamically generate code
- Orchestrate tasks via API calls
Training and Fine-tuning Foundation Models
Pre-training:
- initial stage where the model learns from unstructured data
- goal: understand patterns or generating coherent responses
- usually done by large company like AWS
- Continuous pre-trained is a process that allows LLM to learn new information while retaining what they've already learned
- Customizing pre-trained model with specific data (particular use cases)
- Methods: Instruction Tuning, Transfer learning (allows a model to utilize the knowledge learned from one task or dataset to improve its performance on a new, but related task)
- Preparing Data: Data curation; Data Governance; Data size and Representativeness; Data Labeling; Reinforcement Learning From Human Feedback (RLHF)
Evaluating Foundation Model Performance
Desired performance metrics: accuracy, fairness, usability
Methods:
- Human Evaluation:
- access outputs based on specific criteria
- time-consuming and subjective
- Benchmark Datasets
- prebuilt collections of labeled data used to test model performance against industry standards
- objective and require less administrator effort
- Accuracy is a broad metric typically used to evaluate classification tasks where the model's output is compared against the correct label.
- ROUGE (Recall-oriented Understudy for Gisting Evaluation): measures the overlap between generated and reference texts (quality of text summaries based on exact word matches). It is testing for recall ability
- BLEU (Bilingual Evaluation Understudy): primarily used for machine translation (based on exact word matches). How closely a generated translations matches a reference by comparing word sequence.
- BERTScore: evaluates the quality of the text (based on the meaning of words in context) generator by leveraging embbeding. It used pre-trained Burt models or bi-directional encoder representations from transformers.
How well Foundation models meet business objectives:
- productivity: high output quality with minimum human intervention
- user engagement: how often and deeply users interact with the model
- task engineering: how effective the model can complete specific tasks
GenAI
- building blocks of language models.
- Tokens are the individual units that generative AI modes work with
- It’s a process of converting raw text into a sequence of tokens
- Ex: Words, part of words
- Context windows: number of tokens an LLM can take in when generating text
Chancking:
- break larger inputs into smaller, manageable sections
- helps manage larger datasets by breaking them down into smaller pieces
How AI models represent and understand data:
- Embeddings: Embedding models are algorithms trained to encapsulate information into dense representations in a multi-dimensional space. Ex: BERT, Word2Vec (use static embeddings)
- Vectors: Numerical arrays of embeddings that indicate where a value or object is located
- Tokens + Embeddings: Input Text > Tokens ("Create", "an", "image") > Each token is converted into a vector embeddings ("0.56","0.3","0.87") > Model Processing
Different GenAI Models:
- Foundation models
- large scale pre-trained models designeds to be adaptable across a wide range of tasks
- provide a robust starting point
- Expensive
- Examples: Titan (Amazon), Cloude (Anthropic), Stable Diffusion (Stability AI), Llama (Meta)
- These models undergo a multi-stage process:
- Pre-trained on massive datasets to learn patterns, structures and relatioships between the data
- Fine tunned for specific tasks
- Multimodal models [1][2]
- extension of foundation models
- can process and understand multiple types of input data and create a mix of output types
- typically used for generating new content rather than interpreting and responding to queries
- handle and integrate multi types of data
- Chatbot (understand voice command and respond visually)
- Dall-E, Titan
- Diffusion models
- generate high quality images
- Take a noise image and gradually refine it
- Steps:
- Forward diffusion: propagating information from an initial source to other nodes or layers in the model, allowing for the flow of data in a specific direction.
- Backward diffusion (reverse process): propagating information from the output or final layer back to the input or initial source, allowing for the flow of data in the opposite direction to refine and adjust the model
- work by corrupting data with noise through a forward diffusion process and then learning to reverse this process to denoise the data.
- use neural networks to predict and remove the noise step by step, ultimately generating new, structured data from random noise.
- Example: DALL-E, Adobe Firefly, Stable Diffusion (produces unique photorealistic images from text and image prompts)
- Transformer-based LLMs
- LLMs that are transformer-based has a transformer architechture that basically transforms the input, words or tokens, to achieve a desired output, words, images or videos
- It uses self-attention: it allows the model to weigh the importance of different words in a sentence when encoding a particular word, identifing relationships and dependencies between words.
- The transformer-based generative model employs multiple encoder layers called attention heads to capture different types of relationships between words.
- The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities.
- This models play a key role in generative AI by enabling systems to process and generating human-Like text
- Examples:
- GPT-4
- BERT (Bidirectional Encoder Representations from Transformers): free foundation modal. It is specifically designed to capture the contextual meaning of words by looking at both the words that come before and after them (bidirectional context). BERT creates dynamic word embeddings that change depending on the surrounding text, allowing it to understand the different meanings of the same word in various contexts.
Foundation Model Lifecycle [Best Practices]
- Data Selection: choose relevant and high quality data to train the model
- Model Selection: chosen base on requirements
- Pre-training: Epoch (complete iteration through a training dataset during model training); The number of epoch is a hyperparameter (underfitting, overfitting). Increasing the number of epochs allows the model to learn from the training data for a longer period. Multiple epochs are run until the accuracy of the model reaches an acceptable level, or when the error rate drops below an acceptable level.
- Fine-tuning: where pretreined model is customized with task-specific data. Involves training an LLM on a smaller, specialized, labeled dataset
- Evaluation: test the accuracy
- Deployment: put the model in production
- Feedback: get real word return
Advantages:
- Adaptability: different types of tasks
- Responsiveness: produce real-time response
- Simplicity: frameworks and platforms for this
Disadvantages:
- Hallucinations: the output seems correct but it’s not (reasons: overfitting, bias, high model complexity )
- Lack of interpretability: complexity
- Nondeterminism
- Bias
Model Selection Factors:
- Model types: Foundation models, transformer-based LLMs, diffusion model, multimodal models
- Performance and Capabilities Requirements: latency, response time, throughput
- Compliance: regulations
- Constraints: hardware requirement, scalability, model size
Metrics:
- Efficiency: how well perform task with minimal resources. How cost-effectively and quickly the AI model can be deployed, focusing on resource utilization and time to market. (Latency,Throughput, Resource Utilization). Often evaluated using benchmark datasets
- Accuracy (percentage of correct prediction). Often evaluated using benchmark datasets
- Conversion rate: percentage of users who take a desired action (purchases, sign-ups, or task completion ). Effectiveness of driving desired outcomes.
- Revenue (average amount of money)
- Customer lifetime value
- Cross-domain performance: measure a model’s ability to perform well across various tasks or domain
- Average Revenue Per User (ARPU): used to measure how much revenue is generated per user over a given period. ARPU is calculated by dividing the total revenue generated by the total number of users, providing insights into the average revenue generated by each user.
- Scalability: measure how well a model can handle an increasing amount of data or workload. commonly evaluated using benchmark datasets
AWS Services:
- SageMaker: key feature is JumpStart
- Badrock: strong layer that supports everything above it
- solid foundation for building generative AI applications
- fully manage service that allows build and scale AI applications using foundation models
- access models from multiple providers
- PartyRock: interactive environment where developers can experiment with and deploy generative AI models. Collaborative space for testing different models and configurations
- Amazon Q: service that helps users generate visual insights from complex business data
- generative AI–powered assistant for accelerating software development and leveraging companies' internal data. Amazon Q generates code, tests, and debugs
- Amazon Q with QuickSight: allows non-technical users to query data using natural language. Natural Language interface, fast insights, scalability and security. QuickSight is specifically designed for creating interactive visualizations and dashboards for a wide range of data sources
- Amazon Q Business: dashboard generation, Executive summaries, data stories. The chat responses can be generated using model knowledge and enterprise data. To maintain controler over data and ensure security: Encryption and Permissions, Guardrails
- Amazon Q Developer is powered by Amazon Bedrock. It can get answers to AWS cost-related questions using natural language (use AWS Cost Explorer for it). It helps you understand and manage your cloud infrastructure on AWS
Generative Adversarial Network (GAN)
- generating synthetic data that is statistically similar to real data.
- evaluate and classify data as real or fake.
- two primary parts: the generator and the discriminator. The discriminator has as fundamental role revolves around evaluating the authenticity of data.
- Primary Role: Evaluating and Classifying Data as Real or Fake
- Techniques: Generator (creating fake data) and Discriminator (neural network that evaluates and classifies data as either real (from the training set) or fake (generated by the generator)
- Encoder and Coder are not standar components but can be integrated
Prompt Engineering
- Give instruction to AI
- Clear, spefific, and contextually relevant
- Techniques:
- Instructions – a task for the model to do (description, how the model should perform)
- Context – external information to guide the model
- Input data – the input for which you want a response
- Output Indicator – the output type or format
- negative prompts, model latent space (model use to connect concepts)
Tecniques:
- zero-shot prompting
- few-shot prompt
- chain-of-thought prompting: breaks down a complex question into smaller
- prompt tempates
Benefits:
- clear, consise prompts lead to high-quality outputs
- experimentation uncovers new insights and discoveries
- guardrails ensure safe and relevant answers
- multiple comment improve depth and structure
Risks and limitations [1]
- exposure: risk of exposing sensitive or confidential information to a model during training or inference
- poisoning: intentional introduction of malicious or biased data into the training dataset of a model which leads to the model producing biased, offensive, or harmful outputs (intentionally or unintentionally)
- hijacking: involves manipulating an AI system to serve malicious purposes or to misbehave in unintended ways.
- jailbreaking: bypassing the built-in restrictions and safety measures of AI systems to unlock restricted functionalities or generate prohibited content.
- Prompt Injection: influencing the outputs by embedding specific instructions within the prompts themselves
- Prompt Leaking: unintentional disclosure or leakage of the prompts or inputs used within a model. It can expose protected data or other data used by the model, such as how the model works.
Prevention:
- Creating a prompt template that specifically guides the LLM to detect and respond appropriately to potential attack patterns is an effective way to mitigate prompt engineering attacks. (Best Practices)
- Validating (input formats, lengths, and types) and sanitizing (Cleans the input by removing or encoding harmful elements) user input before processing it in the model is a primary method for mitigating the risk of prompt injection attacks in generative AI models. (injection)
Responsible AI
Responsible AI is a set of principles that help guide the design, development, deployment and use of AI—building trust in AI solutions that have the potential to empower organizations and their stakeholders. It ensures that the ML algorithms are ethical, transparent, and trustworthy.
Dimensions:
- Fairness and Bias Mitigation: Inclusivity; Diversity in data; Criteria for Curating Data (Balanced Representation, Multiple high-quality sources, Ethical Data Labeling)
- Explainability (1): focuses on providing understandable reasons for the model’s predictions and behaviors to stakeholders. It goes a step further by providing insights into why a model made a specific prediction, especially when the model itself is complex and not inherently interpretable
- Interpretability: understanding the internal mechanisms of a machine learning model. How easily a human can understand the reasoning behind a model’s predictions or decisions
- Transperency
- Robustness (adapt)
- Veracity (reliable and truthful)
- Controllability
- Model selection and Environmental Susteinability
- Safety
- Privacy and Security
Trade-offs:
- Bias vs Variance trade-off: challenge of balancing the error due to the model’s complexity (variance) and the error due to incorrect assumptions in the model (bias)
- Controllability vs. Complexity: trade-off between the level of control a user has over the model’s behavior and the complexity of the model itself
- Safety vs. Transparency: trade-off between ensuring the model’s predictions are safe and ethical while also being transparent and understandable to users
- Interpretability vs. Performance: trade-off between the model’s ability to be easily interpreted and understood by humans versus its overall performance in terms of accuracy and efficiency.
- Transparency vs interpretability vs Performance: High transparency = Hight Interpretability = Poor Performance
Poor Model:
- Bias: difference between the predictive and actual values (level of error)
- High bias (underfitting): the model doesn’t learn enough from the training set and performs poorly on both the training and test data
- High Variance (overfitting): the model learns the data too well, including the noise. It does well on training set, but poorly on inseen data
- Variance: extent to which the model’s predictions change when trained on different data
Types of Bias:
- Measurement Bias: faulty data. it involves inaccuracies in data collection, such as faulty equipment or inconsistent measurement processes.
- Sampling Bias: data is not representative of the population as a whole (train the model does not accurately reflect the diversity of the real-world population)
- Confirmation Bias: try to confirm what you believe. it involves selectively searching for or interpreting information to confirm existing beliefs.
- Observer Bias: collecting or labeling the data has their own subjective opinions of preference. it relates to human errors or subjectivity during data analysis or observation.
Risks:
- Hallucination
- prompt misuses
- Intellectual property
Governance:
- Policies, practices, and tools
- Amazon Augmented AI (A21): human judgement, allowing for reviews and corrections of model predictions Amazon Augmented AI (Amazon A2I) is a service that makes it easy to build the workflows required for human review of ML predictions. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers
- Amazon Badrock Guardrails evaluate the input and output of foundation models
- Data residency is concerned with the physical location of data storage, whereas data retention defines the policies for how long data should be stored and maintained
Security, Compliance and Governance
Features:
- Roles and Policies
- Services and features (Encryption, Macie, AWS PrivateLink)
- Data History: Origin, Source Citation, Lineage, Catalog, SageMaker Model Card
- Secure data Engineering: Data Quality and Integrity, Data Access Control, Compliance (PII, NIST, HIPAA, GDPR)
Threat Detection
- Training Data Poisoning, Misuse, Misconfiguration
- GuardDuty, Inspector, Detective
- Incident response: Preparation; Detection and Analysis; Containment, Eradication and Recovery; Post-incident
OWASP (nonprofit organization dedicated to researching application security)
- top 10 for LLM Application security risk
- Control Tower Fuardrail for Badrock AI
AWS Services:
- WAF - AWS Web Application Firewall
- AWS Shield (DDos)
- AWS Cognito
- AWS Artifact provides on-demand access to AWS’ compliance reports and online agreements.
- AWS Config is a service that enables you to assess, audit, and evaluate the configurations of your AWS resources.
- AWS Audit Manager helps you continuously audit your AWS usage to simplify how you assess risk and compliance with regulations and industry standards
- Amazon Inspector: assesses applications for exposure, vulnerabilities, and deviations from best practices
- AWS Security Services: AWS Config + Inspector + Detective + Audit Manager + Artifact + Trusted Advisor
- Algorithm Accountability Laws
SOC (System and Organization Control)
- Financial report
- trust services
- publuc summary
Data governance (1):
- Primarily focuses on ensuring data quality, integrity, and security.
- Implementing data validation and cleansing techniques
Bedrock
- Amazon Badrock is a Fully-managed service
- Charged for model inference and customization.
- Model customization methods:
- Continued pre-training uses unlabeled data to pre-train a model, whereas, fine-tuning uses labeled data to train a model
- Involves further training and changing the weights of the model to enhance its performance.
- Model evaluation
- preparing data, training models, selecting appropriate metrics, testing and analyzing results, ensuring fairness and bias detection, tuning performance, and continuous monitoring.
- helps you to incorporate Generative AI into your application by giving you the power to select the foundation model
- Pay-per-use
- Pricing [1][2]:
- On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
- Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.
- Build Generative AI (Gen-AI) applications
- Keep control of your data used to train the model
- Unified APIs
- Access to a wide range of Foundation Models (FM): Meta, amazon, anthropic, etc
- Copy of the FM, available only to you, which you can further fine-tune with your own data
- None of your data is used to train the FM
- RAG, LLM Agents…
- Security, Privacy, Governance and Responsible AI features
- Knowledge Bases: supports popular databases for vector storage, including vector engine for Amazon OpenSearch Serverless, Pinecone, Redis Enterprise Cloud, Amazon Aurora (coming soon), and MongoDB (coming soon). If you do not have an existing vector database, Amazon Bedrock creates an OpenSearch Serverless vector store for you.
- OpenSearch supports full-text search, vector search, and advanced data indexing, which are essential for the Retrieval-Augmented Generation (RAG) framework.
- Control the interaction between users and Foundation Models (FMs)
- Filter undesirable and harmful content
- Detects and remove sensitive information (Personally Identifiable Information (PII)) in input prompts or model responses
- Enhanced privacy
- Reduce hallucinations
- Guardrails for Amazon Bedrock enables you to implement safeguards for your generative AI applications based on your use cases and responsible AI policies.
- You can use guardrails with text-based user inputs and model responses.
- AI agents act as intermediaries, translating user inputs or model outputs into actionable operations
- fully managed capabilities that make it easier for developers to create generative AI-based applications that can complete complex tasks for a wide range of use cases and deliver up-to-date answers based on proprietary knowledge sources
- Manage and carry out various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities
- Task coordination: perform tasks in the correct order and ensure information is passed correctly between tasks
- Agents are configured to perform specific pre-defined action groups
- Integrate with other systems, services, databases and API to exchange data or initiate actions
- Leverage RAG to retrieve information when necessary
- Always encrypted in transit and at rest
- can encrypt the data using an own keys.
- can use AWS PrivateLink to establish private connectivity between your FMs and your Amazon Virtual Private Cloud (Amazon VPC) without exposing your traffic to the Internet.
- testing and fine-tuning prompts and parameters before using them in an application
- provides a safe and controlled environment to experiment with different inputs and configurations
- interactive environment
SageMaker
Concepts
General View:
- Fully managed service for developers / data scientists to build ML models
- Build > Train > Deploy
- Support version control
- Supports: supervised, unsupervised, reinforcement, deep learning
- Comprehensive Toolbox for ML: IDE, One-stop shop for building, training and deploying ML models (Notebooks, Canvas, Datapreparation and visualization, collaboration tools)
Some features:
- Built-in algorithms for common machine learning tasks
- One-click deployment of models to scalable endpoints
- Automatic model tuning using hyperparameter optimization
Prepare Data:
- import: s3, Athena, Redshift
- clean: identify missing values, deplicates, and outliers
- feature engineer: create new features from existing data using built-in transformations
- Visualize (EDA): visualize distributions, summary statistics, and relationships between variables
- Open-source tool which helps ML teams manage the entire ML lifecycle
- Manage ML Experiments: track, organize, view, analyze, and compare iterative ML experimentation.
- Part of SageMaker Studio
Automatic Model Tuning (AMT)
- Define the Object Metric
- AMT automatically chooses hyperparameter ranges, search strategy, maximum runtime of a tuning job, and early stop condition
AutoML(Autopilot)
- Automates the model selection and hyperparameter tuning process
- Allows you to quickly generate a model with minimal manual interaction
- Part of SageMaker Studio
Model Deployment and Inference (1)(2)
- Deploy with one click, automatic scaling, no servers to manage
- creates endpoints > provides access > users and Applications send data or receive predictions
- Managed solution: reduced overhead
- Inference types:
- Real-time (sync):
- One prediction at a time.
- ideal for inference workloads where you have real-time, interactive, low latency requirements
- Use case: Fast, near-instant predictions for web/mobile apps; Chatbots
- Serverless
- good choice for workloads with unpredictable traffic or sporadic requests
- Tolerate more latency
- automatically scales to accommodate varying loads
- Used for workloads that have idle periods between traffic spikes and can tolerate cold starts
- Asynchronous
- submit request and then check later the result
- For large payload sizes up to 1GB
- Long processing times
- near real-time latency requirements
- Use case: Images and videos analysis
- Batch (async)
- Prediction for an entire dataset (multiple predictions)
- Request and responses are in Amazon S3
- Use case: Bulk processing for large datasets Concurrent processing; Processing large datasets
- Real-time (sync):
Tools
Sage Maker Studio:
- Prepare: data Wrangger, processing, Data Sources, Feature store, Clarify
- Build: Studio Notebooks, Algorithms, Autopilot, JumpStart
- Train & Tune: OnClick Training, Experiments, Automatic Model Tuning, Debugger, Managed Spot Training
- Deploy and Manage: One-click Deploy, Multi-model endpoint, Model Monitor, Pipelines
Responsible AI:
- Amazon SageMaker Clarify[1][2]
- helps developers detect biases and explain the predictions made by machine learning models.
- Explain how machine learning (ML) models make predictions.
- Uses a model-agnostic feature attribution approach
- Produces partial dependence plots (PDPs) that show the marginal effect features have on the predicted outcome of a machine learning model
- Evaluate Foundation Models
- Evaluating human-factors such as friendliness or humor
- Leverage an AWS-managed team or bring your own employees
- Use built-in datasets or bring your own dataset
- Built-in metrics and algorithms
- Part of SageMaker Studio
- Ability to detect and explain biases in your datasets and models
- Measure bias using statistical metrics
- Specify input features and bias will be automatically detected
- Use Shapley values to explain individual predictions and PDP to understand the model's behavior at a dataset level
- SageMaker Ground Truth:
- SageMaker Ground Truth enables the creation of high-quality labeled datasets by incorporating human feedback in the labeling process, which can be used to improve reinforcement learning models
- RLHF, humans for model grading and data labeling
- Data generation and annotation to model review, customization, and evaluation, either through a self-service or an AWS-managed offering.
- you can use workers, a vendor company, or an internal, private workforce along with machine learning to enable you to create a labeled dataset. You can use the labeled dataset output from Ground Truth to train your models. You can also use the output as a training dataset for an Amazon SageMaker model.
- SageMaker GroundTruth Plus: fully managed data labeling service. It uses a combination of human labelers and machine learning-assisted labeling to ensure accuracy and consistency in the labels.
- SageMaker Model Cards: transparency and information about the intended use, limitations, and potential impacts of AWS AI services
Governance:
- SageMaker Model Cards: document critical details about your machine learning (ML) models in a single place for streamlined governance and reporting. Data History.
- SageMaker Model Dashboard: Centralized portal where you can view, search, and explore all of your models
- SageMaker Role Manager
- SageMaker Model Monitor
- Monitor the quality of your model in production: continuous or on-schedule
- Alerts for deviations in the model quality: fix data & retrain model
- Support Responsible AI
SageMaker Mechanical Turk:
- provides a marketplace for outsourcing various tasks to a distributed workforce
- provides an on-demand, scalable, human workforce to complete jobs that humans can do better than computers
- formalizes job offers to the thousands of Workers willing to do piecemeal work at their convenience.
SageMaker JumpStart
- ML model hub & pre-built ML solutions
- ML Hub to find pre-trained Foundation Model (FM), computer vision models, or natural language processing models
- Large collection of models from Hugging Face, Databricks, Meta, Stability AI…
- Models can be fully customized for your data and use-case
- Models are deployed on SageMaker directly
- Pre-built ML solutions for demand forecasting, credit rate prediction, fraud detection and computer vision
- You can evaluate, compare, and select Foundation Models quickly based on pre-defined quality and responsibility metrics
SageMaker Canvas
- offers a no-code interface that can be used to create highly accurate machine learning models.
- Build ML models using a visual interface
- Access to ready-to-use models from Bedrock or JumpStart
- Build your own custom model using AutoML powered by SageMaker Autopilot
- Part of SageMaker Studio
- Leverage Data Wrangler for data preparation
SageMaker Data Wrangler
- reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes
- supports tabular, time-series, and image data, offering 300+ pre-configured data transformations to prepare these different data modalities.
- Designg for the first part of the ML process (Generating Data)
- Messy data -> Data Wrangling -> Clean, transformed, and organized data
- simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface
SageMaker Feature Store
- fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models
- Features are inputs to ML models used during training and inference
- centralized hub for storing and retrieve ML features
- Ingests features from a variety of sources
- Define the transformation of data
- Can publish directly from SageMaker Data Wrangler into SageMaker Feature Store
- Features are discoverable within SageMaker Studio
- Fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models.
SageMaker Model Registry
- Centralized repository allows you to track, manage, and version ML models
- Catalog models, manage model versions, associate metadata with a model
- Manage approval status of a model, automate model deployment, share models
SageMaker Pipelines:
- workflow that automates the process of building, training, and deploying a ML model (CI/CD)
- Steps: Processing > Training > Tuning > AutoML > Model > ClarifyCheck >QualityCheck