Practice Questions - Advances in ML

Q. Define bagging and give one example algorithm.

Bagging (Bootstrap Aggregating) is an ensemble learning technique used to reduce variance and improve model stability. It works by creating multiple training subsets through bootstrapping (random sampling with replacement). A separate model is trained on each subset, and predictions are combined, usually by averaging (regression) or majority voting (classification). Bagging is especially effective for high-variance models like decision trees, which tend to overfit their training data. By aggregating many such diverse models, bagging provides a more robust and generalized predictor.
Example: Random Forest is the most popular bagging-based algorithm. It trains hundreds of decision trees on different bootstrapped samples and aggregates their outputs, resulting in high accuracy and reduced overfitting.
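A minimal sketch in scikit-learn contrasting plain bagging of decision trees with a Random Forest; the built-in breast-cancer toy dataset is only a stand-in for real data:

    # Minimal sketch: bagging vs. Random Forest with scikit-learn.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Generic bagging: 100 decision trees, each trained on a bootstrap sample.
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
    bag.fit(X_train, y_train)

    # Random Forest: bagging plus random feature subsets at each split.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)

    print("Bagging accuracy:      ", bag.score(X_test, y_test))
    print("Random Forest accuracy:", rf.score(X_test, y_test))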


Q. What is the main idea behind boosting?

Boosting is an ensemble method that focuses on converting many weak learners into one strong learner by training models sequentially. Each new model pays more attention to incorrectly predicted samples from previous iterations. The goal is to progressively reduce errors by “boosting” model performance step by step. Boosting works well for complex datasets because it adaptively adjusts weights, allowing the ensemble to focus on challenging patterns.
Example: AdaBoost (Adaptive Boosting) starts by training a decision stump, identifies misclassified points, increases their weights, and trains the next stump to correct them. The final model is a weighted combination of all weak learners, achieving strong accuracy even with simple base models.
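A minimal AdaBoost sketch in scikit-learn, using decision stumps as weak learners on a synthetic dataset:

    # Minimal sketch: AdaBoost with decision stumps in scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each weak learner is a depth-1 tree (a "stump"); later stumps focus on
    # samples the earlier ones misclassified, via sample re-weighting.
    stump = DecisionTreeClassifier(max_depth=1)
    ada = AdaBoostClassifier(stump, n_estimators=200, learning_rate=0.5, random_state=0)
    ada.fit(X_train, y_train)
    print("AdaBoost accuracy:", ada.score(X_test, y_test))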


Q. State one key difference between Kernel SVM and linear SVM.

A linear SVM can only learn a straight hyperplane as its decision boundary, so it performs well only when the classes are (approximately) linearly separable. A Kernel SVM, however, can classify non-linear data by transforming it into a higher-dimensional feature space using kernel functions. This transformation allows complex boundaries to be learned without explicitly computing high-dimensional mappings, a concept known as the kernel trick.
Example: With concentric circular data, linear SVM fails because no straight line can separate classes. Using an RBF kernel, Kernel SVM projects points into a higher-dimensional space where separation becomes possible. Thus, Kernel SVM provides flexibility for real-world problems involving curved or irregular decision boundaries.
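A minimal sketch reproducing this situation with scikit-learn's make_circles toy data:

    # Minimal sketch: linear vs. RBF SVM on concentric-circle data.
    from sklearn.datasets import make_circles
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    linear_svm = SVC(kernel="linear").fit(X_train, y_train)
    rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

    # The linear model cannot separate the rings; the RBF kernel can.
    print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
    print("RBF SVM accuracy:   ", rbf_svm.score(X_test, y_test))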


Q. What is Q-value in Q-Learning?

A Q-value (Quality value) represents the expected future reward obtained by taking an action in a given state and then following the optimal policy. It captures both immediate and long-term rewards, making Q-learning suitable for sequential decision-making tasks. Q-values are updated iteratively using the Bellman equation, allowing an agent to learn optimal behavior without prior environment knowledge.
Example: In a grid-world game, if the agent is at cell (2,3) and moves right, the Q-value estimates how good that action is based on potential rewards like reaching the goal or avoiding penalties. Over many episodes, Q-values converge, enabling the agent to choose the best action in each state.


Q. What is the purpose of transition probabilities in HMM?

In a Hidden Markov Model (HMM), transition probabilities represent the likelihood of moving from one hidden state to another in a sequential process. They capture temporal dependencies and allow modeling systems where the underlying state is unobserved but emits observable outputs. Transition probabilities help determine the most probable state sequence given observations using algorithms like Viterbi or Forward–Backward.
Example: In speech recognition, hidden states correspond to phonemes, and observed states correspond to audio signals. Transition probabilities determine how likely one phoneme transitions to another, such as “s” → “t.” These transitions help decode spoken words by identifying the most probable phoneme sequence generating the sound pattern.
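A minimal NumPy sketch of a two-state transition matrix; the states and probabilities are illustrative, not taken from a real speech model:

    # Minimal sketch: a 2-state transition matrix and the probability of a
    # hidden-state sequence under the Markov chain.
    import numpy as np

    states = ["s", "t"]                      # hidden states (e.g., phonemes)
    start_prob = np.array([0.6, 0.4])        # P(first state)
    trans_prob = np.array([[0.7, 0.3],       # P(next state | current state)
                           [0.4, 0.6]])      # row = current, column = next

    def sequence_probability(state_ids):
        """Probability of a hidden-state sequence under the Markov chain."""
        p = start_prob[state_ids[0]]
        for prev, nxt in zip(state_ids[:-1], state_ids[1:]):
            p *= trans_prob[prev, nxt]
        return p

    # P("s" -> "t" -> "t") = 0.6 * 0.3 * 0.6
    print(sequence_probability([0, 1, 1]))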


Q. Define model interpretability.

Model interpretability refers to how easily humans can understand the reasoning behind a model’s predictions. It involves identifying which features influence decisions and by how much. Interpretability is essential for trust, fairness, debugging, regulatory compliance, and ethical AI deployment—especially in healthcare, finance, and legal sectors where explanations are mandatory.
Example: A loan approval model that denies a loan should explain whether factors like low income, high credit utilization, or poor repayment history contributed most to the decision. Interpretability tools like feature importance, SHAP, or LIME reveal how each input shaped the outcome, making complex models human-understandable.


Q. What does SHAP stand for?

SHAP stands for SHapley Additive exPlanations. It is an explainable AI technique based on Shapley values from cooperative game theory. SHAP assigns each feature a contribution score showing how much it pushed a prediction higher or lower. SHAP is model-agnostic, consistent, and provides both global and local explanations.
Example: For a medical ML model predicting heart disease, SHAP may show that high cholesterol contributes +0.20 to the risk score, while regular exercise contributes −0.15. This breakdown helps doctors understand individual predictions and validate whether the model aligns with clinical reasoning, increasing trust in AI-driven diagnosis.
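A minimal sketch, assuming the shap package is installed; the model and dataset are illustrative toy choices, and the exact return format of shap_values varies slightly across shap versions:

    # Minimal sketch: per-feature contributions with SHAP's TreeExplainer.
    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    data = load_breast_cancer()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(data.data, data.target)

    # TreeExplainer computes per-feature contributions for tree ensembles.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(data.data[:10])   # explain 10 samples

    sv = shap_values[1] if isinstance(shap_values, list) else shap_values
    # Contributions for 10 samples x 30 features (an extra class axis may
    # appear depending on the shap version).
    print(sv.shape)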


Q. Give one limitation of LIME.

LIME (Local Interpretable Model-Agnostic Explanations) approximates complex models with simple surrogate models near the instance being explained. One limitation is its instability—repeat explanations for the same input can vary because LIME relies on random perturbations of features. This reduces reliability in sensitive domains like healthcare. LIME also assumes linearity in small regions, which may not reflect true model behavior.
Example: When explaining an image classifier, LIME perturbs pixel regions randomly. Different runs may highlight different segments as important, even though the model behaves consistently. This inconsistency makes it difficult for users to trust the explanation, especially for high-risk tasks like medical imaging classification.
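A minimal sketch, assuming the lime package is installed, that explains the same prediction twice to illustrate the instability described above:

    # Minimal sketch: LIME tabular explanations can differ between runs.
    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    data = load_breast_cancer()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(data.data, data.target)

    explainer = LimeTabularExplainer(
        data.data,
        feature_names=list(data.feature_names),
        class_names=list(data.target_names),
        mode="classification",
    )

    # Explain one prediction twice; compare the two feature rankings.
    for run in range(2):
        exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
        print(f"Run {run + 1}:", exp.as_list())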


Q. What is the role of filters in CNN?

Filters (kernels) in CNNs slide over the input image to detect local patterns such as edges, textures, shapes, or objects. Each filter learns specific features through training, enabling hierarchical feature extraction from low-level to high-level representations. Filters reduce the need for manual feature engineering, making CNNs highly effective for vision tasks.
Example: In an image classification task, one filter may detect vertical edges, another horizontal edges, and deeper layers may detect complex features like eyes or wheels. For a cat vs. dog classifier, CNN filters autonomously learn distinguishing patterns such as fur texture or ear shapes, improving accuracy without explicit programming.
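A minimal PyTorch sketch showing what a single filter does; here the vertical-edge kernel is hand-written, whereas a trained CNN learns such weights automatically:

    # Minimal sketch: applying a vertical-edge filter with a 2-D convolution.
    import torch
    import torch.nn.functional as F

    # Toy 6x6 "image": dark left half, bright right half.
    image = torch.zeros(1, 1, 6, 6)
    image[:, :, :, 3:] = 1.0

    # 3x3 vertical-edge detector (Sobel-like kernel).
    vertical_edge = torch.tensor([[[[-1., 0., 1.],
                                    [-2., 0., 2.],
                                    [-1., 0., 1.]]]])

    feature_map = F.conv2d(image, vertical_edge)
    print(feature_map.squeeze())   # strong responses along the vertical boundary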


Q. Define transfer learning.

Transfer learning involves using knowledge gained from training a model on one task and applying it to a related task with limited data. This reduces training time, improves accuracy, and minimizes data requirements. Pretrained deep networks like VGG, ResNet, BERT, and GPT are commonly used for transfer learning.
Example: A CNN pretrained on ImageNet (over a million labeled images) can be fine-tuned to classify X-ray images, even if only 1,000 medical images are available. The early layers already know how to detect edges and shapes, so the model quickly adapts to medical features. Transfer learning is widely used in NLP, vision, and speech applications.
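A minimal PyTorch/torchvision sketch of the fine-tuning step; depending on the torchvision version the pretrained-weights argument may be weights="IMAGENET1K_V1" or the older pretrained=True, and the two-class X-ray head is hypothetical:

    # Minimal sketch: fine-tuning an ImageNet-pretrained ResNet-18 for a
    # hypothetical 2-class X-ray task.
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1")

    # Freeze the pretrained feature extractor (early layers already detect
    # edges, textures, and shapes).
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final classification head for 2 classes (e.g., normal vs. pneumonia).
    model.fc = nn.Linear(model.fc.in_features, 2)
    # Only model.fc's parameters are then trained on the small medical dataset.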

Q. (a) Explain the working of Random Forest as an ensemble method.

(b) Compare bagging, boosting, and stacking.

(a) How Random Forest Works

Random Forest is an ensemble learning method that builds multiple decision trees, each on a bootstrapped sample of the data. In addition, every tree considers only a random subset of features at each split, which ensures diversity among the trees. During prediction, the outputs of all trees are aggregated: majority voting for classification, averaging for regression. This reduces variance and prevents overfitting, a common issue with single decision trees.
Example: In a medical diagnosis dataset, each tree may focus on different combinations of symptoms, and the forest collectively produces a more stable and accurate diagnosis than any individual tree.

(b) Comparison

  • Bagging reduces variance by training independent models on bootstrapped data (e.g., Random Forest).

  • Boosting reduces bias by sequentially training models that correct prior errors (e.g., AdaBoost, XGBoost).

  • Stacking combines predictions of diverse models using a meta-learner for final prediction.


Q. (a) Explain the Kernel Trick.

(b) How does the RBF kernel improve classification?

(a) Kernel Trick

The kernel trick enables algorithms like SVM to learn non-linear patterns by implicitly transforming data into high-dimensional space without explicitly computing coordinates. Instead, it computes similarity using a kernel function. This makes non-linear separations feasible with low computational cost.
Example: XOR data cannot be separated linearly, but using a polynomial kernel transforms it into a space where a hyperplane cleanly separates the classes.
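A minimal scikit-learn sketch on synthetic XOR-style data, comparing a linear kernel with a degree-2 polynomial kernel:

    # Minimal sketch: XOR-style data with a linear vs. polynomial-kernel SVM.
    import numpy as np
    from sklearn.svm import SVC

    # Classic XOR pattern: the class depends on the product of the signs.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(400, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)

    linear_svm = SVC(kernel="linear").fit(X, y)
    poly_svm = SVC(kernel="poly", degree=2, coef0=1).fit(X, y)

    print("Linear kernel accuracy:", linear_svm.score(X, y))   # near chance
    print("Poly kernel accuracy:  ", poly_svm.score(X, y))     # near perfect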

(b) RBF Kernel Benefits

The Radial Basis Function (RBF) kernel measures similarity based on distance, creating flexible, curved decision boundaries. It adapts to complex patterns and handles clusters of varying shapes.
Example: In handwriting recognition, RBF-SVM can separate digits with irregular shapes by forming smooth, non-linear boundaries.


Q. Describe the working cycle of Q-Learning with update equation and example.

Q-Learning is a model-free reinforcement learning algorithm that learns optimal actions by updating Q-values based on rewards. The agent interacts with the environment by choosing actions, observing rewards, and updating Q-values using the Bellman equation:

Q(s, a) ← Q(s, a) + α [ r + γ · max_a′ Q(s′, a′) − Q(s, a) ]

where α is the learning rate, γ the discount factor, r the immediate reward, and s′ the next state reached after taking action a in state s.

The agent explores states, gradually shifting to exploitation as Q-values converge.
Example: In a 4×4 grid maze, the agent receives +10 for reaching the goal and −1 for hitting walls. Over repeated episodes, Q-values guide the agent to choose the shortest, safest path. Eventually, the agent learns a policy that maximizes cumulative rewards.
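A minimal tabular Q-learning sketch for a 4×4 grid world; the +10 goal reward and −1 wall penalty follow the example above, while the rest of the setup (start cell, hyperparameters) is illustrative:

    # Minimal sketch: tabular Q-learning on a 4x4 grid world.
    import numpy as np

    SIZE = 4
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    GOAL = (3, 3)
    alpha, gamma, epsilon = 0.1, 0.9, 0.2

    Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
    rng = np.random.default_rng(0)

    def step(state, action):
        """Apply an action; hitting a wall costs -1, reaching the goal gives +10."""
        r, c = state
        dr, dc = ACTIONS[action]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < SIZE and 0 <= nc < SIZE):
            return state, -1.0, False            # bumped into a wall, stay put
        if (nr, nc) == GOAL:
            return (nr, nc), 10.0, True
        return (nr, nc), 0.0, False

    for episode in range(2000):
        state, done = (0, 0), False
        while not done:
            # Epsilon-greedy: explore sometimes, otherwise exploit current Q-values.
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = step(state, action)
            # Bellman update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            td_target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (td_target - Q[state][action])
            state = next_state

    print(np.argmax(Q, axis=2))   # greedy action per cell after learning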


Q. (a) What are SHAP values?

(b) Compare SHAP and LIME.

(a) SHAP Values

SHAP values are explainability metrics that quantify the contribution of each feature to a prediction. Based on Shapley values from cooperative game theory, SHAP fairly distributes the output difference among features by examining all possible feature combinations. SHAP provides consistent, local, and global explanations.
Example: For a loan approval model, SHAP may show: income (+0.30), credit score (+0.25), previous defaults (−0.50), explaining exactly why a loan was denied.

(b) SHAP vs LIME

  • LIME builds a local linear model using random perturbations; fast but unstable.

  • SHAP provides consistent, theoretically grounded values; more computationally expensive.

  • LIME approximates local behavior; SHAP explains both local and global feature impacts.


Q. (a) Explain CNN architecture.

(b) Give one CNN application.

(a) CNN Architecture

A Convolutional Neural Network consists of convolution layers, pooling layers, and fully connected layers. Convolution layers use filters to extract features like edges and shapes. Pooling layers reduce spatial dimensions, improving efficiency and preventing overfitting. Fully connected layers map extracted features to output classes. CNNs automatically learn hierarchical features—from low-level edges to high-level objects.
Example: Training a CNN on CIFAR-10 learns filters that detect object parts like wheels, wings, or animal faces.
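A minimal PyTorch sketch of the convolution → pooling → fully connected pattern, sized for 32×32 RGB inputs such as CIFAR-10:

    # Minimal sketch: a small CNN with two conv/pool stages and a classifier head.
    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),    # low-level edges/textures
                nn.ReLU(),
                nn.MaxPool2d(2),                               # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),   # higher-level parts
                nn.ReLU(),
                nn.MaxPool2d(2),                               # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = SmallCNN()
    print(model(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])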

(b) Application

In medical imaging, CNNs classify diseases from X-rays or MRIs. For example, a CNN can detect pneumonia from chest X-rays with high accuracy by identifying cloudy lung regions.


Q. (a) Explain attention in Transformers.

(b) How does self-attention differ from RNN-based modeling?

(a) Attention Mechanism

Attention allows a model to focus on important parts of an input sequence rather than processing all tokens equally. It computes weighted relationships between tokens using queries, keys, and values. Transformers use multi-head attention to capture multiple contextual relationships simultaneously.
Example: In machine translation, attention helps the model focus on relevant source words when generating each output word.
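A minimal NumPy sketch of scaled dot-product attention, the core computation behind multi-head attention; the random Q, K, V matrices stand in for learned projections of token embeddings:

    # Minimal sketch: scaled dot-product attention for a tiny sequence.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
        weights = softmax(scores, axis=-1)     # attention weights sum to 1 per query
        return weights @ V, weights

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8                    # 4 tokens, 8-dimensional embeddings
    Q = rng.normal(size=(seq_len, d_model))
    K = rng.normal(size=(seq_len, d_model))
    V = rng.normal(size=(seq_len, d_model))

    output, weights = scaled_dot_product_attention(Q, K, V)
    print(weights.round(2))   # each row shows how much one token attends to the others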

(b) Difference from RNNs

RNNs process sequences sequentially, making them slow and poor at capturing long-range dependencies due to vanishing gradients. Self-attention processes all tokens in parallel and computes global relationships in one step.
Example: In summarization, Transformers easily capture connections between distant sentences, unlike RNNs which lose context over long text.


Q. (a) Explain GAN architecture.

(b) Give two applications.

(a) GAN Architecture

A Generative Adversarial Network consists of two neural networks: a generator that creates synthetic data and a discriminator that distinguishes between real and fake samples. Both networks train simultaneously in a minimax game. The generator improves by learning to fool the discriminator, eventually producing highly realistic data.
Example: A GAN trained on human faces learns to generate new, photorealistic faces that do not belong to real individuals.
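A minimal PyTorch sketch of the two adversarial networks and their losses; the fully connected layers and the random "real" batch are illustrative stand-ins for a real image GAN and its training loop:

    # Minimal sketch: generator, discriminator, and one adversarial loss computation.
    import torch
    import torch.nn as nn

    latent_dim, data_dim = 64, 784   # e.g., 28x28 images flattened

    generator = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, data_dim), nn.Tanh(),        # outputs a fake sample
    )
    discriminator = nn.Sequential(
        nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),            # probability the input is real
    )

    bce = nn.BCELoss()
    real = torch.rand(16, data_dim)                 # stand-in for a real data batch
    z = torch.randn(16, latent_dim)
    fake = generator(z)

    # Discriminator: label real samples 1 and fake samples 0.
    d_loss = bce(discriminator(real), torch.ones(16, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(16, 1))
    # Generator: try to make the discriminator output 1 on fakes.
    g_loss = bce(discriminator(fake), torch.ones(16, 1))
    print(d_loss.item(), g_loss.item())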

(b) Applications

  1. Image Synthesis: Generating high-quality images for animation, fashion, or virtual worlds.

  2. Data Augmentation: Synthetic medical images help improve model training when real data is scarce.


Q. Short notes on (choose any two):

(a) Deep Reinforcement Learning
(b) BERT Architecture
(c) Domain Adaptation

(a) Deep Reinforcement Learning

Deep RL combines neural networks with reinforcement learning to handle high-dimensional inputs like images. Models like Deep Q-Networks (DQN) replace Q-tables with CNNs, allowing RL to scale to complex environments.
Example: DeepMind’s DQN learned to play Atari games directly from pixel input, outperforming humans.

(b) BERT Architecture

BERT (Bidirectional Encoder Representations from Transformers) uses stacked Transformer encoder layers to learn deep bidirectional context. It uses masked language modeling to understand both left and right context simultaneously.
Example: BERT improves tasks like question answering by understanding full sentence meaning.
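A minimal sketch, assuming the Hugging Face transformers package, that loads the standard bert-base-uncased checkpoint and produces contextual token embeddings:

    # Minimal sketch: contextual embeddings from a pretrained BERT encoder.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
    outputs = model(**inputs)

    # One contextual embedding per token, shaped (batch, tokens, hidden_size=768).
    print(outputs.last_hidden_state.shape)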

(c) Domain Adaptation

Domain adaptation transfers knowledge from a source domain to a different target domain with limited labels.
Example: A model trained on daytime traffic images adapts to nighttime images using adversarial domain adaptation.

Q. Ensemble Methods Case Study: Bagging & Stacking

Bagging is effective for high-variance, noisy datasets because it reduces overfitting by training multiple models on different bootstrapped subsets. Each model learns slightly different decision boundaries, and combining their predictions through averaging or voting leads to a more reliable output. In a noisy dataset like customer purchase behavior, individual trees may overfit noise, but a Bagging ensemble such as Random Forest smooths irregularities and improves generalization.

Stacking works differently: it trains diverse base models such as SVM, Random Forest, and Gradient Boosting, and then uses a meta-learner (often Logistic Regression) to learn how to best combine their outputs. The meta-learner identifies patterns like which model performs best on specific feature regions.

Workflow example:

  1. Train Level-1 models (e.g., Decision Tree, SVM, ANN).

  2. Collect their predictions as new features.

  3. Train a Level-2 meta-model to make final predictions.

Stacking excels when different models capture complementary information, producing superior predictive performance.
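A minimal scikit-learn sketch of this two-level workflow, using a Decision Tree, SVM, and small neural network as Level-1 models and Logistic Regression as the Level-2 meta-learner; the toy dataset is illustrative:

    # Minimal sketch: a two-level stacking ensemble with scikit-learn.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    base_models = [
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("ann", MLPClassifier(max_iter=1000, random_state=0)),
    ]
    # The Level-2 meta-learner combines the base models' cross-validated predictions.
    stack = StackingClassifier(estimators=base_models,
                               final_estimator=LogisticRegression(max_iter=1000))
    stack.fit(X_train, y_train)
    print("Stacking accuracy:", stack.score(X_test, y_test))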


Q. Explainable AI in Healthcare Using SHAP and LIME

Explainability is essential in healthcare because doctors must understand why an AI model recommends a diagnosis or assigns a risk score. Without transparency, clinicians cannot validate predictions or ensure fairness. For example, a mortality prediction model in an ICU must show which clinical features—blood pressure, oxygen saturation, or age—are influencing the risk. Black-box predictions without reasoning could lead to harmful decisions.

SHAP provides patient-specific explanations by quantifying how each feature increases or decreases risk. For a patient with sepsis, SHAP may reveal that low platelets (+0.30), high lactate (+0.22), and rapid breathing (+0.15) significantly raise mortality risk. These values align with clinical reasoning, making the model trustworthy.

LIME can also explain predictions but suffers from instability due to random perturbations. In clinical environments, repeatable and consistent explanations are critical; hence SHAP is generally preferred. SHAP’s global and local interpretability makes it suitable for hospital audits, policy compliance, and building clinician confidence in AI models.


Q. Deep Learning Architectures: Transformer Encoder & Comparison

The Transformer encoder architecture uses multi-head self-attention, allowing each token to attend to every other token in the input sequence. It computes attention scores using queries, keys, and values, capturing long-range dependencies that RNNs struggle with. Positional encoding is added to preserve word order since Transformers do not rely on sequence recurrence. Each encoder block consists of multi-head attention followed by feed-forward layers and residual connections for stable training. This structure enables parallel processing of sequences, making Transformers significantly faster and more scalable.
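A minimal PyTorch sketch that stacks standard encoder blocks from torch.nn; the dimensions are illustrative, and positional encoding is assumed to have been added to the inputs already:

    # Minimal sketch: a small stack of Transformer encoder blocks.
    import torch
    import torch.nn as nn

    d_model, n_heads, n_layers = 128, 4, 2
    encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               dim_feedforward=256, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

    # A batch of 2 sequences, each 10 tokens long, already embedded and
    # position-encoded (positional encoding omitted here for brevity).
    tokens = torch.randn(2, 10, d_model)
    contextual = encoder(tokens)
    print(contextual.shape)   # torch.Size([2, 10, 128])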

Comparison:

  • CNNs excel in spatial data like images using localized filters.

  • RNNs process data sequentially but suffer from vanishing gradients.

  • Transformers capture global relationships instantly and parallelize training, outperforming others in most NLP tasks.

GPT uses a decoder-only Transformer, generating text autoregressively. It predicts the next word using masked self-attention, maintaining coherence over long passages. Pretrained on massive corpora, GPT adapts easily to tasks like summarization, reasoning, and conversation.

