Basics of Machine Learning
- What is machine learning?
- Answer: Machine learning is a branch of artificial intelligence (AI) in which systems learn patterns from data and improve at a task with experience, rather than following rules that were explicitly programmed for that task.
- Differentiate between supervised and unsupervised learning.
- Answer:
- Supervised Learning: Involves training a model on labeled data to make predictions or classify new data.
- Unsupervised Learning: Involves training on unlabeled data to discover patterns, clusters, or relationships.
- Explain the bias-variance trade-off in machine learning.
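- Answer: The bias-variance trade-off describes two competing sources of prediction error. Bias is error from overly simple assumptions in the model; high bias leads to underfitting. Variance is sensitivity to fluctuations in the training data; high variance leads to overfitting. Increasing model complexity typically lowers bias and raises variance, so the goal is a level of complexity that minimizes total error on unseen data.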
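For squared-error loss, the trade-off is commonly written as a decomposition of the expected prediction error. The notation here is an assumption made for illustration: the data follow y = f(x) + noise, \hat{f} is the model fitted on a randomly drawn training set, and \sigma^2 is the variance of the irreducible noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

The first two terms depend on the model and move in opposite directions as complexity grows; the last term cannot be reduced by any model.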
Machine Learning Algorithms
- Name some popular machine learning algorithms and their applications.
- Answer: Algorithms include:
- Linear Regression: Predicting continuous outcomes.
- Decision Trees: Classification and regression tasks.
- Random Forest: Ensemble learning for classification and regression.
- K-Nearest Neighbors (KNN): Classification (or regression) based on the labels of the nearest data points.
- Support Vector Machines (SVM): Classification and regression using hyperplanes.
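As an illustration of one algorithm from the list above, here is a minimal sketch of K-Nearest Neighbors classification in plain Python. The function name `knn_predict`, the choice of Euclidean distance, and the toy data are assumptions made for this example, not a reference implementation.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and take a majority vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two loose clusters in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction_near_a = knn_predict(points, labels, (1.8, 1.2))
prediction_near_b = knn_predict(points, labels, (6.2, 6.4))
```

Because the prediction depends directly on distances, KNN is one of the algorithms where feature scaling matters most.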
- Explain the difference between classification and regression.
- Answer:
- Classification: Predicts categorical outcomes or class labels.
- Regression: Predicts continuous numerical values.
- What is overfitting in machine learning, and how can it be prevented?
- Answer: Overfitting occurs when a model learns noise or irrelevant patterns in the training data, so it performs well on the training set but generalizes poorly to new data. It can be detected with cross-validation and reduced with regularization, a simpler model, early stopping, or more training data.
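Cross-validation, mentioned in the answer above, can be sketched as follows. This is a minimal k-fold index splitter in plain Python; the helper name `k_fold_indices` is hypothetical, and the sketch assumes the data have already been shuffled if their order matters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
```

Each sample lands in exactly one validation fold. A model is trained once per fold on the training indices and scored on the held-out indices; a large gap between training and validation scores is a sign of overfitting.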
Evaluation Metrics in Machine Learning
- Name some evaluation metrics for classification models.
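- Answer: Metrics include accuracy, precision, recall, F1-score, and ROC-AUC (area under the Receiver Operating Characteristic curve). The confusion matrix is not a single metric but a table of prediction counts from which accuracy, precision, recall, and F1-score are computed.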
- How is accuracy calculated for a machine learning model?
- Answer: Accuracy measures the proportion of correctly predicted instances among all predictions, calculated as (TP + TN) / (TP + TN + FP + FN), where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.
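The accuracy formula above can be sketched in a few lines of plain Python. The function names and the toy labels are made up for illustration, and the sketch assumes a binary problem with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN), or 0.0 when there are no predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)  # 6 correct out of 8 -> 0.75
```

Accuracy can be misleading on imbalanced data, since a model that always predicts the majority class still scores highly.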
- What are precision and recall? How are they related?
- Answer:
- Precision: Measures the proportion of correctly predicted positive instances among all predicted positives, calculated as TP / (TP + FP).
- Recall: Measures the proportion of correctly predicted positive instances among all actual positives, calculated as TP / (TP + FN).
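- Precision and recall typically trade off against each other: lowering the decision threshold tends to raise recall and lower precision, and raising it tends to do the opposite. The F1-score, their harmonic mean, summarizes both in a single number.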
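The two formulas above, and the tension between them, can be sketched in plain Python. The example assumes a hypothetical model that outputs a score per instance and a decision threshold that turns scores into positive or negative predictions; the scores and labels are made up for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    # Guard against division by zero when nothing is predicted or labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

strict = precision_recall(y_true, scores, threshold=0.75)   # (1.0, 0.5)
lenient = precision_recall(y_true, scores, threshold=0.35)  # (0.8, 1.0)
```

With the strict threshold every predicted positive is correct but half of the actual positives are missed; the lenient threshold finds all positives at the cost of one false positive.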
Feature Engineering and Data Preprocessing
- Why is feature scaling important in machine learning?
- Answer: Feature scaling puts features on a comparable range so that features with large numeric values do not dominate the others. It improves the performance of models that rely on distance metrics or gradient-based optimization (e.g., SVM, KNN).
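Two common scaling schemes, min-max scaling and standardization (z-scores), can be sketched in plain Python as follows. The function names and the income values are assumptions made for illustration; the sketch uses the population standard deviation.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


incomes = [20_000, 40_000, 60_000, 80_000, 100_000]
scaled = min_max_scale(incomes)   # [0.0, 0.25, 0.5, 0.75, 1.0]
z_scores = standardize(incomes)
```

In practice the minimum, maximum, mean, and standard deviation should be computed on the training set only and then reused for validation and test data, so that no information leaks from held-out data.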
- Name techniques for handling missing data in a dataset.
- Answer: Techniques include deletion (listwise or pairwise), simple imputation (mean, median, or mode), model-based imputation (e.g., predicting the missing value with regression), and using algorithms that handle missing values natively (e.g., XGBoost).
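Simple imputation can be sketched in plain Python as below. The sketch assumes missing values are represented as `None`; the function name `impute` and the age values are hypothetical.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


ages = [25, None, 35, 40, None, 100]
mean_filled = impute(ages, "mean")      # missing entries become 50.0
median_filled = impute(ages, "median")  # missing entries become 37.5
```

The outlier (100) pulls the mean well above the median here, which is why median imputation is often preferred for skewed features.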
- Explain feature extraction and its importance in machine learning.
- Answer: Feature extraction involves transforming raw data into a set of relevant features that facilitate model training and improve predictive performance by capturing meaningful information from the data.
Deep Learning and Neural Networks
- What is deep learning, and how does it differ from traditional machine learning?
- Answer: Deep learning is a subset of machine learning that uses neural networks with multiple layers to learn representations of data. Unlike traditional machine learning, which usually depends on manually engineered features, deep learning learns hierarchical features directly from the data.
- Describe the structure of a typical neural network.
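- Answer: A neural network consists of an input layer, one or more hidden layers (made of neurons with activation functions), and an output layer. Connections between layers carry weights, and neurons carry biases; both are adjusted during training, typically by backpropagation with gradient descent, to minimize a loss function.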
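The layered structure described above can be sketched as a forward pass in plain Python. The weights and biases are hand-picked, hypothetical values, the sigmoid activation is an assumption made for this example, and training (adjusting the weights) is deliberately left out.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(weights . inputs + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass inputs through each (weights, biases, activation) layer in order."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], sigmoid)
output = ([[1.0, -1.0]], [0.0], sigmoid)

prediction = forward([1.0, 1.0], [hidden, output])
```

Each row of a weight matrix holds the incoming weights of one neuron, so the number of rows sets the width of that layer and the number of columns must match the size of the previous layer.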