Deep Learning
Interview Questions with Answers

1. What is Deep Learning and how is it different from Machine Learning?
Answer:
Deep Learning is a subset of Machine Learning that uses neural networks with many layers to learn from data. Unlike traditional ML, which often relies on manually engineered features, deep learning learns features automatically from raw data, making it especially effective for image, text, and audio tasks.
2. What is a neuron in a neural network?
Answer:
A neuron (or node) is a basic unit in a neural network that takes input, applies a weight and bias, passes through an activation function, and produces output.
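As a rough illustration (the numbers below are made up, not part of any standard workflow), a single neuron can be sketched in NumPy as a weighted sum plus bias passed through a sigmoid activation:
import numpy as np
x = np.array([0.5, 0.3, 0.2])    # inputs
w = np.array([0.4, -0.6, 0.9])   # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum
output = 1 / (1 + np.exp(-z))    # sigmoid activation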
3. How do you create a simple neural network using Keras in Python?
Answer:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(10, input_shape=(5,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
4. What is an activation function? Give examples.
Answer:
Activation functions introduce non-linearity in neural networks.
Common examples:
- ReLU: max(0, x)
- Sigmoid: squashes between 0 and 1
- Tanh: squashes between -1 and 1
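A quick NumPy sketch (input values chosen only for illustration) shows what each of these functions does to the same inputs:
import numpy as np
x = np.array([-2.0, -0.5, 0.0, 1.5])
relu = np.maximum(0, x)          # [0, 0, 0, 1.5]
sigmoid = 1 / (1 + np.exp(-x))   # values between 0 and 1
tanh = np.tanh(x)                # values between -1 and 1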
5. What is forward propagation in deep learning?
Answer:
Forward propagation refers to how inputs are passed through the layers of the network to generate predictions.
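A minimal NumPy sketch of a two-layer forward pass (shapes and weights are arbitrary, for illustration only):
import numpy as np
x = np.random.rand(5)                          # input vector
W1, b1 = np.random.rand(10, 5), np.zeros(10)   # hidden layer parameters
W2, b2 = np.random.rand(1, 10), np.zeros(1)    # output layer parameters
h = np.maximum(0, W1 @ x + b1)                 # hidden layer with ReLU
y_hat = 1 / (1 + np.exp(-(W2 @ h + b2)))       # output layer with sigmoid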
6. What is backpropagation?
Answer:
Backpropagation is the process of updating weights in a neural network using the error/loss computed from output and propagating it backward to adjust weights via gradient descent.
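In TensorFlow 2 this loop can be written explicitly with tf.GradientTape; the sketch below assumes that model, X_batch, and y_batch already exist:
import tensorflow as tf
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
with tf.GradientTape() as tape:
    predictions = model(X_batch, training=True)   # forward pass
    loss = loss_fn(y_batch, predictions)          # compute loss
gradients = tape.gradient(loss, model.trainable_variables)             # backward pass
optimizer.apply_gradients(zip(gradients, model.trainable_variables))   # weight update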
7. What is an epoch in deep learning?
Answer:
One epoch is one complete pass through the entire training dataset by the model.
8. How do you compile and train a neural network model in Keras?
Answer:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)
9. What is the difference between batch size and epoch?
Answer:
- Epoch: One full pass through the training data.
- Batch Size: Number of samples processed before updating the weights once.
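A worked example with hypothetical numbers: 1,000 training samples and batch_size=32 give ceil(1000 / 32) = 32 weight updates per epoch, so 10 epochs perform roughly 320 updates in total.
model.fit(X_train, y_train, epochs=10, batch_size=32)  # ~32 updates per epoch, ~320 overall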
10. What is overfitting in neural networks and how to prevent it?
Answer:
Overfitting happens when the model learns noise instead of patterns.
Prevention:
- Use Dropout
- Apply Early Stopping
- Reduce model complexity
- Use more data
11. What is dropout in deep learning?
Answer:
Dropout randomly turns off neurons during training to prevent overfitting.
from tensorflow.keras.layers import Dropout
model.add(Dropout(0.5)) # 50% of neurons dropped during training
12. How do you evaluate a deep learning model in Python?
Answer:
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy}")
13. What is a loss function and why is it important?
Answer:
A loss function measures how far the predicted output is from the actual output. It guides the training by calculating gradients.
Examples: mean_squared_error, binary_crossentropy, categorical_crossentropy
14. What is the difference between shallow and deep neural networks?
Answer:
- Shallow NN: 1 or 2 hidden layers
- Deep NN: More than 2 hidden layers
Deep networks can learn more complex representations.
15. What Python libraries are used for Deep Learning?
Answer:
- TensorFlow – Open-source DL framework by Google
- Keras – High-level API for TensorFlow
- PyTorch – Dynamic computation graph (by Facebook)
- OpenCV – For computer vision tasks
- NumPy, Pandas – For data processing
16. What is the difference between softmax and sigmoid activation functions?
Answer:
- Sigmoid: Outputs a value between 0 and 1, used for binary classification.
- Softmax: Outputs a probability distribution across multiple classes, used for multi-class classification.
from tensorflow.keras.layers import Activation
# For multi-class:
model.add(Dense(3, activation='softmax'))
17. How do you prevent exploding or vanishing gradients?
Answer:
- Use ReLU instead of sigmoid/tanh
- Use Batch Normalization
- Apply Gradient Clipping
- Use appropriate weight initialization like He or Xavier
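A rough Keras sketch combining these ideas (layer sizes and the clipping threshold are illustrative, not prescriptive):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam
model = Sequential([
    Dense(64, activation='relu', kernel_initializer='he_normal', input_shape=(20,)),
    BatchNormalization(),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer=Adam(clipnorm=1.0), loss='binary_crossentropy')  # clipnorm applies gradient clipping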
18. What is Batch Normalization and how is it implemented in Python?
Answer:
It normalizes the output of a layer to improve convergence speed and stability.
from tensorflow.keras.layers import BatchNormalization
model.add(BatchNormalization())
19. What is the function of different optimizers in Deep Learning?
Answer:
- SGD: Simple gradient descent
- Adam: Adaptive learning rates, fast and commonly used
- RMSprop: Good for RNNs
- Adagrad: Works well with sparse data
model.compile(optimizer='adam', ...)
20. What is weight initialization and why is it important?
Answer:
Proper weight initialization avoids problems like slow convergence or vanishing gradients.
Examples:
- He initialization for ReLU
- Xavier initialization for sigmoid/tanh
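In Keras this is set per layer via kernel_initializer; a minimal sketch (assuming a model like the ones above):
from tensorflow.keras.layers import Dense
model.add(Dense(64, activation='relu', kernel_initializer='he_normal'))      # He initialization
model.add(Dense(32, activation='tanh', kernel_initializer='glorot_uniform')) # Xavier (Glorot) initialization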
21. How do Convolutional Neural Networks (CNNs) work?
Answer:
CNNs use convolutional layers to extract spatial features from input (e.g., images) by sliding filters over it.
Layers: Conv2D → Activation → Pooling → Flatten → Dense
22. How do you define a CNN layer using Keras?
Answer:
from tensorflow.keras.layers import Conv2D, MaxPooling2D
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
23. What is padding in CNN and what are its types?
Answer:
Padding preserves the input size or adjusts feature map dimensions.
Types:
- Valid: No padding (output size reduces)
- Same: Adds padding so output size = input size
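In Keras, padding is passed to the convolutional layer; a short sketch (filter counts are arbitrary):
from tensorflow.keras.layers import Conv2D
model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))   # output size = input size
model.add(Conv2D(32, (3, 3), padding='valid', activation='relu'))  # output shrinks by kernel size - 1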
24. What is pooling and why is it used?
Answer:
Pooling reduces the spatial size of feature maps and computation.
- Max Pooling: Takes the max value
- Average Pooling: Takes the average
25. What is early stopping in deep learning?
Answer:
Early stopping stops training when validation loss stops improving, preventing overfitting.
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(patience=3)
model.fit(X, y, callbacks=[early_stop])
26. What is model checkpointing and how is it used?
Answer:
Model checkpointing saves the model after each epoch when performance improves.
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint("best_model.h5", save_best_only=True)
27. What are the steps to preprocess image data for CNN?
Answer:
- Resize all images to the same size
- Convert to array using img_to_array()
- Normalize pixel values (0 to 1)
- Augment data if needed
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)
28. What is a confusion matrix and how do you plot it in deep learning classification tasks?
Answer:
from sklearn.metrics import confusion_matrix
import seaborn as sns
y_pred = model.predict(X_test)                        # predicted class probabilities
cm = confusion_matrix(y_test, y_pred.argmax(axis=1))  # y_test holds the integer class labels
sns.heatmap(cm, annot=True)
It shows how well the model performs across different classes.
29. How do you calculate precision, recall, and F1-score in Python for deep learning models?
Answer:
from sklearn.metrics import precision_score, recall_score, f1_score
# y_pred must contain class labels, e.g. y_pred = model.predict(X_test).argmax(axis=1)
precision = precision_score(y_true, y_pred, average='macro')
recall = recall_score(y_true, y_pred, average='macro')
f1 = f1_score(y_true, y_pred, average='macro')
30. What is Transfer Learning and how is it implemented using Keras?
Answer:
Transfer Learning uses pre-trained models (e.g., VGG, ResNet) and fine-tunes them on your custom dataset.
from tensorflow.keras.applications import VGG16
base_model = VGG16(include_top=False, input_shape=(224, 224, 3))
for layer in base_model.layers:
    layer.trainable = False
31. What is an RNN and when should you use it?
Answer:
RNN (Recurrent Neural Network) is used for sequential data like time series or text. It maintains memory of previous inputs through hidden states.
Use cases: Sentiment analysis, text generation, stock prediction.
32. How do you define an RNN in Keras?
Answer:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
model = Sequential([
    SimpleRNN(64, input_shape=(timesteps, features), return_sequences=False),
    Dense(1, activation='sigmoid')
])
33. What is the vanishing gradient problem and how is it solved?
Answer:
In deep RNNs, gradients become too small during backpropagation, causing slow learning.
Solutions:
- Use LSTM or GRU
- Apply ReLU activations
- Use gradient clipping
34. What is the difference between LSTM and GRU?
Answer:
Feature     | LSTM                         | GRU
Gates       | 3 (input, forget, output)    | 2 (reset, update)
Memory      | Separate memory cell         | Combined hidden state
Performance | Often slightly more accurate | Faster to train
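A GRU layer is a near drop-in replacement for an LSTM in Keras (timesteps and features assumed defined as elsewhere in this guide):
from tensorflow.keras.layers import GRU
model.add(GRU(64, input_shape=(timesteps, features)))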
35. How do you implement an LSTM layer in Keras?
Answer:
from tensorflow.keras.layers import LSTM
model.add(LSTM(64, input_shape=(timesteps, features)))
36. What are embeddings in NLP and how are they used in deep learning?
Answer:
Embeddings are dense vector representations of text. They capture semantic relationships between words.
from tensorflow.keras.layers import Embedding
model.add(Embedding(input_dim=10000, output_dim=128, input_length=100))
37. What is the role of return_sequences and return_state in RNNs/LSTMs?
Answer:
- return_sequences=True: returns the output for each timestep (for stacked RNNs)
- return_state=True: returns the final hidden state (for decoding or transfer learning)
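For example, stacking two LSTM layers requires the first one to return its full sequence (timesteps and features assumed defined):
from tensorflow.keras.layers import LSTM
model.add(LSTM(64, return_sequences=True, input_shape=(timesteps, features)))  # one output per timestep
model.add(LSTM(32))  # return_sequences=False by default: only the final output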
38. What is categorical crossentropy and when is it used?
Answer:
Used for multi-class classification where labels are one-hot encoded. It measures the difference between predicted probabilities and actual classes.
model.compile(loss='categorical_crossentropy', optimizer='adam')
39. How do you apply learning rate scheduling in training deep models?
Answer:
from tensorflow.keras.callbacks import LearningRateScheduler
def lr_schedule(epoch):
    return 0.001 * (0.1 ** (epoch // 10))  # drop the rate by 10x every 10 epochs
scheduler = LearningRateScheduler(lr_schedule)
It dynamically adjusts the learning rate during training.
40. How do you visualize the training progress of a model?
Answer:
import matplotlib.pyplot as plt
history = model.fit(...)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
Shows training/validation loss or accuracy over epochs.
41. What is model overfitting and how can you detect it from training curves?
Answer:
If training accuracy is high and validation accuracy is low, the model is overfitting.
A gap between training and validation loss curves is a strong indicator.
42. What is model fine-tuning and how is it done in Keras?
Answer:
Fine-tuning involves unfreezing some layers of a pre-trained model and retraining it on a new dataset.
for layer in model.layers[-5:]:
    layer.trainable = True
43. What is the difference between functional and sequential API in Keras?
Answer:
- Sequential API: For linear stack of layers.
- Functional API: For building non-linear, multi-input/output models.
# Functional API example
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
inputs = Input(shape=(32,))
x = Dense(64)(inputs)
output = Dense(1)(x)
model = Model(inputs=inputs, outputs=output)
44. What is a custom loss function in Keras and how do you define it?
Answer:
import tensorflow.keras.backend as K
def custom_loss(y_true, y_pred):
    return K.mean(K.square(y_true - y_pred), axis=-1)
model.compile(loss=custom_loss, optimizer='adam')
Used when built-in loss functions aren’t sufficient.
45. What is data augmentation and how do you use it for training image models?
Answer:
Data augmentation artificially expands your dataset with transformations (rotation, flip, zoom).
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=20, zoom_range=0.2, horizontal_flip=True)
Helps reduce overfitting and improve generalization.
46. What is the Attention Mechanism in deep learning, and how is it implemented?
Answer:
The attention mechanism allows a model to focus on relevant parts of the input sequence during output generation. It computes weighted importance (attention scores) for each input token.
Implementation:
import tensorflow as tf
def scaled_dot_product_attention(q, k, v):
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(d_k)
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    return tf.matmul(attention_weights, v)
47. What are Transformers and why are they important?
Answer:
Transformers are sequence models that replace RNNs and CNNs with self-attention and positional encoding. They allow full parallelization over the sequence and outperform earlier architectures on many NLP and vision tasks.
Popular models: BERT, GPT, ViT (Vision Transformer)
48. What are Generative Adversarial Networks (GANs)?
Answer:
GANs consist of two networks:
- Generator: Produces fake data
- Discriminator: Distinguishes real from fake
They are trained adversarially:
# Pseudo-code
for step in training_steps:
    train_discriminator()
    train_generator()
Used in image synthesis, super-resolution, and deepfake generation.
49. What is Layer Normalization and how does it differ from Batch Normalization?
Answer:
- BatchNorm normalizes across a batch
- LayerNorm normalizes across features for a single sample
Useful in Transformer models
from tensorflow.keras.layers import LayerNormalization
model.add(LayerNormalization())
50. How does BERT work and how is it fine-tuned for a classification task?
Answer:
BERT is a Transformer-based language model pre-trained on masked language modeling and next sentence prediction.
Implementation to fine-tune:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))  # BERT outputs logits
51. What is multi-head attention and why is it useful?
Answer:
It allows the model to learn different attention patterns in parallel, capturing diverse relationships.
from tensorflow.keras.layers import MultiHeadAttention
attention = MultiHeadAttention(num_heads=4, key_dim=64)
output = attention(query, value, key)
52. How do you implement a custom callback in Keras for monitoring metrics?
Answer:
from tensorflow.keras.callbacks import Callback
class CustomLogger(Callback):
    def on_epoch_end(self, epoch, logs=None):
        print(f"Epoch {epoch}: Val Loss = {logs['val_loss']:.4f}")
Used for early stopping, custom logging, or model behavior control.
53. What is label smoothing and how is it implemented?
Answer:
Label smoothing replaces 1-hot encoded labels (0s and 1s) with soft values (e.g., 0.9 and 0.1) to prevent overconfidence.
model.compile(loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1), optimizer='adam')
54. How can you deploy a deep learning model using TensorFlow Serving?
Answer:
- Export the model:
model.save('my_model')
- Start TensorFlow Serving:
tensorflow_model_server --rest_api_port=8501 --model_name=my_model --model_base_path=/models/my_model
- Make predictions via REST API using curl or requests.
55. What is model quantization and when is it used?
Answer:
Model quantization reduces model size and speeds up inference by converting weights from float32 to int8.
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
Used in mobile and edge deployments.
56. Explain knowledge distillation in deep learning.
Answer:
A student model learns to mimic a teacher model’s soft labels (probability distribution).
Improves smaller model accuracy with fewer parameters.
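A minimal sketch of a distillation loss (the temperature, weighting, and function name here are illustrative, not a standard API):
import tensorflow as tf
def distillation_loss(y_true, student_logits, teacher_logits, temperature=3.0, alpha=0.1):
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)       # softened teacher probabilities
    soft_student = tf.nn.softmax(student_logits / temperature)       # softened student probabilities
    kd = tf.keras.losses.KLDivergence()(soft_teacher, soft_student)  # match the teacher's distribution
    ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(y_true, student_logits)
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kd        # blend hard-label and soft-label terms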
57. What is a Vision Transformer (ViT) and how does it differ from CNNs?
Answer:
ViTs treat images as patch sequences, like tokens in NLP.
They do not use convolution but apply self-attention to image patches.
With sufficient data and compute, ViTs can match or outperform CNNs.
58. How do you implement gradient clipping in Keras?
Answer:
from tensorflow.keras.optimizers import Adam
opt = Adam(clipnorm=1.0) # or clipvalue
model.compile(optimizer=opt, loss='mse')
Prevents exploding gradients during backpropagation.
59. What is class imbalance and how do you handle it in loss function?
Answer:
Class imbalance occurs when one class has significantly more samples than others. This can cause the model to become biased toward the majority class and perform poorly on the minority class.
Solution 1: Using class_weight in model training
You can assign higher weights to the minority class during training using the class_weight argument:
model.fit(X_train, y_train, class_weight={0: 1.0, 1: 3.0}) # Class 1 is weighted more
This tells the model to pay more attention to class 1 during optimization.
Solution 2: Using Focal Loss
Focal Loss is especially useful when class imbalance is severe. It reduces the loss contribution from well-classified examples and focuses more on hard, misclassified examples.
Formula:
FL(pt) = -αt * (1 - pt)^γ * log(pt)
Where:
- pt is the predicted probability of the true class
- αt balances importance of classes
- γ focuses more on hard-to-classify examples
Focal Loss Implementation (TensorFlow/Keras)
import tensorflow as tf
from tensorflow.keras import backend as K

def focal_loss(gamma=2., alpha=0.25):
    def loss(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1. - K.epsilon())
        pt = tf.where(K.equal(y_true, 1), y_pred, 1 - y_pred)
        return -K.mean(alpha * K.pow(1. - pt, gamma) * K.log(pt))
    return loss

# Usage
model.compile(optimizer='adam', loss=focal_loss(gamma=2., alpha=0.25), metrics=['accuracy'])
60. What is the difference between static and dynamic computation graphs?
Answer:
- Static (TensorFlow 1.x): Graph is compiled before execution.
- Dynamic (PyTorch): Graph is created on the fly during runtime.
Dynamic graphs are easier to debug and write using native Python control flows.
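In TensorFlow 2 both styles coexist, which makes for a quick illustration: eager code runs line by line like a dynamic graph, while @tf.function traces it into a static graph first.
import tensorflow as tf
def eager_square(x):
    return x * x           # executes immediately, easy to debug with print()
@tf.function
def graph_square(x):
    return x * x           # traced and compiled into a graph before running
x = tf.constant(3.0)
print(eager_square(x), graph_square(x))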