Understanding network architecture from Keras code

My data is 68871 x 43: columns 1-43 hold the features, and each label is represented as a 1x21 vector.

In my keras code:

  • print trainX.shape -----> (41311, 10, 43)
  • print trainY.shape -----> (41311, 21)
  • print testX.shape ------> (27538, 10, 43)
  • print testY.shape ------> (27538, 21)

When I run the following keras code:

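from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# Argument names (input_dim, nb_epoch) follow the Keras 1.x API
model = Sequential()
model.add(LSTM(10, input_dim=43))  # 10 LSTM units, 43 features per timestep
model.add(Dropout(0.3))
model.add(Dense(21, activation='softmax'))  # one output per label class
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# batch_size=1: each weight update uses one sample of shape (10, 43)
model.fit(trainX, trainY, validation_split=0.20, nb_epoch=1, batch_size=1, shuffle=False)
scores = model.evaluate(testX, testY, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))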

The code executes like this: [screenshot of the training output]

My understanding of Keras is that if the 3D tensor trainX has the shape (41311, 10, 43), then Keras should create an LSTM with 10 timesteps. If so, why are the samples in the screenshot run one by one? With 10 timesteps, I expected it to run a batch of 10 samples, get 10 predictions, and then run the next batch of 10 samples.

Can anyone explain why the screenshot shows the samples being run one by one when the number of timesteps is 10?


By "sample", Keras means a numpy array of shape (10, 43). The first dimension of trainX (41311) is your number of samples.
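To make the three axes concrete, here is a minimal sketch using a small stand-in array. The name `X` and the choice of 5 samples are assumptions for illustration only; the real trainX has 41311 samples.

```python
import numpy as np

# Hypothetical stand-in for trainX: 5 samples instead of 41311
X = np.zeros((5, 10, 43), dtype="float32")

n_samples = X.shape[0]   # axis 0: number of samples
sample = X[0]            # one sample: a sequence of 10 timesteps
timestep = X[0][0]       # one timestep: a vector of 43 features

print(n_samples)         # 5
print(sample.shape)      # (10, 43)
print(timestep.shape)    # (43,)
```

The timestep axis lives inside each sample, so it does not affect how many samples go into a batch.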

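So what the network is doing is: 1) split the input into batches of shape (batch_size, 10, 43); with batch_size=1, as in your code, each batch holds exactly one sample

2) feed the LSTM one sample at a time. A sample is a sequence of 10 events, each a vector of length 43, so each sequence counts as 1 sample. The 10 timesteps are the length of each sequence, not the number of samples in a batch.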
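The two steps above can be sketched in plain Python. This is only an illustration of how batch shapes follow from batch_size, not Keras's actual implementation; the helper names `batch_shapes` and `batches_per_epoch` are hypothetical, and the effect of validation_split is ignored for simplicity.

```python
import math

def batch_shapes(n_samples, timesteps, n_features, batch_size):
    """Yield the shape of each batch sliced from an array of
    shape (n_samples, timesteps, n_features)."""
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        yield (end - start, timesteps, n_features)

def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)

N, TIMESTEPS, FEATURES = 41311, 10, 43

# batch_size=1 (as in the question): one sequence per batch
first_bs1 = next(batch_shapes(N, TIMESTEPS, FEATURES, batch_size=1))
# batch_size=10: ten sequences per batch, each still 10 timesteps long
first_bs10 = next(batch_shapes(N, TIMESTEPS, FEATURES, batch_size=10))

print(first_bs1)                  # (1, 10, 43)
print(first_bs10)                 # (10, 10, 43)
print(batches_per_epoch(N, 1))    # 41311
print(batches_per_epoch(N, 10))   # 4132
```

Under these assumptions, getting 10 samples processed together is a matter of passing batch_size=10 to fit; the timestep count of 10 stays the same either way.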

Is that clear?