A block-based inference method for a memory-efficient convolutional neural network implementation processes an input image. A block-based inference step executes a multi-layer convolution operation on each of a plurality of input block data to generate an output block data. The step includes: selecting a plurality of ith layer recomputing features according to a position of the output block data along a scanning line feed direction; selecting an ith layer recomputing input feature block data according to the position of the output block data and the ith layer recomputing features; selecting a plurality of ith layer reusing features according to the ith layer recomputing input feature block data along a block scanning direction; and combining the ith layer recomputing input feature block data with the ith layer reusing features to generate an ith layer reusing input feature block data.
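The interplay of recomputing and reusing can be illustrated with a minimal NumPy sketch. It is not the claimed implementation, and it rests on several assumptions that the abstract does not state: a single channel, 3x3 kernels with no padding, bias, or activation, the vertical axis as the scanning line feed direction, and the horizontal axis as the block scanning direction. The names `conv_valid`, `full_inference`, `block_inference`, `block_h`, and `block_w` are hypothetical. Along the line feed direction, each strip of blocks re-reads overlapping input rows, so the intermediate features there are recomputed. Along the block scanning direction, the last two columns of each layer's input are kept from the previous block and combined with the newly computed columns.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_valid(x, k):
    """'Valid' 2-D cross-correlation of a single-channel array x with kernel k."""
    kh, kw = k.shape
    if x.shape[0] < kh or x.shape[1] < kw:
        # Input too small for even one window: return an empty feature map.
        return np.zeros((max(x.shape[0] - kh + 1, 0), max(x.shape[1] - kw + 1, 0)))
    windows = sliding_window_view(x, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, k)


def full_inference(image, kernels):
    """Reference result: run every layer on the whole image at once."""
    feat = image
    for k in kernels:
        feat = conv_valid(feat, k)
    return feat


def block_inference(image, kernels, block_h, block_w):
    """Block-by-block inference (hypothetical sketch, 3x3 kernels assumed)."""
    halo = 2 * len(kernels)  # rows consumed by the stacked 3x3 layers
    out_h = image.shape[0] - halo
    out_rows = []
    # Line feed direction (assumed vertical): strips overlap by `halo` input
    # rows, so features in the overlap are recomputed rather than stored.
    for r0 in range(0, out_h, block_h):
        r1 = min(r0 + block_h, out_h)
        strip = image[r0:r1 + halo]
        caches = [None] * len(kernels)  # per-layer reused columns
        out_blocks = []
        # Block scanning direction (assumed horizontal): reuse cached columns.
        for c0 in range(0, image.shape[1], block_w):
            feat = strip[:, c0:c0 + block_w]  # newly fetched input columns
            for i, k in enumerate(kernels):
                if caches[i] is not None:
                    # Combine reused columns with the newly computed block.
                    feat = np.concatenate([caches[i], feat], axis=1)
                caches[i] = feat[:, -2:]  # keep the last 2 columns for reuse
                feat = conv_valid(feat, k)
            out_blocks.append(feat)
        out_rows.append(np.concatenate(out_blocks, axis=1))
    return np.concatenate(out_rows, axis=0)


rng = np.random.default_rng(0)
image = rng.standard_normal((12, 17))
kernels = [rng.standard_normal((3, 3)) for _ in range(3)]
blockwise = block_inference(image, kernels, block_h=4, block_w=5)
print(np.allclose(blockwise, full_inference(image, kernels)))
```

In this sketch only one strip of input rows and two cached columns per layer are held at a time, which is the kind of memory saving that block-based inference aims for. The price is the repeated work on the overlapping rows between strips.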