A hardware and software co-design system with a mixed-precision algorithm and a compute-in-memory (CIM)-based accelerator includes a memory, a processor and the CIM-based accelerator. The memory stores a plurality of sets of initial weight parameters of a pre-trained model and a plurality of sets of input parameters. The processor is electrically connected to the memory and is configured to perform operations including an initial weight obtaining operation, a pruning quantization joint training operation and a mixed-precision quantization operation. The initial weight obtaining operation includes obtaining the sets of initial weight parameters of the pre-trained model from the memory. The pruning quantization joint training operation includes performing a pruning procedure on the sets of initial weight parameters to generate a plurality of sets of pruned weights. The mixed-precision quantization operation includes performing filter-wise mixed-precision quantization training on a plurality of non-zero weights of the sets of pruned weights to generate a plurality of filter weights with different bit widths, pairing the filter weights to generate a plurality of paired filter weight groups, and mixing the paired filter weight groups to generate a plurality of mixed-precision weights. The CIM-based accelerator is electrically connected to the memory and the processor, receives the mixed-precision weights and the sets of input parameters, and performs a CIM operation on the mixed-precision weights and the sets of input parameters to generate a plurality of CIM outputs. Therefore, the hardware and software co-design system with the mixed-precision algorithm and the CIM-based accelerator of the present disclosure can enable full-scale computation for mixed-precision networks and enhance utilization rates and computational speed.
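The pruning, filter-wise bit-width assignment and filter-pairing steps described above can be sketched as follows. This is an illustrative sketch only: the function names, the magnitude-based pruning criterion, the L1-norm bit-assignment policy and the high-bit/low-bit pairing rule are assumptions for exposition, not details taken from the present disclosure.

```python
import numpy as np

def prune(weights, sparsity=0.5):
    # Magnitude pruning (assumed criterion): zero out the
    # smallest-magnitude weights until the target sparsity is reached.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    thresh = np.partition(flat, k)[k] if k > 0 else 0.0
    return np.where(np.abs(weights) >= thresh, weights, 0.0)

def quantize_filter(w, bits):
    # Uniform symmetric quantization of one filter's weights
    # to the given bit width; zeros stay zero.
    maxv = np.max(np.abs(w))
    scale = maxv / (2 ** (bits - 1) - 1) if maxv > 0 else 1.0
    return np.round(w / scale) * scale

def assign_bits(filters, low=4, high=8):
    # Assumed policy: filters with larger L1 norm are deemed more
    # sensitive and receive the higher bit width.
    norms = [np.abs(f).sum() for f in filters]
    median = np.median(norms)
    return [high if n >= median else low for n in norms]

def pair_filters(bit_widths):
    # Pair a high-bit filter with a low-bit filter so each paired
    # group has a balanced total bit budget for the CIM macro.
    order = np.argsort(bit_widths, kind="stable")
    return [(int(order[i]), int(order[-1 - i]))
            for i in range(len(order) // 2)]

# Toy example: 4 filters of shape 3x3.
w = np.arange(1.0, 37.0).reshape(4, 3, 3)
pruned = prune(w, sparsity=0.5)
bits = assign_bits(list(pruned))
pairs = pair_filters(bits)
mixed = [quantize_filter(pruned[i], b) for i, b in enumerate(bits)]
```

Each pair in `pairs` groups one higher-precision filter with one lower-precision filter; `mixed` then holds the mixed-precision weights that would be loaded into the CIM-based accelerator together with the input parameters.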