| A hardware and software co-design system with a mixed-precision algorithm and a computing-in-memory (CIM)-based accelerator includes a memory, a processor, and the CIM-based accelerator. The processor performs operations including: obtaining a plurality of sets of initial weight parameters of a pre-trained model from the memory; performing a pruning procedure on the sets of initial weight parameters to generate a plurality of sets of pruned weights; performing filter-wise mixed-precision quantization training on a plurality of non-zero weights of the sets of pruned weights to generate a plurality of filter weights with different bit widths; pairing the filter weights to generate a plurality of paired filter weight groups; and mixing the paired filter weight groups to generate a plurality of mixed-precision weights. The CIM-based accelerator performs a CIM operation on the mixed-precision weights and a plurality of sets of input parameters to generate a plurality of CIM outputs. |
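
The processor-side flow described above (prune, quantize each filter to its own bit width, pair, mix) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: magnitude-threshold pruning, symmetric post-training rounding standing in for quantization training, pairing the widest filter with the narrowest so each pair has the same total bit width, and mixing by packing each weight pair into one word. The function names `prune`, `quantize_filter`, `pair_by_bit_width`, and `mix_pair`, the example weights, and the bit widths are all hypothetical.

```python
from typing import List, Tuple


def prune(filters: List[List[float]], threshold: float) -> List[List[float]]:
    """Assumed pruning rule: zero every weight whose magnitude is below threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in f] for f in filters]


def quantize_filter(f: List[float], bits: int) -> List[int]:
    """Round one filter to signed `bits`-bit integers (a stand-in for quantization training).

    Pruned (zero) weights stay zero; only non-zero weights are scaled and rounded.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max((abs(w) for w in f), default=0.0)
    if peak == 0.0 or qmax == 0:
        return [0 for _ in f]
    scale = peak / qmax
    return [round(w / scale) if w != 0.0 else 0 for w in f]


def pair_by_bit_width(bit_widths: List[int]) -> List[Tuple[int, int]]:
    """Assumed pairing rule: match the widest filter with the narrowest, and so on.

    Returns (wide_index, narrow_index) tuples; with an odd count the middle
    filter is left unpaired.
    """
    order = sorted(range(len(bit_widths)), key=lambda i: bit_widths[i])
    return [(order[-1 - k], order[k]) for k in range(len(order) // 2)]


def mix_pair(q_wide: List[int], bits_wide: int,
             q_narrow: List[int], bits_narrow: int) -> List[int]:
    """Assumed mixing rule: pack each weight pair into one word (two's complement fields)."""
    mask_wide = (1 << bits_wide) - 1
    mask_narrow = (1 << bits_narrow) - 1
    return [((a & mask_wide) << bits_narrow) | (b & mask_narrow)
            for a, b in zip(q_wide, q_narrow)]


# Hypothetical example: four filters of three weights each, with per-filter bit widths.
filters = [[0.9, -0.05, 0.4], [0.02, -0.7, 0.3], [0.5, 0.5, -0.01], [-0.2, 0.03, 0.8]]
bit_widths = [6, 2, 5, 3]

pruned = prune(filters, threshold=0.1)
quantized = [quantize_filter(f, b) for f, b in zip(pruned, bit_widths)]
pairs = pair_by_bit_width(bit_widths)
mixed = [mix_pair(quantized[i], bit_widths[i], quantized[j], bit_widths[j])
         for i, j in pairs]
```

With these assumed bit widths, each pair totals 8 bits (6 + 2 and 5 + 3), so every packed word in `mixed` fits in one byte. A uniform per-pair width is one plausible reason to pair filters before mapping them onto a CIM array, but the abstract does not state the actual pairing or mixing criterion.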