Patent Licensing Area
Patent Title (Chinese): 適用於神經網路運算的處理器
Patent Title (English): PROCESSOR FOR NEURAL NETWORK OPERATION
Patent Family: Taiwan (R.O.C.): I782328
United States: 2021-0173648 (publication number)
Patentee: National Tsing Hua University (100%)
Inventors: 呂仁碩, 羅允辰, 郭宇鈞, 張耘盛, 黃健皓, 吳潤身, 丁文謙, 溫戴興
Technical Fields: Information Engineering; Electrical and Electronic Engineering
Patent Abstract (English)
Therefore, an object of the disclosure is to provide a processor adapted for neural network operation. The processor can have the advantages of both the conventional VP architecture and the conventional PE architecture. According to the disclosure, the processor includes a scratchpad memory, a processor core, a neural network accelerator and an arbitration unit (such as a multiplexer unit). The scratchpad memory is configured to store to-be-processed data and multiple kernel maps of a neural network model, and has a memory interface. The processor core is configured to issue core-side read/write instructions (such as load and store instructions) that conform with the memory interface to access the scratchpad memory. The neural network accelerator is electrically coupled to the processor core and the scratchpad memory, and is configured to issue accelerator-side read/write instructions, which also conform with the memory interface, to access the scratchpad memory, acquiring the to-be-processed data and the kernel maps in order to perform a neural network operation on the to-be-processed data based on the kernel maps. The arbitration unit is electrically coupled to the processor core, the neural network accelerator and the scratchpad memory, and permits one of the processor core and the neural network accelerator to access the scratchpad memory.

Another object of the disclosure is to provide a neural network accelerator for use in a processor of this disclosure. The processor includes a scratchpad memory storing to-be-processed data and multiple kernel maps of a convolutional neural network (CNN) model. According to the disclosure, the neural network accelerator includes an operation circuit, a partial-sum memory, and a scheduler. The operation circuit is to be electrically coupled to the scratchpad memory.
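As a rough behavioral sketch only (not the patented circuit; the names `Scratchpad` and `Arbiter` are invented for illustration), the arbitration unit's role of granting the scratchpad's single memory interface to exactly one of the processor core and the accelerator can be modeled like this:

```python
# Illustrative software model of the arbitration described above.
# All class and method names are assumptions, not identifiers from the patent.

class Scratchpad:
    """A flat word-addressable memory with a single read/write interface."""
    def __init__(self, size):
        self.mem = [0] * size

    def read(self, addr):
        return self.mem[addr]

    def write(self, addr, value):
        self.mem[addr] = value


class Arbiter:
    """Permits one of 'core' and 'accelerator' to own the memory port,
    like the multiplexer-style arbitration unit in the abstract."""
    def __init__(self, scratchpad):
        self.scratchpad = scratchpad
        self.granted = "core"                  # current owner of the port

    def grant(self, requester):
        assert requester in ("core", "accelerator")
        self.granted = requester

    def read(self, requester, addr):
        if requester != self.granted:
            raise PermissionError(f"{requester} does not own the port")
        return self.scratchpad.read(addr)

    def write(self, requester, addr, value):
        if requester != self.granted:
            raise PermissionError(f"{requester} does not own the port")
        self.scratchpad.write(addr, value)


spm = Scratchpad(16)
arb = Arbiter(spm)
arb.write("core", 0, 42)           # core stores to-be-processed data
arb.grant("accelerator")           # hand the port over to the accelerator
print(arb.read("accelerator", 0))  # -> 42
```

Because both sides issue instructions that conform with the same memory interface, the arbiter only has to select which requester is wired through; it never translates requests.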
The partial-sum memory is electrically coupled to the operation circuit. The scheduler is electrically coupled to the partial-sum memory, and is to be electrically coupled to the scratchpad memory. When the neural network accelerator performs a convolution operation for an nth layer (n is a positive integer) of the CNN model, the to-be-processed data is the nth-layer input data, and the following actions are performed:
(1) the operation circuit receives, from the scratchpad memory, the to-be-processed data and the nth-layer kernel maps, which are those of the kernel maps that correspond to the nth layer, and performs, for each of the nth-layer kernel maps, multiple dot product operations of the convolution operation on the to-be-processed data and the nth-layer kernel map;
(2) the partial-sum memory is controlled by the scheduler to store intermediate calculation results that are generated by the operation circuit during the dot product operations; and
(3) the scheduler controls data transfer between the scratchpad memory and the operation circuit, and between the operation circuit and the partial-sum memory, in such a way that the operation circuit performs the convolution operation on the to-be-processed data and the nth-layer kernel maps so as to generate multiple nth-layer output feature maps that respectively correspond to the nth-layer kernel maps, after which the operation circuit provides the nth-layer output feature maps to the scratchpad memory for storage therein.

Yet another object is to provide a scheduler circuit for use in a neural network accelerator of this disclosure. The neural network accelerator is electrically coupled to a scratchpad memory of a processor. The scratchpad memory stores to-be-processed data and multiple kernel maps of a convolutional neural network (CNN) model.
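The data flow of actions (1)–(3) can be sketched in software as a convolution whose intermediate dot products live in a separate partial-sum store before the finished feature maps are written back. This is an illustrative model only: the 1-D simplification and the function name are assumptions, not anything specified in the disclosure.

```python
# Illustrative sketch of the partial-sum data flow, not the patented circuit.

def conv1d_with_partial_sums(input_data, kernel_maps):
    """1-D 'valid' convolution of input_data with each kernel map.

    partial_sum_memory holds intermediate accumulations across dot-product
    steps, mirroring the role of the accelerator's partial-sum memory;
    only the finished feature maps are 'written back' to the caller,
    like the write-back to the scratchpad memory.
    """
    out_len = len(input_data) - len(kernel_maps[0]) + 1
    output_feature_maps = []
    for kernel in kernel_maps:                 # one map per nth-layer kernel
        partial_sum_memory = [0] * out_len     # intermediate results
        for tap, weight in enumerate(kernel):  # stream one weight at a time
            for i in range(out_len):
                partial_sum_memory[i] += weight * input_data[i + tap]
        output_feature_maps.append(partial_sum_memory)  # write-back
    return output_feature_maps


data = [1, 2, 3, 4]
kernels = [[1, 0], [1, 1]]
print(conv1d_with_partial_sums(data, kernels))  # -> [[1, 2, 3], [3, 5, 7]]
```

Streaming one weight at a time is what makes a dedicated partial-sum store useful: each output element is touched once per weight, so its running sum must persist between passes.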
The neural network accelerator is configured to acquire the to-be-processed data and the kernel maps from the scratchpad memory so as to perform a neural network operation on the to-be-processed data based on the kernel maps. According to the disclosure, the scheduler includes multiple counters, each of which includes a register to store a counter value, a reset input terminal, a reset output terminal, a carry-in terminal, and a carry-out terminal. The counter values stored in the registers of the counters are related to memory addresses of the scratchpad memory where the to-be-processed data and the kernel maps are stored. Each of the counters is configured to: upon receipt of an input trigger at its reset input terminal, set the counter value to an initial value, set the output signal at its carry-out terminal to a disabling state, and generate an output trigger at its reset output terminal; increment the counter value when the input signal at its carry-in terminal is in an enabling state, and stop incrementing when that input signal is in the disabling state; set the output signal at its carry-out terminal to the enabling state when the counter value has reached a predetermined upper limit; and generate the output trigger at its reset output terminal when the counter value overflows past the predetermined upper limit and wraps around to the initial value.
The counters have a tree-structured connection in terms of connections among the reset input terminals and the reset output terminals of the counters, wherein, for any two of the counters that have a parent-child relationship in the tree-structured connection, the reset output terminal of one of the counters that serves as a parent node is electrically coupled to the reset input terminal of the other one of the counters that serves as a child node. The counters have a chain-structured connection in terms of connections among the carry-in terminals and the carry-out terminals of the counters, and the chain-structured connection is a post-order traversal of the tree-structured connection, wherein, for any two of the counters that are coupled together in series in the chain-structured connection, the carry-out terminal of one of the counters is electrically coupled to the carry-in terminal of the other one of the counters.
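A heavily simplified software model can illustrate the intent: with carry signals chained in post-order and resets propagating down the tree, the counters behave like nested loop indices that sweep scratchpad addresses. The two-counter example below (one parent, one child) and all names in it are assumptions for illustration; real counters are hardware registers with explicit terminals.

```python
# Illustrative model of the scheduler's counters, not the patented circuit.

class Counter:
    """A wrap-around counter with tree-structured reset fan-out."""
    def __init__(self, upper):
        self.upper = upper          # predetermined upper limit
        self.value = 0              # counter value (initial value 0)
        self.children = []          # reset-output -> reset-input tree edges

    def reset(self):                # input trigger at the reset input
        self.value = 0
        for child in self.children: # output trigger propagates down the tree
            child.reset()

    def tick(self):
        """Advance by one; return True (carry-out) on wrap to the initial value."""
        self.value += 1
        if self.value > self.upper:
            self.value = 0          # overflow wraps to the initial value
            return True
        return False


inner = Counter(upper=2)            # e.g. kernel-element index 0..2
outer = Counter(upper=1)            # e.g. kernel-map index 0..1
outer.children = [inner]            # tree: resetting the parent resets the child
chain = [inner, outer]              # carry chain in post-order: child before parent

addresses = []
outer.reset()
for _ in range(6):                  # 2 x 3 address pairs in total
    addresses.append((outer.value, inner.value))
    carry = True                    # advance the chain once per cycle
    for c in chain:                 # carry ripples from inner to outer
        if carry:
            carry = c.tick()
print(addresses)  # -> [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

The post-order carry chain is what makes the innermost counter step fastest, so the swept counter values enumerate addresses in the same order a software nested loop would.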
Contact Information
Contact Person: 李曉琪
Phone: 03-5715131 #31061
Email: hsiaochi@mx.nthu.edu.tw