A method and an apparatus for peer-to-peer energy sharing based on reinforcement learning are provided. The method includes the following steps: uploading the trading electricity for a future time slot to a coordinator device, and receiving global trading information that the coordinator device obtains by integrating the trading electricity uploaded by each user device; defining multiple power states from the global trading information, the user device's own electricity information, and an internal power price, and estimating the electricity cost of arranging the trading electricity under each power state to generate a Q table; establishing a planning model from the global trading information, estimating the electricity costs of arranging the trading electricity over multiple time slots under each power state, and using these estimates to update the Q table; and using the Q table to predict the trading electricity suitable for the current power state, then uploading it to the coordinator device for trading.
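The combination of a cost-based Q table with a planning model that refines it resembles the general Dyna-Q pattern. The following is a minimal sketch under that assumption only, not the claimed method. It assumes that a power state is a hashable tuple of discretized features (for example, binned global trading amount, own net load, and internal price level) and that the trading electricity is chosen from a small set of discrete levels. The class `DynaQTrader`, the `ACTIONS` levels, and the parameters `alpha`, `gamma`, `epsilon`, and `planning_steps` are all hypothetical. Because electricity cost is minimized, the greedy choice takes the action with the lowest Q value. The exchange with the coordinator device is not modeled.

```python
import random
from collections import defaultdict

# Hypothetical discrete trading electricity levels (kWh): positive = buy, negative = sell.
ACTIONS = [-2, -1, 0, 1, 2]


class DynaQTrader:
    """Cost-minimizing Dyna-Q sketch: direct Q updates plus model-based planning.

    Hypothetical illustration; states are assumed to be hashable tuples of
    discretized power-state features.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, planning_steps=10, seed=0):
        self.q = defaultdict(float)  # Q[(state, action)] = estimated cumulative cost
        self.model = {}              # model[(state, action)] = (cost, next_state)
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.planning_steps = epsilon, planning_steps
        self.rng = random.Random(seed)

    def best_action(self, state):
        # Costs are minimized, so the greedy action has the lowest Q value.
        return min(ACTIONS, key=lambda a: self.q[(state, a)])

    def choose_action(self, state):
        # Epsilon-greedy exploration over the trading levels.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.best_action(state)

    def _update(self, s, a, cost, s_next):
        # One-step Q-learning target for cost minimization.
        target = cost + self.gamma * min(self.q[(s_next, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def learn(self, s, a, cost, s_next):
        # 1) Direct update from the cost observed for one time slot.
        self._update(s, a, cost, s_next)
        # 2) Record the transition in a simple deterministic planning model.
        self.model[(s, a)] = (cost, s_next)
        # 3) Planning: replay simulated transitions to refine the Q table.
        keys = list(self.model)
        for _ in range(self.planning_steps):
            ps, pa = self.rng.choice(keys)
            pcost, pnext = self.model[(ps, pa)]
            self._update(ps, pa, pcost, pnext)


if __name__ == "__main__":
    # Toy usage: one fixed power state with a hypothetical cost (a - 1)^2,
    # so trading 1 kWh is the cheapest level.
    agent = DynaQTrader()
    state = (0, 0, 0)
    for _ in range(200):
        a = agent.choose_action(state)
        agent.learn(state, a, cost=(a - 1) ** 2, s_next=state)
    print(agent.best_action(state))
```

In this toy setup every cost is non-negative and the level `1` costs nothing, so its Q value stays at zero and the greedy prediction settles on `1`. A real deployment would derive the cost from the global trading information and the internal power price, and would not use an arbitrary quadratic.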