A method and an apparatus for reinforcement-learning-based energy bidding are provided, adapted for an energy aggregator to determine an energy supply configuration between multiple energy suppliers and multiple energy demanders. In the method, a supply amount of each energy supplier and a demand amount of each energy demander are acquired. A total demand amount of the energy demanders is calculated and reported to each energy supplier, and a total supply amount of the energy suppliers is calculated and reported to each energy demander. An electricity purchase quotation, determined by each energy demander according to its own demand amount and the total supply amount, and an electricity sale quotation, determined by each energy supplier according to its own supply amount and the total demand amount, are then received. A linear programming method is adopted to determine the energy supply configuration between the energy suppliers and the energy demanders according to the acquired supply and demand amounts and the received quotations.
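The aggregator's steps can be illustrated with a minimal sketch. The abstract does not state the objective or constraints of the linear program, so the formulation here is an assumption: maximize the total surplus, the sum over all pairs of (purchase quotation minus sale quotation) times the allocated energy, subject to each supplier's supply amount and each demander's demand amount. Under this assumed objective the optimum is reached by pairing the lowest sale quotations with the highest purchase quotations, so the sketch solves it in merit order rather than calling a general LP solver. The function name `clear_market` and all numbers are hypothetical, and the quotations are plain inputs here, whereas in the method each participant would determine its own quotation.

```python
def clear_market(supply, asks, demand, bids):
    """Solve the ASSUMED LP:
         maximize   sum_ij (bids[j] - asks[i]) * x[i][j]
         subject to sum_j x[i][j] <= supply[i]
                    sum_i x[i][j] <= demand[j],  x >= 0
    by merit-order matching, which is optimal for this separable objective.
    Returns x, where x[i][j] is the energy from supplier i to demander j."""
    x = [[0.0] * len(demand) for _ in supply]
    left_supply = list(supply)
    left_demand = list(demand)
    sellers = sorted(range(len(supply)), key=lambda i: asks[i])   # cheapest first
    buyers = sorted(range(len(demand)), key=lambda j: -bids[j])   # highest bid first
    si = bi = 0
    while si < len(sellers) and bi < len(buyers):
        i, j = sellers[si], buyers[bi]
        if bids[j] <= asks[i]:
            break  # no remaining pair adds positive surplus
        q = min(left_supply[i], left_demand[j])
        x[i][j] += q
        left_supply[i] -= q
        left_demand[j] -= q
        if left_supply[i] <= 0:
            si += 1
        if left_demand[j] <= 0:
            bi += 1
    return x


# Hypothetical data: two suppliers and two demanders.
supply = [30, 20]      # supply amount of each supplier
demand = [25, 15]      # demand amount of each demander

# Totals the aggregator reports back to the participants.
total_demand = sum(demand)   # sent to every supplier
total_supply = sum(supply)   # sent to every demander

# Quotations received afterwards (placeholders for the learned bids).
asks = [3.0, 5.0]      # electricity sale quotations
bids = [6.0, 4.0]      # electricity purchase quotations

allocation = clear_market(supply, asks, demand, bids)
surplus = sum((bids[j] - asks[i]) * allocation[i][j]
              for i in range(len(supply)) for j in range(len(demand)))
print(allocation, surplus)
```

With richer constraints, such as line capacities or per-pair limits, the merit-order shortcut would no longer be guaranteed optimal and a general LP solver would take the place of `clear_market`.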