Paper title
Optimal control towards sustainable wastewater treatment plants based on multi-agent reinforcement learning
Paper authors
Paper abstract
Wastewater treatment plants (WWTPs) are designed to eliminate pollutants and alleviate environmental pollution. However, the construction and operation of WWTPs consume resources, emit greenhouse gases (GHGs), and produce residual sludge, and thus require further optimization. WWTPs are complex to control and optimize because of their high nonlinearity and variability. This study used a novel technique, multi-agent deep reinforcement learning, to simultaneously optimize dissolved oxygen and chemical dosage in a WWTP. The reward function was specially designed from a life cycle perspective to achieve sustainable optimization. Five scenarios were considered: a baseline, three scenarios with different effluent quality, and a cost-oriented scenario. The results show that optimization based on life cycle assessment (LCA) has lower environmental impacts than the baseline scenario, with cost, energy consumption, and greenhouse gas emissions reduced to 0.890 CNY/m3-ww, 0.530 kWh/m3-ww, and 2.491 kg CO2-eq/m3-ww, respectively. The cost-oriented control strategy exhibits overall performance comparable to the LCA-driven strategy: it sacrifices environmental benefits but has a lower cost of 0.873 CNY/m3-ww. It is worth mentioning that the retrofitting of WWTPs based on resources should be implemented with consideration of impact transfer. Specifically, the LCA SW scenario decreases eutrophication potential by 10 kg PO4-eq compared to the baseline within 10 days, while significantly increasing other indicators. The major contributors to each indicator are identified for future study and improvement. Finally, the authors discuss that novel dynamic control strategies require advanced sensors or a large amount of data, so the selection of control strategies should also consider economic and ecological conditions.
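The abstract does not give the form of the reward function, so the following is only an illustrative sketch of how a life-cycle-oriented reward might combine the three reported indicators (cost, energy consumption, GHG emissions) into one scalar that cooperating agents could share. The names `StepImpacts` and `lca_reward`, the weights, and the effluent-violation penalty are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepImpacts:
    """Per-m3 impacts for one control step (hypothetical structure)."""
    cost_cny: float                   # operating cost, CNY/m3-ww
    energy_kwh: float                 # energy consumption, kWh/m3-ww
    ghg_kg_co2eq: float               # GHG emissions, kg CO2-eq/m3-ww
    effluent_violation: float = 0.0   # amount above an effluent limit (assumed)


def lca_reward(impacts, weights=(1.0, 1.0, 1.0), violation_penalty=10.0):
    """Negative weighted burden; weights and penalty are assumptions."""
    w_cost, w_energy, w_ghg = weights
    burden = (w_cost * impacts.cost_cny
              + w_energy * impacts.energy_kwh
              + w_ghg * impacts.ghg_kg_co2eq)
    # Only positive violations are penalised.
    penalty = violation_penalty * max(0.0, impacts.effluent_violation)
    return -(burden + penalty)


# Figures reported in the abstract for the LCA-based optimization.
lca_case = StepImpacts(cost_cny=0.890, energy_kwh=0.530, ghg_kg_co2eq=2.491)
shared_reward = lca_reward(lca_case)

# A cost-oriented variant could simply zero out the other weights.
cost_only_reward = lca_reward(lca_case, weights=(1.0, 0.0, 0.0))
```

Under this assumed design, switching between an LCA-driven and a cost-oriented strategy is a change of weights rather than a change of agent architecture; whether the paper does it this way is not stated in the abstract.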