Nikita Tomin #9
Comments
Hi, thanks for your interest, and sorry for the issues you are facing. This is a known issue: it happens when the py4j Java server started in your last run was not properly stopped (the subprocess running the Java server must be killed to release the server port). We have added a function that closes the py4j connection after each training run. Please refer to this update. In addition, there are two easy ways to fix it: 1) change the Java server port number in line #27 of the Kundur example; 2) manually kill the corresponding Java process (refer to its PID). We are very interested to learn your final comparison results. Shoot me a message or an email if you would like to share them.
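A minimal sketch of both ideas in the comment above, using only the Python standard library: checking whether a port is still occupied (and picking a free one), and making sure a server subprocess is always terminated. The helper names `port_in_use`, `find_free_port`, and `managed_server` are hypothetical and not part of RLGC or py4j; the thread does not show the command that launches the Java server, so the command is left as a parameter.

```python
import socket
import subprocess
from contextlib import closing, contextmanager


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        # connect_ex returns 0 when the connection succeeds.
        return s.connect_ex((host, port)) == 0


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is currently unused."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        s.bind((host, 0))  # port 0 lets the OS choose
        return s.getsockname()[1]


@contextmanager
def managed_server(cmd):
    """Start a server subprocess and always stop it on exit.

    cmd is a hypothetical argument list, e.g. the command that starts
    the Java server; it is not taken from the RLGC code.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # force-stop if it ignores terminate()
            proc.wait()
```

Running the training inside `with managed_server(cmd):` means the port is released even if training raises an exception, and `port_in_use(port)` gives a quick check before starting a new run.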
Thanks for the quick response! I added
For example, when starting another model |
Hi, sorry, this is one more issue with the original Kundur test code (we haven't updated it while updating other parts): env was initiated twice. Please check out this update; it should address your issue. We also recommend using v7, 'from PowerDynSimEnvDef_v7 import PowerDynSimEnv', instead of v5 for the env definition. If you do this, the jar lib needs to be updated too, by setting jar_file = '/lib/RLGCJavaServer0.93.jar'
Thank you! The problem with the Java server has been fixed. However, the training process still takes one step and ends after that single iteration. Previously, many steps ran and the simulation lasted quite a while. Moreover, the reward array is empty, which means that no training actually occurs.
You need to change the following callback() or set callback=None in the training main() function.
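For context, a hedged sketch of why a callback can end training after one step. As I understand the OpenAI Baselines `deepq.learn` loop, the callback is called at every step with the local and global variables, and training stops as soon as it returns True, so a stopping condition that is already satisfied at the start ends the run immediately. The keys `t` and `episode_rewards` follow the Baselines cartpole example as I recall it and should be checked against the installed version; `make_callback` and its thresholds are hypothetical, not the Kundur example's actual code.

```python
def make_callback(reward_target, min_steps=100, window=100):
    """Build a callback that stops training only after a full window of
    finished episodes reaches the target mean reward."""

    def callback(lcl, _glb):
        # The last entry is the episode still in progress, so skip it.
        finished = lcl["episode_rewards"][-(window + 1):-1]
        if lcl["t"] <= min_steps or len(finished) < window:
            return False  # too early to judge: keep training
        return sum(finished) / len(finished) >= reward_target

    return callback
```

Passing `callback=None` instead disables early stopping entirely, so the run lasts for the full number of time steps.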
Thank you so much. Now I understand why the training process stopped after the first step. I made changes to the code and ran it. The training lasted about one hour ( However, when I decided to re-run the code, the training ran much faster, and the transitions between stages became quick. The training could then finish in maybe 5-6 steps of fault simulation. Even if I already set I apologize for asking you so much.
Great to learn you can run it now. I think the time is reasonable. I ran it on my end, and below is the printout. The summary showed that it ran about 9000 steps and ended in ~7 mins. RL training can easily require 1 million steps to reach a well-trained solution, so you will need to increase time_steps significantly.
| % time spent exploring | 2 |
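A rough back-of-the-envelope estimate of what that means in wall-clock time, assuming the throughput stays constant at the rate quoted above (9000 steps in about 7 minutes). The function name is hypothetical, and real runs will vary with hardware and simulation settings.

```python
def estimate_minutes(target_steps, observed_steps, observed_minutes):
    """Scale an observed run linearly to a larger step budget."""
    steps_per_minute = observed_steps / observed_minutes
    return target_steps / steps_per_minute


minutes = estimate_minutes(1_000_000, 9000, 7)
print(f"{minutes:.0f} min, about {minutes / 60:.0f} h")
```

At that rate, 1 million steps would take about 778 minutes, roughly 13 hours.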
Yes, you are right! I remember that RL algorithms enjoy their very own Groundhog Day. I also ran it on my work computer, and the summary showed that it ran about 90000 steps and ended in ~45 mins.
Thank you for your support and help! I have one more small question. I wanted to view (and visualize) the obtained training results and tried to do so.
I think the saved files in my storedData folder are empty.
We keep this OpenAI Baselines implementation here only because our previous paper used this version. You can switch from OpenAI Baselines to Stable-Baselines to get full TensorBoard support. We already use it in the IEEE39 bus system training, as you can see in the code.
Following your advice, I switched from OpenAI Baselines to Stable-Baselines in the Kundur system training.
I used the following environment settings: However, after 900000 steps of training, the DQN agent cannot find a good policy. Please see the average reward progress plot: https://www.dropbox.com/preview/DQN_adaptivenose.png?role=personal I used the following env settings
My guess is that in the baseline scenario https://www.dropbox.com/preview/no%20actions%20case.png?role=personal However, this is only a guess; I may be wrong. I am thinking of trying scenarios with increased load so that stability is definitely lost during the simulation.
Traceback (most recent call last): |
Can anyone help me resolve the above issue?
Dear colleagues! Thank you! You have developed a very interesting tool. We would like to compare the performance of your DRL-based dynamic brake with our dynamic brake model based on the sub-Grammians method.
However, when I started the RLGC tool, I ran into a strange problem. I first installed your tool on my home laptop (Windows 10, Python 3.7), following your instructions exactly, and everything worked perfectly: I started training the dynamic brake model, the training steps ran, and there were no warnings or errors. Then I decided to install and run your tool on my work computer (Ubuntu, Python 3.7), a powerful workstation with GPUs. The installation was successful, but when I started training on the Kundur system model, the following warning occurred:
`Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)
Caused by: py4j.Py4JNetworkException
at py4j.GatewayServer.startSocket(GatewayServer.java:788)
at py4j.GatewayServer.start(GatewayServer.java:763)
at py4j.GatewayServer.start(GatewayServer.java:746)
at org.pnnl.gov.pss_gateway.IpssPyGateway.main(IpssPyGateway.java:1143)
... 5 more
Caused by: java.net.BindException: Address already in use: JVM_Bind
at java.net.DualStackPlainSocketImpl.bind0(Native Method)
at java.net.DualStackPlainSocketImpl.socketBind(Unknown Source)
at java.net.AbstractPlainSocketImpl.bind(Unknown Source)
at java.net.PlainSocketImpl.bind(Unknown Source)
at java.net.ServerSocket.bind(Unknown Source)
at py4j.GatewayServer.startSocket(GatewayServer.java:786)
... 8 more
`
In this case, the training process either ends abruptly (possibly after one iteration) or does not start at all, and an error appears:
py4j.protocol.Py4JJavaError: An error occurred while calling t.initStudyCase.
When I returned home, the same Java warnings began to appear on my laptop when I ran `python trainKundur2areaGenBrakingAgent.py`. This is very strange, considering that everything worked well on the home laptop earlier. However, the other models, `trainIEEE39LoadSheddingAgent_*.py`, work fine. I understand that this is something related to py4j.protocol. However, I am a novice in Java and cannot understand why this happened.