We have repeatedly encountered cases where immediate rewards must be non-negative, non-positive, or in the interval [0,1]. This can already be handled in the parser, so in this issue we'd like to add an option that shifts the rewards accordingly.
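To make the proposal concrete, here is a minimal sketch of what such a transformation could look like, assuming the parser knows a lower and an upper bound on the immediate reward. The names `RewardMode` and `transform_reward` are hypothetical and not taken from the planner's code base. Note that mapping into [0,1] needs a rescaling in addition to the shift.

```python
from enum import Enum


class RewardMode(Enum):
    """Hypothetical option values; the names are illustrative only."""
    GENUINE = "genuine"              # leave rewards unchanged
    NON_NEGATIVE = "non_negative"    # shift so that the minimum reward is 0
    NON_POSITIVE = "non_positive"    # shift so that the maximum reward is 0
    UNIT_INTERVAL = "unit_interval"  # shift and scale into [0, 1]


def transform_reward(reward, min_reward, max_reward, mode):
    """Map an immediate reward into the range requested by `mode`.

    Assumes min_reward <= reward <= max_reward, i.e. that both bounds
    are valid for every reachable state-action pair.
    """
    if min_reward > max_reward:
        raise ValueError("min_reward must not exceed max_reward")
    if mode is RewardMode.GENUINE:
        return reward
    if mode is RewardMode.NON_NEGATIVE:
        return reward - min_reward
    if mode is RewardMode.NON_POSITIVE:
        return reward - max_reward
    if mode is RewardMode.UNIT_INTERVAL:
        span = max_reward - min_reward
        if span == 0:
            # Constant reward: every value maps to 0.
            return 0.0
        return (reward - min_reward) / span
    raise ValueError(f"unknown mode: {mode}")
```

A pure shift adds the same constant to the reward of every step, so over a fixed horizon it changes all policy values by the same amount and leaves their ranking intact. The [0,1] variant additionally divides by a positive constant, which also preserves the ranking.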
The change has a significant impact on the performance of the planner, and most often not a positive one. I'll take a closer look at what is happening in the domains with the most significant differences.
Additionally, we should determine if there is code that can be made more efficient with the knowledge that rewards are non-negative or in [0,1].
What is the difference between issue-134-base-ipc14-gen and issue-134-v1-ipc14-gen? I assume genuine means no changes, zero to inf ensures that the reward is positive and zero to one ensures that the reward falls between [0,1]. Is that correct?
> What is the difference between issue-134-base-ipc14-gen and issue-134-v1-ipc14-gen?
There shouldn't be any difference. I can only explain the non-negligible differences by the comparatively low number of runs (30). Hopefully this looks different with 100 runs.
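As a rough illustration of why more runs should help, assuming the runs are independent and have similar variance, the standard error of an average score shrinks with the square root of the number of runs. The helper below is a hypothetical sketch and not part of the experiment scripts.

```python
import math


def standard_error(sample_std, num_runs):
    """Standard error of the mean of `num_runs` independent runs."""
    if num_runs <= 0:
        raise ValueError("num_runs must be positive")
    return sample_std / math.sqrt(num_runs)


# Ratio of the standard errors for 30 and 100 runs at equal spread.
ratio = standard_error(1.0, 30) / standard_error(1.0, 100)
```

Here `ratio` equals `sqrt(100 / 30)`, roughly 1.8, so going from 30 to 100 runs narrows the noise band by a bit less than half. Differences between two identical configurations should shrink accordingly but will not vanish.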
> I assume genuine means no changes, zero to inf ensures that the reward is positive and zero to one ensures that the reward falls between [0,1]. Is that correct?