Add option to shift immediate rewards #134

Open
thomaskeller79 opened this issue Jul 15, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@thomaskeller79
Member

We have repeatedly encountered situations where we require immediate rewards to be non-negative, non-positive, or in the interval [0,1]. This can already be handled in the parser, and in this issue we'd like to add an option that shifts the rewards accordingly.
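To make the three target ranges concrete, here is a minimal sketch of such a transformation. It is not the planner's parser code: the function name, the enum, and the `NEG_INF_TO_ZERO` mode name are hypothetical, and it assumes that lower and upper bounds `r_min` and `r_max` on the immediate reward are known. Note that mapping into [0,1] needs a scaling step in addition to the shift.

```python
from enum import Enum


class RewardRange(Enum):
    # Mode names are illustrative assumptions, not the planner's option names.
    GENUINE = "genuine"                  # leave rewards unchanged
    ZERO_TO_INF = "zero_to_inf"          # shift so that all rewards are >= 0
    NEG_INF_TO_ZERO = "neg_inf_to_zero"  # shift so that all rewards are <= 0
    ZERO_TO_ONE = "zero_to_one"          # shift and scale into [0, 1]


def transform_reward(reward, r_min, r_max, mode):
    """Map a reward from [r_min, r_max] into the range selected by mode."""
    if mode is RewardRange.GENUINE:
        return reward
    if mode is RewardRange.ZERO_TO_INF:
        return reward - r_min
    if mode is RewardRange.NEG_INF_TO_ZERO:
        return reward - r_max
    if mode is RewardRange.ZERO_TO_ONE:
        if r_max == r_min:
            return 0.0  # constant reward: any fixed value in [0, 1] would do
        return (reward - r_min) / (r_max - r_min)
    raise ValueError(f"unknown mode: {mode}")
```

If every run has the same fixed number of steps H and rewards are summed without discounting, a shift by a constant c changes the total reward of every policy by the same amount c·H, and scaling by a positive factor preserves the ordering as well, so the ranking of policies is unaffected by these transformations.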

@thomaskeller79 thomaskeller79 added the enhancement New feature or request label Jul 15, 2021
@thomaskeller79
Member Author

Here are the results for v1:

https://ai.dmi.unibas.ch/_tmp_files/tkeller/issue-134-v1-report-default.html
https://ai.dmi.unibas.ch/_tmp_files/tkeller/issue-134-v1-report-all.html

The change has a significant impact on the performance of the planner, and most often not a positive one. I'll take a closer look at what is happening in the domains with the most significant differences.

Additionally, we should determine if there is code that can be made more efficient with the knowledge that rewards are non-negative or in [0,1].
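As one illustration of the kind of knowledge that could be exploited (a sketch under assumptions, not taken from the planner's code): if all remaining rewards are non-negative, the reward accumulated so far is a lower bound on the final total, and if they are additionally at most 1, the number of remaining steps gives an upper bound. The function name and parameters below are hypothetical, and undiscounted rewards over a fixed number of steps are assumed.

```python
def return_bounds(reward_so_far, steps_left, rewards_in_unit_interval=False):
    """Bounds on the final total reward when all remaining rewards are >= 0.

    Assumes undiscounted rewards summed over a fixed number of steps.
    """
    # Non-negative rewards can only add to the total.
    lower = reward_so_far
    # With rewards in [0, 1], each remaining step adds at most 1.
    if rewards_in_unit_interval:
        upper = reward_so_far + steps_left
    else:
        upper = float("inf")
    return lower, upper
```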

@geisserf
Contributor

What is the difference between issue-134-base-ipc14-gen and issue-134-v1-ipc14-gen? I assume genuine means no changes, zero to inf ensures that the reward is positive and zero to one ensures that the reward falls between [0,1]. Is that correct?

@thomaskeller79
Member Author

> What is the difference between issue-134-base-ipc14-gen and issue-134-v1-ipc14-gen?

There shouldn't be any difference. The only explanation I have for the non-negligible differences is the comparatively low number of 30 runs. Hopefully this looks different with 100 runs.
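A rough way to see how much 100 runs should help, assuming the runs are independent and the per-run standard deviation stays the same: the standard error of the mean scales with 1/sqrt(n). The helper names below are illustrative only.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def standard_error_ratio(n_old, n_new):
    """Factor by which the standard error changes when going from n_old
    to n_new runs, assuming an unchanged per-run standard deviation."""
    return math.sqrt(n_old / n_new)
```

For 30 versus 100 runs, `standard_error_ratio(30, 100)` is sqrt(0.3), roughly 0.55, so the noise in the averaged results should shrink by a bit less than half.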

> I assume genuine means no changes, zero to inf ensures that the reward is positive and zero to one ensures that the reward falls between [0,1]. Is that correct?

Yes, all of these are correct.
