Handling uneven number of batches per replicated instance of a layer #59

Open
siddharth9820 opened this issue Nov 19, 2020 · 0 comments

This is in reference to the following function in runtime.py:

def num_iterations(self, loader_size):
    """ Determines number of iterations for this stage
    TODO: don't currently support uneven configurations.
    """
    if self.stage == 0 or self.stage is None:
        return loader_size

    num_iterations = loader_size * self.num_ranks_in_first_stage
    assert num_iterations % self.num_ranks_in_stage == 0
    num_iterations = num_iterations // self.num_ranks_in_stage

    return num_iterations

From my understanding, for this function not to throw an assertion error, the number of iterations produced by the first stage (loader_size * num_ranks_in_first_stage) must be divisible by the replication factor of every other stage. However, there is no guarantee that PipeDream's optimizer module will assign replication factors that satisfy this constraint. As a result, the framework is sometimes unable to execute training at all because of this limitation.
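
To make the constraint concrete, here is a standalone sketch of the divisibility check (the function name and the replication factors below are my own illustrative examples, not values produced by the optimizer):

# Standalone sketch of the check performed in num_iterations(); the
# replication factors used here are illustrative, not optimizer output.
def iterations_for_stage(loader_size, num_ranks_in_first_stage, num_ranks_in_stage):
    scaled = loader_size * num_ranks_in_first_stage
    # Mirrors the assert in runtime.py: the scaled iteration count must
    # divide evenly across the replicas of this stage.
    assert scaled % num_ranks_in_stage == 0
    return scaled // num_ranks_in_stage

print(iterations_for_stage(100, 2, 4))  # 200 / 4 = 50, works
print(iterations_for_stage(10, 3, 4))   # 30 % 4 == 2 -> AssertionError

In the second call, the first stage produces 30 iterations but the later stage is replicated 4 ways, so the assertion fires even though each replication factor is individually a reasonable choice for the optimizer to make.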
