Fel/operator: Operator PR #124
Conversation
This is great work! I think it is more elegant and flexible than the previous design. I also like the new exception raised for gradient-based methods.
However, I think we should update the metrics.

Operator and regression

I am concerned about how this PR would support multi-output regression. For perturbation-based methods, what we want is to compare the initial prediction to the modified prediction on all outputs; in that sense, we cannot reduce the output of the inference function to a scalar. Nonetheless, methods like Occlusion do not take this increase in dimension into account when they aggregate sensitivity. I have no quick fix for this, and I do not think that taking the mean of the outputs is pertinent. However, gradient-based methods on a multi-output regression task need the output to be reduced; otherwise they would produce one gradient per output. Finally, we could reduce the output of the inference function for all methods if we take the ground truth into account: we could say that the inference function outputs the MAE, for example. What do you think?

My bad, I thought of something: I think we can make it work by passing the ground truth, or the model's prediction, as the target. Such as:
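One way the suggestion above could look, as a hedged sketch only: a NumPy stand-in for the library's tensors, with `regression_operator` and `double` being illustrative names rather than anything in the PR. Passing the ground truth (or the original predictions) as the target lets the multi-output prediction collapse to one scalar per sample, here the negative MAE.

```python
import numpy as np

def regression_operator(model, inputs, targets):
    """Hypothetical operator for multi-output regression.

    `targets` carries the ground truth (or the model's original
    predictions), which lets us reduce a multi-output prediction
    to one scalar per sample: here, the negative mean absolute error.
    """
    predictions = model(inputs)                              # (n, d_out)
    return -np.mean(np.abs(predictions - targets), axis=-1)  # (n,)

# Toy check: a "model" that doubles its input.
double = lambda x: 2.0 * x
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = double(x)                          # original predictions as targets
print(regression_operator(double, x, y))        # -> [0. 0.]
print(regression_operator(double, x + 0.5, y))  # -> [-1. -1.]
```

With this reduction, unperturbed inputs score 0 and any perturbation scores strictly lower, which is exactly the per-sample scalar that perturbation-based methods need.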
It might be worth defining an interface for these operators, or at least specifying more clearly the type of the callable (e.g. `Callable[[tf.keras.Model, tf.Tensor, tf.Tensor], tf.Tensor]`). This should allow us to perform the check that Antonin is asking for.
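A minimal sketch of what such a typed interface plus check could look like. This is my own illustration, not the PR's code: the alias uses NumPy in place of `tf.Tensor` so it stays self-contained, and this `check_operator` only inspects the arity, whereas the real check may do more.

```python
from typing import Any, Callable
import inspect

import numpy as np

# Illustrative type alias; in the TF code base this would read
# Callable[[tf.keras.Model, tf.Tensor, tf.Tensor], tf.Tensor].
Operator = Callable[[Any, np.ndarray, np.ndarray], np.ndarray]

def check_operator(operator: Operator) -> None:
    """Hypothetical check: an operator must accept exactly three
    positional arguments: (model, inputs, targets)."""
    nb_params = len(inspect.signature(operator).parameters)
    if nb_params != 3:
        raise ValueError(
            f"An operator must take (model, inputs, targets), "
            f"got {nb_params} parameter(s).")

def class_score(model, inputs, targets):
    # canonical classification operator: the target-class score
    return np.sum(model(inputs) * targets, axis=-1)

check_operator(class_score)            # passes silently
# check_operator(lambda m, x: x)       # would raise ValueError
```

Declaring the alias gives every attribution method one documented contract to type-hint against, and the runtime check catches malformed operators before they are plugged in.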
Some minor remarks concerning optimization and documentation!
Hello guys, I will summarize our discussion on this PR and list the things to change:
The following points were deemed further enhancements that will be taken into account in future PRs:
Just one last modification: the `check_operator` function should be moved, in my view. Otherwise LGTM.
Operator(s) Abstraction
This PR introduces the concept of an operator, which aims to generalize attribution methods to a wide range of use cases. The idea is as follows: to define an attribution method, we need any function that takes the model (`f`), a series of inputs (`x`) and labels (`y`), and returns a scalar in R:

`g(f, x, y) -> R`

This function, called an operator, can be defined by the user (or by us) and then provides a common interface for all attribution methods, which will call it (or compute its gradient). As you can see, the goal is for attribution methods to have this function as an attribute (in more detail, this will give `self.inference_function = operator` at some point).

Some Examples of Operators
For classification, `f: R^n -> R^c` with `c` being the number of classes and `y` being one-hot vectors, our operator simply boils down to the score of the target class.

For segmentation, `f: R^(w×h×c) -> R^(w×h×r)` with `r` being the number of channels, there is no properly defined operator in the literature, but we could imagine explaining a whole channel: with `y` being a one-hot vector that selects channel `r`, the operator simply reduces that channel's output to a scalar.

Regarding bounding boxes, an operator has already been defined in the literature with the D-RISE article. It consists of using the three IOU, objectness, and box classification scores to form... a scalar!
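The classification and segmentation cases above could be sketched as follows. This is a hedged illustration with NumPy standing in for the library's tensors; in particular, reducing the selected segmentation channel by a plain sum is my assumption, not necessarily the PR's final choice.

```python
import numpy as np

def classification_operator(model, inputs, targets):
    # targets: one-hot vectors over the c classes.
    # g(f, x, y) = sum_c f(x)_c * y_c  -> the target-class score.
    return np.sum(model(inputs) * targets, axis=-1)

def segmentation_operator(model, inputs, targets):
    # model(inputs): (n, w, h, r); targets: one-hot over the r channels.
    predictions = model(inputs)
    channel = np.sum(predictions * targets, axis=-1)  # select channel r
    return np.sum(channel, axis=(1, 2))               # scalar per sample

# Toy classification check: 3-class logits, explain class 1.
logits = np.array([[0.1, 0.7, 0.2]])
model = lambda x: logits
y = np.array([[0.0, 1.0, 0.0]])
print(classification_operator(model, None, y))  # -> [0.7]
```

Both functions share the `g(f, x, y) -> R` signature, which is what lets every attribution method call them interchangeably.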
To explain concepts, for example with a model `f = c ∘ g`, with `a = g(x)` and a factorizer that allows interpreting `a` in a reduced-dimension space `u = factorizer(a)`, we can very well define an operator that works directly in this concept space. As you can see, many cases can be handled in this manner!
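One possible shape for such a concept operator, sketched under my own assumptions: `inv_factorizer` maps the concept coefficients `u` back to the latent space `a`, the head `c` is then scored against one-hot targets, and all names here are illustrative rather than the PR's actual API.

```python
import numpy as np

def make_concept_operator(c, inv_factorizer):
    """Build an operator that explains in the reduced space `u`:
    u -> a = inv_factorizer(u) -> scores = c(a)."""
    def operator(_model, u, targets):
        a = inv_factorizer(u)                  # back to the latent space
        return np.sum(c(a) * targets, axis=-1)
    return operator

# Toy usage: a linear head and an identity factorizer.
c = lambda a: a @ np.eye(2)
identity = lambda u: u
operator = make_concept_operator(c, identity)
u = np.array([[0.3, 0.7]])
y = np.array([[0.0, 1.0]])
print(operator(None, u, y))  # -> [0.7]
```

Because the returned closure keeps the `(model, inputs, targets) -> scalar` signature, the same attribution machinery can attribute importance to concept coefficients instead of pixels.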
Implementation
Regarding implementation, there is a series of operators available in `commons/operators`, and the most important part, the operator plug, is located in the `attributions/base.py` file. As discussed with @AntoninPoche & @lucashervier, I think you know where I'm going with this: the PyTorch implementation is not far and would be located here!

Once this was done, I simply added the argument to all the attribution methods defined in the library.
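The operator plug described above might look roughly like this. This is a hypothetical mirror of the base class, not the actual `attributions/base.py` code; the class and default names are mine, with NumPy standing in for TF tensors.

```python
import numpy as np

def default_operator(model, inputs, targets):
    # default behaviour: the target-class score, as for classification
    return np.sum(model(inputs) * targets, axis=-1)

class BlackBoxExplainer:
    """Hypothetical base class illustrating the operator plug."""

    def __init__(self, model, operator=None):
        self.model = model
        # every attribution method then scores inputs through this
        self.inference_function = operator or default_operator

    def score(self, inputs, targets):
        return self.inference_function(self.model, inputs, targets)

# Plugging a custom operator swaps the quantity being explained,
# e.g. a negative-MAE operator for regression:
explainer = BlackBoxExplainer(
    model=lambda x: x,
    operator=lambda m, x, y: -np.mean(np.abs(m(x) - y), axis=-1))
print(explainer.score(np.array([[1.0, 2.0]]),
                      np.array([[1.0, 4.0]])))  # -> [-1.]
```

The design choice is that methods never call the model directly; they only ever see the operator's scalar, which is what makes the abstraction portable to a PyTorch backend.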
This being quite an important PR (not in terms of lines of code, but in terms of internal API change), I'm not against a careful reading and re-reading by the team! :)