-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Error Adapter #741
Conversation
Signed-off-by: Yordis Prieto Lazo <yordis.prieto@gmail.com>
Signed-off-by: Yordis Prieto Lazo <yordis.prieto@gmail.com>
Signed-off-by: Yordis Prieto Lazo <yordis.prieto@gmail.com>
In practice, I am not sure if this will really work out. My team uses CloudEvents for error reporting as well, both as domain events in our pub/sub systems and also when reporting errors to API consumers. We almost made a proposal very similar to this, but ultimately decided against it. I'll try to outline the reasons. Ultimately, it really boils down to one problem: domain errors just don't have enough structural overlap - particularly, structural overlap that can't be handled by the existing CloudEvents spec and proper use of existing schemas. prior art This has actually been tried before, for HTTP requests specifically: see Problem JSON and JSON:API errors. However, those solutions are not suitable for all environments; they are too reliant on JSON and HTTP to be effective for an event-driven, cross-platform error scheme. Even when compared to the above - probably the closest anyone has ever gotten to a universal error schema - there are essentially only two fields not explicitly covered by CloudEvents: a human-readable message and a detail link. The detail link can be covered with creative use of the existing attributes, however, such as Other domain-specific error data must be provided as either an additional data object (as you have here) or additional attributes (the way Problem JSON handles it). how to provide schema for domain-specific error data? Lots of errors have highly domain-specific data that needs to be passed along, and that data is often crucial to the handling of the error - not simply "metadata" that can be dumped into a log for human debugging purposes. It is not sufficient to leave that data untyped - middleware and other abstract event processors need to be able to figure out what kind of data is there if you want to have any kind of generic handling, such as a common way of dealing with data rollback. The problem is: how do you provide a schema definition for those properties? There really is no way to provide a schema definition for extra suggestions Given the above, I think the best thing that can be done to improve the CloudEvents spec in this regard is to create specific adapters for particular existing error structures, such as Problem JSON and JSON:API errors. In those cases though, I think it will probably turn out to be best to leave the Beyond that, I think what the industry really needs is a collaborative effort on a broader mission of creating standardized error schemas for the myriad categories of domain errors that occur. Then we can simply attach those schemas to CloudEvents and be done with this once and for all. |
Which schemas? I agree with you that CloudEvents has most of the data that we would need. That is why I re-wrote my ADR (which is how the conversation started) to take advantage of CloudEvents's structure and add more alignment around the event's data.
Would you mind expanding on that?
Totally, Just that at that point, you lost the ability to add some data that would allow you to do i18n (if you want to send the event out to some clients that may care about that) or do i18n all together, which many people opt-in to do so, you can still use the If that is your case, I think it is totally valid, but personally, I much rather bring more alignment. In case you don't care, you can ignore the field. It is not mandatory.
That is why you have other fields that allow you to identify the error itself, such as the
We can't if we want to have some alignment and consider multiple concerns like i18n or helping programmers. Which we could remove Still, from my experience, there is always a balance between "what is technically correct" (whatever that means) and dealing with the fact that they are always a trade-off.
That is where I strongly disagree. That is where misguided decisions are made since you rely on people's experiences and ability to "creative" (take my proposal as an example) Would you mind helping me to identify how I accomplish the intention without following the proposal? What would be an alternative? |
You said you rewrote your ADR - out of curiosity, what was your original proposal?
I think I'd love to see an open-source project called So far every attempt to generalize on errors has basically gone the route of "it's a message and a link, and then a bucket of who-knows-what" and one of my favorite things about CloudEvents is the capability to use schema to define the shape of your payload. I realize that's not a workable suggestion in any sort of near term. Perhaps instead I'll share what we were thinking of proposing. My first instinct was actually to open a PR suggesting that we actually create a Originally we had thought to name the attribute We elected not to suggest that after reading carefully the documentation, particularly this:
It's hard to make the argument that a human-readable message falls under routing and processing concerns. It seems like it belongs in
It seems to me that adding an extension attribute What do you think? |
Nevermind to my question above about your ADR, I found the PR against your ADR repository since you tagged this PR in a comment there. Seems you are interested in essentially the same problem I am. I would be interested in pursuing some comprehensive research into existing error patterns as I mentioned above. I don't think the resulting work necessarily belongs in the CloudEvents spec, but I do think the projects would be complimentary - essentially providing a set of schemas that are used in CloudEvent data - particularly errors, since they can often be generalized to a significant extent across domains. |
Can you define error? An error could be of technical nature - something that a developer needs to have a look at and fix. Given that a human has to have a look, interoperability and portability don't strike me as the most important goals. An error could also be a state in the domain. E.g. if you're trying to charge a credit card, your payment provider may return either a If we encounter an event/case our code doesn't handle yet, we need to figure out how to handle the event - read the documentation, look at the schema etc. However, that is independent of whether the event represents a success or an error in our domain. I am not sure why we should treat these differently. |
Personally, I need to update my ADR to add even more constraints where the metadata is a Map<string, string>, so you can't have nested values, and everything is a string.
Right, agree, that is why I am leveraging the spec to handle errors.
The original spec was using extensions, I proposed the last meeting and some people suggest that we don't need to use extensions when we could use an adapter instead, and add the schema of the data. |
We do 🤷🏻 I would like to hear more precise feedback since your comment is amazing, and I would like to remove the subjectivity as much as possible. Proposing some alternative would be amazing. |
Exactly. There are basically two different types of errors: unexpected errors that the user can't do anything about, and errors that are actually expected domain results. In the case of the former, there is really nothing useful to show to the user other than "whoops, something went wrong, try again later or call support with event ID In the case of the latter, we're not really talking about an "error". An error in this case is, in fact, a domain result - indicating that the user's request couldn't be executed. When I said that we use CloudEvents to represent errors in pub/sub, this is what I meant - domain-specific, expected events (which nevertheless may be consumed by parties who would benefit from having a schema attached). This is what I meant by "there is not sufficient structural overlap in error types". Errors that are actually domain results are going to have domain specific concerns. If we're talking about messages sent to a client app, and your domain chooses to provide the user with a link to an error page, that's one approach; if your domain instead encodes a list of steps the user needs to take, such as a list of missing fields they need to fill out before their user profile is "complete", that's another approach. The UI has to know what that shape is and it is likely to be context-specific. If your API is consumed by third parties, it is useful to have a well-defined schema, and send it along with your message. If we're talking about domain events being published to pub/sub, you're going to need to provide some information that allows consuming services to handle the event somehow... and that information should have a schema. There are relatively few cases, notably in the case of developer tools, where you actually want to take "a bucket of who-knows-what" in the form of untyped metadata and just show it to the user. And if you are building a developer tool, "a bucket of who-knows-what" is domain specific! So in the end, what we are left with is:
It's possible that sending Really the only thing about errors that can be generalized is that there is a human-readable string involved - but even that is domain-specific, because some domains want to translate it, and some domains don't want to provide a message at all because the front-end decides what to show the user based on the encoded This is why we didn't go through with suggesting the
I think the alternative is that we don't solve this problem with changes to the CloudEvents spec. Trying to create a schema to solve "errors" is out of scope for this spec. Users of CloudEvents need to define whatever schemas are appropriate for the various domain events in their contexts. This is as true for failure events as it is for success events. The error schema in your PR is specific to your domain and belongs in your application or a team repository. A Map of strings is not sufficient for all domain use cases, and the "general" solution is what already exists in CloudEvents - a The goal of CloudEvents isn't to come up with a unified set of domain failure schemas. It's to come up with an envelope for portable and interoperable events, that is, to define a consistent envelope shape for inter-service communication (events) so that services utilizing pub/sub can be deployed to multiple platforms (portability), and so that systems that span platforms don't have to constantly be mapping between Azure/Google/AWS/IBM event shapes (interoperability). If you want to tackle the problem of portability and interoperability of errors across domains, I think a lot more work needs to be done - specifically, analyzing existing errors across lots of systems and coming up with categories and schemas for various well-known and oft-used domain errors. The resulting schemas could be shared by the open source community and used as payloads in CloudEvents. But they are complimentary solutions - not two parts of the same solution. |
Which CloudEvents spec? This is not part of CloudEvents spec directly but an adapter. Does that mean CloudEvents shouldn't tell you how to define GitHub events https://github.com/cloudevents/spec/blob/v1.0/adapters/github.md#github-cloudevents-adapter or any other events for that matter https://github.com/cloudevents/spec/tree/v1.0/adapters? I think it is important to understand the differences between CloudEvents Spec and Adapters.
Hint where we are right now. It is hard for me to disagree with you, but for the wrong reasons. Most of your concerns have been thought out before, I am not sure if you understand that, none of this comes from a misunderstanding of the situation. Although what you are saying is completely valid, when it comes to trying to tackle the situation, most of your concerns fall into such schema decision (hint, as an adapter, that is not part of the spec), or at least my understanding from what you are saying suggest that. The problem is that you focusing on things that are related to errors, but no part of what this adapter is trying to tackle (Read Purpose section for more information about it). I would like to hear more clear examples (potentially sharing some concrete examples based on real cases) where the Error Adapter would become a problem, and wouldn't work for you. |
You're right. As an adapter, this wouldn't have an effect on me unless I choose to use it. Even as an optional extension it would be opt-in. However, I think you know what I'm getting at here. You are proposing Your patronizing tone implies that you think I don't know what an Adapter is, but frankly, I don't think I'm the one that is confused. The intent of Adapters in the context of CloudEvents is to provide a mapping from various platform-specific event types to CloudEvents:
Is there a Yordis or straw-hat-team PAAS I am not aware of that produces errors which need to be adapted to CloudEvents for interoperability with other platforms? Even if there were, I would still suggest that you be more specific and say that this is a Ultimately what this amounts to is a suggestion of one possible way to represent your team's error needs and other hypothetical use cases using CloudEvents. My issue with it is that it is not concrete enough to adapt a specific event type ( I don't think this belongs here. The purpose of adapters is not to define new An Adapter takes a specific existing interface A and applies an algorithm to make it compatible with another specific interface B (and ideally in this case, the resulting BfromA has no loss of fidelity with respect to the original A). It is by definition concrete and not abstract. What an Adapter does not do is add additional behavior to B in the hopes that it might be close enough to some set of hypothetical As such that the resulting B+ can subsequently be adapted to those As. That is called extension, and is what you are actually proposing here, even if you are calling it an Adapter. An Adapter algorithm doesn't contain a comment like this: "You may use a Your stated purpose, by copying the purpose statement of CloudEvents with a few words changed, is to create a common format for representing generalized errors (As) by adding specific structure to CloudEvents (B) to create this new schema (B+). You want to create portability and interoperability for errors, and presumably you think this schema should be widely adopted because you've proposed it here rather than just keeping it within your team. Essentially, you are making the argument that from now on everyone should adapt their As to your B+. Except maybe not, because your response to my concerns about its lack of flexibility seems to be that I should simply ignore it, because it's optional and just an adapter. You even mentioned that you discussed or considered or tried proposing this as an extension first, which suggests to me that you know that this is - in fact - an extension that you are proposing. As I stated in my first comment, something that may be useful would be to instead provide actual adapters concretely showing how to map the JSON:API error structure and the Problem JSON error structure to CloudEvents. Those are well-known and (perhaps?) widely-used specs, one an entire, opinionated specification for building JSON APIs and the other a quite well-known RFC. In either case the obvious choice would be to map the flexible payload to the |
Since you asked, let's talk about the specific case I mentioned briefly above.
How exactly do I represent this with your schema? Particularly after your comment about constraining the metadata object to a Map of strings?
All of these applications are failure events that require both complex data and a schema, and that's not to mention the basic point that it's just damn useful to deliver a meaningful schema with your payloads, for data integrity validation and debugging purposes. In the end, it's not actually my job to decide what does or does not belong in this spec or its documented extensions and adapters, since I'm not a voting contributor. At this point I'll bow out and leave it up to the maintainers to decide what is best. |
all, I am running late in raising PR against @yordis's fork for JSON schema for error. Here are the schemas associated with event propagating an error. Imo, these should cover both validation errors (error in an event sent earlier that failed validation for example) as well as errors in a system that consumers might be interested in receiving. The errorinstance.json is derived from https://json-schema.org/draft/2019-09/output/schema. pl. review. thanks. my assumption is that error.json
errorinstance.json
linkdescription.json
|
@sdatspun2 hey, happy to see you here. I have some concerns based on your response.
|
fixes: cloudevents#741 Signed-off-by: Doug Davis <dug@us.ibm.com>
fixes: cloudevents#741 And mention the adpaters Signed-off-by: Doug Davis <dug@us.ibm.com>
Alternative to #741 - dealing with errors
fixes: cloudevents#741 And mention the adpaters Signed-off-by: Doug Davis <dug@us.ibm.com>
Purpose
Errors are everywhere. However, error's producers tend to describe errors
differently.
The lack of a common way of describing errors means developers are continually
re-learning how to consume errors. That also limits the potential for libraries,
tooling, and infrastructure to aid of error data across delivery environments,
like SDK or logger systems. The portability and productivity that can be
achieved from error data are hindered overall.
This ADR is a specification for describing error data in standard formats to
provide interoperability across services, platforms, and systems.