Tracking platform dependencies #184

Merged 8 commits on Apr 5, 2021

Conversation

mthalman
Member

This is the proposed design for dotnet/core#5646 within the dotnet/core#5651 epic.

This is a cross-cutting proposal that impacts all product teams. I've tried to include representatives across the board as reviewers but feel free to include others you feel are missing.


```console
dotnet-deps platform remove debian.10
```

Member

@jkotas jkotas Feb 25, 2021

The scenarios that this design is trying to improve are not unique to .NET. How are other platforms similar to .NET solving this? Is there something we can learn from solutions used by others?

Member Author

I've looked at Node.js and Python so far and have not come across anything formalized like this.

@omajid - Have you or anyone else at Red Hat come across any other platforms that do anything similar to this?

@tianon - In your role maintaining Docker Hub's official images, have there been any platforms out there that have created a standalone, machine-readable description of the platform's package dependencies such that it can be used to maintain the packages installed by the Dockerfile? Or has it always just been the Dockerfile being the source of truth? To give you a very brief summary of what is being proposed here for .NET, there would be a JSON file which formalizes .NET's platform dependencies, including Linux packages. That JSON file could be used to do transformations into or validation of Dockerfiles, as an example. The intention is to have a standalone, independent source of truth from which other assets could be derived.

Copy link

This is really ambitious -- I'm not aware of anyone trying to standardize this sort of metadata.

If I'm understanding correctly, you're trying to come up with a standard identifier for things like libargon2-dev in Debian/Ubuntu vs argon2-dev in Alpine etc?

I think a really big challenge you'll likely run into is that these things aren't always one-to-one mapped -- there are many cases where distributions provide multiple variants (see the -dev packages in https://packages.debian.org/source/sid/curl for example) or even split different binaries across packages differently. In addition, these things tend to move between packages over time too (like how Debian's btrfs-progs used to also contain the development headers until the dedicated libbtrfs-dev package was introduced). Even the "upstream project" often gets split differently in different distros (where some have a single source package and others will end up with multiple source packages representing the same thing).

A different approach to trying to identify/recognize all the explicit packages might be to identify commands and specific header/pkg-config files necessary, but even that's going to be a bit of guesswork (and if it isn't automated to some extent, will struggle with bitrot), and that list isn't always straightforward to come up with, even for a human (like how difficult it would be to determine in an automated way that a program ends up invoking git at runtime, or that it uses dlopen to load a shared library dynamically).

There's also going to be different classes of dependencies -- obvious ones are build vs runtime, but even at runtime there are some dependencies that are more "required" than others (see https://www.debian.org/doc/debian-policy/ch-relationships.html for an example of how Debian handles this with Depends vs Recommends vs Suggests etc).

Member

Have you or anyone else at Red Hat come across any other platforms that do anything similar to this?

I can try asking around, but I have not seen anything similar to this. Most upstreams are less disciplined than this, I think 😄

Member Author

Thanks, @tianon

If I'm understanding correctly, you're trying to come up with a standard identifier for things like libargon2-dev in Debian/Ubuntu vs argon2-dev in Alpine etc?

Not really a standard identifier. There's no attempt to map 1:1 between distros or anything like that. Package dependencies can be defined for each distro independently. It's really no different than the dependencies that are described in documentation form, it's just that they're defined in a machine-readable form using a common schema.

There's also going to be different classes of dependencies -- obvious ones are build vs runtime

Yes, that's accounted for here by having separate models between them.

but even at runtime there are some dependencies that are more "required" than others (see https://www.debian.org/doc/debian-policy/ch-relationships.html for an example of how Debian handles this with Depends vs Recommends vs Suggests etc).

This is sort of addressed by the dependency usage concept. It's less prescriptive than the depends/recommends/suggests model, allowing for the dependency to be associated with specific dev or app scenarios.

Comment on lines +221 to +224
* default: Indicates that the dependency applies to a canonical app scenario (i.e. Hello World). Note that this is specifically about a canonical app and not intended to be a description of required dependencies in the absolute sense. An example of this is libicu. While libicu is not required to run an app if it has been set to use invariant globalization, the default/canonical setting of a .NET app is that invariant globalization is set to false in which case libicu is necessary. This is why the term "default" is used rather than something like "required".
* diagnostics: Indicates the dependency should be used in scenarios where diagnostic tools are being used such as with LTTng-UST.
* httpsys: Indicates the dependency should be used for ASP.NET Core apps that are configured to use the HTTP.sys web server.
* localization: Indicates the dependency should be used for localization/globalization scenarios such as with tzdata.

Contributor

I don't think this will scale and at the same time, it's not granular enough. I can imagine a model where each NuGet can describe its dependencies as more maintainable and scalable.

Member Author

I've made an update so that these usages are self-described within the model: 8c52259. This adds a mapping at the root of the model to define all the available dependency usages so that there can still be model validation and a way to provide a description of the value. This will allow teams to add more usages as they need and confine the updates to just the model file.
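
The validation described above could look roughly like the following sketch. The `dependencyUsages` property name and the overall JSON layout are assumptions made for illustration; they are not taken from the referenced commit.

```python
import json

# Hypothetical model: the "dependencyUsages" key and the flat
# "platformDependencies" list are illustrative, not the PR's schema.
MODEL_JSON = """
{
  "dependencyUsages": {
    "default": "Applies to a canonical app scenario",
    "diagnostics": "Applies when diagnostic tools are used"
  },
  "platformDependencies": [
    {"name": "libicu", "usage": "default"},
    {"name": "lttng-ust", "usage": "diagnostics"},
    {"name": "tzdata", "usage": "localization"}
  ]
}
"""


def find_undefined_usages(model: dict) -> list[str]:
    """Return names of dependencies whose usage is not declared at the root."""
    declared = set(model.get("dependencyUsages", {}))
    return [
        dep["name"]
        for dep in model.get("platformDependencies", [])
        if dep.get("usage") not in declared
    ]


model = json.loads(MODEL_JSON)
undefined = find_undefined_usages(model)
print(undefined)
```

In this sample, `tzdata` is reported because its `localization` usage has not been declared at the root. Adding a new usage would then only require editing the model file.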


### Goals

* Common schema capable of describing both runtime and toolchain dependencies.

Contributor

What do runtime and toolchain mean in this context?

Member Author

Added a definitions section which clarifies these terms: e388195

* Common schema capable of describing both runtime and toolchain dependencies.
* Model that limits repetition to make maintenance easier.
* File format that is machine-readable to allow for automated transformation into other output formats.
* Ability to describe:

Contributor

Will this apply only to shared framework model or also bundled/self-contained model?

Member Author

Are you referring to the framework-dependent/self-contained deployment models? If so, this applies to both because in either model the platform dependencies are still required. Self-contained deployment does not statically link these dependencies.

Contributor

Self-contained deployment does not statically link these dependencies.

That's not correct, some of them do.

Member Author

Can you provide more details? I'd like to know more about that.

Contributor

For example, Blazor is a self-contained deployment setup with only statically linked dependencies.

Member Author

Ah, thanks for clarifying. In that case, the browser or browser-wasm RIDs would be just another platform described in this model and not contain these dependencies or whatever would be appropriate.


### Non-Goals

* Any dependency that is included in the deployment of the application is outside the scope of this design. NuGet packages are an example of this. A .NET application's dependency on a .NET package, and any assets contained in those packages (managed, native, or otherwise), is explicitly included in the deployment of the application itself. Therefore, the operating environment is not required to be pre-configured with those specific assets; it'll get them naturally through the deployment of the application. However, a NuGet package may have its own platform dependency (e.g. a Linux package) that is not physically contained in the NuGet package; such a platform dependency would be in scope with this design. The concern addressed here is solely focused on what the operating environment must be pre-configured to contain in order to operate on .NET scenarios.

Contributor

I think this needs a bit of word tweaking. You are not excluding NuGet package dependencies from the design: they are the core building block of any app, and the runtime itself ships many native libraries with native dependencies as OOB NuGet packages.

Member Author

I've reworded this to be more accurate: 1a92592

Member

@jeffschwMSFT jeffschwMSFT left a comment

IMO one of the reasons for having this information is to ensure that we are meeting all the constraints of our system. For example, changes to minimal cmake versions need source build considerations. Where in this model do we document why the min versions of all our dependencies are the way they are? Having this will allow maintainers to more effectively understand how to change them/why they are the way they are.


##### Change Detection

In order to avoid the reliance upon contributors to recognize when they've changed a dependency, a more automated solution would be preferable. This can be done by defining a GitHub bot that checks for files in PRs containing `NativeMethods` or `Interop` in their name. If such a file is detected, a label is added to the PR alerting the submitter that they should evaluate their changes for changes to the dependencies. While not a fool-proof solution, it should provide coverage for the vast majority of dependencies. Work is still required by the submitter to make the appropriate changes to the platform dependency model but the GitHub bot helps alert them when there is potentially action that is needed.
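
A minimal sketch of the file-name check described above, written in Python for illustration. The label text and the function names are hypothetical, and the actual bot integration with GitHub is left out.

```python
from pathlib import PurePosixPath
from typing import Optional

# Substrings named in the proposal; the label text is a hypothetical choice.
TRIGGER_SUBSTRINGS = ("NativeMethods", "Interop")
LABEL = "needs-dependency-review"


def files_needing_review(changed_files: list[str]) -> list[str]:
    """Return changed files whose file name contains a trigger substring."""
    return [
        path
        for path in changed_files
        if any(s in PurePosixPath(path).name for s in TRIGGER_SUBSTRINGS)
    ]


def label_for_pr(changed_files: list[str]) -> Optional[str]:
    """Return the label to apply, or None when no file matches."""
    return LABEL if files_needing_review(changed_files) else None


changed = ["src/Interop.Crypto.cs", "README.md", "src/NativeMethods.cs"]
print(files_needing_review(changed))
print(label_for_pr(changed))
```

Only the file name is inspected here, not the directory path, which follows the wording of the proposal. Whether directories named `Interop` should also trigger the label is a design choice this sketch does not settle.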

Member

I don't think that this is sufficient. To help keep dependency tracking live we will need to invest more deeply in dependency identification. Also how would we ensure that toolchain dependencies are being identified automatically?

Member

I could imagine something like - run all the tests under ltrace or similar, and compare all the dlopen's seen to a baseline list. When a new one is seen, it's an indication that someone should check whether a json update is needed, then update the baseline.

It seems to me that isn't part of 'crawl' but maybe 'run'
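
The baseline comparison suggested here might be sketched as follows. The sample trace lines assume that the tracer prints calls in the form `dlopen("name", flags) = result`; the exact output format, the library names, and the function names are assumptions for illustration only.

```python
import re

# Matches the library name in a traced call such as: dlopen("libfoo.so", 1)
DLOPEN_CALL = re.compile(r'dlopen\("([^"]+)"')


def libraries_opened(trace_lines: list[str]) -> set[str]:
    """Collect every library name passed to dlopen in the trace."""
    return {
        match.group(1)
        for line in trace_lines
        for match in DLOPEN_CALL.finditer(line)
    }


def unreviewed_libraries(trace_lines: list[str], baseline: set[str]) -> list[str]:
    """Return libraries seen in the trace that are absent from the baseline."""
    return sorted(libraries_opened(trace_lines) - baseline)


# Hypothetical trace output and baseline.
trace = [
    'dlopen("libicuuc.so.67", 1) = 0x5581',
    'dlopen("libgdiplus.so", 1) = 0x5590',
    'getenv("HOME") = "/root"',
]
baseline = {"libicuuc.so.67"}
print(unreviewed_libraries(trace, baseline))
```

A non-empty result would be the signal to check whether the JSON model needs an update, after which the baseline is updated as well.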

Member Author

This is probably the aspect of this proposal that will require the most iteration and refinement over time. I don't want to over-invest in this, however. I fully admit that the proposed change detection isn't foolproof. But it's a fairly cheap and non-disruptive solution to get started with. I think as we learn of other areas that require detection, we can work to address those as needed.

I do like the suggestion of @danmoseley for an advanced implementation of detection logic. This could apply to both runtime and toolchain dependencies.

Member

@mthalman do you believe it's important for this dependency record to be "complete" (in some sense) or is it valuable even if it's not? Eg., if you're using it as a recipe for setting up a container to publish, it needs to be complete. But if you're using it to inform a checker that might help you discover a missing dependency, maybe it does not.
From the discussion, this seems like a hard problem in general, and achieving and maintaining completeness especially through transitive dependencies may not be feasible.

Member Author

It needs to be complete to the extent that we're aware of the dependencies. The model is intended to be used as input to other dependent artifacts like Dockerfiles, as you suggest (see phases 2 & 4 in dotnet/core#5651 for others). I don't see the ability to know and describe the dependencies as being unfeasible. Indeed, it better not be; otherwise, what are we documenting for customers? The change detection aspect is certainly a challenge and I feel should be scoped, at least to begin with, to be able to detect most dependency changes rather than all.

My view is that we treat the dependency model as a work in progress for the lifetime of the .NET version it's associated with. We make our best effort to define the model, and if we discover something several releases later, we can just go back and edit it to accurately reflect what is known. The benefit of having downstream assets consuming this model data is that it acts as a form of validation. If the Dockerfiles are synced with the dependency model and something doesn't work in the container because it's missing a dependency, then we know the model isn't correct. I think we just keep iterating on it until it's right.

Member

Ah, somehow I had thought you were trying to describe the closure, ie., including transitive dependencies. Yes, just describing direct dependencies is surely doable.

Member

jkotas commented Feb 26, 2021

For example, changes to minimal cmake versions need source build considerations. Where in this model do we document why the min versions of all our dependencies are the way they are? Having this will allow maintainers to more effectively understand how to change them/why they are the way they are.

+1

Raising minimum tool versions often requires weighing trade-offs between the pain caused by staying with the old version and the pain caused by requiring a new minimum version. For example, we have different minimum cmake versions for different build configurations today because we found that aggressively raising the minimum version across the board is going to incur too much pain in aggregate.

We should be able to learn how situations like this are handled in other ecosystems. For example, here is a recent discussion on raising minimum cmake version in LLVM: https://lists.llvm.org/pipermail/llvm-dev/2020-April/subject.html#140578

@mthalman mthalman requested a review from leecow March 1, 2021 18:56
Member

Pilchie commented Mar 1, 2021

I'm not aware of any platform dependencies that ASP.NET Core has here, with the possible exception of the Http.sys/IIS components in Windows.

@danmoseley danmoseley requested a review from ericstj March 2, 2021 01:35

knuxbbs commented Mar 9, 2021

Is this proposal related to workload manifests (#120)?

For instance, how to track platform dependencies for a mobile workload?

* Model that limits repetition to make maintenance easier.
* File format that is machine-readable to allow for automated transformation into other output formats.
* Ability to describe:
* Dependencies for multiple platform types (Linux, Windows, MacOS)

Member

What about more specifics in that platform?

For example, .NET Core 2.0 only supported using OpenSSL 1.0.x. .NET Core 2.1 got additional fixes that let it also use OpenSSL 1.1.y. It would be great if we could express things like that in this manifest: needs-one-of (OpenSSL 1.0, OpenSSL 1.1).

I am also wondering what happens across builds. A portable build of .NET 5 SDK that bundles dependencies will have fewer build and runtime requirements than a non-portable build of .NET 5 SDK. The non-portable build will, for example, need the matching version of OpenSSL.

Member Author

For example, .NET Core 2.0 only supported using OpenSSL 1.0.x. .NET Core 2.1 got additional fixes that let it also use OpenSSL 1.1.y. It would be great if we could express things like that in this manifest: needs-one-of (OpenSSL 1.0, OpenSSL 1.1).

I've redesigned the schema to support this. Dependency names can now be described as an expression with logical OR operators. Take a look: b7833d0
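
As a rough illustration of how a consumer might evaluate such an expression, the sketch below assumes a hypothetical `||` separator and placeholder dependency names. The actual expression syntax is defined in the referenced commit and may differ.

```python
def is_satisfied(expression: str, installed: set[str]) -> bool:
    """Return True if any alternative in an OR expression is installed.

    The "||" separator is an assumed syntax used only for this sketch.
    """
    alternatives = [name.strip() for name in expression.split("||")]
    return any(name in installed for name in alternatives)


# Placeholder names standing in for the two supported OpenSSL versions.
expression = "openssl-1.0 || openssl-1.1"
print(is_satisfied(expression, {"openssl-1.1", "libicu"}))
print(is_satisfied(expression, {"libicu"}))
```

A plain name with no operator is treated as a single alternative, so existing single-name dependencies would evaluate the same way.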

I am also wondering what happens across builds. A portable build of .NET 5 SDK that bundles dependencies will have fewer build and runtime requirements than a non-portable build of .NET 5 SDK. The non-portable build will, for example, need the matching version of OpenSSL.

Since that would have been done by a third party and customized to suit their desired configuration, it's not really possible to describe that here, nor is it really relevant. The intent is to describe the dependencies of the assets distributed by Microsoft. And as I mention here, the model description has no impact on the functionality of .NET that has been built or distributed outside of Microsoft.

// A shared framework (e.g. Microsoft.NETCore.App, Microsoft.AspNetCore.App)
SharedFramework,

// A NuGet package (e.g. System.Drawing.Common)

Member

Since you are calling out System.Drawing.Common here, I can't help but wonder what that will look like. AFAIAA, System.Drawing.Common needs libgdiplus, but libgdiplus is neither a build-time nor a runtime dependency, unless you are explicitly using System.Drawing.Common at runtime. Does it fit in with the default type or does this need something else?

Member Author

unless you are explicitly using System.Drawing.Common at runtime

That's the key. The dependency would specifically be tied to the System.Drawing.Common NuGet package. So if you're referencing System.Drawing.Common at runtime, then you'll require libgdiplus for the canonical scenario. In that case, the type would be default.

Here's a snippet of what that would look like:

```json
{
  "platforms":
  {
    "rid": "debian",
    "components": [
      {
        "name": "System.Drawing.Common",
        "type": "NuGetPackage",
        "platformDependencies": [
          {
            "name": "libgdiplus",
            "dependencyType": "LinuxPackage",
            "usage": "default"
          }
        ]
      }
    ]
  }
}
```
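
To show how a snippet like this could feed a downstream asset such as a Dockerfile, here is a minimal Python sketch. It assumes the layout of the snippet as written above, and the mapping from `LinuxPackage` on a `debian` RID to an `apt-get` command is an assumption of this example, not part of the proposal. The function names are hypothetical.

```python
import json

# Same shape as the snippet in the comment above.
MODEL = json.loads("""
{
  "platforms": {
    "rid": "debian",
    "components": [
      {
        "name": "System.Drawing.Common",
        "type": "NuGetPackage",
        "platformDependencies": [
          {
            "name": "libgdiplus",
            "dependencyType": "LinuxPackage",
            "usage": "default"
          }
        ]
      }
    ]
  }
}
""")


def linux_packages(model: dict, usage: str = "default") -> list[str]:
    """Collect Linux package names for the given usage across all components."""
    platform = model["platforms"]
    return sorted({
        dep["name"]
        for component in platform["components"]
        for dep in component["platformDependencies"]
        if dep["dependencyType"] == "LinuxPackage" and dep["usage"] == usage
    })


def dockerfile_run_line(packages: list[str]) -> str:
    """Render a Dockerfile RUN instruction that installs the packages via apt."""
    return (
        "RUN apt-get update && apt-get install -y --no-install-recommends "
        + " ".join(packages)
    )


packages = linux_packages(MODEL)
print(dockerfile_run_line(packages))
```

The same package list could instead be compared against an existing Dockerfile to validate it, which matches the "transformation or validation" use mentioned earlier in the thread.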


#### New Platform Support

When support for a new platform is added to the product, the platform dependency model of future releases needs to be updated to include this platform and all its supported versions.

Member

Is there any impact on unsupported platforms? For example, we are trying to get source-build to work with Arch Linux. If those folks get .NET building on Arch, will they be affected if we start tracking dependencies here? Will anything suddenly start breaking for them? And will it be okay if we don't want to track Arch Linux here if we don't want to "officially support" it?

Member Author

No, nothing would break. This is purely a representation of what dependencies exist; it doesn't dictate what those dependencies must be. The flow of information would always be that the maintainers of the .NET source define what dependencies they want to have, and then the dependency model gets updated to reflect that. There would be no use of the dependency model at runtime or build time (i.e. it has no impact on the running of a .NET application or of the building of .NET source). Its purpose is to add rigor and correctness to the maintenance of other assets that describe what the dependencies are (such as documentation).


Each component describes the dependencies it has for the platform it is contained within. A dependency is identified by its name and type (Linux distro package, DLL).

A key piece of metadata that gives context to the dependency is the "usage" field. This field is set to one of the well-known values that describes the scenario in which this dependency applies. Here are some examples of usage values:

Member

What about build vs runtime dependencies?

Member Author

I made an update to address this, including an example: 8b0ced7. Hopefully that makes sense.

I identified toolchain dependencies as a problem to be solved but didn't actually define the model to account for such dependencies. These updates include a git repository as a component type. This allows toolchain dependencies to be described for each .NET git repo.  An example model is included.  I also included a paragraph on where toolchain dependency models would be stored.  I think it makes more sense to store them within each repo rather than combining everything in the dotnet/core repo like the runtime models.
@mthalman
Member Author

For example, changes to minimal cmake versions need source build considerations. Where in this model do we document why the min versions of all our dependencies are the way they are? Having this will allow maintainers to more effectively understand how to change them/why they are the way they are.

+1

Raising minimum tool versions often requires weighing trade-offs between the pain caused by staying with the old version and the pain caused by requiring a new minimum version. For example, we have different minimum cmake versions for different build configurations today because we found that aggressively raising the minimum version across the board is going to incur too much pain in aggregate.

We should be able to learn how situations like this are handled in other ecosystems. For example, here is a recent discussion on raising minimum cmake version in LLVM: https://lists.llvm.org/pipermail/llvm-dev/2020-April/subject.html#140578

@jkotas, @jeffschwMSFT - I'm going to throw out some options here. Let me know what you think is reasonable or feel free to suggest alternatives. My assumption here is that any sort of description will need to be free-form text, not fitting to any schema.

1. Include the description within the model JSON file as general purpose "dependency notes"

Pros:

  • Notes are versioned together with the model file.
  • Edits are reviewable via PRs

Cons:

  • Readability and maintainability suffers by working with free-form text in a file format that is primarily intended for machine-readability.

2. Define description in a GitHub issue with the URL referenced by the dependency in the model.

Pros:

  • Provides the benefits of GitHub issues (e.g. conversation-style timeline, people tagging, etc).

Cons:

  • Not versioned. Are there separate issues between different .NET releases for the same dependency?
  • Edits are not reviewable via PRs

3. Define description in a Markdown file in the relevant GitHub repo with the commit URL referenced by the dependency in the model.

Pros:

  • Versionable, provides history over time
  • Edits are reviewable via PRs

Cons:

  • Versions independently of the model file. Requires updating the model's commit URL reference with any change to the Markdown file.

Member

jkotas commented Mar 16, 2021

My preference would be combination of 1 and 2. I think it is useful to have space for a short free-form comment in the model file. If there is too much to say, this comment can include links to github issues, documentation, etc.

@mthalman
Member Author

@knuxbbs -

Is this proposal related to workload manifests (#120)?

It's unrelated to #120. Workloads are a set of references to SDK packs and workload manifests are descriptions of those workloads. Workloads are not concerned with, nor do they provide, the set of platform dependencies necessary to run in the target environment.

For instance, how to track platform dependencies for a mobile workload?

I'll rephrase this as "how to track platform dependencies for a mobile platform". The schema supports this by the use of a RID to describe the platform. In the case of mobile platforms, there are RIDs for Android and iOS, for example. So it would just be another platform described in the model.

The primary motivation for this is to have a file format that supports comments. This satisfies the need to be able to include notes about the dependencies such as reasons for the minimum version.  See dotnet#184 (comment)
This reverts commit f55c61e.

I decided to revert this back to JSON for the simple reason of consistency with the other file formats being used throughout .NET engineering (e.g. releases.json, runtime.json for RIDs, etc).  The main motivation for YAML was to have native support for comments.  JSON files can still contain comments, but comments are technically unsupported by the standard.  This limits the general accessibility of the file, but I would say that the vast majority of consumers would use the .NET library for interacting with the model due to the logic needed to interpret it.
@mthalman
Member Author

All current feedback has been addressed. This is ready for further review.

@mthalman
Member Author

Please provide any remaining feedback by this Friday. I'd like to have this merged next week.

@mthalman mthalman merged commit 7de94a2 into dotnet:main Apr 5, 2021
@mthalman mthalman deleted the platform-deps branch April 5, 2021 15:52
9 participants