Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spdx: Handle reporting for empty license metadata #437

Merged
merged 1 commit into from
Sep 5, 2019

Conversation

rnjudge
Copy link
Contributor

@rnjudge rnjudge commented Sep 3, 2019

Currently, if no license metadata is found (i.e. debian-based images)
Tern does not generate valid SPDX. An empty license field still reports
as "LicenseRef-". According to the 2.1 spec, if information about the
license is unknown, the value should be NOASSERTION.

This commit adds a few checks in
tern/formats/spdx/spdxtagvalue/generator.py to make sure that a license
value exists before trying to report the license information.

It also moves the get_package_id functionality originally in
tern/classes/package.py to a format in tern/formats/spdx/formats.py as
package_id is a value only utilized by SPDX format reports. Since
the get_package_id functionality was moved out of classes, the test for
this function was removed from the test_class_package test file.

tern/formats/spdx/spdxtagvalue/generator.py was updated to pull the
package_id info from spdx formats.py and has additional manipulation
to handle the case when a debian package is reported in the form
[epoch:]upstream_version[-debian_revision]. The colon after the epoch
needs to be changed to '-' in order to validate the SPDX report.

Additionally, this commit wraps the PackageCopyrightText value in
in the case that the copyright statement is more than one
line per guidelines from the 2.1 spec.

Finally, this commit makes a change to the logic inside
update_license_list() that gets rid of the dangling license block at
the end of the report if no licenses are available from the container
image metadata.

Resolves #431

Signed-off-by: Rose Judge rjudge@vmware.com

@rnjudge rnjudge force-pushed the issue-431-spdx-validate branch 3 times, most recently from 9693e40 to 572258d Compare September 4, 2019 05:32
@nishakm
Copy link
Contributor

nishakm commented Sep 4, 2019

If no license metadata is found (i.e. debian-based images) Tern
does not generate reports that can be validated by SPDX
(http://13.57.134.254/app/validate/).

Suggestion: "does not generate valid SPDX" and then list the reasons per SPDX spec 2.1

@nishakm
Copy link
Contributor

nishakm commented Sep 4, 2019

I found a couple of fixes for just the debian image that will pass validation:

This one gets rid of the SPDXRef error

def get_package_spdxref(package_obj):
     '''Given the package object, return an SPDX reference ID'''
-    return 'SPDXRef-{}'.format(package_obj.get_package_id())
+    return 'SPDXRef-{}'.format(
+        package_obj.get_package_id().replace(':', '-', 1))

This one gets rid of the dangling license block at the end:

@@ -130,7 +131,7 @@ def update_license_list(license_list, license_string):
     string is not in the list, add it'''
     licenses = get_package_licenses(license_string)
     for l in licenses:
-        if l not in license_list:
+        if l and l not in license_list:
             license_list.append(l)

Meanwhile, I have uncovered a bug in the debian based golang image. I'll file a bug on that one.

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 4, 2019

I found a couple of fixes for just the debian image that will pass validation:

This one gets rid of the SPDXRef error

def get_package_spdxref(package_obj):
     '''Given the package object, return an SPDX reference ID'''
-    return 'SPDXRef-{}'.format(package_obj.get_package_id())
+    return 'SPDXRef-{}'.format(
+        package_obj.get_package_id().replace(':', '-', 1))

Can you imagine any situation where changing the : to - might misrepresent the version for users who want to extract the version info directly from the Tern SPDX document? Or are we not worried about potential corner cases like that right now?

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 4, 2019

If no license metadata is found (i.e. debian-based images) Tern
does not generate reports that can be validated by SPDX
(http://13.57.134.254/app/validate/).

Suggestion: "does not generate valid SPDX" and then list the reasons per SPDX spec 2.1

I updated the commit message.

@nishakm
Copy link
Contributor

nishakm commented Sep 4, 2019

Can you imagine any situation where changing the : to - might misrepresent the version for users who want to extract the version info directly from the Tern SPDX document? Or are we not worried about potential corner cases like that right now?

The package version is whatever the package manager says it is. So in dpkg's case, that's the full debian version with epoch and everything. This is just for package id. The problem with changing this in the class's get_package_id method is that, as far as I can see, Debian is the only one that uses the :. Perhaps a more long term solution would be to make get_package_id an SPDX format specific thing (as it is the only format that asks for a reference id for a package).

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 4, 2019

Can you imagine any situation where changing the : to - might misrepresent the version for users who want to extract the version info directly from the Tern SPDX document? Or are we not worried about potential corner cases like that right now?

The package version is whatever the package manager says it is. So in dpkg's case, that's the full debian version with epoch and everything. This is just for package id. The problem with changing this in the class's get_package_id method is that, as far as I can see, Debian is the only one that uses the :. Perhaps a more long term solution would be to make get_package_id an SPDX format specific thing (as it is the only format that asks for a reference id for a package).

@nishakm Ah, I was confusing the package ID and package version. In order to determine where should do the actual : to - manipulation, I have one more question:

If we supported other SPDX formatting in the future, would the colon after the epoch be an issue in the SPDXRef-* value or is the issue isolated to tag:value validation? If all SPDX formatting would have an issue with the colon inside the SPDXRef-* value, I think we should do the string manipulation inside get_package_id. If the colon is only an issue for spdxtagvalue reporting, then I think we should localize the manipulation inside the respective plugin.

Thoughts?

@nishakm
Copy link
Contributor

nishakm commented Sep 4, 2019

If we supported other SPDX formatting in the future, would the colon after the epoch be an issue in the SPDXRef-* value or is the issue isolated to tag:value validation?

I think it is the tag-value parser. There is no space between the strings and the : so I don't see why the parser can't handle it. However, for unblocking us on validation, such as it is, I would rather have this change.

If all SPDX formatting would have an issue with the colon inside the SPDXRef-* value, I think we should do the string manipulation inside get_package_id. If the colon is only an issue for spdxtagvalue reporting, then I think we should localize the manipulation inside the respective plugin.

Thoughts?

Perhaps there is a different way of creating a "package identifier" for documents that can be constant for all types of formats. If not, I would rather if this whole function get removed from the class and moved to the formats.

@rnjudge
Copy link
Contributor Author

rnjudge commented Sep 5, 2019

Thoughts?

Perhaps there is a different way of creating a "package identifier" for documents that can be constant for all types of formats. If not, I would rather if this whole function get removed from the class and moved to the formats.

As was discussed on the Tern slack channel, if SPDX formats are the only formats requiring a unique package identifier we should move the get_package_id functionality to tern/formats/spdx/formats.py. I pushed a change for you to take a look at that also removes the get_pacakge_id code in tern/classes/package.py.

Currently, if no license metadata is found (i.e. debian-based images)
Tern does not generate valid SPDX. An empty license field still reports
as "LicenseRef-". According to the 2.1 spec, if information about the
license is unknown, the value should be NOASSERTION.

This commit adds a few checks in
tern/formats/spdx/spdxtagvalue/generator.py to make sure that a license
value exists before trying to report the license information.

It also moves the get_package_id functionality originally in
tern/classes/package.py to a format in tern/formats/spdx/formats.py as
package_id is a value only utilized by SPDX format reports. Since
the get_package_id functionality was moved out of classes, the test for
this function was removed from the test_class_package test file.

tern/formats/spdx/spdxtagvalue/generator.py was updated to pull the
package_id info from spdx formats.py and has additional manipulation
to handle the case when a debian package is reported in the form
[epoch:]upstream_version[-debian_revision]. The colon after the epoch
needs to be changed to '-' in order to validate the SPDX report.

Additionally, this commit wraps the PackageCopyrightText value in
<text></text> in the case that the copyright statement is more than one
line per guidelines from the 2.1 spec.

Finally, this commit makes a change to the logic inside
update_license_list() that gets rid of the dangling license block at
the end of the report if no licenses are available from the container
image metadata.

Resolves tern-tools#431

Signed-off-by: Rose Judge <rjudge@vmware.com>
Copy link
Contributor

@nishakm nishakm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validator approves!

@nishakm nishakm merged commit 026e0a7 into tern-tools:master Sep 5, 2019
@rnjudge rnjudge deleted the issue-431-spdx-validate branch October 25, 2019 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SPDX documents don't validate for empty licenses
2 participants