SPDX documents don't validate for empty licenses #431

Closed
nishakm opened this issue Aug 30, 2019 · 6 comments · Fixed by #437

@nishakm
Contributor

nishakm commented Aug 30, 2019

Describe the bug
SPDX documents generated by Tern no longer validate with this tool: http://13.57.134.254/app/validate/

To Reproduce
Steps to reproduce the behavior:

  1. tern -l report -m spdxtagvalue -i golang:1.12.7 -f golang_1.12.7_spdx.txt
  2. Upload document to the site and click validate
  3. See error

Error in terminal

Unable to parse the file: File /var/www/spdxtools/spdx-online-tools/src/app/media/AnonymousUser/1567173789/golang_1.12.7_spdx.txt is not a recognized RDF/XML or tag/value format. While verifying for Tag/Value format: Error converting tag/value to RDF/XML format: Expecting a definition of a file, package, license information, or document property at License: GPLv2+ line number 564. While verifying for RDF/XML format: [line: 1, col: 1 ] Content is not allowed in prolog.

Expected behavior
This document should validate

Environment you are running Tern on

$ tern --version
Tern at commit 864a00ed00ee3fa1d768399a7c1cba6878859f94
   python version = 3.7.3 (default, Apr  3 2019, 19:16:38)
@nishakm nishakm added the bug Something went wrong label Aug 30, 2019
@nishakm nishakm added this to the Release 1.0.0 milestone Aug 30, 2019
@swinslow
Contributor

Hi @nishakm @rnjudge, I will take a look at this in the next couple of days (away from my computer today) to see if I can reproduce too, and to add thoughts on the tag-value formatting.

@rnjudge
Contributor

rnjudge commented Aug 30, 2019

@nishakm @swinslow For what it's worth, validation fails for output going all the way back to tern-0.4.0. I tried spdxtagvalue output from the latest commit, tern-0.5.4, and tern-0.4.0 and got the same error for all three output files, so I'm curious whether something also changed on the SPDX side of the validation app. Do either of you know when Tern's spdxtagvalue output was last confirmed to validate? In all three cases the failure traces back to the following line:

License: GPLv2+

@nishakm
Contributor Author

nishakm commented Aug 30, 2019

This is the original document that finally passed validation: https://github.com/vmware/tern/pull/270/files

It still passes validation, so I think something regressed in the format. Maybe run tern on the same image and do a diff on the output file and this one.

@nishakm
Contributor Author

nishakm commented Aug 30, 2019

@swinslow So the issue we found is that if no licenses are found, no license identifier list is generated. This happens especially with Debian-based images, where there is no license metadata. How do we write valid SPDX documents when we can't find values for all of the mandatory tags?

@nishakm nishakm changed the title SPDX documents don't validate anymore SPDX documents don't validate for empty licenses Aug 30, 2019
@rnjudge
Contributor

rnjudge commented Sep 3, 2019

The consensus from the SPDX call on 9/3 is that if there is no license metadata, the report should use NOASSERTION, i.e.:

LicenseID: NOASSERTION
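
The NOASSERTION fallback can be sketched as follows. This is a minimal illustration, not Tern's actual generator code: the helper name `declared_license_line` is hypothetical, and it is shown for the SPDX `PackageLicenseDeclared` tag on the assumption that the same check applies wherever a license value is written.

```python
NOASSERTION = "NOASSERTION"


def declared_license_line(license_name):
    """Build a PackageLicenseDeclared line (hypothetical helper).

    Falls back to NOASSERTION when no license metadata was found,
    instead of emitting a bare "LicenseRef-" with nothing after it.
    """
    name = (license_name or "").strip()
    if not name:
        return "PackageLicenseDeclared: " + NOASSERTION
    return "PackageLicenseDeclared: LicenseRef-" + name


print(declared_license_line(""))
print(declared_license_line("MIT"))
```

Treating `None`, an empty string, and whitespace the same way keeps the check in one place, so no caller can produce an empty license reference.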

rnjudge added a commit to rnjudge/tern that referenced this issue Sep 3, 2019
If no license metadata is found (e.g. debian-based images) Tern
does not generate reports that can be validated by SPDX
(http://13.57.134.254/app/validate/). This commit adds a few checks in
tern/formats/spdx/spdxtagvalue/generator.py to make sure that a license
value exists before trying to report the license information. Prior
to this commit the values for PackageLicenseDeclared and LicenseID
would report with "LicenseRef-" and no actual license value.

Additionally, this commit wraps the PackageCopyrightText value in
<text></text> in the case that the copyright statement is more than one
line.

Resolves tern-tools#431

Signed-off-by: Rose Judge <rjudge@vmware.com>
rnjudge added a commit to rnjudge/tern that referenced this issue Sep 4, 2019
@rnjudge
Contributor

rnjudge commented Sep 4, 2019

@swinslow I opened a PR following your suggestion on the SPDX call; it fixes the original issue caused by missing license metadata. However, validation still fails for reports on Debian-based images. The remaining problem is that Debian sometimes versions its packages with an epoch, using a colon to separate the epoch from the upstream version and Debian revision. You can read more about it here, but to summarize, the versioning looks like this:
[epoch:]upstream_version[-debian_revision].

In most cases, the epoch is omitted but in the case where it is present the colon that follows confuses the SPDX tag:value validation. The error we see from the validator is:
No external document ref found for SPDX ID SPDXRef-bsdutils.1:2.33.1-0.1. While verifying for RDF/XML format: [line: 1, col: 1 ] Content is not allowed in prolog

which we believe comes from the epoch:upstream_version-debian_revision version format of bsdutils.

Any thoughts on how we should handle this? I don't see anything about this specific style of package versioning in the 2.1 spec. For Tern's use case, we could always trim the epoch but this particular case seems like something that might be helpful for others to see in the spec as well.
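
One possible way to handle this, sketched below, is to sanitize the identifier rather than trim the epoch. SPDX 2.1 identifiers may contain only letters, digits, `.`, and `-`, so the sketch replaces every other character (including the epoch colon) with `-`. The function name `spdx_ref` and the `name.version` layout are assumptions taken from the error message above, not Tern's actual code.

```python
import re


def spdx_ref(name, version):
    """Build an SPDX identifier from a package name and version.

    Hypothetical helper: any character outside letters, digits, '.',
    and '-' (for example the colon after a Debian epoch) becomes '-'.
    """
    raw = "{}.{}".format(name, version)
    return "SPDXRef-" + re.sub(r"[^A-Za-z0-9.-]", "-", raw)


print(spdx_ref("bsdutils", "1:2.33.1-0.1"))
```

The substitution is lossy: two versions that differ only in a replaced character would map to the same identifier, so this only suits cases where such collisions are not expected within one report.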

rnjudge added a commit to rnjudge/tern that referenced this issue Sep 4, 2019
Currently, if no license metadata is found (e.g. debian-based images)
Tern does not generate valid SPDX. An empty license field still reports
as "LicenseRef-". According to the 2.1 spec, if information about the
license is unknown, the value should be NOASSERTION.

This commit adds a few checks in
tern/formats/spdx/spdxtagvalue/generator.py to make sure that a license
value exists before trying to report the license information.

Additionally, this commit wraps the PackageCopyrightText value in
<text></text> in the case that the copyright statement is more than one
line per guidelines from the 2.1 spec.

Resolves tern-tools#431

Signed-off-by: Rose Judge <rjudge@vmware.com>
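
The copyright handling described in this commit message can be sketched as below. The helper name `copyright_line` is hypothetical, and falling back to NOASSERTION for an empty copyright value is an assumption of this sketch rather than something the commit message states.

```python
def copyright_line(text):
    """Build a PackageCopyrightText line (hypothetical helper).

    Multi-line statements are wrapped in <text></text> so the
    tag-value parser does not read the continuation lines as new tags.
    """
    text = (text or "").strip()
    if not text:
        # Assumption: no copyright metadata is reported as NOASSERTION.
        return "PackageCopyrightText: NOASSERTION"
    if "\n" in text:
        return "PackageCopyrightText: <text>{}</text>".format(text)
    return "PackageCopyrightText: " + text


print(copyright_line("Copyright 2019 Foo\nCopyright 2018 Bar"))
```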
rnjudge added a commit to rnjudge/tern that referenced this issue Sep 5, 2019
Currently, if no license metadata is found (e.g. debian-based images)
Tern does not generate valid SPDX. An empty license field still reports
as "LicenseRef-". According to the 2.1 spec, if information about the
license is unknown, the value should be NOASSERTION.

This commit adds a few checks in
tern/formats/spdx/spdxtagvalue/generator.py to make sure that a license
value exists before trying to report the license information.

It also moves the get_package_id functionality originally in
tern/classes/package.py to a format in tern/formats/spdx/formats.py as
package_id is a value only utilized by SPDX format reports. Since
the get_package_id functionality was moved out of classes, the test for
this function was removed from the test_class_package test file.

tern/formats/spdx/spdxtagvalue/generator.py was updated to pull the
package_id info from spdx formats.py and has additional manipulation
to handle the case when a debian package is reported in the form
[epoch:]upstream_version[-debian_revision]. The colon after the epoch
needs to be changed to '-' in order to validate the SPDX report.

Additionally, this commit wraps the PackageCopyrightText value in
<text></text> in the case that the copyright statement is more than one
line per guidelines from the 2.1 spec.

Finally, this commit makes a change to the logic inside
update_license_list() that gets rid of the dangling license block at
the end of the report if no licenses are available from the container
image metadata.

Resolves tern-tools#431

Signed-off-by: Rose Judge <rjudge@vmware.com>
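
The last point, dropping the dangling license block, can be illustrated with a small sketch. It does not reproduce update_license_list(); the function name `extracted_license_blocks` and the mapping from license reference to extracted text are hypothetical, chosen only to show the idea of emitting nothing when no license metadata exists.

```python
def extracted_license_blocks(licenses):
    """Render LicenseID/ExtractedText blocks (hypothetical helper).

    `licenses` maps a license reference to its extracted text. Entries
    with empty text are skipped, and an empty mapping yields an empty
    string, so no dangling block ends up at the end of the report.
    """
    blocks = []
    for ref, text in sorted(licenses.items()):
        if not (text or "").strip():
            continue
        blocks.append(
            "LicenseID: {}\nExtractedText: <text>{}</text>".format(ref, text)
        )
    return "\n\n".join(blocks)


print(extracted_license_blocks({"LicenseRef-1": "GPLv2+"}))
```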
nishakm pushed a commit that referenced this issue Sep 5, 2019
rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020