[integration] Add support of Graal's CoLic Backend to ELK #9

Closed
inishchith opened this issue Jun 19, 2019 · 9 comments
Labels
coding-period-two (task completed during coding period #2), high-priority (urgency, due for a long time), ready (work completed, minor meta related work left)

Comments

@inishchith
Owner

inishchith commented Jun 19, 2019

This thread tracks adding support for Graal's CoLic backend, and visualization of its metrics, to ELK and Mordred.


@inishchith inishchith added the high-priority and coding-period-two labels Jun 19, 2019
@inishchith inishchith added this to the 📈 Second Evaluation milestone Jun 19, 2019
@inishchith inishchith pinned this issue Jun 19, 2019
@inishchith inishchith unpinned this issue Jun 21, 2019
@inishchith
Owner Author

inishchith commented Jun 26, 2019

Data produced by CoLic

  • Category: code_license_scancode

 "analysis": [
            {
                "file_path": "tests/base_analyzer.py",
                "licenses": [
                    {
                        "category": "Copyleft",
                        "end_line": 18,
                        "homepage_url": "http://www.gnu.org/licenses/gpl-3.0-standalone.html",
                        "is_exception": false,
                        "key": "gpl-3.0-plus",
                        "matched_rule": {
                            "identifier": "gpl-3.0-plus_12.RULE",
                            "is_license_notice": true,
                            "is_license_reference": false,
                            "is_license_tag": false,
                            "is_license_text": false,
                            "license_expression": "gpl-3.0-plus",
                            "licenses": [
                                "gpl-3.0-plus"
                            ]
                        },
                        "name": "GNU General Public License 3.0 or later",
                        "owner": "Free Software Foundation (FSF)",
                        "reference_url": "https://enterprise.dejacode.com/urn/urn:dje:license:gpl-3.0-plus",
                        "score": 98.2,
                        "short_name": "GPL 3.0 or later",
                        "spdx_license_key": "GPL-3.0-or-later",
                        "spdx_url": "https://spdx.org/licenses/GPL-3.0-or-later",
                        "start_line": 6,
                        "text_url": "http://www.gnu.org/licenses/gpl-3.0-standalone.html"
                    }
                ]
            },
    ...........
]


  • Category: code_license_scancode_cli

"analysis": [
                {
                   "authors": [
                        {
                            "end_line": 20,
                            "start_line": 19,
                            "value": "Valerio Cosentino <valcos@bitergia.com>"
                        }
                    ],
                    "base_name": "codecomplexity",
                    "copyrights": [
                        {
                            "end_line": 3,
                            "start_line": 3,
                            "value": "Copyright (c) 2015-2018 Bitergia"
                        }
                    ],
                    "date": "2019-06-26",
                    "dirs_count": 0,
                    "extension": ".py",
                    "file_path": "graal/codecomplexity.py",
                    "file_type": "Python script, ASCII text executable",
                    "files_count": 0,
                    "holders": [
                        {
                            "end_line": 3,
                            "start_line": 3,
                            "value": "Bitergia"
                        }
                    ],
                    "is_archive": false,
                    "is_binary": false,
                    "is_media": false,
                    "is_script": true,
                    "is_source": true,
                    "is_text": true,
                    "license_expressions": [
                        "gpl-3.0-plus"
                    ],
                    "licenses": [
                        {
                            "category": "Copyleft",
                            "end_line": 17,
                            "homepage_url": "http://www.gnu.org/licenses/gpl-3.0-standalone.html",
                            "is_exception": false,
                            "key": "gpl-3.0-plus",
                            "matched_rule": {
                                "identifier": "gpl-3.0-plus_12.RULE",
                                "is_license_notice": true,
                                "is_license_reference": false,
                                "is_license_tag": false,
                                "is_license_text": false,
                                "license_expression": "gpl-3.0-plus",
                                "licenses": [
                                    "gpl-3.0-plus"
                                ],
                                "match_coverage": 98.2,
                                "matched_length": 109,
                                "matcher": "3-seq",
                                "rule_length": 111,
                                "rule_relevance": 100
                            },
                            "matched_text": "This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 3 [of] [the] [License], or\n# (at your option) any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License\n# along with this program; if not, write to the Free Software\n# Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-",
                            "name": "GNU General Public License 3.0 or later",
                            "owner": "Free Software Foundation (FSF)",
                            "reference_url": "https://enterprise.dejacode.com/urn/urn:dje:license:gpl-3.0-plus",
                            "score": 98.2,
                            "short_name": "GPL 3.0 or later",
                            "spdx_license_key": "GPL-3.0-or-later",
                            "spdx_url": "https://spdx.org/licenses/GPL-3.0-or-later",
                            "start_line": 5,
                            "text_url": "http://www.gnu.org/licenses/gpl-3.0-standalone.html"
                        }
                    ],
                    "md5": "aa66e700b06ead2a28c2dc29633ebc00",
                    "mime_type": "text/x-python",
                    "name": "codecomplexity.py",
                    "path": "codecomplexity.py",
                    "programming_language": "Python",
                    "scan_errors": [],
                    "sha1": "124e07ae6c850eb232aaf07f43cdb2b2ad2a1db1",
                    "size": 7817,
                    "size_count": 0,
                    "type": "file"
                }
            ],
          ....... 
}
....
]

@valeriocos

For an initial iteration we could focus on the attributes file_path and the licenses.licenses list. What do you think, @inishchith?

Should we also look at the other CoLic analyzer (nomossa), which is faster but less precise?
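
As a rough illustration of that first iteration, the sketch below keeps only the file path and the detected licenses from one analysis entry. It assumes the license `key` is the identifier to keep (`name` or `spdx_license_key` would work the same way), and `file_licenses` is a hypothetical helper name, not part of Graal or ELK.

```python
from typing import Any, Dict


def file_licenses(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the file path and the license keys of one analysis entry."""
    # Assumption: "key" is used as the license identifier.
    keys = [lic["key"] for lic in entry.get("licenses", [])]
    return {"file_path": entry["file_path"], "licenses": keys}


# Entry trimmed from the code_license_scancode sample earlier in this thread.
sample = {
    "file_path": "tests/base_analyzer.py",
    "licenses": [
        {
            "key": "gpl-3.0-plus",
            "name": "GNU General Public License 3.0 or later",
            "score": 98.2,
        }
    ],
}

print(file_licenses(sample))
```

An entry without a `licenses` attribute yields an empty list rather than an error, which keeps unlicensed files in the output.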

@inishchith
Owner Author

> For an initial iteration we could focus on the attributes file_path and the licenses.licenses list. What do you think, @inishchith?

Yes, we can start with the licenses list.

> Should we also look at the other CoLic analyzer (nomossa), which is faster but less precise?

I had some problems executing it, so I couldn't post the results here.
I'll have another look and let you know.

@inishchith inishchith added the in-progress label Jun 29, 2019
@inishchith
Owner Author

inishchith commented Jul 3, 2019

@valeriocos I'm currently working with the results of the code_license_scancode_cli category, and I don't think we need to add repository-level analysis (as we did in CoCom). I have produced some results (you can check them with colic_*).

I feel the module-level idea proposed here would help. WDYT?

@inishchith inishchith pinned this issue Jul 3, 2019
@valeriocos

Good idea @inishchith !

I've just checked the data in the incubator instance and things look nice! Maybe we should mark the files that don't have a license. One solution could be to add the attribute has_license to the enriched indexes, which can be 1 or 0 (depending on whether the file has a license or not). Then we could create a visualization to show the evolution of licensed vs. non-licensed files. WDYT?

A couple of minor comments about the data at: https://grimoirelab-incubator.biterg.io/data/colic_enrich_graal_file/_search?pretty:

  • Shouldn't the attribute licenses be a list instead of a string ("licenses" : "gpl-3.0-plus",)?
  • "is_colic_hits" : 1, should probably be is_colic_license
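
The has_license suggestion and the list-valued licenses attribute can be sketched as follows. This is only an illustration under assumptions: `enrich` and `licensed_counts` are hypothetical names, the field layout is not the actual grimoirelab-elk enricher output, and `docs/notes.txt` is a made-up file used to show the unlicensed case.

```python
from typing import Any, Dict, List


def enrich(file_path: str, license_keys: List[str]) -> Dict[str, Any]:
    """Build a hypothetical enriched item for one file."""
    return {
        "file_path": file_path,
        "licenses": list(license_keys),           # always a list, even for one license
        "has_license": 1 if license_keys else 0,  # 1 = at least one license detected
    }


def licensed_counts(items: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count licensed vs. non-licensed files in one snapshot of the index."""
    licensed = sum(item["has_license"] for item in items)
    return {"licensed": licensed, "unlicensed": len(items) - licensed}


items = [
    enrich("graal/codecomplexity.py", ["gpl-3.0-plus"]),
    enrich("docs/notes.txt", []),  # hypothetical file with no detected license
]

print(licensed_counts(items))
```

Storing the flag as 1 or 0 means a visualization can sum has_license per time bucket to plot how the number of licensed files evolves.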

@inishchith
Owner Author

inishchith commented Jul 3, 2019

> I've just checked the data in the incubator instance and things look nice! Maybe we should mark the files that don't have a license. One solution could be to add the attribute has_license to the enriched indexes, which can be 1 or 0 (depending on whether the file has a license or not). Then we could create a visualization to show the evolution of licensed vs. non-licensed files. WDYT?

Sounds good, I'll work on this. Thanks.

> A couple of minor comments about the data at: https://grimoirelab-incubator.biterg.io/data/colic_enrich_graal_file/_search?pretty:
>
>   • Shouldn't the attribute licenses be a list instead of a string ("licenses" : "gpl-3.0-plus",)?

Yes, it should be. I had a rough sketch of the enricher just to see how things proceeded; I'm working on a better version (check below).

>   • "is_colic_hits" : 1, should probably be is_colic_license

I think this field is not added by us; it's added by the enricher (please correct me if I've missed something). The cocom enriched index has a field is_cocom_file.


What do you think about which category of the CoLic analyzer to use?
For the current visualization I've used code_license_scancode_cli, and I plan to insert one item per file (as discussed for CoCom), with the enriched field licenses taken from ["licenses"][i]["name"]. WDYT about this?

Also, I noticed that the structure of analysis for code_license_scancode_cli is different (a dict) compared to the other categories (a list). I'll come up with a quick fix. Sorry for the mistake; I think we missed this during the initial addition.
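
One way to absorb the dict-versus-list difference on the consumer side is to normalize analysis before iterating over it. The sketch below assumes the dict form is a single file entry; if it instead wraps the entries under some key, the unwrapping step would differ. `normalize_analysis` and `license_names` are hypothetical helpers, not existing Graal or ELK functions.

```python
from typing import Any, Dict, List, Union

FileEntry = Dict[str, Any]


def normalize_analysis(analysis: Union[FileEntry, List[FileEntry]]) -> List[FileEntry]:
    """Return the analysis as a list of per-file entries."""
    if isinstance(analysis, dict):
        # Assumption: a dict is one file entry, so wrap it in a list.
        return [analysis]
    if isinstance(analysis, list):
        return analysis
    raise TypeError(f"unexpected analysis type: {type(analysis).__name__}")


def license_names(entry: FileEntry) -> List[str]:
    """Collect ["licenses"][i]["name"] for one file entry."""
    return [lic["name"] for lic in entry.get("licenses", [])]


# Dict-shaped analysis, trimmed from the code_license_scancode_cli sample.
raw = {
    "file_path": "graal/codecomplexity.py",
    "licenses": [{"name": "GNU General Public License 3.0 or later"}],
}

for entry in normalize_analysis(raw):
    print(entry["file_path"], license_names(entry))
```

With this in place, the code that builds one item per file can loop over the result without checking the category first.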

@valeriocos

Thank you for the quick reply @inishchith

> I think this field is not added by us; it's added by the enricher (please correct me if I've missed something). The cocom enriched index has a field is_cocom_file.

hits is added here: https://github.com/inishchith/grimoirelab-elk/blob/gsoc-graal-2019-colic/grimoire_elk/enriched/colic.py#L75. If we change it, we will get something different from is_colic_hits.

> What do you think about which category of the CoLic analyzer to use?

The enricher should probably make the category used transparent, so that its output is always the same no matter which category is used. WDYT?

@inishchith
Owner Author

@valeriocos Thanks for the quick response.

Thanks for pointing out the correction in the is_colic_hits part.

> The enricher should probably make the category used transparent, so that its output is always the same no matter which category is used. WDYT?

Yes. Hence, I think we need to fix the output for code_license_scancode_cli, as mentioned above.

@inishchith
Owner Author

Closing, since the task has been completed. (reference)

@inishchith inishchith unpinned this issue Aug 10, 2019