Improvements to RO-Crate in Python tutorial #4850

elichad · 2024-03-19T15:51:48Z

Hello again GTN team, a new job has brought me back :)

Suggesting these changes after following the tutorial myself:

updated output of get_entities() because ./ Dataset always appears first
added "name" property to Donald Duck because if you re-export the crate and then continue with the tutorial, the name can appear as None when you print authors in a later section causing confusion
switched bash commands to Python where possible so it's easier to follow along with the auto-generated notebook
added explanation of what happens when including the exp/logs dataset in the crate
added extra CLI commands

One question:
The command-line interface section has the student clone an entire repository just to get one folder of test data to run rocrate on. I think it'd be better to host this test data in another repository by itself or as a downloadable zip file, which would simplify the commands needed - I'm happy to update the tutorial accordingly but what's the best approach (or most common in the GTN) to storing the data? Zenodo?

Requesting feedback from @simleo as a maintainer of the rocrate package

bgruening · 2024-03-19T16:58:38Z

Hello again GTN team, a new job has brought me back :)

I saw you recently on some registration list and was like --- "What can this be, is Eli back?" --- so happy to see you again! :)

hexylena

Welcome back @elichad! These are good improvements, thanks for making them. I've pinged the contributors in slack as well.

hexylena · 2024-03-20T09:29:58Z

topics/fair/tutorials/ro-crate-in-python/tutorial.md


 In its simplest form, an RO-Crate is a directory tree with an `ro-crate-metadata.json` file at the top level. This file contains metadata about the other files and directories, represented by [data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html). These metadata consist both of properties of the data entities themselves and of other, non-digital entities called [contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html). A contextual entity can represent, for instance, a person, an organization or an event.

-Suppose Alice and Bob worked on a research project together, and then wrote a paper about it; additionally, Alice prepared a spreadsheet containing experimental data, which Bob then used to generate a diagram. For the purpose of this tutorial, you can just create dummy files for the documents:
+Suppose Alice and Bob worked on a research project together, and then wrote a paper about it; additionally, Alice prepared a spreadsheet containing experimental data, which Bob then used to generate a diagram. For the purpose of this tutorial, you can just create placeholder files for the documents:


we should add linting for this, good change.

hexylena · 2024-03-20T09:31:39Z

topics/fair/tutorials/ro-crate-in-python/tutorial.md

-```bash
-mkdir exp/logs
-touch exp/logs/log1.txt
-touch exp/logs/log2.txt


I think these should also work with ! prefixes, but, the python version is more pythonic

hexylena · 2024-03-20T09:33:20Z

topics/fair/tutorials/ro-crate-in-python/tutorial.md

-## Command Line Interface
+> <comment-title>Jupyter Notebook users: switch to a terminal</comment-title>
+> The code cells in this section use Unix shell commands, which can't be run within a notebook. Open a Unix/Linux terminal to follow along.
+{: .comment}

 `ro-crate-py` includes a hierarchical command line interface: the `rocrate` tool. `rocrate` is the top-level command, while specific functionalities are provided via sub-commands. Currently, the tool allows to initialize a directory tree as an RO-Crate (`rocrate init`) and to modify the metadata of an existing RO-Crate (`rocrate add`).


We could consider splitting the tutorial in two here, and then use a bash notebook for the second half with the cli. @simleo @stain what do y'all think?

I'm not sure. One alternative would be to add leading "!" to the commands, but then it will read too jupyter-specific. Would the split involve spreading the tutorial over two separate pages? That would make it a bit harder to follow.

hexylena · 2024-03-20T09:36:23Z

topics/fair/tutorials/ro-crate-in-python/tutorial.md

@@ -311,7 +331,7 @@ Commands:

 Note that data entities (e.g., workflows) must already be present in the directory tree: the effect of the command is to register them in the metadata file.

-### Example
+## Example

 To run the following commands, we need a copy of the ro-crate-py repository:


The command-line interface section has the student clone an entire repository just to get one folder of test data to run rocrate on. I think it'd be better to host this test data in another repository by itself or as a downloadable zip file, which would simplify the commands needed - I'm happy to update the tutorial accordingly but what's the best approach (or most common in the GTN) to storing the data? Zenodo?

given that this test case hasn't changed in four years, ok, that's reasonable. If it'd been modified more recently I'd say let's stick with the github clone just to ensure it stays fresh and doesn't need to be updated separately.

Zenodo is the right choice, yes. Please create a record, add @simleo's as a contributor to it (his information is in contributors.yaml of course), and add it to the GTN collection if you could!

I think a separate Zenodo record is a bit overkill, and it would need to be kept in sync with the documentation. That sounds risky since there would be only one person able to update the Zenodo record, while changes to the GitHub repo can be made through pull requests. This is why I'd rather stick with the git clone.

I can't figure out a good waya to clone just the subdir with a sparse checkout, so, could be fine to clone everything.

We could restrict it with --depth=1 to make it a bit smaller.

simleo · 2024-03-20T11:28:25Z

Thanks for this update, Eli. I'm a bit taken aback because I never thought of this tutorial to be something "alive" that would need updating, but only a training component written for the 2023 Smorgasbord. If the intent is to keep it in sync with the development of ro-crate-py, there's the matter of avoiding duplication / out-of-sync with the ro-crate-py docs (which are in the repository's README.md), as noted by Eli in this issue. Eli's proposed solution (see the issue linked above) is to keep updating this tutorial and make it the only documentation source. We could do that, though I have a few doubts. E.g. is there a way to see the rendering of a PR such as this one before it's merged? Is the language just GitHub-flavored markdown or something more specialized to the GTN?

hexylena · 2024-03-20T11:50:04Z

If the intent is to keep it in sync with the development of ro-crate-py,

I think it could be nice? The GTN is a very FAIR training platform, it could be a good way to ensure that it's seen by a number of people.

It's not necessarily that we want to 'take over' your documentation, since you're obviously in the best place to maintain it long term, and this would mean using a different platform and changes going through peer-review which I recognise is an added burden for you @simleo.

I think there are some benefits to having it in the GTN but I can recognise those aren't necessarily aligned with your primary goals of delivering RO-Crate's python library, so, at least from our side, thanks to contributors like @elichad, and potentially others in the future, we can consider also maintaining parallel versions, as an additional place to advertise this really nice library.

Eli's proposed solution (see the issue linked above) is to keep updating this tutorial and make it the only documentation source. We could do that, though I have a few doubts. E.g. is there a way to see the rendering of a PR such as this one before it's merged?

Yes!

if you just want a basic rendering of markdown there is the github view.
If you wish to see a fully rendered preview, it's absolutely possible. (this can also be done locally, but many part-time contributors find gitpod easier.)

The fully rendered preview is potentially more important in this case since this tutorial uses one of the advanced GTN features of converting a markdown document directly into a Jupyter notebook which users can use if they're not following along in their own terminal.

Is the language just GitHub-flavored markdown or something more specialized to the GTN?

It is GFM, you can use all the familiar markdown constructs

But there are some jekyll specific, and GTN-specific additions on top.

Jekyll templates out variables such as {{ site.baseurl }} for us, GTN specific additions include features like tip boxes which are written with standard markdown blockquotes plus a css class, plus a GTN specific title <tip-title>..</tip-title> feature.

simleo

I managed to render the updated page using GitPod, thanks @hexylena! The changes look good to me.

hexylena · 2024-03-20T16:52:42Z

I've given you merge permissions @simleo for the future!

elichad added 9 commits March 19, 2024 12:34

update contributors entry

0788634

update ordering of get_entities output

3f594b1

add name to Donald Duck to prevent it showing as None later

3f84706

migrate data file creation to Python so it works in a notebook too

67987c3

add comment where code cells switch to bash

da7a176

heading level changes N -> N-1

c576a19

make agenda text fit the tutorial

8df1ab1

minor tweaks to language

bc347fe

add some missing content

cff2ad3

github-actions bot added template-and-tools fair labels Mar 19, 2024

elichad mentioned this pull request Mar 19, 2024

Avoid duplication of tutorial material ResearchObject/ro-crate-py#179

Closed

hexylena reviewed Mar 20, 2024

View reviewed changes

simleo approved these changes Mar 20, 2024

View reviewed changes

hexylena merged commit 06dd849 into galaxyproject:main Mar 20, 2024
3 checks passed

elichad mentioned this pull request Mar 26, 2024

Tutorial updates ResearchObject/ro-crate-py#180

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements to RO-Crate in Python tutorial #4850

Improvements to RO-Crate in Python tutorial #4850

elichad commented Mar 19, 2024

bgruening commented Mar 19, 2024

hexylena left a comment •

edited

Loading

hexylena Mar 20, 2024

hexylena Mar 20, 2024

hexylena Mar 20, 2024

simleo Mar 20, 2024

hexylena Mar 20, 2024

simleo Mar 20, 2024

hexylena Mar 20, 2024

simleo commented Mar 20, 2024

hexylena commented Mar 20, 2024 •

edited

Loading

simleo left a comment

hexylena commented Mar 20, 2024

Improvements to RO-Crate in Python tutorial #4850

Improvements to RO-Crate in Python tutorial #4850

Conversation

elichad commented Mar 19, 2024

bgruening commented Mar 19, 2024

hexylena left a comment • edited Loading

Choose a reason for hiding this comment

hexylena Mar 20, 2024

Choose a reason for hiding this comment

hexylena Mar 20, 2024

Choose a reason for hiding this comment

hexylena Mar 20, 2024

Choose a reason for hiding this comment

simleo Mar 20, 2024

Choose a reason for hiding this comment

hexylena Mar 20, 2024

Choose a reason for hiding this comment

simleo Mar 20, 2024

Choose a reason for hiding this comment

hexylena Mar 20, 2024

Choose a reason for hiding this comment

simleo commented Mar 20, 2024

hexylena commented Mar 20, 2024 • edited Loading

simleo left a comment

Choose a reason for hiding this comment

hexylena commented Mar 20, 2024

hexylena left a comment •

edited

Loading

hexylena commented Mar 20, 2024 •

edited

Loading