Skip to content

Commit

Permalink
Merge pull request #121 from ResearchObject/update_docs
Browse files Browse the repository at this point in the history
Docs revamp
  • Loading branch information
simleo authored May 2, 2022
2 parents 4f42c05 + e76d7f2 commit 83388af
Showing 1 changed file with 153 additions and 94 deletions.
247 changes: 153 additions & 94 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,167 +1,226 @@
[![Python package](https://github.com/ResearchObject/ro-crate-py/workflows/Python%20package/badge.svg)](https://github.com/ResearchObject/ro-crate-py/actions?query=workflow%3A%22Python+package%22) [![Upload Python Package](https://github.com/ResearchObject/ro-crate-py/workflows/Upload%20Python%20Package/badge.svg)](https://github.com/ResearchObject/ro-crate-py/actions?query=workflow%3A%22Upload+Python+Package%22) [![PyPI version](https://badge.fury.io/py/rocrate.svg)](https://pypi.org/project/rocrate/) [![DOI](https://zenodo.org/badge/216605684.svg)](https://zenodo.org/badge/latestdoi/216605684)

ro-crate-py is a Python library to create and consume [Research Object Crates](https://w3id.org/ro/crate). It currently supports the [RO-Crate 1.1](https://w3id.org/ro/crate/1.1) specification.

# ro-crate-py

Python library to create/parse [RO-Crate](https://w3id.org/ro/crate) (Research Object Crate) metadata.
## Installation

Supports specification: [RO-Crate 1.1](https://w3id.org/ro/crate/1.1)

Status: **Alpha**


## Installing

You will need Python 3.7 or later (Recommended: 3.7).

This library is easiest to install using [pip](https://docs.python.org/3/installing/):
ro-crate-py requires Python 3.7 or later. The easiest way to install is via [pip](https://docs.python.org/3/installing/):

```
pip install rocrate
```

If you want to install manually from this code base, then try:
To install manually from this code base (e.g., to try the latest development revision):

```
git clone https://github.com/ResearchObject/ro-crate-py
cd ro-crate-py
pip install .
```

..or if you use don't use `pip`:
```
python setup.py install
```

## General usage
## Usage

### The RO-crate object
### Creating an RO-Crate

In general you will want to start by instantiating the `ROCrate` object. This can be a new one:
In its simplest form, an RO-Crate is a directory tree with an `ro-crate-metadata.json` file at the top level that contains metadata about the other files and directories, represented by [data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html). These metadata consist both of properties of the data entities themselves and of other, non-digital entities called [contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html) (representing, e.g., a person or an organization).

Suppose Alice and Bob worked on a research task together, which resulted in a manuscript written by both; additionally, Alice prepared a spreadsheet containing the experimental data, which Bob used to generate a diagram. Let's make an RO-Crate to package all this:

```python
from rocrate.rocrate import ROCrate

crate = ROCrate()
crate = ROCrate()
paper = crate.add_file("exp/paper.pdf", properties={
"name": "manuscript",
"encodingFormat": "application/pdf"
})
table = crate.add_file("exp/results.csv", properties={
"name": "experimental data",
"encodingFormat": "text/csv"
})
diagram = crate.add_file("exp/diagram.svg", dest_path="images/figure.svg", properties={
"name": "bar chart",
"encodingFormat": "image/svg+xml"
})
```

or an existing RO-Crate package can be loaded from a directory or zip file:
We've started by adding the data entitites. Now we need contextual entities to represent Alice and Bob:

```python
crate = ROCrate('/path/to/crate/')
from rocrate.model.person import Person

alice_id = "https://orcid.org/0000-0000-0000-0000"
bob_id = "https://orcid.org/0000-0000-0000-0001"
alice = crate.add(Person(crate, alice_id, properties={
"name": "Alice Doe",
"affiliation": "University of Flatland"
}))
bob = crate.add(Person(crate, bob_id, properties={
"name": "Bob Doe",
"affiliation": "University of Flatland"
}))
```

Next, we express authorship of the various files:

```python
crate = ROCrate('/path/to/crate/file.zip')
paper["author"] = [alice, bob]
table["author"] = alice
diagram["author"] = bob
```

In addition, there is a set of higher level functions in the form of an interface to help users create some predefined types of crates.
As an example here is the code to create a [workflow RO-Crate](https://about.workflowhub.eu/Workflow-RO-Crate/), containing a workflow template.
This is a good starting point if you want to wrap up a workflow template to register at [workflowhub.eu](https://about.workflowhub.eu/):

Finally, we serialize the crate to disk:

```python
from rocrate import rocrate_api
crate.write("exp_crate")
```

Now the `exp_crate` directory should contain copies of the three files and an `ro-crate-metadata.json` file with a JSON-LD serialization of the entities and relationships we created, according to the RO-Crate profile. Note that we have chosen a different destination path for the diagram, while the other two files have been placed at the top level with their names unchanged (the default).

wf_path = "test/test-data/test_galaxy_wf.ga"
files_list = ["test/test-data/test_file_galaxy.txt"]
Some applications and services support RO-Crates stored as archives. To save the crate in zip format, use `write_zip`:

# Create base package
wf_crate = rocrate_api.make_workflow_rocrate(workflow_path=wf_path,wf_type="Galaxy",include_files=files_list)
```python
crate.write_zip("exp_crate.zip")
```

Independently of the initialization method, once an instance of `ROCrate` is created it can be manipulated to extend the content and metadata.
You can also add whole directories. A directory in RO-Crate is represented by the `Dataset` entity:

### Data entities
```python
logs = crate.add_dataset("exp/logs")
```

#### Adding remote entities

[Data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html) can be added with:
Data entities can also be remote:

```python
## adding a File entity:
sample_file = '/path/to/sample_file.txt'
file_entity = crate.add_file(sample_file)
input_data = crate.add_file("http://example.org/exp_data.zip")
```

# Adding a File entity with a reference to an external (absolute) URI
remote_file = crate.add_file('https://github.com/ResearchObject/ro-crate-py/blob/master/test/test-data/test_galaxy_wf.ga', fetch_remote = False)
By default the file won't be downloaded, and will be referenced by its URI in the serialized crate:

# adding a Dataset
sample_dir = '/path/to/dir'
dataset_entity = crate.add_directory(sample_dir, 'relative/rocrate/path')
```json
{
"@id": "http://example.org/exp_data.zip",
"@type": "File"
},
```

### Contextual entities
If you add `fetch_remote=True` to the `add_file` call, however, the library (when `crate.write` is called) will try to download the file and include it in the output crate.

[Contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html) are used in an RO-Crate to adequately describe a Data Entity. The following example shows how to add the person contextual entity to the RO-Crate root:
Another option that influences the behavior when dealing with remote entities is `validate_url`, also `False` by default: if it's set to `True`, when the crate is serialized, the library will try to open the URL to add / update metadata bits such as the content's length and format (but it won't try to download the file unless `fetch_remote` is also set).

```python
from rocrate.model.person import Person

# Add authors info
crate.add(Person(crate, '#joe', {'name': 'Joe Bloggs'}))
#### Adding entities with an arbitrary type

# wf_crate example
publisher = Person(crate, '001', {'name': 'Bert Verlinden'})
creator = Person(crate, '002', {'name': 'Lee Ritenour'})
wf_crate.add(publisher, creator)
An entity can be of any type listed in the [RO-Crate context](https://www.researchobject.org/ro-crate/1.1/context.jsonld). However, only a few of them have a counterpart (e.g., `File`) in the library's class hierarchy (either because they are very common or because they are associated with specific functionality that can be conveniently embedded in the class implementation). In other cases, you can explicitly pass the type via the `properties` argument:

# These contextual entities can be assigned to other metadata properties:
```python
from rocrate.model.contextentity import ContextEntity

hackathon = crate.add(ContextEntity(crate, "#bh2021", properties={
"@type": "Hackathon",
"name": "Biohackathon 2021",
"location": "Barcelona, Spain",
"startDate": "2021-11-08",
"endDate": "2021-11-12"
}))
```

wf_crate.publisher = publisher
wf_crate.creator = [ creator, publisher ]
Note that entities can have multiple types, e.g.:

```python
"@type" = ["File", "SoftwareSourceCode"]
```

### Other metadata
### Consuming an RO-Crate

Several metadata fields on root level are supported for the workflow RO-crate:
An existing RO-Crate package can be loaded from a directory or zip file:

```
wf_crate.license = 'MIT'
wf_crate.isBasedOn = "https://climate.usegalaxy.eu/u/annefou/w/workflow-constructed-from-history-climate-101"
wf_crate.name = 'Climate 101'
wf_crate.keywords = ['GTN', 'climate']
wf_crate.image = "climate_101_workflow.svg"
wf_crate.description = "The tutorial for this workflow can be found on Galaxy Training Network"
wf_crate.CreativeWorkStatus = "Stable"
```python
crate = ROCrate('exp_crate') # or ROCrate('exp_crate.zip')
for e in crate.get_entities():
print(e.id, e.type)
```

### Writing the RO-crate file
```
ro-crate-metadata.json CreativeWork
./ Dataset
paper.pdf File
results.csv File
images/figure.svg File
https://orcid.org/0000-0000-0000-0000 Person
https://orcid.org/0000-0000-0000-0001 Person
```

In order to write the crate object contents to a zip file package or a decompressed directory, there are 2 write methods that can be used:
The first two entities shown in the output are the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html) and the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html), respectively. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). You can access data entities with `crate.data_entities` and contextual entities with `crate.contextual_entities`. For instance:

```python
# Write to zip file
out_path = "/home/test_user/crate"
crate.write_zip(out_path)
for e in crate.data_entities:
author = e.get("author")
if not author:
continue
elif isinstance(author, list):
print(e.id, [p["name"] for p in author])
else:
print(e.id, repr(author["name"]))
```

# write crate to disk
out_path = "/home/test_user/crate_base"
crate.write(out_path)
```
paper.pdf ['Alice Doe', 'Bob Doe']
results.csv 'Alice Doe'
images/figure.svg 'Bob Doe'
```

You can fetch an entity by its `@id` as follows:

```python
article = crate.dereference("paper.pdf")
```


## Command Line Interface

`ro-crate-py` includes a hierarchical command line interface: the `rocrate` tool. `rocrate` is the top-level command, while specific functionalities are provided via sub-commands. Currently, the tool allows to initialize a directory tree as an RO-Crate (`rocrate init`) and to modify the metadata of an existing RO-Crate (`rocrate add`).

```
```console
$ rocrate --help
Usage: rocrate [OPTIONS] COMMAND [ARGS]...

Options:
-c, --crate-dir PATH
--help Show this message and exit.
--help Show this message and exit.

Commands:
add
init
write-zip
```

Commands act on the current directory, unless the `-c` option is specified.
### Crate initialization

The `rocrate init` command explores a directory tree and generates an RO-Crate metadata file (`ro-crate-metadata.json`) listing all files and directories as `File` and `Dataset` entities, respectively. The metadata file is added (overwritten if present) to the directory at the top-level, turning it into an RO-Crate.
The `rocrate init` command explores a directory tree and generates an RO-Crate metadata file (`ro-crate-metadata.json`) listing all files and directories as `File` and `Dataset` entities, respectively.

The `rocrate add` command allows to add workflows and other entity types (currently [testing-related metadata](https://github.com/crs4/life_monitor/wiki/Workflow-Testing-RO-Crate)) to an RO-Crate. The entity type is specified via another sub-command level:
```console
$ rocrate init --help
Usage: rocrate init [OPTIONS]

Options:
--gen-preview
-e, --exclude CSV
-c, --crate-dir PATH
--help Show this message and exit.
```
# rocrate add --help

The command acts on the current directory, unless the `-c` option is specified. The metadata file is added (overwritten if present) to the directory at the top level, turning it into an RO-Crate.

### Adding items to the crate

The `rocrate add` command allows to add workflows and other entity types (currently [testing-related metadata](https://crs4.github.io/life_monitor/workflow_testing_ro_crate)) to an RO-Crate:

```console
$ rocrate add --help
Usage: rocrate add [OPTIONS] COMMAND [ARGS]...

Options:
Expand All @@ -178,46 +237,46 @@ Note that data entities (e.g., workflows) must already be present in the directo

### Example

```
```bash
# From the ro-crate-py repository root
cd test/test-data/ro-crate-galaxy-sortchangecase
```

This directory is already an ro-crate. Delete the metadata file to get a plain directory tree:

```
```bash
rm ro-crate-metadata.json
```

Now the directory tree contains several files and directories, including a Galaxy workflow and a Planemo test file, but it's not an RO-Crate since there is no metadata file. Initialize the crate:

```
```bash
rocrate init
```

This creates an `ro-crate-metadata.json` file that lists files and directories rooted at the current directory. Note that the Galaxy workflow is listed as a plain `File`:

```json
{
"@id": "sort-and-change-case.ga",
"@type": "File"
}
{
"@id": "sort-and-change-case.ga",
"@type": "File"
}
```

To register the workflow as a `ComputationalWorkflow`:

```
```bash
rocrate add workflow -l galaxy sort-and-change-case.ga
```

Now the workflow has a type of `["File", "SoftwareSourceCode", "ComputationalWorkflow"]` and points to a `ComputerLanguage` entity that represents the Galaxy workflow language. Also, the workflow is listed as the crate's `mainEntity` (see https://about.workflowhub.eu/Workflow-RO-Crate).

To add [workflow testing metadata](https://github.com/crs4/life_monitor/wiki/Workflow-Testing-RO-Crate) to the crate:

```
rocrate add test-suite -i \#test1
rocrate add test-instance \#test1 http://example.com -r jobs -i \#test1_1
rocrate add test-definition \#test1 test/test1/sort-and-change-case-test.yml -e planemo -v '>=0.70'
```bash
rocrate add test-suite -i test1
rocrate add test-instance test1 http://example.com -r jobs -i test1_1
rocrate add test-definition test1 test/test1/sort-and-change-case-test.yml -e planemo -v '>=0.70'
```


Expand Down

0 comments on commit 83388af

Please sign in to comment.