Add support to NeuroJSON (.jnii and .bnii) files (to development branch) #579

Closed

Conversation

@fangq (Contributor) commented Feb 10, 2022

Thanks, Chris, for the feedback. Here is the resubmission of #578, patched against the development branch. I re-tested the export feature and everything looks good. Let me know if you have any other feedback.

Just want to mention that the Travis-CI test seems to be failing due to other changes unrelated to JNIfTI, although I see the upstream repo now uses AppVeyor instead of Travis.


Dear Chris (@neurolabusc), I would like to submit a patch to enable dcm2niix to export to NeuroJSON NIfTI JSON wrappers, including

  • .jnii - lightweight JSON wrapper to NIfTI (text-based JNIfTI, see current specification)
  • .bnii - binary JSON (yet still human-readable) wrapper to NIfTI for disk/IO efficiency

The patch is fairly straightforward, adding two flags: -e j for exporting to .jnii and -e b for exporting to .bnii. Both flags are compression-aware: when -z y is given, the data segment of the .jnii/.bnii file is compressed according to the compression level. In this patch, I only added a few functions (nii_savejnii, nii_savebnii) to the nii_dicom_batch.cpp unit, plus two additional helper units, cJSON.c/h and base64.c/h, both of which were copied from your nii2mesh code tree. The added code should have no impact on any of the existing features.

The updated code has passed the Travis-CI automated tests, and I have also run valgrind to make sure there are no memory leaks. I also formatted the new functions to use the same tab-based indentation style.

The screenshot at the bottom summarizes some of the tests I did, verifying that the generated files are readily readable by my Python and MATLAB parsers. It also demonstrates additional benefits enabled by this format, including file size, header searchability (without decompressing the file), and readability.

I hope you can take a look at this patch and let me know if it can be accepted. I am committed to supporting this feature moving forward. Looking forward to hearing from you!

[screenshot: jnifti_patch_demo]

@ningfei (Collaborator) commented Feb 10, 2022

Hi Qianqian, I would suggest rebasing or cherry-picking your commits onto the head of the development branch, to avoid these unnecessary merge commits here.

Btw, does loadjson from JSONLab already support reading .jnii files? I tested it, but unfortunately it didn't work...

@fangq (Contributor, Author) commented Feb 10, 2022

Hi @ningfei, I will try a rebase and update this PR.

.jnii is basically a pure JSON file, so any JSON parser should be able to read it, including loadjson. Which version of JSONLab are you using, and what error did you see? The internal compression tags were added in JSONLab 1.9.8 (released in 2019); to correctly decode the N-D arrays stored in the JSON file, you should use either the 2.0 release or the latest GitHub version (2.9.8-pre).
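
For example, a quick sanity check along these lines (the file name is hypothetical; this is only a sketch) would be:

which loadjson                 % should point at the JSONLab 2.0+ copy, not an older bundled one
nii = loadjson('test.jnii');   % a .jnii file is plain JSON, so this returns a regular struct
fieldnames(nii)                % lists the top-level sections stored in the wrapper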

@ningfei (Collaborator) commented Feb 10, 2022

I basically recompiled dcm2niix and used the -e j option to export a .jnii file. Then, in MATLAB, I tried nii = loadjson('test.jnii'); using the latest GitHub head of jsonlab:

Error using reshape
Number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension.

Error in jdatadecode (line 329)
            ndata=reshape(ndata(:),dims(:)');

Error in jdatadecode (line 102)
                newdata(j).(fn{i})=jdatadecode(data(j).(fn{i}),opt);

Error in loadjson (line 195)
        data=jdatadecode(data,'Base64',1,'Recursive',1,varargin{:});

However, if I convert a NIfTI file to jnii using nii2jnii, I am able to load the generated jnii file using loadjson.

@fangq (Contributor, Author) commented Feb 10, 2022

@ningfei, can you give me the full dcm2niix command (or the path of the DICOM file you converted)?

I just tested again with the files in dcm_qa_uih and loadjson reads the file without problems. Can you also run which loadjson to make sure it is using the latest version?

@ningfei (Collaborator) commented Feb 10, 2022

@fangq Yes, I'm sure I used the latest version of jsonlab. I tried both dcm2niix -x n -b n -i y -e j -o test DICOM and dcm2niix -e j -o test DICOM to convert some patient data I have, as well as the files in dcm_qa_uih. loadjson couldn't load the generated .jnii files properly. Now I get a warning message but basically the same error:

Warning: Failed to decode embedded JData annotations, return raw JSON data

Error: MATLAB:getReshapeDims:notSameNumel
Number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension. 
> In loadjson (line 198)

@fangq (Contributor, Author) commented Feb 10, 2022

Can you give me the specific DICOM folder name so I can reproduce this? I tested a few folders in dcm_qa_uih and all went OK, see the screenshot (Ubuntu 18.04).

[screenshot: jnii_test]

@ningfei (Collaborator) commented Feb 10, 2022

That's a general problem, I'm afraid. I ran tests on all the files under dcm_qa_uih and the patient data I have, on two machines (Ubuntu 20.04 and 20.10). All failed... At line 329 of jdatadecode, where the error happens, ndata should have a size of 160*460*512 = 37683200, but its length is 100488536 instead...

@fangq (Contributor, Author) commented Feb 10, 2022

What is the output of which jdatadecode? I want to make sure there isn't a version conflict - I know LEAD-DBS has a built-in version of iso2mesh (v1.8, released 5 years ago), which contains loadjson (but not jdatadecode.m).

The other thing to check is to make sure your dcm2niix is compiled from this branch:
https://github.com/NeuroJSON/dcm2niix/tree/development

Also, if you can post one of the generated .jnii files to Google Drive/Dropbox, I will see if I can load it on my side.

@ningfei (Collaborator) commented Feb 10, 2022

For sure, jdatadecode is the one inside the jsonlab repository. I actually removed LeadDBS from the path before the tests.

I applied your PR to the dcm2niix development branch, but now I have also tried using the NeuroJSON repo. Same error.

I also tested on Windows, still the same error. To me, it seems the base64 encoding in dcm2niix is not working properly.

You can check out this test jnii I converted; it's from In\DTI_132225\t1_gre_fsp_3d_sag__132750

@fangq (Contributor, Author) commented Feb 10, 2022

@ningfei, thanks for the test. I figured out what happened - my implementation of the JSON-encoded uncompressed data does not agree with the specification. In most of my tests I used compression (-z y), so it was not apparent to me.

Give me a moment to add a correction.

@fangq (Contributor, Author) commented Feb 10, 2022

@ningfei, the issue should be fixed, although you need to update both dcm2niix and jsonlab; here are the two needed patches:
b3c4bfe and https://github.com/fangq/jsonlab/commit/6150ae1c788348d3a8450a721af1f21cbacd6bf4

What happened was that the previously generated uncompressed NIFTIData section incorrectly used the JData N-D array container below:

"NIFTIData": {
   "_ArrayType_": "...",
   "_ArraySize_":  [...], 
   "_ArrayData_": ...
}

where _ArrayData_ is supposed to store the plain JSON-formatted array elements (in row-major order), i.e. a series of comma-separated numbers such as "_ArrayData_":[1,2,0,1,...], and is therefore not compatible with base64-encoded binary data.

To store uncompressed binary data, I should have used this construct:

"NIFTIData": {
   "_ArrayType_": "...",
   "_ArraySize_":  [...], 
   "_ArrayZipType_": "base64",
   "_ArrayZipSize_": [...]
   "_ArrayZipData_": "... base64-encoded un-compressed binary stream ..."
}

In the past, most of my JSON files used internal compression to reduce size, so storing uncompressed binary data in JSON had not come up until now. I am glad you caught this so it won't become an issue moving forward.

Please test again and let me know if you see any additional issues.

PS: I want to add that .bnii does not have this issue because it directly stores the uncompressed binary data in _ArrayData_ without needing base64 decoding.
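
For illustration, here is a minimal MATLAB round-trip sketch of the corrected construct using synthetic data; it only mirrors what jdatadecode does internally, and the row-major/column-major handling below is my own simplification:

% synthetic example, not a real .jnii: build and decode an "_ArrayZipData_" value by hand
asize = [4 3 2];                               % plays the role of "_ArraySize_"
vol   = uint16(reshape(1:prod(asize), asize)); % toy volume standing in for NIFTIData

% encode: row-major byte stream -> base64 string (the "_ArrayZipData_" value)
rowmajor = permute(vol, numel(asize):-1:1);    % column-major -> row-major ordering
b64      = matlab.net.base64encode(typecast(rowmajor(:), 'uint8'));

% decode: base64 -> bytes -> typed values -> N-D array with the "_ArraySize_" dims
bytes = matlab.net.base64decode(b64);
vals  = typecast(uint8(bytes), 'uint16');      % "_ArrayType_" is uint16 in this sketch
out   = permute(reshape(vals, fliplr(asize)), numel(asize):-1:1);
isequal(out, vol)                              % should print 1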

@ningfei (Collaborator) commented Feb 11, 2022

Great, thanks! It works now. I haven't really looked into the details of the JData specifications. I understand that JNIfTI would be more extensible/flexible, but as I tested, both the size of the generated file and the loading speed compare unfavorably to the standard NIfTI file. See the comparison below (the same T1 image we used for the test from dcm_qa_uih). Did I miss something here?

➜  dcm_qa_uih git: ll t1* 
47M Feb 11 08:13 t1_compressed.jnii
96M Feb 11 08:13 t1.jnii
72M Feb 11 08:16 t1.nii
35M Feb 11 08:16 t1.nii.gz
tic;nii = loadjson('t1.jnii');toc
tic;nii = loadjson('t1_compressed.jnii');toc
tic;nii = load_untouch_nii('t1.nii');toc
tic;nii = load_untouch_nii('t1.nii.gz');toc
Elapsed time is 1.694440 seconds.
Elapsed time is 1.404535 seconds.
Elapsed time is 0.116922 seconds.
Elapsed time is 0.410925 seconds.

@neurolabusc (Collaborator) commented

@ningfei thanks for your great detective work on this PR. You may be interested in this group. Formats like GIfTI (XML with binary data embedded as base64) and JNIfTI (JSON with binary data embedded as base64) trade off file size and speed in an effort to make the headers human-readable and extensible.

@ningfei (Collaborator) commented Feb 11, 2022

Thanks @neurolabusc! Just joined; I will check out the discussion there.

@fangq (Contributor, Author) commented Feb 11, 2022

@ningfei, what @neurolabusc mentioned is correct - it is a trade-off:

On the downside:

  • base64-encoding the voxel data inside text-based JSON adds file size (roughly 33% for uncompressed payloads) and parsing overhead compared with raw binary NIfTI, as your benchmark shows

On the pro side:

  • JSON is readable, searchable, universally parsable, and web-friendly; it can be easily extended without breaking backward/forward compatibility
  • once data are converted to JSON (or binary JSON), one can directly use established NoSQL database technologies to organize/store/query/manipulate large, complex datasets, locally or in the cloud; NoSQL databases such as MongoDB or CouchDB use JSON as their native input/output format
  • the community can take advantage of the large JSON toolchain/library ecosystem, including JSON Schema for data validation, JSONPath for queries, JSON-LD for data linking/referencing, etc., as we did in a recent paper

If speed/size is a concern, I would recommend trying -e b or -e b -z y to check out the .bnii files. .bnii does not have the ~33% base64 overhead and is relatively fast to parse (use loadbj/savebj or loadjnifti/savejnifti).
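
For example (the file name below is hypothetical):

% read a .bnii produced with dcm2niix -e b -z y; loadbj ships with JSONLab,
% loadjnifti with the JNIfTI toolbox
nii = loadbj('t1.bnii');          % generic binary-JSON reader
% nii = loadjnifti('t1.bnii');    % JNIfTI-aware reader, if installed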

JNIfTI also supports other compression methods, such as lzma or lz4 - lzma produces smaller files but is slower to compress, while lz4 produces larger files but is so fast that it is disk-speed-bound. I am happy to contribute these additional compression modules if @neurolabusc sees a need in dcm2niix.

@neurolabusc (Collaborator) commented

@fangq for upcoming formats I would suggest evaluating zstd instead of lzma or lz4. You can compare different compression methods using my zlib-bench-python. Uncomment or add lines like #exes.append({'exe': 'lz4',.. to include your favorite compression tool. You can set the corpus to test neuroimaging datatypes. Relevant to this discussion is how swizzling the UINT16 data can dramatically improve compression (e.g. blosc).

It should also be noted that recent zstd releases have improved performance a lot since I last ran my benchmarks, so the difference will be even more stark. zstd is a real game changer for open-source image compression. It would be interesting to see how the latest zstd compares against the outstanding but proprietary Oodle products.

My takeaway is that lzma and lz4 are not on the Pareto frontier: by adjusting the compression level, zstd can be tuned from very fast to very high compression.
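
As a rough illustration of the byte-shuffle idea (my own MATLAB sketch, not blosc's actual implementation):

vol  = uint16(randi(4096, 64, 64, 64));        % synthetic volume with a 12-bit value range
raw  = typecast(vol(:), 'uint8');              % interleaved low/high byte pairs
shuf = reshape(reshape(raw, 2, []).', [], 1);  % all low bytes first, then all high bytes
% the shuffled stream tends to compress much better with gzip/zstd because the
% high bytes of neighboring voxels are nearly constant; the shuffle is reversible:
unshuf = reshape(reshape(shuf, [], 2).', [], 1);
isequal(typecast(unshuf, 'uint16'), vol(:))    % should print 1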

@ningfei (Collaborator) commented Feb 11, 2022

@fangq great to know that you are trying to add support for mmap. Looking forward to it! It would be very useful, for example, if I want to query the value at specific voxel coordinates across a large number of images.

@fangq (Contributor, Author) commented Feb 11, 2022

Thanks @neurolabusc for the pointers on data compression. Somebody did mention zstd in one of the zmat issues (https://github.com/fangq/zmat/issues/5), but I hadn't read it carefully; it is indeed quite impressive. The swizzled data compression is impressive too. Lots of good tools to incorporate.
