ERROR: Found 'j(106)' Expected 'x' here #94

Closed
DilumAluthge opened this issue Jan 11, 2021 · 6 comments

Comments


DilumAluthge commented Jan 11, 2021

I have two almost identical files, foo.pdf and bar.pdf. I can open them both in a PDF viewer.

I can open bar.pdf in PDFIO.jl, but I cannot open foo.pdf. See the log below.

I can't share the PDF files publicly, but I'd be happy to email them to you if you like.

julia> using PDFIO

julia> bar = PDFIO.pdDocOpen("bar.pdf")

PDDoc ==>

CosDoc ==>
	filepath:		/Users/dilum/Downloads/bar.pdf
	size:			11675219
	hasNativeXRefStm:	 false
	Trailer dictionaries:
	<<
	/Root	1 0 R
	/Size	3142
	/Info	2 0 R
>>

Catalog:
1 0 obj
<<
	/Pages	3 0 R
	/Type	/Catalog
>>
endobj

isTagged: none


julia> foo = PDFIO.pdDocOpen("foo.pdf")
ERROR: Found 'j(106)' Expected 'x' here
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] skipv at /Users/dilum/.julia/dev/PDFIO/src/BufferParser.jl:25 [inlined]
 [3] skipv at /Users/dilum/.julia/dev/PDFIO/src/BufferParser.jl:30 [inlined]
 [4] read_xref_table(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:491
 [5] read_xref_tables(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:460
 [6] doc_trailer_update(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:412
 [7] cosDocOpen(::String; access::Function) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:141
 [8] PDFIO.PD.PDDocImpl(::String; access::Function) at /Users/dilum/.julia/dev/PDFIO/src/PDDocImpl.jl:16
 [9] pdDocOpen(::String; access::Function) at /Users/dilum/.julia/dev/PDFIO/src/PDDoc.jl:77
 [10] pdDocOpen(::String) at /Users/dilum/.julia/dev/PDFIO/src/PDDoc.jl:77
 [11] top-level scope at REPL[8]:1

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, haswell)

I am on the master branch of PDFIO.jl, i.e. I installed it with ] add PDFIO#master.

And the PDFIO.jl tests pass for me.

@sambitdash
Owner

sambitdash commented Jan 11, 2021

@DilumAluthge, the PDF files you call identical presumably differ by the scheme you described in your recent Discourse post. Apologies if I am wrong in presuming you are the same person.

https://discourse.julialang.org/t/renumbering-pdf-files-can-this-be-implemented-with-pdfio-jl-instead/53144

I would not be surprised if PDFs generated by your code have wrong cross-references and are corrupt from a PDF specification standpoint. PDFIO has been made lenient in places to accommodate PDFs generated by some well-known creators, but generally I am not in favor of supporting any file that has been tampered with and is inconsistent with the PDF specification.

@DilumAluthge
Author

Actually, I still get this error even without running the code in my Discourse comment.

If bar.pdf has this:

1 0 obj
<</Type /Catalog /Pages 3 0 R
>>
endobj

And foo.pdf has this:

1 0 obj
<</Type /Catalog
  /Pages 3 0 R
>>
endobj

And there are no other differences between the files, then PDFIO can load bar.pdf but not foo.pdf.

This should be legal, right? You are allowed to add whitespace inside a dictionary? At least, that's what it says here: https://www.oreilly.com/library/view/developing-with-pdf/9781449327903/ch01.html#example_1-6

@DilumAluthge
Author

Also, this works:

1 0 obj
<</Type /Catalog /Pages 3 0 R
>>
endobj

But this errors:

1 0 obj
<</Type /Catalog /Pages 3 0 R>>
endobj

And again, the example here seems to imply that you are allowed to strip out the whitespace inside a dictionary.
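One thing that can be checked without the original files is how much each spelling of the catalog changes the byte length of the object. The snippet below is a minimal sketch in plain Julia, not part of PDFIO; it assumes LF line endings, and the variable names are only illustrative.

```julia
# Three spellings of the same catalog dictionary, copied from this thread.
# Assumption: lines end with a single LF byte.
one_line  = "1 0 obj\n<</Type /Catalog /Pages 3 0 R\n>>\nendobj\n"
two_lines = "1 0 obj\n<</Type /Catalog\n  /Pages 3 0 R\n>>\nendobj\n"
compact   = "1 0 obj\n<</Type /Catalog /Pages 3 0 R>>\nendobj\n"

# sizeof returns the length in bytes, not in characters.
base = sizeof(one_line)
for (name, text) in [("two_lines", two_lines), ("compact", compact)]
    shift = sizeof(text) - base
    println(name, ": everything after this object moves by ", shift, " byte(s)")
end
```

All three are the same dictionary as far as PDF syntax goes, but anything stored after object 1 starts at a different byte position in each file.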

@sambitdash
Owner

Ensure you have fixed the cross-reference table and/or dictionaries after you add any whitespace. The same chapter tells you how to update the cross-reference table. PDF objects are located by the byte offsets recorded in the cross-reference table.
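To illustrate the point about offsets, here is a rough sketch in plain Julia that checks whether the number after `startxref` really points at the `xref` keyword. It assumes a classic cross-reference table (the log above reports `hasNativeXRefStm: false`), the helper names are hypothetical and not PDFIO functions, and the toy document is deliberately incomplete.

```julia
# Does `needle` occur in `data` starting at 1-based index `i`?
function matches_at(data::Vector{UInt8}, needle, i::Integer)
    stop = i + length(needle) - 1
    return i >= 1 && stop <= length(data) && view(data, i:stop) == needle
end

# 1-based index of the last occurrence of `needle`, or `nothing`.
function find_last(data::Vector{UInt8}, needle)
    for i in (length(data) - length(needle) + 1):-1:1
        matches_at(data, needle, i) && return i
    end
    return nothing
end

# Decimal offset written after the final `startxref` keyword (0-based, as in the file).
function startxref_offset(data::Vector{UInt8})
    i = find_last(data, b"startxref")
    i === nothing && error("no startxref keyword found")
    j = i + length(b"startxref")
    while j <= length(data) && data[j] in (0x0a, 0x0d, 0x20)   # skip whitespace
        j += 1
    end
    k = j
    while k <= length(data) && 0x30 <= data[k] <= 0x39         # consume digits
        k += 1
    end
    return parse(Int, String(data[j:k-1]))
end

# File offsets are 0-based, Julia indices are 1-based, hence the +1.
xref_matches_startxref(data) = matches_at(data, b"xref", startxref_offset(data) + 1)

# Toy document: a header, one object, and a stub xref section.
header  = "%PDF-1.4\n"
tail(off) = "xref\n0 1\n0000000000 65535 f \ntrailer\n<</Size 1>>\nstartxref\n$off\n%%EOF\n"
before  = "1 0 obj\n<</Type /Catalog /Pages 3 0 R\n>>\nendobj\n"
after   = "1 0 obj\n<</Type /Catalog\n  /Pages 3 0 R\n>>\nendobj\n"
off     = sizeof(header) + sizeof(before)

good  = Vector{UInt8}(header * before * tail(off))
stale = Vector{UInt8}(header * after * tail(off))   # object edited, offset not updated

println(xref_matches_startxref(good))
println(xref_matches_startxref(stale))
println("stale offset lands on: ", Char(stale[off + 1]))
```

In this toy case the stale offset falls two bytes short of `xref` and lands on the `j` of `endobj`. That would be consistent with the error message in the title, although without the actual files this is only a guess.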

@DilumAluthge
Author

Thanks!

Does PDFIO have any functionality for generating the cross-reference table? Or do I need to do it manually?

@DilumAluthge
Author

I.e. is there a function that would parse the entire PDF file, make a list of all the indirect objects, and output the table?

Thanks for all of your help, both here and on Discourse! As you can tell, I am very new to working with PDF files.
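For reference, the kind of function described above could look roughly like the following. This is a sketch in plain Julia and not a PDFIO API: the function names are made up, and it assumes every object header `N G obj` sits at the start of a line, objects are numbered 1..n without gaps, and the file has no object streams, no incremental updates, and no stream data that happens to look like an object header.

```julia
# Parse an unsigned decimal integer at index i; return (value, next_index) or nothing.
function read_uint(data::Vector{UInt8}, i::Int)
    j = i
    while j <= length(data) && 0x30 <= data[j] <= 0x39
        j += 1
    end
    j == i && return nothing
    return (parse(Int, String(data[i:j-1])), j)
end

# Scan for "<num> <gen> obj" at the start of a line; record 0-based byte offsets.
function find_objects(data::Vector{UInt8})
    found = Dict{Int,Tuple{Int,Int}}()      # object number => (generation, offset)
    for i in 1:length(data)
        (i == 1 || data[i-1] in (0x0a, 0x0d)) || continue
        r = read_uint(data, i)
        r === nothing && continue
        num, j = r
        (j <= length(data) && data[j] == 0x20) || continue
        r = read_uint(data, j + 1)
        r === nothing && continue
        gen, k = r
        if k + 3 <= length(data) && data[k:k+3] == b" obj"
            found[num] = (gen, i - 1)
        end
    end
    return found
end

# Format a classic xref section; each entry is exactly 20 bytes long.
function xref_table(found::Dict{Int,Tuple{Int,Int}})
    n = length(found)
    sort(collect(keys(found))) == collect(1:n) || error("object numbers are not contiguous")
    io = IOBuffer()
    print(io, "xref\n0 ", n + 1, "\n")
    print(io, "0000000000 65535 f \n")      # object 0 heads the free list
    for num in 1:n
        gen, off = found[num]
        print(io, lpad(off, 10, '0'), ' ', lpad(gen, 5, '0'), " n \n")
    end
    return String(take!(io))
end

# Toy input with two objects.
pdf = "%PDF-1.4\n" *
      "1 0 obj\n<</Type /Catalog\n  /Pages 2 0 R\n>>\nendobj\n" *
      "2 0 obj\n<</Type /Pages /Kids [] /Count 0>>\nendobj\n"
objs = find_objects(Vector{UInt8}(pdf))
print(xref_table(objs))
```

After writing a regenerated table back into a file, the number following `startxref` would also need to be set to the byte offset of the new `xref` keyword.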
