Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check isNull(column) at SinglePageRecordReader to avoid ArrayIndexOutOfBoundsException #86

Closed
wants to merge 1 commit into from

Conversation

sakama
Copy link
Contributor

@sakama sakama commented May 23, 2017

I'm using embulk-base-resetclient with embulk-output-elasticsearch.

I sometimes got ArrayIndexOutOfBoundsException while exporting to Es server.
I'm using embulk-base-resetclient v0.4.2 for now, but confirmed that also happens with v0.5.1.
It seems that failure happens when msgpack formatted file contains empty String column.

This problem never happens when source file is (JSON|CSV), only happens with MessagePack formatted files.
Then, I may need to fix embulk-parser-msgpack.

Stacktrace

org.embulk.exec.PartialExecutionException: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:373) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:591) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:33) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:389) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:385) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.Exec.doWith(Exec.java:25) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader.run(BulkLoader.java:385) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:180) ~[embulk-core-0.8.18.jar:na]
  at com.treasuredata.worker.calls.EmbulkResult.run(EmbulkResult.java:69) ~[td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_80]
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_80]
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
  at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
  at com.treasuredata.java_call.JavaCall.processRequest(JavaCall.java:147) [td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
  at com.treasuredata.java_call.JavaCall.run(JavaCall.java:111) [td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
  at com.treasuredata.worker.JavaCallMain.main(JavaCallMain.java:21) [td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at java.util.Arrays$ArrayList.get(Arrays.java:2866) ~[na:1.7.0_80]
  at org.embulk.spi.Page.getStringReference(Page.java:53) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageReader.getString(PageReader.java:111) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageReader.getString(PageReader.java:105) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.base.restclient.record.SinglePageRecordReader.getString(SinglePageRecordReader.java:67) ~[na:na]
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.stringColumn(JacksonAllInObjectScope.java:62) ~[na:na]
  at org.embulk.spi.Column.visit(Column.java:58) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.Schema.visitColumns(Schema.java:81) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:36) ~[na:na]
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17) ~[na:na]
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9) ~[na:na]
  at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14) ~[na:na]
  at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18) ~[na:na]
  at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43) ~[na:na]
  at org.embulk.spi.PageBuilder.doFlush(PageBuilder.java:244) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageBuilder.flush(PageBuilder.java:250) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageBuilder.addRecord(PageBuilder.java:227) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.parser.msgpack.MsgpackParserPlugin.run(MsgpackParserPlugin.java:280) ~[na:na]
  at org.embulk.spi.FileInputRunner.run(FileInputRunner.java:153) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.util.Executors.process(Executors.java:67) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.util.Executors.process(Executors.java:42) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:184) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:180) ~[embulk-core-0.8.18.jar:na]
  at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
  at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
java.lang.ArrayIndexOutOfBoundsException: 123456789
org.embulk.exec.PartialExecutionException: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:373)
  at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:591)
  at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:33)
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:389)
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:385)
  at org.embulk.spi.Exec.doWith(Exec.java:25)
  at org.embulk.exec.BulkLoader.run(BulkLoader.java:385)
  at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:180)
  at com.treasuredata.worker.calls.EmbulkResult.run(EmbulkResult.java:69)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.treasuredata.java_call.JavaCall.processRequest(JavaCall.java:147)
  at com.treasuredata.java_call.JavaCall.run(JavaCall.java:111)
  at com.treasuredata.worker.JavaCallMain.main(JavaCallMain.java:21)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at java.util.Arrays$ArrayList.get(Arrays.java:2866)
  at org.embulk.spi.Page.getStringReference(Page.java:53)
  at org.embulk.spi.PageReader.getString(PageReader.java:111)
  at org.embulk.spi.PageReader.getString(PageReader.java:105)
  at org.embulk.base.restclient.record.SinglePageRecordReader.getString(SinglePageRecordReader.java:67)
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.stringColumn(JacksonAllInObjectScope.java:62)
  at org.embulk.spi.Column.visit(Column.java:58)
  at org.embulk.spi.Schema.visitColumns(Schema.java:81)
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:36)
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17)
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9)
  at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14)
  at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18)
  at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43)
  at org.embulk.spi.PageBuilder.doFlush(PageBuilder.java:244)
  at org.embulk.spi.PageBuilder.flush(PageBuilder.java:250)
  at org.embulk.spi.PageBuilder.addRecord(PageBuilder.java:227)
  at org.embulk.parser.msgpack.MsgpackParserPlugin.run(MsgpackParserPlugin.java:280)
  at org.embulk.spi.FileInputRunner.run(FileInputRunner.java:153)
  at org.embulk.spi.util.Executors.process(Executors.java:67)
  at org.embulk.spi.util.Executors.process(Executors.java:42)
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:184)
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:180)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

@sakama sakama changed the title [WIP] Check isNull(column) at SinglePageRecordReader to avoid ArrayIndexOutOfBoundsException Check isNull(column) at SinglePageRecordReader to avoid ArrayIndexOutOfBoundsException May 23, 2017
@ylzen
Copy link
Member

ylzen commented May 25, 2017

@sakama Is this PR ready for review?

@sakama
Copy link
Contributor Author

sakama commented May 26, 2017

I don't have perfect confident and will test again.

@sakama sakama requested a review from dmikurube May 26, 2017 07:05
@sakama
Copy link
Contributor Author

sakama commented May 26, 2017

@dmikurube At least, I can insert data as expected without any failures.
Could you take a look?

@muga
Copy link
Contributor

muga commented May 26, 2017

I just wanted to left comments:

  • null check is necessary somewhere.
  • It's better to return null value if isNull returns true. Not empty string.
  • Nice to have null check for timestamp and json types.

@dmikurube
Copy link
Member

@sakama That SinglePageRecordReader is really just forwarding method calls into PageReader. If this is a problem, I'm afraid we need to fix Embulk's PageReader instead. What do you think? Cc: @muga

@muga
Copy link
Contributor

muga commented May 29, 2017

@dmikurube Agreed. embulk/embulk#654

@dmikurube
Copy link
Member

@muga lgtm'ed the PR on embulk/embulk.

@sakama If we still need an immediate workaround, I'd merge this when a comment is appended to describe it's just a workaround (with a link to an issue or this PR).

@sakama
Copy link
Contributor Author

sakama commented May 30, 2017

I checked embulk/embulk#654 at my env.
I got NullPointerException instead of ArrayIndexOutOfBoundsException.
This should valid behavior and I think I can avoid by adding null check at JacksonAllInObjectScope.

I'll create another PR instead of this PR.

Caused by: java.lang.NullPointerException
	at org.embulk.spi.time.TimestampFormatter.format(TimestampFormatter.java:93)
	at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.timestampColumn(JacksonAllInObjectScope.java:74)
	at org.embulk.spi.Column.visit(Column.java:60)
	at org.embulk.spi.Schema.visitColumns(Schema.java:81)
	at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:36)
	at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17)
	at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9)
	at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14)
	at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18)
	at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43)
	at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:394)
	at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:319)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	Suppressed: java.lang.NullPointerException
		... 16 more
	Suppressed: java.lang.NullPointerException
		... 16 more
	Suppressed: java.lang.NullPointerException
		... 16 more

@sakama sakama closed this May 30, 2017
@sakama sakama deleted the fix-array-index-out-of-bounds-exception branch May 30, 2017 07:54
@dmikurube
Copy link
Member

@sakama Ah, that sounds the right fix. Thanks!

@muga
Copy link
Contributor

muga commented May 30, 2017

@sakama good catch 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants