Documentation request: behavior of read_*_chunked when number of rows exceeds maximum integer value #1177

Closed
timothy-barry opened this issue Jan 29, 2021 · 3 comments

Comments

@timothy-barry
Contributor

Hello,

Thank you for the response to my previous issue.

Can I safely use read_*_chunked when the number of rows in the file exceeds R's maximum integer value of about 2 billion? I will be reading fewer than 2 billion rows per chunk. Moreover, I will not use the index or "pos" argument in the callback functions.
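For context, a minimal base-R sketch of the limit in question. It only illustrates R's own integer type, not readr's internals; `rows_seen` and `chunk_rows` are hypothetical names used for illustration. The idea is that a running total kept in a callback is probably safer as a double than as an integer.

```r
# Largest value an R integer can hold (32-bit signed).
int_max <- .Machine$integer.max
print(int_max)

# Integer arithmetic past the limit yields NA (with a warning),
# not a larger number.
overflowed <- suppressWarnings(int_max + 1L)
print(is.na(overflowed))

# A double represents whole numbers exactly up to 2^53, so a running
# row count stored as a double can pass 2 billion without overflowing.
rows_seen <- 0             # double, not integer
chunk_rows <- 1e7          # hypothetical rows per chunk
for (i in seq_len(400)) {  # 400 chunks of 10 million rows = 4 billion rows
  rows_seen <- rows_seen + chunk_rows
}
print(format(rows_seen, scientific = FALSE))
```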

@jimhester
Collaborator

Reading the file should 'work', though pos will wrap around.

It is going to be very slow, however.

I generated a file with 4 billion lines of just '1' and then read it with

read_lines_raw_chunked("out", function(x, pos) print(pos), chunk_size = 10000000, progress = FALSE)

It took over an hour to read 2 billion rows and start wrapping. This is about as simple as a read can get, so reading more complex data and doing anything useful with it would likely take too long to be practical.

I think you would probably be better off looking into other tools more suited to handle data of this size.
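To make the wraparound concrete, here is a rough base-R sketch of where it would first show up in the test described above. It rests on two assumptions that the thread does not confirm: that pos is 1 for the first chunk, and that it behaves like a 32-bit signed counter with two's-complement wraparound. `wrap_int32` is a hypothetical helper, not part of readr.

```r
# Assumption: pos wraps like a 32-bit signed integer (two's complement).
# x is a double holding an exact whole number.
wrap_int32 <- function(x) {
  ((x + 2^31) %% 2^32) - 2^31
}

chunk_size <- 1e7   # chunk size used in the test above
total_rows <- 4e9   # lines in the generated file
n_chunks <- ceiling(total_rows / chunk_size)

# Assumption: pos is the 1-based index of the first row of each chunk.
true_pos <- (seq_len(n_chunks) - 1) * chunk_size + 1
reported_pos <- wrap_int32(true_pos)

# First chunk whose reported position no longer matches the true one.
first_wrapped <- which(reported_pos != true_pos)[1]
print(first_wrapped)
print(format(true_pos[first_wrapped], scientific = FALSE))
print(format(reported_pos[first_wrapped], scientific = FALSE))
```

Under these assumptions the first 215 of the 400 chunks report a correct position, and every later chunk reports a negative one. A callback that ignores pos is unaffected.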

@timothy-barry
Contributor Author

timothy-barry commented Jan 29, 2021

Thanks Jim. I've noticed that read_*_chunked is slow when printing pos within the callback function. I wonder if another callback function, such as function(x, pos) return(10), would be faster?

EDIT: I'll give this a try and get back.
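One way such a comparison might be set up is sketched below. It assumes readr is installed and uses a small temporary file as a stand-in for the real data; `printing_cb`, `quiet_cb`, and the file size are illustrative choices. Timings on a file this small may not extrapolate to billions of rows.

```r
library(readr)

# Small stand-in for the large file: 100,000 lines of "1".
path <- tempfile()
writeLines(rep("1", 1e5), path)

# Callback that prints the chunk position, as in the earlier test.
printing_cb <- function(x, pos) print(pos)
# Callback that does no output and never touches pos.
quiet_cb <- function(x, pos) 10

time_printing <- system.time(
  read_lines_raw_chunked(path, printing_cb, chunk_size = 1000, progress = FALSE)
)
time_quiet <- system.time(
  read_lines_raw_chunked(path, quiet_cb, chunk_size = 1000, progress = FALSE)
)

# Elapsed wall-clock seconds for each variant.
print(time_printing[["elapsed"]])
print(time_quiet[["elapsed"]])

unlink(path)
```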

jimhester added a commit that referenced this issue Feb 2, 2021
Previously the read would overflow in these cases.

Fixes #1177
@timothy-barry
Contributor Author

Thank you. I'll give this a shot!
