How would one readLines from a gzip file in R?
16:06 13 Aug 2017

I need to read lines in small batches (say, 100 at a time) from a gzip-compressed text file. I use small batches because each line is extremely long.

However, I am unable to do that with something like this (I think the buffer is not updated):

in.con <- gzfile("somefile.txt.gz")
for (i in 1:100000) {
  chunk <- readLines(in.con,n = 100)
  # If you inspect a chunk in each loop step, say with a print
  # you will find that chunk updates once or twice and then
  # keeps printing the same data.
}
close(in.con)

How do I accomplish something similar?

Notes:

  1. For small files, this will work.
  2. You will need a very large file to reproduce this; when you read from it repeatedly, you will see that the chunk variable stops updating.
  3. I think it is because an underlying scan is not reliable on a gzip file.
  4. The i variable is just there to limit the loop; it does not need to be referenced.
  5. Some comments seem to be saying that the code will not work with a text file. I'm posting results that show otherwise:

in.con <- file("some.file.txt", "r", blocking = FALSE)
while(TRUE) {
  chunk <- readLines(in.con,n = 2)
  if (length(chunk)==0) break;
  print(chunk)
}
close(in.con)

resulting in the output:

[1] "1" "2"
[1] "3" "4"
[1] "5" "6"
[1] "7" "8"
[1] "9"  "10"
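
One difference between the two snippets may matter here: the text-file version passes an explicit open mode ("r") to file(), while the gzfile() call does not. The readLines documentation says that a connection which is not already open is opened for the duration of the call and then closed again, so each call would start from the beginning of the file. Below is a minimal sketch of batch reading with the gzip connection opened explicitly, assuming that is the cause of the repeated chunks. The temporary file and its ten lines are made-up demo data, not the real input.

```r
# Create a small gzip-compressed text file to read back (hypothetical demo data).
path <- tempfile(fileext = ".txt.gz")
out.con <- gzfile(path, "w")
writeLines(as.character(1:10), out.con)
close(out.con)

# Open the connection explicitly ("r") so the read position persists between calls.
in.con <- gzfile(path, "r")
chunks <- list()
repeat {
  chunk <- readLines(in.con, n = 4)
  if (length(chunk) == 0) break   # no lines left
  chunks[[length(chunks) + 1]] <- chunk
}
close(in.con)
```

Calling open(in.con, "r") on an existing connection before the loop should have the same effect as passing the mode to gzfile().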

My version information is:

platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          4.1
year           2017
month          06
day            30
svn rev        72865
language       R
version.string R version 3.4.1 (2017-06-30)
nickname       Single Candle