How would one readLines from a gzip file in R?
I need to read lines in small batches (say, 100 at a time) from a text file that has been compressed with gzip. I use small batches because each line is extremely long.
However, I am unable to do that with something like this (I think the buffer is not updated):
in.con <- gzfile("somefile.txt.gz")
for (i in 1:100000) {
  chunk <- readLines(in.con, n = 100)
  # If you inspect a chunk in each loop step, say with a print
  # you will find that chunk updates once or twice and then
  # keeps printing the same data.
}
close(in.con)
How do I accomplish something similar?
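For reference, here is a minimal sketch of the batched loop I am aiming for. It makes one assumption that differs from the code above: the gzip connection is opened explicitly in text read mode (open = "rt") before the loop. The temporary file, its contents, and the names path, total and first.of.each are made up for illustration, and I have not confirmed that this is what fixes the large-file case.

```r
# Build a small gzip-compressed text file to read back (illustrative data only).
path <- tempfile(fileext = ".txt.gz")
out.con <- gzfile(path, open = "wt")
writeLines(as.character(1:1000), out.con)
close(out.con)

# Assumption: open the connection explicitly so the read position
# is kept between successive readLines() calls.
in.con <- gzfile(path, open = "rt")
total <- 0
first.of.each <- character(0)
repeat {
  chunk <- readLines(in.con, n = 100)
  if (length(chunk) == 0) break          # no more lines: end of file
  total <- total + length(chunk)          # count the lines seen so far
  first.of.each <- c(first.of.each, chunk[1])  # remember where each batch starts
}
close(in.con)
unlink(path)
```

Stopping on a zero-length chunk instead of a fixed iteration count means the loop does not depend on knowing the number of lines in advance.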
Notes:
- For small files, this will work.
- To reproduce the problem you will need a very large file; after the first few reads, the chunk variable stops updating
- I think it is because an underlying scan is not reliable on a gzip file
- The variable i is just to limit the loop; i does not need to be referenced
- Some comments seem to be saying that the code will not work with a text file. I'm posting results that show otherwise:
in.con <- file("some.file.txt", "r", blocking = FALSE)
while (TRUE) {
  chunk <- readLines(in.con, n = 2)
  if (length(chunk) == 0) break
  print(chunk)
}
close(in.con)
resulting in the output:
[1] "1" "2"
[1] "3" "4"
[1] "5" "6"
[1] "7" "8"
[1] "9" "10"
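One difference between the two snippets is that the text-file connection is created with an open mode ("r"), while the gzfile() call is not. According to the readLines documentation, a connection that is not open is opened for the duration of the call and then closed again, so every call would start from the top of the file. The sketch below is a small check of that behaviour on a gzip file; the temporary file and the names lazy.con and open.con are made up for the test, and whether this fully explains what I see on the large file is an assumption.

```r
# Create a tiny gzip file containing the lines "1" to "10".
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, open = "wt")
writeLines(as.character(1:10), con)
close(con)

# Case 1: connection created but not opened.
# Each readLines() call opens it, reads, and closes it again.
lazy.con <- gzfile(path)
lazy.first  <- readLines(lazy.con, n = 2)
lazy.second <- readLines(lazy.con, n = 2)
close(lazy.con)

# Case 2: connection opened explicitly in text read mode.
# The read position is kept between calls.
open.con <- gzfile(path, open = "rt")
open.first  <- readLines(open.con, n = 2)
open.second <- readLines(open.con, n = 2)
close(open.con)
unlink(path)

print(list(lazy = list(lazy.first, lazy.second),
           open = list(open.first, open.second)))
```

In the unopened case both reads return the first two lines, while in the opened case the second read moves on to the next two.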
My version information is:
platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          4.1
year           2017
month          06
day            30
svn rev        72865
language       R
version.string R version 3.4.1 (2017-06-30)
nickname       Single Candle