NerdVana

Our software never has bugs; it just develops random features.

Python: Rapid Reverse Readline

Written by Eric Schwimmer

There are many solutions out there with which to beat this particular ex-equus, but none of them were quite performant enough for me. The particular software I was working on was reading multi-terabyte files, starting from the end, so loading the entire file into memory was obviously a non-starter. And given the ginormity of the files I was parsing, the code had to be as tight as possible. Here is what I came up with:

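import os

# revReadline: Take a filehandle opened in binary read mode ("rb")
# and return a generator that will read the file backwards,
# line by line. Lines are yielded as bytes, without the trailing
# newline.
def revReadline(fh, bufSize=4096):
    # filePos is the offset of the earliest byte read so far;
    # start at the end of the file
    fh.seek(0, os.SEEK_END)
    filePos = fh.tell()
    fragment = b""

    while filePos > 0:
        readBuffer = fullBuffer = b""
        # Keep stepping backwards until this stretch contains a newline
        while b"\n" not in readBuffer and filePos > 0:
            # Never seek past the start of the file
            readSize = min(bufSize, filePos)
            filePos -= readSize
            fh.seek(filePos)
            readBuffer = fh.read(readSize)
            # Earlier bytes go in front of what was already read
            fullBuffer = readBuffer + fullBuffer

        lines = fullBuffer.split(b"\n")
        # The last piece is the start of the line left over from
        # the previous pass
        lines[-1] += fragment
        while len(lines) > 1:
            yield lines.pop()

        # The first piece may be incomplete; carry it over
        fragment = lines[0]

    yield fragment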
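
Since revReadline is a generator, it only reads as far back as you actually consume it, which is the whole point when the file is measured in terabytes. As a rough sketch of how you might use it to grab the tail of a file, here is a small helper. The name tailLines, its parameters, and the file name in the comment are all made up for illustration; the helper just assumes it is handed something that yields lines newest-first.

```python
import itertools

def tailLines(lineGen, count, encoding="utf-8"):
    """Return the last `count` lines of a file, oldest first.

    lineGen is assumed to yield lines newest-first, the way
    revReadline does. tailLines itself is a hypothetical helper.
    """
    # islice stops after `count` items, so the generator stops
    # reading once enough lines have been found
    newestFirst = list(itertools.islice(lineGen, count))
    # Decode bytes if that is what the generator yields
    return [
        line.decode(encoding) if isinstance(line, bytes) else line
        for line in reversed(newestFirst)
    ]

# Example use (the file name is made up):
# with open("huge.log", "rb") as fh:
#     for line in tailLines(revReadline(fh), 10):
#         print(line)
```

One thing to watch for: if the file ends with a newline, the first item the generator yields is an empty line, so you may want to ask for one more line than you need or skip that first item.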

