The io.Reader
interface is a small interface that defines a single Read
method. Callers to a Reader
implementation pass a byte slice which is then filled with bytes from the underlying source. This source could be a file, a network socket, etc.
type Reader interface {
Read(p []byte) (n int, err error)
}
However, this interface presents a challenge. It necessitates that the bytes from the source be copied into the byte slice which is given by the caller. In the case where the source is already in memory, it would be more efficient to allow the caller to read directly from the array that is already in memory. In this post I’ll go over a couple examples of this scenario.
Slices and Arrays in Go
It’s useful to quickly review slices in Go. The post Go Slices: usage and internals from the Go blog provides a good overview of their implementation.
Slices are backed by an array in memory and the slice provides a “view” of sorts over a subset of the array.
a := [10]int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
s := a[3:6] // 3,4,5
fmt.Println(cap(s)) // 7, indicating the capacity from slice start to end of array
The slice however will retain the full backing array in memory. This means that you can create slices to view subsets of the data without allocating a new array and copying data into it. One caveat is that if you overwrite data in the slice the original array will be modified.
We would like to use this property of slices to be able to read over arrays without making unnecessary copies.
bytes.Reader
bytes.Reader
is a popular type which implements the io.Reader
interface over a byte slice. Unfortunately, this doesn’t allow for zero copy reads from the underlying []byte
by directly using the methods. Instead, you have to take a more indirect route and use WriteTo
in which bytes.Reader
will pass a slice of the underlying []byte
to the given io.Writer
. This allows us to read the underlying data without making copies.
type zeroCopyWriter struct{}
func (w *zeroCopyWriter) Write(b []byte) (int, error) {
fmt.Printf("%v", b)
return len(b), nil
}
func main() {
r := bytes.NewReader([]byte("Hello, 世界"))
r.WriteTo(&zeroCopyWriter{})
}
bufio.Reader
The bufio.Reader
in Go reads from an underlying io.Reader
and stores the data in a buffer. This allows the program to make fewer read system calls by batching them, and allowing the caller to read from the stored buffer instead.
When a call to bufio.Reader.Read
is made, if the reader’s buffer does not have enough bytes the bufio.Reader
makes a call to the underlying io.Reader
to fill the buffer whose default size is 4096 bytes. Usually this results in a read
system call to read from a file or network socket. The bytes are then returned from the buffer. Once the buffer is filled however, any subsequent calls can be read directly from the buffer provided it contains enough data. This is very helpful as many programs will make many small calls to Read
which could degrade performance if every call resulted in a system call to the operating system.
This first copy into the buffer can’t be avoided but we can avoid a second copy from the buffer into another array. We can’t do this with the Read
method but we can use a combination of the Buffered
, Peek
, and Discard
methods.
b := []byte("Hello, 世界")
r := bufio.NewReader(bytes.NewReader(b))
// Determine how many bytes to read.
numBytesToRead := r.Buffered()
if numBytesToRead < 5 {
numBytesToRead = 5
}
// Get a slice of the buffer.
p, _ := r.Peek(numBytesToRead)
fmt.Println(string(p))
// Discard the bytes read.
_, _ = r.Discard(len(p))
Peek
gives us a slice of the underlying buffer which allows us to read from the buffer directly. We can then call Discard
to advance the reader after processing the bytes. Because the slice returned by Peek
points to the underlying byte array it is no longer valid after the reader is advanced because the buffer could have been overwritten.
I used this style when implementing my buffered rune reader in ianlewis/runeio
so that callers can peek at the rune stream without advancing the reader with zero copy semantics.