A recent vulnerability in the go-redis library allowed correctly configured clients to return the results of other redis commands in the event of redis failures. While all code is subject to bugs, this one could have been prevented if proper care was taken when creating and using interfaces.

Bug behavior Link to heading

Most clients reuse TCP connections between RPC calls to eliminate the delay of TCP connection overhead. TCP connections have both an input and output stream. Redis operates on a request/response pattern where commands are shared on the same socket by sending multiple requests and expecting multiple responses. Most of the time responses are read after the request is sent. For example, there could be two requests to get the key “money:*” for two different users with jack having $200 and john having $100.

Picture of normal redis usage flow.

When things work .

A bug in the client caused it, during error conditions, to reuse connections between calls without draining the $200 from the request for “GET money:jack”. In that case, the flow would look like this.

Data during a race condition

When bugs happen

The request for “GET money:jack” could fail with an error before reading $200, while the request for “GET money:john” would return the $200 that was intended for the “GET money:jack” request and the “$100” response would get orphaned. The request for “GET money:john” has no idea there was any failure or that the answer it sees, $200, is incorrect. This can happen even when the requests for money:jack and money:john are across threads and incoming RPC calls, due to transparent sharing of connections inside the redis client.

How serious a vulnerability this is depends on what you use redis for. For us, it was pretty bad. All code can have bugs, but it’s important to investigate why this bug happened. The root of this bug is how the library uses interfaces.

There are three core rules to using interfaces correctly.

  1. When you create an interface, document the contract you expect with the interface.
  2. When you use an interface, depend upon only the documented contract and not an implementation of the interface.
  3. When you implement an interface, you must implement the documented contract.

The go-redis client broke some variant of all three of these rules which both allowed this bug to exist and made it difficult to discover.

Rule 1: Document the contract Link to heading

When you create an interface, document the contract you expect with the interface.

The redis client depends upon a pool abstraction copied below.

type Pooler interface {
    NewConn() (*Conn, error)
    CloseConn(*Conn) error

    Get() (*Conn, error)
    Put(*Conn)
    Remove(*Conn)

    Len() int
    IdleLen() int
    Stats() *Stats

    Close() error
}

The Pooler interface has no documentation or contract explained at all. On the surface you think a pool concept is common enough to not need documentation. Unfortunately, there are many aspects that are open to interpretation.

  1. Is the Pooler expected to be thread safe?
  2. Can a Conn be used after a call to Remove?
  3. Can a Conn be used after a call to Put?
  4. Can you Put a connection into the pool that wasn’t returned from Get?
  5. Does a connection need to be Removed after you call CloseConn?
  6. Can the outstanding connections of a pool be used after Close is called?
  7. Len is the length of what?
  8. Can a Removed Conn be re-returned by NewConn?

Rule 2: Depend upon the contract Link to heading

When you use an interface, depend upon only the documented contract and not an implementation of the interface.

By not having a contract, go-redis de facto depends upon an implementation. This confusion can make it difficult to see where a bug happens.

When go-redis is done with a connection, it releases it.

func (c *baseClient) releaseConnStrict(cn *pool.Conn, err error) {
    if c.limiter != nil {
        c.limiter.ReportResult(err)
    }

    if err == nil || internal.IsRedisError(err) {
        c.connPool.Put(cn)
    } else {
        c.connPool.Remove(cn)
    }
}

The problem is what does it mean to Remove a connection? The function releaseConnStrict is used assuming Remove works like the implementation of ConnPool: one implementation of the Pooler interface.

func (p *ConnPool) Remove(cn *Conn) {
    p.removeConn(cn)
    p.freeTurn()
    _ = p.closeConn(cn)
}

Here a connection is closed when it is removed from the pool.

Rule 3: Implement the contract Link to heading

When you implement an interface, you must implement the documented contract.

Go-redis contains a SingleConnPool type that also implements the Pooler interface.

type SingleConnPool struct {
    cn *Conn
}

func (p *SingleConnPool) Remove(cn *Conn) {
    if p.cn != cn {
        panic("p.cn != cn")
    }
}

func (p *SingleConnPool) Get() (*Conn, error) {
    return p.cn, nil
}

On the surface, this is a very reasonable way to implement a connection pool that is intended to contain a single connection. Inspecting this code makes it very difficult to identify there is a bug here. Unfortunately, the contract for Remove, from use, seems to be that Remove should close the connection. SingleConnPool does a reasonable thing on Remove but fails to implement the implicit contract of Pooler.

The important point here is that none of these by themselves look like a bug. Looking at any one of these it’s not obvious that Pooler, releaseConnStrict, or SingleConnPool are doing anything wrong. It’s only when go-redis wraps a connection into a SingleConnPool and uses it in a call stack that normally works with a ConnPool type, that the bug exposes itself.

Maintainable abstraction requires strict documentation of contracts

Avoiding interface misunderstanding Link to heading

Luckily there are things you can do to avoid similar bugs.

Avoid abstraction bugs by avoiding abstraction Link to heading

Go’s implicit interfaces make it easy to refactor code into abstraction when needed. I talk about this more in my post Preemptive Interface Anti-Pattern in Go. Abstraction makes it difficult to identify bugs by hiding implementation details behind function calls that have no direct implementation. Directly calling the function you need makes it clear to the reader what behavior is expected.

Revisit contracts as you use your abstraction Link to heading

It’s a bit impractical to think of every intended behavior you want with your abstraction before you practically use it. Often you’ll have the idea you want a Pooler abstraction but not clearly have a picture of the details you will depend upon until you use the interface. That’s fine!

Make a best attempt at a contract when you create your Pooler interface. Then, as you practically use the interface revisit your original contract.

For example, you may not know what contract you want at first, so Pooler may start with something simple

type Pooler interface {
    // Remove a Conn from the pool.  Expects a Conn previously returned by Get
    Remove(*Conn)
}

Then you start using Pooler and notice that Remove should mean that a Conn is no longer valid.

    if err == nil || internal.IsRedisError(err) {
        c.connPool.Put(cn)
    } else {
        c.connPool.Remove(cn)
    }

So you go back to your Pooler interface and add this contract.

type Pooler interface {
    // Remove a Conn from the pool.  Expects a Conn previously returned by Get.
    // A removed connection should be considered invalid and no longer returned by
    // Get
    Remove(*Conn)
}

This is easy to do when your abstraction is private or your library is still in the development phase. However once you publish your interface, changing a contract can have serious unintended consequences.

Unit test contracts Link to heading

Unit tests aren’t just for implementations: they can also be used for contracts. These tests become helpers that other implementations can use to verify their contract. Here is an example for the above contract.

func testRemove(t *testing.T, p Pooler) {
    c1, err := p.Get()
    if err != nil {
        return
    }
    p.Remove(c1)
    c2, err := p.Get()
    if err != nil {
        return
    }
    if c1 == c2 {
        t.Error("failed get/remove/get contract")
    }
}

func testNewConn(t *testing.T, p Pooler) {
    c1, err := p.NewConn()
    if err != nil {
        return
    }
    c2, err := p.NewConn()
    if err != nil {
        return
    }
    if c1 == c2 {
        t.Error("expected distinct connections from NewConn")
    }
}

We can use these contract tests with our implementations.

func TestSingleConnPoolContract(t *testing.T) {
    p := SingleConnPool{/* .... */}
    testRemove(t, p)
    testNewConn(t, p)
}

func TestConnPoolContract(t *testing.T) {
    p := ConnPool{/* .... */}
    testRemove(t, p)
    testNewConn(t, p)
}

Interface Contracts Matter Link to heading

Abstraction is a necessary part of code reuse. If you want to use interfaces in a complex codebase in a maintainable way, remember to follow the three core rules.

  1. When you create an interface, document the contract you expect with the interface.
  2. When you use an interface, depend upon only the documented contract and not an implementation of the interface.
  3. When you implement an interface, you must implement the documented contract.