The Flyweight Pattern in Go


Definition: In computer programming, flyweight is a software design pattern. A flyweight is an object that minimizes memory usage by sharing as much data as possible with other similar objects; it is a way to use objects in large numbers when a simple repeated representation would use an unacceptable amount of memory.

In the following, I’ll demonstrate the Flyweight Pattern with two examples in Go. First, I’ll optimize a group of in-memory caches that share the same underlying data, and then I’ll optimize another cache that contains repetitive data. Let’s get to it. All examples are written in Go; gists are provided in the appendix.

Preamble

For both examples, consider this type:

// Player represents a player in a game.
type Player struct {
    ID uint32
    Handle, Name, Country string
    Games []string
}

It represents players on a gaming platform. This is a bit contrived, but straightforward enough for our purposes. Example:

playerOne := Player{
    ID: 1,
    Handle: "Razzor",
    Name: "Dave McTesty",
    Country: "US",
    Games: []string{"Fire Emblem", "Smash Bros Ultimate"},
}

First Example: One Cache, Three Lookup Methods

Assume that we wish to cache every single Player in our app in memory for quick access, and we need to be able to look them up by ID, by Handle, and by country code. A first (naive) implementation might look like this:

var (
    playerIDMap      = make(map[uint32]Player)
    playerHandleMap  = make(map[string]Player)
    playerCountryMap = make(map[string][]Player)
)

These are filled with data from the database by looping over the individual table rows and, for each one, populating an instance of Player with the fields from that row and adding it to each map.
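
As a rough sketch of that population step (loadPlayersFromDB is a hypothetical helper that runs the query and scans each row into a Player), it might look like this:

// populateCaches fills all three maps from the database rows.
func populateCaches() error {
    players, err := loadPlayersFromDB() // hypothetical DB helper
    if err != nil {
        return err
    }
    for _, p := range players {
        playerIDMap[p.ID] = p
        playerHandleMap[p.Handle] = p
        playerCountryMap[p.Country] = append(playerCountryMap[p.Country], p)
    }
    return nil
}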

Maps are used for the lookups because they are efficient: when you already have the key, you don’t need to loop over the entire data structure every time - an O(n) operation - but can do a direct lookup in constant time, O(1). This is an important optimization when there are thousands or millions of players.

Let’s take a quick look at the lookup functions for each one of the above:

// FindPlayerByID returns the player with the given ID, if it exists.
// Otherwise, an empty Player is returned.
func FindPlayerByID(ID uint32) Player {
    if p, ok := playerIDMap[ID]; ok {
        return p
    }
    return Player{}
}

// FindPlayerByHandle returns the player with the given Handle,
// if it exists. Otherwise, an empty Player is returned.
func FindPlayerByHandle(handle string) Player {
    if p, ok := playerHandleMap[handle]; ok {
        return p
    }
    return Player{}
}

// FindPlayerByCountry returns all players within the given country,
// if any exist. Otherwise, an empty slice is returned.
func FindPlayerByCountry(code string) []Player {
    if ps, ok := playerCountryMap[code]; ok {
        return ps
    }
    return []Player{}
}

Fairly straightforward.

But there’s a problem: each player is stored three times, taking up three times as much memory as necessary. Let’s go back to our cache declarations and fix that by appointing one of the maps as the “source of truth”, or “point of reference”, if you will. The map keyed by the Player ID seems appropriate for two reasons: the ID serves as a primary key, and a uint32 takes up much less space than the Player’s Handle, a string of arbitrary length. Let’s refactor to:

var (
    playerIDMap      = make(map[uint32]Player)
    playerHandleMap  = make(map[string]uint32)   // ref: playerIDMap
    playerCountryMap = make(map[string][]uint32) // ref: playerIDMap
)

Now, playerCountryMap and playerHandleMap both serve as references to playerIDMap. Obviously, we’ll need to refactor our lookup functions a bit:

// FindPlayerByID returns the player with the given ID, if it exists.
// Otherwise, an empty Player is returned.
func FindPlayerByID(ID uint32) Player {
    if p, ok := playerIDMap[ID]; ok {
        return p
    }
    return Player{}
}

// FindPlayerByHandle returns the player with the given Handle,
// if it exists. Otherwise, an empty Player is returned.
func FindPlayerByHandle(handle string) Player {
    if id, ok := playerHandleMap[handle]; ok {
        return FindPlayerByID(id)
    }
    return Player{}
}

// FindPlayerByCountry returns all players within the given country,
// if any exist. Otherwise, an empty slice is returned.
func FindPlayerByCountry(code string) []Player {
    if ids, ok := playerCountryMap[code]; ok {
        ps := make([]Player, len(ids))
        for i, id := range ids {
            p := FindPlayerByID(id)
            ps[i] = p
        }
        return ps
    }
    return []Player{}
}

Note that the first function, FindPlayerByID, hasn’t changed. The second, FindPlayerByHandle, now retrieves the Player ID instead of a Player and then calls FindPlayerByID to finalize the lookup. We now perform two constant-time lookups instead of one, which is still O(1), so the difference in performance between these two approaches can be considered negligible.

The third function is a bit more interesting. We create a slice sized to the ID slice we get from the first lookup, then loop over the IDs, looking each one up individually and adding the resulting Player to the slice. We don’t need to check for the presence of each ID, because we are in control of the cache; we know it exists. Due to the loop, the complexity is O(n), where n is the number of players in the country. This is worse than a direct lookup, so you have two strategies here: 1) continue with this implementation, which is a bit slower but saves memory, or 2) stick to the original implementation, which uses more memory but has a faster lookup.

You can - and should - mix and match these two strategies on a case-by-case basis; always choose the strategy that fits the situation.
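
For instance, one possible mix - just a sketch, not the cache above - keeps the handle map as a cheap ID reference while the country map keeps full Player values, because you’ve decided the country lookup is on your hot path and shouldn’t pay for the extra loop:

var (
    playerIDMap      = make(map[uint32]Player)
    playerHandleMap  = make(map[string]uint32)   // ref: playerIDMap, saves memory
    playerCountryMap = make(map[string][]Player) // full copies, loop-free reads
)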

As a final note for this example, you could also use pointers as references instead of uint32s. The concept would be the same, and you could stick to always using pointers that reference the original structs rather than doing several interdependent lookups. But keep in mind that a cache with thousands - or millions - of pointers also increases the number of pointers the GC has to manage correspondingly, which can negatively affect GC pause times. Values are preferable when you don’t need to change the data.
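
A pointer-based variant could look something like this sketch; all three maps then share the same underlying Player structs, and no second lookup is needed:

var (
    playerIDMap      = make(map[uint32]*Player)
    playerHandleMap  = make(map[string]*Player)
    playerCountryMap = make(map[string][]*Player)
)

// FindPlayerByHandle returns the shared struct directly, or nil
// if the handle is unknown.
func FindPlayerByHandle(handle string) *Player {
    return playerHandleMap[handle]
}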

Second Example: Data Replication

Let’s assume a scenario where we’re still caching Players, but for this particular part of the app, we can say with absolute certainty that they have all played exactly the same games; perhaps because one of the video game publishers, CrashyGames, Inc., has requested a locally manageable list of all the players on the platform that have played all their games - perhaps so they can apologize to them about the questionable quality of their games?

We can immediately see that we can save a lot of memory by not storing the field Games for each Player. Since we still need that field to be part of the type, we’ll store it separately and add it to an ad-hoc Player when one is requested. For this purpose, we’ll need a second type, which we’ll call cachedPlayer:

// cachedPlayer is a Player, but without the Games field.
type cachedPlayer struct {
    ID uint32
    Handle, Name, Country string
}

You may have noticed that the name starts with a lowercase “c”; we won’t be exporting this type, but only using it internally, in the scope of our cache.

This is our cache, and our games list:

var games = []string{"Flappy Fish", "Ducks'n'Dogs", "Backflip Pro"}

var crashyGamesPlayers = make(map[uint32]cachedPlayer)

This time, we’ll just look them up directly by ID, the uint32 key. CrashyGames has millions of players, which makes this optimization worthwhile. So let’s dive into some code. It doesn’t take much work. First, we’ll declare a convenience method on cachedPlayer that converts it back into a regular Player:

// convertWith returns the Player that matches cachedPlayer,
// with game titles attached.
func (c cachedPlayer) convertWith(games []string) Player {
    return Player{
        ID: c.ID,
        Handle: c.Handle,
        Name: c.Name,
        Country: c.Country,
        Games: games,
    }
}

Now we can implement the cache lookup:

// FindPlayerByID returns the player with the given ID, if it exists.
// Otherwise, an empty Player is returned.
func FindPlayerByID(ID uint32) Player {
    if cp, ok := crashyGamesPlayers[ID]; ok {
        // cachedPlayer found.
        return cp.convertWith(games) // Using globally cached games.
    }
    return Player{}
}

I’ve kept the method cachedPlayer.convertWith pure, taking the extra field as a parameter, while FindPlayerByID simply reads the package-level variable games directly, but you can implement it any way you like.
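
For completeness, here’s a rough sketch of how the cache could be filled; addCrashyGamesPlayer is a hypothetical helper that drops the Games field before storing:

// addCrashyGamesPlayer caches a Player without its Games field;
// the shared games slice is attached again on lookup.
func addCrashyGamesPlayer(p Player) {
    crashyGamesPlayers[p.ID] = cachedPlayer{
        ID: p.ID,
        Handle: p.Handle,
        Name: p.Name,
        Country: p.Country,
    }
}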

And voilà! Now we store everything exactly once. You could even take it to the next level and store the Player names in a separate data structure if there are many duplicate names. But beware of the overhead of the name comparisons, and of growing that extra data structure. Do it only when there are real savings to be had.
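
As a sketch of that idea - purely illustrative, with hypothetical helper names - you could intern the name strings and store a small index per player instead of the full string:

// Illustrative only: store each distinct name once and reference it
// by index from the players.
var (
    names     []string           // each distinct name stored once
    nameIndex = map[string]int{} // name -> position in names
)

// internName returns the index for name, adding it if unseen.
func internName(name string) int {
    if i, ok := nameIndex[name]; ok {
        return i
    }
    names = append(names, name)
    nameIndex[name] = len(names) - 1
    return len(names) - 1
}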

Conclusion

The Flyweight Pattern is your friend, but as with most things, there’s a balance to be struck; optimize too much, and you introduce undesirable complexity - optimize too little, and your app becomes clunky and slow.

Consider the size of your data structure, and the impact on code readability and on the maintainability of your app before deciding to use the Flyweight Pattern. And remember the old saying by Donald Knuth:

Premature optimization is the root of all evil.

Now go optimize your code!

This article was also posted on Medium.

Appendix

Examples One and Two after applying the Flyweight Pattern: