Setup

Most cloud providers let you specify commands to run when a host first comes online. This is usually handled by cloud-init, which reads the instance’s user-data; the cloud-init documentation covers the format in full. In my case, I also wanted to write extra files to the host using Pulumi. To fit everything within the provider’s user-data size limit, I gzip-compressed the files and then base64-encoded the result.

Example code

Here is how to do this in Python:

import gzip
import base64

def gzip_string(input_string):
    compressed = gzip.compress(input_string.encode())
    compressed_b64 = base64.b64encode(compressed)
    return compressed_b64.decode()

original_string = """
# cloud-config
"""

instance_userdata = gzip_string(original_string)
print(instance_userdata)

The output, which becomes the instance user-data, looks something like this:

H4sIABu242YC/+NSVkjOyS9N0U3Oz0vLTOcCAIev2ZQQAAAA

The problem

Everything worked, but every time I ran pulumi up I saw state drift in the instance user-data. Because the value was gzip-compressed and then base64-encoded, it was hard to tell what had actually changed.
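One way to narrow this down is to decode both values and compare the payloads: if the decoded configs match, the difference lives in the encoding wrapper rather than in the config itself. The sketch below assumes the same gzip-then-base64 scheme as gzip_string; decode_userdata is a hypothetical helper, and time.sleep stands in for running pulumi up again later.

```python
import base64
import gzip
import time


def gzip_string(input_string):
    # Same helper as the example above: gzip-compress, then base64-encode.
    compressed = gzip.compress(input_string.encode())
    return base64.b64encode(compressed).decode()


def decode_userdata(encoded):
    # Hypothetical inverse of gzip_string: base64-decode, gunzip, decode UTF-8.
    return gzip.decompress(base64.b64decode(encoded)).decode()


config = """
# cloud-config
"""

first_run = gzip_string(config)
time.sleep(1.1)  # stand-in for a later `pulumi up`
second_run = gzip_string(config)

print(first_run == second_run)  # False: the encoded user-data differs
print(decode_userdata(first_run) == decode_userdata(second_run))  # True: same config
```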

Go back to the sample code above, wait a second, and run it again: you’ll get a different output.

H4sIAFK242YC/+NSVkjOyS9N0U3Oz0vLTOcCAIev2ZQQAAAA

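It’s subtle, but H4sIABu242YC became H4sIAFK242YC. The gzip header includes a 4-byte MTIME field which, for data that didn’t come from a file, records the time compression started as a Unix timestamp. Compressing the same input a second later therefore produces different bytes. From RFC 1952: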

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+
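To confirm that MTIME is the only header field that changed, the fixed 10-byte header can be unpacked with struct. This is a minimal sketch, assuming the two strings above are complete gzip members; gzip_header is a hypothetical helper name.

```python
import base64
import struct

# The two user-data values shown earlier.
first = "H4sIABu242YC/+NSVkjOyS9N0U3Oz0vLTOcCAIev2ZQQAAAA"
second = "H4sIAFK242YC/+NSVkjOyS9N0U3Oz0vLTOcCAIev2ZQQAAAA"


def gzip_header(encoded):
    """Unpack the fixed 10-byte gzip member header from RFC 1952."""
    raw = base64.b64decode(encoded)
    # '<' = little-endian, no padding: ID1, ID2, CM, FLG (1 byte each),
    # MTIME (4-byte unsigned int), XFL, OS (1 byte each).
    id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack("<BBBBIBB", raw[:10])
    return {"ID1": id1, "ID2": id2, "CM": cm, "FLG": flg,
            "MTIME": mtime, "XFL": xfl, "OS": os_byte}


h1, h2 = gzip_header(first), gzip_header(second)
changed = {field for field in h1 if h1[field] != h2[field]}
print(changed)                   # {'MTIME'}
print(h1["MTIME"], h2["MTIME"])  # 1726199323 1726199378
```

Everything after the header (the deflate data and the CRC/size trailer) is identical in both strings, which is why the base64 text differs in only two characters.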

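Python’s gzip.compress fills MTIME with the current time by default, but it accepts an mtime keyword argument (added in Python 3.8). Setting it to 0, which RFC 1952 defines as "no time stamp is available", makes the output deterministic: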

def gzip_string(input_string):
    compressed = gzip.compress(input_string.encode(), mtime=0)
    compressed_b64 = base64.b64encode(compressed)
    return compressed_b64.decode()

Passing mtime=0 in my Pulumi program removed the state drift.
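To keep the drift from creeping back, a small check like the one below could run as a unit test. It is a sketch assuming the mtime=0 version of gzip_string; header_mtime is a hypothetical helper that reads bytes 4 to 7 of the gzip header.

```python
import base64
import gzip
import struct


def gzip_string(input_string):
    # Deterministic version: MTIME pinned to 0 ("no time stamp available").
    compressed = gzip.compress(input_string.encode(), mtime=0)
    return base64.b64encode(compressed).decode()


def header_mtime(encoded):
    # MTIME occupies bytes 4-7 of the gzip header, stored little-endian.
    return struct.unpack("<I", base64.b64decode(encoded)[4:8])[0]


config = """
# cloud-config
"""

a = gzip_string(config)
b = gzip_string(config)
assert a == b, "user-data should be byte-for-byte stable across runs"
assert header_mtime(a) == 0, "gzip header should not carry a timestamp"
```

Checking the MTIME bytes directly is more robust than comparing two calls, since two calls within the same second would match even with the default timestamp.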

Other IaC tools

Terraform has a built-in base64gzip function that does this for you. The relevant part of its implementation:

    var b bytes.Buffer
    gz := gzip.NewWriter(&b)
    if _, err := gz.Write([]byte(s)); err != nil {
        return ...
    }
    if err := gz.Flush(); err != nil {
        return ...
    }
    if err := gz.Close(); err != nil {
        return ...
    }
    return cty.StringVal(base64.StdEncoding.EncodeToString(b.Bytes())), nil

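Sensibly, gzip.NewWriter leaves the header’s ModTime field at Go’s zero value (time.Time{}), and the writer only stores a timestamp when ModTime is later than the Unix epoch, so MTIME is written as 0. Terraform users therefore don’t run into this drift when using base64gzip.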