Unmarshaling JSON in Go: The weird parts

Quick, what does this program print?

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	content := []byte(`{
	"fieldone":"zero",
	"field_one":"one",
	"fiELD_one":"two",
	"field_One":"three",
	"field2":123
}`)
	type d0 struct {
		Fieldone  string
		Field1    string
		Field_One string
	}
	type d1 struct {
		Fieldone  string `json:"field_one"`
		Field1    string `json:"fiELD_one"`
		Field_One string `json:"field_One"`
	}

	type d2 struct {
		Fieldone  string `json:"field_one"`
		Field1    string `json:"fiELD_one"`
		Field_One string `json:"field_one"`
	}

	type d3 struct {
		Fieldone  string `json:"field_one"`
		Field1    string `json:"fiELD_one"`
		Field_One string
	}

	dZero := &d0{}
	dOne := &d1{}
	dTwo := &d2{}
	dThree := &d3{}
	_ = json.Unmarshal(content, dZero)
	fmt.Println(dZero)
	_ = json.Unmarshal(content, dOne)
	fmt.Println(dOne)
	_ = json.Unmarshal(content, dTwo)
	fmt.Println(dTwo)
	_ = json.Unmarshal(content, dThree)
	fmt.Println(dThree)
}

> go run main.go
&{zero  three}
&{one two three}
&{ three }
&{three two }

What the heck!? What’s going on with &{zero  three}? Go uses a handful of hints to map JSON keys to struct fields, and the rules, while deterministic, aren’t always intuitive and can produce some strange results.

Let’s work out the rules from most specific to most general. One thing to know first: each struct field has exactly one JSON name, which is the name in its struct tag if it has one and the Go field name otherwise.

Try to map the JSON key directly to a field’s JSON name, with case sensitivity. Call this “strict” matching.
If no field matches strictly, try again without case sensitivity. Call this “loose” matching. If several fields match loosely, the first one in declaration order wins.
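
The matching behavior is easy to poke at in isolation. Here is a minimal sketch, assuming a made-up type (record) and helper (decode) that are not part of the program above. Each input has a single key, so key ordering doesn’t get in the way.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record is a hypothetical type used only for this illustration.
type record struct {
	Lower    string `json:"id"` // JSON name is "id"
	Upper    string `json:"ID"` // JSON name is "ID"
	Untagged string             // no tag, so the JSON name is "Untagged"
}

// decode unmarshals input into a fresh record and returns it.
func decode(input string) (record, error) {
	var r record
	err := json.Unmarshal([]byte(input), &r)
	return r, err
}

func main() {
	inputs := []string{
		`{"ID":"a"}`,       // strict match on the tag "ID": fills Upper
		`{"Id":"b"}`,       // no strict match, so the first loose match fills Lower
		`{"UNTAGGED":"c"}`, // loose match on the Go field name: fills Untagged
		`{"Lower":"d"}`,    // a tagged field ignores its Go name: nothing is filled
	}
	for _, in := range inputs {
		r, err := decode(in)
		fmt.Printf("%-18s -> %+v (err: %v)\n", in, r, err)
	}
}
```

Note the last case: once a field has a tag, its Go name no longer counts for matching at all.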

If you look back through the examples, you can see how this works, with one wrinkle: duplicated struct tags. That’s part of what causes &{ three }: Fieldone and Field_One have the same struct tag, so neither of them wins, the decoder ignores both, and their data is discarded. Hm. But why is the value three and not two? Surely "fiELD_one" is a better match than "field_One"?

The other tricky part is how key ordering interacts with case-insensitive matching. The JSON is processed one key at a time, and each key is mapped to a struct field. Whether that field was already filled doesn’t matter: the later value simply overwrites the earlier one. So let’s look at the d2 struct:

type d2 struct {
	Fieldone  string `json:"field_one"`
	Field1    string `json:"fiELD_one"`
	Field_One string `json:"field_one"`
}

We already know the first and last fields won’t get any data because they have the same struct tag. The data is deserialized in this order:

"field_one":"one",
"fiELD_one":"two",
"field_One":"three",

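ALL of these match "fiELD_one" (one strictly, two loosely), and with the other two fields out of the running, that’s the best match for each key. Each value overwrites the previous one, so the last key wins. If you shuffle the order of these keys, you’ll get a different outcome.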

"field_One":"three",
"fiELD_one":"two",
"field_one":"one",

gives us &{ one }
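
You can confirm the ordering effect with a small sketch. It assumes a trimmed-down stand-in for d2 (called visible here, a made-up name) that keeps only the one field the decoder can still see, plus a hypothetical helper lastWins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// visible is a hypothetical stand-in for d2: it keeps only Field1, the one
// field that survives once the two duplicate tags cancel each other out.
type visible struct {
	Field1 string `json:"fiELD_one"`
}

// lastWins decodes input and returns whatever ended up in Field1.
func lastWins(input string) string {
	var v visible
	if err := json.Unmarshal([]byte(input), &v); err != nil {
		return "error: " + err.Error()
	}
	return v.Field1
}

func main() {
	original := `{"field_one":"one","fiELD_one":"two","field_One":"three"}`
	shuffled := `{"field_One":"three","fiELD_one":"two","field_one":"one"}`

	fmt.Println(lastWins(original)) // three
	fmt.Println(lastWins(shuffled)) // one
}
```

The strict match on "fiELD_one" gets no special protection: a later loose match overwrites it just the same.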

And this is just with strings! What if you create a struct with two fields of different types that could be mapped to the data?

type d struct {
	S string  `json:"field_one"`
	F float64 `json:"field_One"`
}

Try unmarshaling this:

{
  "field_ONE": 123.0
}

and you’ll get an error:

json: cannot unmarshal number into Go struct field d.field_one of type string

The decoder isn’t kind enough to look for another field that matches “loosely” AND has the right type; it just takes the first “loose” match it finds, in declaration order.
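
If you want to see which field the decoder picked, the error carries that information. This is a sketch rather than a recommended pattern: decode is a hypothetical helper, and it relies on errors.As with *json.UnmarshalTypeError, whose Field member names the JSON field that rejected the value.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type d struct {
	S string  `json:"field_one"`
	F float64 `json:"field_One"`
}

// decode returns the decoded struct, the JSON name of the field that
// rejected a value (empty if none did), and the error from Unmarshal.
func decode(input string) (d, string, error) {
	var v d
	err := json.Unmarshal([]byte(input), &v)

	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		return v, typeErr.Field, err
	}
	return v, "", err
}

func main() {
	// Loose match only: the first candidate, S, is chosen and rejects the number.
	_, field, err := decode(`{"field_ONE": 123.0}`)
	fmt.Println("rejected by:", field, "error:", err)

	// Strict match on F's tag: the number lands where it belongs.
	v, _, err := decode(`{"field_One": 123.0}`)
	fmt.Println(v.F, err) // 123 <nil>
}
```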

Well, fine, but how often do problems like this actually happen? All you have to do is make sure that all the fields in your struct have different struct tags, right?

Right?

Yes, that’s true, provided they are also unique in the “loose” sense. But it’s not always obvious whether your struct tags really are unique. Consider struct embedding:

type A struct {
	Field float64 `json:"field"`
}

type B struct {
	Field int `json:"field"`
}

type C struct {
	A
	B
	
	Field string `json:"field"`
}

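That’s a perfectly legal struct: each of the Fields is namespaced out of each other’s way. But that’s not the way the JSON decoder sees it. All three fields claim the JSON name "field", and the decoder only keeps the least nested one, so C’s own Field takes every value and A.Field and B.Field are never filled. Remove C’s own Field and it gets worse: A.Field and B.Field now tie at the same depth, the decoder silently ignores both, and you’ll get an empty struct every time.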
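
The decoder’s handling of embedded fields is easy to misjudge, so a quick experiment helps. The sketch below decodes into C and into onlyEmbedded, a hypothetical variant introduced here that drops C’s own Field. The comments follow the field selection rules in the encoding/json documentation: the least nested field wins, and a tie at the same depth between identically tagged fields drops every contender without an error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type A struct {
	Field float64 `json:"field"`
}

type B struct {
	Field int `json:"field"`
}

// C is the struct from the text.
type C struct {
	A
	B

	Field string `json:"field"`
}

// onlyEmbedded is a hypothetical variant of C without a field of its own.
type onlyEmbedded struct {
	A
	B
}

func decodeC(input string) (C, error) {
	var c C
	err := json.Unmarshal([]byte(input), &c)
	return c, err
}

func decodeOnlyEmbedded(input string) (onlyEmbedded, error) {
	var o onlyEmbedded
	err := json.Unmarshal([]byte(input), &o)
	return o, err
}

func main() {
	// C.Field is the least nested, so it receives the value.
	c, err := decodeC(`{"field":"hello"}`)
	fmt.Printf("%+v err=%v\n", c, err)

	// A number is still routed to C.Field (a string), so this is a type error;
	// A.Field and B.Field are never considered.
	_, err = decodeC(`{"field":1.5}`)
	fmt.Println("err:", err)

	// A.Field and B.Field tie, so both are ignored and nothing is filled.
	o, err := decodeOnlyEmbedded(`{"field":7}`)
	fmt.Printf("%+v err=%v\n", o, err)
}
```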

So, some lessons:

Normalize your JSON input, if that works with your API semantics. Lowercasing every key, while keeping _ and -, is just fine. This helps avoid unexpected “loose” matching.
Always set JSON struct tags for ser/des types. This also helps control “loose” matching and makes sure the data ends up in the field you expect it to go to.
Avoid struct embedding for ser/des types. You never know when an embedded struct could change under you. It’s “spooky action at a distance” and you should try to avoid it.
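
One way to act on the second lesson is to check a type for “loose” duplicates before they bite. What follows is only a sketch under stated assumptions: jsonName and looseCollisions are hypothetical helpers, strings.ToLower stands in for the decoder’s own case folding (an approximation), and embedded structs are skipped rather than followed.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// d1 repeats the struct from the top of the post.
type d1 struct {
	Fieldone  string `json:"field_one"`
	Field1    string `json:"fiELD_one"`
	Field_One string `json:"field_One"`
}

// jsonName returns the name the decoder would match against for f, and false
// if the field is skipped (unexported, or tagged "-").
func jsonName(f reflect.StructField) (string, bool) {
	if !f.IsExported() {
		return "", false
	}
	tag := f.Tag.Get("json")
	if tag == "-" {
		return "", false
	}
	name, _, _ := strings.Cut(tag, ",") // drop options such as ",omitempty"
	if name == "" {
		name = f.Name
	}
	return name, true
}

// looseCollisions groups the direct fields of a struct by lower-cased JSON
// name and returns only the groups with more than one Go field.
func looseCollisions(v any) map[string][]string {
	groups := make(map[string][]string)
	t := reflect.TypeOf(v)
	if t == nil {
		return groups
	}
	if t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return groups
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.Anonymous {
			continue // assumption: embedded structs are not followed in this sketch
		}
		if name, ok := jsonName(f); ok {
			key := strings.ToLower(name)
			groups[key] = append(groups[key], f.Name)
		}
	}
	for key, names := range groups {
		if len(names) < 2 {
			delete(groups, key)
		}
	}
	return groups
}

func main() {
	fmt.Println(looseCollisions(d1{})) // map[field_one:[Fieldone Field1 Field_One]]
}
```

A check like this could live in a unit test next to your ser/des types, so a colliding tag fails the build instead of silently eating data.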
