Think inside the Box
I intended to start sending out issues of this newsletter on Monday, and I’m 3 days late. I also wanted to write about something that needed a much longer article than this delay affords. So this topic is the result of the procrastination monkey jumping from topic to topic in a bid to avoid any further delay in sending out this week’s article, which resulted in me learning about something extremely interesting -
Boxes - https://github.com/cdgriffith/Box
It’s a spin on the way we deal with the humble dictionary, aimed especially at retrieving items as if they were attributes, however deeply nested they are.
For example, consider a dictionary like this -
books1 = {
    "A Book You Read A Long Time Ago": {"title": "The Count of Monte Cristo", "pages": 782},
    "A Book Published in 2010": {"title": "The Way of Kings", "pages": 800},
    "A Book On your Shelf": {"title": "Fluent Python", "pages": 600},
}
Now there’s a lot wrong with this, not least that there probably needn’t have been a dictionary like this in the first place. Ideally, data modellers would shape the data to be more meaningful, along the lines of a list of records -
books2 = [
    {"desc": "A Book You Read A Long Time Ago",
     "name": "The Count of Monte Cristo",
     "pages": 782},
    {"desc": "A Book Published in 2010",
     "name": "The Way of Kings",
     "pages": 800},
    {"desc": "A Book On your Shelf",
     "name": "Fluent Python",
     "pages": 600},
]
Obviously books2 looks a lot cleaner, with more metadata and a clear path to converting it into a Pandas DataFrame if necessary. But not all data is presented in this format, and we may not have the luxury of converting the provided data into a more understandable format. That leaves us with only one option: handling the data the way it is provided.
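When we do have that luxury, the reshaping itself is small. A minimal sketch, assuming the three-book books1 above, that rebuilds books2 with a single list comprehension:

```python
# books1 as defined above: description -> {"title", "pages"}
books1 = {
    "A Book You Read A Long Time Ago": {"title": "The Count of Monte Cristo", "pages": 782},
    "A Book Published in 2010": {"title": "The Way of Kings", "pages": 800},
    "A Book On your Shelf": {"title": "Fluent Python", "pages": 600},
}

# Move each key into a "desc" field and rename "title" to "name"
books2 = [
    {"desc": desc, "name": info["title"], "pages": info["pages"]}
    for desc, info in books1.items()
]

print(books2[0])
# {'desc': 'A Book You Read A Long Time Ago', 'name': 'The Count of Monte Cristo', 'pages': 782}
```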
So assume we only have books1.
How do we access the elements? If we wanted to access the name of the book we read a long time ago -
books1["A Book You Read A Long Time Ago"]["title"]
would give us “The Count of Monte Cristo”.
This is where the package Box comes in.
🧪 Installed as:
pip install python-box[all]
🧪 Imported as:
from box import Box
A dictionary can be converted to a Box object by passing it as an argument to Box, like so -
book_box = Box(books1)
From the outside, book_box has all the markings of a dictionary -
But when you try to access the keys, things get extremely interesting. If you’ve kept track, you’ll notice that every key containing spaces can now also be reached through a version with the spaces replaced by underscores, and that version is accessible as an attribute. The original keys are still there for regular dictionary access. This results in access like -
Which means the title that we earlier accessed by way of keys can now be accessed as nested attributes -
What else can Box do?
The keys we used in books1 were not only separated by spaces, they also capitalised every word, much like CamelCase. This can be somewhat hard to manage, in addition to being unidiomatic for Python attribute names. This can be fixed using the argument ‘camel_killer_box’ -
book_box = Box(books1, camel_killer_box=True)
Normally, when we try to retrieve a key that does not exist from a dictionary, we get a KeyError complaining about its non-existence, unless the retrieval is written along the lines of -
books1.get("New Book", "")
where the second argument, here "", is the default value.
If we’d like to achieve similar results using Box, we’d create it as follows -
book_box = Box(books1, default_box=True, default_box_attr="NonExistent")
Here, default_box makes the retrieval of a missing key return a default value instead of raising an error, and unless we say otherwise, that default is an empty Box. To override the default value, we use default_box_attr; here the value is "NonExistent".
Conversion of Boxes -
Once we’ve converted a dictionary to a Box, it can be used as an intermediary to convert to the other supported formats: dict (the reverse conversion), YAML, JSON, TOML and msgpack. This is a pretty powerful feature, given that each of these conversions would usually need its own library and API. Box still relies on those libraries under the hood, which is what the [all] extra installs.
Example -
Dictionary to Box:
Box to YAML:
YAML to Box:
What does it cost?
Obviously, using a library on top of a plain dictionary is not free.
In my tests, reading a JSON file using Box sometimes took a little more time than reading it as a plain dictionary, and this gap can potentially grow with the size and complexity of the file in question.
Repeating this experiment multiple times gave varying results, but at scale, Box will most likely be slower than plain dictionaries.
Where to use Box?
So with all these pros and cons, where does one use this library?
At 2000 stars on GitHub, this seems like a pretty popular and, consequently, somewhat trusted library. So it can be used in your production code, if only to make nested dictionary keys easier to access. From the author’s own examination, the memory used by this package isn’t much more than that of a plain dictionary.
But if you do use extensively nested dictionaries, I’d question your data model in the first place. 😁
The code used in this post has been uploaded here along with some references: https://github.com/everythingpython/post4
PS - When talking about this package, dataclasses deserve a mention. A relatively recent addition to Python (3.7), dataclasses make declaring and using classes that mainly hold data extremely convenient. But I will save further elaboration for a future post.