Python in the Enterprise Part 1: Dataclasses
Creating a class that is just a simple container for data can be quite verbose. Dataclasses are a solution to this verbosity.
Usually, when we want to create a class in Python, we need to:
- define the constructor (`__init__`)
- declare each attribute as a constructor parameter
- assign each instance variable in the constructor
- implement `__repr__`
- implement `__eq__`
- implement `__hash__`
Dataclasses provide us with a much more concise syntax for doing the above (and more).
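For comparison, here is a sketch of roughly what that list looks like when written by hand for a two-field class like the `Block` used in the examples below. It is a simplification: the code `@dataclass` actually generates differs in its details.

```python
class Block:
    # Constructor: every attribute is a parameter and is assigned by hand
    def __init__(self, body_format: str, body: str) -> None:
        self.body_format = body_format
        self.body = body

    # String representation listing every field
    def __repr__(self) -> str:
        return f"Block(body_format={self.body_format!r}, body={self.body!r})"

    # Equality: same class and same field values
    def __eq__(self, other: object) -> bool:
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.body_format, self.body) == (other.body_format, other.body)

    # Hash over the same fields used by __eq__. Hashing mutable attributes is risky;
    # @dataclass only generates this when frozen=True or unsafe_hash=True.
    def __hash__(self) -> int:
        return hash((self.body_format, self.body))


b = Block("markdown", "blah")
print(b)                               # Block(body_format='markdown', body='blah')
print(b == Block("markdown", "blah"))  # True
```

Every new field means touching all four methods, which is exactly the repetition dataclasses remove.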
Basic syntax
A dataclass consists of fields. Fields are class variables that have a type annotation.
```python
from dataclasses import dataclass

@dataclass
class Block:
    body_format: str
    body: str

b = Block(body_format="markdown", body="blah blah blah")
print(b.body)
# blah blah blah
```
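The "type annotation" part of that definition matters. The sketch below uses `dataclasses.fields()` to list what the decorator treats as fields: an unannotated class attribute and a `typing.ClassVar` annotation are both left out. The extra attributes `kind` and `instances` are hypothetical, added only for illustration.

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class Block:
    body_format: str              # annotated: a field
    body: str                     # annotated: a field
    kind = "block"                # no annotation: ordinary class attribute, not a field
    instances: ClassVar[int] = 0  # ClassVar annotation: explicitly excluded from fields

print([f.name for f in fields(Block)])  # ['body_format', 'body']

# The generated __init__ only accepts the two real fields
b = Block("markdown", "blah")
print(b.kind)  # block
```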
Fields can have default values.
```python
@dataclass
class Block:
    body_format: str = "markdown"
    body: str = "this is the body"

b = Block()
print(b.body)
# this is the body
```
Fields with default values must come after fields without default values; otherwise defining the class raises a `TypeError`.
```python
# This is okay:
@dataclass
class Block:
    body_format: str
    body: str = "this is the body"

# Error: TypeError: non-default argument 'body_format' follows default argument
@dataclass
class Block:
    body: str = "this is the body"
    body_format: str
```
If a field's default value is mutable (e.g. a `list`, `dict`, or `set`), it must be defined with `dataclasses.field(default_factory=...)` instead of being assigned directly.
```python
from typing import List
from dataclasses import dataclass, field

@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

# Error: ValueError: mutable default <class 'list'> for field blocks is not allowed: use default_factory
@dataclass
class Page:
    blocks: List[Block] = []
```
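The reason for the rule: a plain default would be one object shared by every instance, while `default_factory` is called once per instance. The sketch below checks that two `Page` objects get independent lists. The `metadata` field is hypothetical and shows that any zero-argument callable, such as a lambda, can serve as the factory; plain strings stand in for `Block` objects to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Page:
    # default_factory runs once per instance, so each Page gets its own list
    blocks: List[str] = field(default_factory=list)
    # Hypothetical field: any zero-argument callable works as a factory
    metadata: dict = field(default_factory=lambda: {"version": 1})

p1 = Page()
p2 = Page()
p1.blocks.append("block 1")
print(p1.blocks)                   # ['block 1']
print(p2.blocks)                   # []
print(p1.metadata is p2.metadata)  # False
```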
Adding Logic During Creation Time
Since dataclasses generate `__init__()` for us, we can't put creation-time logic there without giving up the generated constructor. Instead, we define `__post_init__()`, which the generated `__init__()` calls right after it has set the fields.
```python
from typing import Optional
from dataclasses import dataclass

import markdown2  # pip install markdown2

@dataclass
class Block:
    body_format: str
    body: str
    rendered: Optional[str] = None

    def __post_init__(self):
        # Runs automatically at the end of the generated __init__
        if self.body_format == "markdown":
            self.rendered = markdown2.markdown(self.body)
        else:
            raise ValueError(f"Invalid body_format: {self.body_format!r}")

block = Block(body_format="markdown", body="*Hello*")
print(block.rendered)
# <p><em>Hello</em></p>
```
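In the version above, `rendered` is still a constructor parameter, so a caller could pass a value that `__post_init__()` then overwrites. Declaring it with `field(init=False)` removes it from `__init__`. The sketch below assumes a hypothetical `"plain"` format rendered with the standard library's `html.escape`, standing in for `markdown2` so it runs without extra packages.

```python
import html
from dataclasses import dataclass, field

@dataclass
class Block:
    body_format: str
    body: str
    # init=False: not a constructor parameter; filled in by __post_init__
    rendered: str = field(init=False, default="")

    def __post_init__(self):
        # "plain" is a hypothetical format used only for this sketch
        if self.body_format == "plain":
            self.rendered = "<p>" + html.escape(self.body) + "</p>"
        else:
            raise ValueError(f"Invalid body_format: {self.body_format!r}")

b = Block("plain", "a < b")
print(b.rendered)  # <p>a &lt; b</p>
```

Passing `rendered=...` to this `Block` now raises `TypeError`, so the derived value can only come from `__post_init__()`.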
repr()
`__repr__()` is automatically implemented for dataclasses, giving each instance a string representation that lists its fields.
```python
print(repr(block))
# Block(body_format='markdown', body='*Hello*', rendered='<p><em>Hello</em></p>\n')
```
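Long derived values like `rendered` can clutter that output. Passing `repr=False` to `dataclasses.field()` leaves a field out of the generated `__repr__()` while keeping it everywhere else. A minimal sketch, with `rendered` given directly rather than computed:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    body_format: str
    body: str
    # repr=False keeps this (potentially long) field out of the generated __repr__
    rendered: str = field(default="", repr=False)

b = Block(body_format="markdown", body="*Hello*", rendered="<p><em>Hello</em></p>\n")
print(repr(b))     # Block(body_format='markdown', body='*Hello*')
print(b.rendered)  # the field itself is still there
```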
==
`__eq__()` is automatically implemented, so two instances of the same dataclass compare equal when all of their fields are equal.
```python
print(Block(body_format="markdown", body="blah") == Block(body_format="markdown", body="blah"))
# True
print(Block(body_format="markdown", body="blah") == Block(body_format="markdown", body="not blah"))
# False
```
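Two details of the generated `__eq__()` are worth seeing. A field declared with `field(compare=False)` is ignored in comparisons, and instances of different classes are not equal even when their field values match. The `Note` class below is hypothetical, used only to show the second point.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    body_format: str
    body: str
    # compare=False excludes this field from the generated __eq__
    rendered: str = field(default="", compare=False)

@dataclass
class Note:  # hypothetical class with the same field names as Block
    body_format: str
    body: str

a = Block("markdown", "blah", rendered="<p>blah</p>")
b = Block("markdown", "blah", rendered="something else")
print(a == b)                         # True: 'rendered' is ignored
print(a == Note("markdown", "blah"))  # False: different classes
```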
asdict()
`dataclasses.asdict()` converts a dataclass instance to a dictionary. It recurses into nested dataclasses, dicts, lists, and tuples.
```python
from typing import List
from dataclasses import dataclass, field, asdict

@dataclass
class Block:
    body_format: str
    body: str

@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

block1 = Block(body_format="markdown", body="This is block 1")
block2 = Block(body_format="markdown", body="This is block 2")
page = Page()
page.blocks.append(block1)
page.blocks.append(block2)

print(asdict(page))
# {'blocks': [{'body_format': 'markdown', 'body': 'This is block 1'},
#             {'body_format': 'markdown', 'body': 'This is block 2'}]}
```
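`asdict()` also accepts a `dict_factory` keyword argument. For each dataclass it converts, it calls the factory with a list of `(name, value)` pairs. The sketch below uses that to drop `None` values; the optional `title` field and the `drop_none` helper are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Block:
    body_format: str
    body: str
    title: Optional[str] = None  # hypothetical optional field for this example

def drop_none(items):
    # Hypothetical helper: receives a list of (name, value) pairs from asdict
    return {key: value for key, value in items if value is not None}

b = Block(body_format="markdown", body="blah")
print(asdict(b))                          # {'body_format': 'markdown', 'body': 'blah', 'title': None}
print(asdict(b, dict_factory=drop_none))  # {'body_format': 'markdown', 'body': 'blah'}
```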
Combined with the standard `json` module, this converts a dataclass instance to JSON:
```python
import json

print(json.dumps(asdict(page)))
# {"blocks": [{"body_format": "markdown", "body": "This is block 1"}, {"body_format": "markdown", "body": "This is block 2"}]}
```
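The dataclasses module has no built-in inverse of `asdict()`, so turning that JSON back into objects means rebuilding the nested dataclasses yourself. Here is a minimal sketch for this particular `Page`/`Block` shape; `page_from_json` is a hypothetical helper, not a library function.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class Block:
    body_format: str
    body: str

@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

def page_from_json(text: str) -> Page:
    # Hypothetical helper: each nested dict is unpacked into Block's __init__
    data = json.loads(text)
    return Page(blocks=[Block(**b) for b in data["blocks"]])

page = Page(blocks=[Block("markdown", "This is block 1")])
text = json.dumps(asdict(page))
print(page_from_json(text) == page)  # True
```

For deeper or more varied structures this hand-written approach grows quickly, which is why third-party serialization libraries are often used instead.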
Immutable Dataclasses
Dataclasses can be made immutable by declaring them frozen. Assigning to a field of a frozen instance then raises `dataclasses.FrozenInstanceError`.
```python
@dataclass(frozen=True)
class Block:
    body_format: str
    body: str

b = Block(body_format="markdown", body="blah")
b.body = "something else"
# Error: FrozenInstanceError: cannot assign to field 'body'
```
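Because frozen instances can't be modified in place, the usual way to "change" one is to build a modified copy with `dataclasses.replace()`. It creates a new instance through `__init__`, copying every field that is not overridden:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Block:
    body_format: str
    body: str

b = Block(body_format="markdown", body="blah")
# replace() returns a new instance; b itself is untouched
b2 = replace(b, body="something else")
print(b)   # Block(body_format='markdown', body='blah')
print(b2)  # Block(body_format='markdown', body='something else')
```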
`__hash__()` is automatically implemented for frozen dataclasses. The hash is computed from the field values, so instances can be used in sets and as dictionary keys.
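```python
@dataclass(frozen=True)
class Block:
    body_format: str
    body: str

b1 = Block(body_format="markdown", body="blah")
b2 = Block(body_format="markdown", body="this is something else")
# Exact values vary between runs because string hashing is randomized
print(hash(b1))  # an integer derived from the field values
print(hash(b2))  # (almost certainly) a different integer, since the fields differ
print(hash(b1) == hash(Block(body_format="markdown", body="blah")))
# True

blocks = {b1, b2}  # a set requires hashable elements
print(b1 in blocks)
# True
```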
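The contrast with the default is easy to miss. With the default options (`eq=True`, `frozen=False`), the decorator sets `__hash__` to `None`, so instances of a regular dataclass cannot be hashed at all. The sketch below checks both cases; the class names `MutableBlock` and `FrozenBlock` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MutableBlock:
    body_format: str
    body: str

@dataclass(frozen=True)
class FrozenBlock:
    body_format: str
    body: str

# Default dataclass (eq=True, frozen=False): __hash__ is None, so hash() fails
try:
    hash(MutableBlock("markdown", "blah"))
except TypeError as e:
    print(e)  # unhashable type: 'MutableBlock'

# Frozen instances hash by field values, so equal instances collapse in a set
unique = {FrozenBlock("markdown", "blah"), FrozenBlock("markdown", "blah")}
print(len(unique))  # 1
```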
Making non-frozen dataclasses hashable
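Mutable (non-frozen) dataclasses can be given a `__hash__()` by setting `unsafe_hash=True`. It is called "unsafe" because the hash is computed from fields that can still change: if an instance is mutated after being added to a set or used as a dictionary key, it can no longer be found there.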
```python
@dataclass(unsafe_hash=True)
class Block:
    body_format: str
    body: str

b1 = Block(body_format="markdown", body="blah")
print(hash(b1))  # an integer derived from the current field values
```
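The risk is easy to reproduce. In the sketch below, an instance is added to a set and then mutated. The set still stores the object under its old hash, so a lookup using the new hash misses it, even though the object is still inside the set.

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Block:
    body_format: str
    body: str

b = Block(body_format="markdown", body="blah")
blocks = {b}
print(b in blocks)  # True

# Mutating a field changes hash(b), but the set filed b under the old hash
b.body = "something else"
print(b in blocks)                  # False: the lookup uses the new hash
print(any(x is b for x in blocks))  # True: the object is still in the set
```

If instances need to be hashable, `frozen=True` is usually the safer choice; `unsafe_hash=True` makes sense only when you can guarantee the fields are not modified while the instance is in a set or dict.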