Python in the Enterprise Part 1: Dataclasses

Creating a class that is just a simple container for data can be quite verbose. Dataclasses are a solution to this verbosity.

Usually, when we want to create a class in Python, we need to write methods such as __init__(), __repr__() and __eq__() by hand.

Dataclasses provide us with a much more concise syntax for doing the above (and more).
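
To make the comparison concrete, here is a rough sketch of the boilerplate a hand-written container tends to need, next to a dataclass with the same fields. The class names ManualBlock and DataBlock are made up for this illustration.

```python
from dataclasses import dataclass


class ManualBlock:
    """Hand-written container: every method below is boilerplate."""

    def __init__(self, body_format: str, body: str):
        self.body_format = body_format
        self.body = body

    def __repr__(self):
        return f"ManualBlock(body_format={self.body_format!r}, body={self.body!r})"

    def __eq__(self, other):
        if not isinstance(other, ManualBlock):
            return NotImplemented
        return (self.body_format, self.body) == (other.body_format, other.body)


@dataclass
class DataBlock:
    """Dataclass version: __init__, __repr__ and __eq__ are generated."""

    body_format: str
    body: str


manual = ManualBlock("markdown", "hello")
generated = DataBlock("markdown", "hello")
print(manual)     # ManualBlock(body_format='markdown', body='hello')
print(generated)  # DataBlock(body_format='markdown', body='hello')
```

Both classes behave the same way for construction, printing and comparison; the dataclass simply derives those methods from the annotated fields.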

Basic syntax

Dataclasses consist of fields. Fields are class variables that have a type annotation.

from dataclasses import dataclass

@dataclass
class Block:
    body_format: str
    body: str

b = Block(body_format="markdown", body="blah blah blah")
print(b.body)
blah blah blah

Fields can have default values.

@dataclass
class Block:
    body_format: str = "markdown"
    body: str = "this is the body"

b = Block()
print(b.body)
this is the body

Fields with default values must come after fields without default values.

# This is okay:
@dataclass
class Block:
    body_format: str
    body: str = "this is the body"

# Error: TypeError: non-default argument 'body_format' follows default argument
@dataclass
class Block:
    body: str = "this is the body"
    body_format: str

If a field’s default value is mutable (e.g. a list, dict or set), it must be defined using dataclasses.field with a default_factory.

from typing import List
from dataclasses import field

@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

# Error: ValueError: mutable default <class 'list'> for field blocks is not allowed: use default_factory
@dataclass
class Page:
    blocks: List[Block] = []
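
The reason for default_factory is that the factory is called once per instance, so instances never share the same list. A minimal sketch, using a list of strings instead of Block objects to keep the example self-contained:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    # default_factory is called once per instance, so lists are never shared.
    blocks: List[str] = field(default_factory=list)


first = Page()
second = Page()
first.blocks.append("intro")

print(first.blocks)   # ['intro']
print(second.blocks)  # []
```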

Adding Logic During Creation Time

Since dataclasses generate __init__() for us, logic that should run at instance creation time goes in a different method: __post_init__().

from typing import Optional

@dataclass
class Block:
    body_format: str
    body: str
    rendered: Optional[str] = None

    def __post_init__(self):
        if self.body_format == "markdown":
            import markdown2  # pip install markdown2
            self.rendered = markdown2.markdown(self.body)
        else:
            raise ValueError(f"Invalid body_format: {self.body_format!r}")

block = Block(body_format="markdown", body="*Hello*")
print(block.rendered)
<p><em>Hello</em></p>
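
The same pattern can be tried without installing anything. The sketch below is a variation that uses only the standard library: the "plain" format and its escape-and-wrap rendering are assumptions made for this example, and field(init=False) is used so that the derived value cannot be passed to the constructor.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Block:
    body_format: str
    body: str
    # init=False keeps the derived field out of the generated __init__().
    rendered: str = field(init=False)

    def __post_init__(self):
        # Runs right after the generated __init__() has assigned the fields.
        if self.body_format != "plain":
            raise ValueError(f"Invalid body_format: {self.body_format!r}")
        # Hypothetical "plain" renderer: escape the text and wrap it in <p>.
        self.rendered = f"<p>{html.escape(self.body)}</p>"


block = Block(body_format="plain", body="1 < 2")
print(block.rendered)  # <p>1 &lt; 2</p>
```

Raising from __post_init__() means an invalid instance is never handed back to the caller.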

repr()

__repr__() is automatically implemented on data class instances to create a string representation of your instance.

repr(block)
"Block(body_format='markdown', body='*Hello*', rendered='<p><em>Hello</em></p>\\n')"

==

__eq__() is automatically implemented on data classes for us to compare dataclass instances.

Block(body_format="markdown", body="blah") == Block(body_format="markdown", body="blah")
True
Block(body_format="markdown", body="blah") == Block(body_format="markdown", body="not blah")
False
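
The generated __eq__() compares field values, and only when both objects are instances of the same class. A short sketch of what that means in practice; the Note class is a hypothetical second class with identical fields:

```python
from dataclasses import dataclass


@dataclass
class Block:
    body_format: str
    body: str


@dataclass
class Note:
    # Same fields as Block, but a different class.
    body_format: str
    body: str


a = Block("markdown", "blah")
b = Block("markdown", "blah")

print(a == b)  # True: same class, same field values
print(a is b)  # False: two distinct objects
print(a == Note("markdown", "blah"))  # False: the classes differ
```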

asdict()

dataclasses.asdict() can be used to convert an instance of a dataclass to a dictionary. asdict is smart enough to handle nested dataclasses, dicts, lists and tuples.

from typing import List
from dataclasses import field

@dataclass
class Block:
    body_format: str
    body: str


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

block1 = Block(body_format="markdown", body="This is block 1")
block2 = Block(body_format="markdown", body="This is block 2")
page = Page()
page.blocks.append(block1)
page.blocks.append(block2)
from dataclasses import asdict
asdict(page)
{'blocks': [{'body_format': 'markdown', 'body': 'This is block 1'},
      {'body_format': 'markdown', 'body': 'This is block 2'}]}

Converting a dataclass instance to JSON:

import json
json.dumps(asdict(page))
'{"blocks": [{"body_format": "markdown", "body": "This is block 1"}, {"body_format": "markdown", "body": "This is block 2"}]}'
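
Going the other way is not automatic: the dataclasses module has no inverse of asdict(), so nested instances have to be rebuilt by hand. The sketch below assumes a small helper, page_from_dict, which is hypothetical and written only for this Page/Block pair. It also shows that asdict() returns copies of nested lists.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Block:
    body_format: str
    body: str


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


def page_from_dict(data: dict) -> Page:
    """Hypothetical helper: rebuild a Page from the output of asdict()."""
    return Page(blocks=[Block(**item) for item in data["blocks"]])


page = Page(blocks=[Block("markdown", "This is block 1")])
payload = json.dumps(asdict(page))

restored = page_from_dict(json.loads(payload))
print(restored == page)  # True

# asdict() copies nested containers, so editing the dict leaves page untouched.
snapshot = asdict(page)
snapshot["blocks"].clear()
print(len(page.blocks))  # 1
```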

Immutable Dataclasses

Dataclasses can be made immutable by setting them as frozen.

@dataclass(frozen=True)
class Block:
    body_format: str
    body: str

b = Block(body_format="markdown", body="blah")
b.body = "something else"

# Error: FrozenInstanceError: cannot assign to field 'body'
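
When a changed copy of a frozen instance is needed, dataclasses.replace() builds a new instance instead of mutating the existing one. A minimal sketch that also shows how the error can be caught:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Block:
    body_format: str
    body: str


b = Block(body_format="markdown", body="blah")

try:
    b.body = "something else"
except FrozenInstanceError as exc:
    print(f"rejected: {exc}")  # rejected: cannot assign to field 'body'

# replace() builds a new instance; the original stays unchanged.
updated = replace(b, body="something else")
print(b.body)        # blah
print(updated.body)  # something else
```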

With frozen=True (and the default eq=True), __hash__() is generated automatically from the fields, so frozen instances can be used in sets and as dictionary keys.

@dataclass(frozen=True)
class Block:
    body_format: str
    body: str

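b1 = Block(body_format="markdown", body="blah")
hash(b1)  # the exact value varies between interpreter runs
8462700314862027229
b2 = Block(body_format="markdown", body="this is something else")
hash(b2) == hash(b1)  # different field values, so almost certainly a different hash
False
blocks = {b1, b2}  # hashable instances can be stored in a set
b1 in blocks
True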
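
Because the hash is derived from the field values, two equal frozen instances are interchangeable wherever hashing is involved. A small sketch with a set and a dictionary; the views mapping is purely illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    body_format: str
    body: str


b1 = Block("markdown", "blah")
b2 = Block("markdown", "blah")  # equal to b1, but a separate object
b3 = Block("markdown", "something else")

# Equal instances hash equally, so a set keeps only one of b1 and b2.
unique = {b1, b2, b3}
print(len(unique))  # 2

# Instances also work as dictionary keys.
views = {b1: 10}
print(views[b2])  # 10
```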

Making non-frozen dataclasses hashable

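Mutable (non-frozen) dataclasses can get a generated __hash__() by setting unsafe_hash=True. This is only safe if instances are not mutated while they are stored in a set or used as a dictionary key.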

@dataclass(unsafe_hash=True)
class Block:
    body_format: str
    body: str

b1 = Block(body_format="markdown", body="blah")
hash(b1)
8462700314862027229
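
The "unsafe" in the name refers to what can happen when a hashed instance is mutated afterwards. The following sketch illustrates the pitfall; the two comparison results are hedged because a hash collision, while extremely unlikely, is not strictly impossible.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class Block:
    body_format: str
    body: str


b = Block("markdown", "blah")
hash_before = hash(b)
seen = {b}

# Mutating a field changes the hash, but the set still files b under the old one.
b.body = "edited"
print(hash(b) == hash_before)  # almost certainly False
print(b in seen)               # almost certainly False: lookup uses the new hash
```

If instances need to live in sets or act as dictionary keys, frozen=True is usually the safer choice.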
