Miller: Command Line CSV File Processing

Fri, Jan 15, 2021

Often the lowest common denominator for integrating two systems is a CSV file. I would usually resort to scripts to process these files. For certain simple operations, however, scripting can be overkill. A battle-tested command-line tool is much more appropriate than a possibly bug-ridden one-off script.

Miller is a command-line tool for CSV file processing. It can replace Python scripts for common data transformation tasks such as:

Changing file delimiters
Converting between CSV variants
Adding or removing columns
Calculating statistics
Sorting
Removing records based on some criteria

Python in the Enterprise Part 3: Unit testing with pytest

Sun, Nov 1, 2020

The unittest module in the Python standard library is based on JUnit (a Java library) and thus doesn’t feel very Pythonic. Pytest turns this upside down and provides a friendlier, more Pythonic experience for writing unit tests.
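
As a rough illustration of the difference, the sketch below tests the same function twice: once in the class-based unittest style and once as a plain pytest-style function. The `apply_discount` function and its values are hypothetical, made up for this example.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce price by a percentage."""
    return price * (1 - percent / 100)


# unittest style: a TestCase subclass with camelCase assertion methods.
class ApplyDiscountTest(unittest.TestCase):
    def test_reduces_price(self):
        self.assertEqual(apply_discount(200.0, 25.0), 150.0)


# pytest style: a plain function and a bare assert statement.
def test_apply_discount_reduces_price():
    assert apply_discount(200.0, 25.0) == 150.0
```

Saved in a file whose name starts with `test_`, both tests should be collected by running `pytest`, since pytest can also run unittest-style test cases.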

Python in the Enterprise Part 2: Static Type Checking

Tue, Sep 1, 2020

Due to the size of the code bases involved and often the lack of time to implement unit tests, static typing is a must-have when developing enterprise systems.

Thankfully, Python now has support for optional static type checking in the form of the mypy static type checker, and the addition of type annotations to the Python language.

In this article, I’m going to cover the features of mypy and Python type annotations that I use the most.
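
To give a flavor of what optional static typing looks like, here is a minimal sketch. The function names and data are hypothetical and are not taken from the article itself.

```python
from typing import Dict, Optional


def find_email(users: Dict[str, str], name: str) -> Optional[str]:
    """Return the email for name, or None when the user is unknown."""
    return users.get(name)


def email_domain(users: Dict[str, str], name: str) -> str:
    """Return the domain part of a user's email, or "" if unknown."""
    email = find_email(users, name)
    # Without this None check, mypy would be expected to reject
    # email.split(...) because email has type Optional[str].
    if email is None:
        return ""
    return email.split("@")[1]
```

The annotations have no effect at runtime. The checks happen only when a type checker such as mypy is run over the file.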

Kubernetes CronJobs in 10 Minutes

Sat, Aug 15, 2020

A short screencast I did explaining the basic usage of Kubernetes CronJobs.

Python in the Enterprise Part 1: Dataclasses

Sat, Aug 1, 2020

Creating a class that is just a simple container for data can be quite verbose. Dataclasses are a solution to this verbosity.
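
The verbosity is easiest to see side by side. The sketch below uses a hypothetical two-field point type, first written by hand and then as a dataclass.

```python
from dataclasses import dataclass


# Hand-written container: __init__ and __eq__ must be spelled out.
class ManualPoint:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ManualPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


# Dataclass equivalent: __init__, __repr__ and __eq__ are generated
# from the annotated fields.
@dataclass
class Point:
    x: float
    y: float
```

Both classes compare by value, but the dataclass version states only the fields and leaves the boilerplate to the decorator.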

Implement File Processing Workflows in Python using Cadence

Fri, May 1, 2020

It’s common for use cases such as data science and CI/CD to need to create workflows that process large files. In this post, I’ll show how workflows such as these can be implemented in Cadence using Python.

Getting around Protobuf's Python limitations - no option python_package

Thu, Apr 30, 2020

While working on converting a Thrift codebase to gRPC, I noticed a limitation in gRPC/Protobuf. The Protobuf compiler doesn’t seem to allow me to specify the target Python package of the generated stubs.

Long Running Business Logic in Plain Old Code Part 2

Sun, Apr 26, 2020

In the previous post, we went through the problem domain that we would be implementing - a loyalty system. We also looked at implementing various Temporal components for our problem domain - workers, workflows and activities.

In this post we’ll look at how we can use signal methods and query methods to implement the “earning points” portion of the loyalty system.

Long Running Business Logic in Plain Old Code Part 1

Sat, Apr 18, 2020

Usually, when programmers are tasked with programming a long-running piece of business logic (e.g. subscriptions, gamification, marketing campaigns, any customer journey), they will reach for familiar tools such as cron, message queues and manual state management with their database of choice.

In this series of posts I want to show an alternative approach: implementing long-running business logic as a single long-running function. Traditionally, we cannot implement anything long-running as a single function because processes and machines fail and the function’s state is in volatile memory. However, what if there was a tool that allowed us to code in such a way that we could treat memory as persistent and reliable? (Smalltalk programmers might be familiar with this concept.)