Never Forget

Jun 16, 2023

The fun thing about writing this series is that it’s all stuff I’ve “known” for 25 years. Now, having to explain it, I get to understand why. Understanding why is my biggest driver & here’s a big chance.

I took a step too far when I introduced bi-temporality in the first post of this series. I’m going back to the beginning to start over & take one step forward.

The Big Contradiction

We want:

The flexibility & reasoning of personal, human attention to the business problems we solve.
But also the scalability, repeatability, & efficiency of automated processing.

If we only wanted the first goal, we could just have humans dealing directly with reality. They’d see someone disabled (I’ll be using disability insurance as an example here) & they would just pay them directly.

[Figure: cloudy reality & humans interacting with it directly]

It’s impossible to make this consistent or scalable. We want to distribute the benefits of disability insurance as widely as possible.

The System

And so we write The System to model disability contracts. Something is lost, sure, but something is gained (we can debate capitalism & shareholder value another time—right now we’re trying to build a system that meets its goals through being more complex than the bare minimum).

People can interact with the system, or the system can act scalably & efficiently.

The Simplest Thing That Won’t Work…

The simplest way to represent reality in the system is to record what we believe to be true at the moment. As a simplified example, suppose we have a Customer with an Address.

```python
customer = Customer()
assert customer.address == ""
customer.address = "123 Main St."
assert customer.address == "123 Main St."
```

The implementation of this can be dirt simple (ignoring all the juicy stuff because addresses aren’t strings, use properties, yadda yadda yadda, not my point).

```python
class Customer:
    def __init__(self):
        self.address = ""
```

Two Watches

When we split the world into reality out there & what we have recorded in the system, though, we have the Two Watches Problem. When you have one watch, you know what time it is. When you have two watches, you no longer know what time it is because the watches are never exactly the same & you don’t know which one is right.

We have two timelines to reconcile. Reality advances one second every second. Our system advances in fits & spurts. Sometimes we jump forward a day. Or a year. Sometimes we learn that something changed in the past & we have to repair changes we made in the meantime.

How can we handle the disconnect between reality & the system?

Save All The Data

And so we come to the first principle of business architecture—never discard information.

We are in a tradeoff space here—what is the cost of keeping information versus what is the cost of discarding information? If we discard information we save a little space, but space is cheap. We also simplify our model—there’s an address, that’s the address, end of story. However, when we discard information we are stuck when trying to explain the past. Where did you send last year’s statement? I don’t know, we have your correct address now.
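To make the cost of discarding concrete, here is a minimal sketch using the simple `Customer` from above (the addresses are made up for illustration). Once the address is overwritten, nothing left in the object can say where last year’s statement went.

```python
class Customer:
    def __init__(self):
        # A single scalar field: only the latest value survives.
        self.address = ""


customer = Customer()
customer.address = "123 Main St."  # where last year's statement went
customer.address = "456 Main St."  # the customer moves; the old value is overwritten

# The only answer the system can give is the current address.
print(customer.address)  # 456 Main St.
# Every attribute of the object; "123 Main St." appears nowhere.
print(vars(customer))    # {'address': '456 Main St.'}
```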

If we save all historical information, we have to pay for that storage both in electrons somewhere and in the complexity of our model. What’s the address? The address when?

For the long-term economical functioning of the system, we choose to store all information that enters the system tagged by the moment it entered.

So Instead

Our interface is going to look like this.

```python
customer = Customer()
customer.set_address("123 Main", 1)
customer.set_address("456 Main", 2)
assert customer.get_address(1) == "123 Main"
assert customer.get_address(2) == "456 Main"
assert customer.get_address(3) == "456 Main"
```

I moved down the street as of time 2. Instead of just setting my address, I also have to include a moment as of which the system is going to record the address. Instead of just fetching my address, I also have to specify the moment as of which I want my address (we’ll talk later about the options for the type & granularity of these “moments”).

We replace the single scalar address with a timestamped collection of addresses.
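As a minimal sketch of what such a collection holds, assuming (only for this illustration) that moments are calendar dates, the hypothetical `addresses` list and `address_as_of` helper below answer the question from earlier: where did last year’s statement go? This is not the post’s `History` class, just the bare data shape.

```python
from datetime import date

# Hypothetical data: (moment recorded, address) pairs, appended and never overwritten.
addresses = [
    (date(2022, 6, 1), "123 Main"),
    (date(2023, 6, 1), "456 Main"),
]


def address_as_of(moment):
    """Return the most recently recorded address at or before `moment`."""
    known = [(recorded, address) for recorded, address in addresses if recorded <= moment]
    if not known:
        raise KeyError(moment)  # nothing had been recorded yet
    return max(known, key=lambda pair: pair[0])[1]


print(address_as_of(date(2022, 12, 15)))  # 123 Main (last year's statement)
print(address_as_of(date(2023, 6, 16)))   # 456 Main (today's answer)
```

The only thing this asks of a moment is that moments can be compared with `<=`, which is why integers and dates both work here.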

```python
class Customer:
    def __init__(self):
        self._addresses = History()

    def get_address(self, posting):
        return self._addresses.get(posting)

    def set_address(self, address, posting):
        self._addresses.set(address, posting)
```

As per usual, we cordon off the complexity of storing timestamped values in its own object, History. Setting a value is simple—just record a (moment, event) tuple. (Yes, yes, this is kind of ugly. We’ll make it better later. Chill.)

```python
class History:
    def __init__(self):
        self._events = []

    def set(self, address, posting):
        self._events.append((posting, address))
```

The only cleverness in History is fetching. We sort the events in reverse order of posting time, then return the first event whose posting time is before (or the same as) the requested posting time.

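```python
# Continuing class History:
def get(self, posting):
    # Newest posting first.
    backwards = sorted(self._events, key=lambda x: x[0], reverse=True)
    for each_posting, each_event in backwards:
        if each_posting <= posting:
            return each_event
    # Nothing had been recorded as of `posting`.
    raise KeyError()
```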
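Putting the pieces together, here is a sketch of the code above assembled into one runnable file and exercised with the assertions from the interface. Two small liberties: `History.set` takes a generic `value` rather than `address`, and the `KeyError` carries the requested moment. It also shows an edge the post doesn’t spell out: asking for a moment before anything was recorded raises `KeyError`.

```python
class History:
    def __init__(self):
        # (posting, value) pairs, appended in arrival order and never removed.
        self._events = []

    def set(self, value, posting):
        self._events.append((posting, value))

    def get(self, posting):
        # Newest posting first; return the first value posted at or before `posting`.
        backwards = sorted(self._events, key=lambda event: event[0], reverse=True)
        for each_posting, each_value in backwards:
            if each_posting <= posting:
                return each_value
        raise KeyError(posting)


class Customer:
    def __init__(self):
        self._addresses = History()

    def get_address(self, posting):
        return self._addresses.get(posting)

    def set_address(self, address, posting):
        self._addresses.set(address, posting)


customer = Customer()
customer.set_address("123 Main", 1)
customer.set_address("456 Main", 2)
assert customer.get_address(1) == "123 Main"
assert customer.get_address(2) == "456 Main"
assert customer.get_address(3) == "456 Main"  # the latest value persists until changed

try:
    customer.get_address(0)  # before anything was recorded
except KeyError:
    print("no address recorded as of 0")
```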

Conclusion

There you have it, our first “non-obvious complexity but I promise it’s going to pay off” decision. Save all business information forever. Tag the information with the moment we discovered it to be true.

We’re not done yet. I said I jumped too quickly to bi-temporality. With the code & concepts above, we’re ready to take a short step toward motivating the second dimension of time: effective dating. But that’s for the next post.

Younes

Jun 21, 2023

I get the "How can we handle the disconnect between reality & the system?" question.

But one of the hardest parts is figuring out: when do we really have to?

I guess that the hint is here "It’s impossible to make this consistent or scalable. We want to distribute the benefits of disability insurance as widely as possible.", right?

What can help a business figure out whether it already needs "scalability, repeatability, & efficiency of automated processing"? Maybe the use case is so rare, the market so unreachable, or our knowledge so insufficient that it wouldn't be profitable. Maybe, in the eXploration phase, we want to process things manually and learn from that... or maybe that quickly becomes a bottleneck.

I still can't figure out the clues or questions to ask ourselves to find the sweet spot between early and late design 🤔

... but maybe this will come up later and I just have to be patient 😉



Petter Måhlén

Jun 20, 2023

https://www.datomic.com/ is built on this principle, as far as I can tell.


Software Design: Tidy First?
