Proximity of Behavior and Data

This topic contains 4 replies, has 2 voices, and was last updated by  Max Guernsey 3 years, 1 month ago.

Viewing 5 posts - 1 through 5 (of 5 total)
  • #28291

    Max Guernsey
    Keymaster

    Starting private. We can always promote to the public forum.

    I’ve heard from more than one person the phrase “programs acting on data” with kind of a negative connotation. I hear this especially often when someone is laying the groundwork for an argument against modeling things with data objects. That is, when making an argument that behavior should live directly on the objects that model the entities in a problem-space.

    If you remember saying that to me, I’m not singling you out or picking on you; even if you are the one who said those words verbatim.

    Anyway, my thought is this: software is programs acting on data. It’s the fundamental nature of the beast. That’s okay. We are programs acting on data, too. So there’s (hopefully) nothing wrong with it.

    I don’t think the fact that something is a collection of procedures acting on data is a strong argument against a design. The implication (or, in the case of some conversations, the explication) is that this means a design is procedural.

    I don’t think that’s the essence of procedural code. At least, it’s not the essence of what is bad about procedural code. What’s bad about a procedural design is that it tends to have very little encapsulation – especially as pertains to variation.

    Procedural designs tend not to be open-closed to the addition of new variations in behavior, for instance. They tend to (but do not necessarily) have very poor encapsulation between behaviors – especially non-varying behaviors.

    There’s a litany of design flaws in a true procedural design and almost none of them have to do with the fact that the data and the behaviors which pertain to those data are separate.

    It seems like we all agree that “data hiding” is not even the most important kind of encapsulation, even though it is often treated as the quintessential example thereof. I’m going to go a step further and argue that, for designs with a high behavior-to-entity ratio (like 2:1 or 3:1), it actually degrades sustainability to put behaviors and data in the same class.

    I’ll expand on that argument if there are any takers but this is already long enough that I maybe should write a blog entry.
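
To make the separation argued for above concrete, here is a minimal sketch, assuming a hypothetical `Invoice` entity with three behaviors (a 3:1 behavior-to-entity ratio). None of these names come from the thread; they are illustrative only. The data object only models the entity, and each behavior lives in its own small class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Invoice:
    """Hypothetical data object: it models the entity and nothing else."""
    customer: str
    amount_cents: int
    paid: bool = False


class InvoiceFormatter:
    """One behavior acting on Invoice data."""

    def format(self, invoice: Invoice) -> str:
        status = "paid" if invoice.paid else "due"
        return f"{invoice.customer}: {invoice.amount_cents / 100:.2f} ({status})"


class LateFeeCalculator:
    """A second, independent behavior acting on the same data."""

    def __init__(self, fee_percent: int) -> None:
        self.fee_percent = fee_percent

    def fee_cents(self, invoice: Invoice) -> int:
        if invoice.paid:
            return 0
        return invoice.amount_cents * self.fee_percent // 100


class PaymentRecorder:
    """A third behavior; it returns a new Invoice rather than mutating one."""

    def record(self, invoice: Invoice) -> Invoice:
        return replace(invoice, paid=True)
```

In this sketch, adding a fourth behavior means adding a fourth class; `Invoice` and the existing behaviors stay untouched. Whether that trade is worth it at lower ratios is exactly the question the thread goes on to debate.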

    #28299

    Scott Bain
    Participant

    Modeling with data objects is certainly not inherently procedural. The only arguments I could make against data objects are:

    1) If there is behavior associated with the data/datum, and it is not here, then it must be elsewhere. So, this impedes our ability to completely encapsulate the data/datum and leads to a degree of coupling.
    2) If the behavior is needed by more than one entity, then it may become redundant if these clients implement the behavior for themselves.

    Both of these things can be overcome with good technique. For example, the behavioral object can be an inner class of the data object, in a language that supports this.
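
One way to picture the inner-class technique mentioned above is the following sketch. The `Temperature` and `Converter` names are hypothetical. Note the assumption: in Python, nesting a class only provides a namespace, and the leading underscore is a convention rather than enforced privacy; a language such as Java would give the inner class compiler-checked access to private members.

```python
class Temperature:
    """Hypothetical data object holding a reading in degrees Celsius."""

    def __init__(self, celsius: float) -> None:
        # Underscore marks the field as internal by convention only.
        self._celsius = celsius

    class Converter:
        """Behavior nested inside the data object it acts on."""

        def to_fahrenheit(self, reading: "Temperature") -> float:
            # The nested class reads the internal field, so outside
            # clients never need to touch it directly.
            return reading._celsius * 9 / 5 + 32
```

Clients would call `Temperature.Converter().to_fahrenheit(reading)`, so the behavior has a single home and need not be reimplemented by each client.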

    So as long as the objections above are dealt with I have no quarrel. I think this comes down to a matter of style more than anything else. Design must be clear, logical, and maintainable. There are many ways to achieve these ends.

    • This reply was modified 3 years, 1 month ago by  Scott Bain.
    #28302

    Max Guernsey
    Keymaster

    I agree with everything you said up there, including the fact that the objections can be overcome with good techniques.

    Let me lay out some objections I have to putting behavior with data and maybe you can suggest some good techniques which overcome those objections. If they exist, then I’d agree it’s a matter of style. If there isn’t a way, I think I would want to push back a little bit more.

    • Sometimes there are a lot of behaviors which pertain to an entity. Placing those behaviors in the same class as the entity can make the class quite large and non-cohesive.
    • I question the level of fullness in the encapsulation of data when an object has both data and behavior. Especially an object with several behaviors. It seems more like it’s just hiding the fact that the data aren’t encapsulated than it is actually encapsulating them.
    • I find testing behaviors that…want?…to be run in a sequence to be kind of a tangled mess when they live on the class that contains their pertinent data. It can be comparatively easy, or even trivial, when the two things are separated.

    There are a bunch of others I’ve heard – like what do you do when you suddenly want a behavior to start varying independently of the kind of data on which it acts – but I already know how to address those. (Aside: it might be fun to compile a list of “gotchas” and corresponding “get out of jail free” cards.)

    I also have one more thought which (a) isn’t really an objection and (b) I think sits more closely to the border between style choices and impactful design decisions. To be is a verb. A class that models something is already doing one thing; it has one responsibility. Adding another responsibility would mean that it has more than one responsibility. See where I’m going with that? 🙂

    For designs with low behavior-to-entity ratios, I think I agree. It can at least look like a style decision. So long as the ratio doesn’t change, I’d agree that it can be treated as a matter of style. I’m really bad at predicting where and when those ratios will suddenly spike.

    My worry is that it’s like the first deployment of a database design – every method looks like it works just fine. It’s not until you need to make changes that the folly of certain design decisions becomes clear.

    #28304

    Scott Bain
    Participant

    Sometimes there are a lot of behaviors which pertain to an entity. Placing those behaviors in the same class as the entity can make the class quite large and non-cohesive.

    Cohesion has primacy, so if an object has become overly bloated with behaviors it should be broken up. But how far should you take this? To me a good bit of guidance is: can you write the test(s) you want (those that have business value), and can you avoid writing the test(s) you don’t want (testing against side effects due to a poorly cohesive design)?

    I question the level of fullness in the encapsulation of data when an object has both data and behavior. Especially an object with several behaviors. It seems more like it’s just hiding the fact that the data aren’t encapsulated than it is actually encapsulating them.

    I suppose one could make a distinction between internal encapsulation and external encapsulation, just as one can with dependencies. When data is packaged with function, the data is not encapsulated from the function but it is encapsulated from outside entities, as an example of what I mean. As with all things, it is a matter of balance.

    I find testing behaviors that…want?…to be run in a sequence to be kind of a tangled mess when they live on the class that contains their pertinent data. It can be comparatively easy, or even trivial, when the two things are separated.

    I’d like to think about this a bit, but my initial reaction is that sequence should be encapsulated like anything else, and thus the test(s) should be decoupled from it. Ensuring a sequence is correct is a factory issue, and the test of the factory would be the only test exposed to it. But, as I said, I think I need to think more about this.
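
One possible reading of the "sequence is a factory issue" idea is sketched below. It assumes two hypothetical text-processing steps whose order matters; `Pipeline` and `PipelineFactory` are illustrative names, not anything defined in the thread.

```python
from typing import Callable, List

Step = Callable[[str], str]


def strip_whitespace(text: str) -> str:
    """Step that can be tested on its own, with no knowledge of ordering."""
    return text.strip()


def truncate_to_five(text: str) -> str:
    """Another independently testable step."""
    return text[:5]


class Pipeline:
    """Runs whatever steps it was given, in the order it was given them."""

    def __init__(self, steps: List[Step]) -> None:
        self._steps = list(steps)

    def run(self, text: str) -> str:
        for step in self._steps:
            text = step(text)
        return text


class PipelineFactory:
    """The only place that knows the correct order of the steps."""

    def create(self) -> Pipeline:
        # Stripping must come first; truncating first would keep the padding.
        return Pipeline([strip_whitespace, truncate_to_five])
```

Under these assumptions, the step tests never mention ordering, and only a test of `PipelineFactory` would fail if someone swapped the two steps.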

    There are a bunch of others I’ve heard – like what do you do when you suddenly want a behavior to start varying independently of the kind of data on which it acts – but I already know how to address those. (Aside: it might be fun to compile a list of “gotchas” and corresponding “get out of jail free” cards.)

    I think I agree with Ken’s “prefactoring” view, in that I always ask “if I’m wrong here, what’s the refactor?” If it seems trivial then I avoid the prediction game. If it does not, then I might be more picky.

    To be is a verb.

    Certainly. But even to exist implies action. I think therefore I am. Considering it that way, maybe this whole thing is less a question of “should data objects have behavior” and more “what kind of behavior belongs in a data object?” Maybe we can better define “behavior.” We might still disagree, but it might be a different disagreement.

    For designs with low behavior-to-entity ratios, I think I agree. It can at least look like a style decision. So long as the ratio doesn’t change, I’d agree that it can be treated as a matter of style. I’m really bad at predicting where and when those ratios will suddenly spike.

    My worry is that it’s like the first deployment of a database design – every method looks like it works just fine. It’s not until you need to make changes that the folly of certain design decisions becomes clear.

    Again, it’s a question of how difficult it will be to refactor at a later time. We all know practices that help address that question (encapsulating constructors, programming by intention, etc…).

    I think the database question may be something of a false dichotomy since, as I learned from you, databases don’t fit the evolutionary paradigm the way systems design does.

    #28307

    Max Guernsey
    Keymaster

    I’d like to think about this a bit, but my initial reaction is that sequence should be encapsulated like anything else, and thus the test(s) should be decoupled from it. Ensuring a sequence is correct is a factory issue, and the test of the factory would be the only test exposed to it. But, as I said, I think I need to think more about this.

    No problem. To be clear, I’m not talking about the sequence itself. I’m talking about a dependency between behaviors that can give rise to the need for a sequence; e.g., you have to open a connection before you can transmit data on it. That’s easier to test if everything is broken apart than it is if it’s all tangled together.

    As an aside: the sequence, itself, should also be encapsulated. The sequences end up being at least as easy to test when behaviors are isolated from data as they would be when behavior and data are in close proximity.
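
The open-before-transmit dependency described above can be sketched as follows, assuming a hypothetical in-memory connection (no real networking). `OpenConnection`, `ConnectionOpener`, and `Transmitter` are illustrative names. The point of the sketch is that the transmitting behavior can be tested without ever running the opening behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpenConnection:
    """Hypothetical data object: the state shared by the two behaviors."""
    host: str
    sent: List[bytes] = field(default_factory=list)


class ConnectionOpener:
    """Behavior that produces the data the next behavior depends on."""

    def open(self, host: str) -> OpenConnection:
        if not host:
            raise ValueError("host must not be empty")
        return OpenConnection(host=host)


class Transmitter:
    """Behavior that requires an already-open connection as input."""

    def transmit(self, connection: OpenConnection, payload: bytes) -> int:
        connection.sent.append(payload)
        return len(payload)
```

A test of `Transmitter` can simply construct an `OpenConnection` by hand. If both behaviors lived on one stateful class, the same test would first have to drive that object through its opening logic.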

    I think I agree with Ken’s “prefactoring” view, in that I always ask “if I’m wrong here, what’s the refactor?” If it seems trivial then I avoid the prediction game. If it does not, then I might be more picky.

    So do I. In fact, I think that’s the principle on which my inquiry is based. The refactoring to snaggle things together is always cheaper than teasing them apart.

    I guess the real foundation of my position is a close cousin, though: I don’t think you’re very likely to need to refactor away from a separated design in the first place whereas it seems more or less inevitable that you will need to do so when related behaviors are clustered more tightly.

    Speaking of foundations, there’s also a matter of “psychohistory” involved in my position. It’s not just how likely are you to need to perform a refactor and what will the cost be when the need arises. There’s also the question of whether or not people actually will do a refactor when the time comes.

    My experience leads me to believe people shy away from solving that particular problem; whether or not it should be solved becomes irrelevant in that case. It was not, and I have my guesses as to why, but they are only conjecture and probably should be floating across the rim of a scotch glass rather than committed to a medium like this.

    It seems like the factors to consider are:

    1. What’s the cost, now?
    2. What’s the chance I’ll have to go the other way?
    3. What’s the cost if I have to change?
    4. What’s the likelihood that the change will occur when shown to be necessary?
    5. What’s the cost if the change should have occurred but didn’t?

    For this problem, my experience says:

    1. Low for either design choice.
    2. Extant for the data-with-behavior designs, negligible for the everything-separated designs.
    3. Low for either design choice.
    4. Low for the data-with-behavior designs, no basis for an estimate for the everything-separated designs.
    5. Catastrophic for the data-with-behavior designs, no basis for an estimate for the everything-separated designs.

    Maybe we can better define “behavior.” We might still disagree, but it might be a different disagreement.

    Yes, we can better define it. I doubt that would expose that the disagreement is something other than what we think it is. I’m not even sure there is a disagreement. I’m just exploring the other (data-with-behaviors) way of thinking the only way I know how.

    I think the database question may be something of a false dichotomy since, as I learned from you, databases don’t fit the evolutionary paradigm the way systems design does.

    I don’t think I presented an either-or option that I couldn’t defend. I think you mean a false analogy. 😉

    That said, I’m not sure it is a false analogy. It wasn’t meant to be. What I was trying to express is that there might (and, therefore, there might not) be an apt analogy there.

    To your point, you’re right that database designs evolve but databases themselves do not, as opposed to software deployments, which do evolve. I wasn’t trying to play off of that property. I was talking about the more fundamental property of making a bad decision that gets you in hot water later.

    Yes, I agree that software has no irreversible decisions but there are decisions that can be effectively irreversible by way of a prohibitively high cost of reversal. I think those happen in design decisions more often than we want to believe.

    An example with which I’m sure you’d agree is the public constructor. It’s reversible. Even if you’ve deployed an API to the wild, with enough time and effort, you can switch over to forcing a GetInstance method (or somesuch) to be called. The cost of doing so can be so high that people won’t pay it – even if they know it will pay off in the long run.
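
As a rough illustration of the encapsulated-constructor practice referred to above, here is a minimal sketch. `ReportService` and its `retries` parameter are hypothetical, and `get_instance` simply mirrors the GetInstance naming in the post. One stated assumption: Python cannot forbid direct construction, so the encapsulation here is a convention that clients agree to follow.

```python
class ReportService:
    """Hypothetical service whose construction is encapsulated."""

    def __init__(self, retries: int) -> None:
        # By convention, clients never call this directly.
        self.retries = retries

    @classmethod
    def get_instance(cls) -> "ReportService":
        # The single place that decides how instances are built. It could
        # later return a subclass or a shared instance without any client
        # code changing.
        return cls(retries=3)
```

If every client calls `ReportService.get_instance()` from the start, changing how instances are created is a one-method edit. The expensive case described above is when clients already call the constructor directly across a deployed API.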

    We have clients that have gotten themselves into this position today. In case this thread is ever made public, I won’t name them. I think you know who they are, though.

    • This reply was modified 3 years, 1 month ago by  Max Guernsey.
    • This reply was modified 3 years, 1 month ago by  Max Guernsey. Reason: duplicate "as an aside:" clauses