In my search for understanding in the world of computing, I found a curious decision made back in 1957 that still propagates into present-day programming. For now, I shall consider it
a 70-year-old sin of computing.
The rest of this text should not be considered an "attack" on modern programming languages and their ideologies,
but simply observations from a different perspective. There is no right or wrong answer here, just trade-offs. Ultimately,
these topics must be discussed if we are to further our control over the machine, minimize errors and
increase simplicity.
Back in the day, computing was done manually with 0s and 1s, and as you might expect this was a tiresome method if you
wanted to write real programs. Naturally, we decided to associate human-readable representations with those machine operations
to ease development: the so-called assembly language. Great! Now we have all we need, right? No.
We later found that many operations repeat themselves, sometimes as whole runs of identical instructions. Because
of that, we created the macro assembler: a more sophisticated assembler that accepts special names
("macros") and expands each one into the instructions it stands for.
The major point of this advancement was decoupling the linear correspondence between source code and the resulting
machine instructions. Most consider this a pure advantage, but it brings more restrictions
than freedom.
Because programs aren't that simple. There are conditional statements, loops and branching (in assemblers you will find this functionality under the name "jump instructions"; other languages call it the "GO TO" command). Mixing these concepts
in a naive way creates what is known as "spaghetti code".
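Even a tiny loop already hints at the tangle. A sketch in C, using goto the way early programs used jumps:

#include <stdio.h>

int main(void) {
    int i = 0;
top:
    if (i >= 3) goto done;  // forward jump out of the "loop"
    printf("%d\n", i);      // prints 0, 1, 2 on successive passes
    i++;
    goto top;               // backward jump: the reader must trace it
done:
    return 0;
}

Two jumps are still readable; a few dozen, criss-crossing each other, are not.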
To reduce the "spaghetti", a new trend emerged: modular programming, the hot new idea introducing subroutines. The concept is dividing a problem into smaller problems to reduce the "spaghetti" situation. Again, this has many
disadvantages:
- Registers must be saved before jumping to the subroutine and restored afterwards
- Invisible code is needed to pass parameters to and from the subroutine
- Subroutines must be invoked in a certain way, and are even stricter about how you pass data to and from them
- As a consequence of the last point, testing them independently requires a special testing
program just to invoke them (see the sketch below)
Note: In some languages subroutines may be called functions, methods, etc.
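A sketch of what that looks like in C (the names are illustrative; the comments describe what a typical calling convention does behind the scenes):

#include <stdio.h>

int area(int w, int h) {
    return w * h;
}

// the "special testing program": its only job is to reproduce
// the invocation ritual so that area can be exercised at all
int main(void) {
    // before the jump: the caller saves any live registers, then
    // places w and h exactly where the convention prescribes
    int result = area(3, 4);
    // after the return: saved registers are restored and the
    // result is fetched from the designated register
    printf("area(3, 4) = %d\n", result); // prints: area(3, 4) = 12
    return result == 12 ? 0 : 1;
}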
With all these constraints, engineers found that planning was needed; planning an entire project up front became the norm, since
unexpected changes were (and are) hard to apply. But interestingly, at the time engineers didn't know that humans are not good
at predicting the future! (Include sarcasm here.)
All these characteristics were popularized by languages like FORTRAN and BASIC, passed on to languages like C, and carried into our present-day languages. Today, engineers still face pain points in testing, refactoring and maintainability; perceived complexity seems to increase even as tools improve.
Hopefully you have already found a pattern here...
Data flow became the structural pillar of programs
Once data flow became the structural pillar of programs, a shift happened in how we reason about software. Programs
started to be described in terms of how data moves between named spaces instead of in terms of transformations.
Subroutines, and later functions, methods and procedures, formalized this idea. Each unit of code became defined not by what it does, but by:
- what data it receives
- how that data is represented
- who owns it
- how it must be returned (see the annotated signature below)
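Every one of these bullets shows up directly in an ordinary C declaration. An illustrative signature (not from any real API):

#include <stddef.h> // for size_t
#include <stdio.h>  // for snprintf, printf

// what it receives, in what representation, who owns the output
// buffer, and how the result comes back: all fixed up front
size_t format_name(const char *first,  // received data...
                   const char *last,   // ...in a fixed representation
                   char *out,          // caller-owned result buffer
                   size_t out_len) {   // how the result is returned/sized
    return (size_t)snprintf(out, out_len, "%s %s", first, last);
}

int main(void) {
    char buf[32];
    format_name("Ada", "Lovelace", buf, sizeof buf);
    printf("%s\n", buf); // prints: Ada Lovelace
    return 0;
}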
But Seaker, isn't that a good thing? In a naive way it sounds good, but it introduces rigidity at the very
core of the system! Behavior became secondary to data contracts!
Refactoring under a data-centric ideology
Refactoring is usually described as "changing the internal structure of code without changing its behavior". But under data-flow-centric models, refactoring is, sadly, rarely internal.
Refactoring in this paradigm means:
- changing function signatures
- reshaping structures
- updating callers
- renegotiating ownership
- rewriting tests whose only job is to satisfy invocation rules
The irony is incredible:
to make code "easier to maintain", we first make it harder to change.
As systems grow, the effort required to refactor grows non-linearly, because the data flow is global, explicit and rigid.
A local idea becomes a system-wide negotiation.
The hidden cost of explicit arguments
Explicit arguments truly feel precise and readable, but they encode decisions too early.
Each argument fixes:
- what matters
- in what order
- under what representation
- and under what lifetime assumptions
The thing is... once these decisions are made, they spread and become structural. The program no longer flows.
Testing suffers here too: a subroutine cannot be tested in isolation unless it is invoked exactly as prescribed.
Behavior is inseparable from its calling ritual.
A concrete example
In C:
#include <stdio.h>

int compute(int x, int y, int z) {
    return (x + y) * z;
}

int main(void) {
    int a = 2, b = 3, c = 4;
    int result = compute(a, b, c);
    printf("%d\n", result); // 20
    return 0;
}
In Forth:
\ in Forth the "arguments" to words live in a stack comment, and are optional
: compute ( x y z -- result )
    >r + r> *   \ set z aside, add x and y, bring z back, multiply
;

2 3 4 compute . \ prints 20
Now imagine you want to swap the arguments 'x' and 'y'.
int compute(int y, int x, int z) { // signature changed
    return (x + y) * z;
}
Now all calls must be updated!
int result = compute(b, a, c); // be careful not to break the logic!!
If you forget one call site, or fail to update a test, the program breaks.
In Forth, all you need to do is:
: compute ( y x z -- result )
    >r swap + r> *   \ set z aside, swap back to x y, add, bring z back, multiply
;
And no external calls need to change!
Even in this tiny example, the cost of changing a function signature, whether swapping arguments, adding
a new parameter or replacing a function entirely, is obvious.
In real programs, with dozens of calls and
dependencies, this cost multiplies quickly.
A different perspective: transformation without identity
Languages like Forth take a different approach: there are no formal arguments, no signatures in the traditional
sense and no persistent identity attached to data.
Instead there is:
- A stack
- A sequence of transformations
Note: Forth also includes variables and whatnot; in Forth every paradigm is possible, but in its purest form this is the
main concept.
Data flow is implicit! Words in Forth (functions) describe what happens to data, not how data is passed around.
- Data does not have a name
- It does not have an owner
- It does not have a history
It simply exists long enough to be transformed and then disappears.
Logically this reduces the space of possible errors.
Rust: enforcing discipline, not changing the premise
Rust is often presented as a solution to memory safety, but it does not reject the data-centric model.
On the contrary: it accepts it and imposes strict laws.
Ownership, borrowing and lifetimes are mechanisms to police data identity over time. They are truly effective,
but they also make data flow even more rigid.
Refactoring in Rust is difficult not because the language is poorly designed, but because it takes the data-flow
ideology seriously and enforces it consistently.
- Safety is gained
- Plasticity is lost
This is not a flaw. It's a trade-off. And it is something to keep in mind as the language keeps gaining adoption
in larger projects.
Fewer states, fewer mistakes
The more explicit the data flow is, the larger the state space of the program.
- Each reference multiplies possibilities
- Each lifetime introduces temporal constraints
- Each mutation creates new invalid intermediate states (see the sketch below)
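A small C sketch of that last point (the range struct and its invariant are illustrative):

#include <stdio.h>

struct range { int lo, hi; };  // invariant: lo <= hi

void shift(struct range *r, int by) {
    r->lo += by;  // if by exceeds hi - lo, then right here lo has
                  // overtaken hi: any alias reading *r at this point
                  // observes an invalid intermediate state
    r->hi += by;
}

int main(void) {
    struct range r = { 0, 10 };
    shift(&r, 20);                    // mid-call, r is briefly { 20, 10 }
    printf("[%d, %d]\n", r.lo, r.hi); // prints: [20, 30]
    return 0;
}

The final state is fine; the danger lives in the states in between, and every extra reference to the same data multiplies the ways they can be observed.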
In Forth-like systems, for example:
- the observable state is small
- transformations are local
- errors surface immediately
There are fewer ways to be wrong overall.
A 70-year-old assumption
For roughly 70 years, computing has been built on the assumption that:
programs should be structured around the movement and ownership of data
This assumption gave us powerful abstractions, but also increasing complexity, fragility and resistance to change.
Perhaps the real mistake was not technical, but ideological.
Programs are not static models of the world. They are processes.
And processes are often better described as transformations, not as carefully managed data pipelines.
Closing thoughts
This is in no way a call to abandon modern languages, nor a claim that one model is "correct".
It is an invitation to question a long-standing belief:
that making data the center of the program necessarily leads to clarity and safety.
Sometimes it leads to rigidity.
Sometimes it creates the very errors we then spend decades trying to eliminate.