How the CIA Ruined Programming
We’re only just now starting to recover. And we have a long way to go.
My entire adult life, since I first learned to program, I have felt like Neo in The Matrix:
…there's something wrong with the world. You don't know what it is, but it's there. Like a splinter in your mind, driving you mad.
I felt that this couldn’t be right, that programming surely doesn’t have to be so hard, so tedious, so error prone. Particularly business software. Every time I had to hand-roll another HTML form, I died a little inside. Shouldn’t the computer be doing more of this for me?
The answer was there all the time.
I’ve investigated every alternative programming approach I could find. All of the approaches that showed promise were “declarative” — they let you solve problems without having to write Turing complete code. GUI design using a GUI editor like Visual Basic. Direct data manipulation using Access or FileMaker. HTML is declarative, as is CSS.
But they’re all piecemeal, narrow efforts. What I realise now is that I always wanted a more general approach: a way to make more of programming declarative.
The tragedy is that this approach exists. It is well-known, well-studied, simple, effective, and old, insofar as such things are in our industry. This approach is the relational model, first described by E. F. Codd in 1969.
The theory underlying the relational model starts by showing how a small collection of simple, well-known data manipulation algorithms can be assembled into a declarative First Order Logic¹ engine. This is remarkable. The theory leads not only to a simple way to implement a logic engine, but also to ways of optimising its performance.
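To make that concrete, here is a minimal sketch of the idea (my illustration, with invented names — not anything from Codd's paper): relations are just sets of tuples, and a couple of simple operations — a join, iterated with union to a fixed point — are enough to evaluate a recursive logic rule such as "ancestor of":

```python
# Toy illustration: a relation is a set of tuples. Base relation
# parent(child, parent), e.g. alice's parent is bob.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dan")}

def join(r, s):
    """Join r's 2nd column to s's 1st: {(a, c) | (a, b) in r, (b, c) in s}."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def fixpoint(base):
    """Bottom-up evaluation of the recursive rules:
         ancestor(X, Z) :- parent(X, Z).
         ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
       Apply the rules repeatedly until nothing new appears."""
    result = set(base)
    while True:
        new = result | join(base, result)
        if new == result:
            return result
        result = new

ancestor = fixpoint(parent)
print(("alice", "dan") in ancestor)  # alice's great-great-grandparent: True
```

The point is not this particular code but that the whole engine is assembled from a handful of mechanical set operations, while the rules it evaluates remain purely declarative.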
We should be using this universal approach to declarative programming everywhere. Just about every place in your code that’s a case statement or a lookup in a Map should be a much richer relational thing. But it’s not, and it’s because of the CIA.
Because SQL. Because the CIA.
I don’t actually blame anyone at IBM or Oracle or the CIA (the principal players in this story) for what happened. At the time, the decisions everyone made were reasonable.
The timeline goes something like this:
1969: Codd publishes A Relational Model…
1974: Chamberlin and Boyce, working at IBM, first describe SEQUEL, which we now know as SQL
1979: IBM starts shipping products implementing SEQUEL
1979: Relational Software (now Oracle) releases Oracle, aiming to sell it to the CIA, the Navy, and other US government agencies
The name Oracle came from a CIA project code name. And the targeting of US government agencies appears to have been based at least in part on the political considerations of Larry Ellison, Oracle’s founder.
Again, at the time it was developed, SEQUEL was a reasonable, even somewhat great idea. Programming languages are still developing rapidly today. In the 70s, it was popular — and not clearly wrong — to develop programming languages that resemble English (hello, COBOL!). Bear in mind that the cutting edge in user interfaces at the time was the CRT terminal, which first appeared in 1964 and which was only becoming ubiquitous around the time Codd was publishing about relations.
If you wanted non-programmers to access data, you wanted a simple language that could be used from a terminal. This was the original intent of SQL. For that purpose, at that time, it was pretty good!
At the time SEQUEL was being described, there were other, much better database languages being developed for programmers. Datalog, the best such language, started to come together at a conference in 1977.
Datalog and SQL bear almost no resemblance. It’s hard to imagine they’re based on the same underlying theory. Datalog is terse, elegant and flexible, none of which can be said of SQL.
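To see the contrast, compare the classic transitive-closure query over an assumed parent(child, parent) relation (the names here are illustrative). In Datalog it is two lines:

```
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
```

SQL could not express recursion at all until the recursive common table expressions of SQL:1999, and even then it reads like this:

```sql
WITH RECURSIVE ancestor(child, anc) AS (
    SELECT child, parent FROM parent
  UNION
    SELECT p.child, a.anc
    FROM parent AS p JOIN ancestor AS a ON p.parent = a.child
)
SELECT child, anc FROM ancestor;
```

Same underlying relational theory, wildly different ergonomics.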
What happened is that Oracle’s founder, at a time when database standards and technologies were still up for grabs, used guaranteed and generous support from the CIA and other agencies to achieve dominance with his SQL-based product.
What might have been…
If your only exposure to the relational model is SQL, go read about Datalog. Comparing it to SQL is the best way to understand the ghetto that SQL has driven the computer industry into.
Where we are today is that the only significant way we employ the relational model is through SQL. Along with the terrible, no good, very bad language itself comes a bunch of other fairly arbitrary assumptions arising from the original use of such tools in banks and the like. So an SQL database is a heavyweight, local file-based, bureaucratic thing designed primarily around concerns of data integrity in the face of multiple users (transactions, MVCC and the like).
These are fine features for a lot of use cases, but unnecessary bloat for a lot of others. No wonder the NoSQL movement has developed other products around other storage models: stores that need no predefined schema, distributed and eventually consistent stores, and so on. Such a pity, then, that their rejection of the arbitrary rigidities of SQL led them to abandon the relational model at the same time. Such wasted effort!
Imagine instead that we had standardised on Datalog. Your favourite programming language would let you manipulate declarative logic as easily as you manipulate arrays and strings.
Relational models are naturally distributed.
You would still have a lot of the benefits that SQL databases have: logic represented relationally can be trivially employed from any programming language, for example.
Wherever in your business application you are enforcing rules, you can do so more naturally and entirely declaratively, making those rules more sharable and malleable.
Most of a modern business application can naturally be represented in First Order Logic. Such representations are deliberately not Turing complete — meaning you can’t have off-by-one errors or null dereference errors or any of the problems that arise when feeble human minds try to craft the behaviour of a Turing complete system.
Security concerns are naturally expressed and enforced declaratively (even SQL can do this much better than your ad hoc code can).
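For instance (table, column, and role names here are invented for illustration), instead of scattering ownership checks through application code, the database itself can enforce who sees what:

```sql
-- Hypothetical schema: orders(id, owner, total).
-- Ordinary users see only rows they own, via a view; nobody reads the
-- base table directly. The rule is stated once, declaratively.
CREATE VIEW my_orders AS
    SELECT id, total FROM orders WHERE owner = CURRENT_USER;

REVOKE ALL ON orders FROM PUBLIC;
GRANT SELECT ON my_orders TO PUBLIC;
```

The access rule lives with the data and applies to every client, rather than being re-implemented (and eventually forgotten) in each application.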
In short, pervasive and effortless deployment of the relational model in software engineering would be a very different software engineering — simpler, easier, more reliable, more easily distributed, and more secure.
And we would be in that world today had Larry Ellison not done a deal with the CIA in the 70s.
¹ Waving my hands only slightly: First Order Logic is the richest logic for which a computer can, at least in principle, derive every consequence of the facts and rules in the system (Gödel’s completeness theorem). Anything richer, and you run into Halting Problem/Gödel incompleteness issues. (Datalog restricts First Order Logic further still, which is why its queries are additionally guaranteed to terminate.)