May 06, 2017

Not a flying car!

It's been too long since i've posted here, i'm likely to lose my massive readership (lol), but life has been going on, and attention inevitably goes to whatever demands it the loudest; lately, most of my attention has been going to writing code.

I've gotten a good bit written, and reminded myself what a PITA it is to write testcases for every function, but there are still some design issues that i'm struggling my way through before i really get going on implementation.

There are dozens of issues contending in the design-mind these days, mostly circling around how to write a filesystem on top of a filesystem, how to store things, what format and so forth, to maximize performance.

Performance is critical for this project, since it'll be storing at least one database at the end of every emulated machine-cycle.

But i've found that the most crucial aspect of all is to continuously remember what i'm doing.  I should put a smiley there, maybe, to indicate that although it sounds funny, it's dead-on serious; but that would take a not-smiley, which would have to be some kind of frownie, which it isn't.

While you're creating something, the most essential thing (it seems to me) is to maintain a crystal-clear image of what you're creating.  At least until you get it written down; once it becomes manifest, you don't have to work so hard at remembering what it is you're bringing into manifestation.  Until then, or at least for today, i'll use this post to revisit the intention of the design i'm struggling to choose for the thing i'm building.  Comma, if i'm lucky.

This is not being written because i'm concerned about losing a nonexistent readership; maybe it's in hopes of getting someone else interested, but i think mostly i'm writing this post to remind myself of what i'm doing.  At this particular time, i need to remember it to you, so to speak.

What is this thing, anyway?  It has a name BTW.  It's called "hoss".  Reasons aside, that's just what it's called.  I needed a name-prefix to identify files and functions, and it turned out to be "hoss".  But what is it? 

It's an emulator for a new machine architecture, one that doesn't use fixed-length words but named data instead.  It's an assembly language that assembles into exactly its own source code.  It's an operating system within an application within an operating system, or it's firmware within hardware.  It's a tool for philosophical inquiry, for organizing your shopping lists, or implementing a website or an AI construct; it's whatever you can find ways to do with it.  Its applications are binary-portable and self-contained, running on whatever hardware is available.  It's a thin-client that supports a consistent user-interface paradigm of your choice in every application.  It's an interactive debugger and a graphical programming system.  It's a scripting language and it's a lot like a C++ assembler without all the syntactic frippery. It's a user-interface language, and a toy; a high-level low-level language in the middle of things that just need to get done.  It's an IDE and a plain-old Desktop Environment, or maybe only good for sorting email.

It is *not* a flying car.  It might be nothing more than an old man's dream come to nought.  It does however keep me off the streets, and that is a very good thing imo.

I find it interesting how perspective changes with age and freedom.  When i was 23 and fresh out of college, i'd have just started writing code.  Not knowing how much work was actually involved in things, i didn't know enough to be daunted by the size of an undertaking.  Now that i know what's involved, the approach remains the same... write the code.  But knowing how much work is involved, my natural laziness (and *all* programmers are lazy, or we'd be doing work that required heavy lifting of some sort), i find myself using ingenuity not to write code, but to find excuses to write *about* doing it, instead of doing it.

Enough said for now, as always my email address can be found by poking the right blogger links, and i'm back to cranking some code... i am after all crankypuss.

March 02, 2017

done prototyping, mostly

It's been about 6 months since my last blog post.  I've been busy prototyping.  I've also been learning great bunches of things, which after all is the goal of prototyping, in case you didn't get that memo.

Actually I'd be surprised if anybody much did get that memo, it isn't one that Corporate America is likely to be sending around.  Prototyping costs money, and it's always easy to just add one more feature.

Speaking of creeping-feature-ism, i should probably find some fender-flags for my buick, which is shaped in such a way that you can't tell where the front end really is; the view of the hood is apparently trying to induce you not to run over pedestrians, and instead to bang the low front air-damn <sic> on every curb in sight.  Better for GM's profit that way: avoid that category of lawsuit and at the same time drive more customers to bodyshops that will buy more GM whatever-the-front-is-called-this-week, now that bumpers and grilles and whatever else are integrated into a single piece of something plastic-y.

I figure in about 5-10 years when somebody figures out how to 3D-print from plasma or something and re(?)invents the replicator from Star Trek, those old boys are gonna have to wear Depends for a while, cuz the first thing to go will be paper money, followed quickly by just about every kind of factory there is.  If there's any one device that can change the world, that's about it, as far as i can see.  If i still believed the stock market was anything besides an overrated form of gambling, i'd invest in imaging, the technology that'll soon be good enough to make a 3D interior image of my old buick and let anybody with enough juice print a copy just like it was when it was scanned.

But, back to the real world of real code.  So i've been doing this prototyping, in PHP of all things.  Got quite a bit done, actually.  But it got to the point where PHP had become more trouble than it's worth, and C++ always had far too many syntactic niceties for my taste, so it's back to C, which i never really learnt since i skipped over C going from mainframe assembler to C++ on some now-ancient version of Windows in '92, and C++ was never my favorite language to begin with.  But this is the last program i'll ever begin writing in any language other than hoss (crossing fingers), and it isn't going to be that long before i can get enough of a base together to make that initial switch, so the chances that C will keep working as it has for many years are good enough.

Meanwhile, back at the ranch-house, i'm trying to figure out how to deal with setting it up as a package so that once i'm gone it won't just disappear in a puff of greasy black smoke.  And trying to figure out just what to use to write the doc, which will probably be a PDF, and will likely become a historical oddity the moment it's done. <g>

September 03, 2016

Associative Storage and Everything Everywhere

These days, it's hard to buy a computer that has less than 1 gigabyte of main storage.

The trend in personal-computer manufacturing from spinning disks to solid-state drives seems clear.  Mobile devices today are usually sold with a minimum of 8G of "internal storage" (representing the traditional "hard-disk"), and the ability to extend that using an sdcard or USB devices.  This is in addition to the minimum 1G of main storage.

The only real difference between the three categories of storage (main, internal, and removable) is that removable storage can be removed.  In other words, the "hard-disk" equivalent "internal-storage" in your mobile device is, for all intents and purposes, an sdcard implanted in the mainboard circuitry.

Your computer's main-memory (RAM) is generally faster and of higher reliability than its internal-storage (hard-disk), which in turn is generally faster and of higher reliability than its removable-storage (memory-card or USB-stick).  But even removable-storage these days is much faster than any RAM that was available in, for example, 1970.  Current manufacturing techniques point to even larger, faster, storage chips.

As a result, the concepts of "main-storage" (traditionally "RAM") and internal-storage (representing the traditional "hard-disk") are beginning to blur together, with the logical result that what programmers have traditionally thought of as "RAM" (main-storage) and "internal-storage" ("hard-disk") become a single storage pool, and "remote-storage" will be the remaining category, containing both local-removable and net-connected storage, extendable indefinitely.

The question then becomes one of addressing main-storage, internal-storage, and removable storage, to find a unified approach, one that will survive the transition between using separate address spaces for main-storage and internal-storage ("RAM" and "hard-disk"), and using a single address-space for both categories of non-removable storage.

The most obviously workable approach is to use a hierarchically-qualified namespace similar to that used for network addressing, or for a Unix-style directory structure.  Thus each successive level of qualification limits the semantic tree within which subsequently qualified data may exist, while the overall tree-described namespace remains unlimited.

Linux has used this concept, even to the point of having files in the /sys hierarchy which are dynamically implemented and contain only a single value.

At this point, enough of the characteristics of "Totally Portable Software" have become obvious, and it is time to get down to specifics... i.e., writing code.

Since publishing the previous post I have been struggling with syntax issues, and think it is now time to stop feeling guilty that I haven't finished some blog post and concentrate on the prototype.  I've also taken a few bites of the Apple and find that although linux remains my development platform of choice, iOS will be a very good first-port candidate because of its many restrictions.

The concept described in this and the preceding posts is being implemented as yet another new language, because, you know, there aren't enough languages yet, right? 

No, it's being implemented as a new language for some very good reasons, it seems to me.

March 26, 2016

Documentation and Portability

The whole civilized world has become dependent on the internet.

When you find yourself without network connectivity, many of today's applications (especially Android applications) simply won't work, they display a message telling you to connect to the network, and they're done; if you need to use your application in order to do something, you're out of luck until your network connection and the remote server both become available again.

Even if your particular application is not inextricably bound to the existence of a remote server to perform its function, there is an increasing trend toward placing all documentation on some website that's almost guaranteed to be down when you most need to look something up.

If an application is to be totally portable, it cannot be bound to the internet for the performance of its function; if it is bound to the internet, it is only portable to those locations where internet connectivity is available.

If the network is available, fine; but if it is not available, a portable application should still be able to perform its full functionality, including the presentation of whatever documentation is necessary.

While this seems to call into question the portability of any specialized client having no function other than to interface with a specific remote server, a portable client's user interface is always the same no matter where it's running; indicating whether or not its associated server is available should always be part of such a client application's function.

Ideally the end-user should never have to RTFM, but when it's necessary, it should be possible, and the documentation immediately available should correspond exactly to the application version installed on your media.  If you are at a wifi hotspot and download a new application onto your Android phone, it should work later when you have no network connectivity, and it should work identically to the copy you downloaded or copied onto your Mac or Windows or linux system, documentation and all.

Nobody needs documentation to use a hammer, a hammer's use is so obvious in its implementation that a monkey can use one without the need for documentation.  Computer programs are just specialized hammers made for beating data into shape, and their operation can be just as obvious as a hammer if they're done right; they seldom are, so we end up needing documentation if we're to use the program, whether or not the network is available.

February 29, 2016

Addresses - What Good Are They?

In order to construct applications that are truly portable, we need to address all the requirements for true portability.

If you can't move your application, and its associated data, from your Windows or Mac or linux system that's running on an Intel processor, to your Android tablet that's running an ARM processor, or your BlackBerry phone running OS-10 on top of a who-knows-what processor, it isn't truly portable.  If your application is running on a little-endian system and it has to be modified in any way to run on a big-endian system, it isn't truly portable.

I'm not talking about some conversion program that claims to convert things for you, and might, or might not, do it quite right, and I'm not talking about rebuilding your applications to run on a different processor architecture: I'm talking about a binary file copy operation.

If you can copy your application's executable and associated data files from wherever you last used them onto a USB stick, and then copy them from that USB stick onto the next system you use, regardless of processor architecture, and your applications run identically, they are portable; otherwise, they are not portable.

It's that simple, that's my definition of portability: same program, same binary datastream, same operation, on any system.

From this concept, that an application should not be bound to an instruction-set or an architecture-defined wordsize, it becomes clear that addresses, and in fact all forms of word-based arithmetic, are issues that must be addressed if Totally Portable Software is to be more than some distant ideal.

In early days, when applications were measured in kilobytes instead of megabytes, and data was counted in bytes rather than gigabytes, we used addresses.  We assembled our code into machine instructions, and we counted the bytes of each instruction we wrote, and gave each label used in our program the address of the instruction or data it represented.

The only purpose an address has ever served, in any computer, regardless of architecture, is to locate some code or some data in memory.

The labels, names we assigned to control-points or data items, were always for humans, and the addresses were always for the computer; programming proceeds from semantics to syntax, meaning to implementation.  The fact that we have compilers and linkers to take care of the details of address-tracking does not make the addresses somehow more important than the meanings of the labels that such transient calculated addresses represent.

Totally portable application code cannot include binary memory addresses because of variations in wordsize and endianism, but binary addresses are mere binding details with no inherent semantic value; the label "returnPoint" has a meaning of its own, but the instruction address 000137F4 does not have any meaning of its own, all of its semantic content derives exclusively from the fact that it represents "returnPoint".

We can cut out the "address" middleman and access all control-points and data-items by name.  In fact one could say that the only reason for ever having used binary addresses at all is that they could be implemented with the hardware of early-generation computers.  We have enough to work with now that binary addresses are no longer necessary; we can address control-points and data-items by name.

Likewise binary, word-oriented, numeric representations are problematic.  For portability we do not want to impose constraints related to endianism.

However, as with addresses, the binary representations of numbers are not the numbers themselves.  We use the character representations of the numbers when entering them as data, and when printing them on reports.  The question of which can be stored more compactly becomes relatively unimportant when the main-storage of systems one can actually purchase today begins at 1 gigabyte and becomes much larger, and only USB sticks come in quantities of less than hundreds of gigabytes.

The question of performance becomes a potential issue with numbers stored in their character representations.  Whether decimal arithmetic based on a character string is faster or slower than converting the same string to binary form, performing binary arithmetic of fixed and "non-portable" precision, and converting the result back to a character string, is mostly answered by how much actual arithmetic is being done on the binary form.  And there are still such things as "math coprocessor" components that can be called upon at need.

Totally Portable Software does not require either numeric instruction addresses or binary arithmetic; in fact it requires that neither be supported.

This does however make it clear that Totally Portable Software must be written in a fully interpretive language, which in turn requires an interpreter that (1) runs on as many different system configurations as possible, and (2) runs efficiently enough to make it worth the trouble.

Having written massively complex applications in interpretive languages that support associative-storage (VM/Rexx and linux PHP), having observed the huge increase in processor-speeds and storage-per-dollar over the past few decades, seeing the trend from desktop to laptop to tablet, and looking at the way tablets and smartphones are set up, I am convinced not only that Totally Portable Software is possible today, but also that it is past time to get started.

February 20, 2016

What's wrong with linux?

The title of this post might lead readers with a Windows or Mac background to expect a bitch-session denigrating linux; such readers will probably be disappointed, unless they recognize the fact that all operating systems are flawed, usually in different (though no-less-annoying) ways.

At the outer, most user-centric level, many people, even those who use a linux distro on a daily basis, think there is a linux operating-system.  In my view, there isn't.  An operating-system has a single well-defined programming interface, available to all applications.  Linux has no such thing: it has not yet evolved one.

Linux, technically, is the linux kernel.  A linux distro (distribution) consists of the linux kernel, the core-command set, some standard and non-standard "daemons" ('daemon' is Unix, thus linux, terminology for a local service process), a whole slew of libraries that attempt to provide the glue between the "operating-system" and its applications, and an eclectic assortment of additional "non-core" commands, daemons, and applications, whose developers have managed to bring them to a "working" state on top of a furiously-shifting environment.

The linux kernel is not "done" yet (software is never really "done", the part not yet done is called "the next release"), and new hardware and new ideas continue to result in further kernel improvements.

However, when an application's best access to system functionality is the textual output of one of the core-commands, bringing that application to a state acceptable as "working" is no small feat, and one built on shifting sand at that; textual output has to be parsed, and if the output changes in either layout or content, the previous parsing must be manually updated.

That's just how it is, and this is no invention new to linux; Unix has worked that way for decades, even though at about the same time that Unix was being developed, they were teaching us in undergraduate Computer Science classes that we should never, ever, parse the textual output of a command and then use it as input; that puts your programs at the mercy of whoever is coding the message output.  Hopefully, it also inhibits that person from improving the message text, because changing it daily would incite riots within the ranks of the software development community.  Instead, we were taught to use the defined operating-system interfaces, which did not require textual parsing and could be expected to "hold still" for a much longer period of time.

Apparently those who initially developed Unix felt that this was a non-problem, but I strongly disagree; I've had my utility code jerked around by new versions of "the operating system" about once a year, on average.

When the closest a system has to a defined application-programming interface *is* the message output of commands, the application programmer is between a rock and a hard place.   On one hand, the system only provides the information in the form it supports; on the other hand, your applications (and subsequently you, if you use what you write, and if you don't use what you write, you shouldn't be the one writing it) end up getting jerked around like a puppet whenever some core command changes. Between that and having laptops reach what appears to be their planned-obsolescence limit of 2 years, and trying to get a clean distro install on top of whatever firmware "the industry" has decided the market must-have, it can be tough to keep a consistent forward momentum on application development projects.

In addition to the issue of applications that are forced to try and remain current while the libraries and line-mode commands they use change, there is also the configuration issue, that inevitably turns up, in some unpredictably different way, with every fresh install.

There are at least dozens, probably hundreds, of different configuration files that have to be manually updated in order to get things working usably on a Linux Desktop, which of course is not linux, but only a desktop-environment-application that runs on top of the linux kernel and a lot of libraries. 

Each DE (Desktop Environment) attempts to offer something in the area of system configuration, but there is little consistency in config-file format, and less integration between the various system-level programs involved. 

Each distro includes the configuration-defaults for every Desktop Environment package installed, as part of their DE packages.  An end-user attempting to control the computing tool from a given GUI environment can easily become confused about which GUI configuration tools are actually related to whatever config files actually determine how the system operates.

It's a mess.  It isn't bad, just messy.  Windows advocates will try to explain how Windows is better in this way or that.  Having used various versions of Windows between 1993 and 2013, having developed custom controls for Windows from the frame up, having had the user interfaces to the tools I needed to use jerked around with basically every release of every product, I will tell you that Windows is no better than linux in any way that I ever ran into.  The Win32 interface may have been a unified operating system interface, but in my opinion it was as bad or worse than anything I've seen to date in linux.

As messy and confusing as linux might be, there's no other operating system I know of that meets my "min spec".  Of course we all have different views of what "min spec" might comprise.

Approaching software development scientifically requires that we be able to exactly reproduce the starting conditions for our tests, which includes the testing we do as a normal part of the development process.  To understand how something has changed, you need both the before and after states.  If you can't reproduce the before state, exactly, you're guessing.  If you can't restore your data onto a fresh system drive, including the supporting operating system, from a last-known-good snapshot, that operating system doesn't meet my min-spec.

Linux meets my min-spec: you can conveniently back up the system that supports your code, and restore it to a fresh drive, and having restored it (and adjusted /etc/fstab to load from the intended kernel image), your code will run as it did before.  In fact you can even restore it onto a different hardware configuration, as long as the processors are compatible and your root filesystem contains the necessary drivers, and it will boot and run your code, which will run more or less identically, depending on how hardware-dependent your code might be.

The main problem I see with linux, other than its having been grossly oversold as what it is not (it is not yet a desktop operating system), and the fact that development is proceeding at a pace users may find it difficult to keep up with, is that it is a copy of Unix, which was designed as a multi-user system.

That is not to say that there is anything wrong with designing a system to have the necessary flexibility and security features to support multiple users attached to a single processor.  However, the vast flexibility linux provides is difficult for most end-users to configure; the swiss-army-knife has too many blades of uncertain function for the average user to deal with. 

This will almost certainly solve itself over time, as the linux toolkit evolves into something more fully integrated, with configuration aligned to its corresponding functionality and stored in a central location and format.  No enforcement of standards is necessary for this to occur; it can be expected to simply happen, as better configuration methodology is developed and application developers continue to use the best and easiest methods available, because if application developers were not inherently lazy, they wouldn't be in the business of automating the work to be done, they'd be doing it by hand.

There is however a characteristic of multi-user system designs that I find problematic.  When a processor with limited resources is supporting dozens, or hundreds of users, who are logged on at once, it makes sense to reduce the overall system footprint by sharing one copy of each application among all of its users.  However, most of the systems linux is currently running on actually support only one user at a time; count the number of Chromebooks and Android phones and tablets out there to see just how quickly the server population has been surpassed by the consumer population, and how outmoded the multi-user-system paradigm has become.

The paradigm of sharing an application, and in most cases its base configuration, among multiple users, breaks down whenever a new version of an application is released, because applications may update the schema under which they store their data.  When this happens the end-user is irrevocably committed to the new application version, at the whim of the sysadm who updates the shared copy of the application.  Even when the end-user chooses the time of update, as is the case when they are performing a general system update "as administrator", the change to a new version of an application may be irrevocable; even when you can regress the application's executable, your data remains stored in the application's latest format, and you can expect previous versions of the application to choke on it.

What is most necessary, it seems to me, is both a physical and logical level of separation between "operating system" (which in most linux distros amounts to the entire root partition with the exception of '/home'), and "application", along with a homogeneous application interface to the operating system.

Each user should have his own copy of each application used.  It should be possible for the user to update any application to a newer version at any time, modify it, or regress application+data (including application settings) to any previous version, or to a last-known-good snapshot.

Of course this brings the dependence of applications on shared libraries to the surface as another problem.  When an application relies on an external library to perform its functions, the process of updating the application includes an update to an external library; likewise, the regression of an application to a previous version may include the regression of an external library to some previous version.

That makes it clear that the dependence of applications on shared libraries is also problematic.

In olden times, when dinosaurs competed to see who would eat us for lunch, we used an ancient technique called "static linking" to create an application that was self-contained aside from its inherent dependence on the operating system.

However, as the magic of the ancients became encapsulated within encapsulations and stored in loadable libraries to reduce the overall footprint, the hardware industry was making storage cheaper and thus encouraging more and more code-bloat, and "lookit-me" lames were adding such vital functionality as talking paperclips, thrust into our faces in more and more ways in newer and newer versions of applications which, at one time, were actually usable.

In this programmer's opinion, it's time to take the technology back, and hopefully reach a state in which applications can be developed without constant readjustment to the weekly fashion statements of those in control of too many libraries.

Fortunately it can be done, without throwing away the wheel we are reinventing.

November 22, 2015

Total Portability is not binary

Primitive caveman didn't have many different materials to work with; dirt, wood, rock, water, plus whatever he could scrounge from dinner's remains (bone, gut, skin, etc).

Now we have steel, aluminum, various exotic alloys, plastics, supermagnets, semiconductors, blah de blah, we have a whole bunch more stuff to work with than the ancients could conceive of.

Likewise, primitive programmer had comparatively little to work with.  We worked in assembler, and we dealt with binary addresses every day, whether they were written in octal or hex, and simple arithmetic in a non-decimal base was our stock in trade as we worked as close to the bare metal as it is possible to get, without benefit of any operating system worthy of the name.

Nowadays we have more toys to play with. But even though we have operating systems to host our compilers and our version control systems and our interactive development environments, things are still being done the hard way, with binary addresses and different machine instruction sets and architectures.

It seems like the array of libraries and shared objects (at least on linux distros) continues to increase with no end in sight.  And each of those is just chock-full of binary addresses, generated from names in source code, and linked together into a single matching lump, each of which seems to become superseded by the next update; in many cases the previous version is still needed by applications that aren't affected by the update, or haven't been updated to use the latest libraries.

If you understand how compilers and linkers work, it is difficult not to be awed that it all fits together and works. Mostly. If you have the right set of matching libraries.

Each of those libraries evolved over time, from version and release, to the next version and release, in response to some application's need, even if that need was the result of a bug-fix, rather than new function.

If you want Totally Portable Software, it is necessary to move beyond different operating systems and machine architectures.

If you need a different binary for Intel or ARM processors, your software is not totally portable. If you need a different binary to run your app on a linux system than you need to run your app on a Windows system, or an Apple system, or an IBM mainframe, or your Android phone, or your iPhone or BlackBerry, your application is not totally portable.

Today we don't have any Totally Portable Software. We have lots of software that attempts to be portable, but it's hung up on irrelevancies like operating systems and hardware architecture and support libraries that contain inverted pyramids of workarounds which provide functionality the operating system fails to provide in the name of efficiency.

So basically, to achieve total portability, you have to start with a portable non-binary architecture, and build up from there. That makes it sound like reinventing the wheel, the internal combustion engine, and all the rest; fortunately it isn't quite that large an undertaking (though it is definitely significant).

It also sounds as though it would be very slow, but it seems that modern processors have become fast enough to do the job, if you do only what is actually necessary, without having to work through a mass of interface routines that attempt to perform functions ill-supported by the underlying host OS.

Traditional operating systems like linux tend to try to be everything to everyone, and are carrying forward decades of fixes and enhancements; backward compatibility is potentially a giant-killer, but the overwhelming number of existing applications is enough to necessitate it, and more and more libraries serve to keep the existing codebase moving forward.

On the other hand, a non-binary architecture has some significant advantages.

First, it avoids the need to convert names into binary, and thus avoids making all using-applications dependent on specific versions of a specific shared library.

Second, a system-independent architecture need not be much more than a thin client interfacing with a single user. It can ride along on top of a linux host that acts as a local server. Applications can perform their functions and interface with the user in a consistent way, with any number of end-user-selectable UI paradigms.  They can also obtain existing functionality that is provided by a number of local or remote servers.

A truly portable architecture, supporting a consistent, user-selectable, user-tailorable, operating interface, along with a suite of portable applications like text editors, mail programs, and others, can act as the end-user's sole interface with the hosting OS, and through that host, the wider networked world.

Totally Portable Software is practical today, it's just a matter of investing the time to develop a non-binary machine architecture that is sufficiently flexible.