Wednesday, 18 November 2020

Fundis - FUNctional DIStributed processing

This is a technical post about a project I've been thinking about on-and-off for a while.  The idea is to use Redis as the middleware for a heterogeneous network of processing agents.  Clients would write requests for particular computations into Redis; servers would pick these up, do the computations and write the results back into Redis, from where the clients would pick them up.  I haven't yet written any code for this; at present I'm just trying to clarify the design.

This could be seen as a more ambitious successor to "Fundep" - a little library I developed while working at Bloomberg to connect and control a network of FUNctional DEPendencies within a single process.  "Fundis" aims to do a similar job, but working between multiple processes distributed over multiple machines.  

Background

This project was motivated by problems I faced as a software engineer at Bloomberg, from where I recently retired.  One might say that they are no longer my problems, but I found them interesting, spent quite a bit of time thinking about ways to overcome them, and never had the time to implement those ideas.  So I'm reluctant to just abandon all this thinking, and would still like to implement it in the hope that it may be useful somewhere.

In the early days, all the financial analysis/trading/management/etc. functionality which Bloomberg provides to its users was driven by a few huge monolithic executables running on huge monolithic "big iron" machines.  This was actually pretty efficient, but very inflexible and hard to scale up further.  So the trend has been to move specific chunks of functionality into lots of different executables, each providing specialised services, and distribute these across an increasing number of more cost-effective and energy-efficient "commodity" machines.

Bloomberg differs from other prominent providers of online services: Google, Facebook, etc. provide a fairly narrow range of functionality on a very large scale, while Bloomberg provides thousands of different functions, each of interest to a different subset of its users.  (To give an extreme example, I once had to implement some special logic in one function which was only expected to be used by a single client, and then only a few times each year - however that client was the central bank of a country, so the work was considered justified.)  So the back-end of the Bloomberg system now consists of many thousands of different executables, each of which may be running tens or hundreds of instances spread over a cluster of machines dedicated to that particular area of functionality.  Responding to a single user click on a button may involve chains of calls to dozens of different services.

Clearly, efficient communication between all these services is critical to keep the whole system responsive.  I will not go into the details of how this works; it uses some proprietary protocols which always seemed rather heavyweight to me.  What is relevant is that there can be major mismatches between the speeds of different operations, and updates that need to be fast can be held up waiting for operations that take significant time.  Sometimes this is simply unavoidable, but in many cases delays can be minimised by:

  • Caching results which may be needed more than once;
  • Pre-computing results which take time and are likely to be needed later;
  • Doing repeated similar operations in batches, thus saving communication and setup time;
  • Making requests to multiple services in parallel, as long as the operations are independent.
However these optimisations are often easier said than done.  The natural way to write code tends to lead to making requests for remote data/operations as and when they are needed.  Implementing any of the optimisations above requires some refactoring and more complex code, so it tends not to get done when the priority is to deliver a working system quickly.

  • Caching and pre-computing is sometimes done, e.g. with Redis, but extra code has to be written for this in each case. Note that caching within a client process or even one client machine is usually not ideal as the next request involving the same data may be served on a different machine.
  • Due to the volume and complexity of the existing code, which has often been updated by dozens of developers over tens of years, it can be quite hard to get enough of an overview to see clearly where batching is possible.  Similarly, it can be very difficult to trace the interdependencies between sub-computations to see when it is safe to re-order them or do them in parallel.
All these forms of optimisation depend on decoupling the sequencing of calling computationally expensive remote operations from the sequencing of the code which consumes their results.  So rather than making such remote calls directly from the consuming code, we need some infrastructure to manage these calls and store their results.  Instead of building such infrastructure in ad-hoc fashion for each use, it seems worthwhile to create a generic infrastructure which can manage many such uses.
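The decoupling described above can be sketched with standard-library futures (an illustration of the principle only, not the proposed infrastructure): the consuming code submits all its independent requests up front, and blocks only at the point where it actually reads each result.

```python
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    # Stands in for an expensive remote call with no side-effects.
    return x * x

# Submit all independent requests first; the sequencing of the expensive
# calls is now decoupled from the sequencing of the consuming code, so
# the infrastructure is free to run them in parallel or cache them.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(remote_square, x) for x in (2, 3, 4)]
    results = [f.result() for f in futures]  # block only at point of use
```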

Note that if the sequencing of calls can be changed by the infrastructure, it is essential that such re-ordering has no side-effects.  This implies that the remote calls must operate as pure functions, whose only action is to produce a result which depends only on their inputs.  However if the result is expected to vary over time, we can add a timestamp or version number parameter to make this explicit.
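As an illustrative sketch of that last point (the names and numbers here are my own invention): a live price is not a pure function of the security alone, but it is a pure function of security and timestamp together, so adding the timestamp as an explicit parameter makes cached results safe to reuse and reorder.

```python
# Hypothetical lookup table; in reality this would be a remote service.
PRICES = {("XYZ", "2020-11-17"): 99.0, ("XYZ", "2020-11-18"): 101.5}

def price(security, as_of):
    # Deterministic in its inputs: the same (security, as_of) pair always
    # yields the same value, so calls may be cached or run in any order.
    return PRICES[(security, as_of)]
```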

Also, it greatly simplifies the client code if it can just ask for what it wants without needing to specify which server its request should be routed to.  However in the Bloomberg environment the system for specifying how to route requests to the appropriate servers had become highly complicated, requiring considerable attention to configure and update correctly.  I believe it should be possible to manage this in a way which is both simpler and more dynamic.

Proposal


Redis is often used simply as a cache, but the Redis home page describes its uses more broadly as "database, cache and message broker".  I want to explore using Redis as a single integrated messaging and caching system which would handle all the communication between services in a distributed processing environment like that described above.

If we assume that processing results are going to be cached in Redis, we will need to have code to write and read input and output in the string key/value form which Redis supports.  The key here needs to include (or at least depend on) all the relevant inputs, otherwise we will get false hits.  So rather than re-encoding this same data in another format in order to call a remote service, we can use the Redis-compatible format to communicate with the remote service as well.  The procedure would be:
  1. Client formats the input parameters as a string that can be used as a Redis key.
  2. Client queries Redis for this key; if found, the client gets the data in string form and decodes it.
  3. If the key was not present in Redis, client requests it by writing the key to a Redis queue.  Note that the query and request (when needed) can be done atomically by sending a Lua script to be executed on the Redis server.
  4. Servers for this data will be monitoring this Redis queue, so one of them will pick up the requested key, do the necessary computation and write the input-key/output-value back to Redis.
  5. Client then reads the result from Redis just as it would have done at step 2 had it already been available.
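The client side of this procedure can be sketched in Python.  Everything here is illustrative: a dict and a list stand in for the Redis hash and queue, and the function names are my own, not a definitive API.

```python
import json

# In-memory stand-ins for the Redis structures (one hash of results and
# one request queue per function); real code would use a Redis client.
results = {}        # hash: input-key -> output-value
request_queue = []  # list: keys awaiting computation

def make_key(func, **params):
    # Step 1: encode every input into a canonical string, so identical
    # requests always map to the same cache entry and false hits from
    # omitted parameters are impossible.
    return func + ":" + json.dumps(params, sort_keys=True)

def fetch(func, **params):
    """Steps 2-3: query the cache; on a miss, enqueue the key for a
    server to pick up."""
    key = make_key(func, **params)
    value = results.get(key)        # HGET in real Redis
    if value is None:
        request_queue.append(key)   # RPUSH in real Redis
    return key, value

key, value = fetch("square", x=7)
# First call: value is None and the key is now queued; once a server has
# run steps 4-5 and stored the result, the same fetch returns it.
```

In a real deployment the lookup and the enqueue would go through a Redis client, with a small Lua script doing the HGET and the RPUSH in one atomic server-side operation, as described in step 3.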

Data structure / granularity

For a first version, I would represent each function in Redis by one hash for the key/data pairs and one list for a queue of keys being requested.  In a later version I would hope to support more sophisticated, possibly hierarchical structures.  When a client wants the data for a key which has not yet been computed, it will RPUSH the key to the relevant queue.  Each server will monitor the queues for the functions they support with BLPOP - this ensures that each request will be processed by one and only one server.
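The server side of this layout might look like the following sketch, again with in-memory stand-ins: a real server would block on BLPOP, which also guarantees that each key is handed to exactly one server.

```python
import json

results = {}        # stand-in for the function's Redis hash
request_queue = []  # stand-in for its Redis list of requested keys

def serve_one(compute):
    """One iteration of a server loop: pop a requested key (BLPOP in
    real Redis, blocking until work arrives), decode the parameters,
    compute, and write the key/value pair back for the client."""
    if not request_queue:
        return False
    key = request_queue.pop(0)                  # BLPOP stand-in
    _, params_json = key.split(":", 1)
    value = compute(**json.loads(params_json))
    results[key] = json.dumps(value)            # HSET in real Redis
    return True

request_queue.append('square:{"x": 7}')
serve_one(lambda x: x * x)
```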

Pre-computation

When a client knows that certain data is likely to be needed soon but not immediately, it could write requests for this data into a low-priority queue (represented as another Redis list).  When a server is idle and has no work waiting in the main (high priority) queue it would serve requests from the low-priority queue, writing those results into Redis so that when the client later needs them they are immediately available.
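Conveniently, this falls out of BLPOP itself: given several list names, it pops from the first non-empty one in the order listed, so a single blocking call such as BLPOP fn:high fn:low timeout implements the priority.  A sketch of the behaviour, with plain Python lists standing in for the two Redis queues:

```python
high, low = [], []  # stand-ins for the two Redis lists

def next_request():
    """Return the next key to serve: urgent work first, pre-computation
    requests only when the high-priority queue is empty (mirroring
    BLPOP given both list names in order)."""
    for q in (high, low):
        if q:
            return q.pop(0)
    return None  # in real Redis, BLPOP would block here instead

low.append("precompute-later")
high.append("needed-now")
```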

Common parameter data

Sometimes several different computations will require the same input data, e.g. info about a user such as full name, address, organisation, privileges, etc.  Rather than passing each of these parameters individually to each function which needs them, the client could write a "user" record into Redis containing all this info, and then pass each function just a single identifier which enables a server to find that record.
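As a sketch (a dict stands in for Redis, and the record layout and names are purely illustrative), the shared record would live under a short id, with each function key carrying only that id:

```python
store = {}  # stand-in for Redis; HSET user:<id> field value in real Redis

def put_user(user_id, **fields):
    # Write the shared record once...
    store["user:" + user_id] = fields

def get_user(user_id):
    # ...and let any server resolve it from the single identifier.
    return store["user:" + user_id]

put_user("u42", name="A. Client", org="Central Bank", privileges="full")
# Each function call now passes only "u42" instead of the full record.
```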

Scaling

Note that depending on load, not only could extra instances of specific servers be started or stopped on-the-fly, but they could even be moved to different machines without needing any special routing configuration changes.
Redis itself could become a bottleneck, but if necessary multiple Redis instances could be used, along with some scheme for sharding data across instances.
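One possible sharding scheme (my assumption, not part of the proposal above): hash each key and use the digest to pick an instance, so clients and servers always agree on where a given key lives.

```python
import hashlib

# Hypothetical instance addresses, for illustration only.
INSTANCES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]

def shard_for(key):
    # Deterministic: the same key always routes to the same instance,
    # so cached results remain findable by every client and server.
    digest = hashlib.sha1(key.encode()).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]
```

Consistent hashing would reduce the reshuffling of keys when instances are added or removed, but even this naive scheme preserves the essential property that lookups and requests for one key all land on one instance.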

Side benefits

This system has the side effect that requests to services and their replies are automatically recorded in Redis.  Retention times may need to be tuned depending on the storage space available.  But this data can then be inspected and monitored by other tools for debugging, testing, system health checks, etc.

Next steps


If anyone finds this interesting or has feedback, please post a comment.  I hope to start prototyping this scheme soon, and will post any results here.
 
Update 17/5/24 - after a ridiculously long delay, I have now made a Tcl implementation of this idea, see https://wiki.tcl-lang.org/page/DisTcl+%2D+Distributed+Programming+Infrastructure+for+Tcl.

Wednesday, 4 November 2020

More Autumn Photos

Not much time to post anything as we are now in Family Lockdown No. 2.  Every Monday evening my son John goes to an activity group run by Resources for Autism which he enjoys.  But a week ago they emailed to say that one of their staff who was at this group had been confirmed to have coronavirus, so everyone who had attended the group, including John, needs to self-isolate for two weeks from that Monday, i.e. until Monday 9th November.  Since it's difficult to do any real isolation while John is at his usual Supported Living house with a whole team of staff coming and going, my wife and I have been looking after him back at our family house for the last week.

I have been able to get out for a walk now and then, so here are some pictures taken in and around Hadley Wood (the actual wood that is, not the suburb where the Porsches have stickers saying "My other car is a Bentley" 😄):

Wednesday, 28 October 2020

Odds and Sods

What the ****!  I started this blog thinking I would be speaking to maybe a dozen people.  At the weekend I posted my usual grumbles about the general lack of appreciation for my favourite programming language Tcl.  My former colleagues at Bloomberg must have been fed up with hearing me repeat this stuff.  Anyway, someone I don't know linked my post on "Hacker News" and 24 hours later it had 15000 views 😵.  Any time my ego needs a boost I can reread this comment!

Trying to improve your mood by thinking about it can seem rather like trying to make your car move by pushing on the dashboard (a phrase I read recently in a completely different context).  Just getting out for a walk can be much more effective.  And it turns out that walking around my area, High Barnet, is so interesting that they made a film about it - see the trailer at 23 Walks, also background info.  Most of the locations in the trailer are extremely familiar to me; even the council office which appears briefly looks like one my wife and I have visited several times for meetings with social services about our son's care.

But isn't it about time the scriptwriters for the dystopian future drama we seem to be living in decided to lighten up a bit?  Surely it was enough to have half the world ruled by mad dictators and would-be dictators, impending environmental catastrophe, the UK tearing itself apart, without adding a world-wide killer virus on top as well?  Perhaps in the next episode Stephen Pinker will assure us that all is well?

Some handy points I picked up from Pinker's book How the Mind Works, paraphrased somewhat:

  • Love is the state of mind where the well-being of another person becomes as important as your own (not very romantic, but it works for me).
  • The conflict between logic and emotion is bogus, because logic tells you how to do things but not what to do, while emotion tells you what to do but not how to achieve it (in theory of course, in practice this conflict still often seems problematic).
  • It's really not surprising that human beings can be obsessive and/or highly sensitive about almost anything even remotely related to sex.  For the "selfish genes" which ultimately shape our behaviour, whether and with whom we have sex is quite literally a matter of life or death, determining which of those genes live on in the next generation.

The "Keep Calm and Carry On" attitude has a lot to answer for 😕.  Of course if you're in the middle of a crisis, you have to focus on the immediate practicalities of the situation and emotional reactions may be luxuries you can't afford.  But ignoring these upsets doesn't necessarily mean they go away.  In computing terms, the various alarm signals that go off are flagged as high priority, so if they can't be handled at the time they get queued for later processing.  If the crisis is intense or prolonged (such as struggling to care for a disabled family member in parallel with a demanding job), it may never be possible to process this queue.  But it's also never possible to entirely ignore it and if the queue of deferred alerts continues to build up, its pressure will eventually start to disrupt one's normal functioning.  So for the benefit of one's long-term mental health, a better policy may be the classic "When in danger or in doubt, run in circles, scream and shout" 😲.

I used to be rather dubious about Julian Assange and WikiLeaks.  But after following reports of his recent extradition hearings, e.g. from the Independent, the indefatigable Craig Murray, etc., I have started to think that he is being "railroaded" for the crime of shining a light into dark places.  Also since I tend to get most of my news from the Guardian, I am seriously disturbed by the allegation that the Guardian betrayed Assange after getting a lot of copy out of his earlier revelations.

Finally, since 40+ years ago I was diagnosed as having a schizoid personality, I leave you with the appropriate theme music.

Saturday, 24 October 2020

Why I'm Tcl-ish

I'm a big fan of programming in Tcl, the "Tool Command Language", although it is distinctly out-of-fashion these days.  When I have the freedom to choose, I tend to use Tcl for anything that doesn't need to run at maximum possible speed (and probably C++ for anything that does).

One of my colleagues at Bloomberg once asked when I would give up writing utilities in such an ancient language as Tcl and update myself to something more contemporary like Python.  I should perhaps have replied "I find your lack of faith disturbing" but I just said something lame to the effect that such an "update" would make me less productive 😉.

Over my 47-year involvement with computing, at various times I have been enthusiastic about several different programming languages:

  • St. Andrews Static Language - the first practical implementation of a pure functional programming language anywhere, which I just happened to get the chance to use in 1975-6.
  • Modula-2 - a very clean, predictable, understandable conventional algorithmic language.
  • Prolog - the classic PROgramming in LOGic language, yet another fundamentally different yet consistent paradigm.
  • Perl - quite the opposite of all the above, a very "hacky" language based on practicality not purity, great for solving certain types of problems quickly, but really not scaling up nicely at all.
  • Tcl - "Tool Command Language", for me this hits the "sweet spot" between all of the above.
Programmers who like Tcl tend to think of it as clean, logical and consistent.  However the majority reject it, complaining about "quoting hell" and various awkwardnesses which basically come down to it being too different from what they are used to.  In reality, Tcl has a radical minimalism which makes it genuinely different from the common patterns that most programming languages follow.

Most programming languages blend syntax and semantics.  Each language construct (e.g. if-then-else for conditional execution) has individual rules for how it is written (syntax) and how it operates (semantics).  The language definition as a whole includes all of these specific elements of syntax and semantics.

In contrast, the essence of Tcl is a very small and simple core which defines only how to define and use variables, data values, commands in general, and events.  The only syntactic rules are those which define how to invoke a generic command and pass data to and from it.  These are documented at man Tcl; there is no special syntax for specific commands.  All functionality is defined as the semantics of individual commands.  Flow control is done by commands which take other commands as their arguments.  So if-then-else functionality is provided by a command called "if" whose arguments are the condition to test, the code to execute when the condition is true, and optionally the code to execute when the condition is false.

This design can be cumbersome in some ways.  For example, the core has no syntax for arithmetic expressions; this is delegated to the command expr, which the programmer has to invoke explicitly wherever a calculation is needed.

However this division of concerns creates a unique flexibility.  Commands can be created or redefined on-the-fly. To give an extreme example, it's perfectly possible to redefine the "if" command to reverse its logic.  More constructively, before Tcl added built-in commands for object-oriented programming, many people exploited the language's flexibility to make their own support for object-orientation.

I suspect this modular design has also enabled Tcl to evolve more smoothly.  Since it was originally designed, Tcl has incorporated many innovations (caching of optimised internal representations for code and data; unicode support; multi-threading; coroutines; fully-virtualised filesystem operations; decoupling of versioning for language extensions, etc.) with almost no disruption for existing running code, something which Python still struggles with.

I should say that Lisp has many of the same attributes that I'm claiming for Tcl.  One difference is that historically, Lisp systems tended to be conceived of as a universe of their own, with little regard for interoperation with anything else.  Tcl on the other hand started life as an extension language intended to be embedded in other software, and so has strong support for integrating with other systems on multiple levels.

Finally we have the cross-platform GUI (Graphical User Interface) support provided by Tk.  This can be used from other languages, but is most closely integrated with Tcl.  For an example of the kind of handy but lightweight tools that can easily be put together with the Tcl/Tk combination, see Diskusage.

Thursday, 22 October 2020

A few older photos

 

Me at Isle of Wight last year

John at Isle of Wight last year

A slice cut through the clouds over High Barnet

The Mimmshall Brook

Raiders of the lost cause ;-)

Shoeburyness looking Southwest

Shoeburyness looking Southeast

Autumn leaves at Ravenscroft Park

Wednesday, 21 October 2020

"Still Crazy After All These Years"

The relevant song can be found at Still Crazy After All These Years.  To keep doing the same thing but expecting a different result is popularly described as a sign of madness.

Each time I am asked to accompany my wife Eleni to a meeting about our son John's care/education/welfare etc., I think this time could be different. Perhaps we will have a rational, respectful discussion where I will be able to make some constructive contribution. But most often, sooner or later it turns into quite the opposite. This makes me feel as if I'm being subjected to some fiendish kind of torture that makes me want to start screaming. Then afterwards it takes me 3 or 4 days to get back to some kind of balance where I can get on with the rest of life. When you feel like you are banging your head against a wall, doesn't it make sense to just stop?

Friday, 16 October 2020

A few recent photos

 My son John:

John at his house with his iPad

John celebrating his 23rd birthday

A couple of pictures taken from the Bloomberg office early one morning before the lockdown:

Looking West to St. Paul's

Looking East towards the Mansion House

Last visit to the Bloomberg office to clear out my desk:

Entrance and St. Stephen's

Strangely empty desk in a strangely empty office

Me outside

Me and Eleni outside
