This is a technical post about a project I've been thinking about on and off for a while. The idea is to use Redis as the middleware for a heterogeneous network of processing agents. Clients would write requests for particular computations into Redis; servers would pick these up, do the computations and write the results back into Redis, from where the clients would retrieve them. I haven't yet written any code for this; at present I'm just trying to clarify the design.
This could be seen as a more ambitious successor to "Fundep", a little library I developed while working at Bloomberg to connect and control a network of FUNctional DEPendencies within a single process. "Fundis" aims to do a similar job, but across multiple processes distributed over multiple machines.
Background
This project was motivated by problems I faced as a software engineer at Bloomberg, from where I recently retired. One might say that they are no longer my problems, but I found them interesting and spent quite a bit of time thinking about ways to overcome them, without ever having the time to implement those ideas. So I'm reluctant to just abandon all this thinking, and would still like to implement it in the hope that it may prove useful somewhere.
In the early days, all the financial analysis/trading/management/etc. functionality which Bloomberg provides its users was driven by a few huge monolithic executables running on huge monolithic "big iron" machines. This was actually pretty efficient, but very inflexible and hard to scale up further. So the trend has been to move specific chunks of functionality into lots of different executables, each providing specialised services, and to distribute these across an increasing number of more cost-effective and energy-efficient "commodity" machines.

Bloomberg differs from other prominent providers of online services: Google, Facebook, etc. provide a fairly narrow range of functionality on a very large scale, while Bloomberg provides thousands of different functions, each of interest to a different subset of its users. (To give an extreme example, I once had to implement some special logic in one function which was only expected to be used by a single client, and then only a few times each year; however, that client was the central bank of a country, so the work was considered justified.)

So the back-end of the Bloomberg system now consists of many thousands of different executables, each of which may be running tens or hundreds of instances spread over a cluster of machines dedicated to that particular area of functionality. Responding to a single user click on a button may involve chains of calls to dozens of different services.
Clearly, efficient communication between all these services is critical to keep the whole system responsive. I will not go into the details of how this works; it uses some proprietary protocols which always seemed rather heavyweight to me. What is relevant is that there can be major mismatches between the speeds of different operations, and updates that need to be fast can be held up waiting for operations that take significant time. Sometimes this is simply unavoidable, but in many cases delays can be minimised by:
- Caching results which may be needed more than once;
- Pre-computing results which take time and are likely to be needed later;
- Doing repeated similar operations in batches, thus saving communication and setup time;
- Making requests to multiple services in parallel, as long as the operations are independent.
Caching and pre-computing are sometimes done, e.g. with Redis, but extra code has to be written for this in each case; the sketch below shows the sort of boilerplate involved. Note that caching within a client process, or even on one client machine, is usually not ideal, as the next request involving the same data may be served on a different machine.

Due to the volume and complexity of the existing code, which has often been updated by dozens of developers over tens of years, it can be quite hard to get enough of an overview to see clearly where batching is possible. Similarly, it can be very difficult to trace the interdependencies between sub-computations to see when it is safe to re-order them or do them in parallel.
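For concreteness, here is roughly what that per-case caching code tends to look like: a cache-aside pattern written with the redis-py client. Everything here is hypothetical; the function, key scheme and expiry time are illustrations of the pattern, not anything from an actual codebase.

```python
import json
import redis

r = redis.Redis()

def slow_service_call(isin, date):
    # Stand-in for an expensive call to some back-end service.
    ...

def cached_price(isin, date):
    key = f"price:{isin}:{date}"             # ad-hoc key scheme, per function
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: decode and return
    result = slow_service_call(isin, date)   # cache miss: do the slow work
    r.set(key, json.dumps(result), ex=3600)  # cache the result for an hour
    return result
```

The point is that each function wanting a cache repeats this boilerplate, with its own key scheme and expiry policy; the proposal below tries to factor the pattern out.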
Proposal
1. Client formats the input parameters as a string that can be used as a Redis key.
2. Client queries Redis for this key; if it is found, the client gets the data in string form and decodes it.
3. If the key was not present in Redis, the client requests it by writing the key to a Redis queue. Note that the query and the request (when needed) can be done atomically by sending a Lua script to be executed on the Redis server.
4. Servers for this data will be monitoring this Redis queue, so one of them will pick up the requested key, do the necessary computation and write the input-key/output-value pair back to Redis.
5. Client then reads the result from Redis, as it would have done at step 2 had the result already been available (a sketch of both sides follows this list).
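To make this concrete, here is a minimal sketch of both sides of the protocol using the redis-py library. The key scheme, queue names and polling loop are my own assumptions for illustration; a real design would at least need to avoid enqueueing duplicate requests for a key already being computed, and would block on a notification rather than poll.

```python
import json
import time
import redis

r = redis.Redis()

# Steps 2-3, done atomically on the Redis server: return the cached value
# if present, otherwise enqueue the key for some server to compute.
get_or_request = r.register_script("""
local value = redis.call('GET', KEYS[1])
if value then return value end
redis.call('LPUSH', KEYS[2], KEYS[1])
return false
""")

def compute(func_name, **params):
    # Step 1: format the parameters as a deterministic Redis key.
    key = func_name + ":" + json.dumps(params, sort_keys=True)
    value = get_or_request(keys=[key, "requests:" + func_name])
    # Step 5: wait for a server to write the result back. Polling is the
    # crudest option; Redis keyspace notifications would avoid it.
    while value is None:
        time.sleep(0.01)
        value = r.get(key)
    return json.loads(value)

def serve(func_name, fn):
    # Step 4, server loop: block on the request queue, compute, write back.
    queue = "requests:" + func_name
    while True:
        _, key = r.brpop(queue)              # blocks until a request arrives
        params = json.loads(key.decode().split(":", 1)[1])
        # A production version would set an expiry policy on the result.
        r.set(key.decode(), json.dumps(fn(**params)))
```

Note that nothing here stops two clients enqueueing the same key twice, or two servers computing it concurrently; for pure functions this is merely wasted work rather than an error, since both servers write the same value.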
Data structure / granularity
Pre-computation
Common parameter data
Scaling
Redis itself could become a bottleneck, but if necessary multiple Redis instances could be used, along with some scheme for sharding data across instances.
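One simple such scheme, sketched below under my own assumptions, is client-side sharding: hash each request key to pick a Redis instance, so that clients and servers independently agree on where any given result lives. Redis Cluster or consistent hashing would be more robust alternatives.

```python
import hashlib
import redis

# Hypothetical shard addresses; in practice these would come from config.
shards = [redis.Redis(host="redis-%d.example.com" % i) for i in range(4)]

def shard_for(key):
    # Deterministic hash of the key, so every client and server maps the
    # same key to the same Redis instance.
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return shards[h % len(shards)]
```

The drawback of plain modular hashing is that adding or removing an instance remaps almost every key, effectively invalidating the whole cache; consistent hashing limits the remapping to roughly the keys of the affected instance.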