Commentary: What in-memory BI ‘revolution’?

You can contact Nigel Pendse, the author of this section, by e-mail if you have any comments, observations or user experiences to add. Last updated on 16th August, 2010.


 

This page is part of the free content of The BI Verdict, which represents a small fraction of the information available to subscribers.


If you follow the business intelligence industry, you’ll have been swamped with amazing claims for ‘revolutionary’, ‘next generation’ in-memory BI tools. But what is the truth behind this development?

Why all the hype?

There’s one obvious reason for the interest in in-memory BI: it’s extremely fast, which is a big plus for users. The biggest performance bottleneck in typical BI applications is slow disk access, or even slower database access, both of which are hundreds of times slower than RAM access. Of course, disk access is not the only bottleneck, so in-memory tools are not hundreds of times faster overall than disk-based tools.

But that performance triggers a virtuous cycle: if an application is intrinsically very fast by default, it doesn’t need complicated data structures to further optimize performance. Even a simple, inefficient RAM-based application is likely to be at least an order of magnitude faster than even a very well optimized disk-based application. That saves time and effort (which translates into reduced consulting costs) when developing applications, and makes it easier for in-memory applications to re-structure dynamically or recalculate on-the-fly. This ease of development and flexibility is perhaps the biggest advantage of in-memory BI.
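To see why storage that is hundreds of times faster yields an overall gain closer to an order of magnitude, a rough calculation using Amdahl’s law may help. The sketch below is illustrative only: the 90% I/O share and the 200x RAM advantage are assumed figures, not measurements, and overall_speedup is a hypothetical helper name.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's law: only the I/O share of the elapsed time gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    # Time left after the change, as a fraction of the original elapsed time.
    remaining = (1.0 - io_fraction) + io_fraction / io_speedup
    return 1.0 / remaining


# Assumed workload: 90% of elapsed time is disk I/O, and RAM is taken to be 200x faster.
print(round(overall_speedup(0.90, 200.0), 1))  # 9.6
```

Under these assumptions the remaining 10% of the work (calculation, formatting, network traffic) dominates once the data is in RAM, which is why the overall gain is roughly tenfold rather than two-hundredfold.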

However, in-memory tools, though simpler to build than disk-based tools, do need to hold data as compactly as possible. It is not enough simply to load disk structures into memory, as these tend to store data rather inefficiently. The best in-memory BI tools can take an order of magnitude less space for the data than an RDBMS would need.
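One common way of holding data compactly, shown here purely as an illustration, is dictionary encoding: each distinct value is stored once and the rows hold only small integer codes. This is a minimal sketch, not a description of how any particular product works. The function names and the sample region column are invented for the example, and it assumes no more than 65,536 distinct values per column.

```python
from array import array


def dictionary_encode(values):
    """Return (dictionary, codes): distinct values stored once, rows as integer codes."""
    dictionary = []     # code -> original value
    positions = {}      # original value -> code
    codes = array("H")  # unsigned short codes (typically 2 bytes each)
    for value in values:
        code = positions.get(value)
        if code is None:
            code = len(dictionary)
            positions[value] = code
            dictionary.append(value)
        codes.append(code)  # raises OverflowError if the code does not fit in an unsigned short
    return dictionary, codes


def decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Hypothetical column with few distinct values repeated many times.
regions = ["North", "South", "North", "East", "South", "North"] * 1000
dictionary, codes = dictionary_encode(regions)
print(dictionary)  # ['North', 'South', 'East']
print(len(codes))  # 6000
```

Repetitive columns such as region or product codes, which are typical of BI data, benefit most from this approach; columns with mostly unique values gain little.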

The current wave of in-memory euphoria was started by QlikTech and other smaller vendors, but SAP, IBM, MicroStrategy and Microsoft have now joined the bandwagon.

Why has in-memory BI suddenly become so fashionable?

In-memory applications have always been very fast, of course, but two other more recent developments help in-memory BI: the plunging cost of memory chips and the arrival of 64-bit computers. One less obvious trend that helps in-memory BI is that modern computers crash much less often than in the past, so there’s little risk of randomly losing all the work performed in a session.

At first glance, 64-bit computing sounds like the most important breakthrough, as it allows access to far more RAM than the two or four GB accessible on 32-bit systems. However, it has always been possible to access more memory than the operating system can address, using techniques like bank switching.

Back in the days of 16-bit computers, these techniques allowed applications to access more than the maximum 640 KB of conventional memory. For example, in 1985, Lotus, Intel and Microsoft created a specification for an expanded memory system (LIM-EMS) that allowed 64 KB chunks of application data to be paged into the reserved upper memory space. That may not sound like much today, but if most of the conventional memory was already filled with programs, an extra 64 KB for data could double the space available for application data (such as Lotus 1-2-3 spreadsheets).
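The principle of bank switching can be sketched in a few lines. The toy model below is not the LIM-EMS interface: the BankedMemory class and its methods are hypothetical, and it simply shows how a program that can only address a 64 KB window can still reach a larger pool of memory by choosing which bank appears in that window.

```python
WINDOW_SIZE = 64 * 1024  # the only region the program can address directly


class BankedMemory:
    """Toy model: many banks of memory, one visible through the window at a time."""

    def __init__(self, bank_count: int):
        self._banks = [bytearray(WINDOW_SIZE) for _ in range(bank_count)]
        self._current = 0

    def select_bank(self, bank: int) -> None:
        """Map a different bank into the window (the 'switch')."""
        if not 0 <= bank < len(self._banks):
            raise IndexError("no such bank")
        self._current = bank

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > WINDOW_SIZE:
            raise ValueError("write falls outside the window")
        self._banks[self._current][offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or length < 0 or offset + length > WINDOW_SIZE:
            raise ValueError("read falls outside the window")
        return bytes(self._banks[self._current][offset:offset + length])


# 16 banks of 64 KB give 1 MB of storage behind a single 64 KB window.
memory = BankedMemory(bank_count=16)
memory.select_bank(3)
memory.write(0, b"Q1 sales")
memory.select_bank(7)
memory.write(0, b"Q2 sales")
memory.select_bank(3)
print(memory.read(0, 8))  # b'Q1 sales'
```

The same window offset holds different data depending on the selected bank, which is why the application, not the processor, has to keep track of where everything lives.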

Microsoft and Digital Research then added more features to their MS-DOS and DR-DOS operating systems that allowed extended memory to be used beyond the 1 MB addressable by 16-bit processors, for example to hold TSR programs and DOS components or as a high speed RAM disk.

Thus, long before 32-bit computers became established, 16-bit DOS computers routinely used more than the supposed upper limit of 1 MB of RAM. Exactly the same would have happened with 32-bit computers and their 4 GB limit (once thought unimaginably high), but 64-bit processors became widely available at affordable prices early enough for these techniques not to be re-invented.

So the real breakthrough is the plunging cost of RAM, for which the BI industry can take no credit. It is the Asian semiconductor industry’s sustained heavy investment in advanced chip foundries that has made in-memory applications of all types practical and affordable. And that investment was not even driven by the needs of the BI industry, but consumer demand for electronic devices with ever more RAM. So today’s in-memory BI wave is thanks to demand in the consumer hardware market, rather than innovations in the business software industry.

So just how new and innovative is in-memory BI?

The surprising answer is that in-memory BI came before disk-based BI, probably because in-memory programs are easier to write. Programmers would rather not have to write code to constantly move active data back and forth between disk and RAM.

The first multidimensional tool was APL, whose origins were in the 1960s, based on a book called A Programming Language published in 1962. The first usable implementation of APL was in 1967, on the IBM 1130 mainframe. It is ironic that IBM, which pioneered in-memory multidimensional BI in the 1960s, had to buy TM1 (through its Cognos acquisition) and Cognos Planning (whose origins before Adaytum were in an IBM APL-based product called Frango, developed in the early 1980s) to re-enter this market segment 40 years later. Analyst, the oldest component of Cognos Planning, is still written in APL, and even the newer components use an APL-like language.

Most other early modeling (which we would now call BI) tools also used in-memory architectures. Indeed, I built very complex oil tax models in the late 1970s using a now almost-forgotten in-memory financial modeling product called FCS. Back then, only 320 KB RAM was available for both the FCS software and the 30+ year tax models on the timesharing mainframe (which served numerous concurrent users with much less CPU power and RAM than a modern cell phone). Of course, we would now have to call it Software as a Service (SaaS) in-memory performance management (PM) – except that I doubt that any of the modern BI tools could handle the complexity of the tax rules I had to model more than 30 years ago. And don’t let anyone tell you that application development is faster in modern tools: I could respond to the frequent changes of the tax laws within a day or two.

So why did in-memory BI go away for so long?

It didn’t. By far the most widely used tool for BI applications today is Microsoft Excel, which has always had an in-memory architecture. So did Lotus 1-2-3, the product it defeated in the market in the 1990s, and VisiCalc before that. And Lotus Improv was a short-lived, in-memory multidimensional spreadsheet that would clearly be described as a BI tool if it were still on sale today.

Of course, spreadsheets are not marketed primarily as BI tools, but conventional BI products like Cognos PowerPlay (first released in 1990) also started out as in-memory tools. This is a short extract of the PowerPlay review from the original edition of The OLAP Report from 1995:

“PowerPlay is usually installed as a stand-alone PC product, with a memory resident database loaded from pre-prepared files. It is this somewhat simpler architectural option that makes PowerPlay so easy to deploy in large numbers; this same architecture also limits its capacity.”

Exactly the same could be said of any of the modern in-memory BI tools, though of course that capacity limit is far higher today than in the 1990s. Later, PowerPlay, like most other long-lived BI products, moved to a disk-based architecture in order to handle more data. This wasn’t mainly because it hit addressable RAM limits, but simply because, by modern standards, RAM was so expensive in the 1990s that the trade-off of speed vs capacity favored disk-based solutions for even medium-sized applications.

However, TM1 (first released in the mid-1980s) has always been, and remains, a pure in-memory OLAP engine, as are the several similar products inspired by it (Alea, now Infor PM OLAP; PowerOLAP; proCube; and Palo). In fact, TM1 is not only the longest established in-memory BI product currently available, but also the longest surviving BI product of any type, so in-memory architectures do seem to have a sustained advantage.

Will in-memory solutions kill off disk-based BI?

Products don’t have to stick rigidly with one architecture or the other. There is nothing to stop designers of disk-based products from also including an in-memory option, or simply using large caches to optimize disk performance (just as disk controllers do). Indeed, CPUs also include small, high speed memory caches to minimize wasted cycles. However, products designed and optimized for pure in-memory architectures will outperform disk-based products that simply take advantage of available disk caches, as such products will still shunt data around unnecessarily, and have redundant indexing.

For example, back in the 1990s, Holos had both in-memory and disk-based structures that could be freely mixed in a single application, and MicroStrategy now has similar capabilities. Microsoft’s new PowerPivot cubes offer similar capabilities when promoted to Analysis Services.

In any case, disk-based products automatically take advantage of RAM-based disk caches, while in-memory products automatically take advantage of (disk-based) virtual memory if they run low on real RAM. Large, low cost flash memory – which offers non-volatile storage just like disk drives, but is much faster – further confuses the picture. So, with today’s sophisticated computers, there is little real distinction between the two architectures, and both will continue to co-exist, often in the same product and even in the same application.

In other words, the venerable 40-year-old in-memory BI architecture was not defeated by the new-generation disk-based BI architectures that came along a few years later, but nor did it triumph. One approach is not ‘better’ than the other; both are useful tools in the software designer’s armory. But the growing availability of masses of low cost RAM will swing the pendulum towards the in-memory BI direction.

The next breakthrough?

In-memory BI eliminates the traditional disk input/output bottleneck, so to gain yet more speed, attention must switch to the next bottleneck: overloaded CPUs. Processor clock speed acceleration has ground to a halt, because of overheating – modern CPUs have more cores than ever, but the individual cores are not much faster than their predecessors.

So overall CPU throughput continues to improve, but individual tasks are no longer speeding up very much. At the very least, in-memory BI tools need to be designed to take maximum advantage of multi-core CPUs – 12 or more cores per CPU will soon be available – and high-end BI applications should automatically exploit them all concurrently, so that even individual tasks are multi-threaded.
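As a rough sketch of what exploiting every core for a single task might look like, the example below splits one aggregation into chunks and sums them in separate worker processes using Python’s standard concurrent.futures module. The function names and the sample data are assumptions made for illustration; a real BI engine would do this inside its own calculation kernel.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor


def partial_sum(chunk):
    """Work done independently on one core: aggregate a slice of the data."""
    return sum(chunk)


def split(values, parts):
    """Cut the data into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(values) // parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_total(values, executor: Executor, parts: int):
    """Fan the chunks out to the workers, then combine the partial results."""
    return sum(executor.map(partial_sum, split(values, parts)))


if __name__ == "__main__":
    sales = list(range(1_000_000))  # stand-in for a column of measures
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        print(parallel_total(sales, pool, workers))  # 499999500000
```

Processes rather than threads are used here because, in standard CPython builds, threads do not run CPU-bound Python code on several cores at once. The split-then-combine pattern only works this simply for aggregations, such as sums and counts, whose partial results can be merged.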

But most desktop computers and workstations, as well as some servers, also include at least one other, much faster processor: the graphics processing unit (GPU). GPU speeds tens or even hundreds of times faster than conventional CPUs have been reported for some scientific computations, so could exploitation of GPU accelerators be the next BI performance optimizer? Research projects in this area have been underway for some time, and commercial products are imminent. This could lead to some very dramatic acceleration of calculation-intensive BI applications, possibly allowing OLAP engines to be used for entirely new classes of business problems, such as large econometric models.

