OLAP (Online AnaLytical Processing) is a concept for conducting swift multidimensional analysis of data (MDA), enabling the examination of data from multiple dimensions. This allows for complex calculations, trend analyses, and sophisticated data modelling.
It's important to clarify that OLAP is not a technology, nor a language (unlike SQL or Java). OLAP is a concept / method for organizing data in a certain structure to facilitate multidimensional analysis. The term “OLAP tool” refers to platforms, such as icCube and Microsoft Analysis Services (SSAS), that provide the necessary infrastructure for building OLAP data structures.
Why use Olap?
The core features of OLAP are :
Multidimensional: transforms complex data into structural and intuitive business concepts, making it easier to analyse data across various dimensions (e.g., time, geography, product).
- Analysis: data aggregation such as sum, minimum, maximum, average, vector and more. Supports advanced analytics such as statistical methods, regressions, algorithms, etc.
- Fast Query Performance: Optimised for reading and analyzing multidimensional data.
- Ad-hoc Analysis: Users can conduct spontaneous, on-the-fly queries to explore data without needing pre-defined reports or dashboards.
Schema or Model
Before doing OLAP analysis on your data, you first create a model or schema from your data sources or data warehouse. The most common data sources are SQL databases, though they can be data coming from any source (e.g., flat files, IoT, Google Analytics). The goal is to transform the data from technical nomenclature into domain-specific terms that a business user can easily use aka a Semantic Layer.
Dimensions are a categorisation of your data. Typical dimensions are country, people, age, product, color. Dimensions can be hierarchical, like in a time dimension with years, quarters and days, or in a geography dimension with continent, region, country and city.
Measures are metrics, that can be aggregated as sum, minimum, maximum, average (e.g., sales amount, temperatures).
Once the schema is defined, users can now start querying the data. Drag and drop dimensions and measures to generate information, such as a table with continent, year, and sales amount. This is called dicing (choose specific values of multiple dimensions to create a sub-cube):
The ability to show aggregated information in each cell is known as rollup. Although thousands of sales may occur in Africa in 2018, only the summarised, or aggregated, information is displayed.
Filtering the above table, e.g. for a given product, is called slicing (choose a single value for one dimension to create a sub-cube).
Drilldown, the opposite of rollup, allows for going into the details of a continent or a year, taking advantage of the hierarchical structure of the dimension. The end result will look like :
Rollup and Drilldown, along with dicing and slicing, form the basis of OLAP interactions.
Business Intelligence (BI) tools may serve as clients to an OLAP server, providing the visual result (through charts and filters) of OLAP queries.
The view of dimensions in the axis and values in the cells is why you can think of OLAP as a cube. But this is a conceptual view, not a physical one.
How does OLAP work?
We can use a very basic example, as input data we have a table with a few column :
OLAP when calculating the sales for Asia in 2018 is going to filter the table for all rows with Asia and 2018. On this filtered subtable aggregate the measure for the expected value, sum for sales (500+500 = 1000). For those familiar with SQL, this is similar to a group by operation.
This is a very basic example as OLAP solutions support advanced aggregation and calculations, including statistical, regression, comparison, and ranking analysis. More advanced solutions allow the dimensions themselves to be mathematical transformations.
OLAP query languages
Unlike relational databases that use SQL, OLAP does not have a standardized query language.
However, in 2001 Microsoft released MDX that stands for Multidimensional Expression. It's a language used by different tools that support OLAP. Excel is an example of a client using MDX, an Excel Pivot Table can be connected to an MDX-supporting server through the XMLA protocol.
Other tools not supporting MDX will have their own language that might be translated to SQL. If you are interested in MDX, you can read this Gentle Introduction to MDX.
Types of OLAP
MOLAP (Multidimensional OLAP): The system loads the underlying data into an internal structure. This structure can be a file on disk or it can be an in-memory structure.. This provides fast answers with the loading time equal to that of the memory/file system. The advantage of having this separate structure is that queries access data in this internal system, and not directly the data sources. This means that analytical queries, which can be quite heavy, do not slow down production servers.
ROLAP (Relational OLAP): The system will perform the analysis directly into the underlying relational data source, an SQL compatible database. This provides a slower answer and does not have the analytical power and flexibility of MOLAP but there is no need to load the data and require local resources such as memory or disk space. Tools transform end user queries into SQL then performed on the production database.
HOLAP (Hybrid OLAP): Combines MOLAP and ROLAP, presenting technical challenges due to the complexity of integrating both models.
icCube OLAP
icCube is based on in-memory columnar data stores and lifts several of the constraints that made Multidimensional Cubes (OLAP Cubes) IT intensive and rigid, while at the same time maintaining the compatibility with industry standards in the reporting and analytics world.
Here are a few of the constraints icCube lifted:
No more need to restructure data into a star or snowflake schema, which is complicated to perform at source level.
- Moving out of trivial key-like associations (e.g., many-to-many, ranged).
- Mapping through a semantic layer.
- Categories allow introduction of new dimensions on the fly to simplify the modelling process.
- Dimension members that are calculations (e.g. convert temperature to Celsius or calculate a moving average).
- No pre-aggregation is required (all aggregations are computed on the fly).
- MDX+ language (functional support, object-oriented extensions, and many new helper functions).
- Client interface to manage schemas, deployment and loading.
When not to use OLAP?
OLAP is designed as a read-only system, distinguishing it from transactional systems such as relational databases (RDBMS) or Online Transaction Processing (OLTP) platforms. Unlike tools for data scientists, OLAP is not optimised for performing extensive calculations on vast datasets.
Next Steps
Now that you have a basic theoretical understanding of what OLAP Hypercubes are and what Multidimensional Analysis means, we recommend you to check our other Blog Post giving you some practical examples of OLAP as well as to look up our Gentle Introduction to MDX in our documentation.