Approximately 328.77 million terabytes of data are created every day, which amounts to about 120 zettabytes per year (1 ZB = 10¹² GB). The majority of this data is created on social networks. At this point, a first question arises: what percentage of this data can generate value for us? The answer is neither simple nor straightforward, since complex ETL processes must be performed depending on the type of data, and the choice of tool depends on what is available to the client. This is where we can begin to understand some advantages of Qlik Sense over other tools. An in-memory limit of 1 GB, like the one some competing tools impose in their Pro licenses, is not sufficient for the amount of information we need to analyze. We also need to consider the response time of the pipelines in the data analysis process and the amount of historical data to display and analyze.
The advantage of extracting information and storing it in the compressed QVD format translates into read speeds between 10 and 100 times faster than other sources, not to mention that we avoid impacting the database with repeated queries. Additionally, we can consolidate our data so it can be consumed by different applications. One of the functionalities that QVDs make possible is incremental (delta) loading: in the extraction layer, we query only new, updated, or deleted data. To implement it, we can apply the following steps:

Determine the latest loaded data: the maximum ID, maximum Creation Date, maximum Update Date, etc.

Construct the query to the source using that last-update flag in a WHERE clause.

Concatenate the result of the new query with the stored QVD.

If rows have been deleted, use an INNER JOIN between the full set of keys from the database and the stored QVD.
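The steps above can be sketched in Qlik load script. This is a minimal illustration, not the exact implementation; the source table, field names (Transaction_ID, Update_Date), and QVD path are hypothetical:

```
// Step 1: read the most recent update timestamp from the stored QVD
MaxTable:
LOAD MAX(Update_Date) AS MaxUpdate
FROM [Transactions.qvd] (qvd);
LET vLastUpdate = PEEK('MaxUpdate', 0, 'MaxTable');
DROP TABLE MaxTable;

// Step 2: query only new or updated rows from the source database
Transactions:
SQL SELECT * FROM Transactions
WHERE Update_Date >= '$(vLastUpdate)';

// Step 3: concatenate the historical rows that were not re-extracted
CONCATENATE (Transactions)
LOAD * FROM [Transactions.qvd] (qvd)
WHERE NOT EXISTS(Transaction_ID);

// Step 4: handle deletions by keeping only keys still present in the source
INNER JOIN (Transactions)
SQL SELECT Transaction_ID FROM Transactions;

STORE Transactions INTO [Transactions.qvd] (qvd);
```

Step 4 can be skipped when the source never deletes rows, which also saves a full key scan against the database.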
A similar technique can also be applied to QVDs that store transactional data in the transformation layer. The idea is to divide one large QVD into multiple small QVDs. This segmentation is useful when only the last N periods need to be analyzed. It also helps when resources are limited, since an incremental load over a single QVD involves reading and rewriting the entire file, a cost in time and computing resources that will only increase over time.
To walk through the steps to segment a QVD, let's use a QVD with the following fields as an example:
- Transaction_ID: Primary Key
- Transaction_Date: Date the transaction was made
- Update_Date: Date the row or record was updated (records can generally only be updated up to 60 days after creation)
- Product, Customer, Category, etc.
Our goal will be to create one QVD per period (month): Transactions_YYYYMM.qvd.
Which monthly QVDs get rewritten depends on the data brought in by the extraction and the data previously stored in each QVD. For this, we use the WHERE NOT EXISTS(Transaction_ID) clause. Consolidating the full history only occurs when a full reload is requested or when data must be added or removed due to unforeseen events.
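As a minimal illustration of the WHERE NOT EXISTS pattern on its own (the table and file names here are hypothetical): rows from the stored QVD are appended only if their key was not already loaded from the fresh extraction.

```
// Freshly extracted rows take priority
Transactions:
LOAD * FROM [Extraction.qvd] (qvd);

// Append only historical rows whose key was not just extracted
CONCATENATE (Transactions)
LOAD * FROM [History.qvd] (qvd)
WHERE NOT EXISTS(Transaction_ID);
```

EXISTS() checks a value against every value of that field already loaded in the data model, which is why the fresh extraction must be loaded first.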
Below is a short script that segments a QVD by month.
TempMeses:
LOAD DISTINCT
    MONTHSTART(Fecha_Transaccion) AS TranMes
RESIDENT Transaccion;

FOR i = 1 TO FIELDVALUECOUNT('TranMes')

    LET vMonthName = DATE(FIELDVALUE('TranMes', $(i)), 'YYYYMM');
    LET vQvdName = 'Transaccion_$(vMonthName).qvd';

    MesTransaccion:
    NOCONCATENATE LOAD *
    RESIDENT Transaccion
    WHERE MONTHSTART(Fecha_Transaccion) = FIELDVALUE('TranMes', $(i));

    IF FILESIZE('$(vQvdName)') > 0 THEN // if the QVD already exists, consolidate it
        CONCATENATE (MesTransaccion)
        LOAD *
        FROM [$(vQvdName)] (qvd)
        WHERE NOT EXISTS(ID_Transaccion);
    ENDIF

    STORE MesTransaccion INTO [$(vQvdName)] (qvd);
    DROP TABLE MesTransaccion;

NEXT i

DROP TABLE TempMeses, Transaccion;
This is how we can handle large volumes of data efficiently and optimally in Qlik Sense. There is also an interesting feature for large data volumes called On-Demand Apps, which uses techniques similar to data segmentation but is more comprehensive and oriented towards Big Data. At Wadua, we apply best practices in code and platforms, adapting to the needs of the client.