Thursday, October 20, 2016

Improve pivot table import performance (tdf#102694)

After a short break I got back to Collabora and I continue working on LibreOffice. This time I handled a performance issue in pivot table's import code, as part of our work for SUSE. In some cases importing an XLSX document containing more pivot tables took such a long time, that user could interpret it as a freeze.

After some testing we found that pivot table grouping is one of the points which makes import slow. We means me and Kohei Yoshida, who is an expert in this area of the code and helped me with understanding it. So grouping was in our focus this time.

Pivot table groups

In case of pivot tables we can make groups for columns of the source data. This groups can be name groups (general groups), number groups (e.g. number ranges) and date groups (e.g. grouped by year, month, day, etc.). Since this groups are related to the source, they rather part of the source than the pivot table layout. This is true both for MS Excel and LO Calc implementation. Effectively this means that when we have more pivot tables using the same source they will have the same groups too. In case of LO Calc this groups are stored in the pivot cache linked with the corresponding source range.

Performance

The performance issue here was that however these kind of pivot tables (having the same source) were linked to each other by the pivot cache, XLSX import ignored that and worked expecting pivot tables are fully independent. Which means that same groups were imported so many times as many tables referenced them. What makes it even slower, this kind of tables are linked to each other on a way that when one table's grouping is changed other tables are also affected.

This difference between internal handling of pivot tables an the XLSX import code came from that XLSX import code was written before groups became part of the pivot cache. So I actually needed to update the import code to follow the changes of pivot table internal implementation. With that, pivot table groups' import time became quite good. For example the test document I uploaded to bugzilla (tdf#102694) took more than 20 minutes before and now it takes less than a half of a minute. This document contains a small data table and 20 simple pivot tables. So it's not something which should take so much time to load and now it doesn't.

No comments:

Post a Comment