Power Query: Xlsb vs Xlsx

Recently I helped my colleague with a report. The task was to consolidate data from multiple Excel files, each contains 100k-500k rows. Hence, size of each Excel file is relatively big.

I proposed to use xlsb to store those files. One file per month, report needs R24m, so 50 MB is less than 150 MB.

In the end I finished with deep analysis what is better as Excel-files database for Power Query – XLSB or XLSX.

Resume

For PQ solutions use XLSX instead of XLSB

PQ from XLSX twice faster than from XLSB

Engine handling XLSB from time to time return error – cannot read file

“UseHeaders” parameter of Excel.Workbook function works differently for xlsx and xlsb

If sheet and table names are equal in Excel workbook – Navigator generates unique name of object.

If you interested how I revealed all this – continue reading.

Files for tests

I thought that PQ works faster with smaller xlsb files, but I was wrong.

In the same time I knew that some people had problems with xlsb, e.g. issue.

Nevertheless, I started with xlsx with idea in mind to switch to xlsb later.

First surprise for me was that if sheet and table on this sheet have same name – PQ engine renames table

Don’t worry, despite in Navigator we see DATA1, M-code access table object with name DATA.

let

Source = Excel.Workbook(File.Contents(“C:\Temp\PQ Samples\2015 09.xlsx”), null, true),

DATA_Table = Source{[Item=”DATA”,Kind=”Table”]}[Data],

#”Changed Type” = Table.TransformColumnTypes(DATA_Table,{{” Lcurr.”, type text}})

in

#”Changed Type”

Then I saved xlsx file as xlsb

And tried to apply same code

let

Source = Excel.Workbook(File.Contents(“C:\Temp\PQ Samples\2015 09.xlsb”), null, true),

DATA_Table = Source{[Item=”DATA”,Kind=”Table”]}[Data],

#”Changed Type” = Table.TransformColumnTypes(DATA_Table,{{” Lcurr.”, type text}})

in

#”Changed Type”

Got an error and analyzed query step-by-step.

First step – get Excel file structure – similar formula – different result

xlsx
xlsb

Structure is different. We see “DATA” object for xlsb, but it is not clear – is it a table of sheet.

I renamed sheet and checked this step once again

OK – column Name contains sheet names.

Second step – Navigation – didn’t work for xlsb

xlsx
xlsb

Error, as expected, because first step for xlsb doesn’t contain Kind and Item columns.

Not a big problem, just need to fix navigation step.

Nevertheless, this step that doesn’t allow simply switch from xlsx to xlsb.

One click on Table in Data column and we see result:

Now you probably want to apply Table.PromoteHeaders function, but wait, read description of Excel.Workbook function

Second parameter – useHeaders, will help to avoid additional transformation step. Turn it on and query XLSB

Wow, this was not expected.

Be careful with this parameter – some characters will be replaced by # if you work with xlsb.

Dot – definitely, rest special characters – I didn’t test.

Let’s save solution with PromotedHeaders and check speed.

Transform query into function

Let

fGetXlsb = (path as text) =>

let

Source = Excel.Workbook(File.Contents( path ), null, true),

sheet = Source{[Name=”DATA”]}[Data],

#”Promoted Headers” = Table.PromoteHeaders(sheet)

in

#”Promoted Headers”

in

fGetXlsb

Do the same for XLSX and load data to Data Model

xlsx First try – 43 sec

Next try: 38 sec

xlsb Oops, first try gave me

I saw such error several times in my practice of work with XLSB

Second try – 84 sec.

Next try: 82 sec

Simple VBA code that check Power Query refresh with data load time

Sub test_time()

Dim start as Single

start = Timer

ThisWorkbook.Connections(“Query – QueryXLSB”).Refresh

Debug.Print “xlsb: “; Timer – start

start = Timer

ThisWorkbook.Connections(“Query – QueryXLSX”).Refresh

Debug.Print “xlsx: “; Timer – start

End Sub

What I also noticed – cosmetic difference in PQ editor

xlsx PQ editor shows MB

xlsb PQ editor shows ROWS

I saw somewhere – “DAX is easy but not simple”, same is relevant to Power Query.

Advertisements

5 thoughts on “Power Query: Xlsb vs Xlsx

  1. Daniil Maslyuk 2016-12-16 / 09:21

    Ivan, excellent post!

    I’ve been working on a solution that combines data from several files with macros in them. My first choice was XLSB, because it can contain macros, and it is usually smaller than XLSM.

    I ended up switching to XLSM (essentially XLSX with macros support, right?), because of the above-mentioned “External table is not in the expected format”.

    Also, occasionally my consolidating file was opening one of the consolidated files! I still have no idea why one of my consolidated files (always the first one alphabetically) was being open.

    Thanks for the post!

    • Ivan Bondarenko 2016-12-16 / 20:54

      Hi Daniil, thanks for positive feedback. Advice is simple, if you want to combine data with Power Query, use xlsx or xlsm. Seems PQ works better with these two formats.
      However, if you want to continue with VBA, than xlsb is a better option, as it has smaller size and is opening faster.
      What can open a file is usually a macro… maybe you have it on Workbook_Open event… Hard to guess.
      Good luck with your solution!

  2. AshT 2017-02-04 / 06:43

    Evan – Great post. Its really useful to know about this different. It saved a lot time wastage due to file format issues.

  3. Wan Hadi 2017-08-08 / 04:34

    Thanks for the info.. really helpful

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s