Power BI – Quick Tips working with multiple large CSV Files

I have recently been working a lot with multiple larger type CSV files, and thought it would be good to share what I have found that makes the process run faster. I am always looking at ways to make sure that not only is the model efficient, but also the development experience.

Name your query first before applying steps in Query Editor

I have found that when working with csv files, if you rename your query in the Query Editor, it will then go and complete all the Applied Steps in your query. This in my case meant waiting a few minutes for a simple rename to take effect.

So always remember to give your query a name first, before doing any additional steps.

If possible combine all csv files from separate files into ONE file

I found that when using the Folder to import my CSV files, that when I put all the data into One file it loaded the data significantly faster.

Disabling Background Data in Options

When working with multiple tables that all load off the same CSV files, I found that Power BI desktop would start trying to refresh data in the background, and it would either make the Query Editor I was working on very slow. Or I would have to wait until the background data preview to complete. By turning this off it meant you only would refresh the data you are working on.

To disable this click on File, then Options and settings

Then click on Options

Then go down to the section called CURRENT FILE and click on Data Load

Now in the right hand pane under the section Background Data, remove the tick from Allow data preview to download data in the background. So that once complete it looks as shown below.

Then click Ok.

Now to ensure that it does take effect I would suggest closing and then opening Power BI Desktop.

Advertisements

7 thoughts on “Power BI – Quick Tips working with multiple large CSV Files

  1. Is there any documentation or experience on exactly what the different data load options do in terms of performance – particularly on loading a pbix file from scratch. Use case is 2 x years of locally stored daily 30MB csv files each containing 100,000 records. Final queries are broken down into historical data (not refreshed) and current data (for last month refreshed daily) – but even this takes two hours on a 16GB machine….any suggestions greatly appreciated.

    • Hi there Patrick

      All the suggestions as part of the blog post really do help. If possible I would suggest putting all your historical data into one CSV file which should then increase the loading performance for the historical data. Another option is that you could create one table called Historical which has all the historical data, and right click on the table and de-select “Enable Load” and de-select “Include in Report Refresh” (What this will do for the historical data, is it should load it once and be part of the model, but remove the requirement for it to be refreshed. Also by disabling the loading it will not the data into the data model). Then have another table which has your current data.

      Finally Merge the 2 tables together, which should hopefully increase the performance loading.

      Another thing that I did do, was when I was developing to ensure that the files were as close (or local) as possible to the Power BI Desktop application. This is so that I could reduce the slow speed when going across the network. If you have it on a server, make sure that you can connect via the LAN and that the Server has got potentially Jumbo Frames enabled.

      Also do you have a lot of transformations happening after you import the data? As this could also potentially slow it down (which will be seen by a High CPU Usage) if this is the case the High CPU (potentially running at 100%) could be another bottleneck to look at.

      And finally there is the potential to have an On-Premise Gateway installed on a Server (with the Power Plan set to High performance). The server I would hope would have decent CPU’s, Memory and connectivity to the source data. This is because if you refresh from the On-Premise Gateway, where the Gateway is installed this is where it does all the refreshing.

      Sorry one last point, is if possible an alternative is to load the CSV data into a SQL Server database, where you can manage the data better using SSIS, and then simply consume it into Power BI using SQL Server as the source? SQL Server and SSIS would be a lot more efficient at not only loading the data, but as well as making sure it only loads the new data and not the older data.

      As well as if you are looking to use Power BI Premium they are going to have an option for Incremental loads (coming soon) which will also help the loading of your files.

      • Absolutely agree with split and only loading current data – and have already tried breaking the files into monthly gulps but it seems that about 10 x files (approx 1M rows and 300MB) is the practical limit for containment within the constraints of RAM and CPU locally. I’m doing all of this on the local box until I get the balance of performance and response right – then plan to move the 2 x data sets to the cloud so the historical data will never have to go through the query transformations again !

  2. That is good to know that you have gone through the pain of testing them out as individual files. One thing that I have not tested is if the performance is the same, better or worse if they are excel files? No idea, but I wonder if it will be worth all the effort.

    • Absolutely – Thanks for that !

      I should have also pointed out that the 30MB daily csv actually contains 2 effective data tables (once transformed) – so it might be a case of separately pre-processing (multiple times) in Excel to the limit of the 1M rows as a one off for the historical data….

      Cheers

      Pat

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s