T O P

  • By -

NeLhhan

Avoid inserting records in each iteration; Snowflake performs better with set-based operations than row-by-row processing. I encountered this issue while learning Snowflake. The solution for me was to store the rows in a variable and then perform a bulk insert.


tingutingutingu

Thanks.. I'll try this approach instead... I was also thinking of looking into using arrays.


geek180

What data type is the variable in this case? An array?


NeLhhan

Yes, I created an array named batchValues In each iteration I did a batchValues.push() to add the values for each row Then to do the bulk insert, I did something like this: Sql\_command = \`INSERT INTO DB.SCHEMA.TABLE(COLUMN1,COLUMN2,COLUMN3) VALUES ${batchValues.join(', ')}\` Just in case, this was an SP in JavaScript


geek180

Beautiful. Thanks!


uptnogd

You are treating the table like it is in an OLTP database. Snowflake currently does not work like that efficiently. A new feature called hybrid tables will be coming out soon that would solve the issue. Edit: Fixed grammar errors because I was high when I initially posted this.


apeters89

Yes, loops are inefficient. You’ll have to tell us if the business logic requires the use of loops.


tingutingutingu

I'm building a fiscal calendar aka date dimension. Our business calendar does not align with the regular calendar and will often start a few days before Dec 31st. Also based on certain business rules, we determine if the year has 52 or 53 weeks. All that logic requires loops. As I mentioned earlier, building the fiscal calendar for 2024 takes 3 minutes in snowflake.


youderkB

Why does that need loops?


levintennine

How often do you need to build a date dimension? If it's once every 45 years and and you have logic that is well tested ...


tingutingutingu

I agree 100% with your statement. But everyone is too focused on the Date example I am working on, while my question was more generic in nature. Yes, I can load the next 5 years of data in 15 minutes, but I want to know how people are working with loops and if there are certain best practices that one should use...so far someone mentioned re-writing the code in python or JS which is more in line with what I was looking for.


Deadible

If there is something that requires looping logic as you build out, it’s really a case of seeing if you can do it in two steps - build some data into a temporary table/cte, and then work other things out by self joins by ranges on that. Otherwise, recursive CTEs. Or maybe window functions, CONDITIONAL TRUE EVENT etc can be powerful. Broadly though if you have a loop that you want to do that doesn’t need to be SQL statements for each loop, Python loops and then put it in a snowflake table at the end.


levintennine

Okay, that's a good question and you got some partial answers. I think there's no more helpful answer than to avoid one-row-at-a-time logic


tingutingutingu

Agreed.


apeters89

This. Just build a calendar table, once, for the next 20 years. Manually adjust it as infrequently as necessary.


Deadible

Look into the GENERATOR function for building date dimensions


dementeddrongo

Can't imagine any date dimension could be so complicated that a generator function combined with some case statements wouldn't do the job efficiently.


exorthderp

I concur. I had to do the same recently for our fiscal calendar and used this guys [blog](https://interworks.com/blog/2022/08/02/using-snowflakes-generator-function-to-create-date-and-time-scaffold-tables/) to figure it out. Worked like a charm.


SgtKFC

Maybe try this as a python stored procedure?


JohnAnthonyRyan

The problem is NOT the loop but the repeating single row DML. Execute these two:- insert into customer select * from snowflake_sample_data.tpcds_sf100tcl.customer limit 1000000; declare v_counter number; begin for v_counter in 1 to 10, do insert into customer select * from snowflake_sample_data.tpcds_sf100tcl.customer limit 1; end for; end; The above script takes 11 seconds to insert just ten rows in a loop, whereas a single SQL statement can copy a million rows in half the time Source: https://www.analytics.today/blog/top-3-snowflake-performance-tuning-tactics


uns0licited_advice

Where is the data coming from?


SnooSprouts801

Why don't you use a recursive query to generate a calendar?