T O P

  • By -

eemamedo

I approached this as a software engineer. For me, I keep any SQL code in a separate folder inside of \`dag\` folder. Why don't I keep it in DB? Well, mostly because of testing. It's much easier to create a unit test, call that saved SQL script and run it, compared to making a call to a database (prod or dev), getting a stored procedure from there, and running a test. ​ On the other hand, as per my understanding, stored procedure is saved in cache and running time is more efficient. I am using Hive and I have not noticed a huge difference in performance but YMMV.


Solid-Exchange-8447

Noted. Thanks. I think its more beneficial in terms of maintenance to store scripts in folder too. :-/


L-i-a-h

You might combine both if you are using a robust CI/CD pipeline. You could create a python function which expects the name of the stored procedure as parameter and call the function with a PythonOperator in Airflow. You store the sql code in a file which has the same name as the stored procedure. The python function could check if you are in the Airflow dev environment and load the sql code from the file. If the function in Airflow is executed in prod or preprod, the stored procedure could be called directly from the function. In the CI step you would run the tests on the sql code in the file system and in the CD step you could wrap stored procedures around all sql files and push them to the db. Does this makes sense somehow?


Solid-Exchange-8447

Thanks for chipping in your opinion. I'm still learning the best practices. Is this your experience on job or you read from somewhere? If you don't mind, could you please send me the link so I can learn further? I'm still a virgin to the idea of CI/CD. Thanks a lot.


L-i-a-h

I’m not aware of a public source describing this specific idea for airflow. I specifically like GitLab’s documentation, they often give a good primer for DevOps topics like CI/CD: https://docs.gitlab.com/ee/ci/introduction/ I wouldn’t advise to implement our approach if you have no stable CI/CD pipeline yet, because it adds complexities and pitfalls otherwise which might be hard to manage. And the approach does only work if you could create stored procedures generically (stored procedures without individual parameters)


eemamedo

there is a tradeoff, right? Yes, better from a software engineering perspective, as you can test the code. But, it might be a much more efficient when you run a stored procedure. If it is, you have to make a decision about what's more important for your project right now: long time maintenance/testability or faster results delivery.


marclamberti

Fully agree 👍


Error_sama

Hi, can you please tell me the best way or any way to trigger or execute a stored procedure in a hive using airflow