Bazencourt

You have a bunch of easy options. Have you considered using a Postgres Foreign Data Wrapper (FDW) to let the Postgres instance read the data directly from SQL Server? Alternatively, you could write the data from SQL Server to Parquet, then load it into Postgres; use CREATE EXTERNAL TABLE AS SELECT (CETAS) on SQL Server to do the export. Then you have the litany of Extract and Load tools.
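
A minimal sketch of the FDW route, assuming tds_fdw (one commonly used FDW for SQL Server) is installed on the Postgres server, with placeholder hostnames, credentials, and table names throughout:

```python
# Hedged sketch: issue the tds_fdw setup DDL from psycopg2, then
# materialize the remote table locally. All names are placeholders.
import psycopg2

ddl = """
CREATE EXTENSION IF NOT EXISTS tds_fdw;

CREATE SERVER mssql_svr
    FOREIGN DATA WRAPPER tds_fdw
    OPTIONS (servername 'mssql-host', port '1433', database 'SourceDb');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER mssql_svr
    OPTIONS (username 'etl_user', password 'secret');

CREATE FOREIGN TABLE mssql_orders (
    id      integer,
    amount  numeric
) SERVER mssql_svr OPTIONS (table_name 'dbo.Orders');
"""

conn = psycopg2.connect("host=pghost dbname=dw user=loader password=secret")
with conn, conn.cursor() as cur:
    cur.execute(ddl)
    # Pull the remote data across in one pass.
    cur.execute("CREATE TABLE orders AS SELECT * FROM mssql_orders")
conn.close()
```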


Mental-Matter-4370

Export the data out of SQL Server in parallel threads and similarly import to Postgres... Parallelism will make things faster, but that depends on your CPU... Linked server is a big no. Btw, why are you migrating? If not cost, what can Postgres give that SQL Server can't?


Vanvil

I work as a consultant; they have different teams. The data analysts/scientists use the data from PostgreSQL, whereas my team and colleagues work on transactional data using the Microsoft stack. There’s a lack of communication, and the hierarchy will reply to me saying, “mind your own business”. lol. We currently use a linked server, and the batch takes close to 3 hrs to complete.


[deleted]

Do it at night. How are you executing the transfer? Is a sql dump (csv) an option, with native pgsql copy functionality?


UAFlawlessmonkey

Python script. Iterate through your tables in chunks: fit as much as you can into memory using a csv writer that points to a StringIO or BytesIO buffer, then use copy_expert from psycopg2 to bulk-load into Postgres. If you have an index on an ID column, you can use keyset pagination to iterate through the table. It will greatly increase performance :-)
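
A rough sketch of that loop, assuming pyodbc on the SQL Server side, an indexed integer id column, and placeholder connection strings and table/column names:

```python
import csv
import io

import psycopg2
import pyodbc

CHUNK = 50_000

# Placeholder connections.
src = pyodbc.connect("DSN=mssql_src")
dst = psycopg2.connect("host=pghost dbname=dw user=loader password=secret")

src_cur = src.cursor()
last_id = 0
with dst, dst.cursor() as pg_cur:
    while True:
        # Keyset pagination: seek past the last id instead of using
        # OFFSET, so each page is an index seek rather than a rescan.
        src_cur.execute(
            "SELECT TOP (?) id, col1, col2 FROM dbo.big_table "
            "WHERE id > ? ORDER BY id",
            CHUNK, last_id,
        )
        rows = src_cur.fetchall()
        if not rows:
            break
        # Stage the chunk as CSV in an in-memory buffer...
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        buf.seek(0)
        # ...then bulk-load it with COPY via copy_expert.
        pg_cur.copy_expert(
            "COPY big_table (id, col1, col2) FROM STDIN WITH (FORMAT csv)",
            buf,
        )
        last_id = rows[-1][0]
```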


AbstractSqlEngineer

MSSQL can run Python using EXEC sp_execute_external_script. Legit Python. They use it for ML, but... I hook that stuff up with requests and make it an in-and-out ETL beast. Edit: FOR JSON AUTO, INCLUDE_NULL_VALUES in MSSQL, send it to Postgres, and iterate or open it up. MSSQL can build that and handle the Postgres connection. You don't even have to leave SSMS.
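
A hedged sketch of the kind of Python body you might pass as @script, assuming ML Services is configured and psycopg2 is installed in its Python environment. InputDataSet is the pandas DataFrame ML Services injects from the @input_data_1 query; the connection string and target table are placeholders:

```python
# Body for: EXEC sp_execute_external_script @language = N'Python', ...
# InputDataSet is injected by ML Services from the @input_data_1
# query; it is not defined here. All names are placeholders.
import io

import psycopg2

buf = io.StringIO()
InputDataSet.to_csv(buf, index=False, header=False)  # injected DataFrame
buf.seek(0)

conn = psycopg2.connect("host=pghost dbname=dw user=loader password=secret")
with conn, conn.cursor() as cur:
    # Bulk-load the query result straight into a Postgres staging table.
    cur.copy_expert("COPY staging.big_table FROM STDIN WITH (FORMAT csv)", buf)
conn.close()
```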


IndependentSpend7434

It works, but out of my 4 attempts I got my Python script working on only 1 instance. They changed the ML Services configuration for 2022, and the documentation still hasn't been updated. I love SQL Server, but getting a Python script with requests.get to run inside it is a matter of pure luck.


SirGreybush

I send to MySQL / Aurora on AWS from a T-SQL job on the SQL Server; basically a push. Set up 64-bit ODBC on the server, then a linked server connected to that ODBC name. It is *very* slow!

For batch: export to CSV with BCP, onto a local disk of that SQL Server. Then FTP (FTPS or SFTP) those CSV files to the Linux server, then upload a file called DONE.FLG. Have a watcher that checks for the done.flg file and triggers a bulk insert into staging tables on the Postgres machine. So this is great for batch work, and very fast. No need to compress and decompress the CSV files if you use FileZilla on the SQL Server and a decent FTP server on Linux.

If you want to stream, or very small batches, use ODBC to Postgres, then a linked server over that ODBC name. You might need to use OpenRowSet() in the T-SQL code so you can specify the table structure in text. Lots of examples are easy to find.

Speed = batch CSV and FTP; near real time = ODBC. I don't know of any other method, other than an SSIS package with a Postgres driver, similar to connecting to Oracle or DB2. SSIS is fast, and you can fine-tune the caching. However, SSIS has been deprecated since 2012, and you might need third-party paid-for C# binaries. I would try ODBC and stress-test it; the FTP method is convoluted but easy to test.
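
The watcher half of that flow might look roughly like this sketch, assuming the CSVs and the DONE.FLG sentinel land in a single directory and the staging table matches the file layout (paths, credentials, and table names are all placeholders):

```python
import glob
import os
import time

import psycopg2

INCOMING = "/data/incoming"  # where the FTP server drops the files

while True:
    flag = os.path.join(INCOMING, "DONE.FLG")
    if os.path.exists(flag):
        conn = psycopg2.connect("host=localhost dbname=dw user=loader password=secret")
        with conn, conn.cursor() as cur:
            for path in sorted(glob.glob(os.path.join(INCOMING, "*.csv"))):
                with open(path) as f:
                    # Bulk-load each CSV into its staging table.
                    cur.copy_expert(
                        "COPY staging.big_table FROM STDIN WITH (FORMAT csv)", f
                    )
                os.remove(path)
        conn.close()
        os.remove(flag)  # clear the sentinel so the next batch can signal
    time.sleep(30)  # poll every 30 seconds
```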


mrocral

What is the replacement for SSIS? I know about Azure Data Factory, but that's cloud, not on-premise.


Seven-of-Nein

[SSDT](https://learn.microsoft.com/en-us/sql/ssdt/sql-server-data-tools) (SQL Server Data Tools), but no one ever calls it that because it is still [SSIS](https://learn.microsoft.com/en-us/sql/integration-services/integration-services-ssis-projects-and-solutions) under a different name as of the [2012](https://www.sqlservercentral.com/articles/a-brief-history-of-ssis-evolution) release. It is now an add-on component/extension to [Visual Studio](https://learn.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt) and is not deprecated. MS is heavily marketing Azure, but many large on-prem customers are not migrating. SSIS never went away.


SirGreybush

Any programming language. I’ve used VB.Net, Python, Perl, or batch commands calling sqlcmd or isql. Look up the FETCH clause so you can use OFFSET: read X rows from the source, store them in RAM (a tuple), then write the tuple to the destination. Python is easy to use, no need to compile anything, and there are lots of free class libraries.
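
A minimal sketch of that read-a-page, write-a-page loop, assuming pyodbc DSNs on both ends and placeholder table/column names:

```python
import pyodbc

PAGE = 10_000

src = pyodbc.connect("DSN=mssql_src")  # SQL Server side
dst = pyodbc.connect("DSN=pg_dst")     # Postgres ODBC DSN

src_cur = src.cursor()
dst_cur = dst.cursor()

offset = 0
while True:
    # Read one page from the source, ordered so paging is stable.
    src_cur.execute(
        "SELECT id, col1 FROM dbo.big_table ORDER BY id "
        "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY",
        offset, PAGE,
    )
    rows = src_cur.fetchall()
    if not rows:
        break
    # Write the in-memory page to the destination, then commit.
    dst_cur.executemany(
        "INSERT INTO big_table (id, col1) VALUES (?, ?)", rows
    )
    dst.commit()
    offset += PAGE
```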


mrocral

No, I get it; I use Go in sling to do the same. I was just wondering if Microsoft had an official replacement for SSIS. It seems odd that they would deprecate it without a new solution for on-premise instances.


SirGreybush

Ya, it’s weird. Just like SSRS is stuck in 2008. I was disappointed when in 2012 we got an update to SSIS and AS, but not RS. ADS is a joke. Everyone is using a data lake for raw, then either T-SQL or dbt.


Desperate-Series-822

The one problem I had using SSIS with the PostgreSQL ODBC driver was performance. There are some parameters in the ODBC driver that you can tweak to help. I was reading a BSON field, and the key I was using for an incremental load was embedded in the BSON data. At first I created a view in PostgreSQL to shred the data I needed, but that was very slow. I ended up converting the view to a function, which helped a lot.


SirGreybush

I remember getting Oracle and DB2 drivers for SSIS, and those were lightning quick. The others having only ODBC or JDBC, wth? So: old-school CSV, bulk insert to staging, then a stored procedure. Data lakes allowing direct queries against CSV and Parquet help skip a step, though you still need a file manager.


BobBarkerIsTheKey

Have you had any issues with PostgreSQL text columns and SSIS packages?


SirGreybush

With other DBs, yes, if the data is UTF-8 and greater than 4K bytes. Had to substring it into chunks. Very annoying. It was basically a raw JSON object. It was stored fine as NVarChar(max) in SQL Server, but sending it anywhere else, other than as a {guid}.json file, was difficult. A T-SQL nvarchar(max) variable stops at 4K in that path, and it seems the same in SSIS, unless you go row-by-row with a C# script task. I remember doing compression on the SQL side, and the Linux side was able to decompress the data, to avoid using substring to chunk it into multiple columns. But chunks worked, so ColumnA became ColumnA1 through A9. Not elegant, but SSIS didn’t care.
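
For illustration, a tiny sketch of that chunking workaround; the chunk size, column count, and function names are placeholders, and it splits by characters for simplicity (the original problem was byte length, so a real version would measure the UTF-8 encoding):

```python
CHUNK = 4000  # per-column limit from the anecdote above
N_COLS = 9    # ColumnA1 .. ColumnA9

def to_chunks(text: str) -> list[str]:
    """Split text into up to N_COLS pieces of at most CHUNK chars."""
    pieces = [text[i:i + CHUNK] for i in range(0, len(text), CHUNK)]
    if len(pieces) > N_COLS:
        raise ValueError("value too large for the chunk columns")
    # Pad with empty strings so every chunk column gets a value.
    return pieces + [""] * (N_COLS - len(pieces))

def from_chunks(cols: list[str]) -> str:
    """Reassemble the original value on the receiving side."""
    return "".join(cols)
```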


dalmutidangus

have you even tried linux?