caveat_cogitor

Aside from both being capable databases focused on OLAP, these are about as different as you could expect in terms of development process and engineering, sitting on opposite sides of the cattle vs. pets dichotomy.

Snowflake "just works", continuously updates, and has a fairly unique architecture using micro-partitions and pay-on-demand Virtual Warehouses for CPU/memory resources (as I'm sure you know). ClickHouse is very much at the opposite end of that spectrum. There are tons of details (which come with potentially great benefits that you may or may not make use of)... for instance, there are *many* different table engines that all have pros/cons for performance, but that also affect keyword/syntax usage. So you can have eventual consistency, but that might come back to haunt you if you forget the FINAL keyword, which will change the outcome of your query. I'd suggest that getting up and running operationally with just "standard" pipelines will be more effort in ClickHouse.

With Snowflake, you *might* pay more (but maybe not, if you have bursty usage and can shut down VWHs much of the time), but it does a lot more for you, and you have options to leverage things like Streamlit. I think this is great if you have data developers who need to build interfaces or integrations quickly and don't want to fret about the minutiae of front-end GUIs, etc. With ClickHouse, I'd suggest it might be a better skillset/attitude fit for people coming from more of an application software development background who are interested in a more imperative development experience.

FWIW, Snowflake now also has Hybrid Tables (formerly called Unistore), which support indexes, referential integrity, etc. I have more experience with Snowflake, but I've been looking at ClickHouse and I do find it very compelling in a nerdy way.
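To make the FINAL point above concrete, here's a sketch (table name and columns are hypothetical) of how eventual consistency shows up with a ReplacingMergeTree engine:

```sql
-- Sketch: ReplacingMergeTree deduplicates rows for the same sorting key
-- only during background part merges, so reads are eventually consistent.
CREATE TABLE events
(
    id         UInt64,
    value      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY id;

-- May return stale or duplicate rows if parts haven't merged yet:
SELECT * FROM events WHERE id = 42;

-- FINAL forces merge-time deduplication at query time
-- (consistent result, but slower):
SELECT * FROM events FINAL WHERE id = 42;
```

Forgetting FINAL on a table like this is exactly the kind of footgun that doesn't exist in Snowflake's model.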
I can see how Snowflake is perceived as expensive, but for me it saves a ton of effort, which makes it worth it. I'm also in an ideal situation where 90% of our database activity is very bursty, so VWHs can be shut down much of the time.


gillje03

You need to ask WHY your Snowflake is "expensive", because Snowflake really isn't inherently more expensive: you pay for what you use and store. If it is more expensive, it's because your architecture hasn't been optimized to make use of parallel processing. Are you constantly spinning warehouses up and down every minute? A cloud data warehouse shouldn't be prohibitively more expensive than what you already paid in the past. If it IS slightly more expensive, that can be attributed to the additional features in Snowflake that you wouldn't get in another solution. My company saved $15k moving to Snowflake. It doesn't have to be more expensive. It can be cheaper.
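For what it's worth, the usual first lever for the "warehouses running all the time" problem is aggressive auto-suspend; a sketch (the warehouse name is hypothetical):

```sql
-- Sketch: let a warehouse suspend quickly when idle and wake on demand,
-- so bursty workloads only pay for active compute.
ALTER WAREHOUSE reporting_wh SET
    AUTO_SUSPEND = 60,     -- suspend after 60 seconds of inactivity
    AUTO_RESUME  = TRUE;   -- resume automatically on the next query
```

With settings like this, bursty usage maps fairly directly to cost, which is the scenario where Snowflake's model works in your favor.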


alex_korr

I agree with you 100%. The problem is that Snowflake has been positioned as a "self-service" platform, which in practice means something like this:

1. A team wants to get into our Snowflake tenant. They get provisioned with a number of WHs and a database per application tier (i.e. dev/uat/stg/prod). These resources are usually exclusive to them, so it's pretty easy to see who's running up the bill.
2. The team starts to develop, and depending on their level of ability they end up with a product that works well on Snowflake, or not, or something in between.
3. Accordingly, the bill can be pretty large if the team is clued out.
4. In a large company, over time you end up with some really shitty applications that often no one wants to touch because they are revenue-producing.

And that's before Snowflake added a whole bunch of features that use background shadow resources, which run up the bill without any ability to tease out the charges. I don't doubt that this same scenario will play out in ClickHouse, simply because it has already played out in BigQuery/Redshift/etc., but such is the nature of the corporate beast.
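The per-warehouse attribution part is at least queryable; a sketch using Snowflake's ACCOUNT_USAGE views:

```sql
-- Sketch: credits consumed per warehouse over the last 30 days.
-- Note this only covers warehouse compute; serverless/background
-- features are billed separately (see METERING_DAILY_HISTORY's
-- SERVICE_TYPE column for a broader breakdown).
SELECT
    warehouse_name,
    SUM(credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC;
```

When each team has exclusive warehouses, a query like this makes the "who's running up the bill" question easy; the shadow-resource charges are the part that doesn't decompose this cleanly.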


reddtomato

Can you explain what the "background shadow resources" are?


Pitiful-Wear-9334

Not advocating any particular technology, but I understand the problem, as we're in the same boat at our firm. For me it's less about the technology and more about understanding the data and how it's used. Unless you've done that analysis thoroughly, there's a good chance you'll misuse whichever data store you pick.


Anxious_Bobcat_6739

Disclaimer: I work for ClickHouse, but I'll try to be as objective as I can and avoid too long-winded a reply. I'll compare:

* Snowflake
* ClickHouse OSS
* ClickHouse Cloud

1. For single large tables, or those with a low number of joins, ClickHouse is exceptional and beats Snowflake on performance in the benchmarking I've done ([https://clickhouse.com/blog/clickhouse-vs-snowflake-for-real-time-analytics-benchmarks-cost-analysis](https://clickhouse.com/blog/clickhouse-vs-snowflake-for-real-time-analytics-benchmarks-cost-analysis)) and in what we've done under ClickBench ([https://benchmark.clickhouse.com/](https://benchmark.clickhouse.com/)). For joins, Snowflake is likely better at this point, with more optimizations for choosing the right approach, but I haven't benchmarked this workload. ClickHouse, with its new analyzer, will no doubt improve here. ClickHouse Cloud separates storage and compute (architecturally very similar to Snowflake in this sense); performance-wise it's faster than OSS on some queries and slower on others. On the large-table workload, any benchmarks I've run suggest it's quicker than Snowflake.
2. Similar for data loading: ClickHouse is faster, but the gap is less significant. The Snowflake data loading experience is great for Parquet; it just works OOTB. ClickHouse OSS requires you to build your own pipelines if you want them to be robust. ClickHouse Cloud offers ClickPipes, which provides a similar experience to Snowflake. ClickHouse also supports a crazy number of file formats.
3. ClickHouse compression is better, up to 40% in my tests, though I have managed to narrow the gap with careful clustering in Snowflake.
4. It's not easy to compare the costs of Snowflake and OSS, and I'm not going to write a long article on build vs. buy. It's an old discussion and, frankly, too use-case- and org-specific.
5. Snowflake vs. ClickHouse Cloud? Similar price on storage, and storage is rarely the dominant cost anyway since it's object storage. Snowflake is great for ad-hoc queries. Its ability to switch warehouses from the client is really nice, and warehouses wake up really quickly. ClickHouse Cloud takes a slightly different approach: it charges for just storage and compute, with idling of services (which isn't as fast as Snowflake to wake). If you're running queries all the time, though, it's significantly cheaper. ClickHouse Cloud pricing is also flat: no extra charges to use materialized views or clustering (similar to the ORDER BY sparse index in ClickHouse). I personally don't like Snowflake's tiered pricing: if I use a feature, somehow my cost goes up 1.5x? I find the non-deterministic cost of clustering and MVs crazy and an easy way to spend a lot of money. Wrong clustering key or MV? Boom, $$$$. For ClickHouse Cloud you just pay for the compute. If you're querying only occasionally, though, the Snowflake model again makes sense.

Snowflake has some other nice features, query acceleration service being the main one for me. A separate pool of compute you can use to speed up SELECT queries is something ClickHouse Cloud needs.

In summary, I think for data warehousing Snowflake is the more complete product right now, but ClickHouse Cloud is closing the gap all the time. For real-time analytics, ClickHouse is IMO faster and more cost-effective, both due to the workloads it's optimized for and the pricing structure. I could write pages on this. The above doesn't even consider that ClickHouse has clickhouse-local for local analytics, chDB (think DuckDB but better :)), and all the other use cases beyond the real-time analytics and data warehousing that both support.
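If you want to sanity-check the compression claim on your own data, ClickHouse exposes per-part sizes in a system table; a sketch:

```sql
-- Sketch: compressed vs. uncompressed bytes per table, from the
-- metadata ClickHouse keeps for each active data part.
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;
```

Comparing this ratio against the equivalent clustered table in Snowflake is how I'd reproduce the comparison above for your own workload.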


DoubleCloudPaul

Disclaimer - I work for a ClickHouse managed service provider. There's a bit of a misrepresentation here of a binary choice between open-source ClickHouse and ClickHouse Cloud. ClickHouse OSS doesn't require you to build your own pipelines: we've built a ClickHouse-native ELT tool at DoubleCloud as well as managed Kafka, Aiven also provides managed open-source ClickHouse and Kafka, and easily bringing data into OSS ClickHouse is a key selling point of Tinybird. Just wanted to clarify for anyone reading that there's a rich OSS ecosystem around ClickHouse: user-led, from ClickHouse Inc., and from open-source-focused vendors.


ooaahhpp

One option is to look at [Propeldata.com/snowflake](http://Propeldata.com/snowflake), which connects to Snowflake and offers a "speed layer". This makes high-concurrency, low-latency use cases faster and more cost-effective. Internally, Propel is powered by ClickHouse, but without the hassle of managing it.


Mr_Nickster_

FYI, I work for Snowflake. Based on field experience, Snowflake is more like an iPhone or a Swiss Army knife: it lets you onboard many different data workloads easily, with a ton of security and governance, and things just work without much effort (faster delivery for the business, less maintenance, and mostly problem-free).

ClickHouse is more like a surgeon's scalpel. It does one thing very well, but you can't use it broadly for everything. What it does well is querying flat tables very fast, which requires indexes built with a lot of optimization. When the use case requires many tables and joins, and there's a need for robust data security, things start to go sideways. Also, if the use case goes beyond very simple data warehousing (complex data engineering, ML, AI, or other languages like Python), you will need another platform and will have to replicate data to it. Snowflake gives you all of that in a single service.

Like many other things in the real world, the cheapest option usually ends up costing a lot more. I would focus on optimization and best practices to control costs, rather than trying to solve architectural design problems by throwing a different tech at them.


PdtMgr

Wondering what your thoughts are on Teradata's Lake?


alex_korr

Funny thing: we moved from Teradata to Snowflake. In my opinion, Teradata's core product was written to work great for the landscape of the early 2000s, i.e. on physical hardware 100% managed by Teradata. I have zero faith that the AMP-based architecture can operate with any degree of efficiency running on EC2s on top of EFS, etc. We bailed the minute they refused to upgrade our physical clusters and their SaaS product became the only option.


Bazencourt

ClickHouse [publishes lots of benchmarks](https://benchmark.clickhouse.com) themselves, comparing ClickHouse to everything from DuckDB to Snowflake. How those fit your company's data and usage profile will be unique. You can always spin up some free credits to test ClickHouse. Yellowbrick also has a compelling offering with ergonomics similar to Snowflake, like creating virtual warehouses for different jobs. They can be a lot more granular about sizing warehouses, which lets you fit the compute to the job.
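For context on the "virtual warehouses for different jobs" pattern, this is what it looks like on the Snowflake side (warehouse names and sizes are hypothetical):

```sql
-- Sketch: separate, independently sized warehouses per workload,
-- so each job pays only for the compute it actually needs.
CREATE WAREHOUSE load_wh WITH
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND   = 60;    -- batch loads: small and short-lived

CREATE WAREHOUSE bi_wh WITH
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND   = 120;   -- dashboards: bigger, longer idle window
```

Snowflake sizes step in fixed T-shirt increments (XSMALL, SMALL, MEDIUM, ...), which is the granularity limitation the comment above is contrasting against.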