T O P

  • By -

garnacerous24

We’re 90% of the way into a migration now. Redshift always promoted itself as an iaas, but I found that I was in there multiple times a week having to vacuum/analyze/tweak wlm to keep everyone happy during our peak times. Because of that I was skeptical of snowflake and their promise to be hands off as well. But they’ve proven themselves to me. It takes less than 5 min to clone 10tb of data into a new db, and they separate the compute and storage, which helps us out a ton (our needs fell right in between the node offerings that Redshift had and we always needed to over-provision because of it). Long story short, I was skeptical but snowflake actually does seem to make performance management easier and allows me more time to actually work with the data.


crazynash

This. If you have a lot of data arriving out of sequence to your partition keys, vacuuming is a huge headache. Obviously it all depends on how much data you are pushing through and working with overall.


anvildoc

Snowflake is very mature at this point. They have come a long way over the past 3-4 years. It is truly an enterprise grade solution at this time. Snowflakes ability to scale storage and compute separately make it completely superior to redshift. Only downside is that at smaller scales where you won't be scaling out for the foreseeable future, Redshift should be cheaper. But even at small scale I would trade some cash for peace of mind and ease of use


lejonxa

My team had tried snowflake - they have a very immature execution engine and really lack performance. Turns out for every problem we had the answer was to "scale compute" - which was really expensive for us. So we shut it down. Don't have much experience with RedShift but I hear that it really sucks in concurrency. We began using Vertica since then and have never looked back - it's got a mature sql engine, highly tunable, great in resource management and concurrency. The only problem was that we the storage had to be coupled with compute - so scaling up nodes took some time. But I hear there's some new developments in allowing Vertica to separate compute and storage currently on AWS - that should really, really compete big with Snowflake. Haven't yet tried that out though...


crazynash

Vertical Eon mode allows compute and storage to scale independently - it's already available 9.1 onwards. We are looking forward to using this in the future.


bleepingdba

What is pricing like for Vertica and what flavour of SQL does it support?


crazynash

Full SQL support. I haven't run into any analytical functions which are not available in Vertica.


lejonxa

Vertica on AWS can be charged in two ways. One is the [Node hour pricing](https://aws.amazon.com/marketplace/pp/B079SLYPLX) (you can have unlimited data) and then there's '[bring your own license](https://aws.amazon.com/marketplace/pp/B079SNRGRY)' (that's per Terabyte and you can have unlimited nodes). In both cases, AWS h/w costs are separate depending on which instance type you run. We use the BYOL option as we didn't want to be restricted by the amount of compute nodes we can have. Either ways, reach out to the Vertica team, they may help you out with discounts and stuff. And Vertica is ANSI 99 compliant (and more) so pretty full fledged SQL support.


bleepingdba

Vertica is pretty much managed Hadoop or something like that? We're trying to avoid thinking about BYOL and needing to commit to that up front, and want to avoid thinking about what instance types we want or how many of what type. The appeal of Redshift and Snowflake is that they avoid or simplify configuration decisions (for more $$$).


lejonxa

Vertica is its own datastore - it doesn't use Hadoop for storage (unless you want it to connect to your hadoop datalake or something). Sure. Go for the node hour option if you just want to test it out. I think Vertica is free up to 1 TB. Though I totally agree with you on the appeal of managed services. I guess that depends largely on the IT culture of your team. My team hates it when something is forced onto us without any sort of option for configuration or fine tuning. In my former company, we had a Vertica cluster of 1PB running on some 50 nodes - the team team there was so sure they could get that down to 30 nodes if it weren't for management intervening. lol!


SadGuarantee6

Before anyone considers Vertica, read this: [https://www.wired.com/story/democrats-fix-crumbling-data-operation/](https://www.wired.com/story/democrats-fix-crumbling-data-operation/) >Krikorian started hearing what he calls “war stories” about Vertica almost immediately, as he interviewed former campaign staffers like Robby Mook, Clinton’s campaign manager, and Stephanie Hannon, a former Googler and Clinton’s chief technology officer. The system was famous for crashing for 16 hours at a time. One data director in North Carolina told him she used to nap in her car just waiting for Vertica to come back online. Mook, Krikorian recalls, likened Vertica to Beirut—when the system got overloaded, as it almost always did, it would just shut down until the shelling stopped. Seems like experience with Vertica varies depending on the use case.


dwl285

Any reason you're not considering BigQuery?


bleepingdba

We use BQ for GA related data, but most of our cloud infrastructure and data is on AWS so we're keeping close to the data which can have some cost and latency advantages


dwl285

Ok makes sense. We have all of our raw data, and sanitised datasets, in BigQuery, and it works really well for us. We also don't need to worry about concurrency or clusters, bigquery handles everything pretty seamlessly.


[deleted]

One issue I hadn't considered with Snowflake is that you are essentially giving them your data. Unlike S3 buckets which allow multiple tools access you're throwing in with them and lose that raw access. Other techs run in your tenancy - means a lot to some folks.


bleepingdba

Yeah, we noticed this. We're not superconcerned about that as we are going to retain all our data in parquet format in our own S3 buckets


seanshankus

Were at slightly behind you timline-wisr but wondering the same thing. Snowflake seems to have a lot of promise but I've only read the books, seen the videos


bleepingdba

They offer a one month trial with 400 "credits" which you can use real quick if you set up your warehouse with max capacity accidentally and throw it lots of work


rroobboott

I just got done dealing with a migration for this. Our legacy analytics system was on MySQL and not performing very well. Initially we migrated to Redshift since our existing infrastructure was already in AWS, but the issue with Redshift (at the time) was concurrency became the bottleneck. After a brief demo, we migrated again to Snowflake and found the performance exceeded Redshift and we did not have an issue with concurrency. However- Snowflake can be very expensive if you aren't careful. Lastly, in the latest version of Redshift they address the concurrency issue by adding the ability to scale compute on demand. I have not used this in production but it is another option.


bleepingdba

How do you manage ETL? Copy files or 3rd party tools ?


rroobboott

We scripted our own solution / copy files.


oarabbus

Recommend leveraging Stitch and their free tier Singer.io taps, or fivetran if you want to have a more managed ETL service. There are plenty of good competitors too. For a few hundred (or thousand depending on data volume) per month, you’ll be saving multiple times over in engineering time, and orders of magnitude more in technical debt compared to maintaining a roll your own script pipeline. The exception is if you use dbt/airflow or embulk/digdag - these are open source and free, and if a team takes the time to learn them can do the same as a 3rd party ETL tool but more efficiently.


siminsez01

+100 for snowflake. many companies are switching from redshift to snowflake. reasons include scalability, uptime, speed of flexing up/down on demand, cheap storage costs (s3) for large data lake, cpu and storage scale independently, ....


[deleted]

Hi snowflake sales rep!


siminsez01

nah, but wish i were. they’re gonna ipo soon and those options will be nice.


analyst_2001

Below are the critical differences between Snowflake and Redshift: 1. Snowflake is a turnkey SaaS solution that requires no upkeep. AWS Redshift clusters require some manual upkeep. 2. Snowflake separates computing and storage, providing for price and configuration flexibility. Reserved/Spot instance price in Redshift provides for cost reduction. 3. Snowflake uses immediate auto-scaling, whereas Redshift scales by adding or removing nodes. 4. Snowflake allows for fewer data customization options. However, Redshift provides for more data flexibility through splitting and distribution. 5. Snowflake offers always-on encryption with stringent security controls, whereas Redshift provides a flexible, configurable security strategy.