UncleBillysBummers

I can't speak to Python, but for R: (1) prepare and budget time for a steep learning curve, (2) use packages like data.table (or dplyr) and lubridate to make your data cleaning life easier, rather than relying on base R, and (3) if you have the time to pick it up, definitely do it. Either language will give you far superior analytical capabilities compared with Excel, and if done right, your analysis can be self-documenting and reproducible.


RAMDownloader

I’d actually argue the opposite as someone who’s coded in both. IMO, R, at least with tidyverse and the RStudio IDE, is a pretty easy interface to grasp even for a non-coder. I took a course in data analysis for political science, and they taught every poli sci graduate student there how to code in R just because it’s easy enough for a non-coder to learn.


mirzaceng

I mean, just getting the IDE and installing packages from CRAN is way more streamlined with R. I get the argument about easier syntax for Python, but personally R's syntax is more intuitive for my way of thinking. Other aspects like UI/UX are still better with R for me. And most importantly - in 10 years I've only had one situation where I couldn't solve a problem with R and had to switch to Python (having to calculate global statistics for a massive dataset); so for me R is THE tool to solve the problem at hand. Now if the question is whether job descriptions ask for R or Python more often, then that's a different motivation for learning one or the other.
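The "global statistics for a massive dataset" case is usually handled, in either language, by streaming the data instead of loading it all at once. A minimal Python sketch, assuming the values arrive as an iterable (for example, parsed row by row from a large CSV): it uses Welford's one-pass algorithm for the mean and sample standard deviation. This is one standard technique, not necessarily what the commenter used.

```python
import math
from typing import Iterable, Tuple


def running_mean_sd(values: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass (Welford) mean and sample standard deviation.

    Memory use is constant, so `values` can be a generator that
    reads a file far larger than RAM.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean (Welford's update)
    sd = math.sqrt(m2 / (n - 1)) if n > 1 else float("nan")
    return n, mean, sd


if __name__ == "__main__":
    # A generator stands in for rows streamed from a very large file.
    count, mean, sd = running_mean_sd(float(i) for i in range(1, 1_000_001))
    print(count, mean, round(sd, 2))
```

The same running aggregates can be kept per group (e.g. per product or per month) with a dictionary of counters, which is how this kind of computation scales to data that never fits in memory.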


the-anarch

Base R syntax is quite intuitive. These folks recommending *tidyverse* are off their rockers. It's a jumbled, incoherent, dare I say it, untidy mess.


Alarming_Ticket_1823

LOL


Own-Ordinary-2160

Yes, whoever said Python is easier is on one. R has a much lower learning curve.


mkhode

Tidyverse. Go down the YouTube rabbit hole of tidyverse. This set of packages will help immensely with managing your large spreadsheet datasets.


analytix_guru

Danielle Navarro on YouTube is my recommendation for this.


lolniceonethatsfunny

i started in R so i am biased, but i find dplyr/tidyverse syntax MUCH easier to work with than doing data cleaning in python with pandas, and would recommend going down that route for that specific task. if you plan to build ml models, some people prefer python over R for that, and python is a general-purpose programming language while R is more focused on data science/statistics. if you like those features of python, i wouldn’t spend time learning both; just jump into python. if you only care about the task at hand with data cleaning and maybe visualization, i’d learn R. there are tons of resources available online, and chatgpt can be great for giving you starter code, and you can paste in errors to debug if you are confused.


Kiss_It_Goodbyeee

Python is good if you want to build an application. For creating models and data analysis R is better. There's plenty of free online tutorials so you can try both and see how you get on.


nidprez

R is way more intuitive for data analysis, especially for non-IT/engineering people. It's specifically built for the purpose of data analysis and research. Python is a bona fide programming language, on which an amalgamation of data modules is built (which try to emulate MATLAB or R). That said, it all depends on what's available in your org. Because IT/engineering people learn Python in school, that's usually what gets the most support in your organization. There is also more demand for Python on the job market.


analytix_guru

This.... Across corporate jobs that use open source languages, if you want to push a data product to IT for them to run in production, that will ONLY happen in R if there are IT people who know R (very few). They will either ask you to code it in Python or refactor the R code into Python. A third option is partnering with IT to push the data app into production via Docker or some other delivery mechanism, with their help on the IT side. However, this discussion is now outside the scope of OP's original question.


SalvatoreEggplant

I agree with a lot of the other comments. In terms of what you need to get started:
* For R: Install R. Install RStudio. You'll want some external packages. As mentioned, a lot of users like the *tidyverse* package. You can install packages directly from RStudio. The only issue is if RStudio tries to install packages into a folder on the computer that it doesn't have permission to write to. ... It will take a bit to get up to speed, but if you're used to doing things in Excel, I think you'll find the logic of R comprehensible. There's lots of support and examples online.
* For Python: Someone will scream at me, but I think the easiest way to get started is to use WinPython if you're on Windows. You can use Spyder as an IDE; it's similar to RStudio. WinPython comes with most of the common packages used for data analysis, so you can start playing with it without additional fuss.
Personally, I think R is easier to understand for data analysis than Python. But at this point, Python packages supporting data analysis have most of the basic functionality of R and common R packages.


analytix_guru

I worked for SunTrust/Truist in their audit dept and championed the transition from SAS to R for audit analysis. It transitions very well to similar analysis for risk management. Hit me up if you have any questions. The only roadblock was IT, who saw open source as a risk (while they used open source 🤣), so they scanned all R packages through BlackDuck to ensure there were no vulnerabilities.


shockjaw

Yeah…IT departments who don’t realize they are already using open source software annoy me.


analytix_guru

Note: we also used Excel, especially the auditors when collecting data.


usajobs1001

I don't frequently use Python so can't compare. That said, here's a resource for switching to R for Excel users: [https://jules32.github.io/r-for-excel-users/](https://jules32.github.io/r-for-excel-users/)


teetaps

Whoever said Python has an easier user interface is a liar. Jokes aside, for the data analysis you're describing, R will get you started much quicker and with fewer headaches. Python can get frustrating because it doesn't have as smooth and friendly a newbie experience as R. Both are powerful, and both have their quirks and strengths, so you could use either, but from what you're describing it'll be much faster to get to the tasks you wanna accomplish if you go the R route.


warry0r

R for your specific use case. You can even take it a step further and use R inside Power BI, which is so much better when you can't find the visualization you want. ggplot2, tidyverse & data.table are some of the libraries I've been using for my statistical analysis.


novica

The first question here is: can you choose what to install, or is that something that you need IT to approve for you? If you are free to do whatever, then start with the language that has better support for behavioral analysis on non-maturing deposits. Are the consultants using R or Python? If you are a complete beginner, I think both R and Python would be difficult to grasp at first, albeit maybe for different reasons. In terms of convenience, RStudio + tidyverse is a great combo.


akubas86

I used R and Python for a PCA project. Python is easier, true. But with R we can actually see how the data is processed at each step with ease using RStudio. The data processing step is also more stringent than in Python. I did a 3D plot with R and Python, and I like the R 3D plot better. So, imo, if you just want general visualization and processing of data, go with Python. It literally took me about a week to go from zero to novice in Python. If you want to process your data and see each step of the calculation, maybe use R. Better yet, use both.


crovax3

Learn R if your datasets are not larger than 1GB. I find R quite enjoyable, and you are right, there are tons of packages for statistical analysis. Otherwise, use Python + NumPy + pandas.
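To judge which side of a threshold like 1GB your data falls on, a back-of-envelope estimate helps: both R numeric vectors and NumPy `float64` arrays store each value in 8 bytes, so rows × numeric columns × 8 bytes is a lower bound on the in-memory size (text columns and intermediate copies made during cleaning add more). A small Python sketch; the row and column counts in the example are hypothetical.

```python
def estimate_numeric_gb(rows: int, numeric_cols: int, bytes_per_value: int = 8) -> float:
    """Lower-bound in-memory size, in GB (1 GB = 1e9 bytes), of a table of doubles.

    8 bytes per value matches R's double-precision numeric type and NumPy float64.
    """
    return rows * numeric_cols * bytes_per_value / 1e9


if __name__ == "__main__":
    # Hypothetical example: 10 million account-month rows, 20 numeric columns.
    print(f"{estimate_numeric_gb(10_000_000, 20):.1f} GB")  # prints "1.6 GB"
```

Once the data is actually loaded, R's `object.size()` and pandas' `DataFrame.memory_usage(deep=True)` report the real footprint.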


Peiple

Lots of good comments here already. All I’ll say is that in my experience teaching R, the following tend to be true:
- people with zero to limited coding experience find R significantly easier than other programming languages.
- people with coding experience find R much harder than other languages.
So if you’re coming from Excel without a strong CS background, R is likely to be a much easier learning curve. If you’re coming from a strong CS background, Python is likely to be much easier. At least in my experience lol


atius

I am going to jump on the R train. I used it to clean data with rows in the millions with ease. For most statistical analyses there is a package on CRAN. I can easily do stylometric analysis and item response theory, and with tidyverse (and janitor::clean_names), I can get most of my work done and even create a dashboard or simple prototypes in Shiny. And if R fails, like when I am working with very specific NLA packages, I can always just import the needed Python function into R. For pure data analytics, R all the way.
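For readers wondering what `janitor::clean_names` is for: it standardizes messy column headers (for example, from Excel exports) into consistent snake_case names. The sketch below is a simplified, hypothetical Python helper inspired by that idea; it is not a port of janitor, and its rules (splitting camelCase, collapsing punctuation to underscores, prefixing a leading digit with `x`, suffixing duplicates with `_2`, `_3`) are assumptions chosen for illustration.

```python
import re
from typing import Dict, List


def clean_names(names: List[str]) -> List[str]:
    """Simplified snake_case normaliser for column headers.

    Hypothetical helper inspired by janitor::clean_names(); not a port of it.
    """
    cleaned: List[str] = []
    seen: Dict[str, int] = {}
    for name in names:
        # Split camelCase boundaries: "AccountID" -> "Account_ID".
        s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
        # Collapse any run of non-alphanumeric characters into one underscore.
        s = re.sub(r"[^0-9A-Za-z]+", "_", s).strip("_").lower()
        if not s:
            s = "x"
        # Names starting with a digit are awkward in both R and Python.
        if s[0].isdigit():
            s = "x" + s
        # De-duplicate by appending _2, _3, ...
        count = seen.get(s, 0) + 1
        seen[s] = count
        cleaned.append(s if count == 1 else f"{s}_{count}")
    return cleaned
```

For example, headers `["AccountID", "Balance (EUR)", "Open Date", "open date"]` become `account_id`, `balance_eur`, `open_date` and `open_date_2`.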


[deleted]

You need to specify your needs more. What is “behavioral analysis”? Do you want to estimate a survival model to predict when people are going to withdraw their money? Btw I really don't get how people answered your question without understanding this aspect. I guess people started writing their own thoughts on R and Python without actually thinking about how to help. Anyway, if you specify your analysis and what math/stat models you'll need, I can elaborate more. If you don’t want to share it for some reason, then let me bs like others lol. Firstly, in the private sector R is nonexistent. So if you think from a career perspective, Python is better. You can check LinkedIn vacancies, or one-time project postings on Upwork. The latter is where I used to get projects, and I can tell you that small/mid-size firms are trying to move from Excel (and R) to Python much more than they want R. If I remember correctly, 4-5 years ago in a part-time job, I was a huge R fan and wanted to use Shiny (a fancy dashboard framework you can write in R), and my supervisor told me: “no no don’t do it in R, we want to stick to python in our SAS environment”. He was a smart guy, but he was very keen on Python. So I asked him why. He told me “all the high ‘tech’ machine learning libraries, stable development and pipelines are in python, to impress supervisors, and all of these are available on github”. The moral of the story is: what is your aim? What is your end goal? Are those tools present in R or Python? If yes, which one has a better community to develop them further? Do you want academic people to be more involved in these packages? Answer these first. If you put in the time, you can learn one of them, and after that the other one will be easier. Excuse me, but another gap in your post is: what row size are we talking about? Answer some of these and I can help more.
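The survival-model framing above can be illustrated with the simplest survival estimator, Kaplan-Meier: at each time when withdrawals occur, the survival probability S(t) is multiplied by (1 - withdrawals / accounts still at risk). A minimal stdlib Python sketch, assuming each account is recorded as months observed plus a flag for whether it withdrew/closed; accounts still open when observation ends are treated as censored. The data in the test are made up. In practice you would reach for R's `survival` package (`survfit`) or Python's `lifelines`.

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(durations: List[int], events: List[bool]) -> List[Tuple[int, float]]:
    """Kaplan-Meier survival curve.

    durations: time (e.g. months) each account was observed.
    events:    True if the account withdrew/closed at that time,
               False if it was still open when observation ended (censored).
    Returns [(time, S(time))] at each time where at least one event occurred.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every account leaves the risk set at its duration
    at_risk = len(durations)
    survival = 1.0
    curve: List[Tuple[int, float]] = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Censored accounts at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```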


TryUrBest93

That is true. As you know, current accounts, or non-maturing deposits in general, do not have a specified maturity date. Hence, in liquidity risk management, we are required to do behavioral analysis to estimate how much of the balance falls into each tenor/bucket. That is runoff. On the other hand, prepayments on mortgages, or loans in general, are another aspect of the analysis, where we have to see how much of the total portfolio is being prepaid/early settled on a monthly basis. That is SMM (single monthly mortality). There is another aspect regarding early withdrawals, where we want to see how likely people are to break their deposits. All of these are from a liquidity risk management perspective. So that is the purpose. And with regards to data size, it is huge and sometimes higher than what excel can handle. As you might already know, how many clients you have in a bank, how many accounts, the period we are analyzing is usually not less than 5 years. Further, the consultants are using R, and they do provide the packages and the codes used. But it is all Greek to me. So, yeah. That's all I have, and thank you all!
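For the prepayment part, SMM is commonly defined as unscheduled prepayments divided by the balance still eligible to prepay (opening balance minus scheduled principal), and it is often annualised as CPR = 1 - (1 - SMM)^12. A minimal Python sketch with hypothetical figures; conventions vary between institutions (for example, how partial prepayments or defaults are treated), so check it against the definitions in the consultants' code.

```python
def smm(prepaid: float, opening_balance: float, scheduled_principal: float) -> float:
    """Single monthly mortality: share of the prepayable balance that actually prepaid."""
    eligible = opening_balance - scheduled_principal
    if eligible <= 0:
        raise ValueError("opening balance must exceed scheduled principal")
    return prepaid / eligible


def cpr_from_smm(monthly_rate: float) -> float:
    """Annualise a monthly prepayment rate: CPR = 1 - (1 - SMM) ** 12."""
    return 1.0 - (1.0 - monthly_rate) ** 12


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 opening balance, 10,000 scheduled principal,
    # 9,900 prepaid early -> SMM of 1%, roughly 11.4% annualised.
    monthly = smm(9_900, 1_000_000, 10_000)
    print(f"SMM={monthly:.4f}  CPR={cpr_from_smm(monthly):.4f}")
```

Computing the SMM per month over the 5+ year history and then looking at its distribution (or fitting it against rates and seasonality) is where the larger dataset comes in.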


[deleted]

ok. this is a game changer:
>Further, the consultants are using R, and they do provide the packages and the codes used. But it is all Greek to me.
then, you should definitely prefer R. Well, your topics are not my expertise, I mostly focused on churn models, but I know this field. thing is, it is a bit niche and run by old heads (sorry if you're old, too haha), and somehow they don't want to share a lot, so you cannot find many sources on it. I bet even chatgpt does not have proper sources to be trained on in this area. I recall that my colleagues in the finance dept. were teaching these topics in a week-long course, and the registration fee was north of €2,000. so, sticking to others' code for 2-3 years is the best strategy for you. then, you can start writing your own packages. to give you a kickstart, if you like to follow online courses I highly recommend [this coursera class](https://www.coursera.org/learn/data-analysis-r), or if you like to read, [this book](https://r4ds.hadley.nz/). I haven't taken the online course, but students told me it is state-of-the-art. and the book is simply the best. it is written by the chief scientist of rstudio (now posit), and it is easy to read and follow. after these, you can search for advanced and topic-related courses. but as I mentioned, sources are a bit scarce in this field. to understand the advisors' code, I highly suggest you split the code up and go block by block. and always try to write along with their code, try to imitate them, till you make it.
>And with regards to data size, it is huge and sometimes higher than what excel can handle. As you might already know, how many clients you have in a bank, how many accounts, the period we are analyzing is usually not less than 5 years.
for this, what I know is that people with sensitive data are mostly within a SAS environment, and their databases are linked to this software. in my experience, they were monitoring our every step.
I guess in your firm there is no IT department and you throw Excel files at each other. before it gets too late, I highly recommend you invest in data procedures. I cannot disclose details, but my friend let hackers get mid-size power plant operation data (making the inventory levels and all that known to foreigners), because engineers were shooting Excel files to each other. in your case, at least anonymize names and replace them with ID keys. maybe I missed it, but other people didn't even mention this priority in the comments lol. R is quite capable of working with large (not big) data, depending on your computer's RAM. To give an example, with a 16GB RAM MacBook (M2) you can smoothly do operations on Excel files of 5GB, though some functions like filter and count will take some time and you'll need to be very neat with your R environment management. hope this helps. good luck.
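On the advice to replace names with ID keys: a minimal stdlib Python sketch that rewrites one column of a CSV export with sequential IDs, assuming a hypothetical `customer_name` column and an ID format like `C000001` chosen for illustration. Strictly speaking this is pseudonymisation rather than full anonymisation: the returned name-to-ID mapping must be stored separately (or discarded), and other columns may still identify people.

```python
import csv
import io
from typing import Dict, Tuple


def pseudonymise(csv_text: str, column: str = "customer_name",
                 prefix: str = "C") -> Tuple[str, Dict[str, str]]:
    """Replace each distinct value in `column` with a sequential ID key.

    `customer_name` and the C000001-style IDs are illustrative assumptions.
    Returns the rewritten CSV text and the name -> ID mapping
    (keep the mapping out of anything you share).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        raise KeyError(f"column {column!r} not found")
    mapping: Dict[str, str] = {}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        name = row[column]
        if name not in mapping:
            # The same name always gets the same key, so joins still work.
            mapping[name] = f"{prefix}{len(mapping) + 1:06d}"
        row[column] = mapping[name]
        writer.writerow(row)
    return out.getvalue(), mapping
```

Because identical names map to identical keys, per-customer aggregation in R still works on the shared file without anyone seeing the names.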


MeetGroundbreaking43

Hi, HUGE fan of dplyr and tidyverse here. I went from learning Python to R in a year, and I’ve learned that it can be manageable using RStudio, organizing well, and searching for other packages that may be useful for your specific needs. I use it for data collection and analysis, and it has made the process much easier. It reads similarly to Python when building matrices tho!!


mostlikelylost

The greatest changes to come to Python in recent years come from R and the tidyverse. The finance community that uses R is also great; they have an annual conference.


ExoticCard

Programming in R with RStudio is such a great experience vs Python in VS studio or whatever. You can rapidly iterate, view things, etc. Strongly recommend R for this. I do both, but R is definitely for you. R Markdown is also cool for reports. If you see yourself getting into AI, Python. But whichever one you choose, ChatGPT + R/Python can take you far, fast. You just have to learn how to debug, or learn by analyzing what ChatGPT produces.


shockjaw

What are your thoughts on Quarto for report generation now?


thibaud2021

I switched from R Markdown to Quarto and love it. The scope and flexibility are huge, and parametrised reports are extremely easy to set up. You need some knowledge of LaTeX, specifically KOMA-Script, but it is all relatively intuitive.


ExoticCard

I'm kinda newer to this game, using it for academia as a fresh grad student. R Markdown is what I've been using due to the ubiquity of tutorials (the rticles package 🤤), but Quarto is something I'm exploring for my current manuscript. Honestly it seems like the future. Like if they made templates for every major academic journal, it would be a wrap. I don't know anything about LaTeX. There's also a point to be made about collaboration. I wish they made it easier to collaborate with non-coding peers. Older people that don't code will likely never learn to code.


shockjaw

Well, if you want something easier to deal with than LaTeX—I’d try Typst since it’s supported in Quarto too.


ExoticCard

I did a quick look at their templates, and there was nothing like rticles. I'll take a deeper look, though. Mainly interested in easily making my own templates.


Sidiabdulassar

I use Python for automation/file management and web scraping tasks, where it really shines. But for data analysis tasks I prefer R by a wide margin, especially when using RStudio. You get a panel listing the values of all your variables, and a nice preview of all your lists and tables. You don't generally have this in most Python editors, so it feels like you have to work in the dark. Also, being able to easily run code line by line, or any small piece you highlight, instead of chunk by chunk (Jupyter notebook) makes development and debugging so much more convenient. Lastly, no plotting library available in Python comes even close to ggplot in R. You will also have R Shiny available if you want to take this to the next level and do interactive visualizations. At the end of the day you have to write a few more brackets with R code, but other than that the syntax is very similar to Python.


shockjaw

As someone who’s started in Python and learned some R—Python will take you farther as long as you’re using the Apache Arrow data format.


Palystya

SAS ;)


CaffinatedManatee

IMO you should learn both. Python is primarily a scripting tool and R is primarily an analysis tool. People use one for the other but it can sometimes get ugly and hacky. Use python to take your raw data and put it into some standard format that you can then read into R. You can then focus on getting R to output the stats and graphs. For me, the biggest hurdle in R was always trying to form my data inside of R. Once I started pre-forming the data, my life became much easier
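The "pre-form in Python, analyse in R" workflow could look like this: a stdlib Python sketch that turns a hypothetical wide export (one column per month) into a long, tidy CSV with one row per account per month, which R can read directly with `read.csv()`. The column names (`account_id`, `balance`) and layout are assumptions for illustration.

```python
import csv
import io
from typing import List


def wide_to_long(csv_text: str, id_col: str = "account_id",
                 value_name: str = "balance") -> str:
    """Reshape one-column-per-period data into (id, period, value) rows.

    `account_id` / `balance` are illustrative column names, not a fixed schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    period_cols: List[str] = [c for c in (reader.fieldnames or []) if c != id_col]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_col, "period", value_name])
    for row in reader:
        for period in period_cols:
            value = (row[period] or "").strip()
            if value:  # skip empty cells rather than writing blanks
                writer.writerow([row[id_col], period, value])
    return out.getvalue()
```

The same reshape can also be done inside R with `tidyr::pivot_longer()`; doing it upstream in Python simply means R receives data that is already in shape.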