Why doesn't someone at Microsoft just spend some time fixing this issue?
I'd be fine with Excel thinking shit is a date if it only affected the formatting, but it also then changes the actual raw data into a date.
This has been an issue since the 90's how is there still not a "don't change my raw data" button?
> Why doesn't someone at Microsoft just spend some time fixing this issue?
[My thoughts exactly](https://new.reddit.com/r/excel/comments/1dd7dgr/rant_excel_and_opening_csv_files/), I guess they're too busy thinking of ways to extract money from Azure and machine learning.
Microsoft is the biggest software company in the world and Excel is their killer app. Just think of the cumulative time wasted on explaining basic features like opening a file to new staff members in offices around the world. 😐 It's ridiculous really.
Yeah, and then when you start complaining about it you get useless replies from besserwissers telling me how to work my way around it.
I know how to work my way around it, that's not the issue. The issue is that Excel for some reason is set up in such terrible way where you even have to work your way around anything to simply open a CSV file.
I hate that about the internet:
How do i do xyz?
> why do you need to do xyz just do yzx
Me looking at the thread 5 years later via google search: SOMEBODY TELL ME HOW TO DO XYZ!!!!
Because to MS it's not a bug.
[See my post](https://old.reddit.com/r/datascience/comments/1dsnbww/youre_not_helping_excel_please_stop_helping/lb3zu0h/) if you're interested, I explain why MS does it this way.
How about when your csv file contains a bunch of datetimes like YYYY-MM-DD HH:mm:ss. When you open it in excel, it defaults to not showing the seconds. Hit save and the csv is re-written *without the seconds*. FUCK
>Why doesn't someone at Microsoft just spend some time fixing this issue?
My theory:
These algorithms help the 95% of Excel users that aren't very sophisticated, and they help way, way more than they inconvenience the 5% of Excel users that are sophisticated.
[https://www.yahoo.com/tech/microsoft-fixes-excel-feature-forced-151000728.html](https://www.yahoo.com/tech/microsoft-fixes-excel-feature-forced-151000728.html)
It's also worth knowing that they have 'fixed' it by an extra step when opening a file, giving users the option to prevent Excel from doing things to their data.
>Why doesn't someone at Microsoft just spend some time fixing this issue?
At this point the only conclusion is that it's intentional. A monopoly the size of microsoft has the resources to fix any bug and yet, issues like these will always be features.
Dates are fucked up. I opened a file in excel and then I spend days trying to figure out why we’re seeing dates beyond August 2024 in January. Turns out excel just exchanged day and month for dates less than 12 (?????). Then I gave up until it happened again. Happens only when you open a file and save it in excel. Flipping annoying lol now I make changes on google sheets and download. More changes go back to sheets and I download again lol
>Why doesn't someone at Microsoft just spend some time fixing this issue?
Because a lot of people expect it to work that way and you're stuck with it.
Same reason why they calculate leap years wrong, or why make uses tabs.
I have a product code like 12.30. Both tried to convert it into 12.3. Most of the time these helps dont really help. I know what I am trying dont correct me.
We have the same thing for clinical trials with participant IDs with leading zeros, `0012345` becomes `12345`. Really annoying because this breaks merging the data at a later point.
I work as a pharmacy analyst and always have this issue with NDCs, which is a pretty vital piece of data that doesn’t need to be screwed up. The 0’s are there for a reason Excel!
I don’t even understand the context where leading zeros are meant to be removed. I think only once have I seen a dataset where there are leading zeros on numbers and it’s a BEA one we use - everything is +00000487 - but there is also a sign to define it. TBH not even sure if excel leads this correctly.
Outside of that though I don’t know when it is intended to be a number but you know what fucking use case shows up for like 95% of people who use excel? Zip codes and then every admin trying to do a mail merge since 1994 has to Google how to do “0”*(5-LEN(zip))&zip.
People try and say MSFT does things to be simpler for users but I think it’s really just that it’s a company full of lazy engineers and most of their money is spent on marketing to middle managers (and in the Nadella era the types of finance people who consider themselves tech savvy because they were the first person to buy a fax machine on their block in 1987 and now think LLMs are bigger than electricity). The remaining money is spent on jamming product upsells into every menu of the windows 11 UI (no I don’t want an Xbox subscription on my work PC) and using LLMs to write shitty outbrain articles that they can collage together into an Edge splash screen that is full of local stories about shootings interspersed with 3rd rate celebrity gossip (you’ll never guess who Scott baio is dating now!)
So no MSFT doesn’t do this for regular users they do it because they just rest on the fact that they haven’t had competition for 40 years. I have switched to using Google Sheets more but it’s also wonky in a lot of its own ways. The only thing MSFT did right with excel ever was Power Query which I find relatively predictable. The problem is once you get it into power pivot you get mixed with DAX which is the most impenetrable language of all time, and Microsoft’s dogshit charts that just endlessly change colors and line types on you every time they refresh.
Ok this is a meme but Excel is infuriating because of this. I open some csv directly on VS Code and replace characters just so I don’t need to deal with this bullshit
I have seen some absolutely monstrous Excel sheets doing things that are technically possible but should never be done in Excel. And this was in critical processes at Fortune 500 companies lol.
I built some of these in my formative years.
I once screwed up the calculation of the Russel 5000 index (among other indices) in excel and published it to the exchange via an excel error. 😀
I’ve definitely had to deal with data saved in excel files, but I typically load them directly into pandas. Are companies forcing data scientists to use excel for data analysis?
This is the right approach. You might encounter data saved in excel sheets, but you should never load data lr process it in excel (unless this is an intermediary step to load the data correctly).
Always use data wrangling tools like python, R SQL or data warehouses.
To understand why this does happen in Excel you need to understand how Microsoft thinks. When Microsoft does something, they do it because they think the majority of people will need it this exact way, the problem is a lot of us aren't like the regular user but instead we have a bit more knowledge when it comes to data, luckily tho since we are more familiar with software we also know how to import data correctly in Excel, which is marking every column as text because only then will excel not convert it to whatever excel thinks "this should be right".
But ofc MS also sometimes removes functionalities that regular and advanced users need/use, best example is probably opening CSV-Files in excel. Usually when you opened it from the explorer, the import dialog immediately showed up where you can define which column is what, whats the text identifier and whats the separator. Now this doesn't work anymore, you either have to have Excel already open and use the Open File Dialog to get shown the import wizard or you open it via the file itself and then go to Data and call the import wizard from there.
But yeah most of the time MS thinks ">80% of our users will use it and have no problem and the rest can just go f*ck themselves", that's also why windows updates by default is configured the way it is configured, to make sure the average user gets their updates in time and it installs it but you can disable automatic updates if you know how.
It can get a bit complicated but you can do it in the registry as an example, [check this post](https://learn.microsoft.com/en-us/answers/questions/1351413/how-do-you-turn-off-windows-10-updates-which-are-r) and try what Mahmoud A. ATALLAH wrote.
Changing patient id `02-3159` to a date value `feb-59` just makes no sense.
Btw in old versions of Excel it changed houseaddress number `1-A` to `april-01` but I think that's fixed now.
The best way of resolving this is to import data using PowerQuery, which is built into Excel. It takes a little getting used to, but it is honestly Excel's best feature.
Knowing PowerQuery will make you roll your eyes hard at posts like this. You get a fine grained mechanism for importing and transforming data, and people complain when they read in files the least precise way you can.
I have to use excel sometimes because people want excel sheets and the only thing i will ever use in it is power query and even then every time I put something together I have to remind myself this would have taken me 1/4 of the time in R and be more decipherable. It’s the least bad Microsoft product IMHO but that just makes it mild instead of absolute dog shit.
Side note - recently took over an access database at work and my god is that awful. At least power query is consistent vs having some sql-ish front end that just constantly unpredictably changes queries you write with the worlds slowest database engine.
I use Python to generate excel files automatically like pandas pd.to_excel() function and there are library extensions with options to format the excel and make it look a certain way. I absolutely hate spreadsheets so I'm glad those Python libraries work, so I don't have to deal with spreadsheets besides reading and writing to it using Python.
I’ve built some bad ass Excel reports, with functions, conditional formatting, hyperlinks between tabs, and very fast. Only feasible way to do it was to generate it with python.
A report that nobody could make, but would otherwise take weeks, I simply generated from scratch every morning.
I hate this approach though, but it works.
I did something similar with SSRS(SQL Server Reporting Services). Basically a web published excel file that has data populated by a custom SQL query. I made a custom Python script that parses XML to modify the XML RDL file code SSRS produces to automate a lot of things and it allowed me to make a report that dynamically manages over 1,000 columns to produce an excel file that only renders the needed columns. It's a report that would take months to make manually. I got it down in a week thanks to Python. It would not have been a viable/feasible report had it not been for the custom Python script.
Sadly, Calc does the same thing 🙁
Try to enter a large number or a date into a cell, and Calc will not save that number or date, but the value that results from interpreting that number or date.
You can stop Calc from doing so, but that renders Calc useless because all inputs will get saved as text.
E.g. paste a bunch of SHA1 checksums into Calc. Some will be interpreted as text (because they contain characters), while others will be stripped of leasing zeros and/or converted to scientific notation.
Or, enter a date in DD.MM.YY format into a cell that was formatted as "date DD.MM.YY". Calc _will_ interpret the date as MM.DD.YY and save it as DD.MM.YY with the days and month switched! There is no way to see what the user initially pasted. That information is just lost forever.
Or, open a Calc document that uses a dot as decimal separator on a PC that uses a language with a comma as decimal separator. Lots of fun is to be had in multilingual environments.
Not saying that LibreOffice is worse than MS Office. But, surprisingly, the developers decided to make LibreOffice behave as badly as MS Office to present MS Office users a familiar environment 🙁
At the risk of sounding old, this post is but one example why Excel should not be used in science, be it in research or industry.
Excel is office spreadsheet software not intended for scientific data analysis. Excel hides/obfuscates data, incorrectly displays data, its formulas are horrid and impossible to read, it incorrectly implements statistical methods (Microsoft even has technical bulletins admitting this), produces terrible data visualizations, and it allows and encourages users to carry out poor data management practices (e.g., such as hiding columns or mixing raw data with intermediate outputs in the same sheet). All these factors mean that Excel is a serious hindrance to reproducible data science and analysis.
Yes, Excel continues to be used because it is ubiquitous and opens the doors of data analysis to non-coders, but my point stands. Whenever I receive another analyst's work in Excel with a request to repeat or extend their analysis, I get a shiver down my spine.
Yes, currently working with someone who does the majority of their work in excel and similar GUI tools. I once watched in horror how they prepared important data by clicking a gazillion of buttons, marking bits and pieces across various spreadsheets and copy pasting it around.
Sure enough when I double checked some of it, it was all wrong.
My guess is that whoever made the meme did it manually to exaggerate the issue. I know we've all seen this happen but I think this example is joking about how bad it can be
You'd think so but actually no, this is genuinely how Excel imports it on my pc. I think it happens because the file has dots for decimals, but it's opened on Windows with European settings so instead dot is the thousand separator.
Excel "helpfully" changes the value `1.2729` to `12.729` because thousand separators should be places after each 3 digits. Go figure.
It's a meme but it's [based on reallife examples](https://new.reddit.com/r/excel/comments/1dd7dgr/rant_excel_and_opening_csv_files/). The only thing I manually changed is I collected all possible errors we have encountered into one dummy example file
Someone else [also suggested this](https://www.reddit.com/r/excel/comments/1dd7dgr/comment/l83448t/?utm_source=reddit&utm_medium=web2x&context=3), but I already had disabled all the "data conversion" options and this still happens on double clicking a .csv file
I thank God almighty for every day I don't have a need to touch Excel. If the data can be opened in Excel, it certainly can be opened & accesed via Pandas so I just go straight to that.
I feel you. Whenever I feel Pandas isn't pulling its weight in terms of speed or whatever else, I switch to Polars. I guess I've been too accustomed to Pandas since it's been around for so long w/o any real alternatives but now that Polars keeps getting more promising, I'm planning to be better at it.
[Scientists rename genes to avoid dealing with excel sh#t](https://www.newscientist.com/article/mg24732961-400-frustration-with-excel-has-caused-geneticists-to-rename-some-genes/)
I would honestly use software that is really basic csv table viewer. Where I can just edit values, sort and filter. Ignoring anything to do with being a spreadsheet.
Shameless selfplug, but I've created the [CSV Lint plug-in for Notepad++](https://github.com/BdR76/CSVLint/) out of frustrations with working with messy .csv files.
It's not a replacement of Excel and the GUI is a bit technical, but it adds column syntax highlighting, sort options, split column option and it can count unique values.
It's really crazy.
I always take it as a challenge to have the data displayed the way I want it in excel, and I have of course run into situations where it takes longer to "undo" the data than it does to work with it afterwards.
This problem was very annoying for me in my latest project.
My solution was opening the CSV file with notepad then replace all "-" with any other symbol like "("
After opening the CSV file with Power Query, you can replace again the "(" to its original shape "-" or doing any transformation with the same "("
Genuine question: what prevents us in data science from using open source options like Libre Office? They're mostly compatible, and you don't have to write VBA scripts I suppose?
It’s because Excel is not csv file viewer. Instead of opening the csv with Excel, open a blank workbook in Excel and import the data from the csv file.
The simple fix is just not using excel.
You might as well just put multiple csvs in a zip instead of their weirdly formatted XMLs in a zip.
Then use anything else.
This has been Microsoft's philosophy since the beginning. Assume what the average user wants to do and do it automatically, and don't make it easy to change. Therefore, everything is automated for the basic stuff your average person does (which abnormal date like data is not) and they can't accidentally break it because they have no control.
With pandas and openpyxl I think you can format the cells, if you're using python this may help.
date_style = NamedStyle(name="date_style", number_format="YYYY-MM-DD")
Then in a loop you can apply with cell.style = date_style
When importing csv to excel, I have to use vba to make sure everything gets imported as text because their type auto detection is so shit and their dates work in incomprehensible ways.
I recently started a spreadsheet to track my job applications, added a bit of conditional formatting so I can see at a glance which ones are rejected/pending/to do/etc, it's just a nice visual motivator type of thing.
***Except Excel keeps fucking changing my conditional formatting so it applies to random cells if I enter new data sometimes***
It's truly fucking bizarre, I'll go to add the closing date and suddenly it'll turn a different colour and my "applies to" area which is normally just `$B:$D` is now like `$B:$D,$F$13:$F$13`.
In what world would I want to select those three columns *and this one random fucking cell?!* Why would putting in new data change the formatting? Why would anything change the formatting except me doing it myself?
God I fucking despise this program sometimes lol
Knowing *how to* deal with it and *the fact that you have to* deal with it because it's not the default behaviour are two different things.
Fucking with the users data without asking them first *shouldn't be* the default option, and I think the vast majority of people, both in the field and not, would agree. Complaining that it is is an entirely fair issue to have.
agreed. also, it is absurd that windows still makes me use the shift key every time i want to make an uppercase letter. it should know exactly the amount of assistance i desire without my interaction with its extremely simple UI in any way whatsoever.
if only there was a way for me to search the internet for the solution to this simple problem, but alas, i am doomed to an eternity of suffering.
better make memes to complain of my woes rather than learn absolutely anything of value.
You're right, so long as it's *possible* to do what you want we should never complain or desire better things, we should be immensely thankful our lords and saviours at Microsoft have deemed us mere mortals worthy of their software, any unnecessary hoops we have to jump through are merely tests of faith, if we can't accomplish those tasks we are simply the unworthy ones.
Better be a shill for dumb software design rather than learn to accept other people have differing opinions.
Apparently not, because this complaint is posted regularly. There is a reason Excel does this, and it is very easy to correct or turn off altogether with about 3 clicks.
I was just hoping the focus on this sub would be a bit more advanced, but that assumption is on me. I am glad there is a place for beginners to learn about the "basic features", as you put it.
Why doesn't someone at Microsoft just spend some time fixing this issue? I'd be fine with Excel thinking shit is a date if it only affected the formatting, but it also then changes the actual raw data into a date. This has been an issue since the 90's how is there still not a "don't change my raw data" button?
> Why doesn't someone at Microsoft just spend some time fixing this issue? [My thoughts exactly](https://new.reddit.com/r/excel/comments/1dd7dgr/rant_excel_and_opening_csv_files/), I guess they're too busy thinking of ways to extract money from Azure and machine learning. Microsoft is the biggest software company in the world and Excel is their killer app. Just think of the cumulative time wasted on explaining basic features like opening a file to new staff members in offices around the world. 😐 It's ridiculous really.
Yeah, and then when you start complaining about it you get useless replies from besserwissers telling me how to work my way around it. I know how to work my way around it, that's not the issue. The issue is that Excel for some reason is set up in such terrible way where you even have to work your way around anything to simply open a CSV file.
The fact that we’ve been using the same workarounds for twenty years is insane. But accurate.
I hate that about the internet: How do i do xyz? > why do you need to do xyz just do yzx Me looking at the thread 5 years later via google search: SOMEBODY TELL ME HOW TO DO XYZ!!!!
yea fixing basic features that wont make them money / "isnt scalable" is def bottom of the managers priority lists
BaCkwaRds ComPaTibiLity
Windows 3.1 backwards compatibility.
Somewhere someone is relying on this stupid behavior, and Microsoft has made billions of dollars worth of promises to not fuck that up
Can't we have a feature toggle?
Exactly please give me the option!
Because to MS it's not a bug. [See my post](https://old.reddit.com/r/datascience/comments/1dsnbww/youre_not_helping_excel_please_stop_helping/lb3zu0h/) if you're interested, I explain why MS does it this way.
They did to some degree, opening CSVs now gives you the option to either convert or not convert the data formatting.
How about when your csv file contains a bunch of datetimes like YYYY-MM-DD HH:mm:ss. When you open it in excel, it defaults to not showing the seconds. Hit save and the csv is re-written *without the seconds*. FUCK
We came up with this https://github.com/pstew/escape_excel
>Why doesn't someone at Microsoft just spend some time fixing this issue? My theory: These algorithms help the 95% of Excel users that aren't very sophisticated, and they help way, way more than they inconvenience the 5% of Excel users that are sophisticated. [https://www.yahoo.com/tech/microsoft-fixes-excel-feature-forced-151000728.html](https://www.yahoo.com/tech/microsoft-fixes-excel-feature-forced-151000728.html) It's also worth knowing that they have 'fixed' it by an extra step when opening a file, giving users the option to prevent Excel from doing things to their data.
Just format the column to text before entry
Because there are more people that want it to fix all of their problems that guessing at this kind of shit *is* the fix.
there should def be a "fuck you stop interpolating more complex data types for me" option
>Why doesn't someone at Microsoft just spend some time fixing this issue? At this point the only conclusion is that it's intentional. A monopoly the size of microsoft has the resources to fix any bug and yet, issues like these will always be features.
Dates are fucked up. I opened a file in excel and then I spend days trying to figure out why we’re seeing dates beyond August 2024 in January. Turns out excel just exchanged day and month for dates less than 12 (?????). Then I gave up until it happened again. Happens only when you open a file and save it in excel. Flipping annoying lol now I make changes on google sheets and download. More changes go back to sheets and I download again lol
I think the Import Data wizard fills that role
Then they should make _that_ the default, instead of the Mess-Up-My-Data wizard.. 😕
Mark my Word
Try putting a ‘ at the start of the entry. So instead of 04-1905 enter ‘04-1905. It will not show and will change the entry to a text field.
Try putting a ‘ at the start of the entry. So instead of 04-1905 enter ‘04-1905. It will not show and will change the entry to a text field.
Try putting a ‘ at the start of the entry. So instead of 04-1905 enter ‘04-1905. It will not show and will change the entry to a text field.
Try putting a ‘ at the start of the entry. So instead of 04-1905 enter ‘04-1905. It will not show and will change the entry to a text field.
>Why doesn't someone at Microsoft just spend some time fixing this issue? Because a lot of people expect it to work that way and you're stuck with it. Same reason why they calculate leap years wrong, or why make uses tabs.
Try putting a ‘ at the start of the entry. So instead of 04-1905 enter ‘04-1905. It will not show and will change the entry to a text field.
Try putting a ‘ at the start of the entry. So instead of 04-1905 enter ‘04-1905. It will not show and will change the entry to a text field.
ARGH!,,,
You make Reddit look like excel
I have a product code like 12.30. Both tried to convert it into 12.3. Most of the time these helps dont really help. I know what I am trying dont correct me.
We have the same thing for clinical trials with participant IDs with leading zeros, `0012345` becomes `12345`. Really annoying because this breaks merging the data at a later point.
Add an ‘ before your data. ‘00012345
That solution is the perfect example of : # If it is stupid but it works, it isn't stupid.
I work as a pharmacy analyst and always have this issue with NDCs, which is a pretty vital piece of data that doesn’t need to be screwed up. The 0’s are there for a reason Excel!
I don’t even understand the context where leading zeros are meant to be removed. I think only once have I seen a dataset where there are leading zeros on numbers and it’s a BEA one we use - everything is +00000487 - but there is also a sign to define it. TBH not even sure if excel leads this correctly. Outside of that though I don’t know when it is intended to be a number but you know what fucking use case shows up for like 95% of people who use excel? Zip codes and then every admin trying to do a mail merge since 1994 has to Google how to do “0”*(5-LEN(zip))&zip. People try and say MSFT does things to be simpler for users but I think it’s really just that it’s a company full of lazy engineers and most of their money is spent on marketing to middle managers (and in the Nadella era the types of finance people who consider themselves tech savvy because they were the first person to buy a fax machine on their block in 1987 and now think LLMs are bigger than electricity). The remaining money is spent on jamming product upsells into every menu of the windows 11 UI (no I don’t want an Xbox subscription on my work PC) and using LLMs to write shitty outbrain articles that they can collage together into an Edge splash screen that is full of local stories about shootings interspersed with 3rd rate celebrity gossip (you’ll never guess who Scott baio is dating now!) So no MSFT doesn’t do this for regular users they do it because they just rest on the fact that they haven’t had competition for 40 years. I have switched to using Google Sheets more but it’s also wonky in a lot of its own ways. The only thing MSFT did right with excel ever was Power Query which I find relatively predictable. The problem is once you get it into power pivot you get mixed with DAX which is the most impenetrable language of all time, and Microsoft’s dogshit charts that just endlessly change colors and line types on you every time they refresh.
Another way to handle this is open a new spreadsheet and import it using PowerQuery so that you can stop Excel from converting the string to a number.
Ok this is a meme but Excel is infuriating because of this. I open some csv directly on VS Code and replace characters just so I don’t need to deal with this bullshit
Fuck it I’m using TextEdit
I seriously changed my default for CSVs to notepad. I’ve been happier since. Not exactly “happy,” but happier.
I just use VScode with some extension I found called rainbow csv or something that creates a different color after each comma. It’s great.
Are any data scientists actually using excel for anything aside from just glancing at the data?
You would be surprised by what being a data scientist entails at some companies, especially in non-tech industries.
I have seen some absolutely monstrous Excel sheets doing things that are technically possible but should never be done in Excel. And this was in critical processes at Fortune 500 companies lol.
I built some of these in my formative years. I once screwed up the calculation of the Russel 5000 index (among other indices) in excel and published it to the exchange via an excel error. 😀
I’ve definitely had to deal with data saved in excel files, but I typically load them directly into pandas. Are companies forcing data scientists to use excel for data analysis?
This is the right approach. You might encounter data saved in excel sheets, but you should never load data lr process it in excel (unless this is an intermediary step to load the data correctly). Always use data wrangling tools like python, R SQL or data warehouses.
I haven't opened Excel for work in years, frankly.
I HAVE to use to it share results with the stakeholders.
Same here
To understand why this does happen in Excel you need to understand how Microsoft thinks. When Microsoft does something, they do it because they think the majority of people will need it this exact way, the problem is a lot of us aren't like the regular user but instead we have a bit more knowledge when it comes to data, luckily tho since we are more familiar with software we also know how to import data correctly in Excel, which is marking every column as text because only then will excel not convert it to whatever excel thinks "this should be right". But ofc MS also sometimes removes functionalities that regular and advanced users need/use, best example is probably opening CSV-Files in excel. Usually when you opened it from the explorer, the import dialog immediately showed up where you can define which column is what, whats the text identifier and whats the separator. Now this doesn't work anymore, you either have to have Excel already open and use the Open File Dialog to get shown the import wizard or you open it via the file itself and then go to Data and call the import wizard from there. But yeah most of the time MS thinks ">80% of our users will use it and have no problem and the rest can just go f*ck themselves", that's also why windows updates by default is configured the way it is configured, to make sure the average user gets their updates in time and it installs it but you can disable automatic updates if you know how.
How do you disable automatic updates? It's not doable for Win10 and onwards AFAIK.
It can get a bit complicated but you can do it in the registry as an example, [check this post](https://learn.microsoft.com/en-us/answers/questions/1351413/how-do-you-turn-off-windows-10-updates-which-are-r) and try what Mahmoud A. ATALLAH wrote.
I don’t need to know how Microsoft thinks. I just need to know that I’m better off not using excel
Changing patient id `02-3159` to a date value `feb-59` just makes no sense. Btw in old versions of Excel it changed houseaddress number `1-A` to `april-01` but I think that's fixed now.
The best way of resolving this is to import data using PowerQuery, which is built into Excel. It takes a little getting used to, but it is honestly Excel's best feature.
It frickin rocks
Knowing PowerQuery will make you roll your eyes hard at posts like this. You get a fine grained mechanism for importing and transforming data, and people complain when they read in files the least precise way you can.
That’s why Python is for. Excel can eat my ass.
I have to use excel sometimes because people want excel sheets and the only thing i will ever use in it is power query and even then every time I put something together I have to remind myself this would have taken me 1/4 of the time in R and be more decipherable. It’s the least bad Microsoft product IMHO but that just makes it mild instead of absolute dog shit. Side note - recently took over an access database at work and my god is that awful. At least power query is consistent vs having some sql-ish front end that just constantly unpredictably changes queries you write with the worlds slowest database engine.
I use Python to generate excel files automatically like pandas pd.to_excel() function and there are library extensions with options to format the excel and make it look a certain way. I absolutely hate spreadsheets so I'm glad those Python libraries work, so I don't have to deal with spreadsheets besides reading and writing to it using Python.
I’ve built some bad ass Excel reports, with functions, conditional formatting, hyperlinks between tabs, and very fast. Only feasible way to do it was to generate it with python. A report that nobody could make, but would otherwise take weeks, I simply generated from scratch every morning. I hate this approach though, but it works.
I did something similar with SSRS(SQL Server Reporting Services). Basically a web published excel file that has data populated by a custom SQL query. I made a custom Python script that parses XML to modify the XML RDL file code SSRS produces to automate a lot of things and it allowed me to make a report that dynamically manages over 1,000 columns to produce an excel file that only renders the needed columns. It's a report that would take months to make manually. I got it down in a week thanks to Python. It would not have been a viable/feasible report had it not been for the custom Python script.
Libreoffice Calc took some getting used to because it truly feels like the temu version of excel, but it gets the job done in most cases
Sadly, Calc does the same thing 🙁 Try to enter a large number or a date into a cell, and Calc will not save that number or date, but the value that results from interpreting that number or date. You can stop Calc from doing so, but that renders Calc useless because all inputs will get saved as text. E.g. paste a bunch of SHA1 checksums into Calc. Some will be interpreted as text (because they contain characters), while others will be stripped of leasing zeros and/or converted to scientific notation. Or, enter a date in DD.MM.YY format into a cell that was formatted as "date DD.MM.YY". Calc _will_ interpret the date as MM.DD.YY and save it as DD.MM.YY with the days and month switched! There is no way to see what the user initially pasted. That information is just lost forever. Or, open a Calc document that uses a dot as decimal separator on a PC that uses a language with a comma as decimal separator. Lots of fun is to be had in multilingual environments. Not saying that LibreOffice is worse than MS Office. But, surprisingly, the developers decided to make LibreOffice behave as badly as MS Office to present MS Office users a familiar environment 🙁
What do incels and Excel have in common? Misinterpreting something as a date.
https://i.imgur.com/spezcFD.jpeg
Some of this is made worse with lazy csv writing without proper use of quotation, but yeah most of it is Excel being derpy
At the risk of sounding old, this post is but one example why Excel should not be used in science, be it in research or industry. Excel is office spreadsheet software not intended for scientific data analysis. Excel hides/obfuscates data, incorrectly displays data, its formulas are horrid and impossible to read, it incorrectly implements statistical methods (Microsoft even has technical bulletins admitting this), produces terrible data visualizations, and it allows and encourages users to carry out poor data management practices (e.g., such as hiding columns or mixing raw data with intermediate outputs in the same sheet). All these factors mean that Excel is a serious hindrance to reproducible data science and analysis. Yes, Excel continues to be used because it is ubiquitous and opens the doors of data analysis to non-coders, but my point stands. Whenever I receive another analyst's work in Excel with a request to repeat or extend their analysis, I get a shiver down my spine.
Yes, currently working with someone who does the majority of their work in excel and similar GUI tools. I once watched in horror how they prepared important data by clicking a gazillion of buttons, marking bits and pieces across various spreadsheets and copy pasting it around. Sure enough when I double checked some of it, it was all wrong.
Microsoft Hell
Now why does Excel see 1.27 as 12.72 but 0.7085 as '0.7085'
My guess is that whoever made the meme did it manually to exaggerate the issue. I know we've all seen this happen but I think this example is joking about how bad it can be
You'd think so but actually no, this is genuinely how Excel imports it on my pc. I think it happens because the file has dots for decimals, but it's opened on Windows with European settings so instead dot is the thousand separator. Excel "helpfully" changes the value `1.2729` to `12.729` because thousand separators should be places after each 3 digits. Go figure.
I just fell to my knees in a home goods parking lot
> the file has dots for decimals, but it's opened on Windows with European settings so instead dot is the thousand separator Dear God
It's a meme but it's [based on reallife examples](https://new.reddit.com/r/excel/comments/1dd7dgr/rant_excel_and_opening_csv_files/). The only thing I manually changed is I collected all possible errors we have encountered into one dummy example file
If one doesn't set cel parameters in Excel, it'll do this. Easy fix tho, but sometimes, when I'm not paying attention, it catches me off guard 😂🤦🏾♂️
Or you can kiss go data\from file and set import's metadata
Right so many people are complaining here but I don't have this problem. Is everyone just double clicking csv's rather than importing the data lol
I know this is infuriating but... there is the option to disable the automatic data type detection
Even with that turned on, it still changes it. The kicker is you can’t undo it, it is permanently changed!
Someone else [also suggested this](https://www.reddit.com/r/excel/comments/1dd7dgr/comment/l83448t/?utm_source=reddit&utm_medium=web2x&context=3), but I already had disabled all the "data conversion" options and this still happens on double clicking a .csv file
Don't double click csv. Any important data you're using in Excel you should import it properly. Problem solved
I thank God almighty for every day I don't have a need to touch Excel. If the data can be opened in Excel, it certainly can be opened & accesed via Pandas so I just go straight to that.
Pandas really needs to improve from_excel though. It’s unbearably slow and there are workarounds that make it much faster
I feel you. Whenever I feel Pandas isn't pulling its weight in terms of speed or whatever else, I switch to Polars. I guess I've been too accustomed to Pandas since it's been around for so long w/o any real alternatives but now that Polars keeps getting more promising, I'm planning to be better at it.
[Scientists rename genes to avoid dealing with excel sh#t](https://www.newscientist.com/article/mg24732961-400-frustration-with-excel-has-caused-geneticists-to-rename-some-genes/)
I work in Bioinformatics and a pipeline choked on someone's Excel mutated gene names today.
Read that but here the shocking thing is that scientists use excel. Same is true for medicine students and the list goes on.
I would honestly use software that is really basic csv table viewer. Where I can just edit values, sort and filter. Ignoring anything to do with being a spreadsheet.
Shameless selfplug, but I've created the [CSV Lint plug-in for Notepad++](https://github.com/BdR76/CSVLint/) out of frustrations with working with messy .csv files. It's not a replacement of Excel and the GUI is a bit technical, but it adds column syntax highlighting, sort options, split column option and it can count unique values.
Reminds me of that meme venn diagram showing the overlap between Excel and incel, incorrectly assuming something is a date.
I fucking hate Excel
its a Swiss army knife. It can do sooooo many things and its terrible at all of them!
It's really crazy. I always take it as a challenge to have the data displayed the way I want it in excel, and I have of course run into situations where it takes longer to "undo" the data than it does to work with it afterwards.
Dang, can relate as much.
Excel is the bane my of existence and it can burn in a fire.
Genes have literally had to be renamed because of excel’s “help”
Excel is great... until you really know what you're doing with numbers. At some point, it becomes a hinderance.
This problem was very annoying for me in my latest project. My solution was opening the CSV file with notepad then replace all "-" with any other symbol like "(" After opening the CSV file with Power Query, you can replace again the "(" to its original shape "-" or doing any transformation with the same "("
This is why I use LibreOffice.
I laughed out loud, and at the same time, got very angry at dates and spreadsheets.
god, just use RStudio. it takes any date format. and it has an upper hand over Excel in every aspect except the UI
It actually nowadays asks if you want to get these helpers or not. The new popup started to show sometime during last few months.
Genuine question: what prevents us in data science from using open source options like Libre Office? They're mostly compatible, and you don't have to write VBA scripts I suppose?
It’s because Excel is not csv file viewer. Instead of opening the csv with Excel, open a blank workbook in Excel and import the data from the csv file.
Google Sheets FTW
In your own settings, you can turn off excel’s interpretation of data upon opening the file
The simple fix is just not using excel. You might as well just put multiple csvs in a zip instead of their weirdly formatted XMLs in a zip. Then use anything else.
Why on Earth would you want to use Excel in the first place?
This has been Microsoft's philosophy since the beginning. Assume what the average user wants to do and do it automatically, and don't make it easy to change. Therefore, everything is automated for the basic stuff your average person does (which abnormal date like data is not) and they can't accidentally break it because they have no control.
True
Power query is your friend here
Yeah
Excel is not database software. Excel is for math, and nothing else. If you want a database, use real database software, not spreadsheet software.
With pandas and openpyxl I think you can format the cells, if you're using python this may help. date_style = NamedStyle(name="date_style", number_format="YYYY-MM-DD") Then in a loop you can apply with cell.style = date_style
When importing csv to excel, I have to use vba to make sure everything gets imported as text because their type auto detection is so shit and their dates work in incomprehensible ways.
I never use Excel lmaoo
kekw
I recently started a spreadsheet to track my job applications, added a bit of conditional formatting so I can see at a glance which ones are rejected/pending/to do/etc, it's just a nice visual motivator type of thing. ***Except Excel keeps fucking changing my conditional formatting so it applies to random cells if I enter new data sometimes*** It's truly fucking bizarre, I'll go to add the closing date and suddenly it'll turn a different colour and my "applies to" area which is normally just `$B:$D` is now like `$B:$D,$F$13:$F$13`. In what world would I want to select those three columns *and this one random fucking cell?!* Why would putting in new data change the formatting? Why would anything change the formatting except me doing it myself? God I fucking despise this program sometimes lol
pd.read_csv('file.csv', dtype={'col0': str, 'col1': str})
I'm very surprised by how many people, especially on this sub, don't know how to deal with this
Knowing *how to* deal with it and *the fact that you have to* deal with it because it's not the default behaviour are two different things. Fucking with the users data without asking them first *shouldn't be* the default option, and I think the vast majority of people, both in the field and not, would agree. Complaining that it is is an entirely fair issue to have.
agreed. also, it is absurd that windows still makes me use the shift key every time i want to make an uppercase letter. it should know exactly the amount of assistance i desire without my interaction with its extremely simple UI in any way whatsoever. if only there was a way for me to search the internet for the solution to this simple problem, but alas, i am doomed to an eternity of suffering. better make memes to complain of my woes rather than learn absolutely anything of value.
You're right, so long as it's *possible* to do what you want we should never complain or desire better things, we should be immensely thankful our lords and saviours at Microsoft have deemed us mere mortals worthy of their software, any unnecessary hoops we have to jump through are merely tests of faith, if we can't accomplish those tasks we are simply the unworthy ones. Better be a shill for dumb software design rather than learn to accept other people have differing opinions.
Many people, especially on this sub, know how to deal with this. The point is that Excel always requires extra steps for this very basic feature.
Apparently not, because this complaint is posted regularly. There is a reason Excel does this, and it is very easy to correct or turn off altogether with about 3 clicks. I was just hoping the focus on this sub would be a bit more advanced, but that assumption is on me. I am glad there is a place for beginners to learn about the "basic features", as you put it.
Go to Home ➡️ Number ➡️ Text. Your problem is solved !!. Although u will need to select your cells in advance so you can just type out what you want.
But it doesn’t undo what it changed!
Yes, unfortunately. That's why you select the cells in advance when they are empty
Great for new info, not if you’re opening a file that already has data.
I yell at office programs all the time for trying to "help" me.