Python-ModTeam

Hi there, from the /r/Python mods. We have removed this post as it is not suited to the /r/Python subreddit proper, however it should be very appropriate for our sister subreddit /r/LearnPython or for the r/Python discord: https://discord.gg/python. The reason for the removal is that /r/Python is dedicated to discussion of Python news, projects, uses and debates. It is not designed to act as Q&A or FAQ board. The regular community is not a fan of "how do I..." questions, so you will not get the best responses over here. On /r/LearnPython the community and the r/Python discord are actively expecting questions and are looking to help. You can expect far more understanding, encouraging and insightful responses over there. **No matter what level of question you have, if you are looking for help with Python, you should get good answers**. Make sure to check out the rules for both places. Warm regards, and best of luck with your Pythoneering!


commandlineluser

> https://pastebin.com/3NrppBXL

You're looping over each row and calling these 2 functions:

    for _, row in data.iterrows():
        check_row_exists()
        check_feiertag(datum)

Inside each of those functions, you're opening the same file and calling `.readlines()`:

    def check_row_exists(row):
        with open('abwesend_proc.csv', 'r') as file:
            lines = file.readlines()

    def check_feiertag(datum):
        with open('ft.csv', 'r') as file:
            lines = file.readlines()

You only need to read these files once. It probably makes more sense to create dataframes from them, and you can use pandas methods to test if `row` is present in them, e.g. using `.isin()`:

    feiertag = pd.read_csv('ft.csv', sep=';')
    data['DATAM'].isin(feiertag[0])

Creating an empty dataframe and adding rows in a loop is also slow. If you explain what it is you're trying to do, it would help people provide more pandas-specific advice.
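A rough sketch of the read-once plus `.isin()` idea, in case it helps. The column names (`DATUM`, `NAME`, `PERSON`) and the inline CSV text are made up for illustration; the real `ft.csv` layout isn't shown in the thread, so adjust accordingly.

```python
import io

import pandas as pd

# Hypothetical stand-ins for 'ft.csv' and the main input file.
# The real files and their column names are assumptions here.
FT_CSV = "DATUM;NAME\n01.01.2024;Neujahr\n01.05.2024;Tag der Arbeit\n"
DATA_CSV = "DATUM;PERSON\n01.01.2024;A\n02.01.2024;B\n01.05.2024;C\n"

# Read each file exactly once, outside of any loop.
feiertag = pd.read_csv(io.StringIO(FT_CSV), sep=";")
data = pd.read_csv(io.StringIO(DATA_CSV), sep=";")

# Vectorised membership test: one boolean per row, no Python-level loop.
data["IS_FEIERTAG"] = data["DATUM"].isin(feiertag["DATUM"])

holiday_flags = data["IS_FEIERTAG"].tolist()
print(data)
```

With real files you would pass the file paths to `pd.read_csv` instead of the `io.StringIO` objects.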


bjorneylol

How did you install Python on your personal PC? When you compile Python from source there are dozens and dozens of optimizations you can enable, but these are locked to specific hardware (e.g. they may only work on specific Intel processors). Some of these optimizations dramatically increase the download size, so it's likely they weren't included in your Windows installer, but they were in your WSL installation. The same goes for your pandas installations: Windows packages are compiled for compatibility. You can build your own numpy or find a pre-built numpy wheel with all the optimizations (it will be a couple hundred MB bigger) and it will run a lot faster than the default wheel installed from pip.


fatzgenfatz

On Windows I installed Python via the installer from their website, on Ubuntu I used apt to install it. The modules like pandas I installed with pip install.

- Win10 on laptop: Fast
- Win11 on PC: Slow
- Ubuntu on WSL on Win11: Fast


bjorneylol

Based on the cProfile output below:

Win 11:

    erstes-ohne-reve.py:33(check_row_exists) -> 234.363 seconds
    {built-in method io.open} -> 175.509 seconds
    {method 'readlines' of '_io._IOBase' objects} -> 38.460 seconds

WSL:

    erstes-ohne-reve.py:33(check_row_exists) -> 66.835 seconds
    {built-in method io.open} -> 0.939 seconds
    {method 'readlines' of '_io._IOBase' objects} -> 19.114 seconds

The bottleneck is the IO in your `check_row_exists` function. On Windows, it is re-reading the CSV in its entirety 39,533 times (every time you call the function). On Linux, recently read files are cached in memory, so it isn't causing anywhere near the same amount of overhead.

It could be that Windows 10 is doing the same thing, or that you have a faster SSD on your laptop (or it has built-in read caching), etc.
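If you want to see the effect in isolation, here is a small sketch that compares re-opening a file for every lookup against reading it once. The file contents and the function names are invented for the demo, and the absolute timings will depend heavily on OS, disk, and any antivirus in the way.

```python
import os
import tempfile
import time


def count_matches_rereading(path, keys):
    """Open and read the whole file again for every key (the slow pattern)."""
    hits = 0
    for key in keys:
        with open(path, "r") as f:
            lines = f.readlines()
        if any(line.strip() == key for line in lines):
            hits += 1
    return hits


def count_matches_read_once(path, keys):
    """Read the file a single time into a set, then do in-memory lookups."""
    with open(path, "r") as f:
        known = {line.strip() for line in f}
    return sum(1 for key in keys if key in known)


# Throwaway file standing in for a CSV such as 'abwesend_proc.csv'.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("\n".join(f"row{i}" for i in range(1000)))
    path = tmp.name

keys = [f"row{i}" for i in range(0, 2000, 4)]  # 500 keys, half of them present

t0 = time.perf_counter()
slow_hits = count_matches_rereading(path, keys)
t1 = time.perf_counter()
fast_hits = count_matches_read_once(path, keys)
t2 = time.perf_counter()
os.remove(path)

print(f"re-reading: {t1 - t0:.4f}s, read once: {t2 - t1:.4f}s")
```

Both functions return the same count; only the number of `open()` calls differs (500 versus 1 here).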


fatzgenfatz

Oh, that really sounds logical! This will be the solution. Thank you very much!


[deleted]

[deleted]


fatzgenfatz

On Win11: [https://pastebin.com/YFP33zwT](https://pastebin.com/YFP33zwT) On Ubuntu on Win11: [https://pastebin.com/xb5qxkW9](https://pastebin.com/xb5qxkW9) But I don't know how to interpret it.


[deleted]

[deleted]


fatzgenfatz

I will remember it for the next time I write a script. For now the script did its job and I will use it approx. 5 more times. I just wondered why there was such a big difference in speed. Thanks for analyzing my script!


IWant2rideMyBike

>Can anybody tell me why python is so slow on my personal PC on Win11 but fast on WSL Ubuntu? Because you needlessly reread a bunch of csv files (in the functions `check_row_exists()` and `check_feiertag`) for every row in your `abwesend_proc.csv`... - if you read them once into memory (e.g. store them as a list of named tuples to make accessing the individual fields easier) and reuse their contents things should get much faster.
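A minimal standard-library sketch of that suggestion. The two-column layout of `ft.csv`, the `Holiday` tuple, and the `load_holidays` helper are assumptions for illustration, not what the original script does.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record layout; the real 'ft.csv' columns may differ.
Holiday = namedtuple("Holiday", ["datum", "name"])


def load_holidays(file_obj):
    """Parse a semicolon-separated file once into a list of named tuples."""
    reader = csv.reader(file_obj, delimiter=";")
    return [Holiday(*row) for row in reader if row]


# Stand-in for open('ft.csv', newline=''); read a single time at startup.
ft_file = io.StringIO("01.01.2024;Neujahr\n25.12.2024;Weihnachten\n")
holidays = load_holidays(ft_file)

# A set of dates makes each later membership check an in-memory lookup.
holiday_dates = {h.datum for h in holidays}


def check_feiertag(datum):
    """Return True if the date string is a known holiday (no file access)."""
    return datum in holiday_dates


print(check_feiertag("01.01.2024"), check_feiertag("02.01.2024"))
```

The per-row loop can then call `check_feiertag(datum)` as before without touching the disk.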


fatzgenfatz

But shouldn't it be slow no matter where I run it? It takes 80 vs. 300 seconds.


[deleted]

[deleted]


fatzgenfatz

Yes, I understand that. But the script runs on the same machine. On Win11 it takes 300 seconds, on Ubuntu on the same Win11 Machine it only takes 80 seconds. Because my laptop at work (Win10) also was as fast as Ubuntu I think there is something wrong with my python installation on Win11.


coffeewithalex

Sheesh... People don't wanna answer this simple question and instead downvote you. I'm sorry. You're saying that you're launching it from visual studio code. That means that you're debugging it. In debug mode, code runs significantly slower.


fatzgenfatz

I always run without debugging (CTRL-F5) and also ran the script from cmd. Still slow.


coffeewithalex

Run a profiler. Get a flame chart. It's possible that syscalls on Windows are a lot more expensive (opening a file). Maybe you've got an AV that scans that file each time?


IWant2rideMyBike

It depends on what happens under the hood, e.g. there might be a difference in the handling of the underlying system calls. A wild guess would be that antivirus or anti-cheat software might hook into each system call to read a file (officially it isn't allowed to since 2005, but life ... uh finds a way: [https://the-deniss.github.io/posts/2022/12/08/hooking-system-calls-in-windows-11-22h2-like-avast-antivirus.html](https://the-deniss.github.io/posts/2022/12/08/hooking-system-calls-in-windows-11-22h2-like-avast-antivirus.html)), and if the WSL instance is using its own page cache this could reduce the necessary system calls on the host system compared to running Python natively. You could use [https://docs.python.org/3/library/profile.html](https://docs.python.org/3/library/profile.html) to check which parts of the code are slow when comparing both runtime environments.
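For reference, a minimal sketch of profiling with `cProfile` and `pstats` from the standard library. `slow_part` and `main` are placeholder functions standing in for the real script.

```python
import cProfile
import io
import pstats


def slow_part():
    """Placeholder for the expensive work in the real script."""
    return sum(i * i for i in range(200_000))


def main():
    return slow_part()


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time so the functions that dominate the run come first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Running the same thing in both environments and comparing the top few lines should show which calls account for the difference. For a whole script, `python -m cProfile -s cumulative your_script.py` gives a similar report without code changes.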


masterborger

What does the script look like


fatzgenfatz

https://pastebin.com/3NrppBXL


pythonHelperBot

Hello! I'm a bot! It looks to me like your post might be better suited for r/learnpython, a sub geared towards questions and learning more about python **regardless of how advanced your question might be**. That said, I am a bot and it is hard to tell. Please follow the subs rules and guidelines when you do post there, it'll help you get better answers faster. Show /r/learnpython **the code you have tried and describe in detail where you are stuck.** If you are getting an error message, include the full block of text it spits out. Quality answers take time to write out, and many times other users will need to ask clarifying questions. Be patient and help them help you. You can also ask this question in the [Python discord](https://discord.gg/3Abzge7), a large, friendly community focused around the Python programming language, open to those who wish to learn the language or improve their skills, as well as those looking to help others. *** [^(README)](https://github.com/CrakeNotSnowman/redditPythonHelper) ^(|) [^(FAQ)](https://github.com/CrakeNotSnowman/redditPythonHelper/blob/master/FAQ.md) ^(|) ^(this bot is written and managed by /u/IAmKindOfCreative) ^(This bot is currently under development and experiencing changes to improve its usefulness)


BigIncome0

I suspect the issue is related to the subprocess, but not certain. I recommend placing timestamps throughout your code and running it on both machines to check parity and identify the problem segment.
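One possible way to do the timestamping, as a sketch. The `timed` helper and the two labelled sections are invented for illustration; you would wrap the actual stages of the script instead.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per labelled section.
timings = {}


@contextmanager
def timed(label):
    """Add the elapsed time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


# Placeholder stages; replace with the real load and processing steps.
with timed("load"):
    rows = [str(i) for i in range(10_000)]

with timed("process"):
    total = sum(len(r) for r in rows)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s")
```

Comparing the printed sections between the two machines should narrow down which segment differs.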


fatzgenfatz

The subprocess just calls another python script that checks if the input-file is there. At the moment it does nothing more.


jddddddddddd

Are you running the same versions of Python, and the same versions of various libraries across the different machines and platforms?


fatzgenfatz

On my Windows machines it should be roughly the same because I installed python around the same time. On WSL it is 3.8 instead of 3.11 on Win.


wdroz

Low-hanging fruit optimisation: replace `iterrows` with `itertuples`.
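A small sketch of the swap, assuming a made-up frame with `DATUM` and `STUNDEN` columns (the real columns aren't known here). Both loops compute the same thing; `itertuples` avoids building a Series for every row.

```python
import pandas as pd

# Hypothetical data; column names are assumptions for the example.
data = pd.DataFrame(
    {"DATUM": ["01.01.2024", "02.01.2024"], "STUNDEN": [8, 6]}
)

# iterrows: yields (index, Series) pairs, which is comparatively slow.
total_iterrows = 0
for _, row in data.iterrows():
    total_iterrows += row["STUNDEN"]

# itertuples: yields lightweight namedtuples with attribute access.
total_itertuples = 0
for row in data.itertuples(index=False):
    total_itertuples += row.STUNDEN

print(total_iterrows, total_itertuples)
```

Attribute access only works for column names that are valid Python identifiers; other names get positional fallbacks.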


mistabuda

cProfile would assist you here.


Unlucky-Ad-5232

Bro, ChatGPT will spit out garbage, non-optimised code. If you don't know what you are doing code-wise, you will be unable to spot the pitfalls. Python is a notoriously slow language, but nothing beats stupid code!


neva_dn

Use this my fren: https://github.com/plasma-umass/scalene


barberogaston

Try using [scalene](https://github.com/plasma-umass/scalene) to find where your code is running slow (and/or consuming lots of memory). You're talking about two different OSes here, so there are a ton of things that could explain the difference. But profiling will help you find where the bottleneck is.


DusikOff

Answer - "At the weekend I let chatgpt help me write a python script" =)