Comments:"Ned Batchelder: War is peace"
URL:http://nedbatchelder.com/blog/201302/war_is_peace.html?
The Rails community has had a few high-profile security issues this week. They are well summarized, with an alarming list of follow-ons to expect, by Patrick McKenzie: What the Rails Security Issue Means for Your Startup.
tl;dr:
- Ruby's YAML parser will execute arbitrary Ruby code,
- YAML is parsed all over the place in Rails, including for all JSON input,
- Pretty much every Rails app is going to be compromised soon.
The Python community is in a slightly better position. True, we have pickle in the standard library, which has exactly the same problem, but it's rare to find applications that accept pickles from untrusted sources.
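As a minimal illustration of why that matters (a sketch, not from the original post; the class name and echoed command are made up), a pickle payload can name any callable to run when it is loaded:

    import os
    import pickle

    class Boom:
        # __reduce__ tells pickle how to "reconstruct" this object.
        # A malicious payload can use it to name any callable and arguments.
        def __reduce__(self):
            return (os.system, ("echo owned",))

    payload = pickle.dumps(Boom())

    # Whoever unpickles the payload runs the command above:
    pickle.loads(payload)   # arbitrary code execution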
Don't ever unpickle data you don't trust!
The third-party YAML parser PyYAML has the same issue as Ruby's YAML parser. By default, it will let you create arbitrary Python objects, which means it can run arbitrary Python code. YAML isn't nearly as pervasive in the Python world, and we don't usually parse JSON with the YAML parser, but this can still create security holes.
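Here is a sketch of the same problem in PyYAML (the payload and command are illustrative, and this assumes the default, unsafe loader; newer PyYAML releases warn about or require an explicit Loader):

    import yaml  # PyYAML

    # A tagged YAML document can name an arbitrary Python callable to invoke
    # while the document is being parsed:
    payload = "!!python/object/apply:os.system ['echo owned']"

    yaml.load(payload)   # runs "echo owned" -- arbitrary code execution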
PyYAML has a .load() method and a .safe_load() method. Why do serialization implementers do this? If you must extend the format with dangerous features, provide them in the non-obvious method: provide a .load() method and a .dangerous_load() method instead. At least that way people would have to decide to do the dangerous thing. I would advocate for PyYAML to make this change now; who cares if backward compatibility breaks? Most people using .load() never intended to deserialize arbitrary Python objects anyway, so they'll never notice.
If you use the PyYAML library in your code, check now that you are using the .safe_load() method.
If you want automatic serialization of your user-defined classes, take a look at Cerealizer, which works similarly to pickle, but is built to be secure from the start. I've never used it, but it looks promising.
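For contrast with the unsafe example above (again just a sketch), .safe_load() still parses ordinary data but refuses to construct arbitrary Python objects:

    import yaml

    # Plain data parses fine with the safe loader:
    yaml.safe_load("{name: ned, langs: [python, ruby]}")
    # -> {'name': 'ned', 'langs': ['python', 'ruby']}

    # The malicious payload from above is rejected instead of executed:
    yaml.safe_load("!!python/object/apply:os.system ['echo owned']")
    # -> raises yaml.constructor.ConstructorError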
BTW, this whole circus reminded me of Allen Short's excellent lightning talk from PyCon 2010: Big Brother's Design Rules (skip to 17:30). To summarize Allen's pithy maxims:
- War is Peace: assume you are at war, all input is an attack, and then you can be at peace.
- Slavery is Freedom: the more you constrain your code's behavior, the more freedom you have to act. The smaller your interface, the smaller your attack surface.
- Ignorance is Strength: the less your code knows about, the fewer things it can break. This is the principle of least authority.
Allen in particular mentions that adding "conveniences" to your interface can make your life harder later on. In Ruby's case, there were two unneeded conveniences that combined to make things really bad: parse JSON with the YAML parser, and let the YAML parser construct arbitrary Ruby objects. Neither of these is actually needed by 99.999% of programs reading JSON, but now all of them are compromisable.
Think hard about what your program does. Stay safe.

Nick Coghlan 10:33 PM on 1 Feb 2013
@Walls: Writing secure software is hard - it requires a combination of paranoia (assuming you will be attacked instead of thinking "Why would anyone ever try to exploit this?") and humility (assuming your attackers will be smarter than you instead of thinking "I don't know how to break it, therefore it is secure") that most people don't have. It also often comes at a cost in flexibility - pickle is a lot more powerful than JSON as a data format, but that power carries with it a huge increase in risk.
The core Python team tries hard to promote a culture of "use as much magic as you need, but no more" (often paraphrased as "magic is evil", and included in the Zen of Python in various guises like "explicit is better than implicit", "simple is better than complex", "complex is better than complicated" and "if the implementation is hard to explain, it's a bad idea"). However, it's always going to be tempting to make the powerful and flexible option the default, and the more restrictive option the exception.
As an example that was fixed in Python 3: Python 2 has "input()", which implicitly calls "eval()" on user-supplied data. The safer alternative, which allows more restrictive parsing by always returning a string, is called "raw_input()". In Python 3, the input() builtin itself has been fixed to behave like Python 2's raw_input().
However, even in Python 3, the builtin eval() is still dangerous to use on user-supplied data, as it can execute *any* Python expression. For obscure technical reasons, the safer-but-more-limited alternative, "ast.literal_eval()", isn't even a builtin the way raw_input() was.
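For instance (a sketch; the input strings here are invented):

    import ast

    # eval() runs any expression, so attacker-controlled text is code:
    eval("__import__('os').system('echo owned')")          # executes the command

    # ast.literal_eval() only accepts Python literals (numbers, strings,
    # tuples, lists, dicts, sets, booleans, None):
    ast.literal_eval("[1, 2, {'a': 3}]")                    # -> [1, 2, {'a': 3}]
    ast.literal_eval("__import__('os').system('echo hi')")  # raises ValueError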
Only in Python 3.3 did we start shipping a comparison operation suitable for security-sensitive operations (hmac.compare_digest), and there are still no suitable primitives for password hashing in the standard library (although "passlib" is just a download away on PyPI).
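To show what compare_digest is for (a sketch; the token values are made up): a naive == comparison can return early at the first differing character and so leak timing information, while compare_digest takes the same time regardless of where the strings differ:

    import hmac

    expected_sig = "69d2dbee0b3a37f0e43094a84b448e26"  # made-up hex digest
    supplied_sig = "69d2dbee0b3a37f0e43094a84b448e00"

    # Constant-time comparison of the two digests:
    if hmac.compare_digest(expected_sig, supplied_sig):
        print("signature ok")
    else:
        print("signature rejected")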
No Pythonista should ever feel smug about security woes in another language or runtime, whether that's Java or Ruby or something else. We have a track record of promoting "safe by default" behaviour, but our record certainly isn't perfect, and we'll almost certainly have more issues in the future. Standard library behaviours that are safe within the confines of a single system (like sharing pickled objects through a pipe) become unsafe when spanning multiple systems (like sharing pickled objects without cryptographic signatures across a network socket), and we're relying on other developers to understand that. Heck, the Rails vulnerability is overshadowing a recent MoinMoin exploit which was used to take out both Debian's main wiki and the Python wiki on python.org.
Looking specifically at the case of the recent Rails problems, even apps written in Python may run into trouble if a related Rails app, or an unrelated Rails app on the same network, falls to an attacker. Attackers don't stop just with the first machine compromised - every compromised machine becomes a platform for launching additional attacks, often with additional data about or privileged access to subsequent target systems.
The design space available for programming languages is enormous, and we collectively still know very little about how to write large scale software sensibly. When other languages and software are attacked, it is important to reflect on it and see what lessons can be learned for our own tools (as Ned has done here), rather than arrogantly assuming ourselves to be immune from the same kinds of error.