Insecure Deserialization is a vulnerability which occurs when untrusted data is used to abuse the logic of an application, inflict a DOS attack, or even execute arbitrary code upon it being deserialized. In order to understand what insecure deserialization is, it first must be said what serialization and deserialization are.
Serialization refers to a process of converting an object into a format which can be persisted to disk (for example saved to a file or a datastore), sent
through streams (for example stdout), or sent over a network. The for- mat in which an object is serialized into, can either be binary or struc- tured text (for example XML, JSON YAML. . . ). JSON and XML are two of the most commonly used serialization formats within web applica- tions;
Deserialization on the other hand, is the opposite of serialization, that is, transforming serialized data coming from a file, stream or network socket into an object.
Web applications often use serialization and deserialization and most pro- gramming languages even provide native features to serialize data. The problems related to deserializeion comes when deserializing untrusted user input. Most programming languages offer the ability to customize deserial- ization processes, but it’s frequently possible for an attacker to abuse these deserialization features when the application is deserializing untrusted data which the attacker controls. Successful insecure deserialization attacks could allow an attacker to carry out DOS attacks, authentication bypasses and re- mote code execution attacks.
The following is an example of insecure deserialization in Python. Python’s native module for binary serialization and deserialization is called pickle. This example will serialize an exploit to run the whoami command, and dese- rialize it with pickle.loads().
# Import dependencies
import os
import _pickle
# Attacker prepares exploit that application will insecurely deserialize
class Exploit(object):
def __reduce__(self):
return (os.system, (’whoami’,))
# Attacker serializes the exploit
def serialize_exploit():
shellcode = _pickle.dumps(Exploit())
return shellcode
# Application insecurely deserializes the attacker’s serialized data
def insecure_deserialization(exploit_code): _pickle.loads(exploit_code)
if __name__ == ’__main__’:
# Serialize the exploit
shellcode = serialize_exploit()
# Attacker’s payload runs a ‘whoami‘ command
insecure_deserialization(shellcode)
It’s quite easy to imagine the above scenario in the context of a web applica- tion. If you must use a native serialization format like Python’s pickle, be very careful and use it only on trusted input. That is never deserialize data that has travelled over a network or come from a data source or input stream that is not controlled by your application. In order to significantly reduce the like- lihood of introducing insecure deserialization vulnerabilities one must make use of language-agnostic methods for deserialization such as JSON, XML or YAML. However, there may still be cases where it is possible to introduce vulnerabilities even when using such serialization formats.
Another such example in Python is when using PyYAML, one of the most popular YAML parsing libraries for Python. The simplest way to load a YAML file using the PyYAML library in Python is by calling yaml.load(). The follow- ing is an simple unsafe example that loads a YAML file and parses it.
# Import the PyYAML dependency
import yaml
# Open the YAML file
with open(’malicious.yml’) as yaml_file:
# Unsafely deserialize the contents of the YAML file
contents = yaml.load(yaml_file)
# print the contents of the key ’foo’ in the YAML file
print(contents[’foo’])
LISTING4.30: Insecure deserialization
Unfortunately, yaml.load() is not a safe operation, and could easily result in code execution if the attacker supplies an YAML file similar to the following:
foo: !!python/object/apply:subprocess.check_output [’whoami’]
Instead, the safe method of doing this would be to use the yaml.safe_load() method instead. While the above examples were specific to Python, it’s im- portant to note that this is certainly not a problem limited to Python. Ap- plications written in Java, PHP, ASP.NET and other languages can also be susceptible to insecure deserialization vulnerabilities.
Serialization and deserialization vary greatly depending on the program- ming language, serialization formats and software libraries used. To such an extent, fortunately, there’s no ‘one-size-fits-all’ approach to attacking an inse- cure deserialization vulnerability. While this makes the vulnerability harder to find and exploit, it by no means makes it any less dangerous.
4.10.1
Countermeasures
The only safe architectural pattern is not to accept serialized objects from untrusted sources or to use serialization mediums that only permit primitive
data types. If that is not possible, one of more of the following rules should be followed:
• Implementing integrity checks such as digital signatures on any serial- ized objects to prevent hostile object creation or data tampering;
• Enforcing strict type constraints during deserialization before object creation as the code typically expects a definable set of classes. By- passes to this technique have been demonstrated, so reliance solely on this is not advisable;
• Isolating and running code that deserializes in low privilege environ- ments when possible;
• Log deserialization exceptions and failures, such as where the incoming type is not the expected type, or the deserialization throws exceptions; • Restricting or monitoring incoming and outgoing network connectivity
from containers or servers that deserialize;
• Monitoring deserialization, alerting if a user deserializes constantly.