Breaking Code

February 17, 2010

One-time pad encryption in Python

Filed under: Cryptography — Tags: , , , , , , — Mario Vilas @ 4:58 am

After some crypto sillyness with @feliam, @julianor and @ortegaalfredo on Twitter I cooked up a one-time pad crypto implementation in Python. This speaks volumes, not of my talent as a cryptographer (which is none at all) but the sad state of my social life these days (which happens to be the same amount).

What is one-time pad encryption?

Feel free to skip this section if you already know the answer. Especially so you don’t have to suffer my layman’s explanations of cryptography. :)

To put it simply, a one-time pad cipher is one in which the plaintext (i.e. the original message) is encoded using a completely new, random key each time it’s sent. When properly used (and I hope I have) this system is provably unbreakable. That means the ciphertext (that is, the encoded message) can never be decoded without the proper key – even if the encoding algorithm is very simple, like a bitwise XOR operation on each byte.

There are a few caveats: first of all, the key can never be reused. If you do, the system not only ceases to be unbreakable but it’s also as strong as the encoding algorithm you used. So if you chose XOR encoding and were tempted to use the key twice, you might as well have used a “magic ring” from a cereal box. ;)

The second caveat: the key must be truly random. For this reason need a random generator that can guarantee a certain amount of entropy, for example /dev/random on many Unix systems, to get the one-time pads, instead of the random module, which is only a PRNG (pseudo-random generator). PRNGs can only produce a seemingly random stream of numbers, all derived from a single value (called the seed number) – so it’s “randomness” is just as good as the seed number from which all others are calculated. (This is a useful property in other contexts, like avoiding to have to store the contents of all malformed files produced by a fuzzer in order to reproduce the crashes, but I digress).

The third caveat: the key must never be transmitted over an insecure medium. Sounds pretty much like a no-brainer, I know, but it’s worth mentioning that public key crypto doesn’t suffer from this problem. (Now you know why GPG is so much better than this). Real-life uses of one-time pads include storing the keys in codebooks, which the recipient of the message would carry everywhere. Then the encrypted messages could be safely sent in the clear, say on some radio frequency by a numbers station, until the codebook was used up.

How does this code work?

If you weren’t among the lucky ones who skipped over my ramblings in the previous section you can easily guess by now: we’ll be using a bitwise XOR encoding of each byte of the plaintext against the corresponding byte of the one-time pad to produce the ciphertext. This is how we generate a one-time pad of any given size:

    $ ./otp.py generate test.key -s 1024
    $ ls -l test.key
    -rw-r--r-- 1 user group 1024 2010-02-17 01:23 test.key
    $

The alternative for the lazy is to pass the name of the file we want to encrypt. A one-time pad of the exact same size will be generated. We’ll use the -f flag this time to force overwriting the previous file.

    $ ./otp.py generate test.key conscience.txt -f
    $ ls -l test.key conscience.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:24 test.key
    $

And to satisfy all audiences, there’s also an option for the paranoid: the -p flag uses /dev/random for maximum security instead of the much faster /dev/urandom. It does take considerably longer to generate even small one-time pads, that’s why this option is disabled by default.

    $ ./otp.py generate test.key conscience.txt -f -p
    $ ls -l test.key conscience.txt 
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 test.key
    $

Now that we have our one-time pad we can encrypt the message:

    $ ./otp.py encrypt conscience.txt test.key conscience.crypto 
    $ ls -l conscience.*
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 conscience.crypto
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    $

Both files are the same size but have different contents. Since it’s no longer ASCII trying to cat the file only renders a bunch of garbage in the terminal. Finally, this is how you decrypt it:

    $ ./otp.py decrypt conscience.crypto test.key conscience2.txt
    $ ls -l conscience*
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 conscience2.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 conscience.crypto
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    $ cmp conscience.txt conscience2.txt 
    $

After decryption, conscience2.txt is identical to the original file and contains the familiar text of The Conscience of a Hacker.

As always, the code is available for download below. Enjoy! :)

Updates

  • 24-Jul-2011: Small update to the command like parsing and the documentation.

Download

otp.py

Source code

#!/usr/bin/env python

# One-time pad example in Python
# Copyright (c) 2009-2011, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice,this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

from __future__ import with_statement

import sys
import os.path
import optparse

try:
    import psyco
    from psyco.classes import *
except ImportError:
    pass

class OneTimePad(object):
    """
This is a simple one-time pad cipher implementation in Python.
For more information on one-time pads:

http://en.wikipedia.org/wiki/One-time_pad

A word of warning: while I made my best effort to avoid programming mistakes
and one time pads are not that hard to implement, I'm not a cryptographer. So
until a proper cryptographer can validate this code, use it at your own risk!

It's best to use /dev/random here, instead of the "random" module. The reason
for that is that the random module doesn't truly provide random numbers, but
pseudo-random numbers that are calculated from a seed number. So the entire
amount of information contained in a stream of pseudo-random numbers fits
into the seed number... using this would be no more secure than just picking
a single number, defeating the whole purpose of using a one-time pad.

It's also not a good idea to limit /dev/random to printable characters only,
so if the user wants printable files we'll have to proceed as usual and later
encode the results.

One-time pad generation only works on Unix systems, but it should be possible
to port it to Windows by deriving this class and reimplementing the check_dev()
and random() methods to use Win32 crypto API. Encryption and decryption should
work in all platforms.
    """

    # Buffer size for file access
    block_size = 65536

    # Paranoid mode (use /dev/random instead of the faster /dev/urandom)
    paranoid = False

    # Placeholder for random generator device (open file)
    dev = None

    # Check for the presence of the required random generator device
    def check_dev(self):
        if self.paranoid:
            return os.path.exists('/dev/random')
        return os.path.exists('/dev/urandom')

    # Generate a string of random bytes
    def random(self, size):
        if not self.dev:
            if self.paranoid:
                self.dev = open('/dev/random', 'r')
            else:
                self.dev = open('/dev/urandom', 'r')
        return self.dev.read(size)

    # Get the size of an open file
    def filesize(self, fd):
        fd.seek(0,2)
        n = fd.tell()
        fd.seek(0,0)
        return n

    # Generate a random one-time pad
    def generate(self, padfile, total_size):
        random = self.random
        block_size = self.block_size
        while total_size > 0:
            block = random( min(block_size, total_size) )
            padfile.write(block)
            total_size = total_size - len(block)

    # Encrypt or decrypt a file using a one-time pad
    def cipher(self, infile, outfile, padfile):
        block_size = self.block_size
        while 1:
            data = infile.read(block_size)
            if not data:
                break
            pad = padfile.read(len(data))
            encoded = ''.join([ chr(ord(a) ^ ord(b)) for a, b in zip(data, pad) ])
            outfile.write(encoded)

    # Main function (most of it is command line parsing stuff)
    def run(self):

        # Define a command line parser
        banner = (
            "One-time pad example in Python\n"
            "by Mario Vilas (mvilas at gmail dot com)\n"
            "http://breakingcode.wordpress.com/"
                        "2010/02/17/one-time-pad-encryption-in-python\n"
        )
        usage = (
            "\n\n"
            "Create a one-time pad:\n"
            "    ./%prog generate -k example.key -s size\n"
            "    ./%prog generate -k example.key -t example.txt\n"
            "Encrypt a file:\n"
            "    ./%prog encrypt -t example.txt -k example.key -c example.cipher\n"
            "Decrypt a file:\n"
            "    ./%prog decrypt -c example.cipher -k example.key -t example.txt"
        )
        formatter = MyHelpFormatter(banner, max_help_position=26)
        parser = optparse.OptionParser(usage=usage, formatter=formatter)
        parser.add_option("-t", "--text", action="store", type="string",
                          metavar="FILE", help="plaintext filename")
        parser.add_option("-c", "--cipher", action="store", type="string",
                          metavar="FILE", help="ciphertext filename")
        parser.add_option("-k", "--key", action="store", type="string",
                          metavar="FILE", help="one-time pad filename")
        parser.add_option("-s", "--size", action="store", type="int",
                          metavar="NUM", help="one-time pad size in bytes")
        parser.add_option("-f", "--force", action="store_true", default=False,
                          help="force overwriting of any output files")
        parser.add_option("-p", "--paranoid", action="store_true", default=False,
                          help="use /dev/random instead of /dev/urandom (slower!)")

        # Parse the command line
        args = list(sys.argv)
        if len(args) == 1:
            args = args + [ '--help' ]
        options, args = parser.parse_args(args)

        # Set paranoid mode if requested
        self.paranoid = options.paranoid

        # Check command is present
        if len(args) < 2:
            parser.error("missing command")
        command = args[1].strip().lower()[0:1]
        if not command:
            parser.error("missing command")

        # If more parameters are present, try to guess what they are
        if len(args) > 2:
            p = 2
            try:
                if command == 'g':
                    # g key size
                    # g key text
                    if not options.key:
                        options.key = args[p]
                        p = p + 1
                    try:
                        options.size = int(args[p])
                        p = p + 1
                    except ValueError:
                        options.text = args[p]
                        p = p + 1
                elif command == 'e':
                    # e text key cipher
                    if not options.text:
                        options.text = args[p]
                        p = p + 1
                    if not options.key:
                        options.key = args[p]
                        p = p + 1
                    if not options.cipher:
                        options.cipher = args[p]
                        p = p + 1
                elif command == 'd':
                    # d cipher key text
                    if not options.cipher:
                        options.cipher = args[p]
                        p = p + 1
                    if not options.key:
                        options.key = args[p]
                        p = p + 1
                    if not options.text:
                        options.text = args[p]
                        p = p + 1
                else:
                    parser.error("too many arguments")
            except IndexError:
                pass
            if p < len(args):
                parser.error("too many arguments")

        # The one-time pad filename is always required
        if not options.key:
            parser.error("missing one-time pad filename")

        # Plaintext and ciphertext files are required for "decrypt" and "encrypt"
        if command in ('d', 'e'):
            if not options.text:
                parser.error("missing plaintext filename")
            if not options.cipher:
                parser.error("missing ciphertext filename")

        # Generate a one-time pad file
        if command == 'g':
            if options.cipher:
                parser.error("unused argument: ciphertext filename")
            if not self.check_dev():
                parser.error("random generator not available")
            if not options.force and os.path.exists(options.key):
                parser.error("file already exists: %s" % options.key)
            if options.text:
                if not os.path.exists(options.text):
                    parser.error("can't find file: %s" % options.text)
                with open(options.text, 'r') as textfile:
                    size = self.filesize(textfile)
            elif options.size:
                size = options.size
            else:
                parser.error("either plaintext file or one-time pad size is required")
            with open(options.key, 'w') as padfile:
                self.generate(padfile, size)

        # Encrypt a file using a one-time pad
        elif command == 'e':
            if not os.path.exists(options.key):
                parser.error("can't find file: %s" % options.key)
            if not os.path.exists(options.text):
                parser.error("can't find file: %s" % options.text)
            if not options.force and os.path.exists(options.cipher):
                parser.error("file already exists: %s" % options.cipher)
            with open(options.key, 'r') as padfile:
                with open(options.text, 'r') as textfile:
                    if self.filesize(textfile) > self.filesize(padfile):
                        raise RuntimeError("Not enough bytes in the one-time pad for this file!")
                    with open(options.cipher, 'w') as cipherfile:
                        self.cipher(textfile, cipherfile, padfile)

        # Decrypt a file using a one-time pad
        elif command == 'd':
            if not os.path.exists(options.key):
                parser.error("can't find file: %s" % options.key)
            if not os.path.exists(options.cipher):
                parser.error("can't find file: %s" % options.cipher)
            if not options.force and os.path.exists(options.text):
                parser.error("file already exists: %s" % options.text)
            with open(options.key, 'r') as padfile:
                with open(options.cipher, 'r') as cipherfile:
                    if self.filesize(cipherfile) > self.filesize(padfile):
                        raise RuntimeError("Not enough bytes in the one-time pad for this file!")
                    with open(options.text, 'w') as textfile:
                        self.cipher(cipherfile, textfile, padfile)

        # Unknown command
        else:
            parser.error("unknown command: %s" % args[1])

# Just a small tweak to optparse to be able to print a banner.
# (Why is there an epilog but no prolog in optparse?)
class MyHelpFormatter(optparse.IndentedHelpFormatter):
    def __init__(self, banner, *argv, **argd):
        self.banner = banner
        optparse.IndentedHelpFormatter.__init__(self, *argv, **argd)
    def format_usage(self, usage):
        msg = optparse.IndentedHelpFormatter.format_usage(self, usage)
        return '%s\n%s' % (self.banner, msg)

# Run from the command line, try to use Psyco for acceleration
if __name__ == "__main__":
    try:
        psyco.full()
    except NameError:
        pass
    OneTimePad().run()
About these ads

11 Comments »

  1. never implement your own crypto?

    http://discuss.fogcreek.com/joelonsoftware2/default.asp?cmd=show&ixPost=57153

    Comment by jose — February 18, 2010 @ 7:58 pm

  2. Not in production code Jose, but this is educational! Plus the OTP must be both the easiest and least useful crypto ever :)

    Comment by Mario Vilas — February 18, 2010 @ 8:07 pm

  3. [...] Cryptography, python Tagged: crypto, LinkedIn, linux, open source, PRNG, python, tool Breaking Code This entry was posted in Breaking Code and tagged encryption, Onetime, Python. Bookmark the [...]

    Pingback by One-time pad encryption in Python | Linux-backtrack.com — January 25, 2011 @ 4:56 am

  4. Hi !

    This implementation is very good and the only problem I see is that the script doesn’t work with a bigger key than the message. Let me explain: At this point, every time you will create a message, you will have to create the key at the same time. And there comes the problem of sharing the key (there is no trusted network except direct exchange face to face). If you build keys in advance and share them (face to face), you will not be able to use them because there is a very very low probability that your key have the same size as your message.

    A good solution is that the portion of the key used to encrypt the message is only the part that is necessary.
    ex:
    message:
    hello world
    key:
    uvpagbbaetpjzlf
    result:
    bzaluoxovesjzlf

    The “jzlf” part is useless, and the key will not be reused.

    What do you think ?

    Comment by Jens — July 21, 2011 @ 5:58 pm

  5. A friend said to me: /dev/random use random bytes. The encrypted message with a longer key than the message will have unreadable characters and/or be unreadable.
    => Why not using “< /dev/urandom tr -cd A-Za-z0-9 | head -c20" instead of "/dev/urandom" ?

    Comment by Jens — July 21, 2011 @ 7:32 pm

  6. Unreadability should not be a problem if the message is already encoded in a way that lets you know where it ends. For example you can null-terminate the string like in C for text messages.

    I like the idea of having the tool use only parts of a pre-generated pad, by consuming one byte at a time. :)

    Bear in mind though, this is just a coding example! I’m not a cryptographer and I never meant this script to be used in real life…

    Comment by Mario Vilas — July 22, 2011 @ 10:32 am

  7. >> Why not using “< /dev/urandom tr -cd A-Za-z0-9 | head -c20" instead of "/dev/urandom"?

    I'm believe that would be a bad idea, though. Filtering random bytes makes them not random anymore.

    The first problem I see is that printable text in that range (A-Z, a-z, 0-9) have always the highest bit set to 0. Since (n xor 0 = 0) that would leave the high bit of every byte unencrypted. This is a big problem when trying to encrypt binary files, for example an attacker might try to match those bits against a known set of possible plaintexts. For text files this would not be a problem… except that you're revealing the fact that you're encrypting printable text, and that might also help the attacker.

    The second problem I see is that the remaining keyspace for each byte is 2^7 (128) but the total number of possible values is less than that (59), and that made me think the remaining bits are not evenly distributed. After a quick test of converting all 59 values to base 2 and printing them you can see that they're effectively not random at all – the 7th bit is almost always 1, for example. Now, I have no idea how to exploit that, but a proper cryptographer might. This may also not be a problem when encrypting readable text (the plaintext bits would also fall in a similar pattern) but I can't be sure of this with my current knowledge of crypto, so I prefer to guess the worse.

    To solve the "readability" problem I would rather encode the ciphertext and the key using any encoding algorithm that leaves printable characters (base64, uuencode, ascii-armor, etc.).

    Comment by Mario Vilas — July 23, 2011 @ 6:35 am

  8. This is the quick test I made to see how characters are distributed:

    python -c “for x in (range(ord(‘0′),ord(‘9′))+range(ord(‘A’),ord(‘Z’))+range(ord(‘a’),ord(‘z’))): print bin(x)[2:].zfill(8)”

    Comment by Mario Vilas — July 23, 2011 @ 6:41 am

  9. “The first problem I see is that printable text in that range (A-Z, a-z, 0-9) have always the highest bit set to 0. Since (n xor 0 = 0) that would leave the high bit of every byte unencrypted [...] I can’t be sure of this with my current knowledge of crypto, so I prefer to guess the worse.”
    You’re doing it right ! This is why I tell you I’m far from a cryptographer ! Far behind you ;)
    Well, if each character begin by a bit set to 0, I think the first approach would be to test any character in the latin alphabet ‘XORed’ with themselves which leaves very few patters (26²)…. Like I’ve read yesterday, little mistakes in cryptography can make big blasts.
    My idea = Bad idea spotted ! ;)

    Comment by Jens — July 23, 2011 @ 12:45 pm

  10. [...] – Thanks to my friend @MarioVilas at Twitter for crating this great one-time pad generator tool open-source tool called otp.py. More information about the tool can be found at http://breakingcode.wordpress.com/2010/02/17/one-time-pad-encryption-in-python/ [...]

    Pingback by Use netcatpro and otp.py to transfer files using encryption — February 4, 2012 @ 12:47 pm

  11. also a nice otp implementation with some extras http://www.urbanware.org/erfr.php

    Comment by Hose — November 30, 2012 @ 11:41 am


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 2,479 other followers

%d bloggers like this: