Breaking Code

February 17, 2010

One-time pad encryption in Python

Filed under: Cryptography — Tags: , , , , , , — Mario Vilas @ 4:58 am

After some crypto sillyness with @feliam, @julianor and @ortegaalfredo on Twitter I cooked up a one-time pad crypto implementation in Python. This speaks volumes, not of my talent as a cryptographer (which is none at all) but the sad state of my social life these days (which happens to be the same amount).

What is one-time pad encryption?

Feel free to skip this section if you already know the answer. Especially so you don’t have to suffer my layman’s explanations of cryptography. 🙂

To put it simply, a one-time pad cipher is one in which the plaintext (i.e. the original message) is encoded using a completely new, random key each time it’s sent. When properly used (and I hope I have) this system is provably unbreakable. That means the ciphertext (that is, the encoded message) can never be decoded without the proper key – even if the encoding algorithm is very simple, like a bitwise XOR operation on each byte.

There are a few caveats: first of all, the key can never be reused. If you do, the system not only ceases to be unbreakable but it’s also as strong as the encoding algorithm you used. So if you chose XOR encoding and were tempted to use the key twice, you might as well have used a “magic ring” from a cereal box. 😉

The second caveat: the key must be truly random. For this reason need a random generator that can guarantee a certain amount of entropy, for example /dev/random on many Unix systems, to get the one-time pads, instead of the random module, which is only a PRNG (pseudo-random generator). PRNGs can only produce a seemingly random stream of numbers, all derived from a single value (called the seed number) – so it’s “randomness” is just as good as the seed number from which all others are calculated. (This is a useful property in other contexts, like avoiding to have to store the contents of all malformed files produced by a fuzzer in order to reproduce the crashes, but I digress).

The third caveat: the key must never be transmitted over an insecure medium. Sounds pretty much like a no-brainer, I know, but it’s worth mentioning that public key crypto doesn’t suffer from this problem. (Now you know why GPG is so much better than this). Real-life uses of one-time pads include storing the keys in codebooks, which the recipient of the message would carry everywhere. Then the encrypted messages could be safely sent in the clear, say on some radio frequency by a numbers station, until the codebook was used up.

How does this code work?

If you weren’t among the lucky ones who skipped over my ramblings in the previous section you can easily guess by now: we’ll be using a bitwise XOR encoding of each byte of the plaintext against the corresponding byte of the one-time pad to produce the ciphertext. This is how we generate a one-time pad of any given size:

    $ ./ generate test.key -s 1024
    $ ls -l test.key
    -rw-r--r-- 1 user group 1024 2010-02-17 01:23 test.key

The alternative for the lazy is to pass the name of the file we want to encrypt. A one-time pad of the exact same size will be generated. We’ll use the -f flag this time to force overwriting the previous file.

    $ ./ generate test.key conscience.txt -f
    $ ls -l test.key conscience.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:24 test.key

And to satisfy all audiences, there’s also an option for the paranoid: the -p flag uses /dev/random for maximum security instead of the much faster /dev/urandom. It does take considerably longer to generate even small one-time pads, that’s why this option is disabled by default.

    $ ./ generate test.key conscience.txt -f -p
    $ ls -l test.key conscience.txt 
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 test.key

Now that we have our one-time pad we can encrypt the message:

    $ ./ encrypt conscience.txt test.key conscience.crypto 
    $ ls -l conscience.*
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 conscience.crypto
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt

Both files are the same size but have different contents. Since it’s no longer ASCII trying to cat the file only renders a bunch of garbage in the terminal. Finally, this is how you decrypt it:

    $ ./ decrypt conscience.crypto test.key conscience2.txt
    $ ls -l conscience*
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 conscience2.txt
    -rw-r--r-- 1 user group 3880 2010-02-17 01:38 conscience.crypto
    -rw-r--r-- 1 user group 3880 2010-02-17 01:22 conscience.txt
    $ cmp conscience.txt conscience2.txt 

After decryption, conscience2.txt is identical to the original file and contains the familiar text of The Conscience of a Hacker.

As always, the code is available for download below. Enjoy! 🙂


  • 24-Jul-2011: Small update to the command like parsing and the documentation.


Source code

#!/usr/bin/env python

# One-time pad example in Python
# Copyright (c) 2009-2011, Mario Vilas
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice,this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.

from __future__ import with_statement

import sys
import os.path
import optparse

    import psyco
    from psyco.classes import *
except ImportError:

class OneTimePad(object):
This is a simple one-time pad cipher implementation in Python.
For more information on one-time pads:

A word of warning: while I made my best effort to avoid programming mistakes
and one time pads are not that hard to implement, I'm not a cryptographer. So
until a proper cryptographer can validate this code, use it at your own risk!

It's best to use /dev/random here, instead of the "random" module. The reason
for that is that the random module doesn't truly provide random numbers, but
pseudo-random numbers that are calculated from a seed number. So the entire
amount of information contained in a stream of pseudo-random numbers fits
into the seed number... using this would be no more secure than just picking
a single number, defeating the whole purpose of using a one-time pad.

It's also not a good idea to limit /dev/random to printable characters only,
so if the user wants printable files we'll have to proceed as usual and later
encode the results.

One-time pad generation only works on Unix systems, but it should be possible
to port it to Windows by deriving this class and reimplementing the check_dev()
and random() methods to use Win32 crypto API. Encryption and decryption should
work in all platforms.

    # Buffer size for file access
    block_size = 65536

    # Paranoid mode (use /dev/random instead of the faster /dev/urandom)
    paranoid = False

    # Placeholder for random generator device (open file)
    dev = None

    # Check for the presence of the required random generator device
    def check_dev(self):
        if self.paranoid:
            return os.path.exists('/dev/random')
        return os.path.exists('/dev/urandom')

    # Generate a string of random bytes
    def random(self, size):
        if not
            if self.paranoid:
       = open('/dev/random', 'r')
       = open('/dev/urandom', 'r')

    # Get the size of an open file
    def filesize(self, fd):,2)
        n = fd.tell(),0)
        return n

    # Generate a random one-time pad
    def generate(self, padfile, total_size):
        random = self.random
        block_size = self.block_size
        while total_size > 0:
            block = random( min(block_size, total_size) )
            total_size = total_size - len(block)

    # Encrypt or decrypt a file using a one-time pad
    def cipher(self, infile, outfile, padfile):
        block_size = self.block_size
        while 1:
            data =
            if not data:
            pad =
            encoded = ''.join([ chr(ord(a) ^ ord(b)) for a, b in zip(data, pad) ])

    # Main function (most of it is command line parsing stuff)
    def run(self):

        # Define a command line parser
        banner = (
            "One-time pad example in Python\n"
            "by Mario Vilas (mvilas at gmail dot com)\n"
        usage = (
            "Create a one-time pad:\n"
            "    ./%prog generate -k example.key -s size\n"
            "    ./%prog generate -k example.key -t example.txt\n"
            "Encrypt a file:\n"
            "    ./%prog encrypt -t example.txt -k example.key -c example.cipher\n"
            "Decrypt a file:\n"
            "    ./%prog decrypt -c example.cipher -k example.key -t example.txt"
        formatter = MyHelpFormatter(banner, max_help_position=26)
        parser = optparse.OptionParser(usage=usage, formatter=formatter)
        parser.add_option("-t", "--text", action="store", type="string",
                          metavar="FILE", help="plaintext filename")
        parser.add_option("-c", "--cipher", action="store", type="string",
                          metavar="FILE", help="ciphertext filename")
        parser.add_option("-k", "--key", action="store", type="string",
                          metavar="FILE", help="one-time pad filename")
        parser.add_option("-s", "--size", action="store", type="int",
                          metavar="NUM", help="one-time pad size in bytes")
        parser.add_option("-f", "--force", action="store_true", default=False,
                          help="force overwriting of any output files")
        parser.add_option("-p", "--paranoid", action="store_true", default=False,
                          help="use /dev/random instead of /dev/urandom (slower!)")

        # Parse the command line
        args = list(sys.argv)
        if len(args) == 1:
            args = args + [ '--help' ]
        options, args = parser.parse_args(args)

        # Set paranoid mode if requested
        self.paranoid = options.paranoid

        # Check command is present
        if len(args) < 2:
            parser.error("missing command")
        command = args[1].strip().lower()[0:1]
        if not command:
            parser.error("missing command")

        # If more parameters are present, try to guess what they are
        if len(args) > 2:
            p = 2
                if command == 'g':
                    # g key size
                    # g key text
                    if not options.key:
                        options.key = args[p]
                        p = p + 1
                        options.size = int(args[p])
                        p = p + 1
                    except ValueError:
                        options.text = args[p]
                        p = p + 1
                elif command == 'e':
                    # e text key cipher
                    if not options.text:
                        options.text = args[p]
                        p = p + 1
                    if not options.key:
                        options.key = args[p]
                        p = p + 1
                    if not options.cipher:
                        options.cipher = args[p]
                        p = p + 1
                elif command == 'd':
                    # d cipher key text
                    if not options.cipher:
                        options.cipher = args[p]
                        p = p + 1
                    if not options.key:
                        options.key = args[p]
                        p = p + 1
                    if not options.text:
                        options.text = args[p]
                        p = p + 1
                    parser.error("too many arguments")
            except IndexError:
            if p < len(args):
                parser.error("too many arguments")

        # The one-time pad filename is always required
        if not options.key:
            parser.error("missing one-time pad filename")

        # Plaintext and ciphertext files are required for "decrypt" and "encrypt"
        if command in ('d', 'e'):
            if not options.text:
                parser.error("missing plaintext filename")
            if not options.cipher:
                parser.error("missing ciphertext filename")

        # Generate a one-time pad file
        if command == 'g':
            if options.cipher:
                parser.error("unused argument: ciphertext filename")
            if not self.check_dev():
                parser.error("random generator not available")
            if not options.force and os.path.exists(options.key):
                parser.error("file already exists: %s" % options.key)
            if options.text:
                if not os.path.exists(options.text):
                    parser.error("can't find file: %s" % options.text)
                with open(options.text, 'r') as textfile:
                    size = self.filesize(textfile)
            elif options.size:
                size = options.size
                parser.error("either plaintext file or one-time pad size is required")
            with open(options.key, 'w') as padfile:
                self.generate(padfile, size)

        # Encrypt a file using a one-time pad
        elif command == 'e':
            if not os.path.exists(options.key):
                parser.error("can't find file: %s" % options.key)
            if not os.path.exists(options.text):
                parser.error("can't find file: %s" % options.text)
            if not options.force and os.path.exists(options.cipher):
                parser.error("file already exists: %s" % options.cipher)
            with open(options.key, 'r') as padfile:
                with open(options.text, 'r') as textfile:
                    if self.filesize(textfile) > self.filesize(padfile):
                        raise RuntimeError("Not enough bytes in the one-time pad for this file!")
                    with open(options.cipher, 'w') as cipherfile:
                        self.cipher(textfile, cipherfile, padfile)

        # Decrypt a file using a one-time pad
        elif command == 'd':
            if not os.path.exists(options.key):
                parser.error("can't find file: %s" % options.key)
            if not os.path.exists(options.cipher):
                parser.error("can't find file: %s" % options.cipher)
            if not options.force and os.path.exists(options.text):
                parser.error("file already exists: %s" % options.text)
            with open(options.key, 'r') as padfile:
                with open(options.cipher, 'r') as cipherfile:
                    if self.filesize(cipherfile) > self.filesize(padfile):
                        raise RuntimeError("Not enough bytes in the one-time pad for this file!")
                    with open(options.text, 'w') as textfile:
                        self.cipher(cipherfile, textfile, padfile)

        # Unknown command
            parser.error("unknown command: %s" % args[1])

# Just a small tweak to optparse to be able to print a banner.
# (Why is there an epilog but no prolog in optparse?)
class MyHelpFormatter(optparse.IndentedHelpFormatter):
    def __init__(self, banner, *argv, **argd):
        self.banner = banner
        optparse.IndentedHelpFormatter.__init__(self, *argv, **argd)
    def format_usage(self, usage):
        msg = optparse.IndentedHelpFormatter.format_usage(self, usage)
        return '%s\n%s' % (self.banner, msg)

# Run from the command line, try to use Psyco for acceleration
if __name__ == "__main__":
    except NameError:


  1. never implement your own crypto?

    Comment by jose — February 18, 2010 @ 7:58 pm

  2. Not in production code Jose, but this is educational! Plus the OTP must be both the easiest and least useful crypto ever 🙂

    Comment by Mario Vilas — February 18, 2010 @ 8:07 pm

  3. […] Cryptography, python Tagged: crypto, LinkedIn, linux, open source, PRNG, python, tool Breaking Code This entry was posted in Breaking Code and tagged encryption, Onetime, Python. Bookmark the […]

    Pingback by One-time pad encryption in Python | — January 25, 2011 @ 4:56 am

  4. Hi !

    This implementation is very good and the only problem I see is that the script doesn’t work with a bigger key than the message. Let me explain: At this point, every time you will create a message, you will have to create the key at the same time. And there comes the problem of sharing the key (there is no trusted network except direct exchange face to face). If you build keys in advance and share them (face to face), you will not be able to use them because there is a very very low probability that your key have the same size as your message.

    A good solution is that the portion of the key used to encrypt the message is only the part that is necessary.
    hello world

    The “jzlf” part is useless, and the key will not be reused.

    What do you think ?

    Comment by Jens — July 21, 2011 @ 5:58 pm

  5. A friend said to me: /dev/random use random bytes. The encrypted message with a longer key than the message will have unreadable characters and/or be unreadable.
    => Why not using “< /dev/urandom tr -cd A-Za-z0-9 | head -c20" instead of "/dev/urandom" ?

    Comment by Jens — July 21, 2011 @ 7:32 pm

  6. Unreadability should not be a problem if the message is already encoded in a way that lets you know where it ends. For example you can null-terminate the string like in C for text messages.

    I like the idea of having the tool use only parts of a pre-generated pad, by consuming one byte at a time. 🙂

    Bear in mind though, this is just a coding example! I’m not a cryptographer and I never meant this script to be used in real life…

    Comment by Mario Vilas — July 22, 2011 @ 10:32 am

  7. >> Why not using “< /dev/urandom tr -cd A-Za-z0-9 | head -c20" instead of "/dev/urandom"?

    I'm believe that would be a bad idea, though. Filtering random bytes makes them not random anymore.

    The first problem I see is that printable text in that range (A-Z, a-z, 0-9) have always the highest bit set to 0. Since (n xor 0 = 0) that would leave the high bit of every byte unencrypted. This is a big problem when trying to encrypt binary files, for example an attacker might try to match those bits against a known set of possible plaintexts. For text files this would not be a problem… except that you're revealing the fact that you're encrypting printable text, and that might also help the attacker.

    The second problem I see is that the remaining keyspace for each byte is 2^7 (128) but the total number of possible values is less than that (59), and that made me think the remaining bits are not evenly distributed. After a quick test of converting all 59 values to base 2 and printing them you can see that they're effectively not random at all – the 7th bit is almost always 1, for example. Now, I have no idea how to exploit that, but a proper cryptographer might. This may also not be a problem when encrypting readable text (the plaintext bits would also fall in a similar pattern) but I can't be sure of this with my current knowledge of crypto, so I prefer to guess the worse.

    To solve the "readability" problem I would rather encode the ciphertext and the key using any encoding algorithm that leaves printable characters (base64, uuencode, ascii-armor, etc.).

    Comment by Mario Vilas — July 23, 2011 @ 6:35 am

  8. This is the quick test I made to see how characters are distributed:

    python -c “for x in (range(ord(‘0’),ord(‘9’))+range(ord(‘A’),ord(‘Z’))+range(ord(‘a’),ord(‘z’))): print bin(x)[2:].zfill(8)”

    Comment by Mario Vilas — July 23, 2011 @ 6:41 am

  9. “The first problem I see is that printable text in that range (A-Z, a-z, 0-9) have always the highest bit set to 0. Since (n xor 0 = 0) that would leave the high bit of every byte unencrypted […] I can’t be sure of this with my current knowledge of crypto, so I prefer to guess the worse.”
    You’re doing it right ! This is why I tell you I’m far from a cryptographer ! Far behind you 😉
    Well, if each character begin by a bit set to 0, I think the first approach would be to test any character in the latin alphabet ‘XORed’ with themselves which leaves very few patters (26²)…. Like I’ve read yesterday, little mistakes in cryptography can make big blasts.
    My idea = Bad idea spotted ! 😉

    Comment by Jens — July 23, 2011 @ 12:45 pm

  10. […] – Thanks to my friend @MarioVilas at Twitter for crating this great one-time pad generator tool open-source tool called More information about the tool can be found at […]

    Pingback by Use netcatpro and to transfer files using encryption — February 4, 2012 @ 12:47 pm

  11. also a nice otp implementation with some extras

    Comment by Hose — November 30, 2012 @ 11:41 am

  12. to be frank i tried executing code .. but i am not able to understand the output. it shows this:
    can you plz explain me this
    One-time pad example in Python
    by Mario Vilas (mvilas at gmail dot com)


    Create a one-time pad:
    ./ generate -k example.key -s size
    ./ generate -k example.key -t example.txt
    Encrypt a file:
    ./ encrypt -t example.txt -k example.key -c example.cipher
    Decrypt a file:
    ./ decrypt -c example.cipher -k example.key -t example.txt

    -h, –help show this help message and exit
    -t FILE, –text=FILE plaintext filename
    -c FILE, –cipher=FILE ciphertext filename
    -k FILE, –key=FILE one-time pad filename
    -s NUM, –size=NUM one-time pad size in bytes
    -f, –force force overwriting of any output files
    -p, –paranoid use /dev/random instead of /dev/urandom (slower!)

    Comment by joa — July 8, 2015 @ 6:47 pm

  13. @joa, is it possible that you did not give any or invalid arguments? Just executing the code or without the required arguments will print the usage information. This is not an interactive program. It requires command-line arguments. The usage information tells you what is required. There also are usage examples.

    I also took a look at “erfr” which “hose” mentioned above. Works the same way.

    Comment by Brandon — October 26, 2015 @ 2:36 pm

  14. Hi!

    I cannot code, but I find your example very nice.
    Thanks for it.
    But in order to make it actually useful in practice, could you please implement it such that, as mentioned by Jens, it can take larger keys?
    The code should only assert that the key is >= than the plain text.
    Unreadable cypher text (in case you cannot pass on the file but need to paste it somewhere) is not a problem, as you can easily b64encode it.
    Unreadable decrypted code is the only real issue.
    The solution would be if you can somehow define a marker for the end of the plaintext, such that in the decryption method you’d decrypt step by step, until the end of plaintext is reached, i.e. until you read the end marker. The rest of the cypher text (the one that would result in unreadable random appended to the file, thus preventing the file from being opened.
    But the problem is that the marker needs to be something that under no circumstances can be part of the plaintext itself – easy if you only want to encrypt text. But what if I wanted to encrypt e binary?

    Comment by Tom — November 5, 2016 @ 5:47 pm

  15. So in your current implementation, not only you would have to know the size of the texts you want to encrypt and send, before you even write them, so that you can hand your contact the corresponding keys, but also does the cypher text reveal the actually exact length of the plain text.
    With very long texts it might not be a problem. But what if you send an answer to a Yes-or-No question and the answer is exactly 3 bytes long? You see my point? The OTP scrambles the mapping of cypher text letter to plain text letter but if you can guess the plain text by other means, it cannot help. If those other means are outside the cypher text itself (like key compromise, keylogger, …), this is not in the scope of the OTP encryption. But if it is the cypher text which reveals something, then it does fall into the domain of the implementation.
    So, for the paranoid: Do NOT only use the part of the (longer) key that you really need, but always a tiny bit more. Not a fixed number though.
    I’d suggest than during encryption you choose a number of additional bytes (bits?) at random, with higher numbers getting exponentially less likely, c.f. exponential distribution. The average of this overhead length could be linked to the plaintext file length.
    In this case you’d need still only one marker for the decryption, it tells where plaintext ends, and the rest, the appended part of the key (random bits) is ignored, yet you have not used up the entire key.

    For increased usability:
    Re-use keyfile.
    No not, what it sounds like. It’S a ONE-TIME pad, I know. you won’t reuse any parts of the key.

    Consider this:
    You meet your contact only once in a year. You have no idea how many messages you will exchange and how big each one will be. you could think and think and ask the stars, and then create many differently sized keyfiles, some of which you might never use at all.
    What if instead of this, you simply exchange TWO single files of keydata ( contact1ToContact2.date1.key and contact2ToContact1.date1.key ) of, say, 500GB?
    Now, each time you encrypt something with that key, your encryption uses len(plaintext)+len(randomPadding) of the key file, and marks that part as used. Either by saving the remaining part of it into a copy contact1ToContact2.date2.key or by changing a counter a counter in the keyfile itself. How can you decode old messages then? or if they arrive out of order?
    In that case, i think, the implementation with the marker is better than the one copying the remaining version.
    Now when you encrypt a message, you read the out from the beginning of the file the stored index of currently first still unused bit of key material. you include this information in the cypher message, and start encrypting form that position. When finished, you update the counter in the beginning of the keyfile. That way, when you encrypt the next message, you do not reuse any key material, but also no key material is wasted. And if you want to decrypt even older messages you sent yourself, of which you don’t have a copy of the plain text anymore, you still have the key material (as it was not deleted) and the counter that gives you the start position of the used key material from the key file is included in the cypher text.
    that was
    Of course, one could argue, that keeping used key material for long, increases the chance of your home being raided and the key be found and old messages be decrypted. But for many, this scenario is just unlikely, and for those who fear it, include a simple function that takes the keyfile and deletes all key material prior to the position specify in the current index.

    If cypher texts are not delivered instantly, there could be messages be moving from Alice to Bob and from Bob to Alice simultaneously. In order to prevent them from using the same key material, they should use one keyfile only for encryption, the other only for decryption. (What Alice uses for encryption, Bob uses for decryption, and vice-versa)

    Decryption does exactly what it does in your current implemtnetion, except three things:
    1: It reads from the cyphertext the index of the key material that was used
    2: It does not start at the beginning of the keyfile, but at the given index
    3: it stops decryption when the marked for end of plain text was found during decryption

    Yes, that is a bit more complex, but would make the entire program pretty neat and usable, I think.

    Comment by Tom — November 5, 2016 @ 6:25 pm

  16. your license does not allow redistribution of modifications. maybe you cold release it ( optionally additionally) under GPL or another free open source license.

    Comment by Tom — November 5, 2016 @ 10:35 pm

  17. Ok, you can do without the marker for end of plain text, if you simply read the first few bits of the pad (it is random and secret and known to both communication partners) interpreted as a number modulo a maximum number than can be hard-coded or linked to plain text length.

    Comment by Tom — November 7, 2016 @ 10:05 am

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at

%d bloggers like this: