How "random" are the generated passwords?

steven1 · April 2014

Hi,

While there is a lot of information on the site and forums about the keychain formats and how many rounds of hashing is done on the master pw etc, I have not found any information at all on how 1PW for Mac and iOS generate strong passwords (Sorry if I missed the info elsewhere):

-what algorithm API is used to generate passwords?

-how is entropy collected/added

-How do you achieve the recipe (min x chars, y letters)?

I realize that while the vault may be secured, the real question is, now that you have cajoled everyone into using passwords generated by the app, how secure/non-guessable are the generated passwords? Maybe you should publish the pseudocode for the generator like you do the spec for the vault file format to prove its strength.

Thanks!

steven1 · April 2014

Hi Guys,

Any info on the above?

Thanks!

jpgoldberg · April 2014

Hi @steven1!

This is a great set of questions, and I'm sorry that I didn't get a chance to address it earlier. Quite a bit of our documentation on these sorts to things start out as postings on our discussion forums, so consider this an early draft of what may make it into our formal documentation.

Starting with random numbers

Generating random anything on a computer starts with getting random numbers. The short answer is that we get our random numbers from SecRandomCopyBytes() which is more or less the same as /dev/random. That is, we are using the CSPRNG provided by the operating system.

More detail that you probably want to know

Now there is a technical sense in which it is impossible (under certain definitions) for computers to produce random numbers at all. I've written about that in an article posted on our blog in 2012: Alan Turing’s contribution can’t be computed.

In practice, however operating systems can gather randomness from a large number of internal states, such as microsecond timing of disk and network events, and specially designed "noise" circuits on microchips. These are continually added to an "entropy pool" by the operating system kernel. That raw entropy isn't usable directly because it is not uniformly distributed. Lots of "noise" that can be used for randomness follows a normal distribution instead of a uniform one. So this data will be hashed. Those hashes aren't used directly either, instead they are used to frequently reseed a Cryptographically Secure Pseudorandom Number Generator (CSPRNG). And it those results which one gets by reading /dev/random.

I wrote in another comment on the forums why the "pseudo" in that name shouldn't put people off.

There are various schemes about how the system estimates the entropy in the entropy pool, and the ratio of pool data to hash output it uses. For example, to generate a 16 byte seed, it may "consume" 128 bytes of pool data. There are also different approaches to feeding back small amounts of output back into parts of the system. The two most popular schemes for all of these are Yarrow and Fortuna. I believe that OS X and iOS use a form of Yarrow.

Getting numbers in a range

When picking a letter, we may need to get a random number between 1 and 52. But SecRandomCopyBytes() gives us bytes. So if we ask for one byte, we will get a number between 0 and 255.

The simplest way to get a number between 1 and 52 from a random number between 0 and 255 would be to divide the number that you get by 52, take the remainder, and then add 1 to it.

In many computer programming languages the percent sign, "%" is used as the "remainder" operator. For example 23 % 10 is 3 because when you divide 23 by 10 you are left with a remainder of 3. The result of r % 10 will always be a number from 0 through 9. The result of r % 52 will always be a result from 0 through 51.

So let's use r be the random byte, and n be the number that we are trying to generate (between 1 and 52). The obvious (but as we will see, wrong) way to do what we want is

r  = GetOneByteFromCSPRNG()  // r will be from 0 through 255
n  = r % 52                  // n will be from 0 through 51
n  = n + 1                   // n will be from 1 through 52

That will make n a number between 1 and 52, inclusive; but it won't be
properly random. I really need to emphasize this because if you google around for how to get a random number in a range, that wrong answer is what you will find. Indeed, a number of random password generators in the past are known to have made that mistake.

So why is that wrong?

There are more ways for that system to give us the numbers 1 through 48 than for it to give is the numbers 49 through 52. That is, there is a bias against 49, 50, 51, and 52 here.

Let's count the ways in which we could get the number 42, as a result of that system. We can get it when r is 41, as 41 % 52 is 41, and then we add 1 to our result in the final step to get 42. In addition to when r is 41, will also get a final result of 42 when r is 93, 145, 197, or 249. So there are five possible ways that r could leave us with 42.

Now lets how many ways we could get a final result of 50. That can happen when r is 49, 101, 153, or 205. It would also work if r were 257, but r can't be greater than 255. So there are only four ways to get the number 50 as a final result.

There will always be a bias toward some of the lower numbers with this scheme unless the modulus (52 in our example) evenly divides the range of r. So if we were trying to pick numbers between 1 and 64, there wouldn't be this bias, when r gives us 256 possibilities.

Doing it right

If the range you are after is small compared to the full range of r then the bias will be small, and so it probably wouldn't have been a meaningful bias had we done this wrong, but we did it right anyway.

We don't get a bias when the possible range of r is a multiple of the size of the range we are after. So if r were limited to only 204 possible values, that that would work fine when we are looking for something between 1 and 52. So when looking for something between 1 and 52, we will restrict r to 0 through 203.

do {
    r  = GetOneByteFromCSPRNG()  // r will be from 0 through 255
} while (r > 203) // keep picking until we get an r that is 203 or less
n  = r % 52                  // n will be from 0 through 51
n  = n + 1                   // n will be from 1 through 52

It's possible to calculate that 203 from our two ranges of 256 and 52, but I will leave that as an exercise to the reader.

End of part 1

OK. This has turned out to be longer than I planned when I got into it. I will try to write more about how the 1Password Strong Password Generator works in the next day or two.

jpgoldberg · April 2014

Part II. First overview of algorithm

I probably should have started with this before going to how we can pick a number between 1 and 52 in a cryptographically appropriate way.

Letters only

If we are building a password only with the letters (US-ASCII a-z, A-Z) only then we repeatedly pick letters at random and add them to the password we are building up until we've got something the right length.

When we add each new character to the password, we actually pick a position at random to insert it into. This isn't needed for the letters only situation, but the need for it will become clear when we get to digits and symbols.

Here is some very pseudo code to illustrate that.

d = desired_password_length // this will be an input
lettersArray = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

password = null   // this is empty to beging with. We will build it up
while (d > 0 ) {
    n = pick_a_number_between(0, lengthof(lettersArray)-1)
    c = lettersArray[n] // lettersArray[0] is "a", lettersArray[51] is "Z"
    position = pick_a_number_between(0, lengthof(password))
    password = insert_into_string_at_position(password, c, pos)
    d = d - 1  // when d reaches zero, we've built up the full password.
}

One thing to note here is that we fetch a new random number for each character being added to the password being built. This ensures that each letter of the password is entirely independent of any of the others.

Beyond letters

For letters and digits, we first build up a password of letters only. If someone wants a 20 character password with 3 digits in it we will first build a 17 character password with letters only. We will then start picking digits. Once we pick a digit, we then select, again, randomly where to insert it into the password we are building up.

For example, after we have something that is 17 letters long, and say be have now randomly selected the digit "4", there are 18 places that this "4" can be added into the password. So we pick a number from 0 through 17 and add in that "4".

Now we have something that is 18 characters long. We pick another digit, and then pick a number from 0 through 18 (so 19 possibilities) to select a position to add that digit.

Symbols work the same way as letters.

No repeating characters

Ages ago, some websites were trying to prevent people from using passwords like "aaaaaaa" and do they banned repeating letters. As a consequence, we still have a legacy of "don't allow repeating characters" in our password generator as an option.

Anyway if people "disallow repeating characters" then at the insert_into_string_at_position(password, c, pos) step we check to see
if doing do would lead to consecutive repeating characters. If it does, then we go back and pick a new character all together.

Math to follow

I actually have a draft of an article in which I worked out most of the math for calculating the true strength of our generated passwords. I will try to dig that up and post what I can here.

Stephen_C · May 2014

I confess I'm grinning irreverently after reading the last two posts on this thread. Forum users be warned: if you ask a cryptographic question you will get a most carefully constructed, diabolically detailed answer from the master himself. I find the posts absorbing but have to hold tightly on to my chair before I can really understand much of the detail.

It's always good to know this degree of thought is behind the scenes at AgileBits, so all hail to @jpgoldberg!

Stephen

jpgoldberg · May 2014

You can find a description of the math in the difference between things like "exactly 2 digits" (as we do things) and "at least 2 digits" (as another way of doing things) in a comment I posted in another discussion.

steven1 · May 2014

hi @Stephen_C, not really sure what you mean. Maybe you assume all 1pw users are like you, having to hold on to their chair for some simple descriptions. My questions are indeed coming from the place of a generally satisfied user asking technical questions. Trust me, the software you use and love (as do I) will be better for it :-)

@jpgoldberg, thanks for the response. Too often we assume that because a computer generated a pw, it must be awesomely random. Glad to see some of the care going into the generated passwords, which in the end if we are diligent about it, are what we are really trying to protect.

One question about the pool of characters you pick from--any thoughts on increasing to include character sets other than ASCII? Switching to the UNICODE, UTF-16, or UTF-32? Yes, a lot of web logins may not be able to accept this, but it would certainly make it harder to crack encrypted DMGs, etc.

Thanks!
@steven1

jpgoldberg · May 2014

Thank you @stephen_C and @steven1!

Ideally all of this would be fully documented, and perhaps these comments will work their way into documentation.

I find the posts absorbing but have to hold tightly on to my chair before I can really understand much of the detail.

For questions like these, I don't like giving the "short answer", because the short answer is equivalent to "Trust us, ma'am. We're professionals."

A lot of what goes on beneath the hood (or "under the bonnet" for those of you across an ocean from where I am writing) is invisible to most users. It's nice to get the opportunity to show off the design and thinking that went into these things.

Too often we assume that because a computer generated a pw, it must be awesomely random.

You are absolutely correct. Some "popular" password generators in the past have been shown to have the problem I described about the bias against the high end of the range. I've heard that there were others that picked a single random number and then generated a password off of that, and so were effectively seeding their generator with a small seed. But I can't be certain of that. (And for the former, I'd have to do some searching of literature, but I know I've seen the analyses.

Bad PRNGs or badly seeded CSPRNG are all too common. Those probably wouldn't matter to much for generated passwords, but we need to ensure that the encryption keys we generate are strong. For the most important keys, we take some extra precautions against slight biases in the CSPRNG offered by the system. (We have no reason to believe that such slight biases exist; and larger biases would have been detected by researchers long ago; but we also know that CSPRNGs are hard to do right.)

One question about the pool of characters you pick from--any thoughts on increasing to include character sets other than ASCII? Switching to the UNICODE, UTF-16, or UTF-32?

My thoughts are pretty much what you said,

Yes, a lot of web logins may not be able to accept this

It would just break too many things. We can't even guarantee that such a password would be handled the same way when copy/pasted on different operating systems or in different browsers on the same operating system. This has actually become an issue with 1Password Master Passwords for some people. 1Password is completely agnostic and you can use UTF-8 for a Master Password. But even on the same operating system the letter ü when entered on a US keyboard will end up being a different string of bits then when entered on some German keyboards. So we strongly advise people to keep their Master Passwords to US ASCII.

So even in a case where we know the thing you are giving the data to (1Password) can handle UTF-8, using it can cause things to break. Imagine when we do so without having any idea of how the password will be processed by the receiving system.

Normally I hate having to use the "lowest common denominator" in security, but in this case people can get the same advantages by using longer passwords. Remember that a password comprised of only the letters a through z and A through Z reaches 128 bits of strength once it is 23 characters long. log_2(52^23) is approximately 131. If you've got a use case for something stronger, I'd love to hear about it. But first read: "Guess why we're moving to 256-bit keys"