Memorable password generator: Random vs. non-random choice and entropy loss/gain
I searched for a discussion like this on here, but couldn't find one. When using the memorable password generator, do people tend to choose randomly or do they keep clicking the generate button until they get a password composed only of words they already know and feel comfortable with? If you choose non-randomly like this, how much entropy is lost? How much entropy is gained (if any) if you deliberately choose passwords that have many words you either flat out don't know or just don't like? I have zero password cracking experience, but I would think limiting my search space to the most commonly used words in the English language on the word list as opposed to the whole list would be a good strategy. How should I think about this?
1Password Version: 8.7.2
Extension Version: 2.3.5
OS Version: Not Provided
Browser: Not Provided
Comments
-
@gr1516 - great questions! I can't say what people tend to do because we intentionally know nothing about the generation of your password(s); all that logic is kept client-side (in your 1Password app or browser). However, it would not surprise me to learn that there are indeed people who will "skip" a generated password if it contains words with which they're unfamiliar (on the theory that it would be easier to forget/misspell such words, perhaps).
Entropy can be thought of as the "cost" of brute-forcing the correct password (no shortcuts, just straight guessing one after the next until the correct password is discovered). For a password with an entropy of n bits, your attacker would need to check about 2^(n-1) passwords on average before hitting the correct one (the "minus one" halves the total, since on average the correct password turns up halfway through the search). Assuming the generated passwords are equally probable (i.e. the generator draws from a well-designed cryptographically secure pseudorandom number generator, which ours does), then you have n bits of entropy when there are 2^n possible passwords.
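To make the "half the search space" idea concrete, here is a small Python sketch. It assumes the attacker tries every candidate once in a fixed order and that the target is equally likely to be at any position; the function name `expected_guesses` is just an illustrative helper, not anything from 1Password.

```python
from fractions import Fraction

def expected_guesses(n_bits: int) -> Fraction:
    """Exact average number of guesses needed to hit a uniformly chosen
    password when all 2**n_bits candidates are tried in a fixed order."""
    total = 2 ** n_bits
    # If the target is the k-th candidate tried, it takes k guesses,
    # and every position k = 1..total is equally likely.
    return Fraction(sum(range(1, total + 1)), total)

for n in (4, 8, 16):
    # Compare the exact average with the 2^(n-1) rule of thumb.
    print(n, float(expected_guesses(n)), 2 ** (n - 1))
```

The exact average works out to 2^(n-1) + 0.5, so for any realistic n the 2^(n-1) rule of thumb is as good as exact.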
Unlike the formula for random-character passwords, which has to take into account character-space (which varies with allowed symbols, etc), word-list password entropy-estimation is considerably simpler. If you have (like we do) a wordlist of what appears today to be 18,176 words (if I counted correctly, heh), then 2^n = 18176, and therefore n = log2(18176) = 14.1497, or, in non-math-speak English: just under 14.15 bits of entropy per word. From there, it's a simple matter of multiplying that by the number of words. Three words? 42.45 bits of entropy. Four words? 56.6. Five words? 70.75. And so on.
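If you'd like to check those per-word figures yourself, a minimal Python sketch follows. The 18,176 word count is the one quoted above and should be treated as approximate; `passphrase_entropy_bits` is an illustrative name, and the formula only holds if every word is picked uniformly at random.

```python
import math

WORDLIST_SIZE = 18176  # count quoted in the reply above; may change over time

def passphrase_entropy_bits(num_words: int, list_size: int = WORDLIST_SIZE) -> float:
    """Entropy in bits of a passphrase whose words are each chosen
    uniformly at random (and independently) from a list of list_size words."""
    return num_words * math.log2(list_size)

for k in (1, 3, 4, 5):
    print(f"{k} word(s): {passphrase_entropy_bits(k):.2f} bits")
```

Run as written, this should reproduce the 14.15, 42.45, 56.60 and 70.75 figures.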
So the question of how much entropy will be lost when a person self-limits the word pool as you've suggested (or by any other method) is an equally simple two-parter:
- Change the calculation of 2^n = 18176 into whatever you think (or know) the word list will be shortened into. If you shave off, say, 1,500 words, then it would be 2^n = 16676, or n = 14.0255.
- Multiply that per-word figure by the number of words chosen, as before.
As you may have guessed, it is that latter point - how many words are chosen - which will make the most significant difference. Four words at "full strength" would be 56.6 bits of entropy, while at the reduced total wordlist of 16676, it would be 56.102. That half a bit may seem like a small amount, but remember, each additional bit is TWICE the "size," so small fractions definitely make a difference. But you can very easily address that reduction by simply adding another word, even using the smaller list. Allowing the generator to randomly choose five words instead of four increases the entropy to 70.1275 bits, which is vastly greater in real terms than four words using either list.
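The comparison above can be sketched in Python as follows. The two list sizes are the ones used in this reply (18,176 and the hypothetical "1,500 words shaved off" list); the helper names are illustrative only.

```python
import math

FULL_LIST = 18176     # word count quoted earlier in this reply
REDUCED_LIST = 16676  # the same list with 1,500 words removed (hypothetical)

def entropy_bits(list_size: int, num_words: int) -> float:
    """Bits of entropy for num_words uniformly random words."""
    return num_words * math.log2(list_size)

def search_space_ratio(size_a: int, words_a: int, size_b: int, words_b: int) -> float:
    """How many times larger passphrase space A is than passphrase space B."""
    return size_a ** words_a / size_b ** words_b

print(f"4 words, full list:    {entropy_bits(FULL_LIST, 4):.3f} bits")
print(f"4 words, reduced list: {entropy_bits(REDUCED_LIST, 4):.3f} bits")
print(f"5 words, reduced list: {entropy_bits(REDUCED_LIST, 5):.3f} bits")

# Trimming the list costs a factor of roughly 1.4 in search space at four words...
print(search_space_ratio(FULL_LIST, 4, REDUCED_LIST, 4))
# ...while a fifth word from the reduced list gains a factor in the thousands.
print(search_space_ratio(REDUCED_LIST, 5, FULL_LIST, 4))
```

Looking at ratios rather than bits makes the trade-off easier to see: the trimmed list shrinks the attacker's work by well under a factor of two, while one extra word multiplies it by more than ten thousand.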
So, to sum up: if you're worried about lost entropy (and there is indeed some), simply increase the number of (randomly chosen) words in your passphrase. Hope that's helpful.
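As a rough way to apply that advice, the sketch below finds how many randomly chosen words a shortened list needs in order to match a four-word passphrase from the full list. The list sizes other than 18,176 are hypothetical examples, and `words_needed` is an illustrative helper; it uses exact integer arithmetic to avoid floating-point rounding at the boundary.

```python
def words_needed(target_space: int, list_size: int) -> int:
    """Smallest number of uniformly random words from a list of list_size
    words whose passphrase space is at least target_space."""
    k = 1
    while list_size ** k < target_space:
        k += 1
    return k

# Target: the strength of four random words from the full 18,176-word list.
TARGET = 18176 ** 4

for size in (18176, 16676, 6059, 1818):
    print(f"list of {size:>6} words: {words_needed(TARGET, size)} words needed")
```

Under these assumptions, even a mildly trimmed list needs a fifth word to fully make up the difference, and a list cut to a tenth needs a sixth.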
0 -
What a great answer. Thank you for helping me to understand this.
I think this guy may be onto something with his proposed replacement wordlist for 1Password. The auto-correct issue is plausible but never occurred to me.
Apparently, the average native English speaker knows 20,000 words, and recognizes 40,000. I'm not sure how many words were on the list you extracted from to create your list, but this (http:// unencrypted link) research at the University of Ghent claims that, on average, people recognized two thirds of the words they were presented. It doesn't mention what fraction they actively used.
So, if I only like a third of the words on the list and keep rolling until I get my desired result, then 2^n = 6059, and n = 12.565 instead of 14.1497. A 4-word password is now 50.26 bits instead of 56.60. If an attacker needs to check 2^(n-1) passwords on average, and it costs $6 USD per 2^32 guesses, then (my math might be wrong here) the cost is $941,736 instead of $76,288,513. Practically speaking, not a significant difference right now for the average user. With 5 words it's $5,706,513,790 instead of $1,385,421,878,493.
If the list is limited to 10%, so 2^n = 1818, then things start to get interesting though...4 words costs only $7,627. Too weak. 5 costs $13,865,175, and 6 is $25,204,503,791. More to memorize, which is less desirable. Doable, certainly. But not as pleasant and potentially problematic for seniors. Thanks again.
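For anyone who wants to check this arithmetic, here is a rough Python sketch. It takes the $6 per 2^32 guesses figure from the comment above as an assumption (real cracking costs depend heavily on how the password is hashed), and `average_crack_cost_usd` is an illustrative name. Because it works from exact list sizes rather than rounded bit counts, its results differ slightly from the dollar figures quoted above.

```python
COST_PER_BATCH_USD = 6.0      # assumed cost of one batch of guesses (from the comment)
GUESSES_PER_BATCH = 2 ** 32   # guesses bought per batch, same source

def average_crack_cost_usd(list_size: int, num_words: int) -> float:
    """Estimated cost of guessing a passphrase of num_words uniformly
    random words, assuming the attacker searches half the space on average."""
    average_guesses = list_size ** num_words / 2   # i.e. 2^(n-1)
    return average_guesses / GUESSES_PER_BATCH * COST_PER_BATCH_USD

# Full list, one third of the list, and one tenth of the list.
for size in (18176, 6059, 1818):
    for words in (4, 5, 6):
        cost = average_crack_cost_usd(size, words)
        print(f"{size:>6}-word list, {words} words: ${cost:,.0f}")
```

The orders of magnitude agree with the numbers in the comment: tens of millions of dollars for four words from the full list, under a million for a third of the list, and only a few thousand for a tenth of it.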
0 -
@gr1516 - we agree; the work done there by that Redditor is quite good, as comments in that Reddit thread by both our founder, @dteare, and our principal security architect, @jpgoldberg, attest.
I can't say how many words the average person knows, but I would strongly urge people not to reject 90% of the list just because they don't like the words being offered. We did do some pruning a while back of words that either clearly were or could arguably be construed as offensive or objectionable. There might be some others, and there might be a few words that are more uncommon and therefore less well-known, such as "abscissa" or "breccia" (both on our list), but that group doesn't even come close to approaching 90% of the list. Most words are more like "cabinet" and "matter" and similar words that I would imagine the vast majority of people are familiar with.
Even at 10% where, yes, four words would not be expensive enough to resist cracking, using your calculations, six words appears to still be in the tens of billions of dollars, all-but-impossible for virtually everyone. Sure, six words is harder to remember than four...but there are options: don't restrict the word list so severely, or if you must, use more words and, as the now-famous xkcd comic puts it, use a simple mnemonic that makes it very difficult to forget, even for people with memory issues.
0 -
Thanks again, Lars. You are very helpful. I have amended my view somewhat. There is something to be said for using obscure words that have to be looked up in the dictionary. They can be quite memorable in their own right. Abscissa is ok because you can picture the coordinate plane, and breccia is actually pretty good. I now enjoy repeatedly clicking on the generator to see what pops up (tawny, peruke, doily, vamoose...). Perhaps the ideal really is the list you already have, but I don't know for sure. A password that includes mostly familiar words plus one (or maybe two) obscure ones might be more memorable than a password containing only familiar words. But if it's only unfamiliar words, then that's potentially a problem too. I think it's a balancing act. Also, when choosing a master password, the user ideally has to sit down in front of their computer and consciously decide that they will use the first password generated in order to ensure picking randomly. An article about all this might be good. I'm enjoying using the product. Take care.