Random selection of Master Password for 1Password - the deadly game of probability

royofsf
Community Member

After reading one of your blog posts on Diceware (http://blog.agilebits.com/2011/06/21/toward-better-master-passwords/) and the extremely strong passwords that can be generated by, say, rolling five dice to get a word from the Diceware list (7776 words) and then repeating that process five more times to get six randomly selected words, I had another idea: why couldn't I use a dictionary of 110,000 English words, like the one I'm looking at, go to six random pages, and write down the first word that I see on each page? My effort got banana republic, material, jape, scleroid, politics and johnycake.

I see some weaknesses in this system:
1. they're all dictionary words
2. they are not 6 words generated by 30 throws of an individual die
3. the results aren't oddball letters that often make up no word
4. Diceware involves 30 throws (6 throws of five dice, for example) whereas the dictionary random search selects one entire word and, therefore, there are only six random actions necessary to find the six words.

The last point alone makes this dictionary system a lot less random than the 30 throws (6 throws of 5 dice). However, is it a strong enough system since it comes from a base of 110,000 words? Anyone have any thoughts?

Comments

  • jpgoldberg
    1Password Alumni

    Hi @royofsf‌

    That is a great set of questions. If your method of selecting pages at random truly is random, and if your selection of a word from those selected pages truly is random, then your method will be stronger because it uses a list of 110,000 words instead of a list of 7776.

    But notice that I emphasized the word "if" in that sentence. My concern is that when people select things "at random" they really aren't acting at random. For example, I suspect that once on a page your eye was drawn to familiar nouns. I tend to be very suspicious any time "banana" comes up as something people do randomly. As is becoming a habit, Randall Munroe manages to sum up in a single xkcd comic what it takes me many words to explain.

    XKCD: I'm so random

    If you are willing, let's check something out. Go back to one of those pages, say the one with "republic" on it. Count how many words there are on the page and how many of them are nouns. Do that for a few of the pages, so we get a sense of the ratio of nouns to all words. Once you come back with an estimate (based on counting), I'll plug that ratio into a binomial distribution and see how likely it is that a truly random selection of words from a page would result in you getting 6 nouns out of 7 trials.

    So if you want to use a dictionary that way, you may wish to use a deck of cards to pick a word from each page. Let spades be 1 through 13 (ace low), diamonds be 14 - 26, etc. Shuffle the cards, and draw one. Find that word on the page. If, say, there are forty words on a page and you draw a 10 of hearts (49), then shuffle and draw again. (If there are more than 52 words on a page, then this method will never pick the words past the 52nd, but 52 words per page is still good.)
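
    The card procedure above amounts to rejection sampling: draw a number from 1 to 52 and redraw whenever it is larger than the number of words on the page. Here is a minimal Python sketch of that idea. The function names are hypothetical, the draw is simulated with the standard `secrets` module rather than a physical deck, and the clubs/hearts ordering is an assumption chosen so that the 10 of hearts comes out as 49.

```python
import secrets

# Suit order: spades 1-13 and diamonds 14-26 come from the post.
# Clubs 27-39 and hearts 40-52 are an assumption, chosen so that the
# 10 of hearts maps to 49 as in the example.
SUIT_OFFSETS = {"spades": 0, "diamonds": 13, "clubs": 26, "hearts": 39}


def card_number(rank, suit):
    """Map a card (rank 1 = ace ... 13 = king) to a number from 1 to 52."""
    if not 1 <= rank <= 13:
        raise ValueError("rank must be between 1 and 13")
    return SUIT_OFFSETS[suit] + rank


def draw_card():
    """Stand-in for shuffling a real deck and drawing one card."""
    suit = secrets.choice(list(SUIT_OFFSETS))
    rank = secrets.randbelow(13) + 1
    return rank, suit


def pick_word_position(words_on_page):
    """Redraw until the card number points at a word that exists on the page."""
    if words_on_page < 1:
        raise ValueError("the page needs at least one word")
    while True:
        number = card_number(*draw_card())
        if number <= words_on_page:
            return number


print(pick_word_position(40))  # some position between 1 and 40
```

    Discarding the out-of-range draws, instead of wrapping them around, is what keeps every word position on the page equally likely.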

    A longer list will get better results

    So definitely, selecting randomly from a longer list will get you better results. Let's do some math. Strength is expressed in bits, under the assumption that words are selected following a truly uniform distribution.

    Words   7776 list   110000 list
    3       38 bits     50 bits
    4       51 bits     67 bits
    5       64 bits     83 bits
    6       77 bits     100 bits

    So if you can find a way to select in a fashion where each of the 110000 words is as likely to be selected as any other, then do so. That will be better.
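
    For anyone who wants to check the arithmetic: under the uniform-selection assumption, the strength of n words drawn from a list of N words is n × log2(N) bits. A small Python sketch (the function name is hypothetical, introduced only for illustration):

```python
import math


def passphrase_bits(list_size, word_count):
    """Bits of entropy for word_count words drawn uniformly from list_size words."""
    return word_count * math.log2(list_size)


for words in range(3, 7):
    diceware = passphrase_bits(7776, words)
    dictionary = passphrase_bits(110_000, words)
    print(f"{words} words: {diceware:5.1f} bits vs {dictionary:5.1f} bits")
```

    The unrounded values land within one bit of the figures in the table. They only hold if every word really is equally likely to be picked.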

    But if subtle human biases can play a role in the "randomness", you are better off using a system that has some kind of external random number generator, whether dice or cards or computers.

  • royofsf
    Community Member

    JD - You're on. I'll get back to you after a trip tomorrow. I like your deck of cards technique for keeping it random. It very well may be that my eye was drawn to certain kinds of words (i.e., familiar-looking words, nouns, and maybe extra factors that I just can't see).

    No matter what, it's one heck of a lot better than what most folks do. I keep thinking that perfection shouldn't be the enemy of good or excellent. Most steps to success in business or life are not absolutely predictable IMHO.

  • royofsf
    Community Member

    JD - The six pages average 42 words/page & 35 nouns/page, or 83%. And 6 nouns/7 words = 86% for my tiny initial sample. Where do we go from here?

  • hawkmoth
    Community Member

    This is fun, but picking four or five Diceware words and knowing that no human-machine combination could crack it by brute force in the time before the human race is likely to go extinct is good enough for me. :)

  • jpgoldberg
    1Password Alumni

    @royofsf‌, at a probability of 0.83 of drawing a noun from a page, the chances of getting 6 nouns out of seven are ... quite strong.

    R> binom.test(x=6, n=7, p=0.83, alternative="greater")
    
        Exact binomial test
    
    data:  6 and 7
    number of successes = 6, number of trials = 7, p-value = 0.6604
    alternative hypothesis: true probability of success is greater than 0.83
    95 percent confidence interval:
     0.479297 1.000000
    sample estimates:
    probability of success 
                 0.8571429 
    

    So that 6/7 does not suggest that you were behaving non-randomly.

    There may have been other things that weren't random, but this noun/non-noun result is fully consistent with you behaving randomly.
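
    The p-value in the R output above can be reproduced by hand: with alternative="greater" it is the probability of seeing 6 or more nouns in 7 draws when each draw is a noun with probability 0.83. A minimal Python sketch using only the standard library (the function name is hypothetical):

```python
from math import comb


def binomial_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


p_value = binomial_upper_tail(6, 7, 0.83)
print(f"{p_value:.4f}")  # 0.6604, the same p-value R reports
```

    A p-value this large means that 6 nouns out of 7 is an entirely ordinary outcome for random picks from pages that are 83% nouns.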

  • royofsf
    Community Member
    edited April 2014

    JD - Is it a viable system (with or without the playing cards that you suggested to find the word on the page)?

  • jpgoldberg
    1Password Alumni

    I don't mean to be evasive. But I can't declare your system viable without the playing cards.

    There may be a pattern to how you pick words on a page. It isn't the particular pattern that I first guessed at, but there may be some other pattern that I hadn't guessed at. (I actually have some more guesses, but it would take a lot more work than just counting nouns. And these are hard to test with just seven trials.)

    The point that I'm trying to make is that when the selection process is in your head, it may very well be insufficiently random. But with the cards (as long as you shuffle reasonably well) we know that it will be sufficiently random. With the dice and with the cards we know how random it is. With anything else, we don't know.

    Because humans tend to be very non-random when they try to be random, I'm inclined to still consider your system suspect even if I can't spot the bias. So why not go with the cards or the dice?

    If, say, you suffer from a gambling addiction and have sworn off ever touching cards or dice, I'm sure that we can find an acceptable substitute.

  • royofsf
    Community Member

    JD - I understand and agree (as a non-expert). I may and probably do have a bias when I open the dictionary to two pages, even if I close my eyes and drop my finger or a coin somewhere. So I may have had an unconscious bias toward certain words, picking away from the page edges, etc. Who knows?

    My only pushback on the cards was adding another element. I was hoping to keep it simple. The dictionary pick without the cards is much faster than dice. Few people may have 5 dice around. Most should have a good dictionary, but some may not.

    So while I agree with you on the cards, I imagine that the semi-random selection without them would work well enough, but that cannot be proven. With or without the cards, it's much faster than dice in my testing. However, I know that protection and not speed is the goal. But the more that's required, the fewer people will use it.

    The dictionary has one other advantage. You suggested in a Diceware FAQ article to look up the meaning of oddball words as an aid in putting together a memorable sentence for recall of the password. Well, that's already done with the dictionary method. The definition of each word is sitting right in front of the person.

    Lastly, as you pointed out, the dictionary database is 110,000 words vs fewer than 8,000 with dice, about 14 times as many words. Those are good odds even without the mathematical proof.

  • jpgoldberg
    1Password Alumni

    Oh I agree, @royofsf. You are probably not doing any harm to yourself with your method. Particularly if your password will be seven words long. But I can't "endorse" such a method for the reasons that I explained.

    And, as I did say in "Toward Better Master Passwords", we shouldn't insist on perfection if it is too burdensome. So you have to use what actually and practically works for you.

This discussion has been closed.