How to Securely Generate a BIP39 Recovery Phrase?
Choosing the Right Source Material
As we concluded, any solution has to satisfy two seemingly contradictory conditions: the sequence should be mnemonic, but at the same time completely random.
Impossible? Not necessarily.
Looking closely at our surroundings, we can assume that almost everything around us carries some data. Every newspaper, every picture, and every piece of information available in the public space provides usable data.
The most obvious and common sources are mathematical and physical constants, mathematical sequences and series, astronomical distances, chemical constants, and so on.
Wherever we are in the world, we can easily access them all without having to write them down and carry them with us. This is a big advantage and a solid basis for our task. Numbers are the most obvious, but not the only, data we can employ.
While a newspaper's lifespan is brief and past issues are hard to get, books are, in most cases, immortal. Best sellers and great works of literature are available worldwide.
Now all we need is to choose one and define a personal, easy-to-remember method for extracting the necessary set of words from it.
Once that is done, we no longer need to write the seed down or memorize it. Knowing our method is enough to recover the seed whenever we need it, using just a pencil and a piece of paper.
The possibilities for creating a method are endless; the only limit is our own imagination.
For example, let's choose one of my favorite books, known all over the world: The Millennium Trilogy by Stieg Larsson.
It was the world's no. 1 best seller in 2009, with 45 million copies sold worldwide in the year of release. It has been reissued many times and is now available even online.
I have no doubt I could get it anywhere in the civilised world if I needed to.
An important caveat here: the fact that this is my favorite book should automatically disqualify it as a choice. Anyone who knows me could guess I used it as the basis for my key. The source material should be neutral, and this choice violates that rule.
The choice stems from my personal thought patterns and emotional attachments, in this particular case from my literary preferences.
This is a trap you should always try to avoid for security reasons.
But even if you give in to temptation and choose your favorite book, the next step will effectively eliminate all the risk.
I choose part 1, "The Girl with the Dragon Tattoo". Now it is time for a completely random component, which is the only thing I will actually have to remember. To isolate a random part of the text, I choose chapter 5. To make the randomization even deeper, I don't start at the very beginning of the chapter, but somewhere inside it. I pick a random number, for example 50. Keep in mind it could be any other number, even a much larger one, like 1969, which is easy to remember as the year of the Apollo 11 Moon landing.
So now let's take a look at what we get. The beginning of the 5th chapter is:
„Thursday, December 26
For the first time since he began his monologue, the old man had managed to take Blomkvist by surprise. He had to ask him
to repeat it to be sure he had heard correctly. Nothing in the cuttings had hinted at a murder.
It was September 24, 1966. Harriet was sixteen and had just begun her second year at prep school. It was a Saturday,
and it turned into the worst day of my life. I've gone over the events so many times that I think I can account for what happened
in every minute of that day - except the most important thing.(...) ”
In this example, I treat every string of letters or digits (excluding punctuation marks) as a separate word, but remember that you can invent your own rules. I count to the 50th word, which in this particular case happens to be the number 24. Starting from this point, I select the next 23 words; this is the fragment highlighted in the quote above. Next, let's bring into play the simplest way of transforming words into numbers.
We assign every letter an index number equal to its position in the alphabet: a = 1, b = 2, c = 3, and so on up to z = 26.
Next let's transform every word of the chosen fragment into a number by simple addition:
harriet: 8+1+18+18+9+5+20 = 79
sixteen: 19+9+24+20+5+5+14 = 96
saturday: 19+1+20+21+18+4+1+25 = 109
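The counting and addition steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, the 50th word is taken as the first word of the fragment, and a word made of digits simply keeps its own numeric value. The article leaves such details to your own rules.

```python
import re

# Opening of chapter 5 as quoted above, cut off shortly after the fragment ends.
CHAPTER_START = (
    "Thursday, December 26 "
    "For the first time since he began his monologue, the old man had managed "
    "to take Blomkvist by surprise. He had to ask him to repeat it to be sure "
    "he had heard correctly. Nothing in the cuttings had hinted at a murder. "
    "It was September 24, 1966. Harriet was sixteen and had just begun her "
    "second year at prep school. It was a Saturday, and it turned into the "
    "worst day of my life."
)


def split_words(text):
    """Every run of letters or digits is one word; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9]+", text)


def pick_fragment(text, start, count=23):
    """Return `count` words, beginning with the `start`-th word (counted from 1)."""
    words = split_words(text)
    return words[start - 1:start - 1 + count]


def word_value(word):
    """Sum of alphabet positions (a=1 ... z=26); a number keeps its own value."""
    if word.isdigit():
        return int(word)  # assumption: digits are used at face value
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower())


fragment = pick_fragment(CHAPTER_START, start=50)
values = [word_value(w) for w in fragment]
for word, value in zip(fragment, values):
    print(f"{word:10} {value}")
```

Running it with a different `start` or `count` shows how strongly the result depends on the few numbers you keep in your head.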
At this point, we could safely stop and use the results we got above as our indexes for the BIP39 standard list. But as you probably noticed, the resulting value range is quite narrow, falling between 1 and 109. This is simply because there are only 26 letters in the alphabet, and the average word length amounts to just a few letters.
The number 1966 is a lucky exception, arising only because a date happened to occur in the text fragment. Keep in mind that numbers do not usually appear in standard text, so you probably won't encounter one in your own case. If you want to expand this range, you can introduce an additional mathematical step.
To spread the results more widely across the 2048-word BIP39 space, let's add to each number a multiple of an additional random number.
Let it be 40. So we add 40 to the first index, 80 to the second (2 times 40), 120 to the third (3 times 40) and so on.
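As a rough sketch of this stretching step (the function name `stretch` is hypothetical, and the three sample values assume that the fragment begins with "24", "1966", "Harriet"):

```python
def stretch(values, step=40):
    """Add step * position to each value, with positions counted from 1."""
    return [value + step * position
            for position, value in enumerate(values, start=1)]


# Sample input: the values of "24", "1966" and "Harriet" (79).
print(stretch([24, 1966, 79]))
```

Note that the sketch does nothing special when a result exceeds the size of the word list; the text above does not define a rule for that case, so you would have to pick your own.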
After this process we finally get the BIP39 indexes, which give us the corresponding BIP39 words.
For the resulting word set, we calculate the possible 24th checksum words and choose one.
We have already done this in the example on the first page, and we know that one of the possibilities is "agent".
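For readers who want to see how the possible 24th words arise, here is a minimal sketch of the BIP39 checksum rule for a 24-word phrase: 23 words carry 253 bits, the last word carries 3 more bits plus the first 8 bits of the SHA-256 hash of the 256 bits of entropy. The function name is hypothetical, and the sketch assumes indexes counted from 0 (first word of the list = 0); if you count from 1, subtract 1 first. It returns indexes only, so you still need the official word list to look up the words.

```python
import hashlib


def last_word_candidates(indexes):
    """Return the 8 valid 24th-word indexes (0-based) for 23 given indexes."""
    if len(indexes) != 23 or any(not 0 <= i < 2048 for i in indexes):
        raise ValueError("expected 23 indexes in the range 0..2047")

    # Pack the 23 words into one 253-bit number, 11 bits per word.
    prefix = 0
    for i in indexes:
        prefix = (prefix << 11) | i

    candidates = []
    for tail in range(8):  # the 3 free entropy bits carried by the last word
        entropy = ((prefix << 3) | tail).to_bytes(32, "big")
        checksum = hashlib.sha256(entropy).digest()[0]  # first 8 bits of the hash
        candidates.append((tail << 8) | checksum)
    return candidates


print(last_word_candidates([0] * 23))
```

Any one of the eight returned indexes completes a valid phrase, which is why the text above speaks of choosing one of several possible checksum words.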
That way we have created our own secure seed.
Its objective randomness stems from our choice of book and our completely random text selection.
Even if the book is your favorite one and some people could associate it with you,
the random choice of text and your individual transformation method guarantee a high level of security for the created seed.
Please note that the resulting seed is just as hard to remember as any other seed produced by the Ledger's internal generator.
This is the best proof of its randomness and security.
But the difference is that now you do not have to remember it or write it down.
You can recover it anytime you need, using just the book, a pencil, and a piece of paper.
And all you need to remember are three numbers:
5 - chapter number
50 - number of the word in the chapter that marks the beginning of the text fragment
40 - component added to each BIP39 index, after first being multiplied by the word's position in the fragment
I guess you agree these are much easier to remember than any 24 random words?
And this case is still much more complicated than necessary.
Remember that the third component is optional. We introduced it only to spread the results over the full range of the BIP39 set.
Let's summarize:
In fact, we haven't chosen the seed above. Instead, we have chosen the source and the method of generating it. As a result, we can now recover it anytime we need by simply repeating the generating process.
Last but not least, please remember that the example above is not a strict rule or template. Your personal method can differ at every stage. You don't have to use a continuous fragment of text. You can start at any random point and choose every second word, every fifth word, or every one hundred twenty-seventh word. It's all up to you. You don't have to limit the text fragment to just 23 words. It can be longer, and you can transform several sets of words into a single BIP39 index. You can take the words forward or backward, and only you will know which.
You don't have to assign consecutive numbers to consecutive letters; you can create any other easy-to-remember rule, for example 5, 10, 15, 20, ... or consecutive prime numbers. The possibilities are endless; it is only a matter of your own imagination.
It should always be your own custom and unique scheme, with no need to write anything down.
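To illustrate the two alternative letter rules mentioned above, here is a small sketch. The table and function names are invented for this example, and skipping non-letter characters is just one possible convention.

```python
import string


def first_primes(n):
    """Return the first n prime numbers by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


LETTERS = string.ascii_lowercase
# a=5, b=10, c=15, ... z=130
MULTIPLES_OF_FIVE = {ch: 5 * (i + 1) for i, ch in enumerate(LETTERS)}
# a=2, b=3, c=5, ... (the 26 smallest primes)
PRIMES = dict(zip(LETTERS, first_primes(26)))


def word_value(word, table):
    """Sum the table values of the letters in word; other characters are skipped."""
    return sum(table[ch] for ch in word.lower() if ch in table)


print(word_value("harriet", MULTIPLES_OF_FIVE))
print(word_value("harriet", PRIMES))
```

Both tables produce larger and more widely spread numbers than the plain a=1 ... z=26 rule, at the cost of slightly more arithmetic on paper.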
Here is another example: let's choose a TV series instead of a book.
A TV series usually consists of tens or even hundreds of episodes, and every episode usually has a title.
Using the method above, you can easily transform a single episode title into a single BIP39 index.
Tens of thousands of TV series have been produced all over the world, and the number rises every year.
Just pick one at random, take 23 episodes of your own choice, and transform their titles into a set of 23 BIP39 words.
It is up to you whether you choose something contemporary, Columbo from the 70's, or even Bonanza from the early 60's. You can easily find the full episode title list of any TV series on the Internet, no matter how old it is. You don't have to remember anything more than how you chose the episodes and the transformation method.
One more example.
You don't have to use any text. As we already said in the beginning, numbers are just as good. Forget about mathematical and physical constants. They are useful, but choosing a less obvious source is even better. How about immortal and unchangeable sports statistics? The Olympic Games? FIFA? UEFA? Just pick a tournament and make use of the first 23 goals. This is just one of a thousand possibilities, of course. The goals don't have to be the first ones; you can choose the last 23 goals as well, or any other 23 goals starting from a specified match or date, taken forward or backward. Every goal was scored in a specific minute, and that information will never change in the statistics. This easy way you get 23 numbers in the range 1-120.
You can use them directly as the BIP39 indexes, or project them onto the full range of 2048 BIP39 elements by any simple mathematical operation of your own choice.
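One possible "simple mathematical operation" is plain multiplication. The sketch below is only an assumption for illustration: the factor 17 is chosen because 120 x 17 = 2040 stays within 2048, and the input minutes are placeholders, not real match data.

```python
def project(minutes, factor=17):
    """Scale goal minutes (1..120) so the results spread over 17..2040."""
    for minute in minutes:
        if not 1 <= minute <= 120:
            raise ValueError(f"minute out of range: {minute}")
    return [minute * factor for minute in minutes]


# Placeholder minutes for illustration only, not taken from any real tournament.
print(project([1, 45, 120]))
```

Any other rule you can reproduce from memory would serve equally well; multiplication is simply easy to do on paper.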
It is worth noting that you do not need to go to extremes with the transformation method. The possibilities for choosing the source are endless.
This means the chance of someone guessing exactly yours is close to zero. If so, you really don't need to add further transformation steps.
As long as your source and method are based on your own choice and idea, even a simple transformation is still secure.
Still not convinced? Have doubts?
Then be honest and answer this question: would you check the goal minutes of EURO 1988 starting from the first Laudrup score if you wanted to find my seed or passwords?
Would you have even thought about it if you hadn't read the paragraph above? I bet you wouldn't have.
If you have cash or a gold bar, you can conceal it in a chest or bury it in a forest. If you own blockchain-based assets,
this method allows you to conceal and store them in any immaterial content without leaving a trace. That is quite revolutionary.
You only need to remember that the method should always be based on your own idea. You should never copy anything you have read about on the Internet.
Anything written on the Internet becomes compromised forever. Be creative.