Kindle Anki Parsing Project

This was a simple weekend hack to support my German learning.

Anki is flashcard software built on the idea of spaced repetition. Instead of working through a standard stack of flashcards, spaced repetition uses an algorithm to decide when you study each card. Ideally the system repeats a card shortly before you would forget the information (after 10 minutes, 1 day, 10 days, etc.). More information can be found on Wikipedia.

I have been reading German novels on my Kindle because it can translate a word when you highlight it. This allows me to read more quickly, since my dictionary is embedded in the book.

The Kindle stores a record of each word you translate, along with the context sentence from the book. This data is stored in the vocab.db database on the Kindle itself. I wrote a small Python script that parses the database to extract each word and its corresponding sentence. I then use the clever Linux command-line leo utility, written by GitHub user seebi, to translate each word. The leo tool searches dict.leo.org and parses the returned HTML for the translation. I then add the returned translation to my word list.
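To give a feel for the extraction step, here is a minimal sketch rather than the actual script. It assumes vocab.db is a SQLite file with a WORDS table (id, word, lang) and a LOOKUPS table (word_key, usage, timestamp), where usage holds the context sentence. Those table and column names are my assumption about the layout, and extract_lookups is a name made up for this example, so check the schema on your own device first.

```python
import sqlite3


def extract_lookups(db_path, lang="de"):
    """Return (word, sentence) pairs for every lookup in the given language.

    Assumed schema (verify against your own vocab.db):
      WORDS(id, word, lang, ...)
      LOOKUPS(word_key, usage, timestamp, ...)
    """
    query = """
        SELECT w.word, l.usage
        FROM LOOKUPS AS l
        JOIN WORDS AS w ON l.word_key = w.id
        WHERE w.lang = ?
        ORDER BY l.timestamp
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, (lang,)).fetchall()
    finally:
        conn.close()
    # Strip stray whitespace around the context sentence; tolerate NULL usage.
    return [(word, (usage or "").strip()) for word, usage in rows]
```

Calling extract_lookups("vocab.db") on a copy of the database would then give a list of word and sentence pairs, ready to be passed on to the translation step.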

Using this information, I build the Anki cards as a CSV file consisting of the word, the sentence, and the hidden translation. I can then import the CSV into Anki and review the words each morning on the train.
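The card-building step could look roughly like the sketch below. It is an illustration under assumptions, not the script itself: build_cards and write_anki_csv are hypothetical names, and translate stands in for whatever does the lookup (in my case a wrapper around the leo tool, which is not shown here).

```python
import csv


def build_cards(lookups, translate):
    """Turn (word, sentence) pairs into (word, sentence, translation) rows.

    `translate` is a hypothetical callable taking a word and returning
    its translation as a string.
    """
    return [(word, sentence, translate(word)) for word, sentence in lookups]


def write_anki_csv(cards, csv_path):
    """Write one card per row: word, context sentence, translation."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for word, sentence, translation in cards:
            writer.writerow([word, sentence, translation])
```

The csv module quotes any sentence that contains a comma, so the three columns stay intact. Which column ends up on the front or the back of the card is then decided when the file is imported into Anki.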

The system is not perfect at the moment – it could use more refinement, but it is simple enough that I will actually use it. One of the big design considerations with any system is when to stop before you make it too complicated.

The complete script (and instructions) can be found on my GitHub!