FM-index

In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini, who describe it as an opportunistic data structure because it allows the input text to be compressed while still permitting fast substring queries. The name stands for Full-text index in Minute space.
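To make the underlying transform concrete, here is a minimal sketch of the Burrows–Wheeler transform in Python. It uses the naive sorted-rotations construction, which is only suitable for short strings. The function name `bwt` and the `$` sentinel are illustrative choices, not something prescribed by the original authors.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive construction)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Sort every cyclic rotation of s. The sentinel "$" sorts before
    # letters and digits in ASCII, so it marks the end of the text.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))  # annb$aa
```

The transform tends to group equal characters into runs, which is what makes the output easier to compress than the original text.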

The FM-index can be used to efficiently count the occurrences of a pattern within the compressed text, as well as to locate the position of each occurrence. Counting takes time proportional to the length of the pattern, independent of the length of the text. Both the query time and the storage space requirements are sublinear with respect to the size of the input data.
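The counting and locating queries described above can be sketched with the backward search procedure. The following Python example is an uncompressed teaching version: it stores the full suffix array and a full occurrence table, whereas a practical FM-index samples and compresses both. The class name `SimpleFMIndex` and its method names are hypothetical and are not taken from any particular library.

```python
from collections import Counter


class SimpleFMIndex:
    """Uncompressed illustration of FM-index count and locate queries."""

    def __init__(self, text: str, sentinel: str = "$"):
        if sentinel in text:
            raise ValueError("text must not contain the sentinel character")
        s = text + sentinel
        n = len(s)
        # Naive suffix array: sort suffix start positions lexicographically.
        self.sa = sorted(range(n), key=lambda i: s[i:])
        # BWT character = the character preceding each sorted suffix
        # (s[-1] is the sentinel, which handles the suffix starting at 0).
        self.bwt = "".join(s[i - 1] for i in self.sa)
        # C[c] = number of characters in s that are strictly smaller than c.
        counts = Counter(s)
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (n + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c, row in self.occ.items():
                row[i + 1] = row[i] + (1 if c == ch else 0)

    def _range(self, pattern: str):
        """Backward search: half-open suffix array range matching pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern: str) -> int:
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern: str) -> list:
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])


index = SimpleFMIndex("banana")
print(index.count("ana"))   # 2
print(index.locate("ana"))  # [1, 3]
```

Each step of the backward search loop consumes one pattern character and performs two table lookups, which is why the counting time in this sketch depends on the pattern length rather than the text length.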

The original authors later devised improvements to their approach and dubbed the result “FM-Index version 2”. A further improvement, the alphabet-friendly FM-index, combines compression boosting and wavelet trees to significantly reduce the space usage for large alphabets.

The FM-index has found use in, among other places, bioinformatics.
