You are here

Comparing Static and Dynamic Suffix Arrays

In this course project, I was curious about an interesting paper by Salson et al. on dynamic suffix arrays that had been published fairly recently.  They showed that in many circumstances, using a dynamic suffix array is better than constantly rebuilding a static one.  I wanted to know at what point rebuilding stops making sense since the dynamic structures required much work up front for setup.

Special note: The authors were kind enough to share their code with me for use in my project - thanks so much to all of you!

Conclusions

The authors of the dynamic suffix array showed that inserting 500 characters into a large body of text is more efficient with their structure than rebuilding a standard suffix array from scratch. They did not, however, include deletions in their tests, nor did they consider the time taken to build the dynamic suffix array before the insertions.

The experiments presented here used a sliding window paradigm to fill both these gaps. It was determined that if the building time is included (in this case, averaged over the number of window slides), the use of dynamic suffix arrays tends to be slower than rebuilding standard suffix arrays from scratch. However, dynamic suffix arrays are superior to the static version when inserting and deleting less than 1000 characters at a time. Based on this, dynamic suffix arrays are well suited to applications that make only small changes to the associated body of text, while static suffix arrays are a better choice when large changes are required often.

Important Note: It was pointed out that a sliding window is probably not the best choice for benchmarking the data structure as adding suffixes is theoretically speaking the worst-case situation, as mentioned by the authors in this article.

Downloads

A summary of the experimental setup and results can be downloaded as a PDF.