View Single Post
Author Message
joaquimandrade
Veteran Member
Join Date: Dec 2008
Location: Portugal
Old 03-24-2009 , 08:48   The advantages of a trie over an array for strings storage
Reply With Quote #1

Those who are unaware of what a trie is and what's its purpose, when needing a data structure to store strings, use an array. Since you can also use tries for store strings there are best cases of use for each one.



What they are:

An array is a group of sequential bytes in memory subdivided in groups of the same length, being each one of the subdivided a individual place to store data, that you can access by its position in the array (usually referred as index).

A trie is a collection of data organized in the form of a hierarchy. The level zero is a start point. From there on, each level is a group of strings of length equal to the level number and different from each other. Each string can be associated to a value and to strings in the level below that share its characters. If one is associated to a value, it is stored in the trie.

From wikipedia:



In the tree above there are stored the strings:

A , i , to , in , tea , ted , ten , inn

(since they are those who have associated values).


Which one should be used where you can use both:

An array should be used when you know the index of the string that you want to posteriorly access or if you need to iterate over its values.

A trie should be used when you want to posteriorly search for a string.

To show the differences of its usage, in a case where you should use a trie, i created the following situation:

I want to know if in a list of steam Ids there is a given steam Id. I have a list of 69 steam Ids and the given steam Id is not on the list.

When comparing algorithms it must be tested what's referred to as "worst case". The worst case is the case on which you force the algorithm to test every possibility. That's why the steam Id is not on the list.

The code and the profiling results go as attachments.

A note:

While the superiority of tries over arrays for this case should be obvious i think that the results for ArrayGetString are not accurate. So or:
  • They are accurate and i am wrong
  • I made something wrong
  • There's something wrong with ArrayGetString
  • There's something wrong with the profiling.
Attached Files
File Type: sma Get Plugin or Get Source (cellArray.sma - 2063 views - 885 Bytes)
File Type: sma Get Plugin or Get Source (cellTrie.sma - 2282 views - 724 Bytes)
File Type: txt cellArrayResults.txt (2.7 KB, 878 views)
File Type: txt cellTrieResults.txt (2.4 KB, 907 views)
__________________

Last edited by joaquimandrade; 03-25-2009 at 04:32.
joaquimandrade is offline