Trie, also known as Digital Tree or Prefix Tree, is a kind of tree data structure used to store a dynamic set where the keys are usually strings. Tries are an extremely useful and special tree-based data structures where in the nodes are characters or alphabets and strings or words can be reTRIEved by traversing on down a branch in the data structure.

In this post, we will look into this wonderful data structure Trie.


Note : In all the pseudo codes, 0-based indexing is used and the indentations are used to differentiate between block of codes.


Trie was first introduced by a Frenchman named René de la Briandais in 1959. According to Donald Knuth in his book The Art of Computer Programming,

Trie memory for computer searching was first recommended by René de la Briandais. He pointed out that we can save memory space at the expense of running time if we use a linked list for each node vector, since most of the entries in the vectors tend to be empty.

Structure

As trie is a tree-based data structure, tries consist of basic building blocks of nodes. Each trie consists of a ROOT node which contains links or references to other nodes - one to every possible alphabetic value.

For example, let us look at a trie containing the strings "aayush", "algorithm", "data", "seedbx", "structure".

Trie containing strings "aayush", "algorithm", "data", "seedbx", "structure"
Trie containing strings "aayush", "algorithm", "data", "seedbx", "structure"

Each node contains a pointer to a number of nodes depending on the number of valid alphabetic values in the chosen language. Additionally the node contains a flag variable which is used to keep track of whether the given node marks the end of a word or not. However nodes can be modified to store additional details as per the requirements of the application in hand.

TrieNode{
    TrieNode* children[ALPHABET_SIZE]
    bool isLast
}
TrieNode for ALPHABET_SIZE=3
TrieNode for ALPHABET_SIZE=3

Note : The size and shape of the trie is directly dependent on the numbers of alphabets or characters i.e. ALPHABET_SIZE and the keys inserted in the trie.


Create Node

We create a new node and declare the default value of the isLast variable to be false and children to be an array of size ALPHABET_SIZE of NULL pointers. The function returns a pointer to the created node.

Pseudo Code :

createNewNode()
    TrieNode* toBeReturned=new TrieNode()
    toBeReturned->isEnd=false
    for i=0 up to ALPHABET_SIZE
       toBeReturned->children[i]=NULL
    return toBeReturned

Time Complexity : O(ALPHABET_SIZE)

Insert

We insert a key in the trie starting at the ROOT node. We use a helper function getIndex to get the location or index of the corresponding alphabetic value. If the corresponding node is non-existent or NULL, we create a new node and assign the pointer to the respective node. We continue doing this for every character in the key and lastly assign the isLast value of the leaf node to be true.

Pseudo Code :

insert(root, key)
   TrieNode* temp=root
   for i=0 up to key.length
       idx=getIndex(key[i])
       if temp->children[idx]==NULL
           temp->children[idx]=createNewNode()
   temp->isLast=true

Time Complexity : O( |key| )

Inserting "add" to Trie
Inserting "add" to Trie

Note : The definition of the helper function getIndex will vary according to the chosen language or alphabetic set. For lowercase English alphabets this maybe,

getIndex(c)
    return c-'a'

We search for a key in a depth-first traversal fashion starting at the root node. The key is not present under the following conditions :

  • An interior node is equal to NULL i.e. a node corresponding to a alphabetic value in the key is not present.
  • The value of the isLast flag for the leaf node is equal to false i.e. no key is ending at the specified leaf node.

Pseudo Code :

search(root, key)
   TrieNode* temp=root
   for i=0 up to key.length
       idx=getIndex(key[i])
       if temp->children[idx]==NULL
           return false
   if temp->isLast==false
       return false
   return true

Time Complexity : O( |key| )

Searching "data" in Trie
Searching "data" in Trie

Delete

We delete a key in the trie in a bottom up fashion using recurrence. However there are a few possible situations which may arise when deleting a key. Here are all possible situations :

  • Key is not present in the trie. In this case, the trie should be left undisturbed.
  • Key is unique i.e. does not contain any prefix key or suffix key. In this case, the branch containing the key should be removed.
  • Key is prefix of another longer key. In this case, the leaf node for this key should be unmarked i.e. set isLast to false.
  • Key has a prefix key. In this case, the nodes starting from the leaf node of the longest prefix key to the leaf node of the key should be removed.

Pseudo Code :

isEmpty(node)
   for i=0 up to ALPHABET_SIZE
       if node->children[i]!=NULL
           return false
   return true

delete(node, key, depth)
   if node==NULL
       return
   if depth==key.length-1
       node->isLast=false
       if isEmpty(node)
           delete node
   idx=getIndex(key[depth])
   delete(node->children[idx], key, depth+1)
   if isEmpty(node) and node->isLast==false
       delete node

Time Complexity : O( |key| * ALPHABET_SIZE )

Deleting "parcel" from Trie
Deleting "parcel" from Trie

Applications

  • Auto-Complete Feature deployed heavily in famous text-editors, applications and search engines can be implemented using tries.
  • Tries can be used to implement Spell Checkers.
  • Tries are also used in IP-Routing algorithms.
  • Tries are used to in Digital Dictionaries.

Trie is a simple data structure however the main disadvantage of using a trie is that they need a lot of memory and pointers to store the keys. However irrespective of the disadvantages, trie is useful in many situations.