An introduction to common nonlinear data structures
So far, all the data structures that we have seen so far have been
linear (Array, Dynamic Array, Stack, Queue, etc). A linear
data structure stores the elements sequentially, one by one. Each
element is linked to the element after it and the element before it.
However, some of the more powerful data structures can’t get away
using a linear backbone, and instead rely on a
nonlinear implementation, where each bit of data is connected
to more than one element. Let’s take a look at some nonlinear data
structures; the three we’ll see in this article are all implemented
as a binary tree, a tree where each node has at most 2
elements. The implementation for each is slightly different, so it
is encouraged that you look up the implementations yourself to
better understand their benefits and limitations.
Say you want to be able to efficiently retrieve and remove the
largest element from an array. Obviously, this can be done by
running a sorting algorithm on a dynamic array and then sequentially
deleting the last element, which is guaranteed to be the largest.
But what if we wanted our data structure to support insertion as
well? Remember that it takes linear time to insert, so an array
would not be able to process multiple insertion queries at a time.
The solution, capable of inserting and removing elements in O(log
N), where N is the number of elements in our dataset, is known as a
Heap, or Priority Queue. Don’t be fooled though!
Despite still being a ‘queue,’ the heap is not a linear data
structure, and is represented as a binary heap. Code utilizing the
Heap is shown below. Priority Queues are capable of keeping track of
the largest element (known as a Max Heap), or the smallest element
(Min Heap), but not both at the same time.
Code:
import heapq #we must import heapq to use priority queue in python#it defaults to min heap
pq = [4,10,3] #heapq isn't actually a data type/structure, it operates on an already existing list
heapq.heapify(pq) #this sorts the heapq in non-decreasing orderprint(heapq.heappop(pq)) #heappop returns the smallest value and automatically removes it
heapq.heappush(pq,11) #this inserts an element
heapq.heappush(pq,6)
heapq.heappush(pq,6) #python heap can store duplicatesprint(heapq.heappop(pq))
print(list(pq)) #we can print out all the values like this
pq2 = [10,8,12]
heapq._heapify_max(pq2) #this creates a max heap of our listprint(heapq.heappop(pq2)) #this prints the max value now and removes it, since its a max heapprint(heapq.heappop(pq2))
Output:
> 3
> 4
> [6, 6, 11, 10]
> 12
> 8
Code:
import java.util.PriorityQueue;//we must import this library to use priority queueimport java.util.Collections;
//we have to import collections so we can create a max heappublicclassMain{
publicstaticvoidmain(String[] args){
PriorityQueue pq = new PriorityQueue<>(); //java pqueue defaults to min heap
pq.add(4); //insert an element
pq.add(10);
pq.add(3);
System.out.println(pq.peek()); //retrieves minimum, but doesn't remove
pq.add(11);
pq.add(6);
pq.add(6); //priority queue can store duplicate elements.
System.out.println(pq.poll()); //in java, you can remove and access the top element simultaneously
pq.remove(6); //you can also remove specific elements, unlike C++while(pq.size()>0) {
System.out.println(pq.poll());
}
PriorityQueue pq2 = new PriorityQueue<>(Collections.reverseOrder());//by passing this as a parameter, we can get a max heap
pq2.add(10);
pq2.add(8);
pq2.add(12);
System.out.println(pq2.poll());
System.out.println(pq2.peek());
}
}
Output:
> 3
> 3
> 4
> 6
> 10
> 11
> 12
> 10
Code:
#include<iostream>#include<queue>//priority queue is defined in the headerusingnamespace std;
intmain(){
priority_queue pq; //the default pq is a Max Heap, which keeps the largest elements on top.
pq.push(4); //insert an element
pq.push(10);
pq.push(3);
cout << pq.top() << "\n"; //retrieve maximum, but doesn't remove
pq.push(11);
pq.push(6);
pq.push(6); //priority queue can store duplicate elements
cout << pq.top() << "\n";
pq.pop(); //removes the greatest element
pq.pop();
cout << pq.top() << "\n";
while(pq.size()) {
cout << pq.top() << "\n";
pq.pop();
}
priority_queue,greater> pq2; //by passing greater as a parameter, we can turn the max heap into a min heap
pq2.push(10);
pq2.push(8);
pq2.push(12);
cout << pq2.top() << "\n"; //retrieve minimum but doesn't remove, because this is a min heap
pq2.pop(); //removes smallest element, since this is a min heap
cout << pq2.top() << "\n";
}
Output:
> 10
> 11
> 6
> 6
> 6
> 4
> 3
> 8
> 10
Another nonlinear data structure that is represented as a binary
tree is the set. You may have heard of the underlying data
structure before, a balanced binary search tree. In simple
terms, sets are data structures that store a sorted list of all the
elements without duplicates, allowing for quick insertions and
deletions. Insertions, deletions, and lookups can be done in time
logarithmic to the size of the set. In the real world, sets are
perfect for storing dynamic sorted data, or sorted data that
is constantly changing. For example, a set (binary search tree) is
used when rendering 3D data for many video games, such as Doom, a
1993 FPS. Moreover, many variations of the binary search tree see
use in the real world too, we encourage you to check out these other
tree-based data structures! Below is the code showing the
functionalities of a set. However, if you are a python user, be
careful! The sorted set is not available in python; you must import
an external library to use it. We have code utilizing the hashset
instead for python, which has on average constant insertion and
deletion, but does not sort the elements. C++ and Java also have the
hashset available, being std::unordered_set in C++ and HashSet in
Java. HashSets are not tree-based data structures.
Code:
st = set([1,4,2,9,12,15,4]) #there will only be one 4 in this since set does not support duplicate valuesst.add(15) #insert in O(1)
st.remove(2) #removal in O(1)if9in st: #check if element exists in setprint(True)
else:
print(False)
if13in st:
print(True)
else:
print(False)
for x in st: #iterate through all elements, but not in sorted order because the python set is unsortedprint(x)
Output:
> True
> False
> 1
> 4
> 9
> 12
> 15
Code:
import java.util.TreeSet; //we must import this library to use setpublicclassMain{
publicstaticvoidmain(String[] args){
TreeSet st = new TreeSet();
st.add(1); //adds an element
st.add(4);
st.add(2);
st.add(9);
st.add(12);
st.add(5);
st.add(15);
st.add(4); //java sets can't hold dupes, so this won't do anything
st.remove(2); //removes an element
System.out.println(st.contains(9)); //checks if 9 exists
System.out.println(st.contains(13));
System.out.println(st.first()); //prints smallest element
System.out.println(st.last()); //prints greatest element;for(int x:st) { //iterate through all elements in sorted order
System.out.println(x);
}
System.out.println(st.ceiling(11)); //prints smallest element greater than or equal to 11
System.out.println(st.higher(5)); //prints smallest element strictly greater than 5
}
}
#include<iostream>#include<set>//imports the set header so we can use itusingnamespace std;
intmain(){
set st = {1,4,2,9,12,5}; //this is our set
st.insert(15); //inserts an element
st.insert(4); //this will not change our set because sets do not support dupes
st.erase(2); //removes an elementif(st.find(9)!=st.end()) { //checks if element exists in set
cout << "found 9\n";
} else {
cout << "did not find 9\n";
}
if(st.find(13)!=st.end()) {
cout << "found 13\n";
} else {
cout << "did not find 13\n";
}
cout << *st.begin() << "\n"; //prints smallest element
cout << *--st.end() << "\n"; //prints largest elementfor(auto x:st) { //we can loop through all elements in sorted order like this
cout << x << "\n";
}
cout << *st.lower_bound(11) << "\n"; //this retrieves the smallest element greater than or equal to 11
cout << *st.upper_bound(5) << "\n"; //this retrieves the smallest element greater than 5
}
Output:
> found 9
> did not find 13
> 1
> 15
> 1
> 4
> 5
> 9
> 12
> 15
> 12
> 9
Let’s take a look at another tree-based data structure, the
map. In practice, a map works exactly like an array: You have
indices and values that correspond to each index. However, a map
truly shines when you wish to use non-integer indices. Let’s take a
look at an example, the DNS Protocol, which uses a variation of the
map known as the trie. The DNS, or Domain Naming System, translates
website URLs that a human can understand, like google.com, into IP
addresses, which a machine can read. In this case, the website URL
is our index, otherwise known as a key. The IP address itself
is the value. You may not have different values map to the
same key.The standard map is built using the same framework as the
set, a balanced binary search tree. The one difference is that a map
sorts its elements by the key only, and pairs each key with the
corresponding value. Take a look at how to use the map below. Once
again, the sorted map is not available without downloading external
libraries in Python. Python uses a dictionary, which is implemented
as a HashMap. A HashMap is nearly identical to a map, except that it
has O(1) insertions, deletions, and lookups. However, it does not
sort the elements by key manually. C++ and Java can also use the
HashMap, with std::unordered_map and HashMap respectively. HashMaps,
like HashSets, are not implemented with trees.
Code:
mp = {} #creates a dictionary, python's best equivalent of a sorted map. you must import an external library to use a true sorted mapmp["gmail.com"] = "142.251.33.69"
mp["google.com"] = "142.250.217.78"#creates a key value pair of "google.com" and "142.250.217.78"print(mp["gmail.com"])
mp["youtube.com"] = "216.58.206.238"for x in mp: #iterate through all keys, they will not be sorted, but rather in order of insertionprint(x+": "+mp[x])
if"yahoo.com"in mp:
print("found key")
else:
print("did not find key")
Output:
> 142.251.33.69
> gmail.com: 142.251.33.69
> google.com: 142.250.217.78
> youtube.com: 216.58.206.238
> did not find key
Code:
import java.util.TreeMap;//we must import this library to use map
public classMain {
publicstaticvoidmain(String[] args) {
TreeMapmp = newTreeMap(); //createsamapwithstringkeysandstringvaluesmp.put("gmail.com","142.251.33.69");
mp.put("google.com","142.250.217.78"); //insertkeyvaluepairof "google.com","142.250.217.78"
System.out.println(mp.get("gmail.com")); //printsthevalueofthekey "gmail.com"
mp.put("youtube.com","216.58.206.238");
for(String x:mp.keySet()) { //loopthroughallthekeysinsortedorder, xisthecurrentkeySystem.out.println(x+": "+mp.get(x));
}
System.out.println(mp.containsKey("yahoo.com")); //checkswhether "yahoo.com" isakeyinthemap
}
}
#include<iostream>#include<map>//imports map libraryusingnamespace std;
intmain(){
map mp; //map with string as key and string as value
mp["gmail.com"] = "142.251.33.69";
mp["google.com"] = "142.250.217.78"; //creates a new element in the map, with "google.com" as the key and "142.250.217.78" as the value
cout << mp["gmail.com"] << "\n"; //this will print the value associated with the key "gmail.com"
mp["youtube.com"] = "216.58.206.238";
for(auto x:mp) { //iterating through a map in sorted order(by key)
cout << x.first << ": "; //x.first is the key of our key-value pair
cout << x.second << "\n"; //x.second here is the value
}
if(mp.find("yahoo.com")!=mp.end()) { //checks if this key is in the map
cout << "found key\n";
} else {
cout << "did not find key\n";
}
}
Output:
> 142.251.33.69
> gmail.com: 142.251.33.69
> google.com: 142.250.217.78
> youtube.com: 216.58.206.238
> did not find key
You can play around with all the code we've used in this article on
Replit: