Sponsored by Leangap
Intro to Data Structures
An introduction to common nonlinear data structures
So far, all the data structures that we have seen so far have been linear (Array, Dynamic Array, Stack, Queue, etc). A linear data structure stores the elements sequentially, one by one. Each element is linked to the element after it and the element before it. However, some of the more powerful data structures can’t get away using a linear backbone, and instead rely on a nonlinear implementation, where each bit of data is connected to more than one element. Let’s take a look at some nonlinear data structures; the three we’ll see in this article are all implemented as a binary tree, a tree where each node has at most 2 elements. The implementation for each is slightly different, so it is encouraged that you look up the implementations yourself to better understand their benefits and limitations.
Say you want to be able to efficiently retrieve and remove the largest element from an array. Obviously, this can be done by running a sorting algorithm on a dynamic array and then sequentially deleting the last element, which is guaranteed to be the largest. But what if we wanted our data structure to support insertion as well? Remember that it takes linear time to insert, so an array would not be able to process multiple insertion queries at a time. The solution, capable of inserting and removing elements in O(log N), where N is the number of elements in our dataset, is known as a Heap, or Priority Queue. Don’t be fooled though! Despite still being a ‘queue,’ the heap is not a linear data structure, and is represented as a binary heap. Code utilizing the Heap is shown below. Priority Queues are capable of keeping track of the largest element (known as a Max Heap), or the smallest element (Min Heap), but not both at the same time.
Code:
import heapq #we must import heapq to use priority queue in python
#it defaults to min heap
pq = [4,10,3] #heapq isn't actually a data type/structure, it operates on an already existing list
heapq.heapify(pq) #this sorts the heapq in non-decreasing order
print(heapq.heappop(pq)) #heappop returns the smallest value and automatically removes it
heapq.heappush(pq,11) #this inserts an element
heapq.heappush(pq,6)
heapq.heappush(pq,6) #python heap can store duplicates
print(heapq.heappop(pq))
print(list(pq)) #we can print out all the values like this
pq2 = [10,8,12]
heapq._heapify_max(pq2) #this creates a max heap of our list
print(heapq.heappop(pq2)) #this prints the max value now and removes it, since its a max heap
print(heapq.heappop(pq2))

Output:
> 3
> 4
> [6, 6, 11, 10]
> 12
> 8
Code:
import java.util.PriorityQueue;//we must import this library to use priority queue
import java.util.Collections;
//we have to import collections so we can create a max heap

public class Main {
  public static void main(String[] args) {
    PriorityQueue pq = new PriorityQueue<>(); //java pqueue defaults to min heap
    pq.add(4); //insert an element
    pq.add(10);
    pq.add(3);
    System.out.println(pq.peek()); //retrieves minimum, but doesn't remove
    pq.add(11);
    pq.add(6);
    pq.add(6); //priority queue can store duplicate elements.
    System.out.println(pq.poll()); //in java, you can remove and access the top element simultaneously
    pq.remove(6); //you can also remove specific elements, unlike C++
    while(pq.size()>0) {
      System.out.println(pq.poll());
    }
    PriorityQueue pq2 = new PriorityQueue<>(Collections.reverseOrder());//by passing this as a parameter, we can get a max heap
    pq2.add(10);
    pq2.add(8);
    pq2.add(12);
    System.out.println(pq2.poll());
    System.out.println(pq2.peek());
  }
}
Output:
> 3
> 3
> 4
> 6
> 10
> 11
> 12
> 10
Code:
#include <iostream>#include <queue> //priority queue is defined in the  header

using namespace std;

int main() {
  priority_queue pq; //the default pq is a Max Heap, which keeps the largest elements on top.
  pq.push(4); //insert an element
  pq.push(10);
  pq.push(3);
  cout << pq.top() << "\n"; //retrieve maximum, but doesn't remove
  pq.push(11);
  pq.push(6);
  pq.push(6); //priority queue can store duplicate elements
  cout << pq.top() << "\n";
  pq.pop(); //removes the greatest element
  pq.pop(); 
  cout << pq.top() << "\n";
  while(pq.size()) {
	  cout << pq.top() << "\n";
	  pq.pop();
  }
  priority_queue,greater> pq2; //by passing greater as a parameter, we can turn the max heap into a min heap
  pq2.push(10);
  pq2.push(8);
  pq2.push(12);
  cout << pq2.top() << "\n"; //retrieve minimum but doesn't remove, because this is a min heap
  pq2.pop(); //removes smallest element, since this is a min heap
  cout << pq2.top() << "\n";
}
Output:
> 10
> 11
> 6
> 6
> 6
> 4
> 3
> 8
> 10
Another nonlinear data structure that is represented as a binary tree is the set. You may have heard of the underlying data structure before, a balanced binary search tree. In simple terms, sets are data structures that store a sorted list of all the elements without duplicates, allowing for quick insertions and deletions. Insertions, deletions, and lookups can be done in time logarithmic to the size of the set. In the real world, sets are perfect for storing dynamic sorted data, or sorted data that is constantly changing. For example, a set (binary search tree) is used when rendering 3D data for many video games, such as Doom, a 1993 FPS. Moreover, many variations of the binary search tree see use in the real world too, we encourage you to check out these other tree-based data structures! Below is the code showing the functionalities of a set. However, if you are a python user, be careful! The sorted set is not available in python; you must import an external library to use it. We have code utilizing the hashset instead for python, which has on average constant insertion and deletion, but does not sort the elements. C++ and Java also have the hashset available, being std::unordered_set in C++ and HashSet in Java. HashSets are not tree-based data structures.
Code:
st = set([1,4,2,9,12,15,4]) #there will only be one 4 in this since set does not support duplicate valuesst.add(15) #insert in O(1)
st.remove(2) #removal in O(1)
if 9 in st: #check if element exists in set
  print(True)
else:
  print(False)
if 13 in st:
  print(True)
else:
  print(False)
for x in st: #iterate through all elements, but not in sorted order because the python set is unsorted
  print(x)
Output:
> True
> False
> 1
> 4
> 9
> 12
> 15
Code:
import java.util.TreeSet; //we must import this library to use set

public class Main {
  public static void main(String[] args) {
    TreeSet st = new TreeSet();
    st.add(1); //adds an element
    st.add(4);
    st.add(2);
    st.add(9);
    st.add(12);
    st.add(5);
    st.add(15);
    st.add(4); //java sets can't hold dupes, so this won't do anything
    st.remove(2); //removes an element
    System.out.println(st.contains(9)); //checks if 9 exists
    System.out.println(st.contains(13));
    System.out.println(st.first()); //prints smallest element
    System.out.println(st.last()); //prints greatest element;
    for(int x:st) { //iterate through all elements in sorted order
      System.out.println(x); 
    }
    System.out.println(st.ceiling(11)); //prints smallest element greater than or equal to 11
    System.out.println(st.higher(5)); //prints smallest element strictly greater than 5
  }
}
Output:
> true
> false
> 1
> 15
> 1
> 4
> 5
> 9
> 12
> 15
> 12
> 9
Code:
#include <iostream>#include <set> //imports the set header so we can use it

using namespace std;

int main() {
  set st = {1,4,2,9,12,5}; //this is our set
  st.insert(15); //inserts an element
  st.insert(4); //this will not change our set because sets do not support dupes
  st.erase(2); //removes an element
  if(st.find(9)!=st.end()) { //checks if element exists in set
    cout << "found 9\n";
  } else {
    cout << "did not find 9\n";
  }
  if(st.find(13)!=st.end()) {
    cout << "found 13\n";
  } else {
    cout << "did not find 13\n";
  }
  cout << *st.begin() << "\n"; //prints smallest element
  cout << *--st.end() << "\n"; //prints largest element
  for(auto x:st) { //we can loop through all elements in sorted order like this
    cout << x << "\n";
  }
  cout << *st.lower_bound(11) << "\n"; //this retrieves the smallest element greater than or equal to 11
  cout << *st.upper_bound(5) << "\n"; //this retrieves the smallest element greater than 5
}
Output:
> found 9
> did not find 13
> 1
> 15
> 1
> 4
> 5
> 9
> 12
> 15
> 12
> 9
Let’s take a look at another tree-based data structure, the map. In practice, a map works exactly like an array: You have indices and values that correspond to each index. However, a map truly shines when you wish to use non-integer indices. Let’s take a look at an example, the DNS Protocol, which uses a variation of the map known as the trie. The DNS, or Domain Naming System, translates website URLs that a human can understand, like google.com, into IP addresses, which a machine can read. In this case, the website URL is our index, otherwise known as a key. The IP address itself is the value. You may not have different values map to the same key.The standard map is built using the same framework as the set, a balanced binary search tree. The one difference is that a map sorts its elements by the key only, and pairs each key with the corresponding value. Take a look at how to use the map below. Once again, the sorted map is not available without downloading external libraries in Python. Python uses a dictionary, which is implemented as a HashMap. A HashMap is nearly identical to a map, except that it has O(1) insertions, deletions, and lookups. However, it does not sort the elements by key manually. C++ and Java can also use the HashMap, with std::unordered_map and HashMap respectively. HashMaps, like HashSets, are not implemented with trees.
Code:
mp = {} #creates a dictionary, python's best equivalent of a sorted map. you must import an external library to use a true sorted mapmp["gmail.com"] = "142.251.33.69" 
mp["google.com"] = "142.250.217.78" #creates a key value pair of "google.com" and "142.250.217.78"
print(mp["gmail.com"])
mp["youtube.com"] = "216.58.206.238"
for x in mp: #iterate through all keys, they will not be sorted, but rather in order of insertion
  print(x+": "+mp[x])
if "yahoo.com" in mp:
  print("found key")
else:
  print("did not find key")
Output:
> 142.251.33.69
> gmail.com: 142.251.33.69
> google.com: 142.250.217.78
> youtube.com: 216.58.206.238
> did not find key
Code:
import java.util.TreeMap;//we must import this library to use map

public class Main {
  public static void main(String[] args) {
    TreeMap mp = new TreeMap(); //creates a map with string keys and string values
    mp.put("gmail.com","142.251.33.69");
    mp.put("google.com","142.250.217.78"); //insert key value pair of "google.com","142.250.217.78"
    System.out.println(mp.get("gmail.com")); //prints the value of the key "gmail.com"
    mp.put("youtube.com","216.58.206.238");
    for(String x:mp.keySet()) { //loop through all the keys in sorted order, x is the current key
      System.out.println(x+": "+mp.get(x));
    }
    System.out.println(mp.containsKey("yahoo.com")); //checks whether "yahoo.com" is a key in the map
  }
}
Output:
> 142.251.33.69
> gmail.com: 142.251.33.69
> google.com: 142.250.217.78
> youtube.com: 216.58.206.238
> false
Code:
#include <iostream>#include <map> //imports map library

using namespace std;

int main() {
  map mp; //map with string as key and string as value
  mp["gmail.com"] = "142.251.33.69";
  mp["google.com"] = "142.250.217.78"; //creates a new element in the map, with "google.com" as the key and "142.250.217.78" as the value
  cout << mp["gmail.com"] << "\n"; //this will print the value associated with the key "gmail.com"
  mp["youtube.com"] = "216.58.206.238";
  for(auto x:mp) { //iterating through a map in sorted order(by key)
    cout << x.first << ": "; //x.first is the key of our key-value pair
    cout << x.second << "\n"; //x.second here is the value
  }
  if(mp.find("yahoo.com")!=mp.end()) { //checks if this key is in the map
    cout << "found key\n";
  } else {
    cout << "did not find key\n";
  }
}
Output:
> 142.251.33.69
> gmail.com: 142.251.33.69
> google.com: 142.250.217.78
> youtube.com: 216.58.206.238
> did not find key
You can play around with all the code we've used in this article on Replit:
Copyright ©2023 Howard County Hour of Code