Binary Search Tree in Pharo

Second phase of GSoC 2019 coding period on Pharo project is about implementing tree data structures. This week I’ve been working on implementing Binary Search Tree.

Consulting with mentors we have decided to start implementing trees as composite data structure also known as rooted trees. This means that a tree consists of nodes and each node has a key field and references to child nodes. Also we were talking about having the same interface for all different trees. The idea is to just start implementing them and in time the common methods will crystallize into an interface.

Binary Tree

The journey has begun with Binary Tree. A simple BinaryTree class:

Object subclass: #BinaryTree
	instanceVariableNames: 'root'
	classVariableNames: ''
	package: 'Containers-BinaryTree'

that holds reference to its root node. Node class was represented as:

Object subclass: #BinaryNode
	instanceVariableNames: 'key leftChild rightChild'
	classVariableNames: ''
	package: 'Containers-BinaryTree'

with getters and setters for each field and a method that checks if the tree is empty (if the root is nil).

Binary tree has two basic structural characteristics:

Depth

Depth of a binary tree can be calculated as length of the longest path from a node to the root. For each node in a binary tree there is exactly one path from a node to the root.

If a tree has only one node its depth is 0 and if tree has no nodes (is empty) its depth is -1.

BinaryTree >> depth
	"Depth of a binary tree is length of longest path from an arbirary node to a root. It the three is empty the depth is -1. If a three only has a root element its depth is 1 "
	self isEmpty 
	ifTrue: [ ^-1 ].
	^self depth: self root.
depth:aNode
	"Returns depth of a tree starting from the given node"
	| leftDepth rightDepth |
	leftDepth := -1.
	aNode leftChild isNotNil 
	ifTrue: [ leftDepth := self depth: aNode leftChild ].
	rightDepth := -1.
	aNode rightChild isNotNil 
	ifTrue: [ rightDepth := self depth: aNode rightChild ].
	
	( leftDepth > rightDepth )
	ifTrue: [ ^ (1 + leftDepth) ]
	ifFalse: [^ (1 + rightDepth ) ].

Size

Size of a tree represents number of nodes in a tree. An empty tree has a size of 0.

size
	"Returns size of a tree - number of nodes in a tree"
	self root isNil 
	ifTrue: [ ^0 ].
	^self size: self root.
size:aNode
	"returns size of a tree starting from a given node"
	| leftSize rightSize |
	leftSize := 0.
	rightSize := 0.
	aNode leftChild isNotNil 
	ifTrue: [ leftSize := (self size: (aNode leftChild)) ].
	aNode rightChild isNotNil
	ifTrue: [ rightSize := (self size: (aNode rightChild ) )].
	^ 1 + leftSize + rightSize.

Searching for an element in a tree

Two of the most common search algorithms used in trees are:

BFS

bfs: info
	"Breadth fisrt search for an information. If there is no key containing given infomration returns nil. If the info is found returns a node containing it"
	| queue |
	self isEmpty 
	ifTrue: [ ^nil ].
	queue := LinkedList new.
	queue addLast: self root.
	
	[ queue isNotEmpty ] whileTrue: [ 
		| node |
		node := queue removeFirst.
		node key == info  
		ifTrue: [ ^node ].
		node leftChild isNotNil 
		ifTrue: [ queue addLast: node leftChild ].
		node rightChild isNotNil
		ifTrue: [ queue addLast: node rightChild ].
		 ].
		^nil.

DFS

dfs: info
	"Depth fisrt search for an information. If there is no key containing given infomration returns nil. If the info is found returns a node containing it"
	self isEmpty 
	ifTrue: [ ^nil ].
	^self dfs: info node: self root.
dfs: info node: aNode
	| node |
	aNode key == info
		ifTrue: [ ^ aNode ].
		
	aNode leftChild isNotNil
		ifFalse: [ ^ nil ].
	node := self dfs: info node: aNode leftChild.
	node isNotNil
		ifTrue: [ ^ node ].
		
	aNode rightChild isNotNil
		ifFalse: [ ^ nil ].
	node := self dfs: info node: aNode rightChild.
	node isNotNil
		ifTrue: [ ^ node ].

Further discussion with mentors lead to turning Binary Tree into a Binary Search Tree.

Binary Search Tree

Binary search trees are a type of binary trees with special properties:

Binary Search Tree

Key methods to be implemented are:

Adding new element

add:aKey
	"Add a node with passed key to a binary search tree"
	self isEmpty
	ifTrue: [^(self root: (self add: aKey node: self root ))].
	 ^ self add: aKey node: self root
add: aKey node: aNode
	"Add a node with passed key to a binary search tree"

	aNode isNil
		ifTrue: [ ^ (BinaryNode new key: aKey) ].
	aKey < (aNode key)
		ifTrue: [ ^(aNode leftChild: (self add: aKey node: aNode leftChild)) ].
	aKey > (aNode key)
		ifTrue: [ ^(aNode rightChild: (self add: aKey node: aNode rightChild))].
	aKey == (aNode key)
		ifTrue: [self error: 'key already exists'].

Next few methods to be implemented:

Testing

It is important for all code to be tested and as much use cases as possible. Most of these methods were tested for 3 cases:

Binary Search Trees are specific and the fact that keys are unique should also be tested.

What I learned is that writing methods comments usually helps in determining which use cases should be tested.

Next steps

Execution time and memory should be measured to check if this implementation is optimal for Pharo.

After Binary Trees different types of balancing trees need to be implemented.

Image reference:

Image presented in this post is taken from Wikipedia. Links is provided in-text.