Categories
Computer Science

String Hashing

String hashing is a technique for converting a string into a hash number. Why do we need that? Sometimes we need to compare large strings, but instead comparing the string we can compare the generated hash.

Imagine that you had to compare the next two string:

const A = "imagine that this is a string of 200 characters"

const B = "imagine that this is a string of 200 characterz"

Both strings have 200 characters. A brute force way would be to compare each character and see if they match. Something like:

const A = "imagine that this is a string of 200 characters"

const B = "imagine that this is a string of 200 characterz"

function equal(A, B) {
  let i;
  for(i = 0; i < A.length; i++){
    if (A[i] !== B[i]) {
      return false;
    }
  }
  return true;
}

console.log(equal(A,B));

This isn’t optimal with big strings because of it’s O(min(A, B)) performance.

Of course we could make this O(n) by adding a condition that compares A size and B size. Something like:

function equal(A, B) {
  if (A.lenght !== B.length) return false;
  let i;
  for(i = 0; i < A.length; i++){
    if (A[i] !== B[i]) {
      return false;
    }
  }
  return true;
}

As I said, worst scenario would be O(n), but imagine that we had to compare a really large string.

String Hashing is a technique where we can convert our strings into an Integer which will act as a hash number. Therefore, we will be comparing two hashes hash(A) == hash(B) this will be considered O(1). That’s the best option we can have for comparing two strings.

String Hash Formula

Where p and m are prime numbers, and s[0], s[1], s[2]... are values for each character, in this case the char code.

p: It is a prime number roughly equal to number of different characters used. For example if we are going to calculate the hash for a word that includes only lower case characters of the English alphabet, 31 would be a good number. However, if we will include upper case characters, then 53 is an appropriate option.

m: The bigger this number is, the less chance we have to collide two random strings. this variable should be also a prime number. 10 ^9 + 9 is a common choice.

Lets use this data:

p = 31
m = 1e9 + 9
word = 'apple'

We want to know what’s the hash code for the word apple, so if we add our data to our formula, we would have something like this:

then

then we get the char code:

"a" = 97
"p" = 112
"p" = 112
"l" = 108
"e" = 101

and replace this in the formula:

then we reduce the formula:

Finally

So our final hash for the word apple would be 96604250

I’ll leave you the implementation on JavaScript

function hash(word) {
    var p = 31;
    var m = 1e9 + 9;
    var hash_value = 0;
    for(var i = 0; i < word.length; i++) {
        var letter = word[i];
        var charCode = letter.charCodeAt();
        hash_value = hash_value + (charCode * Math.pow(p, i))
    }
    return hash_value % m;
}

console.log(hash("apple"));

ping me on my twitter @jorgechavz and send me your own implementation.

Categories
Computer Science

Everything you need to know about Binary Search algorithm

This is probably one of the most useful search algorithms you will use when solving computer science problems. I’ll try to explain this search method the best I can.

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array.

Wikipedia

When searching in an array with Binary Search we need to make sure that the array is sorted, you will see why in a moment.

Linear search

Let’s create a really simple scenario, let’s pretend I give you this array:

Then I ask you to find if this array has the number 90. A linear search approach would loop through the array and check if the current item is indeed our target value, in this case the number 90. JavaScript code would look alike this:

const items = [2, 9, 13, 53, 89, 90, 98];

const findElement = search => {
   for(let i = 0; i < items.length; i++) {
      if (items[i] == search) {
         return true;
      }
   }
   return false;
}

console.log(findElement(90));

Worst case scenario would be O(N) because if the element you want to find is the last one, you will loop on the entire array.

This isn’t optimal for a sorted array, binary search time complexity is O(Log N) which is normally a better option when we threat with large data sets.

Binary search approach

For this algorithm we will need 3 pointers:

  • L: Left pointer
  • R: Right pointer
  • M: Middle pointer

Graphically this would look like this:

The algorithm goes like this:

  • Start L pointing to the first element of the array.
  • Start R pointing to the last item.
  • Start a while loop until L <= R.
  • Everything inside this loop will be repeated each time:
    • Start M pointing to the greatest middle value (floor) between L and R.
    • Check if the element in M position is the one you a looking for. If so, return the value.
    • At this point we need to discard one half of the array, basically chop it from the M position. We can decide which side by either converting L = M + 1 or R = M - 1. This will depends on if the element is being pointed by M is greater or lower than our target value.
  • If L <= R the loop will end and it will mean that our target value is not in the array.

Code:

const items = [2, 9, 13, 53, 89, 90, 98];

const searchBinary = (elements, search) => {
  let L = 0;
  const n = elements.length;
  let R = n - 1;

  while(L <= R) {
    let M = Math.floor((L + R) / 2);
    if (elements[M] === search) return M;
    if (elements[M] < search) {
       L = M + 1;
    } else {
      R = M - 1;
    }
  }
  return -1;
}

console.log(searchBinary(items, 89));

Here is a little animation of how this algorithm would work:

Binary Search Animation

Special cases

What if there are duplicated values in our array? The previous approach will still works. However, it will only respond to the question “Is there any 90 in the array?” but what if you need the leftmost value, or the rightmost value. See the next example so you can visualize the issue with our first approach:

Leftmost

See how in the next animation, M starts in target value, but with this approach we will need to wait until L and R crosses, so it doesn’t matter if M is already in one of the target values..

Let’s tweak a little bit our current code so we can find the leftmost element.

const items = [2, 9, 13, 53, 89, 90, 90, 98, 100, 102];

const binarySearchLeftMost = (elements, search) => {
  let L = 0;
  const n = elements.length;
  let R = n;

  while (L < R) {
    M = Math.floor((L + R) / 2);
    if (elements[M] < search) {
      L = M + 1;
    } else {
      R = M;
    }
  }
  return L;
}

console.log(binarySearchLeftMost(items, 90));

Rightmost

Very similar to leftmost, we need need to chop all the right duplicated values right away, then start moving L closer and closer to R.

const items = [2, 9, 13, 53, 89, 90, 90, 98, 100, 102];

const binarySearchRightMost = (elements, search) => {
  let L = 0;
  const n = elements.length;
  let R = n;
  while (L < R) {
    M = Math.floor((L + R) / 2);
    if (elements[M] > search) {
      R = M;
    } else {
      L = M + 1;
    }
  }
  return L;
}

console.log(binarySearchRightMost(items, 90));

Conclusion

Binary search is a great algorithm for searching through a sorted array, we can either verify if an element is within the array, find the first occurrence or the last occurrence.

I hope you enjoyed and learned from this post, if you have any question let ping me on my twitter @jorgechavz