Categories
Computer Science

String Hashing

String hashing is a technique for converting a string into a hash number. Why do we need that? Sometimes we need to compare large strings, but instead comparing the string we can compare the generated hash.

Imagine that you had to compare the next two string:

const A = "imagine that this is a string of 200 characters"

const B = "imagine that this is a string of 200 characterz"

Both strings have 200 characters. A brute force way would be to compare each character and see if they match. Something like:

const A = "imagine that this is a string of 200 characters"

const B = "imagine that this is a string of 200 characterz"

function equal(A, B) {
  let i;
  for(i = 0; i < A.length; i++){
    if (A[i] !== B[i]) {
      return false;
    }
  }
  return true;
}

console.log(equal(A,B));

This isn’t optimal with big strings because of it’s O(min(A, B)) performance.

Of course we could make this O(n) by adding a condition that compares A size and B size. Something like:

function equal(A, B) {
  if (A.lenght !== B.length) return false;
  let i;
  for(i = 0; i < A.length; i++){
    if (A[i] !== B[i]) {
      return false;
    }
  }
  return true;
}

As I said, worst scenario would be O(n), but imagine that we had to compare a really large string.

String Hashing is a technique where we can convert our strings into an Integer which will act as a hash number. Therefore, we will be comparing two hashes hash(A) == hash(B) this will be considered O(1). That’s the best option we can have for comparing two strings.

String Hash Formula

Where p and m are prime numbers, and s[0], s[1], s[2]... are values for each character, in this case the char code.

p: It is a prime number roughly equal to number of different characters used. For example if we are going to calculate the hash for a word that includes only lower case characters of the English alphabet, 31 would be a good number. However, if we will include upper case characters, then 53 is an appropriate option.

m: The bigger this number is, the less chance we have to collide two random strings. this variable should be also a prime number. 10 ^9 + 9 is a common choice.

Lets use this data:

p = 31
m = 1e9 + 9
word = 'apple'

We want to know what’s the hash code for the word apple, so if we add our data to our formula, we would have something like this:

then

then we get the char code:

"a" = 97
"p" = 112
"p" = 112
"l" = 108
"e" = 101

and replace this in the formula:

then we reduce the formula:

Finally

So our final hash for the word apple would be 96604250

I’ll leave you the implementation on JavaScript

function hash(word) {
    var p = 31;
    var m = 1e9 + 9;
    var hash_value = 0;
    for(var i = 0; i < word.length; i++) {
        var letter = word[i];
        var charCode = letter.charCodeAt();
        hash_value = hash_value + (charCode * Math.pow(p, i))
    }
    return hash_value % m;
}

console.log(hash("apple"));

ping me on my twitter @jorgechavz and send me your own implementation.

Categories
Computer Science

Big O Notation and Time Complexity

Sometimes, solving a problem isn’t the main issue. Problems can have many different solutions and we as developers need to fine the best solution. But what makes a solution better against others? The answer can vary. However, when we write an algorithm we will try to find the solution that solves the problem faster (time), and the one that uses less resources (memory).

Time complexity

Time complexity represents the time or memory that an algorithm takes in order to complete the solution. The less time an algorithm takes, the happier our end users will be. So, knowing how to write more efficient algorithms is crucial for having better performance in our software.

How do we go about analyzing which solutions are most efficient?

Big O Notation

We can represent time complexity with math, here is when Big O Notation come to play. This notation is just a way of analyzing how long an algorithm with a given number of inputs (n) will take to complete its task.

Here are some simple time complexities with its Big O Notation, feel free to look in Wikipedia for more in-depth definitions:

O(1) Constant time: It doesn’t matter the n number of elements in our data. It will just take a single step for complete the task. A clear example is the access of an item into an array

const fruits = ["apple", "pineapple", "orange"];

// The time for getting the last item will be
// the same than getting the first item.

console.log(fruits[0]);
console.log(fruits[2]);

O(s) Linear runtime: Everything will depends on the value of s. The bigger s is, the more time it will take to resolve the algorithm. A for loop could be an example. The loop will take s number of iterations to get out of the loop.

for(let i; i < s; i++) {
   console.log(i);
}
Representation of a linear time and a constant time.

O(s2) Quadratic time: It is really common on algorithms that involve nested iterations. Giving n size, it will take square n to complete the task.

for(let i; i < s; i++) {
  for(let j; j < s; j++) {
      console.log(`${i}, ${j}`);
  }
}

O(log n) Logarithmic time: Given an input of size n, the number of steps it takes to accomplish the task are decreased by some factor with each step. For example, a binary search could be logarithmic since with every step we reduce the possible solutions in half, so the more iterations the closest we are for completing the task.

Conclusion

Time complexity and Big O Notation will help us to understand an a more visible way if an algorithm is faster or not. Different data structures have different running time. So, understanding this topics will allow us to use data structures that can fit well for a better performance.

We won’t cover all time complexities. However, I hope this short post gives you an idea of what time complexity is and how to represent those running times with math (Big O Notation).