JavaScript one liner extracting unique words from webpages

Unlike the JSON parsing with ruby one-liner, this is a raw JavaScript one-liner that /me managed to frame last weekend.

So what's the deal?

Extracting all the words in a given webpage, which later can be used to parse or search!

The whole idea is to collect all the words in the page and create a set() out of it.

So here is the one-liner for extracting unique words from any page:

var words = document.body.textContent.split(/\s+/).sort().filter( function(v,i,o){return v!==o[i-1];});

Let's break it down!

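document.body.innerText || document.body.textContent gives us the text (a string) present in the body of the current document. The one-liner above uses textContent alone; the || form simply prefers innerText where the browser provides it.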

split(/\s+/) splits the text collected above into an array of strings, cutting it at every run of whitespace (spaces, tabs, newlines).
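One thing worth knowing about this step: page text usually starts or ends with whitespace, and split() then leaves empty strings at the edges. A small sketch of that, using a made-up sample string instead of a real page (the names text, raw and cleaned are only for illustration):

```javascript
// Hypothetical sample; in a browser this would be document.body.textContent.
const text = "  hemanth is\n testing   hemanth ";

// Leading and trailing whitespace produce empty strings at both ends.
const raw = text.split(/\s+/);
console.log(raw); // ["", "hemanth", "is", "testing", "hemanth", ""]

// filter(Boolean) drops the empty strings and keeps every real word.
const cleaned = raw.filter(Boolean);
console.log(cleaned); // ["hemanth", "is", "testing", "hemanth"]
```

Whether the empty string matters depends on what you do with the words later; dropping it is just one option.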

sort() sorts the array created above in place, so that identical words end up next to each other.
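Note that sort() without a compare function orders strings by their UTF-16 code units, so capitalised words come before lowercase ones. A tiny sketch with made-up words (sample and sorted are illustrative names):

```javascript
// Hypothetical sample words, chosen to mix upper and lower case.
const sample = ["is", "Hemanth", "hemanth", "Is"];

// slice() copies first, because sort() reorders the array in place.
const sorted = sample.slice().sort();
console.log(sorted); // ["Hemanth", "Is", "hemanth", "is"]
```

So "Hemanth" and "hemanth" are not neighbours after a default sort, which matters if you want to treat them as the same word.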

The interesting part of the one liner!

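filter( function(v,i,o){return v!==o[i-1];}); this call creates a set out of the given array, set as in remove duplicates. Because the array is sorted, equal words sit next to each other, so keeping only the elements that differ from their predecessor leaves exactly one copy of each. Working on unique words should make searching/parsing faster; if duplicates don't bother you, this step can be dropped.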

filter() basically creates a new array with all elements that pass the test implemented by the provided function. That function receives the current value (v), its index (i) and the array being filtered (o).

Let's see an example to understand function(v,i,o){return v!==o[i-1];}

> string = "hemanth is testing hemanth"
> words=string.split(/\s+/).sort()
["hemanth", "hemanth", "is", "testing"]
> words.filter( function(v,i,o){console.log(i,v,o[i-1],v!==o[i-1],o);return v!==o[i-1];});
0 "hemanth" undefined true ["hemanth", "hemanth", "is", "testing"]
1 "hemanth" "hemanth" false ["hemanth", "hemanth", "is", "testing"]
2 "is" "hemanth" true ["hemanth", "hemanth", "is", "testing"]
3 "testing" "is" true ["hemanth", "hemanth", "is", "testing"]
["hemanth", "is", "testing"]
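The same steps can be wrapped in a function so they work on any string, not only on the page body. Below is a minimal sketch; the function names uniqueWords and uniqueWordsWithSet are mine, and the Set version is an alternative that assumes a browser or runtime with ES2015 support, not part of the original one-liner:

```javascript
// Split on whitespace, drop empty strings, sort, then keep only
// words that differ from the previous one (same idea as the one-liner).
function uniqueWords(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .sort()
    .filter(function (v, i, o) {
      return v !== o[i - 1];
    });
}

// Alternative: let a Set remove the duplicates, then sort the result.
function uniqueWordsWithSet(text) {
  return Array.from(new Set(text.split(/\s+/).filter(Boolean))).sort();
}

console.log(uniqueWords("hemanth is testing hemanth"));
console.log(uniqueWordsWithSet("hemanth is testing hemanth"));
```

In a browser you would call it as uniqueWords(document.body.textContent). With the Set version the sort is only there to keep the output ordered; the de-duplication itself does not need it.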

Now that we have an array with unique elements in it, searching/parsing should be easier.

Hope this helps someone, as it's helping me ;)

Happy Hacking!

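EDIT 0: To ignore case, use 'return !i||v&&!RegExp(o[i-1],'i').test(v)' as the body of the filter function. Two caveats: RegExp(...).test(v) is a substring match, so "this" is dropped when it follows "is", and a previous word containing regex special characters can throw or mismatch. The default sort() is also case-sensitive, so "Hemanth" and "hemanth" may not end up adjacent.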
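For what it's worth, here is one way the ignore-case idea could be sketched with plain string comparison instead of a RegExp. This is my own variant, not the snippet from the EDIT; the name uniqueWordsIgnoreCase is hypothetical, and it assumes toLowerCase() is a good enough notion of "same word" for your text:

```javascript
function uniqueWordsIgnoreCase(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    // Sort by the lowercased form so case variants become neighbours.
    .sort(function (a, b) {
      const x = a.toLowerCase();
      const y = b.toLowerCase();
      return x < y ? -1 : x > y ? 1 : 0;
    })
    // Keep a word only if it differs from the previous one, ignoring case.
    .filter(function (v, i, o) {
      return i === 0 || v.toLowerCase() !== o[i - 1].toLowerCase();
    });
}

console.log(uniqueWordsIgnoreCase("Hemanth is testing hemanth IS this"));
```

Each group of case variants collapses to a single entry, and "this" survives next to "is" because whole words are compared rather than substrings.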
