Okay, running a query on a dataset with more than 42 Mrows in less than 4 seconds is impressive. How long would it take using the pure Python 'in' operator?
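The original timing cell is not reproduced here, so the following is only a minimal sketch of what a pure Python baseline could look like. The list `ngrams` and the helper `count_matches` are hypothetical stand-ins, and the data is tiny compared with the 42 Mrow dataset, so the timing it prints is not comparable to the figures quoted in this notebook.

```python
import timeit

# Hypothetical stand-in data (assumption): the real dataset has over 42 Mrows.
ngrams = ["Einstein", "relativity theory", "Albert Einstein", "quantum"] * 25_000

def count_matches(rows, needle):
    """Count the rows that contain needle, using the pure Python 'in' operator."""
    return sum(1 for row in rows if needle in row)

# Average the wall-clock time over a few passes to smooth out noise.
passes = 5
elapsed = timeit.timeit(lambda: count_matches(ngrams, "Einstein"), number=passes) / passes
matches = count_matches(ngrams, "Einstein")
print(f"{matches} matches in {len(ngrams)} rows, {elapsed:.4f} s per pass")
```

Every row goes through the interpreter one at a time here, which is the per-row overhead a compiled substring test avoids.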
As can be seen, the new 'contains' can be more than 10 times faster than a regular 'in' operator.

Of course, more complex queries can be done. For example, let's see how many mentions of Einstein before the year 1920 there are in our subset of the Google books:
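The query cell itself is missing from this copy. As a rough illustration of the logic only (a substring match combined with a condition on the year), here is a pure Python sketch. The row layout `(ngram, year, book_count)`, the sample values, and the helper `mentions_before` are all assumptions made for illustration, not the notebook's actual data or API.

```python
# Hypothetical rows (assumption): (ngram, year, book_count), values made up.
rows = [
    ("Einstein", 1905, 5),
    ("Einstein's theory", 1919, 12),
    ("Albert Einstein", 1921, 40),
    ("Newton", 1905, 30),
]

def mentions_before(rows, needle, year_limit):
    """Return the rows whose ngram contains needle and whose year is before year_limit."""
    return [row for row in rows if needle in row[0] and row[1] < year_limit]

early = mentions_before(rows, "Einstein", 1920)
for ngram, year, books in early:
    print(year, ngram, books)
```

In the notebook the same two conditions would be combined in a single query expression, so that both are evaluated in one pass over the data.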
Ok, a few mentions of Einstein in Google books, including 5 books as early as 1905. Presumably some of these refer to his seminal articles from 1905. And taking 4 seconds (at ~10 Mrow/s) for the new, more complex query is pretty good too.
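The ~10 Mrow/s figure follows directly from the numbers above. A quick check, taking the row count as exactly 42 million and the elapsed time as exactly 4 seconds (both are rounded values from the text, so the result is approximate):

```python
rows_scanned = 42e6   # "more than 42 Mrows", rounded down (assumption)
elapsed_s = 4.0       # approximate wall-clock time quoted in the text

throughput_mrows = rows_scanned / elapsed_s / 1e6
print(f"{throughput_mrows:.1f} Mrow/s")
```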