Topic: How can I filter a dataframe based on comparison against every item in another array?

I have the basic names dataframe.

I want to filter it based on whether or not the 'names' column in each row starts with any of the items in another array.

How would I go about this? I'm trying something like the line below, but it sends the whole series of names into my is_in_prefixes function, which seems to defeat the purpose of the lambda function. Maybe I don't understand what lambdas are supposed to be used for.

target_names = names.apply(lambda x: is_in_prefixes(x.name))

In [1]:

from pandas import DataFrame

In [2]:

# make a test DataFrame

composers = DataFrame([{'name': 'Bach', 'date of birth':1685 },
                       {'name': 'Hildegard of Bingen', 'date of birth':1098},
                       {'name': 'Mozart', 'date of birth':1756},
                       {'name': 'Beethoven', 'date of birth':1770},
                       {'name': 'Shaw', 'date of birth': 1982}],
                      columns=['name', 'date of birth'])

# make a list with desired prefixes
desired_prefixes = ['Ba', 'S', "Mo"]

composers.name

Out[2]:

0                   Bach
1    Hildegard of Bingen
2                 Mozart
3              Beethoven
4                   Shaw
Name: name, dtype: object

In [3]:

# one solution with regular expressions

import re

# use of alternation: http://docs.python.org/2/howto/regex.html
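# make a regexp out of list of desired prefixes
# the alternatives are wrapped in a non-capturing group so that "^" anchors
# every prefix, not only the first one; re.escape guards against prefixes
# that contain regexp metacharacters

desire_prefix_regexp = "^(?:" + "|".join(re.escape(p) for p in desired_prefixes) + ")"
print(desire_prefix_regexp)

composers[composers.name.apply(lambda s: re.search(desire_prefix_regexp, s) is not None)]

^(?:Ba|S|Mo)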

Out[3]:

	name	date of birth
0	Bach	1685
2	Mozart	1756
4	Shaw	1982

3 rows × 2 columns
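
A possible alternative without regular expressions, offered as a minimal sketch rather than the one right way: Python's built-in str.startswith accepts a tuple of prefixes, so the check can be done per element. It also speaks to the confusion in the question. Series.apply hands the lambda one value at a time, whereas DataFrame.apply hands it a whole column (or row) as a Series. That would explain the behaviour described, assuming names in the question is a DataFrame. The variable names prefixes, starts_with_prefix and target are illustrative only.

```python
import pandas as pd

# Same test data as in the notebook above
composers = pd.DataFrame({
    "name": ["Bach", "Hildegard of Bingen", "Mozart", "Beethoven", "Shaw"],
    "date of birth": [1685, 1098, 1756, 1770, 1982],
})

# str.startswith needs a tuple (not a list) when given several prefixes
prefixes = tuple(["Ba", "S", "Mo"])

# Apply on the Series (one column), so the lambda sees one string per call
starts_with_prefix = composers["name"].apply(lambda s: s.startswith(prefixes))

# Use the boolean Series as a row mask
target = composers[starts_with_prefix]
print(target)
```

The mask is built from the single 'name' column, so each call to the lambda receives a plain string and the prefix test stays an ordinary string operation.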
