Topic: How can I filter a dataframe based on comparison against every item in another array?
I have the basic names dataframe.
I want to filter it based on whether or not the 'names' column in each row starts with any of the items in another array.
How would I go about this? I'm trying something like this, but it sends the whole series of names into my is_in_prefixes function, which seems to defeat the purpose of the lambda function. Maybe I don't understand what lambdas are supposed to be used for.
target_names = names.apply(lambda x: is_in_prefixes(x.name))
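A note on why the whole series arrives in the function: DataFrame.apply with its default axis=0 calls the function once per column and passes that column as a Series, and x.name on a Series is the Series label rather than the value in a 'name' column. Below is a minimal sketch of applying the check per value instead. The contents of names, the prefixes tuple, and the body of is_in_prefixes are assumptions, since the question does not show them.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data and helper function.
names = pd.DataFrame({"name": ["Bach", "Mozart", "Beethoven"]})
prefixes = ("Ba", "Mo")

def is_in_prefixes(value):
    # str.startswith accepts a tuple and is True if any prefix matches
    return value.startswith(prefixes)

# With the default axis=0, the lambda receives each column as a Series
seen = names.apply(lambda col: type(col).__name__)

# Selecting the column first makes apply call the function once per value
mask = names["name"].apply(is_in_prefixes)
target_names = names[mask]
```

If the check needs several columns, names.apply(lambda row: is_in_prefixes(row["name"]), axis=1) passes one row at a time; note the bracket access, because row.name would be the row's index label.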
from pandas import DataFrame
# make a test DataFrame
composers = DataFrame([{'name': 'Bach', 'date of birth': 1685},
                       {'name': 'Hildegard of Bingen', 'date of birth': 1098},
                       {'name': 'Mozart', 'date of birth': 1756},
                       {'name': 'Beethoven', 'date of birth': 1770},
                       {'name': 'Shaw', 'date of birth': 1982}],
                      columns=['name', 'date of birth'])
# make a list with desired prefixes
desired_prefixes = ['Ba', 'S', "Mo"]
composers.name
0                   Bach
1    Hildegard of Bingen
2                 Mozart
3              Beethoven
4                   Shaw
Name: name, dtype: object
# one solution with regular expressions
import re
# use of alternation: http://docs.python.org/2/howto/regex.html
# make a regexp out of list of desired prefixes
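# group the alternatives so that "^" anchors every prefix, not only the first,
# and escape each prefix in case it contains regex metacharacters
desire_prefix_regexp = "^(?:" + "|".join(re.escape(p) for p in desired_prefixes) + ")"
print(desire_prefix_regexp)
composers[composers.name.apply(lambda s: re.search(desire_prefix_regexp, s) is not None)]
^(?:Ba|S|Mo)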
 | name | date of birth |
---|---|---|
0 | Bach | 1685 |
2 | Mozart | 1756 |
4 | Shaw | 1982 |
3 rows × 2 columns
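As an alternative to the regular expression, Python's built-in str.startswith accepts a tuple of prefixes, so the same filter can be written without re. This is a sketch using the same test data as above; the names mask and filtered are introduced here for illustration only.

```python
import pandas as pd

# Same test data as in the answer above
composers = pd.DataFrame(
    {"name": ["Bach", "Hildegard of Bingen", "Mozart", "Beethoven", "Shaw"],
     "date of birth": [1685, 1098, 1756, 1770, 1982]}
)
desired_prefixes = ["Ba", "S", "Mo"]

# str.startswith requires a tuple (not a list) when given several prefixes
prefix_tuple = tuple(desired_prefixes)

# Build a boolean mask, one entry per name, then index the DataFrame with it
mask = composers["name"].map(lambda s: s.startswith(prefix_tuple))
filtered = composers[mask]
print(filtered)
```

Because the prefixes are compared literally, no escaping is needed if a prefix happens to contain characters such as "." or "(".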