Notebook
Type: function String form: <function read_csv at 0x1087a0938> File: /Library/Python/2.7/site-packages/pandas-0.12.0_307_g3a2fe0b-py2.7-macosx-10.8-intel.egg/pandas/io/parsers.py Definition: pd.read_csv(filepath_or_buffer, sep=',', dialect=None, compression=None, doublequote=True, escapechar=None, quotechar='"', quoting=0, skipinitialspace=False, lineterminator=None, header='infer', index_col=None, names=None, prefix=None, skiprows=None, skipfooter=None, skip_footer=0, na_values=None, na_fvalues=None, true_values=None, false_values=None, delimiter=None, converters=None, dtype=None, usecols=None, engine='c', delim_whitespace=False, as_recarray=False, na_filter=True, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, warn_bad_lines=True, error_bad_lines=True, keep_default_na=True, thousands=None, comment=None, decimal='.', parse_dates=False, keep_date_col=False, dayfirst=False, date_parser=None, memory_map=False, nrows=None, iterator=False, chunksize=None, verbose=False, encoding=None, squeeze=False, mangle_dupe_cols=True, tupleize_cols=True) Docstring: Read CSV (comma-separated) file into DataFrame Also supports optionally iterating or breaking of the file into chunks. Parameters ---------- filepath_or_buffer : string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file ://localhost/path/to/table.csv sep : string, default ',' Delimiter to use. If sep is None, will try to automatically determine this. Regular expressions are accepted. lineterminator : string (length 1), default None Character to break file into lines. Only valid with C parser quotechar : string The character to used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored. quoting : int Controls whether quotes should be recognized. Values are taken from `csv.QUOTE_*` values. Acceptable values are 0, 1, 2, and 3 for QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONE, and QUOTE_NONNUMERIC, respectively. skipinitialspace : boolean, default False Skip spaces after delimiter escapechar : string dtype : Type name or dict of column -> type Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} compression : {'gzip', 'bz2', None}, default None For on-the-fly decompression of on-disk data dialect : string or csv.Dialect instance, default None If None defaults to Excel dialect. Ignored if sep longer than 1 char See csv.Dialect documentation for more details header : int, default 0 if names parameter not specified, Row to use for the column labels of the parsed DataFrame. Specify None if there is no header row. Can be a list of integers that specify row locations for a multi-index on the columns E.g. [0,1,3]. Interveaning rows that are not specified (E.g. 2 in this example are skipped) skiprows : list-like or integer Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file index_col : int or sequence or False, default None Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to _not_ use the first column as the index (row names) names : array-like List of column names to use. If file contains no header row, then you should explicitly pass header=None prefix : string or None (default) Prefix to add to column numbers when no header, e.g 'X' for X0, X1, ... na_values : list-like or dict, default None Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values true_values : list Values to consider as True false_values : list Values to consider as False keep_default_na : bool, default True If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they're appended to parse_dates : boolean, list of ints or names, list of lists, or dict If True -> try parsing the index. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo' keep_date_col : boolean, default False If True and parse_dates specifies combining multiple columns then keep the original columns. date_parser : function Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. dayfirst : boolean, default False DD/MM format dates, international and European format thousands : str, default None Thousands separator comment : str, default None Indicates remainder of line should not be parsed Does not support line commenting (will return empty line) decimal : str, default '.' Character to recognize as decimal point. E.g. use ',' for European data nrows : int, default None Number of rows of file to read. Useful for reading pieces of large files iterator : boolean, default False Return TextFileReader object chunksize : int, default None Return TextFileReader object for iteration skipfooter : int, default 0 Number of line at bottom of file to skip converters : dict. optional Dict of functions for converting values in certain columns. Keys can either be integers or column labels verbose : boolean, default False Indicate number of NA values placed in non-numeric columns delimiter : string, default None Alternative argument name for sep. Regular expressions are accepted. encoding : string, default None Encoding to use for UTF when reading/writing (ex. 'utf-8') squeeze : boolean, default False If the parsed data only contains one column then return a Series na_filter: boolean, default True Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file usecols : array-like Return a subset of the columns. Results in much faster parsing time and lower memory usage. mangle_dupe_cols: boolean, default True Duplicate columns will be specified as 'X.0'...'X.N', rather than 'X'...'X' tupleize_cols: boolean, default False Leave a list of tuples on columns as is (default is to convert to a Multi Index on the columns) Returns ------- result : DataFrame or TextParser