Module Csv
Read and write the CSV (comma separated values) format.
This library should be compatible with RFC 4180 if one sets ~strip:false in the creation functions.
- version: 2.4
- author: Richard Jones <rjones@redhat.com>
- author: Christophe Troestler <Christophe.Troestler@umons.ac.be>
type t = string list list
Representation of CSV data in memory. This is a list of rows (also called records), each row being a list of columns.
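As a small illustration of this representation, the sketch below builds a value of the same shape using only the standard library. The local type t is a stand-in for Csv.t, and the helper cell is hypothetical, not part of the library.

```ocaml
(* Local stand-in for Csv.t so the sketch needs no external library. *)
type t = string list list

(* Three records; rows may have different lengths. *)
let data : t = [ ["name"; "qty"]; ["apple"; "3"]; ["pear"] ]

(* Hypothetical helper: 0-based access, as with List.nth. *)
let cell (csv : t) ~row ~col = List.nth (List.nth csv row) col

let () = print_endline (cell data ~row:1 ~col:0)
```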
Input/output objects
class type in_obj_channel = object ... end
The most basic input object for best interoperability.
class type out_obj_channel = object ... end
The most basic output object for best interoperability.
Input
exception Failure of int * int * string
Failure(nrecord, nfield, msg) is raised to indicate a parsing error for the field number nfield on the record number nrecord; the description msg says what is wrong. The first record and the first field of a record are numbered 1 (to correspond to the usual spreadsheet numbering, but differing from List.nth on the OCaml representation, which counts from 0).
val of_in_obj : ?separator:char -> ?strip:bool -> ?has_header:bool -> ?header:string list -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> in_obj_channel -> in_channel
of_in_obj ?separator ?excel_tricks in_chan creates a new "channel" to access the data in CSV form available from the channel in_chan. Note that data is read on an as-needed basis by the functions next, fold_left, ... so the channel must stay open while these functions are executed. When you are done with the channel, call close_in to close it.
- parameter separator: The separator character. The default is ','. Be aware, however, that in countries where the comma is used as a decimal separator, Excel uses ';' as the separator.
- parameter strip: Whether to remove the white space around unquoted fields. The default is true for backward compatibility reasons.
- parameter has_header: Indicates that the first row of the CSV channel is to be interpreted as a header (this row will not be returned by next). This is useful when using the functions in the Rows module below. Default: false.
- parameter header: Supply the header to use for this CSV channel. If both header and has_header are given, the names of header take precedence; if a name in header is "", the one in the CSV header is used. If a name appears twice, only its first occurrence is used.
- parameter backslash_escape: Whether to allow \", \n, ... in quoted fields. This is used by MySQL, for example, but is not standard CSV, so it is set to false by default.
- parameter excel_tricks: Enables Excel tricks, namely the fact that '"' followed by '0' in a quoted string means ASCII NUL and the fact that a field of the form ="..." only returns the string inside the quotes. Default: true.
- parameter fix: Parse the CSV data without raising the exception Csv.Failure. If the data does not conform to the CSV format (e.g. because of badly escaped quotes), try to repair it. Default: false.
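To make the effect of separator and strip concrete, here is a deliberately naive sketch that uses only the standard library. The function split_simple is hypothetical; it ignores quoting entirely, so it is not a CSV parser and does not reflect how the library is implemented. The exact set of characters the library treats as white space may also differ from String.trim.

```ocaml
(* Naive illustration only: splits on the separator and ignores quoting,
   so it is NOT a CSV parser and not the library's algorithm. *)
let split_simple ?(separator = ',') ?(strip = true) line =
  let fields = String.split_on_char separator line in
  (* With strip, white space around each (unquoted) field is removed. *)
  if strip then List.map String.trim fields else fields

let () =
  (* A ';'-separated line, as Excel writes where ',' is the decimal mark. *)
  split_simple ~separator:';' "1,5 ; 2,25" |> List.iter (Printf.printf "[%s]");
  print_newline ()
```

With ~strip:false the spaces around each field are kept, which is the behaviour RFC 4180 expects.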
val of_channel : ?separator:char -> ?strip:bool -> ?has_header:bool -> ?header:string list -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> std_in_channel -> in_channel
Same as Csv.of_in_obj except that the data is read from a standard channel. Note that, because the data is read on an as-needed basis, the std_in_channel must stay open while you are reading data.
val of_string : ?separator:char -> ?strip:bool -> ?has_header:bool -> ?header:string list -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> string -> in_channel
Same as Csv.of_in_obj except that the data is read from a string.
val load : ?separator:char -> ?strip:bool -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> string -> t
load fname loads the CSV file fname. If fname is "-" then load from stdin.
- parameter separator: The separator character. The default is ','. Be aware, however, that in countries where the comma is used as a decimal separator, Excel uses ';' as the separator.
- parameter backslash_escape: Whether to allow \", \n, ... in quoted fields. This is used by MySQL, for example, but is not standard CSV, so it is set to false by default.
- parameter excel_tricks: Enables Excel tricks, namely the fact that '"' followed by '0' in a quoted string means ASCII NUL and the fact that a field of the form ="..." only returns the string inside the quotes. Default: true.
- parameter fix: Fix invalid CSV in order to parse it without raising Csv.Failure. See of_in_obj.
val load_in : ?separator:char -> ?strip:bool -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> std_in_channel -> t
load_in ch loads a CSV file from the input channel ch. See Csv.load for the meaning of separator and excel_tricks.
val to_in_obj : in_channel -> in_obj_channel
For efficiency reasons, the in_channel buffers the data from the original channel. If you want to examine the data by other means than the functions below (say after a failure), you need to use this function in order not to lose the data held in the buffer.
val close_in : in_channel -> unit
close_in ic closes the channel ic. The underlying channel is closed as well.
val next : in_channel -> string list
next ic returns the next record in the CSV file.
- raises End_of_file: if no more records can be read.
- raises Csv.Failure: if the CSV format is not respected. The partial record read is available with Csv.current_record.
val fold_left : f:('a -> string list -> 'a) -> init:'a -> in_channel -> 'a
fold_left f a ic computes (f ... (f (f a r1) r2) ... rN) where r1,...,rN are the records in the CSV file. If f raises an exception, the record available at that moment is accessible through Csv.current_record.
val fold_right : f:(string list -> 'a -> 'a) -> in_channel -> 'a -> 'a
fold_right f ic a computes (f r1 ... (f rN-1 (f rN a)) ...) where r1,...,rN-1, rN are the records in the CSV file. All records are read before applying f, so this function is not convenient if your file is large.
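The two evaluation orders can be mirrored with the standard list folds. In this sketch an in-memory list of records stands in for the channel; the names records, total_fields and first_fields are illustrative only.

```ocaml
(* An in-memory list of records standing in for the CSV channel. *)
let records = [ ["a"; "b"]; ["c"]; ["d"; "e"; "f"] ]

(* Left fold: f (f (f init r1) r2) r3, consuming records in order. *)
let total_fields =
  List.fold_left (fun acc r -> acc + List.length r) 0 records

(* Right fold: f r1 (f r2 (f r3 init)); the last record is combined first,
   which is why every record must be available before f is applied. *)
let first_fields =
  List.fold_right (fun r acc -> List.hd r :: acc) records []
```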
val iter : f:(string list -> unit) -> in_channel -> unit
iter f ic iterates f on all remaining records. If f raises an exception, the record available at that moment is accessible through Csv.current_record.
val input_all : in_channel -> t
input_all ic returns the list of the CSV records up to the end of the file.
val current_record : in_channel -> string list
The current record under examination. This is useful in order to gather the parsed data in case of Failure.
val load_rows : ?separator:char -> ?strip:bool -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?fix:bool -> (string list -> unit) -> std_in_channel -> unit
- deprecated: use Csv.iter on a Csv.in_channel created with Csv.of_channel.
Output
val to_out_obj : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> out_obj_channel -> out_channel
to_out_obj ?separator ?excel_tricks out_chan creates a new "channel" to output the data in CSV form.
- parameter separator: The separator character. The default is ','.
- parameter backslash_escape: Prefer to escape the quote character in a quoted string with a backslash (e.g. "\"") instead of doubling it. Also backslash-escape '\n', '\r', '\t', '\b', '\026' (as '\Z') and '\000' (as '\0'). This is nice for interoperability but is nonstandard CSV, so it is set to false by default.
- parameter excel_tricks: Enables Excel tricks, namely the fact that '\000' is represented as '"' followed by '0' and the fact that a field with leading or trailing spaces or a leading '0' will be encoded as ="..." (to avoid Excel "helping" you). Default: false.
- parameter quote_all: Force all fields to be quoted, even if this is not required by the CSV specification.
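The following sketch shows what "required by the CSV specification" can mean in practice, following the RFC 4180 convention of quoting a field when it contains the separator, a double quote, or a line break, and doubling embedded quotes. The helpers needs_quoting, quote and record_to_string are hypothetical and written against the standard library only; the library's own quoting rules may be broader (for instance around leading or trailing spaces).

```ocaml
(* RFC 4180-style rule (an assumption of this sketch): a field needs quotes
   if it contains the separator, a double quote, or a line break. *)
let needs_quoting ~separator s =
  String.contains s separator
  || String.contains s '"'
  || String.contains s '\n'
  || String.contains s '\r'

(* Wrap a field in quotes, doubling any embedded double quote. *)
let quote s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c -> if c = '"' then Buffer.add_string b "\"\"" else Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Render one record on a single line; quote_all quotes every field. *)
let record_to_string ?(separator = ',') ?(quote_all = false) record =
  record
  |> List.map (fun f ->
         if quote_all || needs_quoting ~separator f then quote f else f)
  |> String.concat (String.make 1 separator)
```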
val to_channel : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> std_out_channel -> out_channel
Same as Csv.to_out_obj but output to a standard channel.
val to_buffer : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> Stdlib.Buffer.t -> out_channel
Same as Csv.to_out_obj but output to a buffer.
val close_out : out_channel -> unit
close_out oc closes the channel oc. The underlying channel is closed as well.
val output_record : out_channel -> string list -> unit
output_record oc r writes the record r in CSV form to the channel oc.
val output_all : out_channel -> t -> unit
output_all oc csv outputs all records in csv to the channel oc.
val save_out : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> std_out_channel -> t -> unit
- deprecated: Save Csv.t to a channel.
val save : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> string -> t -> unit
save fname csv saves the csv data to the file fname.
val print : ?separator:char -> ?backslash_escape:bool -> ?excel_tricks:bool -> ?quote_all:bool -> t -> unit
Print the CSV data.
val print_readable : ?length:(string -> int) -> t -> unit
Print the CSV data to stdout in a human-readable format. Not much is guaranteed about how the CSV is printed, except that it will be easier to follow than a "raw" output done with Csv.print. This is a one-way operation. There is no easy way to parse the output of this command back into CSV data.
- parameter length: Function to compute the length of the column. It defaults to String.length (i.e., count in bytes) but may be replaced to handle non-ASCII encodings.
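The sketch below illustrates why a custom length function matters for alignment. Both column_widths and utf8_length are hypothetical helpers built on the standard library; they only show how column widths change with the length function and say nothing about the layout print_readable actually produces.

```ocaml
(* Hypothetical helper: width of each column under a given length function. *)
let column_widths ?(length = String.length) csv =
  let ncols = List.fold_left (fun m r -> max m (List.length r)) 0 csv in
  List.init ncols (fun j ->
      List.fold_left
        (fun w row ->
          match List.nth_opt row j with
          | Some c -> max w (length c)
          | None -> w)
        0 csv)

(* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx). *)
let utf8_length s =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* "\xc3\xa9" is the UTF-8 encoding of e-acute: 2 bytes, 1 character. *)
let data = [ ["\xc3\xa9\xc3\xa9"; "b"]; ["abc"] ]

let widths_in_bytes = column_widths data
let widths_in_chars = column_widths ~length:utf8_length data
```

Counting bytes makes the first column four wide, while counting code points makes it three wide, which is what a reader sees on screen.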
val save_out_readable : std_out_channel -> ?length:(string -> int) -> t -> unit
Same as Csv.print_readable, allowing the output to be sent to a channel.
Functions to access rows when a header is present
module Row : sig ... end
A row with a header.
module Rows : sig ... end
Accessing rows (when a header was provided).
Functions acting on CSV data loaded in memory
val lines : t -> int
Return the number of lines in the CSV data.
val columns : t -> int
Work out the (maximum) number of columns in a CSV file. Note that each line may be a different length, so this finds the one with the most columns.
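On the in-memory representation, these two measures can be sketched with the standard library as follows. The definitions are local stand-ins written for illustration, not the library's code.

```ocaml
(* Local stand-ins for Csv.lines and Csv.columns. *)
let lines csv = List.length csv

(* The widest row determines the column count; an empty CSV has 0 columns. *)
let columns csv =
  List.fold_left (fun m row -> max m (List.length row)) 0 csv

let ragged = [ ["a"; "b"]; ["c"]; ["d"; "e"; "f"] ]
```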
val trim : ?top:bool -> ?left:bool -> ?right:bool -> ?bottom:bool -> t -> t
This takes a CSV file and trims empty cells.
All four of the optional arguments (~top, ~left, ~right, ~bottom) default to true.
The exact behaviour is:
- ~right: If true, remove any empty cells at the right hand end of any row. The number of columns in the resulting CSV structure will not necessarily be the same for each row.
- ~top: If true, remove any empty rows (no cells, or containing just empty cells) from the top of the CSV structure.
- ~bottom: If true, remove any empty rows from the bottom of the CSV structure.
- ~left: If true, remove any empty columns from the left of the CSV structure. Note that ~left and ~right are quite different: ~left considers the whole CSV structure, whereas ~right considers each row in isolation.
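Three of these rules are easy to sketch on plain lists. The functions trim_right, trim_top and trim_bottom below are hypothetical, standard-library-only illustrations of the behaviour described above (the ~left rule, which must look at whole columns, is left out); they are not the library's implementation.

```ocaml
(* A row counts as empty if it has no cells or only empty cells. *)
let is_blank_row row = List.for_all (fun c -> c = "") row

(* ~right: drop trailing empty cells, one row at a time. *)
let trim_right csv =
  let strip row =
    let rec drop = function "" :: tl -> drop tl | l -> l in
    List.rev (drop (List.rev row))
  in
  List.map strip csv

(* ~top: drop leading rows that are empty. *)
let rec trim_top = function
  | row :: tl when is_blank_row row -> trim_top tl
  | csv -> csv

(* ~bottom: the same rule applied from the other end. *)
let trim_bottom csv = List.rev (trim_top (List.rev csv))

let csv = [ [""; ""]; ["a"; ""; ""]; ["b"; "c"]; [] ]
```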
val square : t -> t
Make the CSV data "square" (actually rectangular). This pads out each row with empty cells so that all rows are the same length as the longest row. After this operation, every row will have length Csv.columns.
val is_square : t -> bool
Return true iff the CSV is "square" (actually rectangular). This means that each row has the same number of cells.
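A minimal sketch of squaring and of the squareness test, assuming only the standard library. The definitions are local stand-ins named after the functions they illustrate, not the library's code.

```ocaml
(* Width of the longest row. *)
let columns csv = List.fold_left (fun m r -> max m (List.length r)) 0 csv

(* Pad every row with empty cells up to the width of the longest row. *)
let square csv =
  let width = columns csv in
  let pad row = row @ List.init (width - List.length row) (fun _ -> "") in
  List.map pad csv

(* Every row must have the same number of cells as the first one. *)
let is_square = function
  | [] -> true
  | first :: rest ->
    let n = List.length first in
    List.for_all (fun row -> List.length row = n) rest

let ragged = [ ["a"]; ["b"; "c"] ]
```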
val set_columns : cols:int -> t -> t
set_columns cols csv makes the CSV data square by forcing the width to the given number of cols. Any short rows are padded with blank cells. Any long rows are truncated.
val set_rows : rows:int -> t -> t
set_rows rows csv makes the CSV data have exactly rows rows by adding empty rows or truncating rows as necessary.
Note that set_rows does not make the CSV square. If you want it to be square, call either Csv.square or Csv.set_columns after.
val set_size : rows:int -> cols:int -> t -> t
set_size rows cols csv makes the CSV data square by forcing the size to rows * cols, adding blank cells or truncating as necessary. It is the same as calling set_columns cols (set_rows rows csv).
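The relationship between the three resizing functions can be sketched with one generic helper. Here resize is a hypothetical function, and set_columns, set_rows and set_size are local stand-ins built on it with the standard library; treating an added row as the empty list is an assumption of this sketch.

```ocaml
(* Hypothetical helper: force a list to length n, truncating or padding
   with [fill] as needed. *)
let rec resize n fill = function
  | _ when n <= 0 -> []
  | [] -> fill :: resize (n - 1) fill []
  | x :: tl -> x :: resize (n - 1) fill tl

(* Every row is forced to [cols] cells. *)
let set_columns ~cols csv = List.map (resize cols "") csv

(* Rows are added (as empty rows) or dropped; widths are left untouched. *)
let set_rows ~rows csv = resize rows [] csv

(* Composition stated in the description of set_size. *)
let set_size ~rows ~cols csv = set_columns ~cols (set_rows ~rows csv)
```

Because set_rows alone may add rows with no cells, the result is only rectangular once set_columns (or a squaring step) has been applied.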
val sub : r:int -> c:int -> rows:int -> cols:int -> t -> t
sub r c rows cols csv returns a subset of csv. The subset is defined as having top left corner at row r, column c (counting from 0) and being rows deep and cols wide.
The returned CSV will be "square".
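A sketch of the 0-based windowing, using only the standard library. The helpers drop and take_pad are hypothetical, and sub here is a local stand-in. Padding with empty cells when the window reaches past the data is an assumption made to keep the result rectangular; the library may behave differently in that case.

```ocaml
(* Skip the first n elements of a list. *)
let rec drop n l =
  if n <= 0 then l else match l with [] -> [] | _ :: tl -> drop (n - 1) tl

(* Take exactly n elements, padding with [fill] if the list is too short
   (the padding is an assumption of this sketch). *)
let rec take_pad n fill = function
  | _ when n <= 0 -> []
  | [] -> fill :: take_pad (n - 1) fill []
  | x :: tl -> x :: take_pad (n - 1) fill tl

(* Window with top left corner (r, c), [rows] deep and [cols] wide. *)
let sub ~r ~c ~rows ~cols csv =
  drop r csv
  |> take_pad rows []
  |> List.map (fun row -> take_pad cols "" (drop c row))

let grid = [ ["a"; "b"; "c"]; ["d"; "e"; "f"]; ["g"; "h"; "i"] ]
```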
val compare : t -> t -> int
Compare two CSV files for equality, ignoring blank cells at the end of a row, and empty rows appended to one or the other. This is "semantic" equality - roughly speaking, the two CSV files would look the same if opened in a spreadsheet program.
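One way to picture this semantic equality is to normalize both values before comparing them. The sketch below is an approximation under that assumption, written with the standard library; strip_trailing, normalize and equivalent are hypothetical names, and the library's comparison may differ in detail.

```ocaml
(* Drop the trailing elements of a list that satisfy [is_blank]. *)
let strip_trailing is_blank l =
  let rec drop = function x :: tl when is_blank x -> drop tl | rest -> rest in
  List.rev (drop (List.rev l))

(* Remove blank cells at the end of each row, then empty rows at the end. *)
let normalize csv =
  csv
  |> List.map (strip_trailing (fun cell -> cell = ""))
  |> strip_trailing (fun row -> row = [])

(* Two CSV values are equivalent when their normal forms coincide. *)
let equivalent a b = compare (normalize a) (normalize b) = 0
```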
val concat : t list -> t
Concatenate CSV files so that they appear side by side, arranged left to right across the page. Each CSV file (except the final one) is first squared.
(To concatenate CSV files so that they appear from top to bottom, just use List.concat).
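The difference between the two directions can be sketched as follows, with the standard library only. The names beside, concat_h and concat_v are hypothetical. How rows are filled when the inputs have different heights is an assumption of this sketch (missing cells on the left are padded with blanks).

```ocaml
let columns csv = List.fold_left (fun m r -> max m (List.length r)) 0 csv
let blanks n = List.init n (fun _ -> "")

let square csv =
  let w = columns csv in
  List.map (fun row -> row @ blanks (w - List.length row)) csv

(* Place [right] next to [left]; [left] is squared first so that the
   cells of [right] line up in the same columns on every row. *)
let beside left right =
  let left = square left in
  let w = columns left in
  let rec go l r =
    match l, r with
    | [], [] -> []
    | x :: l', [] -> x :: go l' []
    | [], y :: r' -> (blanks w @ y) :: go [] r'
    | x :: l', y :: r' -> (x @ y) :: go l' r'
  in
  go left right

(* Side by side: every input except the final one gets squared. *)
let concat_h csvs =
  match List.rev csvs with
  | [] -> []
  | last :: before -> List.fold_left (fun acc csv -> beside csv acc) last before

(* Top to bottom is plain list concatenation. *)
let concat_v = List.concat
```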
val transpose : t -> t
Permutes the lines and columns of the CSV data. Nonexistent cells become empty cells after transpose if they must be created.
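A short sketch of transposition on ragged data, using only the standard library. This transpose is a local stand-in written for illustration; it fills nonexistent cells with "" as described above.

```ocaml
(* Column j of the input becomes row j of the output; cells that do not
   exist in a short row are created as empty strings. *)
let transpose csv =
  let ncols = List.fold_left (fun m r -> max m (List.length r)) 0 csv in
  List.init ncols (fun j ->
      List.map
        (fun row -> match List.nth_opt row j with Some c -> c | None -> "")
        csv)
```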
val to_array : t -> string array array
val of_array : string array array -> t
Convenience functions to convert to and from a matrix representation. to_array will produce a ragged matrix (not all rows will have the same length) unless you call Csv.square first.
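The ragged result is easy to see with a direct conversion. The two functions below are local stand-ins written with the standard library, not the library's code.

```ocaml
(* Each row becomes an array of its own length, so rows may differ. *)
let to_array csv = Array.of_list (List.map Array.of_list csv)
let of_array m = Array.to_list (Array.map Array.to_list m)

let ragged = [ ["a"]; ["b"; "c"] ]
let row_lengths = Array.map Array.length (to_array ragged)
```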
val associate : string list -> t -> (string * string) list list
associate header data takes a block of data and converts each row in turn into an assoc list which maps column header to data cell.
Typically a spreadsheet will have the format:
    header1 header2 header3
    data11  data12  data13
    data21  data22  data23
    ...
This function arranges the data into a more usable form which is robust against changes in column ordering. The output of the function is:
    [ ["header1", "data11"; "header2", "data12"; "header3", "data13"];
      ["header1", "data21"; "header2", "data22"; "header3", "data23"];
      etc. ]
Each row is turned into an assoc list (see List.assoc).
If a row is too short, it is padded with empty cells (""). If a row is too long, it is truncated.
You would typically call this function as:
    let header, data =
      match csv with
      | h :: d -> h, d
      | [] -> assert false;;
    let data = Csv.associate header data;;
The header strings are shared, so the actual space in memory consumed by the spreadsheet is not much larger.
val combine : header:string list -> string list -> (string * string) list
combine ~header row returns a row with elements (h, x) where h is the header name and x the corresponding row entry. If the row has fewer entries than header, they are interpreted as being empty. See associate, which applies this function to all rows.
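The padding and truncation rules, and the lookup by column name they enable, can be sketched with the standard library as follows. Both combine and associate are local stand-ins written from the descriptions above, not the library's code.

```ocaml
(* Pair each header name with the matching cell: missing cells become "",
   and cells beyond the header are dropped. *)
let rec combine ~header row =
  match header, row with
  | [], _ -> []
  | h :: hs, [] -> (h, "") :: combine ~header:hs []
  | h :: hs, x :: xs -> (h, x) :: combine ~header:hs xs

(* Apply combine to every row. *)
let associate header data = List.map (combine ~header) data

let rows =
  associate ["name"; "qty"] [ ["apple"; "3"]; ["pear"]; ["fig"; "1"; "extra"] ]

(* Cells are then looked up by name rather than by position. *)
let qty_of_first = List.assoc "qty" (List.nth rows 0)
```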