Custom delimiter for grouping option #1253

Open
lukaspirpamer opened this issue Jul 12, 2024 · 4 comments

@lukaspirpamer

When the grouping option is enabled, "," is used as the field delimiter and the delimiter of the input CSV file is ignored.
Would it be possible to use the automatically determined delimiter, or the delimiter of the input CSV file, instead?

groups = self.args.groups.split(',')

@jpmckinney
Member

Why?

@jpmckinney changed the title from "Wrong delimiter is used when the grouping option" to "Custom delimiter for grouping option" on Jul 12, 2024
@lukaspirpamer
Author

Hi James,
thanks for your prompt reply!

Because the grouping option converts the delimiter to ",".
For example, when using csvstack on input CSV files delimited by ";", the output file uses "," once the grouping option is applied. I would have expected the same behaviour as without the grouping option. What do you think?

Best,
Lukas

@jpmckinney
Member

jpmckinney commented Jul 15, 2024

Can you provide a sample command, with sample input?

All csvkit commands, except in2csv, assume that the comma is the delimiter.

If you do the following, the semi-colons are preserved only because csvstack considers them to be part of the data, rather than considering them delimiters:

$ printf 'a;b;c\n1;2;3' | csvstack
a;b;c
1;2;3

You can set a custom delimiter with -d:

$ printf 'a;b;c\n1;2;3' | csvstack -d ';'
a,b,c
1,2,3

You'll see that, now, csvstack understands that ; is the delimiter, and therefore uses comma in the output.

To get output that uses a different delimiter, you must use csvformat.

The reason for this design decision is that all tools use a common format: only in2csv (along with options like -d) controls the input format, and only csvformat controls the output format. This avoids having to reconfigure the input/output in every single command when piping output between commands.
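
As a rough illustration of that design, here is a minimal sketch using only Python's standard csv module. It is not csvkit's actual code, and normalize and reformat are hypothetical helper names: one step reads input with a configurable delimiter and always emits commas, and a separate step is the only place where the output delimiter changes.

```python
import csv
import io


def normalize(text, in_delimiter=","):
    """Parse text using in_delimiter and re-emit it comma-delimited.

    Hypothetical stand-in for a tool that accepts an input-delimiter
    option but always writes the common (comma) format.
    """
    rows = csv.reader(io.StringIO(text), delimiter=in_delimiter)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def reformat(text, out_delimiter):
    """Parse comma-delimited text and re-emit it using out_delimiter.

    Hypothetical stand-in for a dedicated output-formatting step.
    """
    rows = csv.reader(io.StringIO(text))
    out = io.StringIO()
    csv.writer(out, delimiter=out_delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


source = "a;b;c\n1;2;3\n"
common = normalize(source, in_delimiter=";")
print(common, end="")                 # comma-delimited intermediate form
print(reformat(common, ";"), end="")  # semicolons restored at the very end
```

Because every intermediate step speaks the same comma-delimited format, only the first and last stages of a pipeline need to know about the semicolon.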

@jpmckinney
Member

Basically, if you are currently doing csvstack a.csv lot.csv of.csv files.csv that.csv use.csv semicolons.csv, then you are effectively doing the same as "cat ...": csvstack doesn't recognize the semicolons as delimiters unless you use -d (in which case the output will use commas, as described above).
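
To see why semicolons survive when no delimiter is given, here is a small sketch with Python's standard csv module (an illustration only, not csvkit's implementation; passthrough is a hypothetical helper). With the default comma delimiter, each semicolon-separated line is parsed as a single field and written back unchanged.

```python
import csv
import io


def passthrough(text):
    """Read and rewrite text with the default comma delimiter.

    Returns the parsed rows and the rewritten text.
    """
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return rows, out.getvalue()


rows, rewritten = passthrough("a;b;c\n1;2;3\n")
print(rows)               # each row holds one field that contains the semicolons
print(rewritten, end="")  # identical to the input text
```

The semicolons come through only because they were never treated as delimiters in the first place.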
