Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra record is read when file has line ending in the end #21

Open
joelverhagen opened this issue Jan 12, 2024 · 0 comments
Open

Extra record is read when file has line ending in the end #21

joelverhagen opened this issue Jan 12, 2024 · 0 comments
Assignees

Comments

@joelverhagen
Copy link

It is common to have a new line (\n or \r\n) at the end of a text file following the last line (e.g. in Posix as shared on Stack Overflow). Generally this is not seen as the separator for a new CSV record. For the CSV records in my benchmark (https://github.com/joelverhagen/NCsvPerf), all of the parsers I've tested so far have this property of not yielding an empty record at the end.

Repro of what I am talking about:

using System.Text;
using Addax.Formats.Tabular;

var lines = new[] { "a,b,c", "1,2,3", "x,y,z" };
var file = string.Join("\r\n", lines) + "\r\n"; // line ending at the end
var dialect = new TabularDialect("\r\n", ',', '\"');
var stream = new MemoryStream(Encoding.UTF8.GetBytes(file));
using (var reader = new TabularReader(stream, dialect))
{
    while (reader.TryPickRecord())
    {
        Console.WriteLine("Record:");
        while (reader.TryReadField())
        {
            Console.Write("  Field: ");
            if (reader.TryGetString(out var value))
            {
                Console.WriteLine(value);
            }
            else
            {
                Console.WriteLine("(no value)");
            }
        }
    }
}

Actual output:

Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z
Record:
  Field:

Expected output:

Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z

I think this can be easily worked around by detecting a single empty string field on a line when more fields are expected, which is what I will do for my benchmark which will include Addax.

Nice work on the library! Thanks!

joelverhagen added a commit to joelverhagen/NCsvPerf that referenced this issue Jan 12, 2024
@alexanderkozlenko alexanderkozlenko self-assigned this Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants