Extra record is read when file has line ending in the end #21

joelverhagen · 2024-01-12T22:08:52Z

It is common for a text file to end with a line terminator (\n or \r\n) after the last line (e.g. POSIX defines a line as ending with a newline, as shared on Stack Overflow). Generally this trailing terminator is not treated as the separator for a new CSV record. Of the parsers in my CSV benchmark (https://github.com/joelverhagen/NCsvPerf), all that I have tested so far behave this way and do not yield an empty record at the end.

Repro of what I am talking about:

using System.Text;
using Addax.Formats.Tabular;

var lines = new[] { "a,b,c", "1,2,3", "x,y,z" };
var file = string.Join("\r\n", lines) + "\r\n"; // line ending at the end
var dialect = new TabularDialect("\r\n", ',', '\"');
var stream = new MemoryStream(Encoding.UTF8.GetBytes(file));
using (var reader = new TabularReader(stream, dialect))
{
    while (reader.TryPickRecord())
    {
        Console.WriteLine("Record:");
        while (reader.TryReadField())
        {
            Console.Write("  Field: ");
            if (reader.TryGetString(out var value))
            {
                Console.WriteLine(value);
            }
            else
            {
                Console.WriteLine("(no value)");
            }
        }
    }
}

Actual output:

Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z
Record:
  Field:

Expected output:

Record:
  Field: a
  Field: b
  Field: c
Record:
  Field: 1
  Field: 2
  Field: 3
Record:
  Field: x
  Field: y
  Field: z

I think this can be easily worked around by detecting a record that consists of a single empty string field when more fields are expected. That is what I will do in my benchmark, which will include Addax.
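
For illustration, here is a minimal sketch of that kind of check, written independently of Addax so it can be applied to records after they have been read. The `RecordFilter` class, its method names, and the `expectedFieldCount` parameter are hypothetical names chosen for this example; they are not part of the library, and this is not necessarily how the benchmark implements it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Addax.
static class RecordFilter
{
    // True when a record has exactly one field, that field is empty,
    // and the data is expected to have more than one field per record.
    public static bool IsSingleEmptyFieldRecord(string[] fields, int expectedFieldCount)
    {
        return expectedFieldCount > 1
            && fields.Length == 1
            && string.IsNullOrEmpty(fields[0]);
    }

    // Yields every record except those matching the check above.
    public static IEnumerable<string[]> SkipSingleEmptyFieldRecords(
        IEnumerable<string[]> records, int expectedFieldCount)
    {
        foreach (var record in records)
        {
            if (!IsSingleEmptyFieldRecord(record, expectedFieldCount))
            {
                yield return record;
            }
        }
    }
}
```

Applied to the four records from the repro with an `expectedFieldCount` of 3, this keeps the three real rows and drops the final one. Note that it also drops such a record anywhere in the input, not only at the end, and it cannot be used for single-column data, where an empty field may be a legitimate value.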

Nice work on the library! Thanks!

Workaround filed here: alexanderkozlenko/addax#21

joelverhagen added a commit to joelverhagen/NCsvPerf that referenced this issue Jan 12, 2024

Add Addax with workaround

45fecf7

alexanderkozlenko self-assigned this Jan 12, 2024
