Add array record handlers
alexanderkozlenko committed Nov 30, 2023
1 parent 60eb543 commit aeffa87
Showing 44 changed files with 442 additions and 208 deletions.
2 changes: 1 addition & 1 deletion doc/docfx.json
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
]
}
],
"dest": "api",
"output": "api",
"filter": "api-filter.yml",
"namespaceLayout": "nested"
}
Expand Down
73 changes: 5 additions & 68 deletions doc/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,19 +17,19 @@ A general-purpose framework that works with tabular data represented as delimite
- The [TabularReader`<T>`](xref:Addax.Formats.Tabular.TabularReader`1) type provides forward-only, read-only access to tabular data records as typed plain objects.
- The [TabularWriter`<T>`](xref:Addax.Formats.Tabular.TabularWriter`1) type provides forward-only, write-only access to tabular data records as typed plain objects.
- The minimal API flavor:
- The [TabularData](xref:Addax.Formats.Tabular.TabularData) type provides static methods for reading and writing collections of tabular data records.
- The [TabularData](xref:Addax.Formats.Tabular.TabularData) type provides static methods for working with collections of tabular data records and inferring dialects.

<p />

The value converter abstraction provides an ability to work with tabular data fields as typed values, while the record handler abstraction provides an ability to work with tabular data records as typed plain objects by defining a complete read-write workflow. Although record handlers can be created manually, by default they are generated by the built-in source generator according to the metadata explicitly declared with attributes. Each API flavor requires an instance of the [TabularDialect](xref:Addax.Formats.Tabular.TabularDialect) type that specifies how to read and write tabular data.
The value converter abstraction provides an ability to work with tabular data fields as typed values, while the record handler abstraction provides an ability to work with tabular data records as typed plain objects by defining a complete read-write workflow. Although record handlers can be created manually, by default they are generated by the built-in source generator according to the metadata explicitly declared with attributes. Each API flavor requires an instance of the [TabularDialect](xref:Addax.Formats.Tabular.TabularDialect) type that specifies how to read and write tabular data. Framework types that perform I/O operations provide synchronous and asynchronous APIs, including cancellation support.

<p />

### How to Use

<p />

How to process tabular data with a known structure:
How to work with tabular data of a specific structure:

<p />

Expand Down Expand Up @@ -194,7 +194,7 @@ using (new TabularReader(File.OpenRead "books.csv", dialect)) (fun reader ->

<p />

How to process tabular data with a known structure that has a header:
How to work with tabular data of a specific structure that has a header:

<p />

Expand Down Expand Up @@ -377,7 +377,7 @@ using (new TabularReader(File.OpenRead "books.csv", dialect)) (fun reader ->

<p />

How to process a limited amount of records with the minimal API:
How to work with tabular data of a specific structure using the minimal API:

<p />

Expand Down Expand Up @@ -474,66 +474,3 @@ for book in books do
---

<p />

How to display the first ten records from a file with an unknown structure using a built-in record handler:

<p />

# [High-level API (C#)](#tab/api-hl-cs)

```cs
var dialect = new TabularDialect("\r\n", ',', '\"');

using (var reader = new TabularReader<string?[]>(File.OpenRead("data.csv"), dialect))
{
    while (reader.TryReadRecord() && (reader.RecordsRead <= 10))
    {
        Console.WriteLine(string.Join('|', reader.CurrentRecord));
    }
}
```

# [Low-level API (C#)](#tab/api-ll-cs)

```cs
var dialect = new TabularDialect("\r\n", ',', '\"');

using (var reader = new TabularReader(File.OpenRead("books.csv"), dialect))
{
    while (reader.TryPickRecord() && (reader.RecordsRead <= 10))
    {
        while (reader.TryReadField())
        {
            Console.Write($"{reader.GetString()}|");
        }

        Console.WriteLine();
    }
}
```

# [High-level API (F#)](#tab/api-hl-fs)

```fs
let dialect = new TabularDialect("\r\n", ',', '\"')
using (new TabularReader<array<string>>(File.OpenRead "books.csv", dialect)) (fun reader ->
    while reader.TryReadRecord () && (reader.RecordsRead <= 10) do
        printfn "%s" (String.concat "|" reader.CurrentRecord)
)
```

# [Low-level API (F#)](#tab/api-ll-fs)

```fs
let dialect = new TabularDialect("\r\n", ',', '\"')
using (new TabularReader(File.OpenRead "books.csv", dialect)) (fun reader ->
    while reader.TryPickRecord () && (reader.RecordsRead <= 10) do
        while reader.TryReadField () do
            printf "%s|" (reader.GetString ())
        printfn ""
)
```

---
2 changes: 1 addition & 1 deletion doc/topics/benchmarks.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ uid: urn:topics:benchmarks

<p />

The following benchmarks reflect the approximate time and memory required to process `1,048,576` fields:
The following benchmarks reflect the approximate time and memory required to process 1,048,576 fields:

<p />

Expand Down
75 changes: 50 additions & 25 deletions doc/topics/features.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,16 @@ uid: urn:topics:features

<p />

### Value Types
### Tabular Fields

<p />

The framework has built-in support for working with tabular fields as values of the following types:
The framework has built-in support for working with tabular fields as values of the following types:

<p />

|Runtime Type|Representation|Standard|
|-|-|-|
|Type|String Format|Standard|
|:-|:-|:-|
|`System.Boolean`|Lexical space: `"true" | "false" | "1" | "0"`|W3C XSD 1.1 P2|
|`System.Byte`|Format specifier: `"g"`||
|`System.Char`|One UTF-16 code unit||
Expand All @@ -32,7 +32,7 @@ The framework has built-in support for working with tabular fields as values of
|`System.Int128`|Format specifier: `"g"`||
|`System.SByte`|Format specifier: `"g"`||
|`System.Single`|Format specifier: `"g"`||
|`System.String`|Up to `2,147,483,591` UTF-16 code units||
|`System.String`|Up to 2,147,483,591 UTF-16 code units||
|`System.TimeOnly`|Format: `"HH':'mm':'ss.FFFFFFF"`|RFC 3339 / ISO 8601-1:2019|
|`System.TimeSpan`|Format: `"[-]'P'd'DT'h'H'm'M's.FFFFFFF'S'"`|RFC 3339 / ISO 8601-1:2019|
|`System.UInt16`|Format specifier: `"g"`||
Expand All @@ -45,29 +45,66 @@ The framework has built-in support for working with tabular fields as values of

<p />

Any generated record handler also supports type members of the `System.Nullable<T>` type with any supported value type as the underlying type.
Any generated record handler also supports type members of the `System.Nullable<T>` type with any supported value type as the underlying type. To map a type member of the `System.Byte[]` type for a generated record handler, one of the available value converters must be specified explicitly:

<p />

To use a type member of the `System.Byte[]` type with a generated record handler, one of the available converters must be specified explicitly:
- [TabularBase16BinaryConverter](xref:Addax.Formats.Tabular.Converters.TabularBase16BinaryConverter)
- [TabularBase64BinaryConverter](xref:Addax.Formats.Tabular.Converters.TabularBase64BinaryConverter)

<p />

- [TabularBase16BinaryConverter](xref:Addax.Formats.Tabular.Converters.TabularBase16BinaryConverter)
- [TabularBase64BinaryConverter](xref:Addax.Formats.Tabular.Converters.TabularBase64BinaryConverter)
### Tabular Records

<p />

### Dialect Inference
The framework has built-in support for working with tabular records as single-dimensional arrays `T[]` or `System.Nullable<T>[]` of any supported type (except `System.Byte[]`). For example, tabular records of any file can be interpreted as string arrays, even if they have different numbers of fields:

<p />

# [C#](#tab/cs)

```cs
var dialect = new TabularDialect("\r\n", ',', '\"');

using (var reader = new TabularReader<string?[]>(File.OpenRead("data.csv"), dialect))
{
    while (reader.TryReadRecord())
    {
        Console.WriteLine(string.Join('|', reader.CurrentRecord));
    }
}
```

# [F#](#tab/fs)

```fs
let dialect = new TabularDialect("\r\n", ',', '\"')
using (new TabularReader<array<string>>(File.OpenRead "books.csv", dialect)) (fun reader ->
    while reader.TryReadRecord () do
        printfn "%s" (String.concat "|" reader.CurrentRecord)
)
```

---

<p />

> [!NOTE]
> The section describes a preview feature that is available in the latest pre-release package.
The framework also provides generic record handlers for working with tabular records as single-dimensional arrays of any type:

<p />

A dialect can be inferred from a stream based on the frequency of the eligible token values:
- [TabularArrayHandler\<T\>](xref:Addax.Formats.Tabular.Handlers.TabularArrayHandler`1)
- [TabularSparseArrayHandler\<T\>](xref:Addax.Formats.Tabular.Handlers.TabularSparseArrayHandler`1)

<p />

### Dialect Inference

<p />

A tabular dialect can be inferred from a stream based on the frequency of the eligible token values:

<p />

Expand Down Expand Up @@ -127,10 +164,6 @@ type TabularReader<'T> =

<p />

### Memory Usage

<p />

The field reader provides access to the last read field in a way that allows reading the field without additional string allocations:

<p />
Expand Down Expand Up @@ -181,11 +214,3 @@ let options = new TabularOptions (
```

---

<p />

### References

<p />

- [W3C - Model for Tabular Data and Metadata on the Web](https://w3.org/TR/2015/REC-tabular-data-model-20151217)
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
// (c) Oleksandr Kozlenko. Licensed under the MIT license.

using System.Diagnostics.CodeAnalysis;

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts binary data encoded with "base16" ("hex") encoding from or to a sequence of characters.</summary>
public class TabularBase16BinaryConverter : TabularConverter<byte[]>
/// <summary>Converts binary data encoded with "base16" ("hex") encoding from or to a character sequence.</summary>
public class TabularBase16BinaryConverter : TabularConverter<byte[]?>
{
internal static readonly TabularBase16BinaryConverter Instance = new();

Expand Down Expand Up @@ -41,7 +43,7 @@ public override bool TryFormat(byte[]? value, Span<char> destination, IFormatPro
}

/// <inheritdoc />
public override bool TryParse(ReadOnlySpan<char> source, IFormatProvider? provider, out byte[]? value)
public override bool TryParse(ReadOnlySpan<char> source, IFormatProvider? provider, [NotNullWhen(true)] out byte[]? value)
{
source = source.Trim();

Expand Down
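
Note on the `[NotNullWhen(true)]` change above: the attribute tells the compiler's nullable analysis that the `out` value is not null whenever the method returns `true`, so callers can use it without a null check. The following is a minimal standalone sketch of that pattern, not the converter's actual implementation. `TryParseHex` is a hypothetical helper, and it assumes .NET 5 or later for `Convert.FromHexString`.

```cs
#nullable enable

using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical helper (not part of the library): parses a hex string into bytes.
// [NotNullWhen(true)] promises callers that 'value' is not null on success.
static bool TryParseHex(ReadOnlySpan<char> source, [NotNullWhen(true)] out byte[]? value)
{
    source = source.Trim();

    try
    {
        value = Convert.FromHexString(source);
        return true;
    }
    catch (FormatException)
    {
        // Invalid characters or an odd number of digits.
        value = null;
        return false;
    }
}

if (TryParseHex(" 48656C6C6F ", out var bytes))
{
    // No null check or null-forgiving operator is needed in this branch.
    Console.WriteLine(bytes.Length); // prints "5"
}
```

Without the attribute, dereferencing `bytes` inside the `if` branch would produce a nullable warning, because the declared type of the parameter is `byte[]?`.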
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
// (c) Oleksandr Kozlenko. Licensed under the MIT license.

using System.Diagnostics.CodeAnalysis;
using Addax.Formats.Tabular.Buffers;

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts binary data encoded with "base64" encoding from or to a sequence of characters.</summary>
public class TabularBase64BinaryConverter : TabularConverter<byte[]>
/// <summary>Converts binary data encoded with "base64" encoding from or to a character sequence.</summary>
public class TabularBase64BinaryConverter : TabularConverter<byte[]?>
{
internal static readonly TabularBase64BinaryConverter Instance = new();

Expand All @@ -21,7 +22,7 @@ public override bool TryFormat(byte[]? value, Span<char> destination, IFormatPro
}

/// <inheritdoc />
public override bool TryParse(ReadOnlySpan<char> source, IFormatProvider? provider, out byte[]? value)
public override bool TryParse(ReadOnlySpan<char> source, IFormatProvider? provider, [NotNullWhen(true)] out byte[]? value)
{
var bufferSize = (int)Math.Ceiling(source.Length / 4.0) * 3;

Expand Down
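
Note on the `bufferSize` expression above: every four Base64 characters encode at most three bytes, so rounding the character count up to a multiple of four and multiplying by three gives an upper bound on the decoded length. The sketch below reproduces that arithmetic outside the library. `GetMaxDecodedLength` is a hypothetical helper name, and the decoding call is the standard `Convert.TryFromBase64Chars`, which may differ from what the converter uses internally.

```cs
using System;

// Hypothetical helper mirroring the size estimate shown in the diff.
static int GetMaxDecodedLength(int charCount)
{
    // The cast binds to Math.Ceiling(...) before the multiplication.
    return (int)Math.Ceiling(charCount / 4.0) * 3;
}

ReadOnlySpan<char> source = "aGVsbG8="; // "hello" in Base64
Span<byte> buffer = stackalloc byte[GetMaxDecodedLength(source.Length)];

if (Convert.TryFromBase64Chars(source, buffer, out var bytesWritten))
{
    // The estimate is an upper bound; padding reduces the actual byte count.
    Console.WriteLine($"{buffer.Length} {bytesWritten}"); // prints "6 5"
}
```

The buffer can be larger than the decoded payload, so the decoded data is the first `bytesWritten` bytes of the buffer.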
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="BigInteger" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="BigInteger" /> value from or to a character sequence.</summary>
public class TabularBigIntegerConverter : TabularConverter<BigInteger>
{
internal static readonly TabularBigIntegerConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="bool" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="bool" /> value from or to a character sequence.</summary>
public class TabularBooleanConverter : TabularConverter<bool>
{
internal static readonly TabularBooleanConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="byte" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="byte" /> value from or to a character sequence.</summary>
public class TabularByteConverter : TabularConverter<byte>
{
internal static readonly TabularByteConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="char" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="char" /> value from or to a character sequence.</summary>
public class TabularCharConverter : TabularConverter<char>
{
internal static readonly TabularCharConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="DateOnly" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="DateOnly" /> value from or to a character sequence.</summary>
public class TabularDateOnlyConverter : TabularConverter<DateOnly>
{
internal static readonly TabularDateOnlyConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="DateTime" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="DateTime" /> value from or to a character sequence.</summary>
public class TabularDateTimeConverter : TabularConverter<DateTime>
{
internal static readonly TabularDateTimeConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="DateTimeOffset" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="DateTimeOffset" /> value from or to a character sequence.</summary>
public class TabularDateTimeOffsetConverter : TabularConverter<DateTimeOffset>
{
internal static readonly TabularDateTimeOffsetConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="decimal" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="decimal" /> value from or to a character sequence.</summary>
public class TabularDecimalConverter : TabularConverter<decimal>
{
internal static readonly TabularDecimalConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="double" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="double" /> value from or to a character sequence.</summary>
public class TabularDoubleConverter : TabularConverter<double>
{
internal static readonly TabularDoubleConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="Guid" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="Guid" /> value from or to a character sequence.</summary>
public class TabularGuidConverter : TabularConverter<Guid>
{
internal static readonly TabularGuidConverter Instance = new();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

namespace Addax.Formats.Tabular.Converters;

/// <summary>Converts a <see cref="Half" /> value from or to a sequence of characters.</summary>
/// <summary>Converts a <see cref="Half" /> value from or to a character sequence.</summary>
public class TabularHalfConverter : TabularConverter<Half>
{
internal static readonly TabularHalfConverter Instance = new();
Expand Down