diff --git a/docs/content/aws.md b/docs/content/aws.md index 2f8e1911d..3b5a48e58 100644 --- a/docs/content/aws.md +++ b/docs/content/aws.md @@ -4,6 +4,7 @@ url: aws menu: main: parent: Integrations + identifier: aws_integration weight: 0 --- + +# Metrics Reporting + +As of 1.1.0 Iceberg supports the [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) and the [`MetricsReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReport.html) APIs. These two APIs allow expressing different metrics reports while supporting a pluggable way of reporting these reports. + +## Type of Reports + +### ScanReport +A [`ScanReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/ScanReport.html) carries metrics being collected during scan planning against a given table. Amongst some general information about the involved table, such as the snapshot id or the table name, it includes metrics like: +* total scan planning duration +* number of data/delete files included in the result +* number of data/delete manifests scanned/skipped +* number of data/delete files scanned/skipped +* number of equality/positional delete files scanned + + +### CommitReport +A [`CommitReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/CommitReport.html) carries metrics being collected after committing changes to a table (aka producing a snapshot). Amongst some general information about the involved table, such as the snapshot id or the table name, it includes metrics like: +* total duration +* number of attempts required for the commit to succeed +* number of added/removed data/delete files +* number of added/removed equality/positional delete files +* number of added/removed equality/positional deletes + + +## Available Metrics Reporters + +### [`LoggingMetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/LoggingMetricsReporter.html) + +This is the default metrics reporter when nothing else is configured and its purpose is to log results to the log file. Example output would look as shown below: + +``` +INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics report: +ScanReport{ + tableName=scan-planning-with-eq-and-pos-delete-files, + snapshotId=2, + filter=ref(name="data") == "(hash-27fa7cc0)", + schemaId=0, + projectedFieldIds=[1, 2], + projectedFieldNames=[id, data], + scanMetrics=ScanMetricsResult{ + totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.026569404S, count=1}, + resultDataFiles=CounterResult{unit=COUNT, value=1}, + resultDeleteFiles=CounterResult{unit=COUNT, value=2}, + totalDataManifests=CounterResult{unit=COUNT, value=1}, + totalDeleteManifests=CounterResult{unit=COUNT, value=1}, + scannedDataManifests=CounterResult{unit=COUNT, value=1}, + skippedDataManifests=CounterResult{unit=COUNT, value=0}, + totalFileSizeInBytes=CounterResult{unit=BYTES, value=10}, + totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=20}, + skippedDataFiles=CounterResult{unit=COUNT, value=0}, + skippedDeleteFiles=CounterResult{unit=COUNT, value=0}, + scannedDeleteManifests=CounterResult{unit=COUNT, value=1}, + skippedDeleteManifests=CounterResult{unit=COUNT, value=0}, + indexedDeleteFiles=CounterResult{unit=COUNT, value=2}, + equalityDeleteFiles=CounterResult{unit=COUNT, value=1}, + positionalDeleteFiles=CounterResult{unit=COUNT, value=1}}, + metadata={ + iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit 4868d2823004c8c256a50ea7c25cff94314cc135)}} +``` + +``` +INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics report: +CommitReport{ + tableName=scan-planning-with-eq-and-pos-delete-files, + snapshotId=1, + sequenceNumber=1, + operation=append, + commitMetrics=CommitMetricsResult{ + totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.098429626S, count=1}, + attempts=CounterResult{unit=COUNT, value=1}, + addedDataFiles=CounterResult{unit=COUNT, value=1}, + removedDataFiles=null, + totalDataFiles=CounterResult{unit=COUNT, value=1}, + addedDeleteFiles=null, + addedEqualityDeleteFiles=null, + addedPositionalDeleteFiles=null, + removedDeleteFiles=null, + removedEqualityDeleteFiles=null, + removedPositionalDeleteFiles=null, + totalDeleteFiles=CounterResult{unit=COUNT, value=0}, + addedRecords=CounterResult{unit=COUNT, value=1}, + removedRecords=null, + totalRecords=CounterResult{unit=COUNT, value=1}, + addedFilesSizeInBytes=CounterResult{unit=BYTES, value=10}, + removedFilesSizeInBytes=null, + totalFilesSizeInBytes=CounterResult{unit=BYTES, value=10}, + addedPositionalDeletes=null, + removedPositionalDeletes=null, + totalPositionalDeletes=CounterResult{unit=COUNT, value=0}, + addedEqualityDeletes=null, + removedEqualityDeletes=null, + totalEqualityDeletes=CounterResult{unit=COUNT, value=0}}, + metadata={ + iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit 4868d2823004c8c256a50ea7c25cff94314cc135)}} +``` + + +### [`RESTMetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/rest/RESTMetricsReporter.html) + +This is the default when using the [`RESTCatalog`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/rest/RESTCatalog.html) and its purpose is to send metrics to a REST server at the `/v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics` endpoint as defined in the [REST OpenAPI spec](https://github.com/apache/iceberg/blob/master/open-api/rest-catalog-open-api.yaml). + +Sending metrics via REST can be controlled with the `rest-metrics-reporting-enabled` (defaults to `true`) property. + + +## Implementing a custom Metrics Reporter + +Implementing the [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) API gives full flexibility in dealing with incoming [`MetricsReport`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReport.html) instances. For example, it would be possible to send results to a Prometheus endpoint or any other observability framework/system. + +Below is a short example illustrating an `InMemoryMetricsReporter` that stores reports in a list and makes them available: +```java +public class InMemoryMetricsReporter implements MetricsReporter { + + private List metricsReports = Lists.newArrayList(); + + @Override + public void report(MetricsReport report) { + metricsReports.add(report); + } + + public List reports() { + return metricsReports; + } +} +``` + +## Registering a custom Metrics Reporter + +### Via Catalog Configuration + +The [catalog property](../configuration#catalog-properties) `metrics-reporter-impl` allows registering a given [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) by specifying its fully-qualified class name, e.g. `metrics-reporter-impl=org.apache.iceberg.metrics.InMemoryMetricsReporter`. + +### Via the Java API during Scan planning + +Independently of the [`MetricsReporter`](../../../javadoc/{{% icebergVersion %}}/org/apache/iceberg/metrics/MetricsReporter.html) being registered at the catalog level via the `metrics-reporter-impl` property, it is also possible to supply additional reporters during scan planning as shown below: + +```java +TableScan tableScan = + table + .newScan() + .metricsReporter(customReporterOne) + .metricsReporter(customReporterTwo); + +try (CloseableIterable fileScanTasks = tableScan.planFiles()) { + // ... +} +``` \ No newline at end of file diff --git a/docs/content/nessie.md b/docs/content/nessie.md index b64847f72..47b91c891 100644 --- a/docs/content/nessie.md +++ b/docs/content/nessie.md @@ -4,6 +4,7 @@ url: nessie menu: main: parent: Integrations + identifier: nessie_integration weight: 0 ---