
Commit

Merge branch 'prestodb:master' into rest
jkhaliqi authored Aug 22, 2024
2 parents 443e3b8 + 797019c commit cfff25e
Showing 87 changed files with 2,601 additions and 415 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -122,7 +122,7 @@ resources will be hot-reloaded and changes are reflected on browser refresh.

[Velox](https://github.com/facebookincubator/velox) is a C++ database library which provides reusable, extensible, and high-performance data processing components.

-Check out [building instructions](https://github.com/prestodb/presto/tree/master/presto-native-execution#building) to get started.
+Check out [building instructions](https://github.com/prestodb/presto/tree/master/presto-native-execution#build-from-source) to get started.


<hr>
5 changes: 2 additions & 3 deletions presto-docs/src/main/sphinx/connector/clickhouse.rst
@@ -143,9 +143,8 @@ SQL support
-----------

The connector provides read and write access to data and metadata in
-a ClickHouse catalog. In addition to the :ref:`globally available
-<sql-globally-available>` and :ref:`read operation <sql-read-operations>`
-statements, the connector supports the following features:
+a ClickHouse catalog. In addition to the globally available and
+read operation statements, the connector supports the following features:

* :doc:`/sql/insert`
* :doc:`/sql/truncate`
3 changes: 1 addition & 2 deletions presto-docs/src/main/sphinx/connector/googlesheets.rst
@@ -99,6 +99,5 @@ fetching the sheet data for every table, unless it is already cached.
SQL support
-----------

-The connector provides :ref:`globally available <sql-globally-available>` and
-:ref:`read operation <sql-read-operations>` statements to access data and
+The connector provides globally available and read operation statements to access data and
metadata in Google Sheets.
3 changes: 1 addition & 2 deletions presto-docs/src/main/sphinx/connector/iceberg.rst
@@ -1329,8 +1329,7 @@ Example Queries
^^^^^^^^^^^^^^^

Similar to the example queries in `SCHEMA EVOLUTION`_, create an Iceberg
-table named `ctas_nation` from the TPCH `nation` table::
+table named `ctas_nation` from the TPCH `nation` table:

.. code-block:: sql

11 changes: 11 additions & 0 deletions presto-docs/src/main/sphinx/functions/ip.rst
@@ -49,3 +49,14 @@ IP Functions
SELECT is_subnet_of(IPPREFIX '192.168.3.131/26', IPPREFIX '192.168.3.144/30'); -- true
SELECT is_subnet_of(IPPREFIX '64:ff9b::17/64', IPPREFIX '64:ffff::17/64'); -- false
SELECT is_subnet_of(IPPREFIX '192.168.3.131/26', IPPREFIX '192.168.3.131/26'); -- true

.. function:: ip_prefix_collapse(array(ip_prefix)) -> array(ip_prefix)

Returns the minimal CIDR representation of the input ``IPPREFIX`` array.
Every ``IPPREFIX`` in the input array must be the same IP version (that is, only IPv4 or only IPv6)
or the query will fail and raise an error. ::

SELECT IP_PREFIX_COLLAPSE(ARRAY[IPPREFIX '192.168.0.0/24', IPPREFIX '192.168.1.0/24']); -- [{192.168.0.0/23}]
SELECT IP_PREFIX_COLLAPSE(ARRAY[IPPREFIX '2620:10d:c090::/48', IPPREFIX '2620:10d:c091::/48']); -- [{2620:10d:c090::/47}]
SELECT IP_PREFIX_COLLAPSE(ARRAY[IPPREFIX '192.168.1.0/24', IPPREFIX '192.168.0.0/24', IPPREFIX '192.168.2.0/24', IPPREFIX '192.168.9.0/24']); -- [{192.168.0.0/23}, {192.168.2.0/24}, {192.168.9.0/24}]
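
For example, mixing IPv4 and IPv6 prefixes in one call is the documented failure case (a hypothetical query illustrating the error; the exact message may differ): ::

SELECT IP_PREFIX_COLLAPSE(ARRAY[IPPREFIX '192.168.0.0/24', IPPREFIX '64:ff9b::17/64']); -- fails: all prefixes must be the same IP version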

4 changes: 4 additions & 0 deletions presto-docs/src/main/sphinx/installation/jdbc.rst
@@ -50,6 +50,8 @@ The above URL can be used as follows to create a connection:
    String url = "jdbc:presto://example.net:8080/hive/sales";
    Connection connection = DriverManager.getConnection(url, "test", null);
.. _jdbc-java-connection:

Connection Parameters
---------------------

@@ -75,6 +77,8 @@ These methods may be mixed; some parameters may be specified in the URL
while others are specified using properties. However, the same parameter
may not be specified using both methods.
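
For example, a minimal sketch of mixing the two methods, using the ``SSL`` URL parameter and the ``user`` property from the parameter reference below:

.. code-block:: java

    // SSL is specified in the URL, while user is passed as a property.
    String url = "jdbc:presto://example.net:8080/hive/sales?SSL=true";
    Properties properties = new Properties();
    properties.setProperty("user", "test");
    Connection connection = DriverManager.getConnection(url, properties);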

.. _jdbc-parameter-reference:

Parameter Reference
-------------------

49 changes: 41 additions & 8 deletions presto-docs/src/main/sphinx/presto_cpp/features.rst
@@ -22,15 +22,36 @@ HTTP endpoints related to tasks are registered to Proxygen in

Other HTTP endpoints include:

-* POST: v1/memory
-  * Reports memory, but no assignments are adjusted unlike in Java workers.
-* GET: v1/info
-* GET: v1/status
+* POST: v1/memory: Reports memory, but no assignments are adjusted, unlike in Java workers
+* GET: v1/info/metrics: Returns worker-level metrics in Prometheus data format. Refer to the section `Worker Metrics Collection <#worker-metrics-collection>`_ for more information. Here is a sample of the metrics data returned by this API:
+
+  .. code-block:: text
+
+     # TYPE presto_cpp_num_http_request counter
+     presto_cpp_num_http_request{cluster="testing",worker=""} 0
+     # TYPE presto_cpp_num_http_request_error counter
+     presto_cpp_num_http_request_error{cluster="testing",worker=""} 0
+     # TYPE presto_cpp_memory_pushback_count counter
+     presto_cpp_memory_pushback_count{cluster="testing",worker=""} 0
+     # TYPE velox_driver_yield_count counter
+     velox_driver_yield_count{cluster="testing",worker=""} 0
+     # TYPE velox_cache_shrink_count counter
+     velox_cache_shrink_count{cluster="testing",worker=""} 0
+     # TYPE velox_memory_cache_num_stale_entries counter
+     velox_memory_cache_num_stale_entries{cluster="testing",worker=""} 0
+     # TYPE velox_arbitrator_requests_count counter
+     velox_arbitrator_requests_count{cluster="testing",worker=""} 0
+
+* GET: v1/info: Returns basic information about the worker. Here is an example:
+
+  .. code-block:: text
+
+     {"coordinator":false,"environment":"testing","nodeVersion":{"version":"testversion"},"starting":false,"uptime":"49.00s"}
+
+* GET: v1/status: Returns memory pool information.

-The request/response flow of Presto C++ is identical to Java workers. The
-tasks or new splits are registered via `TaskUpdateRequest`. Resource
-utilization and query progress are sent to the coordinator via task endpoints.
+The request/response flow of Presto C++ is identical to Java workers. The tasks or new splits are registered via `TaskUpdateRequest`. Resource utilization and query progress are sent to the coordinator via task endpoints.
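
For example, the ``v1/info/metrics`` endpoint above can be queried with a plain HTTP GET (host and port are placeholders for a real worker address):

.. code-block:: text

    curl http://<worker-host>:<worker-port>/v1/info/metrics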

Remote Function Execution
-------------------------
@@ -169,7 +190,7 @@ Size of the SSD cache when async data cache is enabled.
* **Default value:** ``true``
* **Presto on Spark default value:** ``false``

Enable periodic cleanup of old tasks. The default value is ``true`` for Presto C++.
For Presto on Spark this property defaults to ``false``, as zombie or stuck tasks
are handled by Spark through speculative execution.

@@ -185,6 +206,18 @@ Old task is defined as a PrestoTask which has not received heartbeat for at least
``old-task-cleanup-ms``, or is not running and has an end time more than
``old-task-cleanup-ms`` ago.

Worker metrics collection
-------------------------

Users can enable collection of worker-level metrics by setting the property:

``runtime-metrics-collection-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``boolean``
* **Default value:** ``false``

When set to ``true``, this flag is a no-op unless additional setup is completed first. To enable
metrics collection in Prometheus data format, refer to the `Prestissimo build instructions <https://github.com/prestodb/presto/tree/master/presto-native-execution#build-prestissimo>`_.
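
As a sketch, assuming the flag is set in the worker's ``config.properties`` file (the property name comes from the reference above; the file location depends on your deployment):

.. code-block:: text

    runtime-metrics-collection-enabled=true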

Session Properties
------------------
7 changes: 7 additions & 0 deletions presto-docs/src/main/sphinx/presto_cpp/properties.rst
@@ -124,6 +124,13 @@ The configuration properties of Presto C++ workers are described here, in alphabetical order.
1) the non-reserved space in ``query-memory-gb`` is used up; and 2) the amount
it tries to get is less than ``memory-pool-reserved-capacity``.

``runtime-metrics-collection-enabled``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* **Type:** ``boolean``
* **Default value:** ``false``

Enables collection of worker-level metrics.

``system-memory-gb``
^^^^^^^^^^^^^^^^^^^^

@@ -85,7 +85,6 @@
import static com.facebook.presto.common.type.Decimals.isShortDecimal;
import static com.facebook.presto.hive.BaseHiveColumnHandle.ColumnType.REGULAR;
import static com.facebook.presto.hive.HiveColumnHandle.isInfoColumnHandle;
-import static com.facebook.presto.hive.HiveColumnHandle.isPathColumnHandle;
import static com.facebook.presto.hive.HiveCommonSessionProperties.isUseParquetColumnNames;
import static com.facebook.presto.hive.HiveErrorCode.HIVE_INVALID_METADATA;
import static com.facebook.presto.hive.HiveErrorCode.HIVE_PARTITION_DROPPED_DURING_QUERY;
@@ -426,16 +425,6 @@ private HiveSplitSource computeSplitSource(SplitSchedulingContext splitSchedulingContext
        return splitSource;
    }

-    private static Optional<Domain> getPathDomain(TupleDomain<Subfield> domainPredicate, Map<String, HiveColumnHandle> predicateColumns)
-    {
-        checkArgument(!domainPredicate.isNone(), "Unexpected domain predicate: none");
-
-        return domainPredicate.getDomains().get().entrySet().stream()
-                .filter(entry -> isPathColumnHandle(predicateColumns.get(entry.getKey().getRootName())))
-                .findFirst()
-                .map(Map.Entry::getValue);
-    }

    private static Map<Integer, Domain> getInfoColumnConstraints(TupleDomain<Subfield> domainPredicate, Map<String, HiveColumnHandle> predicateColumns)
    {
        checkArgument(!domainPredicate.isNone(), "Unexpected domain predicate: none");
ManifestPartitionLoader.java
@@ -67,7 +67,7 @@ public class ManifestPartitionLoader
    private static final String[] BLOCK_LOCATION_HOSTS = {"localhost"};

    private final Table table;
-    Map<Integer, Domain> infoColumnConstraints;
+    private final Map<Integer, Domain> infoColumnConstraints;
    private final ConnectorSession session;
    private final HdfsEnvironment hdfsEnvironment;
    private final HdfsContext hdfsContext;
@@ -87,7 +87,7 @@ public ManifestPartitionLoader(
            boolean schedulerUsesHostAddresses)
    {
        this.table = requireNonNull(table, "table is null");
-        this.infoColumnConstraints = requireNonNull(infoColumnConstraints, "pathDomain is null");
+        this.infoColumnConstraints = requireNonNull(infoColumnConstraints, "infoColumnConstraints is null");
        this.session = requireNonNull(session, "session is null");
        this.hdfsEnvironment = requireNonNull(hdfsEnvironment, "hdfsEnvironment is null");
        this.hdfsContext = new HdfsContext(session, table.getDatabaseName(), table.getTableName(), table.getStorage().getLocation(), false);
StoragePartitionLoader.java
@@ -100,7 +100,7 @@ public class StoragePartitionLoader
    private static final ListenableFuture<?> COMPLETED_FUTURE = immediateFuture(null);

    private final Table table;
-    Map<Integer, Domain> infoColumnConstraints;
+    private final Map<Integer, Domain> infoColumnConstraints;
    private final Optional<BucketSplitInfo> tableBucketInfo;
    private final HdfsEnvironment hdfsEnvironment;
    private final HdfsContext hdfsContext;
@@ -248,15 +248,6 @@ private static List<HostAddress> getHostAddresses(BlockLocation blockLocation)
                .collect(toImmutableList());
    }

-    private static boolean pathMatchesPredicate(Optional<Domain> pathDomain, String path)
-    {
-        if (!pathDomain.isPresent()) {
-            return true;
-        }
-
-        return pathDomain.get().includesNullableValue(utf8Slice(path));
-    }
-
    private static boolean infoColumnsMatchPredicates(Map<Integer, Domain> constraints,
            String path,
            long fileSize,
ExpireSnapshotsProcedure.java
@@ -36,6 +36,7 @@
import static com.facebook.presto.common.type.StandardTypes.TIMESTAMP;
import static com.facebook.presto.common.type.StandardTypes.VARCHAR;
import static java.util.Objects.requireNonNull;
+import static org.apache.iceberg.IcebergLibUtils.withIncrementalCleanup;

public class ExpireSnapshotsProcedure
        implements Provider<Procedure>
@@ -85,7 +86,10 @@ private void doExpireSnapshots(ConnectorSession clientSession, String schema, String tableName
        SchemaTableName schemaTableName = new SchemaTableName(schema, tableName);
        Table icebergTable = IcebergUtil.getIcebergTable(metadata, clientSession, schemaTableName);

-        ExpireSnapshots expireSnapshots = icebergTable.expireSnapshots();
+        // The incremental cleanup strategy has a bug when expiring specified snapshots,
+        // so explicitly use the reachable file cleanup strategy here.
+        // See https://github.com/apache/iceberg/issues/10982
+        ExpireSnapshots expireSnapshots = withIncrementalCleanup(icebergTable.expireSnapshots(), false);

        if (snapshotIds != null) {
            for (long id : snapshotIds) {
IcebergLibUtils.java
@@ -0,0 +1,34 @@
/*
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.iceberg;

import static com.google.common.base.Preconditions.checkArgument;
import static java.util.Objects.requireNonNull;

public class IcebergLibUtils
{
    private IcebergLibUtils()
    {}

    /**
     * Call the method in the Iceberg lib's protected class to set explicitly
     * whether to use incremental cleanup when expiring snapshots.
     */
    public static ExpireSnapshots withIncrementalCleanup(ExpireSnapshots expireSnapshots, boolean incrementalCleanup)
    {
        requireNonNull(expireSnapshots, "expireSnapshots is null");
        checkArgument(expireSnapshots instanceof RemoveSnapshots, "expireSnapshots is not an instance of RemoveSnapshots");
        return ((RemoveSnapshots) expireSnapshots).withIncrementalCleanup(incrementalCleanup);
    }
}