Pinot Query Language (PQL)

Learn how to query Pinot using PQL.

PQL

PQL is a derivative of SQL that supports selection, projection, aggregation, and grouping aggregation.

PQL Limitations

PQL is only a derivative of SQL; it does not support joins or subqueries. To work around this, we suggest relying on PrestoDB (https://prestodb.io/), although subqueries are not completely supported by PrestoDB at the time of writing.

PQL Examples

The Pinot Query Language (PQL) is very similar to standard SQL:

SELECT COUNT(*) FROM myTable

Aggregation

SELECT COUNT(*), MAX(foo), SUM(bar) FROM myTable

Grouping on Aggregation

SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM myTable
  GROUP BY bar, baz LIMIT 50

Ordering on Aggregation

SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM myTable
  GROUP BY bar, baz 
  ORDER BY bar, MAX(foo) DESC LIMIT 50

Filtering

SELECT COUNT(*) FROM myTable
  WHERE foo = 'foo'
  AND bar BETWEEN 1 AND 20
  OR (baz < 42 AND quux IN ('hello', 'goodbye') AND quuux NOT IN (42, 69))

Selection (Projection)

SELECT * FROM myTable
  WHERE quux < 5
  LIMIT 50

Ordering on Selection

SELECT foo, bar FROM myTable
  WHERE baz > 20
  ORDER BY bar DESC
  LIMIT 100

Pagination on Selection

Note: results might not be consistent if the column being ordered by has the same value in multiple rows.

SELECT foo, bar FROM myTable
  WHERE baz > 20
  ORDER BY bar DESC
  LIMIT 50, 100

Wild-card match (in WHERE clause only)

To count rows where the column airlineName starts with 'U':

SELECT count(*) FROM SomeTable
  WHERE regexp_like(airlineName, '^U.*')
  GROUP BY airlineName TOP 10

UDF

As of now, functions have to be implemented within Pinot; injecting new functions is not yet allowed. The example below demonstrates the use of UDFs. More examples can be found in Transform Function in Aggregation and Grouping.

SELECT count(*) FROM myTable
  GROUP BY dateTimeConvert(timeColumnName, '1:MILLISECONDS:EPOCH', '1:HOURS:EPOCH', '1:HOURS')

BYTES column

Pinot supports queries on BYTES columns using hex strings. The query response also uses hex strings to represent bytes values.

For example, the query below fetches all rows for a given UID:

SELECT * FROM myTable
  WHERE UID = 'c8b3bce0b378fc5ce8067fc271a34892'

PQL Specification

SELECT

The select statement is as follows:

SELECT <outputColumn> (, outputColumn, outputColumn,...)
  FROM <tableName>
  (WHERE ... | GROUP BY ... | ORDER BY ... | TOP ... | LIMIT ...)

outputColumn can be * to project all columns, a list of columns (foo, bar, baz), or aggregation functions (e.g. MIN(foo), MAX(bar), AVG(baz)).

Filter Functions on Single Value/Multi-value

  • EQUALS
  • IN
  • NOT IN
  • GT
  • LT
  • BETWEEN
  • REGEXP_LIKE

For multi-valued columns, EQUALS is similar to CONTAINS: the predicate matches a row if any of the values in the column satisfies it.
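
A minimal sketch of this behavior, assuming a hypothetical multi-valued column mvTags: the query below counts every row whose mvTags values include 'sports'.

SELECT COUNT(*) FROM myTable
  WHERE mvTags = 'sports'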

Supported aggregations on single-value columns

  • COUNT
  • MIN
  • MAX
  • SUM
  • AVG
  • MINMAXRANGE
  • DISTINCT
  • DISTINCTCOUNT
  • DISTINCTCOUNTHLL
  • DISTINCTCOUNTRAWHLL: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/**/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
  • FASTHLL (WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format)
  • PERCENTILE[0-100]: e.g. PERCENTILE5, PERCENTILE50, PERCENTILE99, etc.
  • PERCENTILEEST[0-100]: e.g. PERCENTILEEST5, PERCENTILEEST50, PERCENTILEEST99, etc.
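
As an illustrative sketch, several of the aggregations above can be combined in a single query (column names are placeholders):

SELECT DISTINCTCOUNT(foo), PERCENTILE99(bar), MINMAXRANGE(baz) FROM myTable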

Supported aggregations on multi-value columns

  • COUNTMV
  • MINMV
  • MAXMV
  • SUMMV
  • AVGMV
  • MINMAXRANGEMV
  • DISTINCTCOUNTMV
  • DISTINCTCOUNTHLLMV
  • DISTINCTCOUNTRAWHLLMV: Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL (see pinot-core/**/HllUtil.java as an example) and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.
  • FASTHLLMV (WARN: will be deprecated soon. It does not make much sense to configure a serialized HyperLogLog column as a dimension)
  • PERCENTILE[0-100]MV: e.g. PERCENTILE5MV, PERCENTILE50MV, PERCENTILE99MV, etc.
  • PERCENTILEEST[0-100]MV: e.g. PERCENTILEEST5MV, PERCENTILEEST50MV, PERCENTILEEST99MV, etc.
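
A minimal sketch using the multi-value variants, assuming a hypothetical multi-valued column mvColumn:

SELECT COUNTMV(mvColumn), DISTINCTCOUNTMV(mvColumn) FROM myTable
  GROUP BY bar TOP 10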

WHERE

Supported predicates are comparisons with a constant using the standard SQL operators (=, <, <=, >, >=, <>, !=), range comparisons using BETWEEN (foo BETWEEN 42 AND 69), set membership (foo IN (1, 2, 4, 8)), and set exclusion (foo NOT IN (1, 2, 4, 8)). For BETWEEN, the range is inclusive.

Comparison with a regular expression is supported using the regexp_like function, as in WHERE regexp_like(columnName, 'regular expression').

GROUP BY

The GROUP BY clause groups aggregation results by a list of columns, or transform functions on columns (see below).

ORDER BY

The ORDER BY clause orders selection results or group by results by a list of columns. PQL supports ordering DESC or ASC.

TOP

The TOP n clause causes the 'n' largest group results to be returned. If not specified, the top 10 groups are returned.

LIMIT

The LIMIT n clause causes the selection results to contain at most 'n' results. The LIMIT a, b clause paginates the selection results, skipping the first 'a' results and returning at most 'b' results. By default, 10 records are returned in the result.

Transform Function in Aggregation and Grouping

In aggregation and grouping, each expression can be a transform of one or multiple columns. For example, the following query calculates the maximum value of column foo divided by column bar, grouping on the column time converted from time unit MILLISECONDS to SECONDS:

SELECT MAX(DIV(foo, bar)) FROM myTable
  GROUP BY DATETIMECONVERT(time, '1:MILLISECONDS:EPOCH', '1:SECONDS:EPOCH', '1:SECONDS')

Supported transform functions

  • ADD: Sum of at least two values
  • SUB: Difference between two values
  • MULT: Product of at least two values
  • DIV: Quotient of two values
  • MOD: Modulo of two values
  • ABS: Absolute value
  • CEIL: Value rounded up to the nearest integer
  • FLOOR: Value rounded down to the nearest integer
  • EXP: Euler's number raised to the power of the value (e^x)
  • LN: Natural log of the value (ln(x))
  • SQRT: Square root of a value

TIMECONVERT
TIMECONVERT

Takes 3 arguments, converts the value into another time unit.


Examples
TIMECONVERT(time, 'MILLISECONDS', 'SECONDS') - This expression converts the value of column time (taken to be in milliseconds) to seconds, i.e. rounded down to the nearest second not exceeding the column value.
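
As a usage sketch, TIMECONVERT can serve as a grouping expression to bucket rows by the converted time (column names are illustrative):

SELECT COUNT(*) FROM myTable
  GROUP BY TIMECONVERT(time, 'MILLISECONDS', 'HOURS')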

DATETIMECONVERT

Takes 4 arguments, converts the value into another date time format, and buckets time based on the given time granularity.

DATETIMECONVERT(columnName, inputFormat, outputFormat, outputGranularity)

where,

columnName - column name to convert
inputFormat - format of the column columnName
outputFormat - format of the result desired after conversion
outputGranularity - the granularity in which to bucket the result

Format is expressed as <time size>:<time unit>:<time format>:<pattern>
where,

time size - size of the time unit eg: 1, 10

time unit - HOURS, DAYS etc

time format - EPOCH or SIMPLE_DATE_FORMAT

pattern - this is defined in case of SIMPLE_DATE_FORMAT. eg: yyyyMMdd. A specific timezone can be passed using tz(timezone).

timezone - can be expressed as long form tz(Asia/Kolkata), or short form tz(IST) or in terms of GMT tz(GMT+0530). Default is UTC. It is recommended to use long form timezone, as short forms are ambiguous with daylight savings (eg: PDT works during daylight savings, PST otherwise)

Granularity is expressed as <time size>:<time unit>

Examples

1) To convert column "Date" from hoursSinceEpoch to daysSinceEpoch and bucket it to 1 day granularity
dateTimeConvert(Date, '1:HOURS:EPOCH', '1:DAYS:EPOCH', '1:DAYS')

2) To simply bucket millis "Date" to 15 minutes granularity
dateTimeConvert(Date, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '15:MINUTES')

3) To convert column "Date" from hoursSinceEpoch to format yyyyMMdd and bucket it to 1 day granularity
dateTimeConvert(Date, '1:HOURS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd', '1:DAYS')

4) To convert column "Date" from format yyyy/MM/dd to weeksSinceEpoch and bucket it to 1 week granularity
dateTimeConvert(Date, '1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd', '1:WEEKS:EPOCH', '1:WEEKS')

5) To convert column "Date" from millis to format yyyyMMdd in timezone PST
dateTimeConvert(Date, '1:MILLISECONDS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd tz(America/Los_Angeles)', '1:DAYS')

DATETRUNC

(Presto) SQL compatible date truncation, equivalent to the Presto function date_trunc. Takes at least 3 and up to 5 arguments, and converts the value into a specified output granularity (as time since UTC epoch), bucketed on a unit in a specified timezone.

Examples
DATETRUNC('week', time_in_seconds, 'SECONDS') - This expression takes the column time_in_seconds (a long containing seconds since UTC epoch) and truncates it at WEEK (where a week starts at Monday UTC midnight). The output is a long containing seconds since UTC epoch.

DATETRUNC('quarter', DIV(time_milliseconds, 1000), 'SECONDS', 'America/Los_Angeles', 'HOURS') - This expression converts the expression time_milliseconds/1000 (which is thus in seconds) into hours, truncated at QUARTER in the Los Angeles timezone (where a quarter begins on 1/1, 4/1, 7/1, 10/1 in that timezone). The output is expressed as hours since UTC epoch (note that the output is not in the Los Angeles timezone).
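
As a usage sketch, DATETRUNC can likewise be used as a grouping expression (reusing time_in_seconds from the first example):

SELECT MAX(foo) FROM myTable
  GROUP BY DATETRUNC('week', time_in_seconds, 'SECONDS')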

ARRAYLENGTH Returns the length of a multi-value column
VALUEIN Takes at least 2 arguments, where the first argument is a multi-valued column and the following arguments are constant values. The transform function filters values from the multi-valued column, keeping only those equal to the given constant values. The VALUEIN transform function is especially useful when the same multi-valued column is both a filtering column and a grouping column; see the sketch after the example below.

Examples
VALUEIN(mvColumn, 3, 5, 15)
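
A sketch of the filter-and-group pattern described above, in which the same multi-valued column is both filtered and grouped on (values are illustrative):

SELECT COUNT(*) FROM myTable
  WHERE mvColumn IN (3, 5, 15)
  GROUP BY VALUEIN(mvColumn, 3, 5, 15)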
JSONEXTRACTSCALAR

JSONEXTRACTSCALAR(jsonField, 'jsonPath', 'resultsType') evaluates the jsonPath on jsonField (a string containing JSON) and returns the result as the type resultsType

jsonField is a String field containing a JSON document.

jsonPath is a JsonPath expression to read from the JSON document.

resultsType refers to the result's data type; it can be INT, LONG, FLOAT, DOUBLE, STRING, INT_ARRAY, LONG_ARRAY, FLOAT_ARRAY, DOUBLE_ARRAY, or STRING_ARRAY.

Examples

JSONEXTRACTSCALAR(profile_json_str, '$.name', 'STRING') -> "bob"

JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT') -> 37
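
As a usage sketch, the extracted scalar can also serve as a grouping expression (reusing profile_json_str from the examples above):

SELECT COUNT(*) FROM myTable
  GROUP BY JSONEXTRACTSCALAR(profile_json_str, '$.age', 'INT')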

JSONEXTRACTKEY

JSONEXTRACTKEY(jsonField, 'jsonPath') extracts all field names based on jsonPath as a STRING_ARRAY.

jsonField is a String field containing a JSON document.

jsonPath is a JsonPath expression to read from the JSON document.

Examples

JSONEXTRACTKEY(profile_json_str, '$.*') -> ["name", "age", "phone"...]

GROOVY

GROOVY('result value metadata json', 'groovy script', arg0, arg1, arg2...)

This function executes the groovy script using the arguments provided and returns a result that conforms to the provided result value metadata.

1st argument - a JSON string representing result value metadata. Must contain the non-null keys returnType and isSingleValue.

2nd argument - a groovy script string, which uses arg0, arg1, arg2, etc. to refer to the arguments provided within the script

remaining arguments - Pinot columns/other transform functions which are arguments to the groovy script

Examples: (these are just illustrations, you can write practically any groovy script)

groovy('{"returnType":"INT","isSingleValue":true}', 'arg0 + arg1', colA, colB) - adds colA and colB and returns a single-value INT

groovy('{"returnType":"INT","isSingleValue":true}', 'arg0.toList().max()', mvColumn) - finds the max element in the mvColumn array and returns a single-value INT

groovy('{"returnType":"LONG","isSingleValue":false}', 'arg0.findIndexValues{ it > 5 }', mvColumn) - finds the indexes of all elements in the array mvColumn that are greater than 5 and returns them as a multi-value LONG column

groovy('{"returnType":"DOUBLE","isSingleValue":true}', 'arg0 * arg1', arraylength(mvColumn), colB) - multiplies the length of the array mvColumn by colB and returns a single-value DOUBLE

groovy( '{"returnType":"DOUBLE","isSingleValue":true}', 'def x = 0; arg0.eachWithIndex{item, idx-> if (item == "foo") {x = x + arg1[idx] }}; return x' , mvColumnA, mvColumnB) - finds all indexes in mvColumnA which have the value "foo" and sums the values of mvColumnB at those indexes

groovy('{"returnType":"FLOAT", "isSingleValue":true}', 'def result; switch(arg0.length()) { case 10: result = 1.1; break; case 20: result = 1.2; break; default: result = 1.3;}; return result.floatValue()', mvCol) - a switch case which returns a FLOAT value depending on the length of the mvCol array

groovy('{"returnType":"STRING","isSingleValue":true}', 'new Date().format("yyyyMMdd")') - a groovy script which takes no arguments

Differences with SQL

{% hint style="info" %} These differences apply only to the PQL endpoint. They do not hold for the standard-SQL endpoint, which is the recommended endpoint. More information about the two types of endpoints can be found in Querying Pinot. {% endhint %}

  • TOP works like LIMIT for truncation in group by queries
  • No need to select the columns to group with. The following two queries are both supported in PQL; the non-aggregation columns in the select list are ignored.
SELECT MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM mytable
  GROUP BY bar, baz
  TOP 50

SELECT bar, baz, MIN(foo), MAX(foo), SUM(foo), AVG(foo) FROM mytable
  GROUP BY bar, baz
  TOP 50
  • The results will always be ordered by the aggregated value (descending). The results for the query
SELECT MIN(foo), MAX(foo) FROM myTable
  GROUP BY bar
  TOP 50

will be the same as combining the results from the following queries

SELECT MIN(foo) FROM myTable
  GROUP BY bar
  TOP 50
SELECT MAX(foo) FROM myTable
  GROUP BY bar
  TOP 50

where we don’t put the results for the same group together.

  • No support for ORDER BY in aggregation group by. However, ORDER BY support was added recently and is available in the standard-SQL endpoint. It can be used in the PQL endpoint by passing queryOptions into the payload as follows:
{
  "pql" : "SELECT SUM(foo), SUM(bar) from myTable GROUP BY moo ORDER BY SUM(bar) ASC, moo DESC TOP 10",
  "queryOptions" : "groupByMode=sql;responseFormat=sql"
}

where,

  • groupByMode=sql - standard SQL way of executing group by, hence accepting ORDER BY
  • responseFormat=sql - standard SQL way of displaying results, in a tabular manner