Skip to content

Commit

Permalink
支持Phoenix5.x版本读写插件
Browse files Browse the repository at this point in the history
  • Loading branch information
bake.snn committed Mar 11, 2019
1 parent d4d1ea6 commit db0333b
Show file tree
Hide file tree
Showing 25 changed files with 2,072 additions and 0 deletions.
164 changes: 164 additions & 0 deletions hbase20xsqlreader/doc/hbase20xsqlreader.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
# hbase20xsqlreader 插件文档


___



## 1 快速介绍

hbase20xsqlreader插件实现了从Phoenix(HBase SQL)读取数据,对应版本为HBase2.X和Phoenix5.X。

## 2 实现原理

简而言之,hbase20xsqlreader通过Phoenix轻客户端去连接Phoenix QueryServer,并根据用户配置信息生成查询SELECT 语句,然后发送到QueryServer读取HBase数据,并将返回结果使用DataX自定义的数据类型拼装为抽象的数据集,最终传递给下游Writer处理。

## 3 功能说明

### 3.1 配置样例

* 配置一个从Phoenix同步抽取数据到本地的作业:

```
{
"job": {
"content": [
{
"reader": {
"name": "hbase20xsqlreader", //指定插件为hbase20xsqlreader
"parameter": {
"queryServerAddress": "http://127.0.0.1:8765", //填写连接Phoenix QueryServer地址
"serialization": "PROTOBUF", //QueryServer序列化格式
"table": "TEST", //读取表名
"column": ["ID", "NAME"], //所要读取列名
"splitKey": "ID" //切分列,必须是表主键
}
},
"writer": {
"name": "streamwriter",
"parameter": {
"encoding": "UTF-8",
"print": true
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}
```


### 3.2 参数说明

* **queryServerAddress**

* 描述:hbase20xsqlreader需要通过Phoenix轻客户端去连接Phoenix QueryServer,因此这里需要填写对应QueryServer地址。

* 必选:是 <br />

* 默认值:无 <br />

* **serialization**

* 描述:QueryServer使用的序列化协议

* 必选:否 <br />

* 默认值:PROTOBUF <br />

* **table**

* 描述:所要读取表名

* 必选:是 <br />

* 默认值:无 <br />

* **schema**

* 描述:表所在的schema

* 必选:否 <br />

* 默认值:无 <br />
* **column**

* 描述:填写需要从phoenix表中读取的列名集合,使用JSON的数组描述字段信息,空值表示读取所有列。

* 必选: 否<br />

* 默认值:全部列 <br />

* **splitKey**

* 描述:读取表时对表进行切分并行读取,切分时有两种方式:1.根据该列的最大最小值按照指定channel个数均分,这种方式仅支持整形和字符串类型切分列;2.根据设置的splitPoint进行切分

* 必选:是 <br />

* 默认值:无 <br />

* **splitPoints**

* 描述:由于根据切分列最大最小值切分时不能保证避免数据热点,splitKey支持用户根据数据特征动态指定切分点,对表数据进行切分。建议切分点根据Region的startkey和endkey设置,保证每个查询对应单个Region

* 必选: 否<br />

* 默认值:无 <br />

* **where**

* 描述:支持对表查询增加过滤条件,每个切分都会携带该过滤条件。

* 必选: 否<br />

* 默认值:无<br />

* **querySql**

* 描述:支持指定多个查询语句,但查询列类型和数目必须保持一致,用户可根据实际情况手动输入表查询语句或多表联合查询语句,设置该参数后,除queryserverAddress参数必须设置外,其余参数将失去作用或可不设置。

* 必选: 否<br />

* 默认值:无<br />


### 3.3 类型转换

目前hbase20xsqlreader支持大部分Phoenix类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。

下面列出MysqlReader针对Mysql类型转换列表:


| DataX 内部类型| Phoenix 数据类型 |
| -------- | ----- |
| String |CHAR, VARCHAR|
| Bytes |BINARY, VARBINARY|
| Bool |BOOLEAN |
| Long |INTEGER, TINYINT, SMALLINT, BIGINT |
| Double |FLOAT, DECIMAL, DOUBLE, |
| Date |DATE, TIME, TIMESTAMP |



## 4 性能报告


## 5 约束限制

* 切分表时切分列仅支持单个列,且该列必须是表主键
* 不设置splitPoint默认使用自动切分,此时切分列仅支持整形和字符型
* 表名和SCHEMA名及列名大小写敏感,请与Phoenix表实际大小写保持一致
* 仅支持通过Phoenix QeuryServer读取数据,因此您的Phoenix必须启动QueryServer服务才能使用本插件

## 6 FAQ

***


116 changes: 116 additions & 0 deletions hbase20xsqlreader/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>datax-all</artifactId>
<groupId>com.alibaba.datax</groupId>
<version>0.0.1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>hbase20xsqlreader</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<properties>
<phoenix.version>5.0.0-HBase-2.0</phoenix.version>
</properties>

<dependencies>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-common</artifactId>
<version>${datax-project-version}</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-queryserver</artifactId>
<version>${phoenix.version}</version>
<exclusions>
<exclusion>
<artifactId>servlet-api</artifactId>
<groupId>javax.servlet</groupId>
</exclusion>
</exclusions>
</dependency>


<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>2.0.44-beta</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-core</artifactId>
<version>${datax-project-version}</version>
<exclusions>
<exclusion>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-service-face</artifactId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.alibaba.datax</groupId>
<artifactId>plugin-rdbms-util</artifactId>
<version>0.0.1-SNAPSHOT</version>
<scope>compile</scope>
</dependency>
</dependencies>

<build>
<resources>
<resource>
<directory>src/main/java</directory>
<includes>
<include>**/*.properties</include>
</includes>
</resource>
</resources>
<plugins>
<!-- compiler plugin -->
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>${project-sourceEncoding}</encoding>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptors>
<descriptor>src/main/assembly/package.xml</descriptor>
</descriptors>
<finalName>datax</finalName>
</configuration>
<executions>
<execution>
<id>dwzip</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
35 changes: 35 additions & 0 deletions hbase20xsqlreader/src/main/assembly/package.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
<id></id>
<formats>
<format>dir</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<fileSet>
<directory>src/main/resources</directory>
<includes>
<include>plugin.json</include>
<include>plugin_job_template.json</include>
</includes>
<outputDirectory>plugin/reader/hbase20xsqlreader</outputDirectory>
</fileSet>
<fileSet>
<directory>target/</directory>
<includes>
<include>hbase20xsqlreader-0.0.1-SNAPSHOT.jar</include>
</includes>
<outputDirectory>plugin/reader/hbase20xsqlreader</outputDirectory>
</fileSet>
</fileSets>

<dependencySets>
<dependencySet>
<useProjectArtifact>false</useProjectArtifact>
<outputDirectory>plugin/reader/hbase20xsqlreader/libs</outputDirectory>
<scope>runtime</scope>
</dependencySet>
</dependencySets>
</assembly>
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
package com.alibaba.datax.plugin.reader.hbase20xsqlreader;

public class Constant {
public static final String PK_TYPE = "pkType";

public static final Object PK_TYPE_STRING = "pkTypeString";

public static final Object PK_TYPE_LONG = "pkTypeLong";

public static final String DEFAULT_SERIALIZATION = "PROTOBUF";

public static final String CONNECT_STRING_TEMPLATE = "jdbc:phoenix:thin:url=%s;serialization=%s";

public static final String CONNECT_DRIVER_STRING = "org.apache.phoenix.queryserver.client.Driver";

public static final String SELECT_COLUMNS_TEMPLATE = "SELECT COLUMN_NAME, COLUMN_FAMILY FROM SYSTEM.CATALOG WHERE TABLE_NAME='%s' AND COLUMN_NAME IS NOT NULL";

public static String QUERY_SQL_TEMPLATE_WITHOUT_WHERE = "select %s from %s ";

public static String QUERY_SQL_TEMPLATE = "select %s from %s where (%s)";

public static String QUERY_MIN_MAX_TEMPLATE = "SELECT MIN(%s),MAX(%s) FROM %s";

public static String QUERY_COLUMN_TYPE_TEMPLATE = "SELECT %s FROM %s LIMIT 1";

public static String QUERY_SQL_PER_SPLIT = "querySqlPerSplit";

}
Loading

0 comments on commit db0333b

Please sign in to comment.