forked from alibaba/DataX
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
bake.snn
committed
Mar 11, 2019
1 parent
d4d1ea6
commit db0333b
Showing
25 changed files
with
2,072 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
# hbase20xsqlreader 插件文档 | ||
|
||
|
||
___ | ||
|
||
|
||
|
||
## 1 快速介绍 | ||
|
||
hbase20xsqlreader插件实现了从Phoenix(HBase SQL)读取数据,对应版本为HBase2.X和Phoenix5.X。 | ||
|
||
## 2 实现原理 | ||
|
||
简而言之,hbase20xsqlreader通过Phoenix轻客户端去连接Phoenix QueryServer,并根据用户配置信息生成查询SELECT 语句,然后发送到QueryServer读取HBase数据,并将返回结果使用DataX自定义的数据类型拼装为抽象的数据集,最终传递给下游Writer处理。 | ||
|
||
## 3 功能说明 | ||
|
||
### 3.1 配置样例 | ||
|
||
* 配置一个从Phoenix同步抽取数据到本地的作业: | ||
|
||
``` | ||
{ | ||
"job": { | ||
"content": [ | ||
{ | ||
"reader": { | ||
"name": "hbase20xsqlreader", //指定插件为hbase20xsqlreader | ||
"parameter": { | ||
"queryServerAddress": "http://127.0.0.1:8765", //填写连接Phoenix QueryServer地址 | ||
"serialization": "PROTOBUF", //QueryServer序列化格式 | ||
"table": "TEST", //读取表名 | ||
"column": ["ID", "NAME"], //所要读取列名 | ||
"splitKey": "ID" //切分列,必须是表主键 | ||
} | ||
}, | ||
"writer": { | ||
"name": "streamwriter", | ||
"parameter": { | ||
"encoding": "UTF-8", | ||
"print": true | ||
} | ||
} | ||
} | ||
], | ||
"setting": { | ||
"speed": { | ||
"channel": "3" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
|
||
### 3.2 参数说明 | ||
|
||
* **queryServerAddress** | ||
|
||
* 描述:hbase20xsqlreader需要通过Phoenix轻客户端去连接Phoenix QueryServer,因此这里需要填写对应QueryServer地址。 | ||
|
||
* 必选:是 <br /> | ||
|
||
* 默认值:无 <br /> | ||
|
||
* **serialization** | ||
|
||
* 描述:QueryServer使用的序列化协议 | ||
|
||
* 必选:否 <br /> | ||
|
||
* 默认值:PROTOBUF <br /> | ||
|
||
* **table** | ||
|
||
* 描述:所要读取表名 | ||
|
||
* 必选:是 <br /> | ||
|
||
* 默认值:无 <br /> | ||
|
||
* **schema** | ||
|
||
* 描述:表所在的schema | ||
|
||
* 必选:否 <br /> | ||
|
||
* 默认值:无 <br /> | ||
* **column** | ||
|
||
* 描述:填写需要从phoenix表中读取的列名集合,使用JSON的数组描述字段信息,空值表示读取所有列。 | ||
|
||
* 必选: 否<br /> | ||
|
||
* 默认值:全部列 <br /> | ||
|
||
* **splitKey** | ||
|
||
* 描述:读取表时对表进行切分并行读取,切分时有两种方式:1.根据该列的最大最小值按照指定channel个数均分,这种方式仅支持整形和字符串类型切分列;2.根据设置的splitPoint进行切分 | ||
|
||
* 必选:是 <br /> | ||
|
||
* 默认值:无 <br /> | ||
|
||
* **splitPoints** | ||
|
||
* 描述:由于根据切分列最大最小值切分时不能保证避免数据热点,splitKey支持用户根据数据特征动态指定切分点,对表数据进行切分。建议切分点根据Region的startkey和endkey设置,保证每个查询对应单个Region | ||
|
||
* 必选: 否<br /> | ||
|
||
* 默认值:无 <br /> | ||
|
||
* **where** | ||
|
||
* 描述:支持对表查询增加过滤条件,每个切分都会携带该过滤条件。 | ||
|
||
* 必选: 否<br /> | ||
|
||
* 默认值:无<br /> | ||
|
||
* **querySql** | ||
|
||
* 描述:支持指定多个查询语句,但查询列类型和数目必须保持一致,用户可根据实际情况手动输入表查询语句或多表联合查询语句,设置该参数后,除queryserverAddress参数必须设置外,其余参数将失去作用或可不设置。 | ||
|
||
* 必选: 否<br /> | ||
|
||
* 默认值:无<br /> | ||
|
||
|
||
### 3.3 类型转换 | ||
|
||
目前hbase20xsqlreader支持大部分Phoenix类型,但也存在部分个别类型没有支持的情况,请注意检查你的类型。 | ||
|
||
下面列出MysqlReader针对Mysql类型转换列表: | ||
|
||
|
||
| DataX 内部类型| Phoenix 数据类型 | | ||
| -------- | ----- | | ||
| String |CHAR, VARCHAR| | ||
| Bytes |BINARY, VARBINARY| | ||
| Bool |BOOLEAN | | ||
| Long |INTEGER, TINYINT, SMALLINT, BIGINT | | ||
| Double |FLOAT, DECIMAL, DOUBLE, | | ||
| Date |DATE, TIME, TIMESTAMP | | ||
|
||
|
||
|
||
## 4 性能报告 | ||
|
||
略 | ||
|
||
## 5 约束限制 | ||
|
||
* 切分表时切分列仅支持单个列,且该列必须是表主键 | ||
* 不设置splitPoint默认使用自动切分,此时切分列仅支持整形和字符型 | ||
* 表名和SCHEMA名及列名大小写敏感,请与Phoenix表实际大小写保持一致 | ||
* 仅支持通过Phoenix QeuryServer读取数据,因此您的Phoenix必须启动QueryServer服务才能使用本插件 | ||
|
||
## 6 FAQ | ||
|
||
*** | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<parent> | ||
<artifactId>datax-all</artifactId> | ||
<groupId>com.alibaba.datax</groupId> | ||
<version>0.0.1-SNAPSHOT</version> | ||
</parent> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<artifactId>hbase20xsqlreader</artifactId> | ||
<version>0.0.1-SNAPSHOT</version> | ||
<packaging>jar</packaging> | ||
|
||
<properties> | ||
<phoenix.version>5.0.0-HBase-2.0</phoenix.version> | ||
</properties> | ||
|
||
<dependencies> | ||
<dependency> | ||
<groupId>com.alibaba.datax</groupId> | ||
<artifactId>datax-common</artifactId> | ||
<version>${datax-project-version}</version> | ||
<exclusions> | ||
<exclusion> | ||
<artifactId>slf4j-log4j12</artifactId> | ||
<groupId>org.slf4j</groupId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>org.apache.phoenix</groupId> | ||
<artifactId>phoenix-queryserver</artifactId> | ||
<version>${phoenix.version}</version> | ||
<exclusions> | ||
<exclusion> | ||
<artifactId>servlet-api</artifactId> | ||
<groupId>javax.servlet</groupId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
|
||
|
||
<dependency> | ||
<groupId>junit</groupId> | ||
<artifactId>junit</artifactId> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.mockito</groupId> | ||
<artifactId>mockito-core</artifactId> | ||
<version>2.0.44-beta</version> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>com.alibaba.datax</groupId> | ||
<artifactId>datax-core</artifactId> | ||
<version>${datax-project-version}</version> | ||
<exclusions> | ||
<exclusion> | ||
<groupId>com.alibaba.datax</groupId> | ||
<artifactId>datax-service-face</artifactId> | ||
</exclusion> | ||
</exclusions> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>com.alibaba.datax</groupId> | ||
<artifactId>plugin-rdbms-util</artifactId> | ||
<version>0.0.1-SNAPSHOT</version> | ||
<scope>compile</scope> | ||
</dependency> | ||
</dependencies> | ||
|
||
<build> | ||
<resources> | ||
<resource> | ||
<directory>src/main/java</directory> | ||
<includes> | ||
<include>**/*.properties</include> | ||
</includes> | ||
</resource> | ||
</resources> | ||
<plugins> | ||
<!-- compiler plugin --> | ||
<plugin> | ||
<artifactId>maven-compiler-plugin</artifactId> | ||
<configuration> | ||
<source>1.6</source> | ||
<target>1.6</target> | ||
<encoding>${project-sourceEncoding}</encoding> | ||
</configuration> | ||
</plugin> | ||
<plugin> | ||
<artifactId>maven-assembly-plugin</artifactId> | ||
<configuration> | ||
<descriptors> | ||
<descriptor>src/main/assembly/package.xml</descriptor> | ||
</descriptors> | ||
<finalName>datax</finalName> | ||
</configuration> | ||
<executions> | ||
<execution> | ||
<id>dwzip</id> | ||
<phase>package</phase> | ||
<goals> | ||
<goal>single</goal> | ||
</goals> | ||
</execution> | ||
</executions> | ||
</plugin> | ||
</plugins> | ||
</build> | ||
</project> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
<assembly | ||
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd"> | ||
<id></id> | ||
<formats> | ||
<format>dir</format> | ||
</formats> | ||
<includeBaseDirectory>false</includeBaseDirectory> | ||
<fileSets> | ||
<fileSet> | ||
<directory>src/main/resources</directory> | ||
<includes> | ||
<include>plugin.json</include> | ||
<include>plugin_job_template.json</include> | ||
</includes> | ||
<outputDirectory>plugin/reader/hbase20xsqlreader</outputDirectory> | ||
</fileSet> | ||
<fileSet> | ||
<directory>target/</directory> | ||
<includes> | ||
<include>hbase20xsqlreader-0.0.1-SNAPSHOT.jar</include> | ||
</includes> | ||
<outputDirectory>plugin/reader/hbase20xsqlreader</outputDirectory> | ||
</fileSet> | ||
</fileSets> | ||
|
||
<dependencySets> | ||
<dependencySet> | ||
<useProjectArtifact>false</useProjectArtifact> | ||
<outputDirectory>plugin/reader/hbase20xsqlreader/libs</outputDirectory> | ||
<scope>runtime</scope> | ||
</dependencySet> | ||
</dependencySets> | ||
</assembly> |
28 changes: 28 additions & 0 deletions
28
...0xsqlreader/src/main/java/com/alibaba/datax/plugin/reader/hbase20xsqlreader/Constant.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
package com.alibaba.datax.plugin.reader.hbase20xsqlreader; | ||
|
||
public class Constant { | ||
public static final String PK_TYPE = "pkType"; | ||
|
||
public static final Object PK_TYPE_STRING = "pkTypeString"; | ||
|
||
public static final Object PK_TYPE_LONG = "pkTypeLong"; | ||
|
||
public static final String DEFAULT_SERIALIZATION = "PROTOBUF"; | ||
|
||
public static final String CONNECT_STRING_TEMPLATE = "jdbc:phoenix:thin:url=%s;serialization=%s"; | ||
|
||
public static final String CONNECT_DRIVER_STRING = "org.apache.phoenix.queryserver.client.Driver"; | ||
|
||
public static final String SELECT_COLUMNS_TEMPLATE = "SELECT COLUMN_NAME, COLUMN_FAMILY FROM SYSTEM.CATALOG WHERE TABLE_NAME='%s' AND COLUMN_NAME IS NOT NULL"; | ||
|
||
public static String QUERY_SQL_TEMPLATE_WITHOUT_WHERE = "select %s from %s "; | ||
|
||
public static String QUERY_SQL_TEMPLATE = "select %s from %s where (%s)"; | ||
|
||
public static String QUERY_MIN_MAX_TEMPLATE = "SELECT MIN(%s),MAX(%s) FROM %s"; | ||
|
||
public static String QUERY_COLUMN_TYPE_TEMPLATE = "SELECT %s FROM %s LIMIT 1"; | ||
|
||
public static String QUERY_SQL_PER_SPLIT = "querySqlPerSplit"; | ||
|
||
} |
Oops, something went wrong.