Merge pull request awslabs#180 from awslabs/kafka_ingestor
ingesting data to kafka topic
sethusrinivasan authored Mar 28, 2024
2 parents 3204ea4 + 93784ea commit eb72be8
Showing 6 changed files with 2,485 additions and 15 deletions.
35 changes: 20 additions & 15 deletions README.md
## Amazon Timestream Tools and Samples

[Amazon Timestream](https://aws.amazon.com/timestream/) is a fast, scalable, fully managed, purpose-built time series database that makes it easy to store and
analyze trillions of time series data points per day. Amazon Timestream saves you time and cost in managing the
lifecycle of time series data by keeping recent data in memory and moving historical data to a cost-optimized storage
tier based on user-defined policies. Amazon Timestream’s purpose-built query engine lets you access and analyze
recent and historical data together, without having to specify its location. Amazon Timestream has built-in time series
analytics functions, helping you identify trends and patterns in your data in near real time. Timestream is serverless
and automatically scales up or down to adjust capacity and performance. Because you don’t need to manage the
underlying infrastructure, you can focus on optimizing and building your applications.

Amazon Timestream also integrates with commonly used services for data collection, visualization, and machine learning.
You can send data to Amazon Timestream using AWS IoT Core, Amazon Kinesis, Amazon MSK, and open source Telegraf.
You can visualize data using Amazon QuickSight, Grafana, and business intelligence tools through JDBC. You can also use
Amazon SageMaker with Amazon Timestream for machine learning. For more information on how to use Amazon Timestream, see the [AWS documentation](https://docs.aws.amazon.com/timestream/latest/developerguide/index.html).
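The write path described above can be sketched with the AWS SDK for Python (boto3). This is a minimal illustration, not code from this repository; the database, table, dimension, and measure names are hypothetical:

```python
import time


def build_record(measure_name, measure_value, dimensions):
    """Shape a single record for the Timestream WriteRecords API."""
    return {
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "MeasureName": measure_name,
        "MeasureValue": str(measure_value),
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # milliseconds since the epoch
    }


if __name__ == "__main__":
    import boto3  # assumes AWS credentials and region are configured

    client = boto3.client("timestream-write")
    client.write_records(
        DatabaseName="sampleDB",        # hypothetical database
        TableName="devops_metrics",     # hypothetical table
        Records=[build_record("cpu_utilization", 42.5, {"host": "host-1"})],
    )
```

Keeping the record shaping in a pure function makes it easy to unit test without touching AWS.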

This repository contains sample applications, plugins, notebooks, data connectors, and adapters to help you get
started with Amazon Timestream and to enable you to use Amazon Timestream with other tools and services.

## Sample applications
This repository contains fully functional sample applications to help you get started with Amazon Timestream.

The getting started application shows how to create a database and table, populate the table with ~126K rows of sample data, and run sample queries.
This sample application is currently available for the following programming languages:

* [Getting started with Java](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/sample_apps/java/)
* [Getting started with .NET](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/sample_apps/dotnet/)
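Loading the ~126K sample rows cannot happen in a single call: the Timestream `WriteRecords` API accepts at most 100 records per request, so the getting started applications write in batches. A minimal batching sketch (the `rows` placeholder and the database/table names are hypothetical):

```python
def batches(records, size=100):
    """Split records into chunks of at most `size`,
    matching the 100-record limit of WriteRecords."""
    return [records[i:i + size] for i in range(0, len(records), size)]


if __name__ == "__main__":
    import boto3  # assumes configured AWS credentials

    client = boto3.client("timestream-write")
    rows = []  # ~126K records shaped for WriteRecords would go here
    for chunk in batches(rows):
        client.write_records(DatabaseName="sampleDB", TableName="sample", Records=chunk)
```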

To query time series data using Amazon Timestream's JDBC driver, refer to the following application:
* [Querying data with JDBC](https://github.com/awslabs/amazon-timestream-tools/tree/mainline/integrations/jdbc)

## Working with other tools and services

To continue to use your preferred data collection, analytics, visualization, and machine learning tools with Amazon Timestream, refer to the following:

* [Analyzing time series data with Amazon SageMaker Notebooks](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/integrations/sagemaker/)
* [Sending data to Amazon Timestream using AWS IoT Core](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/integrations/iot_core/)
* [Sending data to Amazon Timestream using open source Telegraf](https://github.com/awslabs/amazon-timestream-tools/tree/mainline/integrations/telegraf/)
* [Sending data to Amazon Timestream using Apache Flink](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/integrations/flink_connector/)
* [Sending data to Amazon Timestream using Apache Kafka](https://github.com/awslabs/amazon-timestream-tools/tree/mainline/integrations/kafka_connector/)
* [Writing and Querying with AWS SDK for pandas (awswrangler)](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/integrations/pandas/)


## Data ingestion and query tools
To understand the performance and scale capabilities of Amazon Timestream, you can run the following workload:

* [Running large scale workloads with Amazon Timestream](https://github.com/awslabs/amazon-timestream-tools/tree/mainline/tools/perf-scale-workload/)

You can use the following tools to continuously send data to Amazon Timestream:
* [Publishing data with Amazon Kinesis to send to Amazon Timestream](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/tools/python/kinesis_ingestor/)
* [Publishing data with Apache Kafka/ Amazon MSK to send to Amazon Timestream](https://github.com/awslabs/amazon-timestream-tools/blob/mainline/tools/python/kafka_ingestor/)
* [Multi-threaded continuous data generator for writing DevOps metrics into Amazon Timestream](https://github.com/awslabs/amazon-timestream-tools/tree/mainline/tools/python/continuous-ingestor/)
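The Kafka ingestor added in this commit publishes each purchase-history CSV row as a JSON message keyed by `user_id`. A hedged Python sketch of the same message shape (field names follow the CSV header used by the test plan; the topic name and broker address are placeholders, and authenticating against Amazon MSK with IAM would need an additional SASL token provider not shown here):

```python
import json
import time


def purchase_event(row):
    """Build the (key, value) pair the ingestor publishes: the Kafka key
    is user_id; the value is the row as JSON plus a current_time
    timestamp in milliseconds."""
    message = dict(row)
    message["current_time"] = int(time.time() * 1000)
    return row["user_id"], json.dumps(message)


if __name__ == "__main__":
    from kafka import KafkaProducer  # kafka-python; plaintext broker assumed here

    producer = KafkaProducer(bootstrap_servers="HOSTNAME:PORT")
    key, value = purchase_event({"user_id": "u-1", "event": "purchase", "product_id": "p-9"})
    producer.send("purchase-history", key=key.encode(), value=value.encode())
    producer.flush()
```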
195 changes: 195 additions & 0 deletions tools/java/kafka_ingestor/Purchase_History_Publishing.jmx
<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.6">
<hashTree>
<TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Test Plan">
<boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
<elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables">
<collectionProp name="Arguments.arguments"/>
</elementProp>
</TestPlan>
<hashTree>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Purchase History Thread Group">
<elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller">
<intProp name="LoopController.loops">-1</intProp>
</elementProp>
<stringProp name="ThreadGroup.num_threads">1</stringProp>
<stringProp name="ThreadGroup.ramp_time">1</stringProp>
<boolProp name="ThreadGroup.scheduler">true</boolProp>
<stringProp name="ThreadGroup.duration">300</stringProp>
<stringProp name="ThreadGroup.delay"></stringProp>
</ThreadGroup>
<hashTree>
<LoopController guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller">
<boolProp name="LoopController.continue_forever">true</boolProp>
<stringProp name="LoopController.loops">1</stringProp>
</LoopController>
<hashTree>
<JSR223Sampler guiclass="TestBeanGUI" testclass="JSR223Sampler" testname="Purchase History Ingestor">
<stringProp name="cacheKey">false</stringProp>
<stringProp name="filename"></stringProp>
<stringProp name="parameters">${channel} ${ip_address} ${session_id} ${user_id} ${event} ${user_group} ${current_time} ${query} ${product_id} ${product} ${quantityrequired}</stringProp>
<stringProp name="script">import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord
import groovy.json.JsonOutput

KafkaProducer &lt; String, String &gt; producer = null
try {
//Set connection properties
Properties properties = new Properties()
properties.put(&apos;bootstrap.servers&apos;, props.get(&apos;bootstrapServer&apos;))
properties.put(&apos;key.serializer&apos;, &apos;org.apache.kafka.common.serialization.StringSerializer&apos;)
properties.put(&apos;value.serializer&apos;, &apos;org.apache.kafka.common.serialization.StringSerializer&apos;)
properties.put(&apos;security.protocol&apos;, &apos;SASL_SSL&apos;)
properties.put(&apos;sasl.mechanism&apos;, &apos;AWS_MSK_IAM&apos;)
properties.put(&apos;sasl.jaas.config&apos;, &apos;software.amazon.msk.auth.iam.IAMLoginModule required;&apos;)
properties.put(&apos;sasl.client.callback.handler.class&apos;, &apos;software.amazon.msk.auth.iam.IAMClientCallbackHandler&apos;)
//Establish connection; producer is declared before the try so the finally block can close it safely
producer = new KafkaProducer &lt; String, String &gt; (properties)
int fail = 0
try {
message = JsonOutput.toJson([current_time: new Date().getTime(),
channel: vars.get(&apos;channel&apos;), ip_address: vars.get(&apos;ip_address&apos;),
session_id: vars.get(&apos;session_id&apos;),
user_id: vars.get(&apos;user_id&apos;),
event: vars.get(&apos;event&apos;), user_group: vars.get(&apos;user_group&apos;),
query: vars.get(&apos;query&apos;), product_id: vars.get(&apos;product_id&apos;),
product: vars.get(&apos;product&apos;), quantityrequired: vars.get(&apos;quantityrequired&apos;)])
log.info(&apos;Message to be published--&gt;: &apos; + message)
ProducerRecord &lt; String, String &gt; producerRecord =
new ProducerRecord &lt; String, String &gt;(props.get(&apos;topic&apos;), vars.get(&apos;user_id&apos;), message)
producer.send(producerRecord)
log.info(&apos;Published message&apos;)
producerRecord = null
} catch (Exception e) {
log.error(&apos;Error Publishing: &apos; + e)
fail = fail + 1
}
//Set results as per executions
if (fail &gt; 0) {
SampleResult.setErrorCount(fail)
SampleResult.setSuccessful(false)
} else {
SampleResult.setSuccessful(true)
}
} catch (Exception e) {
log.error(&apos;ERROR: &apos;, e)
SampleResult.setSuccessful(false)
} finally {
//Close only if the producer was actually created
if (producer != null) {
producer.close()
}
producer = null
props = null
}
</stringProp>
<stringProp name="scriptLanguage">groovy</stringProp>
</JSR223Sampler>
<hashTree/>
<CSVDataSet guiclass="TestBeanGUI" testclass="CSVDataSet" testname="Purchase History CSV File">
<stringProp name="delimiter">,</stringProp>
<stringProp name="fileEncoding">UTF-8</stringProp>
<stringProp name="filename">purchase_history.csv</stringProp>
<boolProp name="ignoreFirstLine">true</boolProp>
<boolProp name="quotedData">false</boolProp>
<boolProp name="recycle">true</boolProp>
<stringProp name="shareMode">shareMode.all</stringProp>
<boolProp name="stopThread">false</boolProp>
<stringProp name="variableNames">channel,ip_address,session_id,user_id,event,user_group,current_time,query,product_id,product,quantityrequired</stringProp>
</CSVDataSet>
<hashTree/>
</hashTree>
<ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
<JSR223PreProcessor guiclass="TestBeanGUI" testclass="JSR223PreProcessor" testname="UDV Check">
<stringProp name="cacheKey">true</stringProp>
<stringProp name="filename"></stringProp>
<stringProp name="parameters">${topic} ${bootstrap_server}</stringProp>
<stringProp name="script">topic = vars.get(&apos;topic&apos;)
bootstrapServer = vars.get(&apos;bootstrap_server&apos;)
stop = false
if (topic == null || &apos;null&apos;.equals(topic)) {
log.error(&apos;ERROR: topic is not passed in the arguments&apos;)
println(&apos;ERROR: topic is not passed in the arguments&apos;)
stop = true
} else {
log.info(&apos;INFO: topic is set to {}&apos;, topic)
props.put(&apos;topic&apos;, topic)
}

if (bootstrapServer == null || &apos;null&apos;.equals(bootstrapServer)) {
println(&apos;ERROR: bootstrapServer is not passed in the arguments&apos;)
log.error(&apos;ERROR: bootstrapServer is not passed in the arguments&apos;)
stop = true
} else {
log.info(&apos;INFO: bootstrapServer is set to {}&apos;, bootstrapServer)
try {
String[] splits = bootstrapServer.split(&apos;:&apos;)
String uri = splits[0]
int port = Integer.parseInt(splits[1])
log.info(&apos;INFO: URI: [{}], port: [{}]&apos;, uri, port)
props.put(&apos;bootstrapServer&apos;, bootstrapServer)
} catch (Throwable e) {
stop = true
println(&apos;ERROR: bootstrapServer &apos; + bootstrapServer +
&apos; is NOT valid. It must be of format HOSTNAME:PORT&apos;)
log.error(&apos;ERROR: bootstrapServer {} is NOT valid. It must be of format HOSTNAME:PORT&apos;, bootstrapServer)
}
}
if (stop == true) {
prev.setStopTestNow(true)
exit(1)
}</stringProp>
<stringProp name="scriptLanguage">groovy</stringProp>
</JSR223PreProcessor>
<hashTree/>
<Arguments guiclass="ArgumentsPanel" testclass="Arguments" testname="UDV">
<collectionProp name="Arguments.arguments">
<elementProp name="topic" elementType="Argument">
<stringProp name="Argument.name">topic</stringProp>
<stringProp name="Argument.value">${__P(topic,null)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
<elementProp name="bootstrap_server" elementType="Argument">
<stringProp name="Argument.name">bootstrap_server</stringProp>
<stringProp name="Argument.value">${__P(bootstrap_server,null)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
</collectionProp>
</Arguments>
<hashTree/>
</hashTree>
</hashTree>
</jmeterTestPlan>
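The `UDV Check` pre-processor in the test plan above stops the test when `topic` or `bootstrap_server` is missing, and requires `bootstrap_server` to be of the form HOSTNAME:PORT. The same validation logic, sketched in Python for clarity:

```python
def valid_bootstrap_server(value):
    """Mirror the pre-processor's check: reject missing values and
    anything not shaped like HOSTNAME:PORT with a numeric port."""
    if value is None or value == "null":  # JMeter's __P default renders as the string 'null'
        return False
    parts = value.split(":")
    if len(parts) < 2:  # no port component at all
        return False
    try:
        int(parts[1])  # the port must parse as an integer
    except ValueError:
        return False
    return True
```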