Skip to content

Commit

Permalink
Merge branch 'main' into feature/codeql
Browse files Browse the repository at this point in the history
  • Loading branch information
aleks-ivanov committed Jan 14, 2025
2 parents ed9476f + 91212b9 commit 9f7e852
Show file tree
Hide file tree
Showing 196 changed files with 7,950 additions and 4,365 deletions.
10 changes: 9 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ bld/

# Visual Studio 2015/2017 cache/options directory
.vs/
# Visual Studio Code configuration directory
.vscode/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/

Expand Down Expand Up @@ -185,6 +187,7 @@ PublishScripts/
*.snupkg
# The packages folder can be ignored because of Package Restore
**/[Pp]ackages/*
.packages/*
# except build/, which is used as an MSBuild target.
!**/[Pp]ackages/build/
# Uncomment if necessary however generally it will be regenerated when needed
Expand Down Expand Up @@ -372,4 +375,9 @@ hs_err_pid*
.ionide/

# Mac dev
.DS_Store
.DS_Store

# Scala intermideate build files
**/.bloop/
**/.metals/
**/.bsp/**
6 changes: 4 additions & 2 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
MIT License
The MIT License (MIT)

Copyright (c) 2019 .NET Foundation
Copyright (c) .NET Foundation and Contributors

All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
Expand Down
2 changes: 1 addition & 1 deletion NuGet.config
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,6 @@
<add key="dotnet-public" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-public/nuget/v3/index.json" />
<add key="dotnet-tools" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-tools/nuget/v3/index.json" />
<add key="dotnet-eng" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-eng/nuget/v3/index.json" />
<add key="dotnet5" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json" />
<add key="dotnet8" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8/nuget/v3/index.json" />
</packageSources>
</configuration>
13 changes: 8 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

.NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

.NET for Apache Spark runs on Windows, Linux, and macOS using .NET 6, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks) & [Azure](deployment/README.md#databricks) Databricks.
.NET for Apache Spark runs on Windows, Linux, and macOS using .NET 8, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks) & [Azure](deployment/README.md#databricks) Databricks.

**Note**: We currently have a Spark Project Improvement Proposal JIRA at [SPIP: .NET bindings for Apache Spark](https://issues.apache.org/jira/browse/SPARK-27006) to work with the community towards getting .NET support by default into Apache Spark. We highly encourage you to participate in the discussion.

Expand Down Expand Up @@ -40,7 +40,7 @@
<tbody align="center">
<tr>
<td>2.4*</td>
<td rowspan=4><a href="https://github.com/dotnet/spark/releases/tag/v2.1.1">v2.1.1</a></td>
<td rowspan=5><a href="https://github.com/dotnet/spark/releases/tag/v2.1.1">v2.1.1</a></td>
</tr>
<tr>
<td>3.0</td>
Expand All @@ -50,6 +50,9 @@
</tr>
<tr>
<td>3.2</td>
</tr>
<tr>
<td>3.5</td>
</tr>
</tbody>
</table>
Expand All @@ -61,7 +64,7 @@
.NET for Apache Spark releases are available [here](https://github.com/dotnet/spark/releases) and NuGet packages are available [here](https://www.nuget.org/packages/Microsoft.Spark).

## Get Started
These instructions will show you how to run a .NET for Apache Spark app using .NET 6.
These instructions will show you how to run a .NET for Apache Spark app using .NET 8.
- [Windows Instructions](docs/getting-started/windows-instructions.md)
- [Ubuntu Instructions](docs/getting-started/ubuntu-instructions.md)
- [MacOs Instructions](docs/getting-started/macos-instructions.md)
Expand All @@ -79,8 +82,8 @@ Building from source is very easy and the whole process (from cloning to being a

| | | Instructions |
| :---: | :--- | :--- |
| ![Windows icon](docs/img/windows-icon-32.png) | **Windows** | <ul><li>Local - [.NET Framework 4.6.1](docs/building/windows-instructions.md#using-visual-studio-for-net-framework-461)</li><li>Local - [.NET 6](docs/building/windows-instructions.md#using-net-core-cli-for-net-core)</li><ul> |
| ![Ubuntu icon](docs/img/ubuntu-icon-32.png) | **Ubuntu** | <ul><li>Local - [.NET 6](docs/building/ubuntu-instructions.md)</li><li>[Azure HDInsight Spark - .NET 6](deployment/README.md)</li></ul> |
| ![Windows icon](docs/img/windows-icon-32.png) | **Windows** | <ul><li>Local - [.NET Framework 4.8](docs/building/windows-instructions.md#using-visual-studio-for-net-framework)</li><li>Local - [.NET 8](docs/building/windows-instructions.md#using-net-core-cli-for-net-core)</li><ul> |
| ![Ubuntu icon](docs/img/ubuntu-icon-32.png) | **Ubuntu** | <ul><li>Local - [.NET 8](docs/building/ubuntu-instructions.md)</li><li>[Azure HDInsight Spark - .NET 8](deployment/README.md)</li></ul> |

<a name="samples"></a>
## Samples
Expand Down
6 changes: 3 additions & 3 deletions azure-pipelines-e2e-tests-template.yml
Original file line number Diff line number Diff line change
Expand Up @@ -65,10 +65,10 @@ stages:
mvn -version
- task: UseDotNet@2
displayName: 'Use .NET 6 sdk'
displayName: 'Use .NET 8 sdk'
inputs:
packageType: sdk
version: 6.x
version: 8.x
installationPath: $(Agent.ToolsDirectory)/dotnet

- task: DownloadPipelineArtifact@2
Expand All @@ -78,7 +78,7 @@ stages:
artifactName: Microsoft.Spark.Binaries

- pwsh: |
$framework = "net6.0"
$framework = "net8.0"
if ($env:AGENT_OS -eq 'Windows_NT') {
$runtimeIdentifier = "win-x64"
Expand Down
1 change: 0 additions & 1 deletion azure-pipelines-pr.yml
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,6 @@ extends:
- script: build.cmd -pack
-c $(buildConfiguration)
-ci
$(_OfficialBuildIdArgs)
/p:PublishSparkWorker=true
/p:SparkWorkerPublishDir=$(Build.ArtifactStagingDirectory)\Microsoft.Spark.Worker
displayName: '.NET build'
Expand Down
2 changes: 2 additions & 0 deletions azure-pipelines.yml
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,8 @@ stages:
variables:
${{ if and(ne(variables['System.TeamProject'], 'public'), notin(variables['Build.Reason'], 'PullRequest')) }}:
_OfficialBuildIdArgs: /p:OfficialBuildId=$(BUILD.BUILDNUMBER)
${{ else }}:
_OfficialBuildIdArgs: ''

steps:
- task: DownloadBuildArtifacts@0
Expand Down
2 changes: 1 addition & 1 deletion benchmark/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ TPCH timing results is written to stdout in the following form: `TPCH_Result,<la
<true for sql tests, false for functional tests>
```
**Note**: Ensure that you build the worker and application with .NET 6 in order to run hardware acceleration queries.
**Note**: Ensure that you build the worker and application with .NET 8 in order to run hardware acceleration queries.
## Python
1. Upload [run_python_benchmark.sh](run_python_benchmark.sh) and all [python tpch benchmark](python/) files to the cluster.
Expand Down
6 changes: 3 additions & 3 deletions benchmark/csharp/Tpch/Tpch.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFrameworks>net461;net6.0</TargetFrameworks>
<TargetFrameworks Condition="'$(OS)' != 'Windows_NT'">net6.0</TargetFrameworks>
<TargetFrameworks>net48;net8.0</TargetFrameworks>
<TargetFrameworks Condition="'$(OS)' != 'Windows_NT'">net8.0</TargetFrameworks>
<RootNamespace>Tpch</RootNamespace>
<AssemblyName>Tpch</AssemblyName>
</PropertyGroup>
Expand All @@ -16,7 +16,7 @@
</ItemGroup>

<Choose>
<When Condition="'$(TargetFramework)' == 'net6.0'">
<When Condition="'$(TargetFramework)' == 'net8.0'">
<PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>
Expand Down
6 changes: 3 additions & 3 deletions deployment/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ Microsoft.Spark.Worker is a backend component that lives on the individual worke
## Azure HDInsight Spark
[Azure HDInsight Spark](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-overview) is the Microsoft implementation of Apache Spark in the cloud that allows users to launch and configure Spark clusters in Azure. You can use HDInsight Spark clusters to process your data stored in Azure (e.g., [Azure Storage](https://azure.microsoft.com/en-us/services/storage/) and [Azure Data Lake Storage](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)).

> **Note:** Azure HDInsight Spark is Linux-based. Therefore, if you are interested in deploying your app to Azure HDInsight Spark, make sure your app is .NET Standard compatible and that you use [.NET 6 compiler](https://dotnet.microsoft.com/download) to compile your app.
> **Note:** Azure HDInsight Spark is Linux-based. Therefore, if you are interested in deploying your app to Azure HDInsight Spark, make sure your app is .NET Standard compatible and that you use [.NET 8 compiler](https://dotnet.microsoft.com/download) to compile your app.
### Deploy Microsoft.Spark.Worker
*Note that this step is required only once*
Expand Down Expand Up @@ -115,7 +115,7 @@ EOF
## Amazon EMR Spark
[Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html) is a managed cluster platform that simplifies running big data frameworks on AWS.
> **Note:** AWS EMR Spark is Linux-based. Therefore, if you are interested in deploying your app to AWS EMR Spark, make sure your app is .NET Standard compatible and that you use [.NET 6 compiler](https://dotnet.microsoft.com/download) to compile your app.
> **Note:** AWS EMR Spark is Linux-based. Therefore, if you are interested in deploying your app to AWS EMR Spark, make sure your app is .NET Standard compatible and that you use [.NET 8 compiler](https://dotnet.microsoft.com/download) to compile your app.
### Deploy Microsoft.Spark.Worker
*Note that this step is only required at cluster creation*
Expand Down Expand Up @@ -160,7 +160,7 @@ foo@bar:~$ aws emr add-steps \
## Databricks
[Databricks](http://databricks.com) is a platform that provides cloud-based big data processing using Apache Spark.
> **Note:** [Azure](https://azure.microsoft.com/en-us/services/databricks/) and [AWS](https://databricks.com/aws) Databricks is Linux-based. Therefore, if you are interested in deploying your app to Databricks, make sure your app is .NET Standard compatible and that you use [.NET 6 compiler](https://dotnet.microsoft.com/download) to compile your app.
> **Note:** [Azure](https://azure.microsoft.com/en-us/services/databricks/) and [AWS](https://databricks.com/aws) Databricks is Linux-based. Therefore, if you are interested in deploying your app to Databricks, make sure your app is .NET Standard compatible and that you use [.NET 8 compiler](https://dotnet.microsoft.com/download) to compile your app.
Databricks allows you to submit Spark .NET apps to an existing active cluster or create a new cluster everytime you launch a job. This requires the **Microsoft.Spark.Worker** to be installed **first** before you submit a Spark .NET app.
Expand Down
39 changes: 21 additions & 18 deletions docs/building/ubuntu-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,13 @@ Building Spark .NET on Ubuntu 18.04
==========================

# Table of Contents
- [Open Issues](#open-issues)
- [Pre-requisites](#pre-requisites)
- [Building Spark .NET on Ubuntu 18.04](#building-spark-net-on-ubuntu-1804)
- [Table of Contents](#table-of-contents)
- [Open Issues:](#open-issues)
- [Pre-requisites:](#pre-requisites)
- [Building](#building)
- [Building Spark .NET Scala Extensions Layer](#building-spark-net-scala-extensions-layer)
- [Building .NET Sample Applications using .NET Core CLI](#building-net-sample-applications-using-net-core-cli)
- [Building .NET Sample Applications using .NET 8 CLI](#building-net-sample-applications-using-net-8-cli)
- [Run Samples](#run-samples)

# Open Issues:
Expand All @@ -16,7 +18,7 @@ Building Spark .NET on Ubuntu 18.04

If you already have all the pre-requisites, skip to the [build](ubuntu-instructions.md#building) steps below.

1. Download and install **[.NET 6 SDK](https://dotnet.microsoft.com/en-us/download/dotnet/6.0)** - installing the SDK will add the `dotnet` toolchain to your path.
1. Download and install **[.NET 8 SDK](https://dotnet.microsoft.com/en-us/download/dotnet/8.0)** - installing the SDK will add the `dotnet` toolchain to your path.
2. Install **[OpenJDK 8](https://openjdk.java.net/install/)**
- You can use the following command:
```bash
Expand All @@ -40,7 +42,7 @@ If you already have all the pre-requisites, skip to the [build](ubuntu-instructi
```bash
mkdir -p ~/bin/maven
cd ~/bin/maven
wget https://dlcdn.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
wget https://archive.apache.org/dist/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
tar -xvzf apache-maven-3.6.3-bin.tar.gz
ln -s apache-maven-3.6.3 current
export M2_HOME=~/bin/maven/current
Expand Down Expand Up @@ -110,65 +112,66 @@ Let us now build the Spark .NET Scala extension layer. This is easy to do:
```
cd src/scala
mvn clean package
mvn clean package
```
You should see JARs created for the supported Spark versions:
* `microsoft-spark-2-3/target/microsoft-spark-2-3_2.11-<version>.jar`
* `microsoft-spark-2-4/target/microsoft-spark-2-4_2.11-<version>.jar`
* `microsoft-spark-3-0/target/microsoft-spark-3-0_2.12-<version>.jar`
* `microsoft-spark-3-0/target/microsoft-spark-3-5_2.12-<version>.jar`
## Building .NET Sample Applications using .NET 6 CLI
## Building .NET Sample Applications using .NET 8 CLI
1. Build the Worker
```bash
cd ~/dotnet.spark/src/csharp/Microsoft.Spark.Worker/
dotnet publish -f net6.0 -r linux-x64
dotnet publish -f net8.0 -r linux-x64
```
<details>
<summary>&#x1F4D9; Click to see sample console output</summary>
```bash
user@machine:/home/user/dotnet.spark/src/csharp/Microsoft.Spark.Worker$ dotnet publish -f net6.0 -r linux-x64
user@machine:/home/user/dotnet.spark/src/csharp/Microsoft.Spark.Worker$ dotnet publish -f net8.0 -r linux-x64
Microsoft (R) Build Engine version 16.0.462+g62fb89029d for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
Restore completed in 36.03 ms for /home/user/dotnet.spark/src/csharp/Microsoft.Spark.Worker/Microsoft.Spark.Worker.csproj.
Restore completed in 35.94 ms for /home/user/dotnet.spark/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj.
Microsoft.Spark -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark/Debug/netstandard2.0/Microsoft.Spark.dll
Microsoft.Spark.Worker -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/net6.0/linux-x64/Microsoft.Spark.Worker.dll
Microsoft.Spark.Worker -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/net6.0/linux-x64/publish/
Microsoft.Spark.Worker -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/net8.0/linux-x64/Microsoft.Spark.Worker.dll
Microsoft.Spark.Worker -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/net8.0/linux-x64/publish/
```
</details>
2. Build the Samples
```bash
cd ~/dotnet.spark/examples/Microsoft.Spark.CSharp.Examples/
dotnet publish -f net6.0 -r linux-x64
dotnet publish -f net8.0 -r linux-x64
```
<details>
<summary>&#x1F4D9; Click to see sample console output</summary>
```bash
user@machine:/home/user/dotnet.spark/examples/Microsoft.Spark.CSharp.Examples$ dotnet publish -f net6.0 -r linux-x64
user@machine:/home/user/dotnet.spark/examples/Microsoft.Spark.CSharp.Examples$ dotnet publish -f net8.0 -r linux-x64
Microsoft (R) Build Engine version 16.0.462+g62fb89029d for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
Restore completed in 37.11 ms for /home/user/dotnet.spark/src/csharp/Microsoft.Spark/Microsoft.Spark.csproj.
Restore completed in 281.63 ms for /home/user/dotnet.spark/examples/Microsoft.Spark.CSharp.Examples/Microsoft.Spark.CSharp.Examples.csproj.
Microsoft.Spark -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark/Debug/netstandard2.0/Microsoft.Spark.dll
Microsoft.Spark.CSharp.Examples -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/net6.0/linux-x64/Microsoft.Spark.CSharp.Examples.dll
Microsoft.Spark.CSharp.Examples -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/net6.0/linux-x64/publish/
Microsoft.Spark.CSharp.Examples -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/net8.0/linux-x64/Microsoft.Spark.CSharp.Examples.dll
Microsoft.Spark.CSharp.Examples -> /home/user/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/net8.0/linux-x64/publish/
```
</details>
# Run Samples
Once you build the samples, you can use `spark-submit` to submit your .NET 6 apps. Make sure you have followed the [pre-requisites](#pre-requisites) section and installed Apache Spark.
Once you build the samples, you can use `spark-submit` to submit your .NET 8 apps. Make sure you have followed the [pre-requisites](#pre-requisites) section and installed Apache Spark.
1. Set the `DOTNET_WORKER_DIR` or `PATH` environment variable to include the path where the `Microsoft.Spark.Worker` binary has been generated (e.g., `~/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/net6.0/linux-x64/publish`)
2. Open a terminal and go to the directory where your app binary has been generated (e.g., `~/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/net6.0/linux-x64/publish`)
1. Set the `DOTNET_WORKER_DIR` or `PATH` environment variable to include the path where the `Microsoft.Spark.Worker` binary has been generated (e.g., `~/dotnet.spark/artifacts/bin/Microsoft.Spark.Worker/Debug/net8.0/linux-x64/publish`)
2. Open a terminal and go to the directory where your app binary has been generated (e.g., `~/dotnet.spark/artifacts/bin/Microsoft.Spark.CSharp.Examples/Debug/net8.0/linux-x64/publish`)
3. Running your app follows the basic structure:
```bash
spark-submit \
Expand Down
Loading

0 comments on commit 9f7e852

Please sign in to comment.