Adding Purview's Discovery REST endpoints and Version 0.10.0 (wjohnson#180)

Supports Purview's autocomplete, browse, query, and suggest endpoints. Deprecates PurviewClient.search_entities; the same generator interface is now available as PurviewClient.discovery.search_entities, which points to the /query endpoint.
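A generator over a paged search endpoint typically requests one page at a time with an offset and a limit, yields each result, and stops on a short page. The sketch below illustrates that pattern only and is not the library's implementation: `fetch_page` is a hypothetical callable standing in for the HTTP request to `/query`, and the `{"value": [...]}` response shape is an assumption.

```python
from typing import Callable, Dict, Iterator, List


def search_generator(
    fetch_page: Callable[[int, int], Dict[str, List[dict]]],
    limit: int = 50,
    starting_offset: int = 0,
) -> Iterator[dict]:
    """Yield results one at a time, requesting `limit` items per page.

    `fetch_page(offset, limit)` is a hypothetical stand-in for a request to
    the /query endpoint; it is assumed to return {"value": [...]}.
    """
    if limit < 1 or limit > 1000:
        # Same 1..1000 bound that PurviewClient.search_entities checks in this diff.
        raise ValueError("limit must be between 1 and 1000")
    offset = starting_offset
    while True:
        page = fetch_page(offset, limit).get("value", [])
        yield from page
        if len(page) < limit:
            # A short (or empty) page means there are no more results.
            break
        offset += limit


# Fake backend with 7 results, standing in for the real service.
DATA = [{"id": i} for i in range(7)]


def fake_fetch(offset, limit):
    return {"value": DATA[offset:offset + limit]}


print([r["id"] for r in search_generator(fake_fetch, limit=3)])
```

Because the pages are fetched lazily, a caller that stops iterating early never pays for the remaining requests.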
wjohnson authored Dec 14, 2021
1 parent 6f038e6 commit afe2b09
Showing 13 changed files with 467 additions and 140 deletions.
190 changes: 95 additions & 95 deletions .github/workflows/python-package.yml
@@ -1,95 +1,95 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches:
      - master
      - release/**
    paths-ignore:
      - 'docs/**'
      - 'samples/**'
  pull_request:
    branches: [ master ]
    paths-ignore:
      - 'docs/**'
      - 'samples/**'
  release:
    types: [created]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8, 3.9]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest wheel
        pip install 'openpyxl>=3.0'
        pip install 'requests>=2.0'
        pip install .
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 ./pyapacheatlas --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 ./pyapacheatlas --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest tests/unit
  deploy:
    runs-on: ubuntu-latest
    if: github.event_name == 'release' && github.event.action == 'created'
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.7
      uses: actions/setup-python@v2
      with:
        python-version: 3.7

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install wheel
    - name: Create the artifacts
      run: |
        python setup.py bdist_wheel sdist
    - name: What version am I using?
      run: |
        CODE_VERSION=$(python setup.py --version)
        TAG_VERSION=$(echo $GITHUB_REF | sed 's#.*/##')
        if [[ "$TAG_VERSION" == "$CODE_VERSION" ]]; then echo "Match"; else echo "No Match" && exit 1; fi
        echo ::set-output name=package_version::$TAG_VERSION
      id: vnum

    - name: Publish to Test PyPI
      if: github.event_name == 'release' && github.event.action == 'created'
      uses: pypa/[email protected]
      with:
        user: __token__
        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
        repository_url: https://test.pypi.org/legacy/
        verbose: true

    - name: Publish to PyPI
      if: github.event_name == 'release' && github.event.action == 'created'
      uses: pypa/[email protected]
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
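The "What version am I using?" step fails the release unless the Git tag matches the package version. `sed 's#.*/##'` deletes everything up to and including the last `/`, so a `GITHUB_REF` such as `refs/tags/0.10.0` becomes `0.10.0`. A rough Python equivalent, for illustration only (the helper name is hypothetical):

```python
def tag_matches_version(github_ref: str, code_version: str) -> bool:
    """Hypothetical helper mirroring the workflow's shell check.

    sed 's#.*/##' removes the longest prefix ending in '/', which is the same
    as keeping the text after the last '/' (or the whole string if none).
    """
    tag_version = github_ref.rsplit("/", 1)[-1]
    return tag_version == code_version


print(tag_matches_version("refs/tags/0.10.0", "0.10.0"))   # True
print(tag_matches_version("refs/tags/v0.10.0", "0.10.0"))  # False
```

As the second call shows, a tag with a `v` prefix would not match a bare version string, so the check only passes when the tag is exactly the version.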
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python package

on:
  push:
    branches:
      - master
      - release/**
    paths-ignore:
      - 'docs/**'
      - 'samples/**'
  pull_request:
    branches: [ master ]
    paths-ignore:
      - 'docs/**'
      - 'samples/**'
  release:
    types: [created]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8, 3.9, "3.10"]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest wheel
        pip install 'openpyxl>=3.0'
        pip install 'requests>=2.0'
        pip install .
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 ./pyapacheatlas --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 ./pyapacheatlas --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest tests/unit
  deploy:
    runs-on: ubuntu-latest
    if: github.event_name == 'release' && github.event.action == 'created'
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.7
      uses: actions/setup-python@v2
      with:
        python-version: 3.7

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install wheel
    - name: Create the artifacts
      run: |
        python setup.py bdist_wheel sdist
    - name: What version am I using?
      run: |
        CODE_VERSION=$(python setup.py --version)
        TAG_VERSION=$(echo $GITHUB_REF | sed 's#.*/##')
        if [[ "$TAG_VERSION" == "$CODE_VERSION" ]]; then echo "Match"; else echo "No Match" && exit 1; fi
        echo ::set-output name=package_version::$TAG_VERSION
      id: vnum

    - name: Publish to Test PyPI
      if: github.event_name == 'release' && github.event.action == 'created'
      uses: pypa/[email protected]
      with:
        user: __token__
        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
        repository_url: https://test.pypi.org/legacy/
        verbose: true

    - name: Publish to PyPI
      if: github.event_name == 'release' && github.event.action == 'created'
      uses: pypa/[email protected]
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
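Comparing the two copies of the workflow above, the visible change is the added `"3.10"` entry in the test matrix. The quotes matter: YAML parses an unquoted `3.10` as the floating-point number 3.1, so the job would ask for Python 3.1 instead. Python's own float literals collapse the same way, which makes the pitfall easy to see without a YAML parser:

```python
# Unquoted versions behave like YAML floats: trailing zeros are dropped.
versions_unquoted = [3.6, 3.7, 3.8, 3.9, 3.10]
# Quoted versions stay as the exact strings written in the file.
versions_quoted = ["3.6", "3.7", "3.8", "3.9", "3.10"]

print([str(v) for v in versions_unquoted])  # ['3.6', '3.7', '3.8', '3.9', '3.1']
print(versions_quoted[-1])                  # 3.10
```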
62 changes: 40 additions & 22 deletions README.md
@@ -1,19 +1,24 @@
# PyApacheAtlas: API Support for Azure Purview and Apache Atlas
# PyApacheAtlas: A Python SDK for Azure Purview and Apache Atlas

A Python package to work with the Azure Purview and Apache Atlas API. It supports bulk loading, custom lineage, and more through a Pythonic set of classes and Excel templates.
![PyApacheAtlas Logo](https://repository-images.githubusercontent.com/278894029/9a92fb00-37ee-11eb-8d1a-093914a7ceeb)

PyApacheAtlas lets you work with the Azure Purview and Apache Atlas APIs in a Pythonic way. It supports bulk loading, custom lineage, custom type definitions, and more through an SDK and Excel templates / integration.

The package supports programmatic interaction and an Excel template for low-code uploads.

The Excel template provides a means to:
* Bulk upload entities
## Using Excel to Accelerate Metadata Uploads

* Bulk upload entities.
* Upload entities / assets for built-in or custom types.
* Supports adding glossary terms to entities.
* Supports adding classifications to entities.
* Supports creating relationships between entities (e.g. columns of a table).
* Creating custom lineage between two existing entities and using the Azure Purview Column Mappings / Lineage feature.
* Bulk upload of type definitions.
* Bulk upload of classification definitions (Purview Classification rules are not currently supported).
* Creating custom table and complex column level lineage in the [Hive Bridge style](https://atlas.apache.org/0.8.3/Bridge-Hive.html).
* Supports Azure Purview ColumnMapping Attributes.
* Creating custom lineage between existing entities.
* Defining Purview Column Mappings / Column Lineage.
* Bulk upload custom type definitions.
* Bulk upload of classification definitions (Purview Classification Rules not supported).

## Using the Pythonic SDK for Purview and Atlas

The PyApacheAtlas package itself supports those operations and more for the advanced user:
* Programmatically create Entities, Types (Entity, Relationship, etc.).
@@ -31,14 +36,13 @@ The PyApacheAtlas package itself supports those operations and more for the adva
* Able to create arbitrary relationships between entities.
* e.g. associating a given column with a table.
* Deleting types (by name) or entities (by guid).
* Creating a column lineage scaffolding as in the Hive Bridge Style.
* Performing "What-If" analysis to check if...
* Your entities are valid types.
* Your entities are missing required attributes.
* Your entities are using undefined attributes.
* Search (only for Azure Purview advanced search).
* Authentication to Azure Purview via Service Principal.
* Authentication using basic authentication of username and password for open source Atlas.
* Azure Purview's Search: query, autocomplete, suggest, browse.
* Authentication to Azure Purview using azure-identity credentials or a Service Principal.
* Authentication to Apache Atlas using basic authentication of username and password.
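The "What-If" checks listed above come down to comparing an entity against its type definition. The sketch below is a simplified illustration, not PyApacheAtlas's own validator: the `TYPEDEFS` dictionary and the `what_if` function are hypothetical, and a real Atlas type definition carries much more (inherited attributes via superTypes, cardinality, and so on).

```python
from typing import Dict, List

# Hypothetical, heavily simplified type definitions: attribute name -> required?
TYPEDEFS: Dict[str, Dict[str, bool]] = {
    "demo_table": {"name": True, "qualifiedName": True, "description": False},
}


def what_if(entity: dict) -> List[str]:
    """Return the problems found in `entity` (an empty list means none)."""
    type_name = entity.get("typeName")
    if type_name not in TYPEDEFS:
        return [f"undefined type: {type_name}"]
    attrs = TYPEDEFS[type_name]
    given = entity.get("attributes", {})
    # Required attributes that the entity does not supply.
    problems = [f"missing required attribute: {a}"
                for a, required in attrs.items() if required and a not in given]
    # Attributes the entity supplies that the type does not define.
    problems += [f"undefined attribute: {a}" for a in given if a not in attrs]
    return problems


entity = {"typeName": "demo_table", "attributes": {"name": "t1", "owner": "me"}}
print(what_if(entity))
# ['missing required attribute: qualifiedName', 'undefined attribute: owner']
```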

## Quickstart

@@ -48,10 +52,29 @@ The PyApacheAtlas package itself supports those operations and more for the adva
python -m pip install pyapacheatlas
```

### Create a Purview Client Connection
### Using Azure-Identity and the Azure CLI to Connect to Purview

When connecting to Azure Purview, it is often more convenient to install the [azure-identity](https://pypi.org/project/azure-identity/) package, which supports Managed Identity, Environment, and Azure CLI credentials.

If you want to use your Azure CLI credential rather than a service principal, install azure-identity by running `pip install azure-identity` and then run the code below.

```
from azure.identity import AzureCliCredential
from pyapacheatlas.core import PurviewClient
cred = AzureCliCredential()
# Create a client to connect to your service.
client = PurviewClient(
account_name = "Your-Purview-Account-Name",
authentication = cred
)
```

### Create a Purview Client Connection Using Service Principal

Provides connectivity to your Atlas / Azure Purview service.
Supports getting and uploading entities and type defs.
If you don't want to install any additional packages, you should use the built-in ServicePrincipalAuthentication class.

```
from pyapacheatlas.auth import ServicePrincipalAuthentication
@@ -70,11 +93,6 @@ client = PurviewClient(
)
```

For users wanting to use the `AtlasClient` and Purview, the Atlas Endpoint for
Purview is `https://{your_purview_name}.catalog.purview.azure.com/api/atlas/v2`.
The PurviewClient abstracts away having to know the endpoint url and is
the better way to use this package with Purview.
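Both base URLs can be derived from the account name. A small sketch, assuming the Atlas v2 pattern quoted above and the discovery base URL that `PurviewClient.__init__` builds in this commit (`https://{account_name.lower()}.purview.azure.com/catalog/api`); the helper name is hypothetical:

```python
def purview_endpoints(account_name: str) -> dict:
    """Hypothetical helper that builds the two base URLs for a Purview account."""
    return {
        # Atlas v2 endpoint, as described in the README for AtlasClient users.
        "atlas": f"https://{account_name}.catalog.purview.azure.com/api/atlas/v2",
        # Discovery (search) endpoint, matching PurviewClient.__init__ in this commit.
        "discovery": f"https://{account_name.lower()}.purview.azure.com/catalog/api",
    }


for name, url in purview_endpoints("MyPurview").items():
    print(name, url)
```

Only the discovery URL is lowercased here, because that is what the diff shows; hostnames are case-insensitive either way.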

### Create Entities "By Hand"

You can also create your own entities by hand with the helper `AtlasEntity` class.
@@ -119,4 +137,4 @@ Learn more about the Excel [features and configuration in the wiki](https://gith
* Learn more about this package in the [github wiki](https://github.com/wjohnson/pyapacheatlas/wiki/Excel-Template-and-Configuration).
* The [Apache Atlas REST API](http://atlas.apache.org/api/v2/)
* The [Purview CLI Package](https://github.com/tayganr/purviewcli) provides CLI support.
* Purview [REST API Official Docs](https://docs.microsoft.com/en-us/azure/purview/tutorial-using-rest-apis)
* Purview [REST API Official Docs](https://docs.microsoft.com/en-us/rest/api/purview/)
2 changes: 1 addition & 1 deletion pyapacheatlas/__init__.py
@@ -1 +1 @@
__version__ = "0.9.1"
__version__ = "0.10.0"
6 changes: 6 additions & 0 deletions pyapacheatlas/core/client.py
@@ -1,5 +1,6 @@
from .util import AtlasException, AtlasBaseClient, batch_dependent_entities, PurviewLimitation, PurviewOnly
from .glossary import _CrossPlatformTerm, GlossaryClient, PurviewGlossaryClient
from .discovery.purview import PurviewDiscoveryClient
from .typedef import BaseTypeDef
from .msgraph import MsGraphClient
from .entity import AtlasClassification, AtlasEntity
@@ -1254,6 +1255,7 @@ def upload_relationship(self, relationship):

        return results

    # TODO: Remove at 1.0.0 release
    def _search_generator(self, search_params, starting_offset=0):
        """
        Generator to page through the search query results.
@@ -1300,6 +1302,9 @@ def search_entities(self, query, limit=50, search_filter=None, starting_offset=0
        :return: The results of your search as a generator.
        :rtype: Iterator(dict)
        """
        # TODO: Remove at 1.0.0 release
        warnings.warn(
            "PurviewClient.search_entities is being deprecated. Please use PurviewClient.discovery.search_entities instead.")

        if limit > 1000 or limit < 1:
            raise ValueError(
@@ -1508,6 +1513,7 @@ def __init__(self, account_name, authentication=None):

        self.glossary = PurviewGlossaryClient(endpoint_url, authentication)
        self.msgraph = MsGraphClient(authentication)
        self.discovery = PurviewDiscoveryClient(f"https://{account_name.lower()}.purview.azure.com/catalog/api", authentication)

    @PurviewOnly
    def get_entity_next_lineage(self, guid, direction, getDerivedLineage=False, offset=0, limit=-1):
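The deprecation above follows a common forwarding pattern: the old method warns, then hands off to the new location so existing callers keep working. The sketch below uses hypothetical class names (`Discovery`, `Client`) and is not the library's code. One design difference is deliberate: `warnings.warn` called without a category, as in the diff, emits a `UserWarning`. Python's `DeprecationWarning` is ignored by default unless triggered from `__main__`, while `FutureWarning` is the category intended for deprecations that end users should see, so the sketch uses `FutureWarning`.

```python
import warnings
from typing import Iterator


class Discovery:
    """Hypothetical stand-in for the new discovery client."""

    def search_entities(self, query: str, limit: int = 50) -> Iterator[dict]:
        # A real implementation would page through the /query endpoint.
        yield {"query": query, "limit": limit}


class Client:
    """Hypothetical stand-in for the client that owns the old method."""

    def __init__(self) -> None:
        self.discovery = Discovery()

    def search_entities(self, query: str, limit: int = 50) -> Iterator[dict]:
        # Warn first, then forward to the new location.
        warnings.warn(
            "Client.search_entities is deprecated; use Client.discovery.search_entities.",
            FutureWarning,
            stacklevel=2,
        )
        return self.discovery.search_entities(query, limit=limit)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = list(Client().search_entities("customer"))

print(results)                      # [{'query': 'customer', 'limit': 50}]
print(caught[0].category.__name__)  # FutureWarning
```

Because the old method returns the new generator rather than being a generator itself, the warning fires as soon as it is called, not when iteration starts.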
1 change: 1 addition & 0 deletions pyapacheatlas/core/discovery/__init__.py
@@ -0,0 +1 @@
from .purview import PurviewDiscoveryClient