Adding Purview's Discovery REST endpoints and Version 0.10.0 (wjohnso…

Adds support for Purview's autocomplete, browse, query, and suggest endpoints. Deprecates PurviewClient.search_entities; the same generator interface is now available as PurviewClient.discovery.search_entities, which points to the /query endpoint.
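The deprecation described above can be pictured with a small, self-contained sketch. The class names `Client` and `DiscoveryClient` are hypothetical stand-ins, not the package's classes, and having the old method delegate to the new one is an assumption made for illustration; the diff only shows that a deprecation message was added to the old method.

```python
import warnings


class DiscoveryClient:
    """Hypothetical stand-in for a discovery sub-client."""

    def search_entities(self, query):
        # A real client would call the service; this stub yields canned results.
        for i in range(3):
            yield {"id": i, "query": query}


class Client:
    """Hypothetical stand-in for a top-level client."""

    def __init__(self):
        self.discovery = DiscoveryClient()

    def search_entities(self, query):
        # Warn at call time, then hand off to the new location (assumed design).
        warnings.warn(
            "Client.search_entities is being deprecated. "
            "Please use Client.discovery.search_entities instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.discovery.search_entities(query)
```

Because the old method returns the generator instead of being one itself, the warning fires as soon as the method is called, not when the first result is consumed.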
wjohnson authored Dec 14, 2021
1 parent 6f038e6 commit afe2b09
Showing 13 changed files with 467 additions and 140 deletions.
190 changes: 95 additions & 95 deletions .github/workflows/python-package.yml
@@ -1,95 +1,95 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see:

name: Python package

- master
- release/**
- 'docs/**'
- 'samples/**'
branches: [ master ]
- 'docs/**'
- 'samples/**'
types: [created]


runs-on: ubuntu-latest
python-version: [3.6, 3.7, 3.8, 3.9]

- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install flake8 pytest wheel
pip install 'openpyxl>=3.0'
pip install 'requests>=2.0'
pip install .
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 ./pyapacheatlas --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 ./pyapacheatlas --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
pytest tests/unit
runs-on: ubuntu-latest
if: github.event_name == 'release' && github.event.action == 'created'
- uses: actions/checkout@v2
- name: Set up Python 3.7
uses: actions/setup-python@v2
python-version: 3.7

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install wheel
- name: Create the artifacts
run: |
python bdist_wheel sdist
- name: What version am I using?
run: |
CODE_VERSION=$(python --version)
TAG_VERSION=$(echo $GITHUB_REF | sed 's#.*/##')
if [[ "$TAG_VERSION" == "$CODE_VERSION" ]]; then echo "Match"; else echo "No Match" && exit 1; fi
echo ::set-output name=package_version::$TAG_VERSION
id: vnum

- name: Publish to Test PyPI
if: github.event_name == 'release' && github.event.action == 'created'
uses: pypa/[email protected]
user: __token__
password: ${{ secrets.TEST_PYPI_API_TOKEN }}
verbose: true

- name: Publish to PyPI
if: github.event_name == 'release' && github.event.action == 'created'
uses: pypa/[email protected]
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see:

name: Python package

- master
- release/**
- 'docs/**'
- 'samples/**'
branches: [ master ]
- 'docs/**'
- 'samples/**'
types: [created]


runs-on: ubuntu-latest
python-version: [3.6, 3.7, 3.8, 3.9, "3.10"]

- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install flake8 pytest wheel
pip install 'openpyxl>=3.0'
pip install 'requests>=2.0'
pip install .
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 ./pyapacheatlas --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 ./pyapacheatlas --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
run: |
pytest tests/unit
runs-on: ubuntu-latest
if: github.event_name == 'release' && github.event.action == 'created'
- uses: actions/checkout@v2
- name: Set up Python 3.7
uses: actions/setup-python@v2
python-version: 3.7

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install wheel
- name: Create the artifacts
run: |
python bdist_wheel sdist
- name: What version am I using?
run: |
CODE_VERSION=$(python --version)
TAG_VERSION=$(echo $GITHUB_REF | sed 's#.*/##')
if [[ "$TAG_VERSION" == "$CODE_VERSION" ]]; then echo "Match"; else echo "No Match" && exit 1; fi
echo ::set-output name=package_version::$TAG_VERSION
id: vnum

- name: Publish to Test PyPI
if: github.event_name == 'release' && github.event.action == 'created'
uses: pypa/[email protected]
user: __token__
password: ${{ secrets.TEST_PYPI_API_TOKEN }}
verbose: true

- name: Publish to PyPI
if: github.event_name == 'release' && github.event.action == 'created'
uses: pypa/[email protected]
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
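The "What version am I using?" step compares the release tag with the version reported by the code and fails the job on a mismatch. A minimal Python sketch of the same comparison follows; the function names are hypothetical, and the example ref uses an illustrative tag of `0.10.0`.

```python
def tag_from_ref(github_ref: str) -> str:
    """Return the text after the last '/' in a ref, like sed 's#.*/##'."""
    return github_ref.rsplit("/", 1)[-1]


def versions_match(github_ref: str, code_version: str) -> bool:
    """True when the release tag equals the package version string."""
    return tag_from_ref(github_ref) == code_version.strip()


if __name__ == "__main__":
    # Illustrative values only; a real run would read GITHUB_REF from the environment.
    print(versions_match("refs/tags/0.10.0", "0.10.0"))
```

A ref without any slash is returned unchanged, which mirrors how the `sed` expression behaves when its pattern does not match.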
62 changes: 40 additions & 22 deletions
@@ -1,19 +1,24 @@
# PyApacheAtlas: API Support for Azure Purview and Apache Atlas
# PyApacheAtlas: A Python SDK for Azure Purview and Apache Atlas

A python package to work with the Azure Purview and Apache Atlas API. Supporting bulk loading, custom lineage, and more from a Pythonic set of classes and Excel templates.
![PyApacheAtlas Logo](

PyApacheAtlas lets you work with the Azure Purview and Apache Atlas APIs in a Pythonic way. It supports bulk loading, custom lineage, custom type definitions, and more through an SDK and Excel templates.

The package supports programmatic interaction and an Excel template for low-code uploads.

The Excel template provides a means to:
* Bulk upload entities
## Using Excel to Accelerate Metadata Uploads

* Bulk upload entities.
* Upload entities / assets for built-in or custom types.
* Supports adding glossary terms to entities.
* Supports adding classifications to entities.
* Supports creating relationships between entities (e.g. columns of a table).
* Creating custom lineage between two existing entities and using the Azure Purview Column Mappings / Lineage feature.
* Bulk upload of type definitions.
* Bulk upload of classification definitions (Purview Classification rules are not currently supported).
* Creating custom table and complex column level lineage in the [Hive Bridge style](
* Supports Azure Purview ColumnMapping Attributes.
* Creating custom lineage between existing entities.
* Defining Purview Column Mappings / Column Lineage.
* Bulk upload custom type definitions.
* Bulk upload of classification definitions (Purview Classification Rules not supported).

## Using the Pythonic SDK for Purview and Atlas

The PyApacheAtlas package itself supports those operations and more for the advanced user:
* Programmatically create Entities, Types (Entity, Relationship, etc.).
@@ -31,14 +36,13 @@ The PyApacheAtlas package itself supports those operations and more for the adva
* Able to create arbitrary relationships between entities.
* e.g. associating a given column with a table.
* Deleting types (by name) or entities (by guid).
* Creating a column lineage scaffolding as in the Hive Bridge Style.
* Performing "What-If" analysis to check if...
* Your entities are valid types.
* Your entities are missing required attributes.
* Your entities are using undefined attributes.
* Search (only for Azure Purview advanced search).
* Authentication to Azure Purview via Service Principal.
* Authentication using basic authentication of username and password for open source Atlas.
* Azure Purview's Search: query, autocomplete, suggest, browse.
* Authentication to Azure Purview using azure-identity and Service Principal.
* Authentication to Apache Atlas using basic authentication of username and password.

## Quickstart

@@ -48,10 +52,29 @@ The PyApacheAtlas package itself supports those operations and more for the adva
python -m pip install pyapacheatlas

### Create a Purview Client Connection
### Using Azure-Identity and the Azure CLI to Connect to Purview

For connecting to Azure Purview, it's even more convenient to install the [azure-identity]( package and its support for Managed Identity, Environment Credential, and Azure CLI credential.

If you want to use your Azure CLI credential rather than a service principal, install azure-identity by running `pip install azure-identity` and then run the code below.

from azure.identity import AzureCliCredential
from pyapacheatlas.core import PurviewClient

cred = AzureCliCredential()
# Create a client to connect to your service.
client = PurviewClient(
    account_name = "Your-Purview-Account-Name",
    authentication = cred
)

### Create a Purview Client Connection Using Service Principal

Provides connectivity to your Atlas / Azure Purview service.
Supports getting and uploading entities and type defs.
If you don't want to install any additional packages, you should use the built-in ServicePrincipalAuthentication class.

from pyapacheatlas.auth import ServicePrincipalAuthentication
@@ -70,11 +93,6 @@ client = PurviewClient(

For users wanting to use the `AtlasClient` and Purview, the Atlas Endpoint for
Purview is `https://{your_purview_name}`.
The PurviewClient abstracts away having to know the endpoint url and is
the better way to use this package with Purview.

### Create Entities "By Hand"

You can also create your own entities by hand with the helper `AtlasEntity` class.
@@ -119,4 +137,4 @@ Learn more about the Excel [features and configuration in the wiki](https://gith
* Learn more about this package in the [github wiki](
* The [Apache Atlas REST API](
* The [Purview CLI Package]( provides CLI support.
* Purview [REST API Official Docs](
* Purview [REST API Official Docs](
2 changes: 1 addition & 1 deletion pyapacheatlas/
@@ -1 +1 @@
__version__ = "0.9.1"
__version__ = "0.10.0"
6 changes: 6 additions & 0 deletions pyapacheatlas/core/
@@ -1,5 +1,6 @@
from .util import AtlasException, AtlasBaseClient, batch_dependent_entities, PurviewLimitation, PurviewOnly
from .glossary import _CrossPlatformTerm, GlossaryClient, PurviewGlossaryClient
from .discovery.purview import PurviewDiscoveryClient
from .typedef import BaseTypeDef
from .msgraph import MsGraphClient
from .entity import AtlasClassification, AtlasEntity
@@ -1254,6 +1255,7 @@ def upload_relationship(self, relationship):

return results

# TODO: Remove at 1.0.0 release
def _search_generator(self, search_params, starting_offset=0):
Generator to page through the search query results.
@@ -1300,6 +1302,9 @@ def search_entities(self, query, limit=50, search_filter=None, starting_offset=0
:return: The results of your search as a generator.
:rtype: Iterator(dict)
# TODO: Remove at 1.0.0 release
"PurviewClient.search_entities is being deprecated. Please use PurviewClient.discovery.search_entities instead.")

if limit > 1000 or limit < 1:
raise ValueError(
@@ -1508,6 +1513,7 @@ def __init__(self, account_name, authentication=None):

self.glossary = PurviewGlossaryClient(endpoint_url, authentication)
self.msgraph = MsGraphClient(authentication)
self.discovery = PurviewDiscoveryClient(f"https://{account_name.lower()}", authentication)

def get_entity_next_lineage(self, guid, direction, getDerivedLineage=False, offset=0, limit=-1):
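The hunks above touch a generator that pages through search results and a guard that rejects a `limit` outside 1 to 1000. A minimal sketch of that pattern follows. It is not the package's implementation: `paged_search` and the `fetch_page` callable are hypothetical names, and the real client would issue REST calls where this sketch calls `fetch_page`.

```python
from typing import Callable, Dict, Iterator, List


def paged_search(
    fetch_page: Callable[[int, int], List[Dict]],
    limit: int = 50,
    starting_offset: int = 0,
) -> Iterator[Dict]:
    """Yield results one at a time, requesting a new page when one runs out."""
    # Same bounds as the guard shown in the diff. Because this function is a
    # generator, the check runs when the first item is requested.
    if limit > 1000 or limit < 1:
        raise ValueError("limit must be between 1 and 1000 inclusive")
    offset = starting_offset
    while True:
        page = fetch_page(offset, limit)  # hypothetical page fetcher
        if not page:
            return  # an empty page means there is nothing left to read
        yield from page
        offset += len(page)


if __name__ == "__main__":
    data = [{"id": i} for i in range(5)]

    def fake_fetch(offset: int, limit: int) -> List[Dict]:
        # Stand-in for a service call: slice a local list.
        return data[offset:offset + limit]

    for row in paged_search(fake_fetch, limit=2):
        print(row)
```

Advancing the offset by the number of rows actually returned, instead of by `limit`, keeps the loop correct when the final page is short.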
1 change: 1 addition & 0 deletions pyapacheatlas/core/discovery/
@@ -0,0 +1 @@
from .purview import PurviewDiscoveryClient
