Main Functionality (wjohnson#10)
Represents the main functionality of the PyApacheAtlas package.

It closes the following issues:

* Closes wjohnson#1 by supporting column mappings via a table process attribute and maps all related columns (from one or many inputs) to that column mapping.
* Closes wjohnson#3 by supporting a classification column at the Table and Column level.
* Closes wjohnson#4 with the WhatIfValidator to find invalid attributes, missing attributes, or invalid types.
* Closes wjohnson#6 with complete docstrings and type hints that VS Code can interpret and that Sphinx can use to generate docs.
* Closes wjohnson#7 as the parameters previously specifying `atlas_typedefs` now focus on `relationship_defs` and will always expect to have the correct relationshipAttributeDefs populated from scaffolding.
* Closes wjohnson#9 with support for multiple inputs to table or columns.  Sort of gracefully handles column expressions without sources by choosing the first table process that has the desired target.
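
To make the column-mapping item (wjohnson#1) concrete, here is a minimal sketch of grouping column pairs from one or many inputs into a single mapping string. The function name `build_column_mapping`, the row layout, and the JSON keys (`DatasetMapping`, `ColumnMapping`, `Source`, `Sink`) are assumptions made for illustration; this commit does not confirm them, and this is not the package's own implementation.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def build_column_mapping(rows: List[Tuple[str, str, str, str]]) -> str:
    """Group (source_table, source_col, target_table, target_col) rows
    into one JSON string with one entry per source/target table pair.

    The JSON key names are assumed for this sketch.
    """
    grouped: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for src_table, src_col, tgt_table, tgt_col in rows:
        grouped[(src_table, tgt_table)].append({"Source": src_col, "Sink": tgt_col})

    mapping = [
        {"DatasetMapping": {"Source": src, "Sink": tgt}, "ColumnMapping": cols}
        for (src, tgt), cols in grouped.items()
    ]
    return json.dumps(mapping)


# Two input tables feed one target table.
rows = [
    ("db.orders", "id", "db.report", "order_id"),
    ("db.orders", "total", "db.report", "amount"),
    ("db.customers", "name", "db.report", "customer"),
]
column_mapping = build_column_mapping(rows)
print(column_mapping)
```

The result is a string rather than a nested object because the commit message describes the mapping as a single attribute on the table process.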
wjohnson authored Jul 24, 2020
1 parent d02b81b commit 8ac6111
Showing 44 changed files with 2,416 additions and 36 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -142,3 +142,5 @@ cython_debug/

# Personal edits
103 changes: 103 additions & 0 deletions
@@ -1,3 +1,106 @@
# PyApacheAtlas

A Python package to work with the Apache Atlas API and support bulk loading from different file types.

The package currently supports:
* Creating a column lineage scaffolding as in the Hive Bridge style.
* Creating and reading from an Excel template file.
* From Excel, constructing the defined entities and column lineages.
  * Table entities
  * Column entities
  * Table lineage processes
  * Column lineage processes
* Supports Azure Data Catalog ColumnMapping Attributes.
* Performing "What-If" analysis to check if...
  * Your entities are valid types.
  * Your entities are missing required attributes.
  * Your entities are using undefined attributes.
* Authentication to Azure Data Catalog via Service Principal.
* Authentication using basic authentication of username and password.
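
The three "What-If" checks listed above can be sketched in plain Python. This is only an illustration of the idea, not the package's `WhatIfValidator`: the function name `what_if_report` and the shape of `type_defs` (a type name mapped to sets of required and optional attribute names) are assumptions made for the example.

```python
from typing import Dict, Set


def what_if_report(entity: dict, type_defs: Dict[str, Dict[str, Set[str]]]) -> dict:
    """Check one entity dict against known type definitions.

    `type_defs` maps a type name to {"required": set, "optional": set};
    that shape is assumed for this sketch.
    """
    type_name = entity.get("typeName")
    if type_name not in type_defs:
        # Unknown type: the attribute checks cannot be performed.
        return {"valid_type": False, "missing": set(), "undefined": set()}

    required = type_defs[type_name]["required"]
    optional = type_defs[type_name]["optional"]
    present = set(entity.get("attributes", {}))

    return {
        "valid_type": True,
        "missing": required - present,               # required but absent
        "undefined": present - required - optional,  # present but not defined
    }


type_defs = {
    "demo_table": {"required": {"name", "qualifiedName"}, "optional": {"description"}},
}
entity = {"typeName": "demo_table", "attributes": {"name": "my table", "owner": "me"}}

report = what_if_report(entity, type_defs)
print(report)
```

Running a check like this before upload surfaces all three problem classes at once instead of one failed request at a time.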

## Quickstart

### Create a Client Connection

Provides connectivity to your Atlas / Data Catalog service. Supports getting and uploading entities and type defs.

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import AtlasClient
# Authenticate with an Azure service principal
auth = ServicePrincipalAuthentication(
    tenant_id = "",
    client_id = "",
    client_secret = ""
)
# Point the client at your Atlas endpoint
client = AtlasClient(
    endpoint_url = "https://MYENDPOINT/api/atlas/v2",
    auth = auth
)

### Create Entities "By Hand"

You can also create your own entities by hand with the helper `AtlasEntity` class. Convert it with `to_json` to prepare it for upload.

from pyapacheatlas.core import AtlasEntity
# Get All Type Defs
all_type_defs = client.get_all_typedefs()
# Get Specific Entities
list_of_entities = client.get_entity(guid=["abc-123-def","ghi-456-jkl"])
# Create a new entity
ae = AtlasEntity(
    name = "my table",
    typeName = "demo_table",
    qualified_name = "somedb.schema.mytable",
    guid = -1000
)
# Upload that entity with the client
upload_results = client.upload_entities([ae.to_json()])

### Create Entities from Excel

Read from a standardized Excel template to create table, column, table process, and column lineage entities. This follows (and requires) the Hive Bridge style of column lineages.

from pyapacheatlas.core import GuidTracker, TypeCategory
from pyapacheatlas.scaffolding import column_lineage_scaffold
from pyapacheatlas.scaffolding.templates import excel_template
from pyapacheatlas import from_excel
from pyapacheatlas.readers.excel import ExcelConfiguration
file_path = "./atlas_excel_template.xlsx"
# Create the Excel Template
# Populate the excel file manually!
# Generate the base atlas type defs
all_type_defs = client.get_typedefs(TypeCategory.ENTITY)
# Create the GUID tracker and Excel configuration objects
guid_tracker = GuidTracker()
excel_config = ExcelConfiguration()
# Read from the Excel file and convert its rows to entities
entities = from_excel(file_path, excel_config, guid_tracker)
# Prepare a batch by converting everything to json
batch = {"entities":[e.to_json() for e in entities]}
upload_results = client.upload_entities(batch)
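
The quickstart passes a `GuidTracker` to `from_excel` and uses a negative `guid` (`-1000`) for a hand-built entity. As far as I understand Atlas, negative guids act as temporary placeholders for entities that do not exist yet, so every new entity in a batch needs its own distinct one. The class below is a hypothetical stand-in that shows the idea; the name `PlaceholderGuids` and its methods are assumptions for this sketch, not the package's `GuidTracker` API.

```python
class PlaceholderGuids:
    """Hand out unique negative integers to use as temporary guids."""

    def __init__(self, start: int = -1000, step: int = -1) -> None:
        self._next = start
        self._step = step

    def get_guid(self) -> int:
        """Return the current placeholder and advance the counter."""
        guid = self._next
        self._next += self._step
        return guid


tracker = PlaceholderGuids()
table_guid = tracker.get_guid()   # -1000
column_guid = tracker.get_guid()  # -1001
print(table_guid, column_guid)
```

Keeping the counter in one object means tables, columns, and processes built in the same run never collide on a placeholder.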

## Additional Resources

* Learn more about this package in the GitHub wiki.
* The Apache Atlas client in Python
* The Apache Atlas REST API
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/img/column_lineage_scaffolding.png
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd


54 changes: 54 additions & 0 deletions docs/source/
@@ -0,0 +1,54 @@
# Configuration file for the Sphinx documentation builder.
# This file only contains a selection of the most common options. For a full
# list see the documentation:

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))

# -- Project information -----------------------------------------------------

project = 'PyApacheAtlas'
copyright = '2020, Will Johnson'
author = 'Will Johnson'

# The full version, including alpha/beta/rc tags
release = '0.1.0'

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'nature'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
26 changes: 26 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,26 @@
.. PyApacheAtlas documentation master file, created by
   sphinx-quickstart on Sat Jul 18 14:15:43 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to PyApacheAtlas's documentation!
=========================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
7 changes: 7 additions & 0 deletions docs/source/modules.rst
@@ -0,0 +1,7 @@

.. toctree::
   :maxdepth: 4

26 changes: 26 additions & 0 deletions docs/source/pyapacheatlas.auth.rst
@@ -0,0 +1,26 @@
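pyapacheatlas.auth package
==========================

pyapacheatlas.auth.base module
------------------------------

.. automodule:: pyapacheatlas.auth.base

pyapacheatlas.auth.basic module
-------------------------------

.. automodule:: pyapacheatlas.auth.basic

pyapacheatlas.auth.serviceprincipal module
------------------------------------------

.. automodule:: pyapacheatlas.auth.serviceprincipal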
42 changes: 42 additions & 0 deletions docs/source/pyapacheatlas.core.rst
@@ -0,0 +1,42 @@
pyapacheatlas.core package
==========================

pyapacheatlas.core.client module
--------------------------------

.. automodule:: pyapacheatlas.core.client

pyapacheatlas.core.entity module
--------------------------------

.. automodule:: pyapacheatlas.core.entity

pyapacheatlas.core.typedef module
---------------------------------

.. automodule:: pyapacheatlas.core.typedef

pyapacheatlas.core.util module
------------------------------

.. automodule:: pyapacheatlas.core.util

pyapacheatlas.core.whatif module
--------------------------------

.. automodule:: pyapacheatlas.core.whatif
18 changes: 18 additions & 0 deletions docs/source/pyapacheatlas.readers.core.rst
@@ -0,0 +1,18 @@
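pyapacheatlas.readers.core package
==================================

pyapacheatlas.readers.core.column module
----------------------------------------

.. automodule:: pyapacheatlas.readers.core.column

pyapacheatlas.readers.core.table module
---------------------------------------

.. automodule:: pyapacheatlas.readers.core.table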
29 changes: 29 additions & 0 deletions docs/source/pyapacheatlas.readers.rst
@@ -0,0 +1,29 @@
pyapacheatlas.readers package
=============================

.. toctree::
   :maxdepth: 4

pyapacheatlas.readers.excel module
----------------------------------

.. automodule:: pyapacheatlas.readers.excel

pyapacheatlas.readers.util module
---------------------------------

.. automodule:: pyapacheatlas.readers.util