Main Functionality (wjohnson#10)
Represents the main functionality of the PyApacheAtlas package.

It closes the following issues:

* Closes wjohnson#1 by supporting column mappings via a table process attribute and maps all related columns (from one or many inputs) to that column mapping.
* Closes wjohnson#3 by supporting a classification column at the Table and Column level.
* Closes wjohnson#4 with the WhatIfValidator to find invalid attributes, missing attributes, or invalid types.
* Closes wjohnson#6 with complete docstrings and type annotations that VS Code can interpret and from which Sphinx docs can be generated.
* Closes wjohnson#7 as the parameters previously specifying `atlas_typedefs` now focus on `relationship_defs` and will always expect to have the correct relationshipAttributeDefs populated from scaffolding.
* Closes wjohnson#9 with support for multiple inputs to tables or columns. Handles column expressions without sources somewhat gracefully by choosing the first table process that has the desired target.
wjohnson authored Jul 24, 2020
1 parent d02b81b commit 8ac6111
Showing 44 changed files with 2,416 additions and 36 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -142,3 +142,5 @@ cython_debug/

# Personal edits
hive-reference/
tester.py
*.xlsx
103 changes: 103 additions & 0 deletions README.md
@@ -1,3 +1,106 @@
# PyApacheAtlas

A Python package to work with the Apache Atlas API and support bulk loading from different file types.

The package currently supports:
* Creating a column lineage scaffolding as in the [Hive Bridge style](https://atlas.apache.org/0.8.3/Bridge-Hive.html).
* Creating and reading from an Excel template file.
* From Excel, constructing the defined entities and column lineages.
  * Table entities
  * Column entities
  * Table lineage processes
  * Column lineage processes
* Supports Azure Data Catalog ColumnMapping Attributes.
* Performing "What-If" analysis to check if...
  * Your entities are valid types.
  * Your entities are missing required attributes.
  * Your entities are using undefined attributes.
* Authentication to Azure Data Catalog via Service Principal.
* Authentication using basic authentication of username and password.
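
The three "What-If" checks listed above can be illustrated with a small, self-contained sketch. This is not the package's `WhatIfValidator`; the function name `what_if_report` and the simplified type-definition layout (`required` / `optional` lists) are assumptions made only for illustration.

```python
from typing import Any, Dict, List


def what_if_report(
    entity: Dict[str, Any],
    type_defs: Dict[str, Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: report problems with an entity without uploading it."""
    report: Dict[str, List[str]] = {
        "invalid_type": [],
        "missing_required": [],
        "undefined_attributes": [],
    }
    type_def = type_defs.get(entity["typeName"])
    if type_def is None:
        # The type is unknown, so attribute checks are not meaningful.
        report["invalid_type"].append(entity["typeName"])
        return report

    provided = set(entity.get("attributes", {}))
    required = set(type_def["required"])
    allowed = required | set(type_def["optional"])
    report["missing_required"] = sorted(required - provided)
    report["undefined_attributes"] = sorted(provided - allowed)
    return report


# Simplified, assumed type definitions keyed by type name.
type_defs = {
    "demo_table": {
        "required": ["name", "qualifiedName"],
        "optional": ["description"],
    }
}
entity = {
    "typeName": "demo_table",
    "attributes": {"name": "my table", "colour": "blue"},
}
print(what_if_report(entity, type_defs))
```

Running checks like these before an upload surfaces all three kinds of problems in one pass instead of one failed request at a time.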

## Quickstart

### Create a Client Connection

Provides connectivity to your Atlas / Data Catalog service. Supports getting and uploading entities and type defs.

```
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import AtlasClient

# Fill in the values for your own service principal
auth = ServicePrincipalAuthentication(
    tenant_id = "",
    client_id = "",
    client_secret = ""
)
client = AtlasClient(
    endpoint_url = "https://MYENDPOINT/api/atlas/v2",
    auth = auth
)
```
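
For the username and password option mentioned in the feature list, the request header follows the standard HTTP Basic scheme. The sketch below shows only that header construction using the standard library; `basic_auth_header` is a hypothetical helper and not part of the package, whose own basic authentication class may work differently.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Hypothetical helper: build an HTTP Basic Authorization header."""
    # Join the credentials with a colon, then base64-encode the bytes.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("admin", "admin")
print(headers)
```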

### Create Entities "By Hand"

You can also create your own entities by hand with the helper `AtlasEntity` class. Convert it with `to_json` to prepare it for upload.

```
from pyapacheatlas.core import AtlasEntity

# Get All Type Defs
all_type_defs = client.get_all_typedefs()
# Get Specific Entities
list_of_entities = client.get_entity(guid=["abc-123-def","ghi-456-jkl"])
# Create a new entity
ae = AtlasEntity(
    name = "my table",
    typeName = "demo_table",
    qualified_name = "somedb.schema.mytable",
    guid = -1000
)
# Upload that entity with the client
upload_results = client.upload_entities([ae.to_json()])
```
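
To give a feel for what an upload payload might look like, here is a sketch that builds an entity as a plain dictionary. The field layout (`typeName`, `guid`, `attributes` with `name` and `qualifiedName`, wrapped in an `entities` list) follows the general shape of the Atlas v2 REST API; that `AtlasEntity.to_json` emits exactly this structure is an assumption, and `build_entity` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def build_entity(name: str, type_name: str, qualified_name: str, guid: int) -> Dict[str, Any]:
    """Hypothetical helper: assemble an entity as a plain dictionary."""
    return {
        "typeName": type_name,
        # A negative guid is a placeholder for an entity that does not exist yet.
        "guid": guid,
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,
        },
    }


payload = {
    "entities": [
        build_entity("my table", "demo_table", "somedb.schema.mytable", -1000)
    ]
}
print(json.dumps(payload, indent=1))
```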

### Create Entities from Excel

Read from a standardized Excel template to create table, column, table process, and column lineage entities. Requires the Hive Bridge style of column lineages.

```
import json

from pyapacheatlas.core import GuidTracker, TypeCategory
from pyapacheatlas.scaffolding import column_lineage_scaffold
from pyapacheatlas.scaffolding.templates import excel_template
from pyapacheatlas import from_excel
from pyapacheatlas.readers.excel import ExcelConfiguration

file_path = "./atlas_excel_template.xlsx"
# Create the Excel Template
excel_template(file_path)
# Populate the excel file manually!
# Generate the base atlas type defs
all_type_defs = client.get_typedefs(TypeCategory.ENTITY)
# Create objects for tracking guids and configuring the Excel reader
guid_tracker = GuidTracker()
excel_config = ExcelConfiguration()
# Read from excel file and convert to entities
entities = from_excel(file_path, excel_config, guid_tracker)
# Prepare a batch by converting everything to json
batch = {"entities":[e.to_json() for e in entities]}
upload_results = client.upload_entities(batch)
print(json.dumps(upload_results,indent=1))
```
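
The snippet above passes a `GuidTracker` to `from_excel` without showing what it does. One plausible reading, given the `guid = -1000` used for the hand-built entity, is a counter that hands out unique negative placeholder guids so that new entities can reference each other within a batch. The class below is a hypothetical stand-in written under that assumption, not the package's implementation.

```python
class SimpleGuidTracker:
    """Hypothetical stand-in: hands out unique, decreasing negative guids."""

    def __init__(self, starting: int = -1000) -> None:
        self._next = starting

    def get_guid(self) -> int:
        # Return the current guid, then step further negative for the next call.
        guid = self._next
        self._next -= 1
        return guid


tracker = SimpleGuidTracker()
table_guid = tracker.get_guid()
column_guid = tracker.get_guid()
print(table_guid, column_guid)
```

Sharing a single tracker across tables, columns, and processes keeps the placeholder guids from colliding within one upload.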

## Additional Resources

* Learn more about this package in the github wiki.
* The [Apache Atlas client in Python](https://pypi.org/project/pyatlasclient/)
* The [Apache Atlas REST API](http://atlas.apache.org/api/v2/)
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/img/column_lineage_scaffolding.png
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
54 changes: 54 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,54 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))


# -- Project information -----------------------------------------------------

project = 'PyApacheAtlas'
copyright = '2020, Will Johnson'
author = 'Will Johnson'

# The full version, including alpha/beta/rc tags
release = '0.1.0'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'nature'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
26 changes: 26 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,26 @@
.. PyApacheAtlas documentation master file, created by
   sphinx-quickstart on Sat Jul 18 14:15:43 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to PyApacheAtlas's documentation!
=========================================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   modules.rst
   pyapacheatlas.rst
   pyapacheatlas.auth.rst
   pyapacheatlas.core.rst
   pyapacheatlas.readers.rst
   pyapacheatlas.scaffolding.rst
   pyapacheatlas.scaffolding.templates.rst

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
7 changes: 7 additions & 0 deletions docs/source/modules.rst
@@ -0,0 +1,7 @@
pyapacheatlas
=============

.. toctree::
   :maxdepth: 4

   pyapacheatlas
26 changes: 26 additions & 0 deletions docs/source/pyapacheatlas.auth.rst
@@ -0,0 +1,26 @@
pyapacheatlas.auth package
==========================

pyapacheatlas.auth.base module
------------------------------

.. automodule:: pyapacheatlas.auth.base
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.auth.basic module
-------------------------------

.. automodule:: pyapacheatlas.auth.basic
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.auth.serviceprincipal module
------------------------------------------

.. automodule:: pyapacheatlas.auth.serviceprincipal
   :members:
   :undoc-members:
   :show-inheritance:
42 changes: 42 additions & 0 deletions docs/source/pyapacheatlas.core.rst
@@ -0,0 +1,42 @@
pyapacheatlas.core package
==========================

pyapacheatlas.core.client module
--------------------------------

.. automodule:: pyapacheatlas.core.client
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.core.entity module
--------------------------------

.. automodule:: pyapacheatlas.core.entity
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.core.typedef module
---------------------------------

.. automodule:: pyapacheatlas.core.typedef
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.core.util module
------------------------------

.. automodule:: pyapacheatlas.core.util
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.core.whatif module
--------------------------------

.. automodule:: pyapacheatlas.core.whatif
   :members:
   :undoc-members:
   :show-inheritance:
18 changes: 18 additions & 0 deletions docs/source/pyapacheatlas.readers.core.rst
@@ -0,0 +1,18 @@
pyapacheatlas.readers.core package
==================================

pyapacheatlas.readers.core.column module
----------------------------------------

.. automodule:: pyapacheatlas.readers.core.column
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.readers.core.table module
---------------------------------------

.. automodule:: pyapacheatlas.readers.core.table
   :members:
   :undoc-members:
   :show-inheritance:
29 changes: 29 additions & 0 deletions docs/source/pyapacheatlas.readers.rst
@@ -0,0 +1,29 @@
pyapacheatlas.readers package
=============================

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   pyapacheatlas.readers.core

Submodules
----------

pyapacheatlas.readers.excel module
----------------------------------

.. automodule:: pyapacheatlas.readers.excel
   :members:
   :undoc-members:
   :show-inheritance:

pyapacheatlas.readers.util module
---------------------------------

.. automodule:: pyapacheatlas.readers.util
   :members:
   :undoc-members:
   :show-inheritance: