We believe that working with data, specifically the exploration and integration part, should be fun! Therefore, our API and Python package are designed to seamlessly support the daily work of data scientists and engineers.
If you have questions, feel free to reach out!
The Fusionbase Python package is open-source software released by Fusionbase's Engineering Team. It is available for download on PyPI.
- Homepage: https://fusionbase.com
- HTML documentation: https://developer.fusionbase.com/fusionbase-api/
- Issue tracker: https://github.com/FusionbaseHQ/fusionbase/issues
- Source code repository: https://github.com/FusionbaseHQ/fusionbase
- Contributing: Reach out to us! [email protected]
- Fusionbase Python package: https://pypi.python.org/pypi/fusionbase/
Fusionbase is on PyPI, so you can use pip to install it.
pip install fusionbase
If you want to use all features and retrieve the data directly as pandas DataFrames, make sure that pandas and numpy are installed.
pip install pandas
pip install numpy
Fusionbase by default uses Python's standard JSON library to serialize and locally store data. However, you can use the faster orjson library as a drop-in replacement: just install orjson and Fusionbase will automatically detect and use it.
pip install orjson
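If you want to confirm which serializer is available in your environment, a quick check like the following works (a minimal sketch; Fusionbase performs this detection on its own):
# Check whether orjson is available; Fusionbase detects and uses it automatically
try:
    import orjson  # noqa: F401
    print("orjson is installed and will be used for serialization")
except ImportError:
    print("orjson is not installed, falling back to Python's standard json library")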
Go to the examples to take a deep dive into Fusionbase and see various examples of how to use the package.
Here are some examples for a quick start:
The Data Stream module lets you conveniently access data and metadata of all Data Streams. Each stream can be accessed via its unique stream id or label.
Setup
# Import Fusionbase
from fusionbase import Fusionbase
# Create a new Fusionbase client
# Provide your API key
fusionbase = Fusionbase(auth={"api_key": "*** SECRET CREDENTIALS ***"})
# If you prefer extended logging output and information,
# like a progress bar for downloading datastreams etc.,
# turn on logging
fusionbase = Fusionbase(auth={"api_key": "*** SECRET CREDENTIALS ***"}, log=True)
# Get the datastream with the key "28654971"
data_stream_key = "28654971"
data_stream = fusionbase.get_datastream(data_stream_key)
Human-readable datastream information
# Print a nice table containing the meta data of the stream
data_stream.pretty_meta_data()
Getting the data
The samples below show how to retrieve the data of a datastream as a list of dictionaries. Each element in the list represents one row within the dataset.
Note that the data can be hierarchical.
# The following returns the full datastream as a list of dictionaries
# It uses a local cache if available
data = data_stream.get_data()
print(data)
# Always get the latest data from Fusionbase
data = data_stream.get_data(live=True)
print(data)
# If you only need a subset of the columns,
# you'll gain a lot of performance by selecting only those columns
data = data_stream.get_data(fields=["NAME_OF_COLUMN_1", "NAME_OF_COLUMN_N"])
print(data)
# If you only need an excerpt of the data, you can use skip and limit
# The sample below gets the first 10 rows
data = data_stream.get_data(skip=0, limit=10)
print(data)
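Since each row is a plain dictionary, hierarchical values are simply nested dictionaries and can be accessed accordingly. A minimal sketch (the column names NAME_OF_COLUMN_1 and NESTED_COLUMN are hypothetical placeholders, not columns of a real Data Stream):
# Each row is a dictionary; hierarchical values appear as nested dictionaries
for row in data_stream.get_data(limit=5):
    flat_value = row.get("NAME_OF_COLUMN_1")   # placeholder for a flat column
    nested_value = row.get("NESTED_COLUMN")    # placeholder for a hierarchical column
    if isinstance(nested_value, dict):
        # Access a nested key inside the hierarchical column (placeholder key)
        nested_value = nested_value.get("SOME_KEY")
    print(flat_value, nested_value)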
Get Data as a pandas DataFrame
If you are working with pandas, the most convenient way is to load the data directly as a pandas DataFrame.
# Load the data from Fusionbase, cache it and put it in a pandas DataFrame
df = data_stream.as_dataframe()
print(df)
# Force ignoring the cache and make sure to get the latest data
df = data_stream.as_dataframe(live=True)
print(df)
Storing the data
Large datasets may not fit into memory. Therefore, it is possible to retrieve the data of a stream directly as partitioned files.
The folder structure is created automatically and always follows the pattern ./{ID-OF-THE-STREAM}/data/*
from pathlib import Path
# Store as JSON files
data_stream.as_json_files(storage_path=Path("./data/"))
# Store as CSV files
data_stream.as_csv_files(storage_path=Path("./data/"))
# Store as Pickle files
data_stream.as_pickle_files(storage_path=Path("./data/"))
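Reading the partitioned files back in is plain pandas; a minimal sketch for the CSV export above, assuming the partitions for stream 28654971 ended up under ./data/28654971/data/ (adjust the path to the folder structure created for your stream):
from pathlib import Path
import pandas as pd

# Path to the partitioned CSV files created by as_csv_files (adjust if needed)
partition_dir = Path("./data/28654971/data")

# Read all CSV partitions and concatenate them into a single DataFrame
df = pd.concat(
    (pd.read_csv(csv_file) for csv_file in sorted(partition_dir.glob("*.csv"))),
    ignore_index=True,
)
print(df.shape)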
A data service can be seen as an API that returns a certain output for a specific input. For example, our address normalization service parses an address and returns the structured and normalized parts of it.
Setup
# Import Fusionbase
from fusionbase.Fusionbase import Fusionbase
# Create a new Fusionbase client
# Provide your API key (the Fusionbase API URI is usually https://api.fusionbase.com/api/v1)
fusionbase = Fusionbase(auth={"api_key": "*** SECRET CREDENTIALS ***"})
data_service_key = "23622632"
data_service = fusionbase.get_dataservice(data_service_key)
Human-readable dataservice information:
# Retrieve the metadata of a service by its service-specific key and print it nicely to the console
data_service.pretty_meta_data()
Human-readable dataservice definition:
# Retrieve the request definition (such as required parameters) of a service by its service-specific key and print it to the console
data_service.pretty_request_definition()
Invoke a dataservice:
# Invoke a service by providing input data
# The following lines of code are equivalent
# Services can be invoked directly by their parameter names
result = data_service.invoke(address_string="Agnes-Pockels-Bogen 1, 80992 München")
# Or using a list of parameter key and value pairs
payload = [
    {
        "name": "address_string",  # This is the name of the input parameter
        "value": "Agnes-Pockels-Bogen 1, 80992 München"  # This is the value for the input
    }
]
result = data_service.invoke(parameters=payload)
print(result)
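Because invoke() is a regular method call, processing several inputs is just a loop; a minimal sketch using the address normalization service from above (the second address is a made-up example input):
# Normalize a small batch of addresses by invoking the service once per input
addresses = [
    "Agnes-Pockels-Bogen 1, 80992 München",
    "Unter den Linden 77, 10117 Berlin",  # made-up example input
]

results = [data_service.invoke(address_string=address) for address in addresses]

for address, result in zip(addresses, results):
    print(address, "->", result)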
- orjson is now installed as a dependency
- Minor bug fixes and improvements
- Added an option to leave the 'Auth' parameter None when creating a Fusionbase object if a corresponding environment variable (FUSIONBASE_API_KEY) is present
- Some improvements
- Performance improvements
- More flexible limit parameter
- Minor improvements and additional test cases
- Hotfix: Fix DataChunker import error
- New methods to store data as files (json, csv and pickle)
- Improve logging features and completely new progress bar
- Major performance improvements
- Leverage async and multiprocessing more
- Add option to use orjson for faster JSON dumps
- Various bug fixes
- Minor fixes and improvements
- Feature: Add top-level authentication (breaking change)
- New API for invoking data services
- New caching method for data services
- Bugfix: Skip and limit parameters now work as intended
- Bugfix: Fix exception handling in update_create method
- Added tests for DataStream and DataService classes
- Bugfix: fields parameter in get_data and get_dataframe works as intended now.
- Initial release
Contributions to Fusionbase can take the form of contributions to the code base, sharing your experience and insights with the community on the forums, or contributing to projects that make use of Fusionbase. Please see the contributing guide for more specifics.
The Fusionbase Python package is licensed under the GPL 3.