add SQL adapter #779

Merged
merged 54 commits into from
Feb 14, 2025
54 commits
5fd1bac
Add SQL Adapter
Aug 22, 2024
4389b68
some further changes for sql adapter
skarakuzu Jan 28, 2025
6f106a0
Fix rebase error
danielballan Jan 28, 2025
5f8d3ac
few more changes
skarakuzu Jan 28, 2025
35e25f5
Implement from_catalog on SQLAdapter
danielballan Jan 28, 2025
0d4d8f5
fixed sql tests
skarakuzu Jan 29, 2025
c437347
fix typing
skarakuzu Jan 29, 2025
4dbfd20
add TYPE_CHECKING conditional
skarakuzu Jan 29, 2025
f004ae2
Adjust for dataclass structures.
danielballan Feb 11, 2025
f1f7d4c
postgres text fix
skarakuzu Feb 11, 2025
2f3abf7
fix mypy in test
skarakuzu Feb 12, 2025
f6c8cf4
add sql tests
skarakuzu Feb 12, 2025
088ea39
Add comments.
danielballan Feb 12, 2025
0d0022e
Format index creation more nicely
danielballan Feb 12, 2025
40dd070
Clean up arrow to SQL type translation.
danielballan Feb 12, 2025
3c90df6
Use DuckDB for embedded tabular storage.
danielballan Feb 12, 2025
52d5b9e
Remove spurrious file
danielballan Feb 12, 2025
dbfe262
Remame method, and test integration.
danielballan Feb 12, 2025
ba4fde0
Ensure writable directory has scheme 'file:'.
danielballan Feb 12, 2025
0ee14e3
Reinstate original intent of test.
danielballan Feb 12, 2025
0b6bd23
Support SQLite for tabular storage too.
danielballan Feb 13, 2025
9f382a5
Remove unneeded conversion
danielballan Feb 13, 2025
f7f6aee
Remove unneeded conversion
danielballan Feb 13, 2025
159c152
Docstring improvements
danielballan Feb 13, 2025
6bc266b
Copyedit comment
danielballan Feb 13, 2025
6c91fd9
Remove another unnecessary conversion
danielballan Feb 13, 2025
7c748c3
Remove another unneeded conversion
danielballan Feb 13, 2025
781fb13
Remove more needless conversions
danielballan Feb 13, 2025
748cc47
Remove access policy
danielballan Feb 13, 2025
1b3148b
Finish removing access policy
danielballan Feb 13, 2025
3f374d6
mypy fixes and addressed comments
skarakuzu Feb 13, 2025
07ae5a8
typing fix and addressed comments
skarakuzu Feb 13, 2025
eaa792e
mypy fix
skarakuzu Feb 13, 2025
280c115
added CHANGELOG
skarakuzu Feb 13, 2025
1c4bf2b
fixed heavy import test
skarakuzu Feb 13, 2025
2fc12a6
Type-check Connection.
danielballan Feb 14, 2025
75bb797
Validate partition number.
danielballan Feb 14, 2025
a87bccc
Inline small function only used once
danielballan Feb 14, 2025
e009cda
whitespace changes for readability
danielballan Feb 14, 2025
a34c8ac
Update 'See Also' for renamed method.
danielballan Feb 14, 2025
6191c16
Remove unreachable code and consolidate temporaries.
danielballan Feb 14, 2025
e1aaa3d
Tweak server startup messages.
danielballan Feb 14, 2025
2f2d787
Refactor Storage
danielballan Feb 14, 2025
91c4092
Follow-up fix to Storage refactor
danielballan Feb 14, 2025
f6c89aa
Sketch support for multiple tiled partitions.
danielballan Feb 14, 2025
dc19d2c
Use SEQUENCE instead of random dataset_id.
danielballan Feb 14, 2025
5d2c06d
Fix typo
danielballan Feb 14, 2025
e7854b0
Use int32 for dataset ID.
danielballan Feb 14, 2025
50a6d77
added and refactored tests for sql adapter
skarakuzu Feb 14, 2025
cece183
added field specific reading tests for sql adapter
skarakuzu Feb 14, 2025
e506fde
fixed psql test
skarakuzu Feb 14, 2025
b7d0646
more fix
skarakuzu Feb 14, 2025
b420319
Row order is not guaranteed.
danielballan Feb 14, 2025
e00270f
Manage cursor and connection lifecycle.
danielballan Feb 14, 2025
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,11 @@ Write the date in place of the "Unreleased" in the case a new version is release

## Unreleased


### Added

- Added `SQLAdapter` which can save and interact with table structured data in `sqlite` , `postgresql` and `duckdb` databases using `arrow-adbc` API calls.

### Changed

- Removed pydantic-based definitions of structures, which had duplicated
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -55,6 +55,9 @@ tiled = "tiled.commandline.main:main"

 # This is the union of all optional dependencies.
 all = [
+    "adbc_driver_manager",
+    "adbc_driver_postgresql",
+    "adbc_driver_sqlite",
     "aiofiles",
     "aiosqlite",
     "alembic",
@@ -68,6 +71,7 @@ all = [
     "dask",
     "dask[array]",
     "dask[dataframe]",
+    "duckdb",
     "entrypoints",
     "fastapi",
     "h5netcdf",
@@ -196,6 +200,9 @@ minimal-server = [
 ]
 # This is the "kichen sink" fully-featured server dependency set.
 server = [
+    "adbc_driver_manager",
+    "adbc_driver_postgresql",
+    "adbc_driver_sqlite",
     "aiofiles",
     "aiosqlite",
     "alembic",
@@ -209,6 +216,7 @@ server = [
     "dask",
     "dask[array]",
     "dask[dataframe]",
+    "duckdb",
     "fastapi",
     "h5netcdf",
     "h5py",
26 changes: 23 additions & 3 deletions tiled/_tests/adapters/test_arrow.py
@@ -4,6 +4,8 @@
 import pytest

 from tiled.adapters.arrow import ArrowAdapter
+from tiled.structures.core import StructureFamily
+from tiled.structures.data_source import DataSource, Management, Storage
 from tiled.structures.table import TableStructure

 names = ["f0", "f1", "f2"]
@@ -26,11 +28,29 @@


 @pytest.fixture
-def adapter() -> ArrowAdapter:
+def data_source_from_init_storage() -> DataSource[TableStructure]:
     table = pa.Table.from_arrays(data0, names)
     structure = TableStructure.from_arrow_table(table, npartitions=3)
-    assets = ArrowAdapter.init_storage(data_uri, structure=structure)
-    return ArrowAdapter([asset.data_uri for asset in assets], structure=structure)
+    data_source = DataSource(
+        management=Management.writable,
+        mimetype="application/vnd.apache.arrow.file",
+        structure_family=StructureFamily.table,
+        structure=structure,
+        assets=[],
+    )
+    storage = Storage(filesystem=data_uri, sql=None)
+    return ArrowAdapter.init_storage(
+        data_source=data_source, storage=storage, path_parts=[]
+    )
+
+
+@pytest.fixture
+def adapter(data_source_from_init_storage: DataSource[TableStructure]) -> ArrowAdapter:
+    data_source = data_source_from_init_storage
Contributor

Maybe just call the fixture `data_source`?

Contributor Author

There is no fixture named `data_source`, so I do not follow what you mean.

Contributor

You can name the fixture `data_source` directly, instead of naming it `data_source_from_init_storage` and then immediately assigning `data_source = data_source_from_init_storage`.

+    return ArrowAdapter(
+        [asset.data_uri for asset in data_source.assets],
+        data_source.structure,
+    )
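
The renaming suggested in the review thread could look roughly like the following. This is only a sketch: `FakeAsset`, `FakeDataSource`, and `FakeAdapter` are hypothetical stand-ins for tiled's asset, `DataSource`, and `ArrowAdapter` types so that it runs without tiled installed, and the URI is made up. The point is that pytest injects a fixture by its name, so naming the fixture `data_source` removes the need for the reassignment line.

```python
from dataclasses import dataclass, field
from typing import List

import pytest


@dataclass
class FakeAsset:
    # Hypothetical stand-in for an asset record with a data URI.
    data_uri: str


@dataclass
class FakeDataSource:
    # Hypothetical stand-in for tiled's DataSource.
    structure: str
    assets: List[FakeAsset] = field(default_factory=list)


class FakeAdapter:
    # Hypothetical stand-in for ArrowAdapter.
    def __init__(self, data_uris: List[str], structure: str) -> None:
        self.data_uris = list(data_uris)
        self.structure = structure


@pytest.fixture
def data_source() -> FakeDataSource:
    # In the PR, this value would come from ArrowAdapter.init_storage(...).
    return FakeDataSource(
        structure="table",
        assets=[FakeAsset("file://localhost/tmp/partition-0.arrow")],
    )


@pytest.fixture
def adapter(data_source: FakeDataSource) -> FakeAdapter:
    # pytest passes the fixture value in under the parameter name,
    # so no `data_source = ...` reassignment is needed.
    return FakeAdapter(
        [asset.data_uri for asset in data_source.assets],
        data_source.structure,
    )


def test_adapter_uses_data_source(adapter: FakeAdapter) -> None:
    assert adapter.structure == "table"
    assert adapter.data_uris == ["file://localhost/tmp/partition-0.arrow"]
```

Run under pytest, `test_adapter_uses_data_source` requests `adapter`, which in turn requests `data_source`; neither fixture is called directly.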


def test_attributes(adapter: ArrowAdapter) -> None: