
Move execute_insert to module level so pickle does not err #87

Open
wants to merge 73 commits into
base: ka/parallel

Conversation

@kenben kenben commented May 15, 2017

I got a `pickle.PicklingError: Can't pickle <function Aggregation.execute_insert at [...]>`; moving `execute_insert` out of the class makes it picklable.
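For context, a minimal generic sketch (hypothetical names, not the collate code) of the rule pickle enforces: it serializes functions by reference, so anything it cannot re-import by a module-level name fails, which is why hoisting `execute_insert` to module level fixes joblib's pickling.

```py
# Generic illustration of the pickling constraint; names are hypothetical.
import pickle


def module_level_insert(get_engine, insert):
    """Importable as <module>.module_level_insert, so pickle can reference it."""


def make_nested_insert():
    def nested_insert(get_engine, insert):
        """Only exists inside make_nested_insert's scope."""
    return nested_insert


pickle.dumps(module_level_insert)        # works: resolvable by a module-level name

try:
    pickle.dumps(make_nested_insert())   # fails: not importable by name
except Exception as err:                 # PicklingError (Py2) / AttributeError (Py3)
    print(type(err).__name__, err)
```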

potash and others added 30 commits November 17, 2016 18:45
* Support distinct clauses in aggregates

* flake8

* fix parentheses add tests

* split_distinct should return tuple

* fix tests and pep8

* single distinct
This can drastically simplify writing categorical comparisons:

    ``categorical(col, op, choices, include_null=True, maxlen=32)``

    Args:
        col: the column name (or equivalent SQL expression)
        op: the SQL operation (e.g., '=' or '~' or 'LIKE')
        choices: A list or dictionary of values. When a dictionary is passed,
            the keys are a short name for the value.
        include_null: Should an extra `{col} is NULL` be added? (default True)
        maxlen: The maximum length of aggregate quantity names (default 32).
            Names longer than this will be truncated.

    Returns: a dictionary of aggregate quantities to be passed to Aggregate()

    A simple helper method to easily create many categorical columns from one
    source column by comparing it against many values. It effectively creates
    many quantities of the form "({col} {op} '{elt}')::INT" for elt in choices.
    The type of the comparison is converted to an integer so it can easily be
    used with 'sum' (for total count) and 'avg' (for relative fraction)
    aggregate functions.

    By default, the aggregates are simply named "{col}_{op}_{choice}", but
    that can easily get long and exceed the maximum column name length. If any
    name ends up longer than ``maxlen`` characters (32 by default), then each
    aggregate name gets truncated with a sequential number appended to ensure
    that they remain identifiable and unique (but note that ordering is not
    preserved).

Use it like:

```py
from collate import collate
from collate.helpers import categorical

collate.Aggregate(categorical('food', '=', ['hamburger', 'hotdog', 'sock']), ['sum', 'avg'])
```
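Per the docstring above, the helper returns a dictionary mapping aggregate quantity names to SQL snippets; the keys and values shown below are illustrative, not guaranteed:

```py
from collate.helpers import categorical

quantities = categorical('food', '=', ['hamburger', 'hotdog'])
# Roughly (illustrative names/keys only):
# {
#     'food_=_hamburger': "(food = 'hamburger')::INT",
#     'food_=_hotdog':    "(food = 'hotdog')::INT",
# }
# plus one extra "(food is NULL)::INT" quantity because include_null defaults to True.
```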
Allow using None values to specify include_null within Categorical
Add multiple comparison Aggregate subclasses
Moved SpacetimeAggregation into its own spacetime module and refactored the where filtering into a method, in preparation for fixing the join table and #42.
* spacetime join table

* join_table arg to execute

* python3 dict compatible
pyup-bot and others added 23 commits March 14, 2017 16:29
* Update sqlalchemy from 1.1.6 to 1.1.7

* Update sqlalchemy from 1.1.7 to 1.1.8

* Update sphinx from 1.5.3 to 1.5.4
Include order in aggregate name and test it
Use SQL FILTER instead of CASE WHEN
* Add support for restricting the "beginning of time"

Adds a new keyword parameter for SpacetimeAggregations that enables restricting the rows included in their calculations based upon an absolute minimum date.

Adds actual behavior tests for SpacetimeAggregation with testing.postgresql.

Validation takes a SQL connection so that it can check against a SQL server; by default, `Aggregation.execute()` calls the validate method. SpacetimeAggregations now raise an error when a date/interval combination crosses before the beginning of time (so long as the interval is not `all`).

If this proves annoying, we can change it to a warning or add an optional override to explicitly allow it, but I think it behooves us to start conservatively.
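A rough sketch of the kind of check described, with assumed names and a plain-date interval rather than the actual SpacetimeAggregation API:

```py
# Hypothetical illustration of the described validation, not the real implementation.
from datetime import date, timedelta


def check_window(as_of_date: date, interval_days: int, beginning_of_time: date) -> None:
    """Raise if the lookback window would reach before the configured minimum date."""
    if as_of_date - timedelta(days=interval_days) < beginning_of_time:
        raise ValueError(
            f"{interval_days}-day interval from {as_of_date} crosses before "
            f"the beginning of time ({beginning_of_time})"
        )


check_window(date(2017, 1, 1), 365, date(2016, 6, 1))  # raises ValueError
```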
Scheduled biweekly dependency update for week 16
Allow overriding of choice quoting [Resolves #81]
This is a simple workaround; make  non-lazy. Should fix #82
Don't modify dict during iteration when shortening keys
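That last fix addresses a standard Python pitfall; a generic illustration (not the collate key-shortening code):

```py
# Mutating a dict while iterating over it raises RuntimeError, so iterate over a
# snapshot of the keys instead. Generic example, not the actual collate code.
names = {'a_very_long_aggregate_name': 1, 'short': 2}

for key in list(names):            # list(...) snapshots the keys first
    if len(key) > 10:
        names[key[:10]] = names.pop(key)

print(names)                       # {'short': 2, 'a_very_lon': 1}
```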
```diff
@@ -328,7 +328,8 @@ def execute_par(self, conn_func, n_jobs=14):

insert_list = [insert for insert in inserts[group]]

out = Parallel(n_jobs=n_jobs, verbose=51)(delayed(Aggregation.execute_insert)(conn_func, insert)
import pdb;pdb.set_trace()
```
Member

remove me

Author

Arrrgh

```diff
@@ -7,6 +7,23 @@
from .sql import make_sql_clause, to_sql_name, CreateTableAs, InsertFromSelect


def execute_insert(get_engine, insert):
```
Member

maybe place this into the sql.py file?

Author

@kenben kenben May 15, 2017

What's the destiny for ka/parallel? We need to refactor things if we want to merge it into master eventually anyway.

Author

Moved this.
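For readers following along, a sketch of the resulting call pattern (the body of `execute_insert` below is a stand-in; only its module-level placement and signature are taken from the diff above):

```py
# Sketch only: execute_insert is assumed to open an engine via get_engine() and
# run one insert statement; the real implementation lives in this branch.
from joblib import Parallel, delayed


def execute_insert(get_engine, insert):
    engine = get_engine()
    engine.execute(insert)


def run_inserts(get_engine, inserts, n_jobs=14):
    # Because execute_insert is importable at module level, joblib's default
    # pickler can serialize a reference to it for the worker processes.
    return Parallel(n_jobs=n_jobs, verbose=51)(
        delayed(execute_insert)(get_engine, insert) for insert in inserts
    )
```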

@kenben kenben commented May 16, 2017

I've merged master into this branch and updated police-eis/pbp_additions to work with it.

@kenben kenben requested a review from k1aus May 16, 2017 17:50