move execute_insert to top module nesting so pickle does not err #87
base: ka/parallel
* Support distinct clauses in aggregates
* flake8
* fix parentheses add tests
* split_distinct should return tuple
* fix tests and pep8
* single distinct
This can drastically simplify the writing of categorical comparisons:

``categorical(col, op, choices, include_null=True, maxlen=32)``

Args:
    col: the column name (or equivalent SQL expression)
    op: the SQL operation (e.g., '=' or '~' or 'LIKE')
    choices: A list or dictionary of values. When a dictionary is passed, the keys are a short name for the value.
    include_null: Should an extra `{col} is NULL` be added? (default True)
    maxlen: The maximum length of aggregate quantity names (default 32). Names longer than this will be truncated.

Returns: a dictionary of aggregate quantities to be passed to Aggregate()

A simple helper method to easily create many categorical columns from one source column by comparing it against many values. It effectively creates many quantities of the form "({col} {op} '{elt}')::INT" for elt in choices. The result of the comparison is cast to an integer so it can easily be used with 'sum' (for total count) and 'avg' (for relative fraction) aggregate functions.

By default, the aggregates are simply named "{col}_{op}_{choice}", but that can easily get long and exceed the maximum column name length. If any name ends up longer than ``maxlen`` characters (32 by default), then each aggregate name gets truncated with a sequential number appended to ensure that they remain identifiable and unique (but note that ordering is not preserved).

Use it like:

```py
from collate import collate
from collate.helpers import categorical

collate.Aggregate(categorical('food', '=', ['hamburger','hotdog','sock']), ['sum','avg'])
```
Allow using None values to specify include_null within Categorical
Add multiple comparison Aggregate subclasses
moved SpacetimeAggregation into its own spacetime module and refactored the where filtering to a method in preparation for fixing join table and #42.
* spacetime join table
* join_table arg to execute
* python3 dict compatible
* Update sqlalchemy from 1.1.6 to 1.1.7
* Update sqlalchemy from 1.1.7 to 1.1.8
* Update sphinx from 1.5.3 to 1.5.4
include order in aggregate name and test it
use filter instead of case when
* Add support for restricting the "beginning of time"

Adds a new keyword parameter for SpacetimeAggregations that enables restricting the rows included in their calculations based upon an absolute minimum date.

Adds actual behavior tests for SpacetimeAggregation with testing.postgresql. This takes a SQL connection to allow validation against a SQL server. By default, `Aggregation.execute()` will call the validate method.

SpacetimeAggregations now raise an error in the case where a date/interval combination happens to cross before the beginning of time (so long as the interval is not all). If this proves to be annoying, we can perhaps change it to be a warning or even add an optional override to explicitly allow this to occur. But I think it behooves us to start conservatively.
Scheduled biweekly dependency update for week 16
Allow overriding of choice quoting [Resolves #81]
This is a simple workaround; make non-lazy. Should fix #82
Don't modify dict during iteration when shortening keys
collate/collate.py
Outdated
@@ -328,7 +328,8 @@ def execute_par(self, conn_func, n_jobs=14):
insert_list = [insert for insert in inserts[group]]
out = Parallel(n_jobs=n_jobs, verbose=51)(delayed(Aggregation.execute_insert)(conn_func, insert)
import pdb;pdb.set_trace()
remove me
Arrrgh
collate/collate.py
Outdated
@@ -7,6 +7,23 @@
from .sql import make_sql_clause, to_sql_name, CreateTableAs, InsertFromSelect
def execute_insert(get_engine, insert):
Maybe place this into the `sql.py` file?
What's the destiny for `ka/parallel`? We need to refactor things if we want to merge it into `master` eventually anyway.
Moved this.
I've merged |
I got a `pickle.PicklingError: Can't pickle <function Aggregation.execute_insert at [...]>`; moving `execute_insert` out of the class to the top level of the module makes it OK to pickle.
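For context, here is a minimal sketch of why the function's location matters. The standard `pickle` module serializes a function by reference (its module and name), so the function has to be reachable by an importable name. The `execute_insert` body, `make_local_insert`, and `is_picklable` below are hypothetical stand-ins, not the code from this PR. Whether a function reached through a class pickles depends on the Python version and pickler in use, so the sketch uses a locally defined function as the failing case instead.

```python
import pickle


def execute_insert(get_engine, insert):
    """Hypothetical stand-in defined at module top level (body is illustrative only)."""
    return (get_engine(), insert)


def make_local_insert():
    """Return a function defined inside another function (no importable name)."""
    def local_insert(get_engine, insert):
        return (get_engine(), insert)
    return local_insert


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for unpicklable functions varies by Python version.
        return False


# A top-level function can be looked up by module and name when run as a script.
print("module-level:", is_picklable(execute_insert))
# A locally defined function cannot be looked up by name, so pickling fails.
print("local:", is_picklable(make_local_insert()))
```

Process-based parallel backends generally need to pickle the callable they dispatch to workers, which is presumably why the placement of `execute_insert` matters for the `Parallel(...)(delayed(...))` call in `execute_par`.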