datascope-tools
might eventually consist of a collection of scripts,
tools, and ideas that we have collected over the years at
Datascope Analytics that help with
our work. This is somewhat inspired by
bit.ly's data_hacks
-
pair-with
: simple script to manipulate.hg/hgrc
files when pairing with other people -
decrapify
: simple script to remove junk files that commonly occur when we're writing code or experimenting with scripts
-
decorators.cache
: While developing scrapers, code that interacts with APIs, and other functions that take a little while to run, I find myself often wanting to cache the results of calls to functions for a little while during development, like this.from datascope_tools import decorators @decorators.cache(expire_after=30*60) def v_important(): time.sleep(42) return 'vnice'
-
iterators.itergrams
: Iterate over consecutive overalapping tuples in an iterable. Doesn't read the entire list into memory so is useful for BIG DATA.from datascope_tools import iterators data = ['you', 'like-a', 'da', 'juice', 'eh?'] for ngram in iterators.itergrams(data, n=3): print(ngram) >> ('you', 'like-a', 'da') >> ('like-a', 'da', 'juice') >> ('da', 'juice', 'eh?')
-
iterators.rank_sorted
: A combination ofenumerate
andsort
iterators that also deals with tied ranks. Say there is a game where you get scored and you want to rank people:scores = [ ('alex', 100), ('bo', 100), ('brian', 100), ('dean', 10), ('horatio', 90), ('irmak', 100), ('jess', 100), ('karlota', 90), ('melba', 90), ('meridith', 100), ('michael', 100), ('mike', 100), ('mollie', 100), ('vlad', 100), ] answer = list(rank_sorted(scores, key=operator.itemgetter(1))) >> [ (1, ('alex', 100)), (1, ('bo', 100)), ... (11, ('horatio', 90)), (11, ('karlota', 90)), (11, ('melba', 90)), (14, ('dean', 10)), ]
-
mixins.UniqueKey
: Sometimes it's nice to have instances of classes as keys in a dictionary or as members of a set. The UniqueKey mixin sets the__hash__
and__eq__
function so that instances will be hashed using a "unique identifier" attribute.class Snowflake(UniqueKey): def __init__(self, snowflake_id): self.id = snowflake_id set([Snowflake('a'), Snowflake('a')]) >> set([Snowflake('a')])