Features, prioritized

brianboyer edited this page Feb 3, 2012 · 62 revisions

Still up for grabs, priority unknown

  • Permissions/set-level security (like Doc Cloud or LDAP? Another suggestion: project teams, less like a hierarchy and more like a circle or ad-hoc group)
  • Sharing between organizations (not sharing the whole PANDA, just parts)
  • Edit data in PANDA (delete rows, add new columns, etc.); read-only lock on a set?
  • Address normalization (solvable with fuzzy search instead?)
  • S, M, L sizing, or something like it
  • Faceted search
  • Fancy query builder like Doc Cloud

Must-have

  • TODO B1 -- Amazon Machine Image
  • Import w/ arbitrary delimiters (not just commas)
  • Import from fixed-width files
  • Comments on a dataset
  • Primitive column types (int, varchar, date, etc.)
  • Meta type columns
    • Address (and address-like stuff)
  • In-system metrics. A dashboard for the admins of the PANDA instance, so that they can measure how well it's working inside their organization. (Sneaky new feature inserted by Brian as the result of an interesting conversation with some of the folks Knight asks me to speak with.)
  • Profile stuff (create users, change my password, etc)
  • DONE A1 -- Store the original file
  • DONE A1 -- Data set metadata (source, provenance)
  • DONE A1 -- Import from CSV
  • DONE A1 -- Async data import (queuing)
  • DONE A1 -- Full-text search on a dataset
  • DONE A2 -- Taxonomy for datasets (categories, tags?)
  • DONE A2 -- Search dataset metadata (help me find a dataset)
  • DONE A2 -- Login/users
  • DONE A3 -- Cumulative data sets via write API
  • DONE A3 -- Cumulative data sets via write API demo
  • DONE A3 -- Cumulative data sets via ScraperWiki (??)
  • DONE A3 -- Import from Excel (maybe by explaining to people how to use CSV, maybe by parsing)
  • DONE A4 -- Cumulative data sets via additional file uploads (maybe this is solved with versioning?)
  • DONE A4 -- Encrypted communications (SSL)
  • DONE A4 -- Export a dataset (to csv, xls? etc)
  • DONE A4 -- Browser compatibility w/ recent versions of modern browsers: FF/Chrome/Safari/IE 9 Beta
  • DONE A4 -- Documents related to the dataset
  • DONE B1 -- A plan for scaling (how to grow your PANDA)
  • DONE B1 -- Import wizard/walkthrough UI
  • DONE B1 -- Async data export
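
A minimal sketch of the "Import w/ arbitrary delimiters" item above, in Python. This is not PANDA's importer: the function name `read_delimited` and the sample data are hypothetical, and the sketch only shows that the standard `csv` module already accepts a caller-chosen delimiter.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of rows using a caller-chosen delimiter.

    Hypothetical helper for illustration; not part of PANDA.
    """
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Made-up sample data: the same table, pipe-delimited and tab-delimited.
pipe_rows = read_delimited("name|city\nAda|Chicago\n", delimiter="|")
tab_rows = read_delimited("name\tcity\nAda\tChicago\n", delimiter="\t")
```

Letting the user pick the delimiter explicitly sidesteps guessing, which can go wrong on small or irregular files.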

Want

  • Document our advanced query language for end users (Solr-style)
  • Date range search
  • Related stories on a dataset
  • I18n/L10n
  • Initial demo data
  • Export search results (to csv, etc)
  • Iterative updates to a dataset (quarterly updates, etc. keep the old list)
  • Version tracking for datasets
  • Export a subset of a dataset (fewer columns from a wide set, filtered rows, etc)
  • Google Refine reconciliation endpoint
  • PANDA-hosted Google Refine
  • Import localized number formats (1.000, 1 000, 1,000)
  • IE7 support
  • Fuzzy name search (Abbreviations, Bill/William)
  • Other datasets related to this one (grouping?)
  • Row-level comments
  • Meta type columns
    • Birthdate
    • Phone number
  • Notifications (email? RSS?) for new data sets, new data in sets, etc.
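
A rough sketch of the "Import localized number formats" item above, in Python. It assumes, purely for illustration, that every value is a whole number and that `.`, `,`, a space, or a non-breaking space can only be a thousands separator. Real data would also need decimal handling, since `1.000` is ambiguous on its own. The function name `parse_localized_int` is hypothetical.

```python
import re

# Either plain digits, or groups of three digits split by ONE consistent
# separator ('.', ',', space, or non-breaking space).
_GROUPED_INT = re.compile(
    r"[+-]?(?:\d+|\d{1,3}(?:([., \u00a0])\d{3}(?:\1\d{3})*))"
)


def parse_localized_int(text):
    """Parse a whole number written with a localized thousands separator.

    Assumption (not from the feature list): no decimal part is present.
    """
    cleaned = text.strip()
    if not _GROUPED_INT.fullmatch(cleaned):
        raise ValueError("unrecognized number format: %r" % text)
    return int(re.sub(r"[., \u00a0]", "", cleaned))


values = [parse_localized_int(s) for s in ("1.000", "1 000", "1,000")]
```

Requiring one consistent separator per value rejects mixed input such as `1.000,000` instead of silently guessing what it means.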

Gravy

  • Number range search
  • Meta type columns
    • Location (lat/lng)
    • URL
    • SSN
    • Money
    • Organization (name, DUNS, etc)
    • User-extensible (make your own, like Illinois school codes)
    • Foreign address
  • Geographic search by shapefiles
  • Geographic search by any drawn shape
  • Geographic search by distance
  • Map the data
  • Geocode addresses
  • Canned/saved searches
  • Import from MDB/Access
  • Import from shapefile
  • Import from DBF
  • Import from Google Refine, carry the audit trail into PANDA
  • Import/export to/from Google Docs
  • Export to Google Fusion Tables
  • Column statistics (std. dev., sum, etc)
  • Sysadmin notifications (you're running out of disk! etc.)
  • Single-click deployment
  • Automatic upgrades (like WordPress)
  • Search by taxonomy
  • De-normalize data / dataset merge (connect a table to its lookup table on import)
  • Fixtures to import (from the IRE data library, etc)
  • P13n (personalization): store queries that I like to run, etc.

Meh

  • Encrypt all the data
  • Entity relationships (John Smith in dataset A = John Smith in dataset B, for neat stuff like social network analysis)
  • RDF, linked data endpoint
  • Deploy as a hosted service (somebody else can do that once we've written the regular version)
  • Automated server/resource scaling
  • Join datasets at runtime (reinvent SQL)
  • Non-tabular stuff (PDFs, emails, Doc Cloud and Overview Project)