Section 1 Data Representation

  • Value of unstructured data
  • Value of structured data
  • Cost of data modeling
  • Naming resources
  • Locating resources
  • Relating resources
  • Syntax: XML, JSON, YAML, RDF, Python, etc.
  • Implementation: some Python
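
Here is a minimal Python sketch of naming, relating, and serializing a resource. It assumes hypothetical URIs under `http://example.org/` and borrows the `@id` key from JSON-LD. The same record is printed once as JSON and once as (subject, predicate, object) triples, which is the shape RDF uses.

```python
import json

# Hypothetical base URI; any stable, dereferenceable namespace would do.
EX = "http://example.org/"

# Structured data as a nested Python dict, which maps directly onto JSON.
# The "@id" key (borrowed from JSON-LD) names the resource itself.
person = {
    "@id": EX + "alice",
    "name": "Alice",
    "knows": [EX + "bob"],
    "worksFor": EX + "acme",
}

def to_triples(record):
    """Flatten one record into (subject, predicate, object) triples."""
    subject = record["@id"]
    triples = []
    for key, value in record.items():
        if key == "@id":
            continue
        # A list value yields one triple per element.
        for v in value if isinstance(value, list) else [value]:
            triples.append((subject, EX + key, v))
    return triples

print(json.dumps(person, indent=2))
for triple in to_triples(person):
    print(triple)
```

Relating resources here just means using another resource's URI as an object value. Unstructured text would carry the same facts with no such explicit links.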

Section 2 Databases

  • Cost of modeling and indexing semantics in a database
  • Using a relational DB for semantic modeling
  • Document databases: MongoDB, Elasticsearch
  • Graph databases: TinkerPop stack, Neo4j, OrientDB
  • Graph batch processing: Pregel, etc.
  • Querying triples
  • Implementation: some Python
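
One way to connect "relational DB for semantic modeling" with "querying triples" is to store every fact in a single three-column table. The sketch below uses Python's built-in `sqlite3` with an in-memory database. The table name `triples`, the index choices, and the sample facts are illustrative assumptions, not a recommended production schema.

```python
import sqlite3

# Simplest relational layout for triples: one (s, p, o) table.
# Real triple stores add more indexes and dictionary-encode the strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("CREATE INDEX idx_sp ON triples (s, p)")
conn.execute("CREATE INDEX idx_po ON triples (p, o)")

data = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
    ("carol", "worksFor", "acme"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", data)

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT s, p, o FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql + " ORDER BY s, p, o", params).fetchall()

# Single pattern: who works for acme?
print(match(p="worksFor", o="acme"))

# Two-hop pattern as a self-join: whom do alice's acquaintances know?
rows = conn.execute(
    """SELECT t2.o FROM triples t1 JOIN triples t2 ON t1.o = t2.s
       WHERE t1.s = ? AND t1.p = 'knows' AND t2.p = 'knows'""",
    ("alice",),
).fetchall()
print(rows)
```

Each extra hop in a graph pattern adds another self-join. That growing cost is one reason graph databases exist.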

Section 3 Search

  • Database vs search engine
  • Inverted index and its cost
  • Extend inverted index to model semantic relations
  • Understanding user queries: from keywords to sentences
  • Faceted search: Elasticsearch and Solr
  • Graph search
  • Implementation: some Python
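
An inverted index maps each term to the set of documents that contain it. A keyword query is then answered by intersecting those sets. This is a minimal sketch with naive tokenization and Boolean AND semantics only; engines such as Solr and Elasticsearch (both built on Lucene) add analyzers, ranking, and compressed postings. The sample documents are made up.

```python
import re
from collections import defaultdict

docs = {
    1: "Neo4j is a graph database",
    2: "MongoDB is a document database",
    3: "Solr and Elasticsearch build an inverted index",
}

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids containing it (its postings)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Boolean AND query: intersect the postings of every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)

index = build_index(docs)
print(sorted(search(index, "graph database")))  # [1]
print(sorted(search(index, "database")))        # [1, 2]
```

The index costs storage roughly proportional to the total number of (term, document) pairs, and it must be updated on every write. In exchange, lookups avoid scanning the documents.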

Section 4 Data Exchange and Integration

  • Protocol Buffers and Thrift
  • JSON-RPC
  • XMPP and Google Wave Protocol
  • REST API design
  • Most important data APIs
  • Implementation: some Python
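
The sketch below shows what a JSON-RPC exchange looks like. It dispatches a JSON-RPC 2.0 request string to a Python function and builds the response, with no transport layer. The `lookup` method and its toy knowledge base are hypothetical. The envelope members (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) and the error codes -32700, -32600, -32601, and -32602 follow the JSON-RPC 2.0 specification. Batches and notifications are deliberately not handled.

```python
import json

# Hypothetical toy knowledge base and method; only the envelope follows JSON-RPC 2.0.
KB = {"alice": {"worksFor": "acme"}}

def lookup(entity):
    """Return the known facts about an entity, or an empty dict."""
    return KB.get(entity, {})

METHODS = {"lookup": lookup}

def error(code, message, req_id=None):
    """Build a JSON-RPC 2.0 error response string."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})

def handle(raw_request):
    """Dispatch one JSON-RPC 2.0 request string; return the response string."""
    try:
        req = json.loads(raw_request)
    except json.JSONDecodeError:
        return error(-32700, "Parse error")
    if not isinstance(req, dict) or req.get("jsonrpc") != "2.0":
        return error(-32600, "Invalid Request")
    req_id = req.get("id")
    name = req.get("method")
    method = METHODS.get(name) if isinstance(name, str) else None
    if method is None:
        return error(-32601, "Method not found", req_id)
    params = req.get("params", [])
    try:
        # params may be by-name (object) or by-position (array).
        result = method(**params) if isinstance(params, dict) else method(*params)
    except TypeError:
        return error(-32602, "Invalid params", req_id)
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})

request = json.dumps({"jsonrpc": "2.0", "method": "lookup",
                      "params": {"entity": "alice"}, "id": 1})
print(handle(request))
```

Protocol Buffers and Thrift cover the same request/response ground with a schema and a binary encoding. JSON-RPC is schema-less and human-readable.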

Section 5 Inference

  • Cost of inference
  • Just-in-time knowledge
  • Rules
  • Inference as graph operations
  • Inference using a database
  • Inference with a full-text search engine
  • Inference using functional programming
  • Implementation: some Python
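
"Inference as graph operations" can be illustrated with forward chaining. The procedure repeatedly joins triples that share a node, adds whatever a rule derives, and stops when a pass produces nothing new. The two rules below are modeled on RDFS subclass transitivity and type inheritance, with the predicate names shortened to `subClassOf` and `type`. The nested-loop join is quadratic per pass, so treat this as a teaching sketch, not an efficient reasoner.

```python
def forward_chain(triples):
    """Apply two RDFS-style rules until no new triples appear (a fixpoint)."""
    facts = set(triples)
    while True:
        new = set()
        # Join every pair of triples where the first's object is the second's subject.
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c:
                    continue
                # Rule 1: subClassOf is transitive.
                if p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
                # Rule 2: an instance also has the types of superclasses.
                if p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))
        if new <= facts:
            return facts
        facts |= new

base = [
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
]
for fact in sorted(forward_chain(base) - set(base)):
    print(fact)
```

Materializing every derived triple up front is one end of the cost trade-off. Deriving facts only when a query needs them ("just-in-time") is the other end.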

Section 6 Knowledge Extraction

  • Cost of knowledge extraction
  • Data cleaning
  • Shallow parsing
  • Entity extraction
  • Relation extraction
  • Implementation: some Python
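
A simple baseline for entity extraction is dictionary (gazetteer) lookup with greedy longest match, so "New York Times" is preferred over "New York". The gazetteer below is hypothetical and tiny. Statistical or neural taggers are the usual approach in practice, and this sketch ignores ambiguity and capitalization entirely.

```python
import re

# Hypothetical gazetteer mapping lowercase surface forms to entity types.
GAZETTEER = {
    "new york": "CITY",
    "new york times": "ORG",
    "tim berners-lee": "PERSON",
}
# Longest entry, in tokens, bounds the n-grams we need to try.
MAX_LEN = max(len(name.split()) for name in GAZETTEER)

def tokenize(text):
    """Lowercase word tokens; hyphens stay inside a token."""
    return re.findall(r"[\w-]+", text.lower())

def extract_entities(text):
    """Greedy longest-match lookup of token n-grams in the gazetteer."""
    tokens = tokenize(text)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                entities.append((candidate, GAZETTEER[candidate]))
                i += n
                break
        else:
            # No entry starts here; move on one token.
            i += 1
    return entities

print(extract_entities("Tim Berners-Lee talked to The New York Times in New York."))
```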

Section 7 Visualization

  • Cognitive background
  • Exhibit and others
  • D3 (and other JavaScript libraries)
  • NetworkX (and other Python libraries)
  • Implementation: some Python
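
A common way to pass a graph from Python to D3 is a node-link JSON document with `nodes` and `links` arrays, the shape used in D3's force-directed layout examples. The sketch below converts triples into that shape. The `label` field on each link is an illustrative addition. On the D3 side, `d3.forceLink(links).id(d => d.id)` lets links refer to nodes by `id` instead of by array index.

```python
import json

triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
]

def to_node_link(triples):
    """Convert (subject, predicate, object) triples into node-link form."""
    # dict.fromkeys de-duplicates names while keeping first-seen order.
    names = dict.fromkeys(name for s, _, o in triples for name in (s, o))
    nodes = [{"id": name} for name in names]
    # "label" is an illustrative extra field carrying the predicate.
    links = [{"source": s, "target": o, "label": p} for s, p, o in triples]
    return {"nodes": nodes, "links": links}

print(json.dumps(to_node_link(triples), indent=2))
```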

Section 8 User Interaction

  • It's about people, not machines
  • Social machine
  • Text interface
  • Guided data exploration and discovery
  • Faceted browser
  • Voice interface and personal assistants
  • Implementation: some Python
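
Guided exploration in a faceted browser rests on two operations. The first counts the values of each facet over the current result set. The second narrows that set when the user picks a value. The sketch below shows both on hypothetical items. A real browser would compute counts from an index instead of scanning a list.

```python
from collections import Counter

# Hypothetical items; each facet is simply a field name.
items = [
    {"id": 1, "topic": "graphs", "format": "JSON"},
    {"id": 2, "topic": "graphs", "format": "RDF"},
    {"id": 3, "topic": "search", "format": "JSON"},
    {"id": 4, "topic": "search", "format": "XML"},
]

def facet_counts(items, facets):
    """For each facet, count how many items carry each value."""
    return {facet: Counter(item[facet] for item in items) for facet in facets}

def drill_down(items, **selected):
    """Keep only the items matching every selected facet value."""
    return [item for item in items
            if all(item[facet] == value for facet, value in selected.items())]

counts = facet_counts(items, ["topic", "format"])
narrowed = drill_down(items, format="JSON")
print(counts)
print(facet_counts(narrowed, ["topic"]))
```

Showing updated counts after each selection tells the user what is still reachable. This is the "guided" part of guided exploration.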

Section 9 Big Data, or Not

  • Measuring semantics in data
  • Small is beautiful in knowledge
  • The knowledge growth principle
  • Small knowledge
  • Big knowledge
  • Big data
  • Datasets: Freebase, DBpedia, LOGD, etc.
  • Platforms: EC2 (via boto) and others
  • Implementation: some Python

Section 10 Lean Application Development

  • Build, measure, learn
  • Lean Canvas
  • MVP
  • Build: mockup strategies, pretotyping, prototyping
  • Measure: key metrics, not vanity metrics
  • Learn: the five whys (why^5)
  • Semantic WordPress/Drupal/Wiki, etc.
  • Implementation: some Python and some wiki