On the Computational and Statistical Interface and “Big Data”

http://strataconf.com/big-data-conference-ca-2015/public/schedule/detail/39588

Summary

  • Many conceptual and mathematical challenges arise in taking seriously the problem of “Big Data”
  • Facing these challenges will require a rapprochement between computer science and statistics, bringing them together at the level of their foundations

Big Data Phenomenon

Science in confirmatory mode and exploratory mode

Conceptual/Mathematical Issues

  • The need to control statistical risk under constraints on algorithmic runtime
  • Statistical inference with distributed and streaming data
  • The tradeoff between statistical risk and privacy
  • Many other issues that require a blend of statistical thinking and computational thinking
    • Statistical thinking: e.g., a focus on sampling, confidence intervals, evaluation, diagnostics, causal inference
    • Computational thinking: e.g., scalability, abstraction
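
The tradeoff between statistical risk and privacy can be made concrete with a small simulation. The sketch below is not from the talk: it assumes the Laplace mechanism as the privacy method, data bounded in [0, 1], and hypothetical helper names (private_mean, mean_squared_error). It estimates the mean squared error of a privately released mean for a given privacy budget epsilon.

```python
import random
import statistics


def private_mean(values, epsilon, rng):
    """Release the mean of values in [0, 1] with Laplace noise.

    Assumption: changing one of the n values moves the mean by at most 1/n,
    so Laplace noise with scale 1 / (n * epsilon) gives epsilon-differential
    privacy for the released mean.
    """
    n = len(values)
    scale = 1.0 / (n * epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return statistics.fmean(values) + noise


def mean_squared_error(n, epsilon, trials=2000, seed=0):
    """Monte Carlo estimate of the squared-error risk of private_mean."""
    rng = random.Random(seed)
    true_mean = 0.5  # mean of the Uniform(0, 1) data simulated below
    errors = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        errors.append((private_mean(sample, epsilon, rng) - true_mean) ** 2)
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and larger risk.
    for eps in (0.1, 1.0, 10.0):
        print(eps, mean_squared_error(n=100, epsilon=eps))
```

With n fixed, a smaller epsilon (stronger privacy) adds more noise, so the estimated risk grows. For large epsilon, the sampling error of the mean itself dominates.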

Data as a Resource

Big Data, Big Problems

  • Model complexity
  • Statistical control involves algorithms that scale poorly
  • Need sophisticated algorithms

Our Approach

  • Take (classical) statistical decision theory

Outline

  • Background on minimax decision theory
  • Privacy constraints
  • Communication constraints
  • Computational constraints (via optimization)
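
For orientation, the minimax risk that the outline starts from is usually written as below. This is the standard textbook form rather than a transcription of the slides, and the notation is an assumption: a family of distributions \mathcal{P}, a parameter \theta(P), a loss \ell, and an estimator \hat\theta computed from n samples.

```latex
\mathfrak{M}_n\big(\theta(\mathcal{P}), \ell\big)
  = \inf_{\hat\theta} \, \sup_{P \in \mathcal{P}} \,
    \mathbb{E}_P\Big[ \ell\big(\hat\theta(X_1, \dots, X_n), \theta(P)\big) \Big]
```

One common way to express a privacy, communication, or computational constraint in this framework is to restrict the infimum to the estimators that satisfy the constraint. The constrained minimax risk can then only be at least as large as the unconstrained one.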

Background
