CS59000: Topics in Big Data Systems

Instructor: Mohammad Sadoghi
E-mail: msadoghi@purdue.edu
Office: Lawson 2116N
Office Hours: By email appointments

Lecture Time: TueThr 9:00 am - 10:15 am
Location: Lawson Computer Science Building, Room LWSN-B134

Final Project Progress Meeting: MonWed 1:00 pm - 3:00 pm
Location: Lawson Computer Science Building, Room LWSN-2149C
Note: Each team must attend at least one project meeting per week


Overview

This new graduate seminar course surveys the recent developments in data management such as NoSQL (e.g., distributed key-value stores) and NewSQL (e.g., geo-distributed data stores) with respect to storage architecture (e.g., log-structured, row-oriented, and column-oriented layouts), concurrency controls (e.g., ranging from eventual consistency to full serializability), cloud computing and virtualization, the emerging commodity hardware trends (e.g., many-core and distributed main memory), modern hardware accelerators (e.g., GPUs, FPGAs, and SSDs), and types of workloads (e.g., OLTP and OLAP).


Syllabus

Textbooks:

Required: Optional:

Workload:

The most important component of the course is a semester-long creative project. The project can be done either individually or in groups (no limit on the group size). If the project is done in groups, then the role and contributions of each group member must be clearly articulated. All projects will be supervised by the instructor. The outcome of the final project will be in a form of a short research paper (with the ultimate goal of submitting to a top-tier conference, of course, it is optional and does not affect your final grade). Additionally, each student is expected to present 1 or 2 papers throughout the semester. Students must also submit a short written review for each paper they present (similar to the conference-style review). Students may choose to present additional papers instead of submitting paper reviews. The papers will be selected from the papers listed in the content tab. Students must coordinate with the instructor in advance about the time and the paper they wish to present.

Grading:

The final grade will be based upon the following components:
  • Paper Presentations (20%) + Paper Reviews (10%): 30%
  • Final Project: 70%

Course Policy:

In this class, we adopt the course policy as prescribed here.


Contents

Below you will find a comprehensive collection of papers related to the recent developments in the braod area of distributed transaction processing. For the purpose of this course, (1) we will be discussing only a small subset of these papers and (2) the preferred papers are bolded and colored in dark blue. Students are encourged to identify other relevant papers as well.

  • Eric Brewer. Towards robust distributed systems. PODC'00
  • Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition- tolerant web services. SIGACT News’02
  • Pat Helland. Life beyond distributed transactions: an apostate’s opinion. CIDR’07
  • Dan Pritchett. BASE: An Acid Alternative. Queue’08
  • Hiroshi Wada, Alan Fekete, Liang Zhao, Kevin Lee, and Anna Liu. Data consistency properties and the trade-offs in commercial cloud storage: the consumers’ perspective. CIDR’11
  • Eric Brewer. CAP twelve years later. How the “rules” have changed. Computer’12
  • D. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer’12
  • Peter Bailis and Ali Ghodsi. Eventual consistency today: Limitations, extensions, and beyond. ACM Queue’13
  • Jeffrey Dean and Luiz André Barroso. The tail at scale. Commun. ACM’13
  • Sean Hull. 20 obstacles to scalability. Commun. ACM’13
  • Peter Bailis and Kyle Kingsbury. The network is reliable: An informal survey of real-world communications failures. ACM Queue’14
  • Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Highly available transactions: Virtues and limitations. PVLDB’14
  • Divy Agrawal, Amr El Abbadi, and Kenneth Salem. A taxonomy of partitioned replicated cloud-based database systems. IEEE Data Eng. Bull.’15
  • Jose M. Faleiro and Daniel J. Abadi. FIT: A distributed database performance tradeoff. IEEE Data Eng. Bull.’15
  • Rachid Guerraoui, Matej Pavlovic, Dragos-Adrian Seredinschi: Trade-offs in Replicated Systems. IEEE Data Eng. Bull.'16
  • Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. The challenges of global-scale data management. SIGMOD’16
  • Andrew Pavlo and Matthew Aslett. What’s really new with NewSQL? SIGMOD Rec’16
  • Per-Åke Larson, Spyros Blanas, Cristian Diaconu,Craig Freedman, Jignesh M.Patel, and MikeZwilling. High-performance concurrency control mechanisms for main-memory databases. PVLDB’11
  • David B. Lomet, Alan Fekete, Rui Wang, and Peter Ward. Multi-version concurrency via timestamp range conflict management. ICDE’12
  • Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. Speedy transactions in multicore in-memory databases. SOSP ’13
  • Hyuck Han, Seongjae Park, Hyungsoo Jung, Alan Fekete, Uwe Röhm, and Heon Y. Yeom. Scalable serializable snapshot isolation for multicore systems. ICDE’14
  • Mohammad Sadoghi, Mustafa Canim, Bishwaranjan Bhattacharjee, Fabian Nagel, and Kenneth A. Ross. Reducing database locking contention through multi-version concurrency. PVLDB’14
  • Jose Faleiro, Alexander Thomson, and Daniel J Abadi. Lazy evaluation of transactions in database systems. SIGMOD’14
  • Xiangyao Yu, George Bezerra, Andrew Pavlo, Srinivas Devadas, Michael Stonebraker: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores. PVLDB'14
  • Jose M. Faleiro and Daniel J. Abadi. Rethinking serializable multiversion concurrency control. Proc. VLDB’15
  • Thomas Neumann, Tobias Mühlbauer, and Alfons Kemper. Fast serializable multi-version concurrency control for main-memory database systems. SIGMOD’15
  • John Meehan, Nesime Tatbul, Stan Zdonik, Cansu Aslantas, Ugur Çetintemel, Jiang Du, Tim Kraska, Samuel Madden, David Maier, Andrew Pavlo, Michael Stonebraker, Kristin Tufte, Hao Wang: S-Store: Streaming Meets Transaction Processing. PVLDB'15
  • H. Kimura. FOEDUS: OLTP engine for a thousand cores and NVRAM. SIGMOD'15.
  • Yuan Yuan, Kaibo Wang, Rubao Lee, Xiaoning Ding, Jing Xing, Spyros Blanas, Xiaodong Zhang: BCC: Reducing False Aborts in Optimistic Concurrency Control with Low Cost for In-Memory Databases. PVLDB'16
  • C. Yao, D. Agrawal, G. Chen, Q. Lin, B. C. Ooi, W. F. Wong, and M. Zhang. Exploiting single-threaded model in multi-core in-memory systems. TKDE’16
  • Xiangyao Yu, Andrew Pavlo, Daniel Sanchez, and Srinivas Devadas. TicToc: Time traveling optimistic concurrency control. SIGMOD’16
  • Kangnyeon Kim, Tianzheng Wang, Ryan Johnson, and Ippokratis Pandis. ERMIA: fast memory-optimized database system for heterogeneous workloads. SIGMOD’16
  • Kun Ren, Jose M. Faleiro, and Daniel J. Abadi. Design principles for scaling multi-core OLTP under high contention. SIGMOD’16
  • Tianzheng Wang, Hideaki Kimura: Mostly-Optimistic Concurrency Control for Highly Contended Dynamic Workloads on a Thousand Cores. PVLDB'17
  • Leslie Lamport. Time, clocks, and the ordering of events in a distributed system.Commun. ACM’78
  • Jim Gray. The transaction concept: Virtues and limitations. VLDB’81
  • A. Chan and R. Gray. Implementing distributed read-only transactions. TSE’85
  • C. Mohan, Bruce G. Lindsay, Ron Obermarck: Transaction Management in the R* Distributed Database Management System. TODS'86
  • Jim Gray and Leslie Lamport. Consensus on transaction commit. TODS’06
  • Philip A. Bernstein, Alan Fekete, Hongfei Guo, Raghu Ramakrishnan, and Pradeep Tamma. Relaxed-currency serializability for middle-tier caching and replication. SIGMOD’06
  • Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alex Rasin, Stanley B. Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. H-store: a high-performance, distributed main memory transaction processing system. PVLDB’08
  • Evan P.C. Jones, Daniel J. Abadi, and Samuel Madden. Low overhead concurrency control for partitioned main memory databases. SIGMOD’10
  • Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. G-Store: a scalable data store for transactional multi key access in the cloud. SoCC’10
  • Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. Schism: a workload-driven approach to database replication and partitioning. VLDB’10
  • Philip A. Bernstein, Colin W. Reid, Ming Wu, and Xinhao Yuan. Optimistic concurrency control by melding trees. PVLDB’11
  • Jason Baker, Chris Bond, James C. Corbett, J. J. Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. CIDR 2011
  • Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. Calvin: fast distributed transactions for partitioned database systems. SIGMOD’12
  • Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. ElasTraS: An elastic, scalable, and self- managing transactional database for the cloud. TODS’13
  • Justin J. Levandoski, David B. Lomet, Mohamed F. Mokbel, and Kevin Zhao. Deuteronomy: Transaction support for cloud data. In CIDR 2011
  • James A. Cowling, Barbara Liskov: Granola: Low-Overhead Distributed Transaction Coordination. USENIX ATC'12
  • Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and David Maier. Blazes: Coordination analysis for distributed programs. ICDE’14
  • Hatem A. Mahmoud, Vaibhav Arora, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. MaaT: Effective and scalable coordination of distributed transactions in the cloud. PVLDB’14
  • Sudip Roy, Lucja Kot, Gabriel Bender, Bailu Ding, Hossein Hojjat, Christoph Koch, Nate Foster, and Johannes Gehrke. The homeostasis protocol: Avoiding transaction coordination through program analysis. SIGMOD’15
  • Akon Dey, Alan Fekete, and Uwe Röhm. Scalable distributed transactions across heterogeneous stores. ICDE’15
  • Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Feral concurrency control: An empirical investigation of modern application integrity. SIGMOD’15
  • Simon Loesing, Markus Pilman, Thomas Etter, Donald Kossmann: On the Design and Scalability of Distributed Shared-Data Databases. SIGMOD'15
  • Aleksandar Dragojevic, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, Miguel Castro: No compromises: distributed transactions with consistency, availability, and performance. SOSP'15
  • Philip A. Bernstein and Sudipto Das. Scaling optimistic concurrency control by approximately partitioning the certifier and log. IEEE Data Eng. Bull.’15
  • Mohammad Sadoghi, Martin Jergler, Hans-Arno Jacobsen, Richard Hull, and Roman Vaculín. Safe distribution and parallel execution of data-centric workflows over the publish/subscribe abstraction. TKDE’15
  • Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Scalable atomic visibility with RAMP transactions. TODS’16
  • Chang Yao, Divyakant Agrawal, Gang Chen, Beng Chin Ooi, and Sai Wu. Adaptive logging: Optimizing logging and recovery costs in distributed in-memory databases. SIGMOD’16
  • Tamer Eldeeb, Phil Bernstein. Transactions for Distributed Actors in the Cloud. MSR TechReport'16
  • Rachael Harding, Dana Van Aken, Andrew Pavlo, Michael Stonebraker. An Evaluation of Distributed Concurrency Control. VLDB'17
  • S.B. Davidson, H. Garcia-Molina, and D. Skeen. Consistency in partitioned networks. ACM CSUR’85
  • Maurice Herlihy, Jeannette M. Wing: Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Program. Lang. Syst.'90
  • Divyakant Agrawal et al. Consistency and orderability: semantics-based correctness criteria for databases. TODS’93
  • Rajeev Rastogi, Sharad Mehrotra, Yuri Breitbart, Henry F. Korth, and Avi Silberschatz. On correctness of non-serializable executions. PODS’93
  • H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil. A critique of ANSI SQL isolation levels. SIGMOD’95
  • Atul Adya. Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions. PhD Thesis. MIT'99
  • Werner Vogels. Eventually consistent. Queue’08
  • Alan David Fekete and Krithi Ramamritham. Consistency models for replicated data. In Replication: Theory and Practice’10
  • Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS. SOSP ’11
  • Peter Alvaro, Neil Conway, Joseph M. Hellerstein, and William R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. CIDR’11
  • David Bermbach and Stefan Tai. Eventual consistency: How soon is eventual? An evaluation of Amazon S3’s consistency behavior. MW4SOC’11
  • Lisa Glendenning, Ivan Beschastnikh, Arvind Krishnamurthy, Thomas E. Anderson: Scalable consistency in Scatter. SOSP'11
  • Sebastian Burckhardt, Daan Leijen, Manuel Fähndrich, and Mooly Sagiv. Eventually consistent transactions. ESOP’12
  • Neil Conway, William R Marczak, Peter Alvaro, Joseph M Hellerstein, and David Maier. Logic and lattices for distributed programming. SoCC’12
  • Kamal Zellag and Bettina Kemme. How consistent is your cloud application? SoCC’12
  • Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. The potential dangers of causal consistency and an explicit solution. SoCC ’12
  • Peter Alvaro, Peter Bailis, Neil Conway, and Joseph M. Hellerstein. Consistency without borders. SoCC’13
  • Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Bolt-on causal consistency. SIGMOD’13
  • Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Coordination avoidance in database systems. PVLDB’14
  • Valter Balegas, Sérgio Duarte, Carla Ferreira, Rodrigo Rodrigues, Nuno Preguiça, Mahsa Najafzadeh, and Marc Shapiro. Putting consistency back into eventual consistency. EuroSys’15
  • Marek Zawirski, Nuno Preguiça, Sérgio Duarte, Annette Bieniusa, Valter Balegas, and Marc Shapiro. Write fast, read in the past: Causal consistency for client-side applications. Middleware’15
  • Paolo Viotti, Marko Vukolic: Consistency in Non-Transactional Distributed Storage Systems. ACM Comput. Surv.'16
  • Natacha Crooks, Youer Pu, Nancy Estrada, Trinabh Gupta, Lorenzo Alvisi, Allen Clement: TARDiS: A Branch-and-Merge Approach To Weak Consistency. SIGMOD'16
  • Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li. Transactional storage for geo-replicated systems. SOSP ’11
  • Stacy Patterson, Aaron J. Elmore, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. Serializability, not serial: Concurrency control and availability in multi-datacenter datastores. PVLDB’12
  • C. Li, D. Porto, A. Clement, J. Gehrke, N. Preguiça, and R. Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. OSDI’12
  • Tim Kraska, Gene Pang, Michael J. Franklin, Samuel Madden, and Alan Fekete. MDCC: multi-data center consistency. EuroSys ’13
  • Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. Message futures: Fast commitment of transactions in multi-datacenter environments. CIDR’13
  • Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, and Harsha V. Madhyastha. Spanstore: Cost-effective geo-replicated storage spanning multiple cloud services. SOSP ’13
  • Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran, Marcos K. Aguilera, and Jinyang Li. Transaction chains: achieving serializability with low latency in geo-distributed storage systems. SOSP’13
  • Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, David G. Andersen. Stronger semantics for low-latency geo-replicated storage. NSDI’13
  • Hatem A. Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi: Low-Latency Multi-Datacenter Databases using Replicated Commit. PVLDB'13
  • Gene Pang, Tim Kraska, Michael J. Franklin, and Alan Fekete. PLANET: making progress with commit processing in unpredictable environments. SIGMOD’14
  • Dan Dobre, Paolo Viotti, Marko Vukolic: Hybris: Robust Hybrid Cloud Storage. SoCC'14
  • Faisal Nawab, Vaibhav Arora, Divyakant Agrawal, and Amr El Abbadi. Chariots: A scalable shared log for data management in multi-datacenter cloud environments. EDBT’15
  • Faisal Nawab, Vaibhav Arora, Divyakant Agrawal, and Amr El Abbadi. Minimizing commit latency of transactions in geo-replicated data stores. SIGMOD’15
  • Alexander Thomson and Daniel J. Abadi. CalvinFS: Consistent WAN replication and scalable metadata management for distributed file systems. FAST’15
  • Ashish Gupta, Fan Yang, Jason Govig, Adam Kirsch, Kelvin Chan, Kevin Lai, Shuo Wu, Sandeep Govind Dhoot, Abhilash Rajesh Kumar, Ankur Agiwal, Sanjay Bhansali, Mingsheng Hong, Jamie Cameron, Masood Siddiqi, David Jones, Jeff Shute, Andrey Gubarev, Shivakumar Venkatara- man, and Divyakant Agrawal. Mesa: a geo-replicated online data warehouse for Google’s advertising system. Commun. ACM’16
  • Bettina Kemme and Gustavo Alonso. Don’t be lazy, be consistent: Postgres-R, A new way to implement database replication. In VLDB 2000
  • Bettina Kemme and Gustavo Alonso. A new approach to developing and implementing eager database replication protocols. TODS’00
  • Bettina Kemme, Fernando Pedone, Gustavo Alonso, André Schiper, and Matthias Wiesmann. Using optimistic atomic broadcast in transaction processing systems. IEEE Trans. Knowl. Data Eng.’03
  • K. Daudjee and K. Salem. Lazy database replication with ordering guarantees. In ICDE’04
  • Yasushi Saito and Marc Shapiro. Optimistic replication. ACM Comput. Surv.’05
  • Leslie Lamport. Fast Paxos. Distributed Computing’06
  • Yi Lin, Bettina Kemme, Ricardo Jiménez-Peris, et al. Snapshot isolation and integrity constraints in replicated databases. TODS’09
  • Flavio Paiva Junqueira, Benjamin C. Reed, Marco Serafini: Zab: High-performance broadcast for primary-backup systems. DSN'11
  • Marc Shapiro, Nuno M. Preguiça, Carlos Baquero, and Marek Zawirski. Conflict-free replicated data types. SSS’11
  • Mihaela A. Bornea, Orion Hodson, Sameh Elnikety, and Alan Fekete. One-copy serializability with snapshot isolation under the hood. ICDE’11
  • Hyungsoo Jung, Hyuck Han, Alan Fekete, and Uwe Röhm. Serializable snapshot isolation for replicated databases in high-update scenarios. PVLDB’11
  • Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Probabilistically bounded staleness for practical partial quorums. PVLDB’12
  • Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. USENIX ATC’14
  • Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Quantifying eventual consistency with PBS. Commun. ACM’14
  • Carlos Eduardo Benevides Bezerra, Fernando Pedone, Robbert van Renesse: Scalable State-Machine Replication. DSN'14
  • Irene Zhang, Naveen Kr. Sharma, Adriana Szekeres, Arvind Krishnamurthy, Dan R. K. Ports: Building consistent transactions with inconsistent replication. SOSP'15
  • Valter Balegas, Cheng Li, Mahsa Najafzadeh, Daniel Porto, Allen Clement, Sérgio Duarte, Carla Ferreira, Johannes Gehrke, João Leitão, Nuno M. Preguiça, Rodrigo Rodrigues, Marc Shapiro, Viktor Vafeiadis: Geo-Replication: Fast If Possible, Consistent If Necessary. IEEE Data Eng. Bull.’16
  • Yu Cao, Chun Chen, Fei Guo, Dawei Jiang, Yuting Lin, Beng Chin Ooi, Hoang Tam Vo, Sai Wu, and Quanqing Xu. ES2: A cloud data storage system for supporting both OLTP and OLAP. ICDE’11
  • A. Kemper and T. Neumann. HyPer: A hybrid OLTP and OLAP main memory database system based on virtual memory snapshots. ICDE’11
  • Joy Arulraj, Andrew Pavlo, and Prashanth Menon. Bridging the archipelago between row-stores and column-stores for hybrid workloads. SIGMOD’16
  • Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, and Mustafa Canim. L-Store: A real-time OLTP and OLAP system. CoRR’16
  • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data. OSDI’06
  • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: amazon’s highly available key-value store. SOSP’07
  • Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans- Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni. PNUTS: yahoo!’s hosted data serving platform. PVLDB’08
  • Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, and Beng Chin Ooi. LogBase: A scalable log-structured database system in the cloud. PVLDB’12
  • James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google’s Globally-Distributed Database. OSDI’12
  • Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, and Himani Apte. F1: A distributed SQL database that scales. PVLDB’13
  • Lin Qiao et al. On brewing fresh Espresso: LinkedIn’s distributed data serving platform. SIGMOD’13
  • John K. Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Diego Ongaro, Guru M. Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, Ryan Stutsman: The case for RAMCloud. Commun. ACM'11
  • Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John K. Ousterhout, Mendel Rosenblum: Fast crash recovery in RAMCloud. SOSP'11
  • Viktor Leis, Alfons Kemper, and Thomas Neumann. Exploiting hardware transactional memory in main-memory databases. ICDE’14
  • Aleksandar Dragojevic, Dushyanth Narayanan, Miguel Castro, Orion Hodson: FaRM: Fast Remote Memory. NSDI'14
  • Hyeontaek Lim, Dongsu Han, David G. Andersen, Michael Kaminsky. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. NSDI'14
  • Stephen M. Rumble, Ankita Kejriwal, John K. Ousterhout: Log-structured memory for DRAM-based storage. FAST'14
  • Andrew Pavlo. Emerging hardware trends in large-scale transaction processing. IEEE Internet Computing’15
  • Marius Poke, Torsten Hoefler: DARE: High-Performance State Machine Replication on RDMA Networks. HPDC'15
  • Mohammadreza Najafi, Mohammad Sadoghi, Hans-Arno Jacobsen: The FQP Vision: Flexible Query Processing on a Reconfigurable Computing Fabric. SIGMOD Record'15
  • Kai Zhang, Kaibo Wang, Yuan Yuan, Lei Guo, Rubao Lee, Xiaodong Zhang: Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores. PVLDB'15
  • Yandong Wang, Li Zhang, Jian Tan, Min Li, Yuqing Gao, Xavier Guerin, Xiaoqiao Meng, and Shicong Meng. HydraDB: A resilient RDMA-driven key-value middleware for in-memory cluster computing. SC’15
  • Alexander Matveev, Nir Shavit, Pascal Felber, Patrick Marlier. Read-log-update: a lightweight synchronization mechanism for concurrent programming. SOSP'15
  • Xingda Wei, Jiaxin Shi, Yanzhe Chen, Rong Chen, Haibo Chen. Fast in-memory transaction processing using RDMA and HTM. SOSP'15
  • Yanzhe Chen, Xingda Wei, Jiaxin Shi, Rong Chen, Haibo Chen. Fast and general distributed transactions using RDMA and HTM. EuroSys'16
  • Sheng Li, Hyeontaek Lim, Victor W. Lee, Jung Ho Ahn, Anuj Kalia, Michael Kaminsky, David G. Andersen, Seongil O, Sukhan Lee, Pradeep Dubey. Achieving One Billion Key-Value Requests per Second on a Single Server. IEEE Micro'16
  • Mohammad Sadoghi, Kenneth A. Ross, Mustafa Canim, Bishwaranjan Bhattacharjee: Exploiting SSDs in operational multiversion databases. VLDB J.'16
  • Peter Bailis, Joy Arulraj, and Andrew Pavlo. Research for practice: distributed consensus and implications of NVM on database management systems. Commun. ACM’16
  • Zsolt István, David Sidler, and Gustavo Alonso, Marko Vukolić. Consensus in a Box: Inexpensive Coordination in Hardware. NSDI'16

Tentative Schedule

January 12, 2017:
  • ExpoDB: An Exploratory Data Science Platform [Instructor, Slides]
January 17, 2017:
  • CAP twelve years later. How the “rules” have changed. Computer’12 [Anshu Maheshwari, Slides]
  • Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer’12 [Anshu Maheshwari, Slides]
January 19, 2017:
  • FIT: A distributed database performance tradeoff. IEEE Data Eng. Bull.’15 [Thamir Qadah, Slides]
  • Eventual consistency today: Limitations, extensions, and beyond. ACM Queue’13 [Nomchin Banga, Slides]
January 24, 2017:
  • Rethinking serializable multiversion concurrency control. VLDB’15 [Suyash Gupta, Slides]
January 26, 2017:
  • Fast serializable multi-version concurrency control for main-memory database systems. SIGMOD’15 [Samodya Abeysiriwardane, Slides]
January 31, 2017:
  • TicToc: Time traveling optimistic concurrency control. SIGMOD’16 [Shreejit Nair, Slides]
February 2, 2017:
  • S-Store: Streaming Meets Transaction Processing. PVLDB'15 [Thamir Qadah, Slides]
February 7, 2017:
  • Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores. PVLDB'14 [Vaibhav Jain, Slides]
February 9, 2017:
  • Calvin: fast distributed transactions for partitioned database systems. SIGMOD’12 [Ishan Chawla, Slides]
February 14, 2017:
  • Transaction Processing in ExpoDB: A Holistic View [Instructor]
  • L-Store: A real-time OLTP and OLAP system. CoRR’16 [Instructor]
February 16, 2017:
  • An Evaluation of Distributed Concurrency Control. VLDB'17 [Thamir Qadah, Slides]
February 21, 2017:
  • ExpoDB: Transaction Architecture [Nomchin Banga]
February 23, 2017:
  • FOEDUS: OLTP engine for a thousand cores and NVRAM. SIGMOD'15 [Aman Preet Singh, Slides]
February 28, 2017:
  • Write-Behind Logging. PVLDB'16 [Devesh Kumar Singh, Slides]
March 2, 2017:
  • Adaptive logging: Optimizing logging and recovery costs in distributed in-memory databases. SIGMOD’16 [Shreejit Nair, Slides]
March 6-10 2017: Progress Meetings (MonWed)/No Class [Midterm Project Review]

March 13-17 2017: No Class/No Progress Meetings [SPRING VACATION]

March 21, 2017:
  • TARDiS: A Branch-and-Merge Approach To Weak Consistency. SIGMOD’16 [Samodya Abeysiriwardane, Slides]
March 23, 2017:
  • ExpoDB: Distributed Transaction Architecture [Anshu Maheshwari]
March 28, 2017:
  • Mostly-Optimistic Concurrency Control for Highly Contended Dynamic Workloads on a Thousand Cores. PVLDB'17 [Qing Wei, Slides]
March 30, 2017:
  • No compromises: distributed transactions with consistency, availability, and performance. SOSP'15 [Junlin Zhu, Slides]
April 4, 2017:
  • Speedy transactions in multicore in-memory databases. SOSP ’13 [Akshada Kulkarni, Slides]
  • Scalable atomic visibility with RAMP transactions. SIGMOD'14/TODS'16 [Suyash Gupta, Slides]
April 6, 2017:
  • Transactional storage for geo-replicated systems. SOSP ’11 [Menghan Li]
April 11, 2017:
  • MDCC: multi-data center consistency. EuroSys ’13 [Kartik Killawala, Slides]
April 13, 2017:
  • ERMIA: fast memory-optimized database system for heterogeneous workloads. SIGMOD’16 [Fengjian Pan, Slides]
  • Making geo-replicated systems fast as possible, consistent when necessary. OSDI’12 [Devesh Kumar Singh, Slides]
April 17-21 2017: Focus exclusively on completing the final projects [No Class/No Progress Meetings]

April 24-28 2017: Final Project Presentations on TueThr/No Progress Meetings on MonWed [Last Week of Class]
  • April 25, 2017: Final presentations for storage projects such as LSA, LSM, Github-like within Apache Spark ecosystem.
  • April 27, 2017: Final presentations for distributed transaction projects such as cache-conscious 2PL/2PC, optimized 3PC, parallel logging, and recovery within Apache Spark ecosystem.

Announcements

April 12, 2017: Final project report (i.e., a short research paper) is due on May 5, 2017. Please email the report/slides to the instructor and commit all your code/readme files to the main branch of the ExpoDB Git repo. It is important that we all converge into a unified branch that includes a coherent work of all groups.

April 12, 2017: Final project presentations will take place on April 25 (storage-related projects) and April 27 (transaction-related projects).

February 11, 2017: We will have two progress meeting weekly on Mondays and Wednesdays from 1-3pm in the room 2149C. Every team must attend at least one of these progress meetings each week. Each team needs to report their progress (plus discussing new ideas and challenges, etc.) during these meetings.

January 22, 2017: Please submit a one-page final project proposal (as a team) by Thursday, February 9 at the beginning of the class.

December 20, 2016: Welcome to CS 590. The first class will be on Thursday, January 12, 2017 (no class on January 10).


Handouts

Course materials/grades will be made available on your Blackboard account.