Jun Yang
D308 Levine Science Research Center
Box 90129
Duke University
Durham, North Carolina 27708-0129
Tel: 919-660-6587
Fax: 919-660-6519
Web: http://www.cs.duke.edu/~junyang/
Email: <cs.duke.edu, junyang>
Research Interests
- Database and data-intensive systems.
Education
Professional Experience
- Bishop-MacDermott Family Professor, Duke University, July 2020 - present.
- Chair, Computer Science Department, Duke University, July 2020 - June 2023.
- Associate Chair, Computer Science Department, Duke University, July 2017 - June 2020.
- Professor, Computer Science Department, Duke University, July 2014 - present.
- Associate Professor, Computer Science Department, Duke University, July 2008 - June 2014.
- Assistant Professor, Computer Science Department, Duke University, August 2001 - June 2008.
- Member of Technical Staff, Radik Software, August 2000 - August 2001.
- Software Engineer, ESS Technology, Inc., August 1999 - August 2000.
- Research Assistant, Computer Science Department, Stanford University, September 1995 - August 2000.
- Instructor, Computer Science Department, Stanford University, Spring 1999.
- Teaching Assistant, Computer Science Department, Stanford University, Spring 1998.
- Research Intern, IBM Almaden Research Center, June 1996 - September 1996.
- Programmer, College of Natural Resources, UC Berkeley, June 1994 - August 1995.
- Lab Assistant, UC Berkeley, Computer Science Division, Spring 1994.
- Tutor, San Joaquin Delta College, February 1992 - June 1993.
Publications
Published work:
- Haibo Xiu, Pankaj K. Agarwal, and Jun Yang. "PARQO: penalty-aware robust plan selection in query optimization." Proceedings of the VLDB Endowment, 17(13), 2024.
- Rickard Stureborg, Jenna Nichols, Bhuwan Dhingra, Jun Yang, Walter Orenstein, Robert A. Bednarczyk, and Lavanya Vasudevan. "Development and validation of VaxConcerns: a taxonomy of vaccine concerns and misinformation with crowdsource-viability." Vaccine, 2024. [link]
- Vincent Capol, Yuxi Liu, Haibo Xiu, and Jun Yang. "CrypQ: a database benchmark based on dynamic, ever-evolving Ethereum data." In Proceedings of the 2024 TPC Technology Conference on Performance Evaluation and Benchmarking, Guangzhou, China, August 2024. [link]
- Jun Yang, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Sudeepa Roy, and Kristin Stephens-Martinez. "What teaching databases taught us about researching databases: extended talk abstract." In Proceedings of the 2024 International Workshop on Data Systems Education: Bridging Education Practice with Education Research, pages 1-6, Santiago, Chile, June 2024.
- Rickard Stureborg, Sanxing Chen, Roy Xie, Aayushi Patel, Christopher Li, Chloe Zhu, Tingnan Hu, Jun Yang, and Bhuwan Dhingra. "Tailoring vaccine messaging with common-ground opinions." In Findings of the Association for Computational Linguistics: NAACL 2024, pages 2553-2575, Mexico City, Mexico, June 2024.
- Yihao Hu, Amir Gilad, Kristin Stephens-Martinez, Sudeepa Roy, and Jun Yang. "Qr-Hint: actionable hints towards correcting wrong SQL queries." In Proceedings of the 2024 ACM SIGMOD International Conference on Management of Data, Santiago, Chile, June 2024.
- Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang. "On reporting durable patterns in temporal proximity graphs." In Proceedings of the 2024 ACM Symposium on Principles of Database Systems, Santiago, Chile, June 2024.
- Pankaj Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang. "Computing data distribution from query selectivities." In Proceedings of the 2024 International Conference on Database Theory, Paestum, Italy, March 2024.
- Hanze Meng, Zhengjie Miao, Amir Gilad, Sudeepa Roy, and Jun Yang. "Characterizing and verifying queries via CInsGen." In Proceedings of the 2023 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, June 2023. Demonstration track.
- Rickard Stureborg, Bhuwan Dhingra, and Jun Yang. "Interface design for crowdsourcing hierarchical multi-label text annotations." In Proceedings of the 2023 International Conference on Human Factors in Computing Systems, Hamburg, Germany, April 2023.
- Sudeepa Roy and Jun Yang, ed. Special Issue on Widening the Impact of Data Engineering through Innovations in Education, Interfaces, and Features, IEEE Data Engineering Bulletin, September 2022. 45(3). [link]
- Xiao Hu, Stavros Sintos, Junyang Gao, Pankaj Agarwal, and Jun Yang. "Computing complex temporal join queries efficiently." In Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, USA, June 2022.
- Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj Agarwal, Debmalya Panigrahi, Sudeepa Roy, and Jun Yang. "Selectivity functions of range queries are learnable." In Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, USA, June 2022.
- Amir Gilad, Zhengjie Miao, Sudeepa Roy, and Jun Yang. "Understanding queries by conditional instances." In Proceedings of the 2022 ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, USA, June 2022.
- Yihao Hu, Zhengjie Miao, Zhiming Leong, Haechan Lim, Zachary Zheng, Sudeepa Roy, Kristin Stephens-Martinez, and Jun Yang. "I-Rex: an interactive relational query debugger for SQL." In Proceedings of the 2022 ACM Technical Symposium on Computer Science Education, Providence, Rhode Island, USA, March 2022. Demonstration track.
- Guoliang Li, Guo Yu, Jun Yang, and Ju Fan, ed. Special Topic on New Techniques of Database Systems, Journal of Software (Ruanjian Xuebao), March 2022. 33(3).
- Chengkai Li and Jun Yang, ed. Special Issue on Data Engineering Challenges in Combating Misinformation, IEEE Data Engineering Bulletin, September 2021. 44(3). [link]
- Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang. "Dynamic enumeration of similarity joins." In Proceedings of the 2021 International Colloquium on Automata, Languages, and Programming, pages 11:1-11:19, Glasgow, Scotland, July 2021.
- Junyang Gao, Yifan Xu, Pankaj Agarwal, and Jun Yang. "Efficiently answering durability prediction queries." In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data, pages 591-604, Xi'an, China, June 2021.
- Junyang Gao, Stavros Sintos, Pankaj K. Agarwal, and Jun Yang. "Durable top-k instant-stamped temporal records with user-specified scoring functions." In Proceedings of the 2021 International Conference on Data Engineering, pages 720-731, Chania, Greece, April 2021.
- Chenghong Wang, David Pujol, Yanping Zhang, Johes Bater, Matthew Lentz, Ashwin Machanavajjhala, Kartik Nayak, Lavanya Vasudevan, and Jun Yang. "Poirot: private contact summary aggregation." In Proceedings of the 2020 Privacy Preserving Machine Learning (NeurIPS Workshop), Virtual, December 2020.
- Yanping Zhang, Chenghong Wang, David Pujol, Johes Bater, Matthew Lentz, Ashwin Machanavajjhala, Kartik Nayak, Lavanya Vasudevan, and Jun Yang. "Poirot: private contact summary aggregation." In Proceedings of the 2020 ACM Conference on Embedded Networked Sensor Systems, Yokohama, Japan, November 2020. Poster track. Part of the research poster track on COVID-19 pandemic response.
- Zhengjie Miao, Tiangang Chen, Alexander Bendeck, Kevin Day, Sudeepa Roy, and Jun Yang. "I-Rex: an interactive relational query explainer for SQL." Proceedings of the VLDB Endowment, 13(12):2997-3000, August 2020. Demonstration description.
- Brett Walenz, Stavros Sintos, Sudeepa Roy, and Jun Yang. "Learning to sample: counting with complex queries." Proceedings of the VLDB Endowment, 13(3):390-402, 2019.
- Stavros Sintos, Pankaj Agarwal, and Jun Yang. "Selecting data to clean for fact checking: minimizing uncertainty vs. maximizing surprise." Proceedings of the VLDB Endowment, 12(13):2408-2421, 2019.
- Junyang Gao, Xian Li, Yifan Ethan Xu, Bunyamin Sisman, Xin Luna Dong, and Jun Yang. "Efficient knowledge graph accuracy evaluation." Proceedings of the VLDB Endowment, 12(11):1679-1691, 2019. [report]
- Naeemul Hassan, Chengkai Li, Jun Yang, and Cong Yu, ed. Special Issue on Combating Digital Misinformation and Disinformation, ACM Journal of Data and Information Quality, July 2019. 11(3). [link]
- Zhengjie Miao, Sudeepa Roy, and Jun Yang. "RATest: explaining wrong relational queries using small examples." In Proceedings of the 2019 ACM SIGMOD International Conference on Management of Data, pages 1961-1964, Amsterdam, Netherlands, June 2019. Demonstration track. [paper]
- Zhengjie Miao, Sudeepa Roy, and Jun Yang. "Explaining wrong queries using small examples." In Proceedings of the 2019 ACM SIGMOD International Conference on Management of Data, pages 503-520, Amsterdam, Netherlands, June 2019. [paper]
- Guoliang Li, Jun Yang, João Gama, Juggapong Natwichai, and Yongxin Tong, ed. Proceedings of the 2019 International Conference on Database Systems for Advanced Applications, Chiang Mai, Thailand, April 2019. Lecture Notes in Computer Science 11447. Springer. ISBN: 978-3-030-18578-7.
- Matthias Boehm, Arun Kumar, and Jun Yang. Data management in machine learning systems. Morgan & Claypool Publishers, February 2019. [paper]
- Bill Adair, Chengkai Li, Jun Yang, and Cong Yu. "Automated pop-up fact-checking: challenges and progress." In Proceedings of the 2019 Computation+Journalism Symposium, Miami, Florida, USA, February 2019. Informal publication. [paper]
- Jun Yang, Pankaj K. Agarwal, Sudeepa Roy, Brett Walenz, You Wu, Cong Yu, and Chengkai Li. "Query perturbation analysis: an adventure of database researchers in fact-checking." IEEE Data Engineering Bulletin, 41(3):28-42, 2018. Invited contribution. [paper]
- Yuhao Wen, Xiaodan Zhu, Sudeepa Roy, and Jun Yang. "Interactive summarization and exploration of top aggregate query answers." Proceedings of the VLDB Endowment, 11(13):2196-2208, 2018. [paper]
- Junyang Gao, Pankaj Agarwal, and Jun Yang. "Durable top-k queries on temporal data." Proceedings of the VLDB Endowment, 11(13):2223-2235, 2018. [paper]
- Yuhao Wen, Xiaodan Zhu, Sudeepa Roy, and Jun Yang. "QAGView: interactively summarizing high-valued aggregate query answers." In Proceedings of the 2018 ACM SIGMOD International Conference on Management of Data, pages 1709-1712, Houston, Texas, USA, June 2018. Demonstration track. [paper]
- Bill Adair, Chengkai Li, Jun Yang, and Cong Yu. "Progress toward “the holy grail”: the continued quest to automate fact-checking." In Proceedings of the 2017 Computation+Journalism Symposium, Evanston, Illinois, USA, October 2017. Informal publication.
- Brett Walenz, Sudeepa Roy, and Jun Yang. "Optimizing iceberg queries with complex joins." In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data, pages 1243-1258, Chicago, Illinois, USA, May 2017. [paper]
- Arun Kumar, Matthias Boehm, and Jun Yang. "Data management in machine learning: challenges, techniques, and systems." In Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data, pages 1717-1722, Chicago, Illinois, USA, May 2017. [paper]
- Semih Salihoglu, Wenchao Zhou, Rada Chirkova, Jun Yang, and Dan Suciu, ed. Proceedings of the 2017 ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, May 2017.
- Risi Thonangi and Jun Yang. "On log-structured merge for solid-state drives." In Proceedings of the 2017 International Conference on Data Engineering, pages 683-694, San Diego, California, USA, April 2017. [paper]
- Botong Huang and Jun Yang. "Cümülön-D: data analytics in a dynamic spot market." Proceedings of the VLDB Endowment, 10(8):865-876, April 2017. [paper]
- You Wu, Junyang Gao, Pankaj K. Agarwal, and Jun Yang. "Finding diverse, high-value representatives on a surface of answers." Proceedings of the VLDB Endowment, 10(7):793-804, March 2017. [paper]
- You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. "Computational fact checking through query perturbations." ACM Transactions on Database Systems, 42(1):4:1-4:41, March 2017. [paper]
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Top-k preferences in high dimensions." IEEE Transactions on Knowledge and Data Engineering, 28(2):311-325, 2016. Invited as a special selection from ICDE 2014. [paper]
- Brett Walenz and Jun Yang. "Perturbation analysis of database queries." Proceedings of the VLDB Endowment, 9(14):1635-1646, September 2016. [paper and report]
- Brett Walenz, Junyang Gao, Emre Sonmez, Yubo Tian, Yuhao Wen, Charles Xu, Bill Adair, and Jun Yang. "Fact checking congressional voting claims." In Proceedings of the 2016 Computation+Journalism Symposium, Stanford, California, USA, September 2016. Informal publication. [paper]
- Naeemul Hassan, Bill Adair, James T. Hamilton, Chengkai Li, Mark Tremayne, Jun Yang, and Cong Yu. "The quest to automate fact-checking." In Proceedings of the 2015 Computation+Journalism Symposium, New York City, New York, USA, October 2015. Informal publication. [paper]
- Botong Huang, Nicholas W. D. Jarrett, Shivnath Babu, Sayan Mukherjee, and Jun Yang. "Cümülön: matrix-based data analytics in the cloud with spot instances." Proceedings of the VLDB Endowment, 9(3):156-167, September 2015. [paper and report]
- You Wu, Boulos Harb, Jun Yang, and Cong Yu. "Efficient evaluation of object-centric exploration queries for visualization." Proceedings of the VLDB Endowment, 8(12):1752-1763, August 2015. [paper]
- Jun Yang, ed. Special Issue on Visionary Ideas in Data Management, ACM SIGMOD Record, June 2015. 44(2). [link]
- You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. "Toward computational fact-checking." Proceedings of the VLDB Endowment, 7(7):589-600, 2014. [paper]
- Naeemul Hassan, Afroza Sultana, You Wu, Gensheng Zhang, Chengkai Li, Jun Yang, and Cong Yu. "Data in, fact out: automated monitoring of facts by FactWatcher." Proceedings of the VLDB Endowment, 7(13), 2014. Demonstration track. Winner of the Excellent Demonstration Award. [paper]
- Brett Walenz, You Wu, Seokhyun Song, Emre Sonmez, Eric Wu, Kevin Wu, Pankaj K. Agarwal, Jun Yang, Naeemul Hassan, Afroza Sultana, Gensheng Zhang, Chengkai Li, and Cong Yu. "Finding, monitoring, and checking claims computationally based on structured data." In Proceedings of the 2014 Computation+Journalism Symposium, New York City, New York, USA, October 2014. Informal publication, with contents drawn from SIGMOD 2014 and VLDB 2014 demos. [paper]
- Bill Adair, Jun Yang, and the uclaim/icheck Team. "Turning computers into fact-checkers." American Journalism Review, October 2014. Invited contribution. [link and paper]
- Botong Huang, Nicholas W. D. Jarrett, Shivnath Babu, Sayan Mukherjee, and Jun Yang. "Cumulon: cloud-based statistical analysis from users' perspective." IEEE Data Engineering Bulletin, 37(3):77-89, September 2014. Invited contribution. [paper]
- Rada Chirkova and Jun Yang, ed. Proceedings of the 2014 International Workshop on Bringing the Value of Big Data to Users, Hangzhou, China, September 2014. [link]
- You Wu, Brett Walenz, Peggy Li, Andrew Shim, Emre Sonmez, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. "iCheck: computationally combating “lies, d—ned lies, and statistics”." In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, Utah, USA, June 2014. Demonstration track. [paper]
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Top-k preferences in high dimensions." In Proceedings of the 2014 International Conference on Data Engineering, Chicago, Illinois, USA, March 2014. Results in this paper are subsumed by those in the TKDE 2016 paper by the same authors.
- Afroza Sultana, Naeemul Hassan, Chengkai Li, Jun Yang, and Cong Yu. "Incremental discovery of prominent situational facts." In Proceedings of the 2014 International Conference on Data Engineering, Chicago, Illinois, USA, March 2014. [paper and slides]
- Risi Thonangi and Jun Yang. "Permuting data on random-access block storage." Proceedings of the VLDB Endowment, 6(9):721-732, 2013. [errata, paper, and report]
- Botong Huang, Shivnath Babu, and Jun Yang. "Cumulon: optimizing statistical data analysis in the cloud." In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York City, New York, USA, June 2013. [paper and slides]
- Yi Zhang, Kristian Lum, and Jun Yang. "Failure-aware cascaded suppression in wireless sensor networks." IEEE Transactions on Knowledge and Data Engineering, 25(5):1042-1055, May 2013. [paper and supplemental]
- Pankaj K. Agarwal, Lars Arge, Sathish Govindarajan, Jun Yang, and Ke Yi. "Efficient external memory structures for range-aggregate queries." Computational Geometry: Theory and Applications, 46(3):358-370, April 2013. [paper]
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Subscriber assignment for wide-area content-based publish/subscribe." IEEE Transactions on Knowledge and Data Engineering, 24(10):1833-1847, 2012. Invited as a special selection from ICDE 2011. [paper and supplemental]
- S. N. Lahiri, XuanLong Nguyen, Jun Yang, Zhengyuan Zhu, and P. Banerjee. "Wireless sensor networks: statistical issues and challenges." Journal of the Indian Statistical Association, 50(1–2):151-191, 2012.
- Rada Chirkova and Jun Yang. "Materialized views." Foundations and Trends in Databases, 4(4):295-405, 2012. [paper]
- Risi Thonangi, Shivnath Babu, and Jun Yang. "A practical concurrent index for solid-state drives." In Proceedings of the 2012 International Conference on Information and Knowledge Management, pages 1332-1341, Maui, Hawaii, USA, October 2012. Databases track. [paper and report]
- You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. "On “one of the few” objects." In Proceedings of the 2012 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1487-1495, Beijing, China, August 2012. [paper and report]
- Yi Zhang and Jun Yang. "Optimizing I/O for big array analytics." Proceedings of the VLDB Endowment, 5(8):764-775, June 2012. [paper]
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Processing a large number of continuous preference top-k queries." In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 397-408, Scottsdale, Arizona, USA, May 2012. [paper]
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Processing and notifying range top-k subscriptions." In Proceedings of the 2012 International Conference on Data Engineering, pages 810-821, Washington DC, USA, April 2012. [paper and report]
- Yi Zhang, Kamesh Munagala, and Jun Yang. "Storing matrices on disk: theory and practice revisited." Proceedings of the VLDB Endowment, 4(11):1075-1086, August 2011. [paper and report]
- James S. Clark, Pankaj K. Agarwal, David M. Bell, Paul G. Flikkema, Alan Gelfand, Xuanlong Nguyen, Eric Ward, and Jun Yang. "Inferential ecosystem models, from network data to prediction." Ecological Applications, 21(5):1523-1536, July 2011.
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Subscriber assignment for wide-area content-based publish/subscribe." In Proceedings of the 2011 International Conference on Data Engineering, pages 267-278, Hannover, Germany, April 2011. Results in this paper are subsumed by those in the TKDE 2012 paper by the same authors. [paper and report]
- Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. "Computational journalism: a call to arms to database researchers." In Proceedings of the 2011 Conference on Innovative Data Systems Research, Asilomar, California, USA, January 2011. Outrageous ideas and vision track. Third-place winner of the Best Outrageous Ideas and Vision Track Paper Competition sponsored by the Computing Community Consortium. [paper and slides]
- Lei Chen, Changjie Tang, Jun Yang, and Yunjun Gao, ed. Proceedings of the 2010 International Conference on Web-Age Information Management, Jiuzhaigou, Sichuan, China, July 2010. Lecture Notes in Computer Science 6184. Springer. ISBN: 978-3-642-14245-1.
- Yi Zhang, Weiping Zhang, and Jun Yang. "I/O-efficient statistical computing with RIOT." In Proceedings of the 2010 International Conference on Data Engineering, pages 1157-1160, Long Beach, California, USA, March 2010. Demonstration track. [paper and poster]
- Jun Yang, Kamesh Munagala, and Adam Silberstein. "Data aggregation in sensor networks." In Encyclopedia of Database Systems. Ling Liu and M. Tamer Özsu, ed. Springer. 2009. Invited contribution.
- Albert Yu, Pankaj K. Agarwal, and Jun Yang. "Generating wide-area content-based publish/subscribe workloads." In Proceedings of the 2009 Workshop on Networking Meets Databases, Big Sky, Montana, USA, October 2009. [paper]
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Input-sensitive scalable continuous join query processing." ACM Transactions on Database Systems, 34(3):1-41, August 2009. [paper]
- Fei Chen, Byron J. Gao, AnHai Doan, Jun Yang, and Raghu Ramakrishnan. "Optimizing complex extraction programs over evolving text data." In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pages 321-334, Providence, Rhode Island, USA, June 2009. [paper]
- Risi Thonangi, Hao He, AnHai Doan, Haixun Wang, and Jun Yang. "Weighted proximity best-joins for information retrieval." In Proceedings of the 2009 International Conference on Data Engineering, pages 234-245, Shanghai, China, March 2009. [paper]
- Yi Zhang, Herodotos Herodotou, and Jun Yang. "RIOT: I/O-efficient numerical computing without SQL." In Proceedings of the 2009 Conference on Innovative Data Systems Research, Asilomar, California, USA, January 2009. [paper and slides]
- Badrish Chandramouli and Jun Yang. "End-to-end support for joins in large-scale publish/subscribe systems." In Proceedings of the 2008 International Conference on Very Large Data Bases, pages 434-450, Auckland, New Zealand, August 2008. Infrastructure track. [paper]
- Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, and Ying Zheng. "ProSem: scalable wide-area publish/subscribe." In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1315-1318, Vancouver, Canada, June 2008. Demonstration track. Acceptance rate: 31.9 percent. [paper]
- Junyi Xie, Jun Yang, Yuguo Chen, Haixun Wang, and Philip S. Yu. "A sampling-based approach to information recovery." In Proceedings of the 2008 International Conference on Data Engineering, pages 476-485, Cancun, Mexico, April 2008. Short presentation track. Acceptance rate: 19.2 percent of 715. Full paper. [paper]
- Fei Chen, AnHai Doan, Jun Yang, and Raghu Ramakrishnan. "Efficient information extraction over evolving text data." In Proceedings of the 2008 International Conference on Data Engineering, pages 943-952, Cancun, Mexico, April 2008. Acceptance rate: 12.1 percent of 715. [paper]
- Magdalena Balazinska, Amol Deshpande, Alexandros Labrinidis, Qiong Luo, Samuel Madden, and Jun Yang. "Report on the fourth international workshop on data management for sensor networks (DMSN 2007)." ACM SIGMOD Record, 36(4):53-55, 2007.
- Adam Silberstein, Gavino Puggioni, Alan E. Gelfand, Kamesh Munagala, and Jun Yang. "Suppression and failures in sensor data: a Bayesian approach." In Proceedings of the 2007 International Conference on Very Large Data Bases, pages 842-853, Vienna, Austria, September 2007. Infrastructure track. Acceptance rate: 45 out of 275. [paper]
- Badrish Chandramouli, Jeff M. Phillips, and Jun Yang. "Value-based notification conditions in large-scale publish/subscribe systems." In Proceedings of the 2007 International Conference on Very Large Data Bases, pages 878-889, Vienna, Austria, September 2007. Infrastructure track. Acceptance rate: 45 out of 275. [paper]
- Magdalena Balazinska, Amol Deshpande, Qiong Luo, and Jun Yang, ed. Proceedings of the 2007 International Workshop on Data Management for Sensor Networks, Vienna, Austria, September 2007.
- Hao He, Haixun Wang, Jun Yang, and Philip S. Yu. "BLINKS: ranked keyword searches on graphs." In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 305-316, Beijing, China, June 2007. Acceptance rate: 70 out of 480. [paper and report]
- Badrish Chandramouli, Christopher N. Bond, Shivnath Babu, and Jun Yang. "Query suspend and resume." In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 557-568, Beijing, China, June 2007. Acceptance rate: 70 out of 480. [paper and report]
- Adam Silberstein and Jun Yang. "Many-to-many aggregation for sensor networks." In Proceedings of the 2007 International Conference on Data Engineering, pages 986-995, Istanbul, Turkey, April 2007. Acceptance rate: 122 out of 659. [paper and report]
- Badrish Chandramouli, Christopher Bond, Shivnath Babu, and Jun Yang. "On suspending and resuming dataflows." In Proceedings of the 2007 International Conference on Data Engineering, pages 1289-1291, Istanbul, Turkey, April 2007. Poster track. Acceptance rate: 60(+122) out of 659. Results in this paper are subsumed by those in the SIGMOD 2007 paper by the same authors.
- Adam Silberstein, Gregory Filpus, Kamesh Munagala, and Jun Yang. "Data-driven processing in sensor networks." In Proceedings of the 2007 Conference on Innovative Data Systems Research, pages 10-21, Asilomar, California, USA, January 2007. Acceptance rate: 34 out of 98. [paper]
- Junyi Xie and Jun Yang. "A survey of join processing in data streams." In Data Streams: Models and Algorithms. Charu C. Aggarwal, ed. Springer. November 2006. Invited contribution. [paper]
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Scalable continuous query processing by tracking hotspots." In Proceedings of the 2006 International Conference on Very Large Data Bases, pages 31-42, Seoul, Korea, September 2006. Core database track. Acceptance rate: 46 out of 334. Results in this paper are subsumed by those in the 2009 TODS paper by the same authors. [paper and report]
- Adam Silberstein, Kamesh Munagala, and Jun Yang. "Energy-efficient monitoring of extreme values in sensor networks." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 169-180, Chicago, Illinois, USA, June 2006. Acceptance rate: 58 out of 446. [paper]
- Adam Silberstein, Rebecca Braynard, and Jun Yang. "Constraint chaining: on energy-efficient continuous monitoring in sensor networks." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 157-168, Chicago, Illinois, USA, June 2006. Acceptance rate: 58 out of 446. [paper]
- Badrish Chandramouli, Junyi Xie, and Jun Yang. "On the database/network interface in large-scale publish/subscribe systems." In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 587-598, Chicago, Illinois, USA, June 2006. Acceptance rate: 58 out of 446. [paper and report]
- Paul G. Flikkema, Pankaj K. Agarwal, James S. Clark, Carla Schlatter Ellis, Alan Gelfand, Kamesh Munagala, and Jun Yang. "Model-driven dynamic control of embedded wireless sensor networks." In Proceedings of the 2006 International Conference on Computational Science, pages 409-416, Reading, United Kingdom, May 2006.
- Haixun Wang, Hao He, Jun Yang, Philip S. Yu, and Jeffrey Xu Yu. "Dual labeling: answering graph reachability queries in constant time." In Proceedings of the 2006 International Conference on Data Engineering, Atlanta, Georgia, USA, April 2006. Acceptance rate: 89 out of 456. [paper]
- Adam Silberstein, Rebecca Braynard, and Jun Yang. "Energy-efficient continuous isoline queries in sensor networks." In Proceedings of the 2006 International Conference on Data Engineering, Atlanta, Georgia, USA, April 2006. Poster track. Results in this paper are subsumed by those in the SIGMOD 2006 paper by the same authors. [paper]
- Adam Silberstein, Rebecca Braynard, Carla Ellis, Kamesh Munagala, and Jun Yang. "A sampling-based approach to optimizing top-k queries in sensor networks." In Proceedings of the 2006 International Conference on Data Engineering, Atlanta, Georgia, USA, April 2006. Acceptance rate: 89 out of 456. [paper]
- Badrish Chandramouli, Jun Yang, and Amin Vahdat. "Distributed network querying with bounded approximate caching." In Proceedings of the 2006 International Conference on Database Systems for Advanced Applications, pages 374-388, Singapore, April 2006. Acceptance rate: 24.5 percent. [paper and report]
- Pankaj K. Agarwal, Junyi Xie, Jun Yang, and Hai Yu. "Monitoring continuous band-join queries over dynamic data." In Proceedings of the 2005 International Symposium on Algorithms and Computation, pages 349-359, Sanya, Hainan, China, December 2005. [paper]
- Hao He, Haixun Wang, Jun Yang, and Philip S. Yu. "Compact reachability labeling for graph-structured data." In Proceedings of the 2005 International Conference on Information and Knowledge Management, pages 594-601, Bremen, Germany, November 2005. Acceptance rate: 76 out of 425. [paper and report]
- Kamesh Munagala, Jun Yang, and Hai Yu. "Online view maintenance under a response-time constraint." In Proceedings of the 2005 European Symposium on Algorithms, pages 677-688, Palma de Mallorca, Spain, October 2005. [paper]
- Wenfei Fan, Zhaohui Wu, and Jun Yang, ed. Proceedings of the 2005 International Conference on Web-Age Information Management, Hangzhou, China, October 2005. Lecture Notes in Computer Science 3739. Springer. ISBN: 3-540-29227-6.
- Junyi Xie, Jun Yang, and Yuguo Chen. "On joining and caching stochastic streams." In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pages 359-370, Baltimore, Maryland, USA, June 2005. Acceptance rate: 65 out of 431. [paper and report]
- Adam Silberstein, Hao He, Ke Yi, and Jun Yang. "BOXes: efficient maintenance of order-based labeling for dynamic XML data." In Proceedings of the 2005 International Conference on Data Engineering, pages 285-296, Tokyo, Japan, April 2005. Acceptance rate: 67 out of 521. [paper and report]
- Hao He, Junyi Xie, Jun Yang, and Hai Yu. "Asymmetric batch incremental view maintenance." In Proceedings of the 2005 International Conference on Data Engineering, pages 106-117, Tokyo, Japan, April 2005. Acceptance rate: 67 out of 521. [paper]
- Junfei Geng and Jun Yang. "AutoBib: automatic extraction of bibliographic information on the Web." In Proceedings of the 2004 International Database Engineering and Applications Symposium, pages 193-204, Coimbra, Portugal, July 2004. [paper]
- Ke Yi, Hao He, Ioana Stanoi, and Jun Yang. "Incremental maintenance of XML structural indexes." In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 491-502, Paris, France, June 2004. Acceptance rate: 69 out of 431. [paper]
- Adam Silberstein and Jun Yang. "NeXSort: sorting XML in external memory." In Proceedings of the 2004 International Conference on Data Engineering, pages 695-706, Boston, Massachusetts, USA, April 2004. Acceptance rate: 63 out of 441. [paper and report]
- Hao He and Jun Yang. "Multiresolution indexing of XML for frequent queries." In Proceedings of the 2004 International Conference on Data Engineering, pages 683-694, Boston, Massachusetts, USA, April 2004. Acceptance rate: 63 out of 441. [paper and report]
- Jun Yang and Jennifer Widom. "Incremental computation and maintenance of temporal aggregates." The VLDB Journal, 12(3):262-283, 2003. [paper]
- Zhiyuan Chen, Li Chen, Jian Pei, Yufei Tao, Haixun Wang, Wei Wang, Jiong Yang, Jun Yang, and Donghui Zhang. "Recent progress on selected topics in database research: a report by nine young Chinese researchers working in the United States." Journal of Computer Science and Technology, 18(5):538-552, September 2003.
- Pankaj K. Agarwal, Lars Arge, Jun Yang, and Ke Yi. "I/O-efficient structures for orthogonal range-max and stabbing-max queries." In Proceedings of the 2003 European Symposium on Algorithms, pages 7-18, Budapest, Hungary, September 2003.
- Xiao Huang, Qiang Xue, and Jun Yang. "TupleRank and implicit relationship discovery in relational databases." In Proceedings of the 2003 International Conference on Web-Age Information Management, pages 445-457, Chengdu, China, August 2003. Acceptance rate: 30 out of 258. [paper and report]
- Ke Yi, Hai Yu, Jun Yang, Gangqiang Xia, and Yuguo Chen. "Efficient maintenance of materialized top-k views." In Proceedings of the 2003 International Conference on Data Engineering, pages 189-200, Bangalore, India, March 2003. Acceptance rate: 51 out of 378. [paper and report]
- Jun Yang. "Temporal data warehousing." Ph.D. Dissertation, Stanford University, August 2001.
- Jun Yang and Jennifer Widom. "Incremental computation and maintenance of temporal aggregates." In Proceedings of the 2001 International Conference on Data Engineering, pages 51-60, Heidelberg, Germany, April 2001. Acceptance rate: 14 percent. Results in this paper are subsumed by those in the 2003
VLDB Journal paper by the same authors.
- Wilburt Juan Labio, Jun Yang, Yingwei Cui, Hector Garcia-Molina, and Jennifer Widom. "Performance issues in incremental warehouse maintenance." In Proceedings of the 2000 International Conference on Very Large Data Bases, pages 461-472, Cairo, Egypt, September 2000. Acceptance rate: 53 out of 351.
- Jun Yang, Huacheng C. Ying, and Jennifer Widom. "TIP: a temporal extension to Informix." In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, page 596, Dallas, Texas, USA, May 2000. Demonstration track.
- Jun Yang, Huacheng C. Ying, and Jennifer Widom. "TIP: a temporal extension to Informix." In Proceedings of the 2000 International Conference on Extending Database Technology, Konstanz, Germany, March 2000. Demonstration track. An improved version was shown at SIGMOD 2000.
- Jun Yang and Jennifer Widom. "Temporal view self-maintenance." In Proceedings of the 2000 International Conference on Extending Database Technology, pages 395-412, Konstanz, Germany, March 2000. Acceptance rate: 16.7 percent.
- Hector Garcia-Molina, Wilburt Juan Labio, and Jun Yang. "Expiring data in a warehouse." In Proceedings of the 1998 International Conference on Very Large Data Bases, pages 500-511, New York City, New York, USA, August 1998. Acceptance rate: 16 percent.
- Jun Yang and Jennifer Widom. "Maintaining temporal views over non-temporal information sources for data warehousing." In Proceedings of the 1998 International Conference on Extending Database Technology, pages 389-403, Valencia, Spain, March 1998. Acceptance rate: 32 out of 191.
- Laura M. Haas, Donald Kossmann, Edward L. Wimmers, and Jun Yang. "Optimizing queries across diverse data sources." In Proceedings of the 1997 International Conference on Very Large Data Bases, pages 276-285, Athens, Greece, August 1997. Acceptance rate: 16 percent.
- Laura M. Haas, Donald Kossmann, Edward L. Wimmers, and Jun Yang. "An optimizer for heterogeneous systems with non-standard data and search capabilities." IEEE Data Engineering Bulletin, 19(4):37-44, December 1996.
- Steve G. Steinberg, Jun Yang, and Katherine A. Yelick. "Performance modeling and composition: a case study in cell simulation." In Proceedings of the 1996 International Parallel Processing Symposium, pages 68-74, Honolulu, Hawaii, USA, April 1996. Acceptance rate: 35 percent.
Technical reports:
- Kevin A. Walsh, Amin Vahdat, and Jun Yang. "Enabling wide-area replication of database services with continuous consistency." Technical Report, Duke University, February 2002. [report]
- Jun Yang, Jennifer Widom, and Paul Brown. "Implementing parameterized range types in an extensible DBMS." Technical Report, Stanford University, November 2000.
Funding
Current funding:
- Principal investigator. III: Medium: Responsive Optimization for Algorithmic Decision Systems. NSF IIS. July 2024 - June 2027. With Pankaj K. Agarwal and Kamesh Munagala.
- Co-PI. III: Medium: Ask the Experts: Generating Question-Answer
Pairs for Addressing Information Deficits about Vaccines. NSF IIS. September 2022 - August 2026. With Bhuwan Dhingra (PI) and Lavanya Vasudevan.
- Principal investigator. Using Natural Language Processing and Other Tools to Address Misinformation. Google. October 2021 - present. With Bill Adair, Bhuwan Dhingra, Lavanya Vasudevan, and others.
Past funding:
- Principal investigator. III: Small: HNRQ: Helping Novices Learn and Debug Relational Queries. NSF IIS. September 2020 - August 2023. With Sudeepa Roy and Kristin Stephens-Martinez.
- Principal investigator. III: Small: Durability Queries in Databases. NSF IIS. September 2018 - August 2022. With Pankaj K. Agarwal.
- Co-PI. RAPID: Poirot: From Contact Tracing to Private Exposure Detection. NSF CNS Secure and Trustworthy Computing. May 2020 - April 2022. With Kartik Nayak (PI), Ashwin Machanavajjhala, and Lavanya Vasudevan.
- Co-PI. Convergence Accelerator Phase I (RAISE): Credible Open Knowledge Network. NSF OIA. September 2019 - May 2022. With Chengkai Li (Lead PI; University of Texas at Arlington), Ashwin Machanavajjhala, and others.
- Principal investigator. Google Cloud Platform Education Grant (CompSci 316, Fall 2021). Google. August 2021 - December 2021.
- Principal investigator. III: Small: Collaborative Research: Towards End-to-End Computer-Assisted Fact-Checking. NSF IIS. September 2017 - August 2021. With Chengkai Li (Lead PI; University of Texas at Arlington) and Mark Tremayne (University of Texas at Arlington).
- Co-investigator. Duke Tech & Check Cooperative. Knight Foundation, Facebook, and Newmark Foundation. September 2017 - August 2020. With Bill Adair (PI) and others.
- Principal investigator. Data and Technology for Fact-checking. Bass Connections, Information, Society and Culture Track, Duke University. June 2018 - May 2020. With Bill Adair and Pankaj K. Agarwal.
- Principal investigator. Google Cloud Platform Education Grant (CompSci 316, Fall 2019). Google. August 2019 - December 2019.
- Principal investigator. III: Medium: Collaborative Research: Perturbation Analysis of Database Queries. NSF IIS. September 2014 - August 2019. With Pankaj K. Agarwal, James T. Hamilton (Stanford), and Chengkai Li (University of Texas at Arlington).
- Principal investigator. Supplemental Award for III: Medium: Collaborative Research: Perturbation Analysis
of Database Queries. NSF REU Program. March 2015.
- Principal investigator. Google Cloud Platform Education Grant (CompSci 316, Fall 2018). Google. August 2018 - December 2018.
- Principal investigator. III: Small: DBMS+: Management System for the Next-Generation Database. NSF IIS. September 2016 - August 2018. With Shivnath Babu (former PI).
- Principal investigator. Google Cloud Platform Education Grant (CompSci 316, Fall 2017). Google. August 2017 - December 2017.
- Co-PI. III: Student Travel Fellowships for SIGMOD 2017. NSF IIS. April 2017 - January 2018. With Sudeepa Roy (PI) and Ashwin Machanavajjhala.
- Principal investigator. III: Small: Cumulon: Easy and Efficient Statistical Computing in the Cloud. NSF IIS. September 2013 - August 2017. With Shivnath Babu, Sayan Mukherjee, and Michael Ward.
- Principal investigator. Google Cloud Platform Education Grant (CompSci 316, Fall 2016). Google. August 2016 - December 2016.
- Principal investigator. AWS in Education Grant (CompSci 316, Fall 2015). Amazon. August 2015 - August 2016.
- Co-investigator. ClaimBuster. Knight Foundation Prototype Fund. November 2015 - May 2016. With Chengkai Li (University of Texas at Arlington) and others.
- Principal investigator. AWS in Education Grant (CompSci 216, Spring 2015). Amazon. January 2015 - January 2016. With Ashwin Machanavajjhala.
- Principal investigator. Supporting Data-Intensive Applications with Google Cloud. Google Research Cloud Credits Award. September 2014 - September 2015.
- Principal investigator. Towards Computational Fact-Checking and Lead-Finding. Google Faculty Research Award. June 2014 - August 2015.
- Principal investigator. AWS in Education Grant (CompSci 316, Fall 2014). Amazon. August 2014 - August 2015.
- Principal investigator. III-Small: RIOT: Statistical Computing with Efficient, Transparent I/O. NSF IIS Division. September 2009 - August 2014.
- Principal investigator. Supplemental Award for III-Small: RIOT: Statistical Computing with Efficient, Transparent
I/O. NSF REU Program. June 2010.
- Principal investigator. Computational Journalism: Bringing Together Social and Computer Scientists to Help
Journalism. Trinity College Bi-Annual Interdepartmental Collaboration Mini-Grants Program, Duke
University. July 2013 - June 2014. With Pankaj K. Agarwal and James T. Hamilton.
- Principal investigator. AWS in Education Grant (CompSci 290.01, Spring 2014). Amazon. December 2013 - December 2014. With Ashwin Machanavajjhala.
- Principal investigator. Provisioning and Optimization for Statistical Workloads in a Cloud. Amazon Web Services. July 2012 - July 2014. With Shivnath Babu.
- Principal investigator. Windows Azure Educator Grant (CompSci 316, Fall 2013). Microsoft. August 2013 - January 2014.
- Principal investigator. RIOT: Transparent Scalability for Statistical Analysis of Massive Datasets. HP Labs Innovation Research Program. August 2010 - July 2011.
- Investigator. Modeling Immunity for Biodefense. BAA-NIAID-DAIT-NIHAI2009074. September 2010 - August 2015. With Thomas B. Kepler and others.
- Principal investigator. III-COR: Scalable Publish/Subscribe: Unifying Data Processing and Dissemination. NSF IIS Division. September 2007 - August 2011. With Pankaj K. Agarwal.
- Principal investigator. Supplemental Award for III-COR: Scalable Publish/Subscribe: Unifying Data Processing
and Dissemination. NSF REU Program. June 2009.
- Co-PI. Doctoral Program in Management and Analysis of Large Data Acquired from Sensors. Department of Education GAANN Program. May 2007. With Pankaj K. Agarwal and others.
- Co-PI. Integration of IBM Management Software with Campus Blade Clusters in Support of Duke
Academic Infrastructure. IBM Shared University Research (SUR) Program. June 2006. With Richard Lucic and others.
- Co-PI. COLLABORATIVE RESEARCH: DDDAS-TMRP: Dynamic Sensor Networks - Enabling the Measurement,
Modeling, and Prediction of Biophysical Change in a Landscape. NSF CNS DDDAS Program. January 2006 - December 2011. With James S. Clark and others.
- Co-PI. Supplemental Award for COLLABORATIVE RESEARCH: DDDAS-TMRP: Dynamic Sensor Networks - Enabling
the Measurement, Modeling, and Prediction of Biophysical Change in a Landscape. NSF REU Program. July 2006. With James S. Clark and others.
- Investigator. Multiscale Integrative Immunology for Adjuvant Development. NIH-NIAID-DAIT-BAA-05-10. September 2005 - August 2010. With Thomas B. Kepler and others.
- Principal investigator. CAREER: Techniques and Applications of Derived Data Maintenance. NSF CAREER Program. September 2003 - August 2008.
- Principal investigator. Supplemental Award for CAREER: Techniques and Applications of Derived Data Maintenance. NSF REU Program. June 2006.
Honors and Awards
- Bass Society of Fellows, Duke University, July 2020 - June 2025.
- Google Faculty Research Awards, June 2014 and August 2021.
- ACM Distinguished Member, October 2019.
- Distinguished Reviewer Award, Proceedings of the VLDB Endowment (PVLDB), August 2019.
- Excellent Demonstration Award at
the 2014 International Conference on Very Large Data Bases (VLDB 2014), September 2014.
- David and Janet Vaughan Brooks Teaching Award, Trinity
College of Arts and Sciences, Duke University, April 2013.
- Third-place winner of the Best Outrageous Ideas and
Vision Track Paper Competition at the 2011 Conference on Innovative Data Systems Research (CIDR 2011),
sponsored by the Computing Community Consortium, January 2011.
- IBM Faculty Award, January 2006.
- Recognized for excellence in teaching by the Teaching
Excellence Committee, Department of Computer Science, Duke
University, January 2004.
- NSF CAREER Award, September 2003.
- Highest Achievement Award, Computer Science Division,
UC Berkeley, May 1995.
- UC Berkeley Chancellor's Scholarship, 1993 - 1995.
- Dean's Honor List For Top 4% Students, UC
Berkeley, February 1994, July 1994, and February 1995.
- Chinese-American Institute of Engineers And Scientists
Scholarship, June 1994.
- Chuck Miller Scholarship, February 1994.
- National Individual Champion of Mathematics Competition
of American Math Association of Two-Year Colleges, 1991 - 1992 and 1992 - 1993.
- Outstanding Student's Honor, Delta College Academic
Senate, April 1993.
- California Math Council of Community Colleges
Scholarship, 1993 and 1994.
- Delta College Foundation Scholarship, Memorial
Scholarship, Academic Excellence Scholarship, etc., 1993.
- First Prizes, National Math Competition of Chinese High
Schools, 1989 and 1990.
- First Prizes, Computer Programming Contest of Chengdu,
China, 1988, 1989, and 1990.
External Presentations and Demonstrations
- "What Teaching Databases Taught Me about Researching Databases," invited talk at the Northwest Database Society Seminar
Series, University of Washington, Seattle (04/11/2024), invited talk
at the DATA Lab Seminar Series, Northeastern University
(05/10/2024), and keynote talk at the 3rd International Workshop on Data
Systems Education (DataEd) at SIGMOD 2024, Santiago, Chile
(06/09/2024), April 2024 - June 2024.
- Panel on Law and AI in China and the US, Duke
University Law School, March 18, 2024.
- Panel on Good Reviewing Habits, VLDB 2023 PhD Workshop,
Vancouver, Canada, August 28, 2023.
- "Adventure of a Computer Scientist in Fact-Checking," distinguished lecture, School of Computer Science,
Georgia Institute of Technology, January 17, 2023.
- Round table on Systems for ML at the 2021 International Conference on Very Large Data Bases (VLDB 2021), August 2021.
- "Adventure of a Computer Scientist in Fact-Checking," seminar for Duke University Scholars, April 2021.
- "From Answering Questions to Questioning Answers," talk at the Natural and Applied Sciences Research
Colloquia Series, Duke Kunshan University, November 2020.
- "Squash, Gardener and the Future of AI in Political Fact-Checking," talk with Bill Adair at the +DS (+DataScience) Virtual
Learning Experiences Series, Duke University, October 2020.
- "Computational Fact Checking through Query Perturbations," talk at the Workshop on AI and Information Disorder,
Global Forum on AI for Humanity, Paris, France, October 8, 2019.
- "Adventure of a Computer Scientist in Fact-Checking," talks at the School of Journalism and Communication,
Renmin University (11/21/2018), and the School of Communication,
Hong Kong Baptist University (5/7/2019), November 2018 - May 2019.
- "From Answering Questions to Questioning Answers," talks at the School of Information, Renmin University
(11/21/2018), Department of Computer Science, Hong Kong Baptist
University (5/8/2019), and Big Data Institute and Department of
Computer Science and Engineering, Hong Kong University of Science
and Technology (5/9/2019), November 2018 - May 2019.
- "Automated Pop-Up Fact-Checking: Challenges and Progress," presentation and system demonstration at
the 2019 Computation+Journalism Symposium (COMPJ 2019), February 1, 2019 - February 2, 2019.
- "Do Numbers Lie," talk at TEDxDuke 2018, March 4, 2018.
- "Data Analytics in a Public Cloud: A User-Centric Perspective," presentation at the Huawei Innovation Research Program Exploratory collocated with
the 2017 International Conference on Very Large Data Bases (VLDB 2017), August 2017.
- "Cumulon-D: Data Analytics in a Dynamic Spot Market," presentation at the 2017 International Conference on Very Large Data Bases (VLDB 2017), August 2017.
- "Data Management in Machine Learning: Challenges, Techniques, and Systems," tutorial at the 2017 ACM SIGMOD International Conference on Management of Data (SIGMOD 2017) with Arun Kumar and Matthias Boehm, May 2017.
- "Do Numbers Lie," Science Cafe presentation (with Brett Walenz) at North Carolina Museum of Natural
Sciences, October 27, 2016.
- "Cumulon: Matrix-based Data Analytics in the Cloud with Spot Instances," presentation at the 2016 International Conference on Very Large Data Bases (VLDB 2016), September 2016.
- "From Answering Questions to Questioning Answers (and Questions): Toward Computational
Fact-Checking," talk at Duke Computer Science Summer Undergraduate Researchers Lunch
Series, July 2016.
- "Cumulon: Simplifying Matrix-Based Data Analysis in the Cloud," talks at University of Texas at Arlington, University of Texas at Dallas, University
of North Texas, and Wuhan University, April 2016.
- "From Answering Questions to Questioning Answers (and Questions): Toward Computational
Fact-Checking," talk at Tsinghua University, June 2015.
- "Computational Journalism and Big Data," presentation at the Workshop on Journalism
and Public Policy for Nanjing Media Professionals, Media Fellows Program, Sanford
School of Public Policy, Duke University, December 2014.
- "Can Technology Change Fact-Checking?" panel at the American Press Institute "Truth
in Politics 2014" Summit, December 2014.
- "Thoughts on TAR and Recent Computing Advances," presentation at the Duke Law School Master of Judicial Studies Program, June 2014.
- "From Answering Questions to Questioning Answers (and Questions): Toward Computational
Fact-Checking," talk at MIT, Big Data Initiative, May 2014.
- "Big and Useful: What's in the Data for Me?" panel at the 2013 International Conference on Very Large Data Bases (VLDB 2013), August 2013.
- "Big Data: Not Just about the Size," presentation at the Forum of Future Data, Wuyishan, China, July 2012.
- "Problems in Computational Journalism," presentation at HP Labs, Beijing, China, June 2012.
- "Fun with Arrays and Matrices in RIOT," informal talk at Stanford InfoLab lunch, August 2011.
- "Computational Journalism: A Call to Arms to Database Researchers," presentation at the 2011 Conference on Innovative Data Systems Research (CIDR 2011), January 2011.
- "Scalable Continuous Query Processing and Result Dissemination," seminar at HP Labs, Beijing, China, August 2010.
- "Data-Driven Processing in Sensor Networks," seminar at Stanford University, January 2009.
- "A Sampling-Based Approach to Information Recovery," presentation at the 2008 Annual Meeting of the Institute for Operations Research and the Management
Sciences (INFORMS 2008), October 2008.
- "Thoughts on Data Sharing: A Database Researcher's Perspective," presentation at the Primate Life History Working Group Meeting, NESCent (National
Evolutionary Synthesis Center), August 2007.
- "Query Suspend and Resume," presentation at the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD 2007), June 2007.
- "Data-Driven Processing in Sensor Networks," seminars at University of Pennsylvania, University of Waterloo, and New England Database
Society, April 2007 - October 2007.
- "Scalable Continuous Query Processing and Result Dissemination," seminars at IBM T. J. Watson Research Center, University of Maryland at College Park,
University of Pittsburgh/Carnegie Mellon University Joint Database Seminar, Brown
University, University of Illinois at Urbana-Champaign, and University of California
at Berkeley, February 2006 - December 2006.
- "Continuous Query Processing over Networked Data," presentation at IBM Research Triangle Park University Day, October 2006.
- Panel discussion at SIGMOD '06 Life after Graduation Symposium, June 2006.
- "Scalable Continuous Query Processing and Result Dissemination," talk at the 2006 Southeast Workshop on Data and Information Management (SEWDIM 2006), March 2006.
- "Querying Networked Data," presentation at IBM Research Triangle Park University Day, October 2005.
- "An Overview of Database Research at Duke," presentation at inDuke Meeting, Duke University, May 2005.
- "Caching for Network Querying," presentation at SIGMOD '05 Program Committee Workshop, Stanford, California, February 2005.
- "Layers and Boxes: Efficient and Maintainable Indexes for XML," seminar at IBM T. J. Watson Research Center, July 2004.
- "AutoBib: Automatic Extraction of Bibliographic Information on the Web," presentation at the 2004 International Database Engineering and Applications Symposium (IDEAS 2004).
- "Post-Web-Age Information Management," panel discussion at the 2003 International Conference on Web-Age Information Management (WAIM 2003).
- "TupleRank and Implicit Relationship Discovery in Databases," presentation at the 2003 International Conference on Web-Age Information Management (WAIM 2003).
- "Problems in Database View Maintenance and Web Data Extraction," seminar at University of North Carolina at Greensboro, April 2003.
- "Efficient Maintenance of Materialized Top-k Views," presentation at the 2003 International Conference on Data Engineering (ICDE 2003).
- "Incremental Computation and Maintenance of Temporal Aggregates," presentation at the 2001 International Conference on Data Engineering (ICDE 2001).
- "Query Processing in Kidar," guest lecture for a course on database system
implementation at Stanford University, Stanford,
California, November 2000.
- "Performance Issues in Incremental Warehouse Maintenance," presentation at the 2000 International Conference on Very Large Data Bases (VLDB 2000).
- "TIP: A Temporal Extension to Informix," system demonstration at the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD 2000).
- "Temporal Data Warehousing," colloquia at
Brown University, Cornell University, Duke University, Harvard
University, Santa Clara University, State University of New York at
Stony Brook, University of California at Santa Barbara, University of
California at Santa Cruz, University of Southern California, Yale
University, and IBM Almaden Research Center, February 2000 - May 2000.
- "TIP: A Temporal Extension to Informix," presentation and system demonstration at
Stanford Database Workshop, Stanford, California, March 2000.
- "TIP: A Temporal Extension to Informix," presentation and system demonstration at
Informix Corporation, Oakland, California, March 2000.
- "Temporal View Self-Maintenance," presentation at the 2000 International Conference on Extending Database Technology (EDBT 2000).
- "TIP: A Temporal Extension to Informix," system demonstration at the 2000 International Conference on Extending Database Technology (EDBT 2000).
- "Maintaining Temporal Views Over Non-Temporal Information Sources For Data Warehousing," presentation at the 1998 International Conference on Extending Database Technology (EDBT 1998).
- "Performance Modeling and Composition: A Case Study in Cell Simulation," presentation at the 1996 International Parallel Processing Symposium (IPPS 1996).
Teaching
- COMPSCI 216 (formerly CPS 290.01), Duke University: Everything Data. Spring 2014 and Spring 2015.
- COMPSCI 316 (formerly CPS 116), Duke University: Introduction to Database Systems. Fall 2002, Fall 2003, Fall 2004, Fall 2005, Fall 2006, Fall 2007, Fall 2008, Fall 2009, Fall 2011, Fall 2012, Fall 2013, Fall 2014, Fall 2015, Fall 2016, Fall 2017, Fall 2018, Fall 2019, Fall 2021, and Fall 2023.
- COMPSCI 516 (formerly CPS 216), Duke University: Advanced Database Systems. Fall 2001, Spring 2003, Spring 2004, Spring 2005, and Spring 2024.
- CompSci 590.1, Duke University: Data Cleaning and Integration. Spring 2017.
- CPS 296.1, Duke University: Project in Computational Journalism. Spring 2012.
- CPS 296.1, Duke University: Database and Programming Languages: Crossing the Chasm. Spring 2010.
- CPS 296.3, Duke University: Information Management and Mining. Spring 2009.
- CPS 399.28, Duke University: Research Seminar and Project in Databases. Spring 2008.
- CPS 296.4, Statistical and Applied Mathematical Sciences Institute, cross-listed at Duke, North
Carolina State, and UNC Chapel Hill: Sensor Networks for Environmental Monitoring. Fall 2007.
- CPS 296.1, Duke University: Sensor Data Processing. Spring 2007.
- CPS 296.1, Duke University: Topics in Database Systems. Spring 2002.
- CPS 300, Duke University: Introduction to Graduate Study. Fall 2008, Fall 2009, Fall 2010, and Fall 2011.
- CS 145, Stanford University: Introduction to Databases. Spring 1999.
Student Advising
Former postdoctoral advisee(s):
- Amir Gilad. Co-advised with Ashwin Machanavajjhala and Sudeepa Roy. First
employment: Assistant Professor at the Hebrew University of Jerusalem.
- Xiao Hu. Co-advised with Pankaj K. Agarwal. First employment:
Assistant Professor at the University of Waterloo.
Current Ph.D. student(s):
- Yihao Hu.
- Ph.D. preliminary exam: Toward Efficient Debugging of SQL Semantics. Spring 2023.
- Ph.D. research initiation project: Generating Hints for Debugging Wrong Queries. 2022.
- Yuxi Liu. Co-advised with Sudeepa Roy.
- Ph.D. research initiation project: Strategies for Updating Selectivity Estimators. 2023.
- Rickard Stureborg. Co-advised with Bhuwan Dhingra.
- Ph.D. preliminary exam: Repurposing Human-Centered Resources to Improve Large Language Models. Spring 2023.
- Ph.D. research initiation project: A Taxonomy for Vaccine Concerns. 2022.
- Haibo Xiu. Co-advised with Sudeepa Roy.
- Ph.D. research initiation project: Robust Query Optimization by Understanding the Uncertainty of Selectivity Estimation. 2023.
Graduated Ph.D. students:
- Junyang Gao. First employment: Google.
- Ph.D. dissertation defense: Durability Queries on Temporal Data. June 2020.
- Ph.D. preliminary exam: Durability Queries on Temporal Data. Fall 2018.
- Ph.D. research initiation project: Durable Claims from Structured Data. 2016.
- Brett Walenz. First employment: Google.
- Ph.D. dissertation defense: Perturbation Analysis of Database Queries. May 2019.
- Ph.D. preliminary exam: Perturbation Analysis of Database Queries. Summer 2016.
- Ph.D. research initiation project: Perturbation Analysis of SQL Queries. 2014.
- Mayuresh Kunjir. Co-advised with Shivnath Babu. First employment: Qatar Computing Research Institute.
- Ph.D. dissertation defense: Automating Memory Management in Data Analytics. March 2019.
- Ph.D. preliminary exam: Managing Heterogeneity in Multi-Tenant Data-Parallel Clusters. Spring 2015. (Served as committee member, not as primary advisor.)
- Ph.D. research initiation project: Fair Cache Allocation for Multi-tenant Data-Parallel Workloads. 2013. (Served as committee member, not as primary advisor.)
- Botong Huang. Co-advised with Shivnath Babu. First employment: Microsoft.
- Ph.D. dissertation defense: Cumulon: Simplified Matrix-Based Data Analytics in the Cloud. February 2016.
- Ph.D. preliminary exam: Cumulon: Optimizing Statistical Analysis in the Cloud. Spring 2013.
- Ph.D. research initiation project: Data Parallel Statistical Computing in the Cloud. 2012.
- Risi Thonangi (Rishi). First employment: VMware.
- Ph.D. dissertation defense: Optimizing Database Algorithms for Random-Access Block Devices. July 2015.
- Ph.D. preliminary exam: Searching, Sorting, Permuting and Beyond on Flash. Spring 2011.
- Ph.D. research initiation project: Investigating Concurrency Control for Flash-Efficient Indexes. 2009.
- You Wu (Will). Co-advised with Pankaj K. Agarwal. First employment: Google Inc.
- Ph.D. dissertation defense: Computational Journalism: From Answering Questions to Questioning Answers and Raising
Good Questions. July 2015.
- Ph.D. preliminary exam: Computational Journalism: From Answering Questions to Questioning Answers and Raising
Good Questions. Spring 2013.
- Ph.D. research initiation project: Extended Promotion Analysis and its Applications in Computational Journalism. 2012.
- Albert Yu. Co-advised with Pankaj K. Agarwal. First employment: Amazon.
- Ph.D. dissertation defense: Algorithms for Continuous Queries: A Geometric Approach. May 2013.
- Ph.D. preliminary exam: Algorithmic Challenges in Content-based Publish-Subscribe Systems. Spring 2010.
- Ph.D. research initiation project: Network Design for Wide-Area Publish/Subscribe. 2008.
- Yi Zhang. First employment: Google Inc.
- Ph.D. dissertation defense: Transparent and Efficient I/O for Statistical Computing. March 2012.
- Ph.D. preliminary exam: RIOT: A Framework for Efficient Statistical Computing. Fall 2009.
- Ph.D. research initiation project: Failure-Aware Spatial Suppression in Sensor Networks. 2007.
- Badrish Chandramouli. First employment: Microsoft Research.
- Ph.D. dissertation defense: Unifying Databases and Internet-Scale Publish/Subscribe. July 2008.
- Ph.D. preliminary exam: Supporting Better Scalability and Richer Subscription
Models in Wide-Area Publish/Subscribe. Summer 2006.
- Ph.D. research initiation project: Distributed Network Querying: Reducing Costs by Providing
Approximate Answers. 2004. Duke CS Outstanding PhD Research Initiation Project Award.
- Junyi Xie. First employment: Oracle Corp.
- Ph.D. dissertation defense: Handling Resource Constraints and Scalability in Continuous Query Processing. September 2007.
- Ph.D. preliminary exam: Optimizing Continuous Queries Over Data Streams. Fall 2004.
- Ph.D. research initiation project: Building DRAM-Based High Performance Intermediate Memory Systems. 2002. (Served as committee member, not as primary advisor.)
- Hao He. IBM Ph.D. Fellowship, 2006-2007; first employment: Google Inc.
- Ph.D. dissertation defense: Query Processing and Indexing Techniques on Semi-Structured Data. July 2007.
- Ph.D. preliminary exam: Query Processing and Indexing Techniques on Graph-Structured Data. Spring 2006.
- Ph.D. research initiation project: A Workload-Aware Update-Efficient Index for XML. 2003.
- Adam Silberstein. First employment: Yahoo! Research.
- Ph.D. dissertation defense: Query Processing Methods for Wireless Sensor Networks. February 2007.
- Ph.D. preliminary exam: Query Processing and Optimization in Sensor Networks. Spring 2005.
- Ph.D. research initiation project: Sorting XML in External Memory. 2004.
Current M.S. student(s):
- Kushagra Ghosh.
- Yang Li.
- Sharan Sokhi.
- Qianyu Yang (Ethan).
Graduated M.S. students:
- Meng Hanze. First employment: PhD student at the University of British Columbia. Characterizing and Verifying Queries Via CINSGEN. Spring 2024.
- Lei Luo. Improving Fact-Checking Retrieval System using Language Models. Spring 2022.
- Chang Xu. Judgment Prediction based on Legal Text Analysis. Spring 2022.
- Qianqian Che. First employment: Tencent. A QA-based Approach to Classifying Vaccine-related Misinformation. Spring 2021.
- Qiulin Li. First employment: Amazon. Improving I-Rex: An Interactive Relational Query Explainer for SQL. Spring 2021.
- Qingying Luo. First employment: Amazon. An Iterative Procedure for Detecting Anti-Vaccinations Subreddits and Sources. Spring 2021.
- Dongfan Zhang. First employment: Amazon. Automating Collection of Anti-Vaccination Data on Facebook. Spring 2021.
- Tiangang Chen. First employment: Amazon. I-Rex: An Interactive Relational Query Explorer. Spring 2020.
- Xiaoming Liu. First employment: Google. Mining Semantic Patterns from Text. Spring 2020.
- Xiaoyu Yanglian (Liana). First employment: Amazon. Generating Interesting Streak-Based Claims from Sports Data. Spring 2020.
- Yanlin Yu. First employment: Facebook. Supporting Domain-Specific Complex Natural Language Queries. Spring 2020.
- Yuhao Wen. First employment: Oracle. Interactive Summarization and Exploration of Top Aggregate Query Answers. Summer 2019.
- Xinghao Cheng. First employment: Facebook. Infrastructure Options for Real-Time Fact-Checking. Spring 2019.
- Junbo Li. First employment: Indeed.com. Adapting the Transformer Model for Fact-Checking. Spring 2019.
- Wenqian Tong. First employment: Google. Parallelizing Factlet Mining from Duke Basketball Game Statistics using Apache Spark. Spring 2019.
- Qian Wang (Bruce). First employment: Salesforce. Optimization of Factlet Mining from Duke Basketball Game Statistics. Spring 2019.
- Sitong Che. First employment: Microsoft. Mining Interesting and Diverse Factlets from Data. Fall 2018.
- Rohit Paravastu. First employment: WealthGuard. Detecting Natural-Language Claims Checkable on Relational Databases. Fall 2012.
- Rozemary Scarlat. First employment: Microsoft. FirstPass: Crowdsourced Initial Document Analysis. Fall 2012.
- Yunjia Zhou. First employment: Salesforce.com. Exploring One-of-the-Few Claims from Data. Spring 2012.
- Pradeep K. Gunda. Scalable Lineage Tracking in Workflows. Fall 2007.
- Wenbin Pan. On Author Name Disambiguation in Citation Databases. Fall 2004.
- Zhihui Wang. Multiple-View Maintenance with Semantic Caching. Summer 2003.
- Jing Zhang. Implementing a File System on Top of a DBMS. Summer 2003.
- Junfei Geng. Automatic Extraction and Integration of Bibliographic Information on the Web Using
Hidden Markov Models. Spring 2003.
- Xiao F. Huang (Andy). TupleRank and Implicit Relationship Discovery in Databases. Spring 2003.
- Parag G. Palekar. Analysis of an Incremental Algorithm for Mining Frequent Itemsets. Fall 2002.
Undergraduate theses supervised:
- Felicia Chen. Understanding the Landscape of Vaccine Misinformation. Spring 2020. Graduated with High Distinction.
- Tyler Brock. Amboseli Baboon Research Ranker. Spring 2007. Graduated with Distinction.
- Christopher N. Bond. Query Suspend and Resume. Spring 2005. Graduated with High Distinction.
Undergraduate research internship:
- Aayushi Patel, Christopher Li, Qinyu Zhu (Chloe), and Tingnan Hu. Using Large Language Models to Generate Vaccine Interventions. Summer 2023.
- Aakash Kothapally, Dev Seth, Isa Mellody, and Shuaichen Liao. Identifying Vaccine Misinformation in Text. Summer 2021.
- James Lin, Allen Pan, and Zachary Zheng. Helping Novices Debug Relational Queries (HNRQ). Summer 2021.
- Alexander Bendeck, Kevin Day, and Jeffrey Luo. Helping Novices Debug Relational Queries. Summer 2020.
- Jianchao Geng (Frank), Javan Jiang (JJ), Min Soo Kim, Sanha Lim, and Jackson Proudfout. Scaling Up Live Pop-Up Fact Checking. Summer 2019.
- Wenqin Wang. Data and Technology for Fact-Checking. Fall 2018.
- Caroline Wang, Ethan Holland, and Lucas Fagan. Data and Technology for Fact-Checking. Summer 2018.
- Tim Overeem. iCheck: Computational Fact-Checking. Fall 2017.
- Aditya Srinivasan. iCheck: Computational Fact-Checking. Fall 2017.
- Yuxiang He. iCheck: Computational Fact-Checking. Fall 2016 - Spring 2017.
- Emre Sonmez. iCheck: Computational Fact-Checking. Summer 2014 - Spring 2017.
- Yuansong Feng. iCheck: Computational Fact-Checking. Spring 2017.
- Dhrumil Patel. iCheck: Computational Fact-Checking. Spring 2017.
- Seokhyun Song (Alex). iCheck: Computational Fact-Checking. Summer 2014.
- Jiaqi Yan. RIOT: Statistical Computing with Efficient, Transparent I/O. Summer 2010 - Spring 2012. Duke CSURF Fellow.
- Weiping Zhang. RIOT: Statistical Computing with Efficient, Transparent I/O. Summer 2009 - Spring 2011.
- Gregory Filpus. Suppression Schemes for Sensor Data Collection. Summer 2006.
- Congyi Wu. Tracking Lineage for Computational Workflows. Summer 2006.
Undergraduate independent studies:
- Sharan Sokhi. Improving UI for I-Rex: An Interactive Relational Query Explainer for SQL. Fall 2023.
- Zachary Zheng. Scaling up I-Rex: An Interactive Relational Query Explainer for SQL. Fall 2023.
- James Leong. Helping Novices Debug Relational Queries (HNRQ). Fall 2021.
- John Kang. A Method for Scraping Anti-Vax Articles from the Internet. Fall 2020.
- Ann Bailey and Max Bartelett. Volunteer Database Consulting for a Non-Profit Organization. Spring 2020.
- Alexander Bendeck and Kevin Day. Helping Novices Debug Relational Queries. Spring 2020.
- Jianchao Geng (Frank), Min Soo Kim, Emily Liu, Andres Montoya, Matthew O'Boyle, Rahul Sengottuvelu, Charlie Todd, Siyi Xu, David Yoon, Arthur Zhao, and Daniel Zhou. Data and Technology for Fact-Checking. Spring 2020.
- David Cheng, Maya Choudhury, Joyce Er, Jianchao Geng (Frank), Javan Jiang (JJ), Kamran Kara-Pabani, Grant Kim, Emily Liu, Andres Montoya, Matthew O'Boyle, Rahul Sengottuvelu, Siyi Xu, and David Yoon. Data and Technology for Fact-Checking. Fall 2019.
- Archana Ahlawat, David Cheng, Sherry Feng, Yuanhao Guan, Sherry Hu, Matthew O'Boyle, Ali Soyupak, Jason Wang, and Arthur Zhao. Data and Technology for Fact-Checking. Spring 2019.
- Archana Ahlawat, Fangge Deng, Sherry Feng, Matthew O'Boyle, Connie Wu, Fengyu Xie (Harry), and Liuyi Zhu. Data and Technology for Fact-Checking. Fall 2018.
- Wenqin Wang. Data and Technology for Fact-Checking. Spring 2018.
- Jordan Ly. Relational Algebra Interpreter. Spring 2016.
- Charles Xu and Yubo Tian. iCheck: Computational Fact-Checking. Fall 2015.
- Ouwen Huang. Surgical Data Visualization. Spring 2015.
- Alan M. Ni, Benjamin M. Schwab, and Jay M. Wang. SMS-based Networking for Data-based Apps. Fall 2014.
- Eric Wu and Kevin Wu. Perturbation Analysis of College Basketball Predictions. Fall 2014.
- Seokhyun Song (Alex). iCheck: Computational Fact-Checking. Summer 2014.
- Andrew Shim. Computational Journalism. Spring 2013 - Spring 2014.
- Peggy Li. Computational Fact-Checking. Spring 2014.
- Eric Wu and Kevin Wu. Music Databasing. Spring 2014.
- Jiaqi Yan. Efficient Out-of-Core Data Analysis. Fall 2010 - Spring 2011.
- Kevin Jang. Efficient Out-of-Core Data Analysis. Fall 2010.
- Perry Zheng. Managing Structure-Rich Data. Fall 2009 - Spring 2010.
- Weiping Zhang. RIOT: Statistical Computing with Efficient, Transparent I/O. Spring 2010.
- Ashley DeMass. Database Support for Wireless Sensor Networks. Fall 2008.
- Congyi Wu. Object-Oriented Schema and Data Editing on a Relational Backend. Spring 2008.
Ph.D. defense committee (not as primary advisor):
- Shweta J. Patwa. Synthesizing Linked Data and Detecting Per-Query Gaps Under Differential Privacy. Summer 2023.
- Alexander J. Steiger. Algorithms for Rectangular Robot Motion Planning. Summer 2023.
- Erin C. Taylor. Algorithms for Clustering, Partitioning, and Planning. Summer 2023.
- Chenghong Wang. Encrypted Data Management Systems with Tunable Privacy. Summer 2023.
- David A. Pujol. Fairness in Differentially Private Data Release. Fall 2022.
- Zhengjie Miao. Simplifying Human-in-the-loop Data Science Pipeline: Explanations, Debugging, and
Data Preparation. Summer 2022.
- Aaron Lowe. Flood Risk Analysis on Terrains. Summer 2021.
- Stavros Sintos. Efficient Algorithms for Querying Large and Uncertain Data. Summer 2020.
- Pengfei Zheng. Artificial Intelligence for the Understanding of Large Complex Datacenters. Spring 2020.
- Ioannis Kotsogiannis. Query Answering in Multi-Relational Databases under Differential Privacy. Summer 2019.
- Prajakta Kalmegh. Detecting and Reducing Resource Interferences in Data Analytics Frameworks. Spring 2019.
- Pulkit Misra. Enhancing Transactional Key-Value Storage Systems in Datacenters using Precise Clocks
and Software-Defined Storage. Spring 2019.
- Zilong Tan. Approximate Inference for High-Dimensional Latent Variable Models. Fall 2018.
- Xi He. Policy Driven Data Sharing with Provable Privacy Guarantees. Summer 2018.
- Afroza Sultana. Efficient Evaluation of Contextual and Reverse Pareto-Optimality Queries. Summer 2018. University of Texas at Arlington.
- Yan Chen. Applying Differential Privacy with Sparse Vector Technique. Spring 2018.
- Peter Gilbert. Assuring Data Authenticity While Preserving User Choice in Mobile Sensing. Fall 2017.
- Jiangwei Pan. Algorithms for Geometric Matching, Clustering, and Covering. Summer 2016.
- Wuzhou Zhang. Geometric Computing over Uncertain Data. Spring 2015.
- Qing Duan. Real-Time and Data-Driven Operation Optimization and Knowledge Discovery for an Enterprise
Information System. Spring 2014.
- Blake Hechtman. Exploiting Parallelism in GPUs. Spring 2014.
- Nedyalko Borisov. Integrated Management of the Persistent-Storage and Data-Processing Layers in Data-intensive
Computing Systems. Summer 2012.
- Sharathkumar Raghvendra. Geometric Approximation Algorithms - A Summary Based Approach. Summer 2012.
- Herodotos Herodotou. Automatic Tuning of Data-Intensive Analytical Workloads. Spring 2012.
- Sam Slee. Developing Scalable Abilities for Self-Reconfigurable Robots. Fall 2010.
- Songyun Duan. Simplifying System Management through Automated Forecasting, Diagnosis, and Configuration
Tuning. Spring 2010.
- Fareed Zaffar. Foresight: Countering Malware Through Cooperative Forensics Sharing. Summer 2008.
- Joseph Volpe. Mechanistic and Genetic Biases in Human Immunoglobulin Heavy Chain Development. Spring 2008.
- Laura Grit. Extensible Resource Management for Networked Virtual Computing. Fall 2007.
- Dazhi Wang. Service Reliability: Models, Algorithms and Applications. Summer 2007.
- Angela Dalton. Data Fidelity Mechanisms for Enhancing Energy Management in Context-Aware Systems. Fall 2006.
- Ke Yi. I/O Efficient Algorithms for Processing Massive Spatial
Data. Summer 2006.
- Hai Yu. Geometric Algorithms for Time-Varying Data. Summer 2006.
- Rebecca Braynard. Wireless MAC Layer Flexibility for Extending Effective System Lifetime. Spring 2006.
- Justin Moore. Automated Cost-Aware Data Center Management. Spring 2006.
- Patrick Reynolds. Using Causal Paths to Improve Performance and Correctness in Distributed Systems. Spring 2006.
- Dejan Kostic. High Bandwidth Data Dissemination for Large-Scale Distributed Systems. Summer 2005.
- Yun Fu. Resource Allocation for Global-Scale Network Services. Fall 2004.
- Sathish Govindarajan. Spatial Data Structures and Algorithms for Large Scale Applications. Fall 2004.
- Lipyeow Lim. Online Methods for Database Optimization. Fall 2004.
- Rajiv Wickremesinghe. Methods and Models for Data-Intensive Computing. Fall 2004.
- Heng Zeng. Explicit Energy Resource Management as a First Class Operating System Resource. Spring 2004.
- Ronald P. Doyle. Model-Based Adaptive Resource Provisioning in a Web Service Utility. Fall 2003.
Ph.D. preliminary exam committee (not as primary advisor):
- Yingfan Wang. Towards a Comprehensive Evaluation of Dimension Reduction Methods for Transcriptomic
Data Visualization. Spring 2023.
- Yanping Zhang. Data-aware Indexes for Growing Databases under MPC and Differential Privacy. Spring 2023.
- Alexander J. Steiger. Efficient Solutions for Geometric Proximity Problems. Summer 2021.
- Erin C. Taylor. Efficient Algorithms for Geometric Partitioning Problems. Summer 2021.
- Shweta J. Patwa. Reconstructing Linked Data from Cardinality and Integrity Constraints. Spring 2021.
- David A. Pujol. Budget Sharing for Multi-Analyst Differential Privacy. Spring 2021.
- Chenghong Wang. DP-Sync: Hiding Update Patterns in Secure Outsourced
Databases with Differential Privacy. Spring 2021.
- Zhengjie Miao. Explanations in Data Analysis Pipeline. Spring 2020.
- Aaron Lowe. Hydrological Analysis on Terrains. Fall 2018.
- Pulkit Misra. Architecting Storage Systems for Cloud Application. Fall 2018.
- Pengfei Zheng. Scalable Machine Learning for Datacenter Performance Management. Spring 2018.
- Stavros Sintos. Efficient Algorithms for Querying Large, Complex Data. Fall 2017.
- Prajakta Kalmegh. Minimizing Resource Interferences in a Cluster Computing Framework. Spring 2017.
- Zilong Tan. Fast Algorithms for Learning in High Dimensions. 2017.
- Yan Chen. Private Data Management with Verification. Summer 2016.
- Yuzhang Han. Management of JVM-Based Memory-Intensive Parallel Databases. 2016.
- Benjamin Stoddard. Active Task Lifelogging via Sensor Readings. Summer 2015.
- Xi He. Private Mobility Data Synthesis and Management. Spring 2015.
- Mayuresh Kunjir. Managing Heterogeneity in Multi-Tenant Data-Parallel Clusters. Spring 2015.
- Jiangwei Pan. Algorithms for Geometric Covering and Clustering. Spring 2014.
- Wuzhou Zhang. Geometric Computing over Uncertain Data. Summer 2013.
- Andrew Brown. Cloud Platform Trust Logic. Fall 2012.
- Vamsidhar Thummala. Balancing Energy, Performance, and Stability Tradeoffs Under Uncertainty. Spring 2011.
- Herodotos Herodotou. Optimizing Analytical Workloads in Data-Intensive Computing Systems. Fall 2010.
- Nedyalko Borisov. Integrated Management of the Persistent-Storage and Data-Processing Layers in Data-intensive
Computing Systems. Spring 2010.
- Sharathkumar Raghvendra. Geometric Summaries. Spring 2009.
- Sam Slee. Developing Scalable Abilities for Self-Reconfigurable Robots. Spring 2009.
- Songyun Duan. Automated Forecasting and Diagnosis of System Failures. Spring 2008.
- Anita Lungu. Verification-Aware Processor Design. Spring 2007.
- Aydan Jumerefendi. System Support for Strong Accountability. Fall 2006.
- Joseph Volpe. Investigation of the IgH Locus and Analysis of the Antigen Receptors That It Forms. Fall 2005. Bioinformatics and Genome Technology.
- Dazhi Wang. Service Availability Modeling. Spring 2005.
- Rebecca Braynard. Asynchronous and Asymmetric Communication for Balancing Energy Consumption in Sensor
Networks. Fall 2004.
- Justin Moore. Balancing Site Goals and Service Goals in Datacenter Management. Fall 2004.
- Ke Yi. Index Structures for Large Databases: Theory and Practice. Spring 2004.
- Dejan Kostic. High Bandwidth Data Dissemination for Large-Scale Distributed Systems. Fall 2003.
- Patrick Reynolds. Measurement and Causality in Black-Box Distributed Systems. Fall 2003.
- Lipyeow Lim. Online Methods for Database Optimization. Spring 2003.
- Yun Fu. Resource Allocation for Global-Scale Network Services. Fall 2002.
- Sathish Govindarajan. Handling Large Spatial Data: Approximation and Data Structures. Summer 2002.
- Rajiv Wickremesinghe. Data Intensive Computation in a Compute/Storage Hierarchy. Spring 2002.
- Ronald P. Doyle. Internet Service Delivery Architecture: Implications of the Resource Grid Model. Fall 2001.
Ph.D. research initiation project committee (not as primary advisor):
- Fangzhu Shen. Shapley Values for Measuring Responsibilities for Influence Propagation in a Network. 2024.
- Rahul Raychaudhury. Fast Algorithms for Piercing Boxes. 2023.
- Yingfan Wang. Understanding How Dimension Reduction Tools Work: An
Empirical Approach to Deciphering t-SNE, UMAP, TriMap, and PaCMAP
for Data Visualization. 2022.
- Yanping Zhang. Secure Growing Databases for Ad-hoc Queries using MPC and Differential Privacy. 2022.
- Shweta J. Patwa. Synthesizing Linked Data Under Cardinality and Integrity Constraints. 2020.
- Zhengjie Miao. Explaining Wrong Queries Using Small Examples. 2019.
- David A. Pujol. Fair Decision making Using Privacy Protected Data. 2019.
- Alexander J. Steiger. Finding Diverse High-Value Points in Query Ranges. 2019.
- Yuchao Tao. Computing Local Sensitivities of Counting Queries with Joins. 2019.
- Erin C. Taylor. Efficient Algorithms for Optimal Clustering Under Stability. 2019.
- Chenghong Wang. Crypt-Epsilon: Crypto-Assisted Differential Privacy on Untrusted Servers. 2019.
- Sudarshan Balaji. DrFM: Distributed Flash Management. 2017.
- Andrew Lee. Finding Small Summarizations in Large Databases. 2017.
- Pulkit Misra. Enabling Lightweight Transactions with Precision Time. 2017.
- Prajakta Kalmegh. Most-in-First-Out (MIFO): A Query-Semantic Aware Scheduling Policy on Cluster Computing
Frameworks. 2014 - 2016.
- Aaron Lowe. Approximate Range Counting Under Uncertainty. 2016.
- Chaofan Chen. Toward Workload-Aware Heterogeneous Schema Design for Time Series Data in NoSQL Databases. 2015.
- Amir R. Ilkhechi. A General Benchmark Framework for Comparing Resource Schedulers Designed for Multi-tenant
Scenarios. 2015.
- Stavros Sintos. Range-Max Queries on Uncertain Data. 2015.
- Zilong Tan. Meeting Performance Goals in a Fair Multi-Tenant Cluster. 2015.
- Pengfei Zheng. A Statistical Causal Inference Framework For Understanding Stragglers at Datacenter
Scale. 2015.
- Yan Chen. Model Selection for Query Answering Problems under Differential Privacy. 2014.
- Yifei Ding. Multitenancy Models for HBase. 2013.
- Abhishek Dubey. Recommending Schema and Physical Design for NoSQL Databases. 2013.
- Yuzhang Han. Query Optimization for Multi-Tenant Clusters. 2013.
- Xi He. Differentially Private Synthesis of Location Traces. 2013.
- Mayuresh Kunjir. Fair Cache Allocation for Multi-tenant Data-Parallel Workloads. 2013.
- Jiangwei Pan. Dynamic Algorithms for Geometric Hitting Set and Set Cover Problems. 2013.
- Benjamin Stoddard. Differentially Private Training Corpus Synthesis. 2013.
- Mahanth Gowda. Cooperative Packet Recovery in Enterprise WLANs. 2012.
- Jie Li. Evaluating Starfish in the Real World. 2012.
- Wuzhou Zhang. Nearest Neighbor Searching Under Uncertainty. 2012.
- Nedyalko Borisov. Diagnosing Query Slowdowns in Database and SAN Environments. 2008.
- Vamsidhar Thummala. iTuned: An Auto-Tuner for Database Configuration Parameters. 2008.
- Songyun Duan. Proactive Performance Problem Identification and Diagnosis. 2006.
- Kuan-Ming Liu. Predicting Protein Functions by Integrating Biological Database from Multiple Knowledge
Domains. 2006.
- Anita Lungu. Integrating Biological Information Across Domains. 2006.
- Sita Badrish. Energy-Efficient Handling of Disk Accesses. 2004.
- Aydan Jumerefendi. Trust But Verify: Accountability for Internet Services. 2004.
- Haoying Li. Global Maximum Stereo Matching. 2004.
- Piyush Shivam. Distributed Data Staging for Performability. 2004.
- Kashi Vishwanath. Scalability Issues in ModelNet. 2003.
- Danxia Xie. Distributed Synthetic Energy Management for Sensor Networks. 2003.
- Ke Yi. External Memory Orthogonal Range and Stabbing Aggregate Queries on Semigroups. 2003.
- Hai Yu. Kinetic Fair-Split Trees and Proximity Problems. 2003.
- Junyi Xie. Building DRAM-Based High Performance Intermediate Memory Systems. 2002.
M.S. committee (not as primary advisor):
- Siyu Chen. Improving Distributed Transactional Storage Performance through Remote Direct Memory
Access. Spring 2019.
- Sudarshan Balaji. A Case for Distributed Flash Management. Fall 2018.
- Andrew Lee. A Visualizer for Explanations through Intervention. Summer 2018.
- Benjamin Stoddard. Ayumu: Efficient Lifelogging with Focused Tasks. Fall 2016.
- Yuzhang Han. Workload Execution Planning on Distributed NoSQL Data Processing Systems. Spring 2015.
- Yifei Ding. Multitenancy for Big Data (No SQL) Systems. Summer 2014.
- Bharath K. Chelepalli. Graph De-Anonymization. Spring 2013.
- Mahanth Gowda. Cooperative Packet Recovery in Enterprise WLANs. Spring 2013.
- Jie Li. Evaluating Starfish in the Real World. Spring 2013.
- Fan Yang. Prediction-Based Mobility Monitor with Adaptive Sensing for Smartphone. Summer 2012.
- Xixi Wang. Declarative Data Stream Analysis on Storm. Spring 2012.
- Gang Luo. Processing SQL-Like Declarative Queries in a MapReduce Framework. Summer 2011.
- Liang Dong. Optimization Opportunities for MapReduce Workloads. Spring 2011.
- Xuting Zhao. Workload-Aware Data-Placement and Scheduling Policies to Improve MapReduce Performance
under Cluster Hot Spots. Spring 2011.
- Kuan-Ming Liu. Combining Feature Selection Strategies with Bayesian Learning Models to Categorize
Gene Expression Profiles. Summer 2008.
- Yuqing Pan (Gary). Wireless Pulse Oximeter Sensor Project. Spring 2008. Electrical and Computer Engineering.
- Dongdong Zhao. An Evaluation of Techniques for Self-Healing in Application and Database Servers. Spring 2008.
- Jennifer Burge. Trading Information for Energy in Sensor Networks. Fall 2007.
- Sita Badrish. Energy Efficient Handling of Disk Accesses Using Economic Models. Fall 2005.
- Haoying Li. Just-in-Time Constraints for Dynamic-Programming Stereo. Summer 2005.
Undergraduate thesis committee (not as primary advisor):
- Andrea Scripa. A Survey of Text Mining Techniques for Short Texts. Spring 2012.
- Katherine Trushkowsky. CoBib: An Architecture for a Collaborative Database. Spring 2007. Graduated with High Distinction.
- Sanjay Ginde, David Goldberg, and Chris Zeiders. OogP2P Framework. 2004.
Internships for high school students:
- Zian Chen (Stephen). Scaling up I-Rex: An Interactive Relational Query Explainer for SQL. Summer 2023. East Chapel Hill High School, Chapel Hill, NC.
- Aakash Kothapally. Scaling Up Live Pop-Up Fact Checking. Summer 2019. NC School of Science & Math, Durham, NC.
- Andrew Mu. Scaling Up Live Pop-Up Fact Checking. Summer 2019. East Chapel Hill High School, Chapel Hill, NC.
- Jonathan Xu. Scaling Up Live Pop-Up Fact Checking. Summer 2019. East Chapel Hill High School, Chapel Hill, NC.
- Dylan Dsouza. Evaluating radb, a Relational Algebra Interpreter. Summer 2017. Rising senior at Enloe Magnet High School, Raleigh, NC.
- Brandon Wu. Evaluating radb, a Relational Algebra Interpreter. Summer 2017. Rising senior at Enloe Magnet High School, Raleigh, NC.
Activities
Service to the professional community:
- Trustee of the VLDB Endowment, January 2024 - December 2029.
- Associate Editor, ACM SIGMOD International Conference on Management of Data (SIGMOD), January 2024 - present.
- Review Quality Co-Chair, the 2025 International Conference on Data Engineering (ICDE 2025).
- Editor, Foundations and Trends (FnT) in Databases, July 2023 - present.
- Member, PVLDB Advisory Committee, September 2023 - present.
- Associate Editor, ACM Transactions on Database Systems (TODS), February 2015 - present.
- Editor-in-Chief, Proceedings of the VLDB Endowment (PVLDB), Vol. 16, April 2022 - September 2023.
- Associate Editor, Proceedings of the VLDB Endowment (PVLDB), April 2020 - March 2021 and April 2021 - March 2022.
- Guest Editor, IEEE Data Engineering Bulletin (DEBULL), January 2021 - September 2021 and January 2022 - September 2022.
- Associate Editor, ACM SIGMOD Record (SIGMODREC), January 2015 - present.
- Member, ACM SIGMOD Research Highlight Award Committee, 2019 - 2022.
- Program Committee, the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD 2020).
- Co-Chair, 2019 American Statistical Association (ASA)
President's Initiative on The Role of Statistics and Computer
Science in Fake News.
- Program Committee Member, 2019 Summer School Series on
Methods for Computational Social Science, GESIS Leibniz Institute for
Social Sciences.
- Program Committee Member, SIGKDD 2019 Workshop on
Truth Discovery and Fact Checking: Theory and Practice.
- Steering Committee Member, 2019 International Workshop on
Misinformation, Computational Fact-Checking and Credible Web.
- Program Committee Co-Chair, the 2019 International Conference on Database Systems for Advanced Applications (DASFAA 2019).
- Core Program Committee, the 2019 ACM SIGMOD International Conference on Management of Data (SIGMOD 2019).
- Program Vice Chair (on Data Science), the 2019 International Conference on Data Engineering (ICDE 2019).
- Panel Co-Chair, the 2019 International Conference on Data Engineering (ICDE 2019).
- Guest Editor, Special Issue on Combating Digital
Misinformation, ACM Journal of Data and Information Quality (JDIQ), September 2017 - July 2019.
- Program Committee, the 2018 Workshop on Data Management for End-to-End Machine Learning (DEEM 2018).
- Program Committee, the 2018 International Workshop on the Web and Databases (WEBDB 2018).
- Program Committee, the 2018 International Conference on World Wide Web (WWW 2018).
- General Co-Chair, the 2017 ACM SIGMOD International Conference on Management of Data (SIGMOD 2017).
- Tutorial Program Committee, the 2016 ACM SIGMOD International Conference on Management of Data (SIGMOD 2016).
- Guest Editor, Special Issue on Visionary Ideas in Data
Management, ACM SIGMOD Record (SIGMODREC), October 2014 - July 2015.
- Subject Area Editor (Database and Knowledge-Based Systems), Journal of Computer Science and Technology (JCST), December 2011 - December 2018.
- Program Committee, the 2016 Workshop on Human-In-the-Loop Data Analytics (HILDA 2016).
- Senior Program Committee, the 2015 International Conference on Information and Knowledge Management (CIKM 2015).
- Best Paper Selection Committee, the 2015 National Database Conference of China (NDBC 2015).
- Program Committee Group Leader, the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD 2015).
- General Co-Chair, the 2015 International Conference on Web-Age Information Management (WAIM 2015).
- Program Committee, the 2014 International Conference on Information and Knowledge Management (CIKM 2014).
- Review Board, Proceedings of the VLDB Endowment, August 2008 - March 2012, April 2013 - March 2015, and April 2018 - March 2019.
- Program Committee Co-Chair, the 2014 International Workshop on Bringing the Value of Big Data to Users (DATA4U 2014).
- Best Paper Selection Committee, the 2014 National Database Conference of China (NDBC 2014).
- Program Committee, the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD 2014).
- Program Committee, the 2014 International Workshop on Exploratory Search in Databases and the Web (EXPLOREDB 2014).
- Senior Program Committee, the 2013 International Conference on Information and Knowledge Management (CIKM 2013).
- Demonstration Program Committee Co-Chair, the 2013 International Conference on Very Large Data Bases (VLDB 2013).
- Best Paper Selection Committee, the 2013 National Database Conference of China (NDBC 2013).
- Program Committee Area Chair (Streams, Sensor Networks, Complex Event Processing),
the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD 2013).
- Associate Editor, IEEE Transactions on Knowledge and Data Engineering (TKDE), March 2009 - March 2013.
- Publicity Co-Chair, the 2013 International Conference on Database Systems for Advanced Applications (DASFAA 2013).
- Panel Co-Chair, the 2013 International Conference on Data Engineering (ICDE 2013).
- Senior Program Committee, the 2012 International Conference on Information and Knowledge Management (CIKM 2012).
- Best Paper Selection Committee, the 2012 National Database Conference of China (NDBC 2012).
- Program Committee, the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD 2012).
- Program Committee, the 2012 International Conference on Data Engineering (ICDE 2012).
- Program Committee, the 2011 International Conference on Data Engineering (ICDE 2011).
- Program Committee, the 2011 Conference on Innovative Data Systems Research (CIDR 2011).
- Program Committee, the 2010 International Conference on Very Large Data Bases (VLDB 2010).
- Program Committee, the 2010 International Workshop on Data Management for Sensor Networks (DMSN 2010).
- Program Committee Co-Chair, the 2010 International Conference on Web-Age Information Management (WAIM 2010).
- Program Committee, the 2010 International Conference on Data Engineering (ICDE 2010).
- Program Committee, the 2010 International Workshop on Ranking in Databases (DBRANK 2010).
- Program Committee, the 2009 International Workshop on Cloud Data Management (CLOUDDB 2009).
- Program Committee, the 2009 IFIP/ACM International Conference on Distributed Systems Platforms (MIDDLEWARE 2009).
- Program Committee, the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD 2009).
- Program Committee, the 2009 ACM Workshop on Data Engineering for Wireless and Mobile Access (MOBIDE 2009).
- Program Committee, the 2009 International Workshop on Scalable Stream Processing Systems (SSPS 2009).
- Regional Chair (America), the 2009 International Conference on Database Systems for Advanced Applications (DASFAA 2009).
- Program Committee, the 2009 International Conference on World Wide Web (WWW 2009).
- Program Committee, the 2009 International Workshop on Ranking in Databases (DBRANK 2009).
- Program Committee, the 2009 International Conference on Data Engineering (ICDE 2009).
- Program Committee, the 2009 Conference on Innovative Data Systems Research (CIDR 2009).
- Steering Committee Member, International Conference on Web-Age Information Management (WAIM), September 2008 - present.
- General Co-Chair and Program Committee Member, the 2008 International Workshop on Data Management for Sensor Networks (DMSN 2008).
- Program Committee, the 2008 International Conference on Information and Knowledge Management (CIKM 2008).
- Program Committee, the 2008 ACM Workshop on Data Engineering for Wireless and Mobile Access (MOBIDE 2008).
- Program Committee, the 2008 International Conference on Web-Age Information Management (WAIM 2008).
- Program Committee, the 2008 IEEE International Conference on Computational Science and Engineering (CSE 2008).
- Program Committee, the 2008 International Workshop on Scalable Stream Processing Systems (SSPS 2008).
- Program Committee, the 2008 International Conference on Very Large Data Bases (VLDB 2008).
- Program Committee, the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD 2008).
- Program Committee, the 2008 International Conference on Data Engineering (ICDE 2008).
- Program Committee Co-Chair, the 2007 International Workshop on Data Management for Sensor Networks (DMSN 2007).
- Demonstration Program Committee, the 2007 International Conference on Very Large Data Bases (VLDB 2007).
- Program Committee, the 2007 International Conference on Scalable Information Systems (INFOSCALE 2007).
- Program Committee, the 2007 International Symposium on Large Spatio-Temporal Databases (SSTD 2007).
- Program Committee, the 2007 Joint Conference of the Asia-Pacific Web Conference and the International
Conference on Web-Age Information Management (APWEBWAIM 2007).
- Program Committee, the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD 2007).
- Program Committee, the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD 2007) Ph.D. Workshop on Innovative Database Research.
- Program Committee, the 2007 Workshop on Networking Meets Databases (NETDB 2007).
- Program Committee, the 2007 International Workshop on Scalable Stream Processing Systems (SSPS 2007).
- Program Committee, the 2007 International Conference on Data Engineering (ICDE 2007).
- Program Committee, the 2006 International Conference on Information and Knowledge Management (CIKM 2006).
- Program Committee, the 2006 International Conference on Geosensor Networks (GSN 2006).
- Program Committee, the 2006 International Workshop on Data Management for Sensor Networks (DMSN 2006).
- Program Committee, the 2006 International XML Database Symposium (XSYM 2006).
- Program Committee, the 2006 International Conference on Very Large Data Bases (VLDB 2006) Ph.D. Workshop.
- Program Committee Co-Chair, the 2006 Southeast Workshop on Data and Information Management (SEWDIM 2006).
- Program Committee, the 2006 International Conference on Web-Age Information Management (WAIM 2006).
- Program Committee, the 2005 International Conference on Data Mining (ICDM 2005).
- Program Committee, the 2005 ACM International Workshop on Web Information and Data Management (WIDM 2005).
- Program Committee, the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD 2005).
- Program Committee, the 2005 International XML Database Symposium (XSYM 2005).
- Program Committee, the 2005 International Conference on Very Large Data Bases (VLDB 2005) Ph.D. Workshop.
- Program Committee, the 2005 International Conference on Database Systems for Advanced Applications (DASFAA 2005).
- Publications Chair, the 2005 International Conference on Web-Age Information Management (WAIM 2005).
- Program Committee, the 2004 International Conference on Data Mining (ICDM 2004).
- Program Committee, the 2004 International Conference on Very Large Data Bases (VLDB 2004).
- Program Committee, the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2004).
- Demonstration Program Committee, the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004).
- Participant in the Summer Workshop on Developing the Field of Computational Journalism,
Center for Advanced Study in Behavioral Sciences, Stanford, California, July 2009.
- Panelist for NSF, 2003, 2004, 2005, 2008, 2009, 2010, 2011, 2013, 2014 (twice), and
2016.
- Panelist for Department of Homeland Security, 2006.
- Expert Panelist on Cancer Reporting Information Technology, Office of the Assistant
Secretary for Planning and Evaluation, Department of Health and Human Services, 2008 - 2009.
- Reviewer for Research Grants Council of Hong Kong, 2010, 2012.
- Reviewer for Natural Sciences and Engineering Research Council of Canada, 2008.
- Reviewer for Netherlands Organisation for Scientific Research, 2006.
- Associate Information Director, ACM SIGMOD, 2003 - present.
- Started the Carolina
Database Research Group (CDB) in 2003 with a group of database
researchers in North Carolina and continue to be one of the main
organizers.
- Publicity Chair, the 2004 International Conference on Mobile Data Management (MDM 2004).
- Reviewer for journals:
ACM Transactions on Database Systems (TODS),
The VLDB Journal (VLDBJ),
IEEE Transactions on Knowledge and Data Engineering (TKDE),
ACM Transactions on Programming Languages and Systems (TOPLAS),
ACM SIGMOD Record (SIGMODREC),
The Computer Journal (CJ),
Information and Computation (IC),
Information Processing Letters (IPL),
IEEE Transactions on Mobile Computing (TMC),
Data and Knowledge Engineering (DKE),
IEEE Internet Computing (INTERNET),
Information and Software Technology (IST),
Journal of Systems and Software (JSS),
Knowledge and Information Systems (KAIS),
Ad Hoc and Sensor Wireless Networks (AHSWN),
Journal of Research and Practice in Information Technology (JRPIT),
Journal of Computer Science and Technology (JCST),
Distributed and Parallel Databases (DPDB),
International Journal of Computer Systems Science and Engineering (CSSE),
LNCS Journal on Data Semantics (JODS),
Electronics and Telecommunications Research Institute Journal (ETRI),
Proceedings of the IEEE (PIEEE).
- Reviewer for conferences:
ACM SIGMOD International Conference on Management of Data (SIGMOD),
International Conference on Very Large Data Bases (VLDB),
International Conference on Data Engineering (ICDE),
ACM Symposium on Principles of Database Systems (PODS),
International Conference on World Wide Web (WWW),
International Conference on Information and Knowledge Management (CIKM),
International Workshop on the Web and Databases (WEBDB),
ACM Symposium on Cloud Computing (SOCC),
International Symposium on Theoretical Aspects of Computer Science (STACS),
European Symposium on Algorithms (ESA),
International Conference on Distributed Computing Systems (ICDCS),
International Conference on Mobile Systems, Applications, and Services (MOBISYS),
USENIX Annual Technical Conference (USENIX),
ACM Symposium on Parallel Algorithms and Architectures (SPAA).
- Designer of the ACM SIGMOD logo,
IEEE Data Engineering logo,
Stanford InfoLab's old logo,
VLDB 2011 logo, and a number
of others.
Service to Duke University and the Department of Computer Science:
- Member of Ph.D. Admissions Committee, Department of
Computer Science, Duke University, 2001 - 2002, 2004 - 2005, 2005 - 2006, 2009 - 2010, 2011 - 2012, and 2023 - 2024.
- Member of Communications Committee, Department of
Computer Science, Duke University, 2005 - 2011, 2019 - 2020, and 2023 - 2024.
- Member of the Faculty Governance Committee, Duke Initiative for Science and Society, Duke University, October 2020 - present.
- Member of the Oversight Committee, the Lane Family Ethics in
Technology Program, Duke University, January 2019 - present.
- Member, Ad Hoc Committee on Linguistics at Duke, Duke University, 2023.
- Chair, Department of Computer Science, Duke University, July 2020 - June 2023.
- Member of Information Technology Advisory Council (ITAC), Duke University, September 2013 - August 2022.
- Member, Search Committee for the Dean of Trinity
College of Arts and Sciences, Duke University, 2022.
- Chair of the Task Force on Teaching in AY20-21,
Department of Computer Science, Duke University, May 2020 - August 2020.
- Chair of Strategic Planning Committee, Department of Computer Science, Duke University, August 2019 - June 2020.
- Associate Chair, Department of Computer Science, Duke University, August 2017 - June 2020.
- Chair of Undergraduate Program Committee, Department of
Computer Science, Duke University, August 2017 - June 2020.
- Member of Strategic Planning Committee, Department of Computer Science, Duke University, January 2017 - May 2017.
- Member of Graduate Program Committee, Department of
Computer Science, Duke University, 2012 - 2017.
- Chair of Faculty Search Committee, Department of Computer Science, Duke University, 2014 - 2015.
- Chair of Graduate Recruiting/Admissions Committee,
Department of Computer Science, Duke University, 2007 - 2008, 2008 - 2009, and 2013 - 2014.
- Member of Infrastructure Committee, Department of Computer
Science, Duke University, 2006 - 2010 and 2012 - 2013.
- Director of Graduate Studies, Department of Computer Science, Duke University, July 2008 - June 2012.
- Member of the inDuke Steering Committee, Department of
Computer Science and School of Engineering, Duke University, 2005 - 2010.
- Colloquium Chair, Department of Computer Science, Duke University, 2002 - 2003 and 2004 - 2007.
- Triangle Computer Science Distinguished Lecture Series
Chair, Department of Computer Science, Duke University, 2002 - 2007.
- Member of Faculty Search Committee, Department of
Computer Science, Duke University, 2001 - 2005 and 2006 - 2007.
Other activities:
- Member of UC Berkeley Putnam Math Competition
Team, 1993 - 1994.
- Member of UC Berkeley Regents' and Chancellor's
Scholars Association, 1993 - 1995.