Computing Community Consortium Blog



Great Innovative Idea: A Task-Centric Framework to Revolutionize Big Data Systems Research

September 11th, 2018 / in Announcements, CCC, Great Innovative Idea / by Helen Wright

The following Great Innovative Idea is from Da Yan, tenure-track assistant professor in the Department of Computer Science at the University of Alabama at Birmingham (UAB). Yan presented his poster, A Task-Centric Framework to Revolutionize Big Data Systems Research, at the Computing Community Consortium (CCC) Early Career Researcher Symposium, August 1-2, 2018.

The Idea

Big Data frameworks such as Apache Hadoop and Apache Spark are becoming increasingly popular due to their emphasis on ease of programming, but they are predominantly designed for data-intensive iterative computations, and an efficient solution for compute-intensive Big Data analytics is still lacking. Based on my insight that compute-intensive problems are often solved by divide and conquer (e.g., with a recursive algorithm), I have developed a general task-centric framework, called T-thinker, for compute-intensive Big Data problems. The framework keeps the CPU cores in a cluster busy by dividing a problem over a big dataset into tasks over smaller subsets of the dataset, and by overlapping CPU processing with network communication (e.g., requesting a subset of the dataset).
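
To make the two ingredients concrete, here is a minimal single-machine sketch in Python. It is not T-thinker's code or API: the names `DATASET`, `fetch_subset`, `compute` and `run_tasks` are hypothetical, the "network request" is simulated by a list slice, and the "compute-intensive" work is just a sum of squares. It only illustrates the shape of the idea: a problem over a range is divided into tasks over smaller ranges, and the data for upcoming tasks is requested in background threads while the current task is being computed.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a big dataset held elsewhere in a cluster.
DATASET = list(range(1, 33))


def fetch_subset(lo, hi):
    """Simulate requesting the subset DATASET[lo:hi] over the network."""
    return DATASET[lo:hi]


def compute(subset):
    """Stand-in for the compute-intensive work done on one small subset."""
    return sum(x * x for x in subset)


def run_tasks(n, threshold=4, fetchers=2):
    """Divide the range [0, n) into tasks of at most `threshold` items,
    then compute on each task while later subsets are still being fetched."""
    # Divide: keep splitting a range in half until it is small enough.
    pending = deque([(0, n)])
    leaves = []
    while pending:
        lo, hi = pending.popleft()
        if hi - lo > threshold:
            mid = (lo + hi) // 2
            pending.append((lo, mid))
            pending.append((mid, hi))
        else:
            leaves.append((lo, hi))

    # Overlap: data requests run in background threads, so the main
    # thread can compute on one subset while others are in flight.
    total = 0
    with ThreadPoolExecutor(max_workers=fetchers) as pool:
        futures = [pool.submit(fetch_subset, lo, hi) for lo, hi in leaves]
        for future in futures:
            total += compute(future.result())
    return total, len(leaves)


if __name__ == "__main__":
    result, num_tasks = run_tasks(len(DATASET))
    print(result, num_tasks)
```

In this sketch the combine step is a plain sum and the computation runs on a single thread; a real system would also spread the computation itself across the cores and machines of a cluster, which the sketch does not attempt.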

Impact

Many compute-intensive applications can be built on top of T-thinker for efficient parallel execution, such as community detection, subgraph matching, training decision trees, frequent pattern mining, facility location problems and matrix computations. A successful example is the graph mining system G-thinker, open-sourced at http://www.cs.uab.edu/yanda/gthinker/. T-thinker will greatly benefit researchers and practitioners who need compute-intensive tools for processing Big Data, which are currently lacking.

Other Research

Dr. Da Yan’s research interests include Big Data analytics systems, algorithms for processing Big Data, parallel/distributed computing, data mining and machine learning.

Researcher’s Background

Dr. Da Yan is currently a tenure-track Assistant Professor in the Department of Computer Science at the University of Alabama at Birmingham. He is the sole winner of the Hong Kong 2015 Young Scientist Award in Physical/Mathematical Science, and the recipient of the DASFAA 2011 Best Paper Award. He has developed a comprehensive platform of systems, collectively called BigGraph@CUHK, for data-intensive iterative big graph analytics. These systems are orders of magnitude faster than their competitors, and have been used by other researchers in work published in top venues such as SIGMOD, ICDE and IEEE Cluster. Dr. Yan regularly publishes in first-tier conferences and journals such as SIGMOD, PVLDB, SIGKDD, ICDE, WWW, TKDE, TPDS, SoCC and EuroSys. He was invited, as first author, to write two books, one in Foundations and Trends in Databases and one in Springer Briefs in Computer Science, as well as a book chapter in the Encyclopedia of Big Data Technologies. He also regularly serves as a reviewer for top journals including TODS, VLDBJ, TKDE, TPDS, WWWJ and TNSE, and serves on the program committees of top conferences such as SIGMOD 2019, PVLDB 2018, IJCAI 2017, ICPP 2018, ICA3PP 2017, IRI 2017-2018 and ICPADS 2016. Dr. Yan is the lead program co-chair of the BIOKDD 2018 workshop, held in conjunction with SIGKDD 2018, and serves on the PCs of a number of other workshops on database and data mining research. Dr. Yan’s research is sponsored by NSF, Microsoft Azure, and South Big Data Hub.

Links

http://www.cs.uab.edu/yanda
