Integrating Data, Integrating Science

CIS Colloquium, Oct 22, 2014, 11:00AM – 12:00PM, SERC 306

Zachary Ives, University of Pennsylvania

Data is the lifeblood of modern policy-making, decision-making, and scientific discovery. Hence it should be no surprise that “big data” analysis is in high demand. Yet in a tremendous number of settings, “big data” is more conceptual than real. The relevant data is actually spread across a plethora of Web pages, databases, spreadsheets, and PDFs, some of which may not even be publicly available, and many of which use different notation and make different baseline assumptions. How do we put all of this data together in order to reason about relationships and make decisions or discoveries? This requires both technical solutions and changes in culture. Science makes an excellent venue for studying both the technical and nontechnical aspects of large-scale collaborative data sharing. In this talk I will describe our research on the Q System, a scalable, incremental, community-driven platform for integrating data as needed, using techniques such as algorithmic matching, machine learning, and collaborative filtering. Then I will outline our efforts to bring these ideas to the neuroscience community, in the platform, and the lessons learned in incentivizing data sharing. I will finish by describing our vision for where community-scale data sharing is headed. Joint work with Allen Zhepeng Yan, Nan Zheng, Partha Talukdar, Cong Yu, Fernando Pereira, Sudipto Guha, Brian Litt, Joost Wagenaar, Greg Worrell, and Ben Brinkmann.

Dr. Zachary Ives is a Professor and the Markowitz Faculty Fellow at the University of Pennsylvania. His research interests include data integration and sharing, “big data”, sensor networks, bioinformatics, and data provenance and authoritativeness. He is a recipient of the NSF CAREER award, an alumnus of the DARPA Computer Science Study Panel and Information Science and Technology advisory panel, and a former Visiting Scientist at Google. He has been awarded the Christian R. and Mary F. Lindback Foundation Award for Distinguished Teaching, a Best Paper Runner-up for ICDE 2012, and a 2013 ICDE Influential Paper Award. He serves as the undergraduate curriculum chair for Penn’s Singh Program in Networked and Social Systems Engineering, and is a co-author of the textbook Principles of Data Integration. He is an Associate Editor for the VLDB Journal and the IEEE Transactions on Knowledge and Data Engineering, and is Program Co-Chair of ACM SIGMOD 2015.