Data Science & Analytics: What Is in It for iSchools?
Jian Qin, Syracuse University
Kevin Crowston, Syracuse University
The ability to manage, discover, and use data has become critical for organizations that want to maintain high productivity and effectiveness in a data-driven research and decision-making environment. Whether the data come from scientific research, business planning and transactions, or community social networks, demand is increasing for data scientists and analysts in research and development (R&D) settings ranging from academic institutions and government research centers to the corporate world. E-Science and Big Data are creating new jobs that require highly trained data professionals to manage data and to collaborate on and support data-driven R&D.
New academic programs and curricula in data science and data curation have been developed at UIUC, UNC, Rutgers, and Syracuse University to address this demand for data professionals. While these efforts clearly represent an emerging field of education and research, as well as a job market for our graduates, they are still largely exploratory, and much remains to be researched and developed. At this critical juncture, it is imperative that we provide a platform for the iSchool community to share experiences, successful or otherwise, and to brainstorm innovative ideas. We believe that such information sharing and brainstorming will give the community an opportunity to collectively define the role that iSchools may play and the contribution they may make, which could also be a branding opportunity for the iSchools.
Questions to be addressed include, but are not limited to: What is data science and analytics? What does it mean for iSchools? What can be learned from our explorations in data curation and data science research and education activities? Can iSchools make a significant contribution to, or better yet, lead the research and learning in this emerging domain? These are fundamental questions that we as a community need to address in order to develop strategies.
This workshop is designed as an interactive platform for information sharing and brainstorming. It includes a panel of 3-4 speakers sharing their curriculum development and pedagogy in data curation and management, an interactive brainstorming session to identify research areas that build on iSchools’ strengths, and a report session to summarize the results of the focus group discussions.
Half Day Workshop
Abstract: Data science refers to the constellation of tools and techniques for making sense of increasingly large volumes of data. Data scientists employ a grab bag of tools and techniques including programming, probability and statistical analysis, machine learning, high performance computing, databases, visualization tools, and the semantic web. Students in information schools appreciate the starting point of data science, namely that data are valuable. However, there is a profound gap between most students' (and faculty members') starting skill sets and the broad skill set expected of data scientists. One particular gap is in mathematical ability, especially the ability to create statistical and other models to analyze data. A second gap is in programming ability, which underlies the ability to use many of the more advanced analytic tools. For iSchools, making an impact in data science will certainly require new courses (and possibly new faculty, or at least some faculty retraining, to teach those courses). It may further require recruiting a different set of students.
9:15-10:30: Panel: Data science and analytics research and education in iSchools (Moderator: Kevin Crowston)
- Michael Lesk, Rutgers
- Catherine (Cathy) Blake, UIUC
- Javed Mostafa, UNC Chapel Hill
- Jian Qin, Syracuse University
Abstract: Evaluation of libraries has always been a difficult area; historically most users left no traces of whether they gained from any particular book or service. Web analytics permit digital libraries to measure and improve their offerings. They can show where users are located, how long they spend on each page, and which pages encourage them to stay on the library website. They can be part of a recommender system.
Students need to learn techniques: how to gather analytics data, how to pose questions, and how to visualize the answers. Perhaps more important, they need to understand the policy issues raised by analytics. Do questions impact user privacy? Is the goal of a library website to be "sticky" or does a quick departure imply an efficient interaction? Should users be treated differently depending on their location?
Analytics are widely used in commercial websites. Librarians should know how they work, whether or not the library goal is to imitate the commercial services.
Socio-technical Data Analytics (SoDA) Group
Catherine (Cathy) Blake, UIUC
Abstract: The information lifecycle can help frame our ongoing conversations surrounding big data, data science, e-science, data curation, and data analytics. Although the earlier data curation stages of the lifecycle are currently well represented in iSchools, the latter stages, in particular reuse and analysis activities, are less emphasized. Faculty in the Socio-technical Data Analytics (SoDA) group at the Graduate School of Library and Information Science (GSLIS) design, develop, and evaluate new technologies in order to better understand the dynamic interplay between information, people, and information systems. This presentation will provide examples from the SoDA group to demonstrate the interplay between research and teaching needs, and to show how the SoDA group's activities complement existing data curation efforts within the Center for Informatics Research in Science and Scholarship (CIRSS).
Javed Mostafa, UNC Chapel Hill
Abstract: While significant attention has recently been paid to the curation of scholarly data, the applications that make the data available and valuable, namely software for data digitization, preservation, and access services (which we collectively refer to as DPA software), have not received a similar level of consideration. It is our position that the teaching of data curation must be accompanied by the teaching of software and its complex dimensions. A basic understanding of software development is necessary but not sufficient. Along with that, students must also gain knowledge of how complex software is selected for local contexts (with consideration of local demands), adapted to match local platforms, and maintained over the long term. With the increasing relevance of DPA software, LIS professionals are now in greater demand to support software development and maintenance activities, particularly in large academic libraries.
Another Kind of Analysis in Data Science and Analytics
Jian Qin, Syracuse University
Abstract: Research workflows and data flows vary greatly between scientific disciplines and result in different requirements for data management and use, as well as for the tools that perform these tasks. The ability to communicate effectively with scientists, or researchers at large, to understand their workflows and data flows and to translate this knowledge into user and system requirements is one of the most important qualities in a data professional. Such communication is not simply an interview of scientists; rather, it is a "real-time" (in the brain) processing of information that requires knowledge and skills in data modeling and structures, metadata standards, and essential information technology and, most importantly, the ability to analyze and generalize the information that will be used to develop technical solutions. Future information professionals specializing in data science and analytics should build strong technical communication and analysis skills that will enable them to work effectively no matter how workflows and data flows change across disciplines.
10:30-11:00: Coffee break
11:00-12:00: Brainstorming (Moderator: Jeff Stanton): iSchools’ strengths in data science and analytics research and education, audience divided into focus groups
12:00-12:30: Reports from groups and discussion (Moderator: Jeff Stanton): Data science education and research—envisioning the near future
We expect that this workshop will produce the following outcomes:
- A collection of reports on current activities and practices in research and educational programs for data science and analytics, which will be made available on the iConference website and widely distributed; and
- A report summarizing the research areas identified from the interactive brainstorming session to be submitted to D-Lib Magazine for publication.