Position Statements

Bietz, Matthew
UC Irvine

Many universities offer an elective undergraduate class in "music appreciation" (and frequently a similar course in "art appreciation"). Music Appreciation classes recognize that while music is a significant part of our daily lives, most people have had little if any training in the history and theory of music. At a high level, the goal of Music Appreciation is to produce better-informed consumers of music by introducing basic concepts in music theory, situating different musical genres in their historical context, giving a glimpse of the breadth of musical possibilities, and giving students a language to describe and discuss what they are hearing. The Music Appreciation course provides a model for a similar class in Data Appreciation: a survey course targeted at undergraduates, with the goal of sensitizing them to basic concepts in information and data science.

 

The course would involve looking across different domains (scientific, engineering, humanities, etc.) to provide comparative understandings of how data is used and structured. The course would introduce both theoretical concerns (e.g. what is the truth claim in different kinds of data) and practical concerns (e.g. how to structure and apply metadata, or how to approach data quality). Through structured interactions with and around real data sets, students would get a chance to work through problems in data creation, storage, analysis, use, and reuse. Topics would also include the "data deluge," scientific controversies (e.g. Social Studies of Science approaches), and social aspects of data. While this course is not designed to produce professional data managers or highly trained analysts, by raising awareness of data science among users of data, iSchools can make a significant impact outside their own walls. One desirable outcome is a set of practitioners across many domains who are better able to articulate their data needs and recognize that there are professionals who can help meet them.

Crowston, Kevin
Syracuse University

Data science refers to the constellation of tools and techniques for making sense of increasingly large volumes of data. Data scientists employ a grab bag of tools and techniques, including programming, probability and statistical analysis, machine learning, high-performance computing, databases, visualization tools and the semantic web. Students in information schools appreciate the starting point of data science, namely that data are valuable. However, there is a profound gap between the starting skill sets of most students (and faculty) and the broad skill set expected of data scientists. One particular gap is in mathematical ability, especially the ability to create statistical and other models to analyze data. A second gap is in programming ability, which underlies the ability to use many of the more advanced analytic tools. For iSchools to make an impact in data science, new courses will certainly be required (and possibly new faculty, or at least some faculty retraining to teach those courses). It may further require recruiting a different set of students.

 

Curty, Renata
Syracuse University

Data science and analytics can be understood as a practically oriented branch of E-Science, which primarily deals with big datasets available through network environments. Attention to this matter has emerged because of the vast amount of R&D data available in scattered sources and different formats, and the increasing demand for the curation, management and analysis of scientific data outputs. Thus, iSchools foresee a new, industry- and market-driven opportunity to offer specific professionalization in this area. However, it is still not clear how data science content and curricula would differ from, or complement, the degrees and certificates currently offered by iSchools (e.g., Library and Information Science, Information Management). Without advocating for old labels, or stepping into the progressive librarianship discussion, the question remains: what expertise would "data science managers" be expected to acquire that would make them stand out in the job market?

 

D'Ignazio, John
Syracuse University

With their historic focus on technology, people, and the information-related artifacts that move between them, iSchools are uniquely situated to add perspective and value in the burgeoning area of data science and analytics. This is the case, first, because the research and education attributable to these schools, with contributions from library science, information retrieval, and management information systems, offer society meaningful insight even though the concept with which they largely concern themselves, information, is both uncertain and contested. Data also suffers from this uncertainty: Luciano Floridi's entry on the subject in the International Encyclopedia of the Social Sciences, 2nd Ed., describes at least four definitions, starting with the Latin translation "things given or granted." Historically, the field has been able, despite this ambiguity, to offer pragmatic development or theoretical insight that clarifies information system inputs, outputs, and transferred artifacts, so that they become increasingly valuable to human civilization as functioning resources. Blurring the distinctions among Floridi's definitions reveals another case for the iSchool's value-add in data science and analytics research and education. These definitions indicate that data operates at a lower level of function in society than the resources the field has previously been concerned with, which range from large-scale information technology implementations to collections of scientific publications. Data operates closer to the fundamental ways people and technology operate, but at a scale well beyond human lived experience. If this is the case, then iSchool expertise in understanding and developing systems of content structures, as well as in understanding the worth of such phenomena for individuals interacting in organizations and society, will be critical to advancing this complex and abstract domain.

 

Haslhofer, Bernhard
Cornell Information Science

I think there are several aspects of "data science" iSchools should care about. First, there is the "infrastructure" aspect, which is all about retrieving, extracting, and publishing data. The Web plays a major role in this, and scholars as well as data professionals should therefore know what the Web is and how it works, how data can be represented and exchanged, and how they can access and make use of the diverse range of Web APIs. Then there is the "analytics" aspect, which is highly domain-dependent. Besides teaching machine-learning methods, we should also consider novel methods, such as human computation, and teach how to apply all these methods to large data sets using cloud services. Finally, there is the "copyright and data policy" aspect. iSchools should teach about the various data licensing options we currently have and, in my opinion, emphasize and demonstrate the value of "open data". So far I have mainly taught the "infrastructure" aspect (e.g. http://bit.ly/info4302). I am interested in how this can fit into a broader "data science and analytics" curriculum and hope that I can find some answers to this question at this workshop.

 

Kitlas, Joshua
Syracuse University

Big data has brought big challenges and big opportunities. At Google's 2010 Atmosphere convention, CEO Eric Schmidt said, "There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." Data stores of this size require a new set of professionals with new sets of skills. Industry and academia have appointed Data Scientists as the caretakers of this new discipline. Data Scientists have the skill sets to adequately manage and make sense of large data sets and will serve as their stewards, curators, and investigators.

 

As a professional and academic field, Data Science remains largely unformed. In a commercial definition, Data Science enables the creation of data products. In an academic definition, it is thought of as a way to enlarge the major areas of technical work in the field of statistics. These definitions vary greatly, and the ambiguity deepens when one consults pseudo-business and pseudo-science resources. Data Science has been called a 'third pillar' of scientific inquiry, a return to 'what was once called data analysis', and even a transformed scientific method; the deeper you look, the broader the definition.

Data Scientists, and Data Science, are a moving target and lack a clear and appropriate definition.

 

http://kitlas.com

 

Lesk, Michael
Rutgers

Evaluation of libraries has always been a difficult area; historically, most users left no trace of whether they gained anything from a particular book or service. Web analytics permit digital libraries to measure and improve their offerings. They can show where users are located, how long they spend on each page, and which pages encourage them to stay on the library website. They can also be part of a recommender system.
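As a rough illustration of how one of these measures, time spent on each page, might be derived from raw page-view records, here is a minimal Python sketch. The log format (session ID, timestamp in seconds, page path) and the `time_on_page` function are hypothetical; real analytics packages apply their own, more elaborate session and timeout rules.

```python
from collections import defaultdict

# Hypothetical page-view record: (session_id, timestamp in seconds, page path).
PageView = tuple[str, int, str]


def time_on_page(views: list[PageView]) -> dict[str, list[int]]:
    """Estimate seconds spent on each page from consecutive views in a session.

    The last page of every session has no following view, so its duration
    cannot be measured from this log and is skipped.
    """
    # Group the views by session.
    sessions: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for session_id, ts, page in views:
        sessions[session_id].append((ts, page))

    durations: dict[str, list[int]] = defaultdict(list)
    for hits in sessions.values():
        hits.sort()  # order each session's views by timestamp
        # Time on a page = timestamp of the next view minus this view's timestamp.
        for (ts, page), (next_ts, _) in zip(hits, hits[1:]):
            durations[page].append(next_ts - ts)
    return dict(durations)


log = [
    ("s1", 0, "/home"),
    ("s1", 30, "/catalog"),
    ("s1", 150, "/item/42"),
    ("s2", 10, "/home"),
    ("s2", 20, "/catalog"),
]
print(time_on_page(log))  # {'/home': [30, 10], '/catalog': [120]}
```

The final page of each session is never timed, because nothing follows it in the log. A short duration is also ambiguous: it may reflect the policy question of whether a quick departure implies an efficient interaction rather than a failed one.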

 

Students need to learn techniques: how to gather analytics data, how to pose questions, and how to visualize the answers. Perhaps more important, they need to understand the policy issues raised by analytics. Do questions impact user privacy? Is the goal of a library website to be "sticky" or does a quick departure imply an efficient interaction? Should users be treated differently depending on their location?

 

Analytics are widely used on commercial websites. Librarians should know how they work, whether or not the library's goal is to imitate the commercial services.

 

Li, Ben
University of Oulu

Which successful scientific, business, state, or other enterprise does not embody a "data-driven research and decision-making environment"? To research managers and leaders who now rely (directly or through subordinates) on instant global search engines, the lack of accessible ICTs scaled up and out to meet ever-diversifying information sources understandably appears as a technological problem of "Big Data", rather than as a social problem of inadequate skills. But most of humanity's achievements occurred before silicon ICTs and modern universities were invented, led by inspired explorers with occasional but regular help from Big Data professionals called "statisticians" and "librarians", who combine broad views of information to meet immediate local objectives. In the last two decades, we've replaced those roles and skills with policy-based evidence and Google's generic information filters, and are somehow surprised that data discovery and analysis have become grinding.

 

Two paths are evident. If the problem is defined in terms of a lack of capable data workers, higher education must understand why the rest of the world seems to be getting along fine with its messy Big Data opportunities. For higher education to lead in Big Data, data skills must once again be a core competence of all graduates and valued within the professional food chain, rather than a technical skill for specialised data workers alone. And it must continue to produce statisticians and librarians, despite their comparative lack of direct impact on the research metrics employed by popular and other reporting. However, if we define Big Data to mean an as yet undiscovered branch of science, the most innovative strategy for higher education is simply to support multi-disciplinary people in failing often with big data problems until they discover some bold and useful approaches. This, of course, simply means enabling entrepreneurship and getting out of its way.

 

Liu, Qiyuan
UIUC

I am Qiyuan Liu, a master's student in GSLIS at UIUC. I am interested in text mining and knowledge discovery. We are all living in an information-driven world, and information overload strongly affects how efficiently we can use huge amounts of information. In the field of library and information science, we need not only to focus on organizing information/data and making it convenient to retrieve, but also to analyze data in order to reveal hidden, valuable patterns. For example, we can map the knowledge domains of a research field by analyzing its publications, and the resulting visualizations can be more informative and easier to understand. This is the remarkable benefit of integrating technology into data analytics. I have had the opportunity to create software named SATI to produce preprocessed intermediate data, such as similarity matrices of elements in surrogate records from Chinese full-text periodical databases. This is the first time I have attended an international academic conference, and I am eager to learn more about data science and analytics at this workshop.

 

http://sati.liuqiyuan.com

 

Mostafa, Javed
UNC Chapel Hill

While in recent times there has been significant attention paid to curation of scholarly data, the applications that make the data available and valuable, namely software for data digitization, preservation, and access services (which we collectively refer to as DPA software), have not received a similar level of consideration. It is our position that teaching of data curation must be accompanied by teaching of software and its complex dimensions. A basic understanding of software development is necessary but not sufficient. Along with it, students must also gain knowledge of how complex software is selected for local contexts (with consideration of local demands), adapted to match local platforms, and maintained over the long term. With the increasing relevance of DPA software, LIS professionals are now in greater demand to support software development and maintenance activities, particularly in large academic libraries.

 

Nichols, David
University of Waikato

In human-computer interaction we often focus on 'breakdowns' between users and systems to illustrate issues of user-centred design. Similarly, we can learn a lot about data practices from the 'failures' of the scholarly communication process. Paper retractions are a valuable source of information about several types of 'failure'. The Retraction Watch blog has several relevant categories, including 'faked data', 'image manipulation' and 'not reproducible'. Several of the retractions highlighted in such places would make interesting case studies for education in data curation, as they cover both the initial publication and the subsequent use of data-rich publications.

http://retractionwatch.wordpress.com/

Palmer, Carole
University of Illinois at Urbana-Champaign

Our field will need to move quickly and effectively in developing and promoting our data science programs in this fast-paced and competitive area. As noted in a recent blog post on EMC’s new survey on data scientists in the corporate sector, “big data analytics is quickly becoming the next competitive ante” and data scientists are “the new rock star.”* Information schools are making good progress in research and education for the emerging data profession, but it will take coordinated effort and serious advances in our base of expertise to establish iSchools as credible and competitive contributors, let alone leaders. At Illinois, we began our data curation specialization in 2006 with a focus on research data in the sciences, in response to demands in academic and government sectors. We extended the curriculum to the humanities in 2008. Now we are working to provide a more comprehensive data science program with a formal data analytics component. As a field, we need to better understand the demand in the corporate sector, but also how best to craft strong, distinct programs that are seen as a valuable alternative to what will be offered by computer science departments, domain-based informatics programs, and business schools.

 

* Hollis, Chuck. Understanding The New Rock Star: The EMC Data Science Survey. http://chucksblog.emc.com/chucks_blog/2011/12/understanding-the-new-rock-star-the-emc-data-science-survey.html.

 

Qin, Jian
Syracuse University

Research workflows and data flows vary greatly between science disciplines and result in different requirements for data management and use, as well as for the tools that perform these tasks. The ability to communicate effectively with scientists, or researchers at large, to understand their workflows and data flows and translate this knowledge into user/system requirements is one of the most important qualities in a data professional. Such communication is not simply an interview of scientists; rather, it is “real-time” (in the brain) processing of information that requires knowledge and skills in data modeling and structures, metadata standards, and essential information technology, and, most importantly, the ability to analyze and generalize the information that will be used to develop technical solutions.

 

Future information professionals specializing in data science and analytics should build strong technical communication and analysis skills that will enable them to work effectively no matter how the workflows and data flows change in different disciplines.

 

Rodenburg, Dirk
University of Toronto

The iSchool's most valuable contribution to the areas of Data Science and Analytics should be as a nexus for the interdisciplinary analysis and synthesis of the interactions arising from the intersection of information, human beings, culture and potentiality. That is both its great strength and its great weakness. Interdisciplinary pursuits demand a great deal from their participants: they must be able to operate both broadly and deeply without sacrificing rigour, insight and relevance, all of which are, I'll suggest, continuously under siege within a technocracy. They must also be capable of continuous adaptation without the devolution or dilution of core principles or commitments.

 

The iSchool has always, it seems to me, struggled with the tension between its role as a professional training program and its role as a scholarly discipline. Although I'm in no position to judge to what degree the iSchool can offer a program in data management and curation as a professional school (analogous to MIS), my gut sense is that it cannot operate as its own technical discipline per se unless it fully commits to a highly specialized curriculum with a degree in a technical discipline (CS, engineering, math) as a prerequisite for entry. I might argue that even if it does, the value of that program may simply erode as other, more technically rooted programs tackle (perhaps more effectively) the same issues of data structure, query and mining, curation and human / machine interaction. If it does choose this route, however, both the curriculum and the institutions providing it will have to be of the highest technical caliber, and the iSchool will have to be in a position to offer, and stand behind, an advanced technical / specialized degree (equivalent to a CMA, for example).

 

On the other hand, iSchools can offer interdisciplinary programs that explore, trace, tease out and challenge the points of intersection between information, interpretation, utility, veracity, ethics and knowledge, and how these are shaped by human capacities, culture, historical forces and media. In terms of data analytics, I could see concomitant specialized graduate programs being developed in privacy; information and data processing transparency, ethics and standards; the impact of man / machine integration (a critical component of human potentiality over the next 25 to 50 years); cognitive science, data synthesis and interpretation (including causality, predisposition, limits and biases, cultural interactions); data, media and culture; data, interpretation and social justice; data, government and policy.

 

The death of the iSchool will reside in a half-hearted commitment to any direction. In a technocratic world in which narrowly focused, specialized skills will be highly valued, we can live within those constraints, choose to point them out, or develop the most intellectually rigorous, widely read and courageous graduates of any faculty.

 

Stanton, Jeff
Syracuse University

Some web exploration of contemporary developments in data science education suggests two notable trends: 1) a rise in interest in do-it-yourself, self-guided learning, and 2) an emerging focus on gaining skills that were traditionally taught in computer science programs (e.g., writing code in JavaScript or Ruby). Both of these trends are good news for iSchools. Generally speaking, there seems to be a somewhat uneasy relationship between iSchools and computer science departments, with each eyeing the other territorially, wondering whether it is a threat or an opportunity. These two new trends bode well for defusing the tension between iSchools and computer science departments: many learners in the data science space may be able to obtain basic competence in scripting or programming through self-guided learning rather than classroom learning.

 

Tennis, Joseph T.
University of Washington

My understanding at this point is that data science and analytics (big data) is concerned with making sense of data that has already been amassed, in both structured and unstructured formats. This requires two things: 1) that we understand how we categorize what we see, and 2) that we are able to tell a reliable story using the data. In the former, we are concerned with understanding the context, content, and structure of the data (traditionally concerns for metadata specialists), and in the latter we’re concerned with the provenance, authenticity, and trustworthiness of the data. It is possible for us to make mistakes about categories and provenance. Thus, it seems to me, one interesting question with regard to data science and analytics in iSchools is the relationship between systems and algorithmic approaches and the more “theoretical” disciplines of metadata, knowledge organization, archival science and even diplomatics. It seems to me that we’re managing digital records when we deal with data science, so it is perhaps up to iSchools to posit the holistic methodology for its future use and reliability.