ICPSR: A Consortial Model to Advance and Expand Social and Behavioral Research

The Inter-university Consortium for Political and Social Research (ICPSR) was founded over 50 years ago "to further the development of research in political science" (Miller 1963:11). Since that summer of 1962, the scope and range of services ICPSR offers have expanded significantly to encompass the wider social and behavioral research community, and its holdings have grown from a handful of data collections to thousands. With over 750 consortial members from around the world, ICPSR is now a leader in preserving, curating, and providing access to scientific data so others can reuse the data and validate research findings. Much of the success of ICPSR can be traced back to the consortial model upon which the organization was founded, with members providing funding, input, and a sense of community. This article describes the history and current status of the consortium and discusses upcoming challenges and opportunities.

1 Origins of the Consortium

The brainchild of renowned political scientist Warren Miller, the "Inter-university Consortium for Political Research" 1 was organized to archive and distribute the growing data collections generated in machine-readable formats, including the 1962 Congressional elections. At that time, community sharing of research data was not embraced, even as new computing infrastructure and data sharing technologies were being developed. According to one of the original members of the ICPSR staff, "the concept of giving access to all interested scholars to one’s basic (micro) data was so foreign as to be considered ‘revolutionary’. Miller even likened it, in retrospect, to a violation of basic economic precepts: data were the scientist’s capital, and ‘they weren’t about to share their capital’." Despite the scholarly community’s hesitation, Miller was successful in organizing a consortium of partner universities to support the archiving and dissemination of data.
At the organization’s inception, 21 universities "joined with each other and with the Survey Research Center of the University of Michigan to further the development of research in political science" (ICPSR 1963). Members initially came from the United States, each paying $2,500 in annual membership fees (Austin 2007), with international members joining in the years immediately following.

1 "Social" was not added to the name until 1975. ICPR changed its name "to reflect the growing interest of sociologists in the affairs of the enterprise" (Austin 2007).

Each member university was represented by "one person [Official Representative] chosen by each participating unit" (ICPSR 1963). Official Representatives played an important role by serving as liaisons between their institutions and ICPSR; for the first several decades, data tapes and hard copy codebooks were requested by, mailed to, and stored locally by Official Representatives. The transfer of data was no small task. In 1965-66, ICPSR transferred 3,555,600 card images; transfers expanded dramatically over time, with 103,443,394 card images in 1975-76 and 3,741,396,924 card images in 1987-88 (Blalock et al. 1989:20).

The first 'Committee of Official Representatives' "affirmed the interest of the participating schools in four general objectives: (1) the development of data resources; (2) the establishment of a formal training program for graduate students and faculty; (3) the stimulation and facilitation of new research; and (4) the operation of a clearing house for the improved communication of information about ongoing research" (ICPSR 1963). In essence, "the search, identification, and recording phases that had been basic for all research scholars in the social sciences to this point were to become a community exercise" (Blalock et al. 1989:42). The Official Representatives also elected an executive Council of five members "with authority to act on behalf of the member institutions" (ICPSR 1962). Early Council members assisted with recruitment of new ICPSR members, represented a diversity of fields within political science, and "served as chairs of advisory committees that not only provided lists of high quality data that needed to be archived but, equally important, they were well enough known within their fields that they could play more informal roles in inducing Principal Investigators to turn their data over to the ICPR" (Blalock et al. 1989:34).

Council also engaged in diplomacy to calm fears that centralization of data resources would give one institution, in this case Michigan, "undue advantage (...) and deprive scholars seeking to build data libraries at their own institutions of 'trading stamps' which could be used to obtain interesting data sets developed elsewhere" (Blalock et al. 1989:47). Just as individual researchers were territorial about the data they worked so hard to collect and analyze, so too did organizations fight for data resources. Eventually, more and more organizations recognized the value of the consortial model of collaboration. Researchers would get access to data "too expensive to generate locally" and ICPSR, in addition to the financial buoy of membership dues, would be able to mobilize "the research community in support of major grant applications" (Blalock et al. 1989:41). As Warren Miller wrote in 1966 to one early data requester, "in addition to the principle (...) that communal resources should be made as widely available as possible, the Consortium also rests on the principle of shared institutional support for the generation of such resources" (Miller 1966).

Early ICPSR funding came from a mix of membership dues and grant funding, and "grew both on the basis of shrewd and imaginative planning and unanticipated fortuitous circumstances" (Blalock et al. 1989:46). Funding sources included the National Science Foundation, the National Endowment for the Humanities, the Social Science Research Council, the Stern Family Fund, International Business Machines, the Mathematical Social Science Board, and the Ford Foundation. ICPSR's growing Summer Program for training in quantitative methodology, established to train and connect with the upcoming generations of social scientists, was another revenue source. This diversified funding was critical to ICPSR's financial viability given later economic ebbs and flows caused by government cutbacks and fiscal stringency at member institutions.
Collection development at ICPSR originally balanced collecting significant existing survey data with aggregating "data bearing on political and social phenomena in historical depth" (Austin 2007). Regarding criteria for acquired data, Council meeting minutes from October 1962 specified:

CC-BY
Acquired data should also have some obvious value either (1) as being sufficiently rich conceptually to bear heavy secondary analysis; (2) as training materials; and (3) as referring to sufficiently significant units of analysis (e.g., nation rather than some single country or community) that the material would provide benchmarks for later longitudinal study of change. Finally, it was recognized that these criteria might be relaxed somewhat in the case of data which were likely to disappear entirely if not stored. In sum, the Council advocated a selective program of domestic data acquisition following the above criteria. (ICPSR 1963)

Acquired data were typically processed, or 'cleaned', to ensure that internal quality control standards were met, which helped improve usability and long-term preservation of these assets for member institutions. From an early report of data processing activities (1962-63): "Cleaning has involved a variety of operations. Every column in each deck is checked for double punches, blank columns, and punches that do not correspond with code categories (...). A major part of the cleaning operation involves contingency checks (...). We feel that the potentially wide use of collected data, made possible by the Consortium, necessitates reducing ambiguities and inconsistencies insofar as possible" (ICPSR 1963).

2 The Consortium Today

Fifty-two years since its inception, ICPSR as an organization is more robust than ever, and continues to draw on many of the founding decisions and principles established by Warren Miller, the Council, and the Committee of Official Representatives.
Consortium membership now totals 752 institutions around the globe. 2 Growth of the organization, in terms of revenues, number of member institutions, and staff size, is outlined in Table 1. 13

These topical archives in essence collaborate with the membership to create a shared infrastructure and a whole that is larger than the sum of its parts.

Governance
ICPSR's governing Council is now composed of 12 persons (instead of 5) elected by the members to serve staggered four-year terms 14, and meets three times each year 15. The Council continues to govern ICPSR and to provide an advisory role on policies and future directions. 16

ICPSR also links to over 64,000 data-related citations 18, "to encourage the replication of scientific results, to improve research standards, and to give proper credit to data producers" 19. ICPSR's systems and infrastructure are based on community and international standards, such as the Reference Model for an Open Archival Information System (OAIS) 20 and the Trustworthy Repositories Audit and Certification (TRAC) 21. ICPSR also takes the lead in developing data standards, including the Data Documentation Initiative (DDI), an XML-based metadata specification for describing the entire data lifecycle. 22 Over forty institutions from around the world belong to the DDI Alliance, which develops the DDI specification. ICPSR is the lead partner in the Data Preservation Alliance for the Social Sciences (Data-PASS), "a voluntary partnership of organizations created to archive, catalog and preserve data used for social science research" 23.
ICPSR's data processing and curation standards have been refined over the years and published as a community-embraced document, "Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle" (ICPSR 2012). Curation ensures that data collections are organized, described, cleaned, enhanced, and preserved for public use, and is a hallmark of ICPSR data collections. The "ICPSR Pipeline Process" (see Figure 1) "(...) starts with a confidentiality review, designed to make sure that there are no direct identifiers (e.g., name, social security number, telephone number, etc.) in the data file. It continues with a check to make sure that the data match the documentation. If inconsistencies, errors, or confidentiality risks are found, the data processor works with the original researcher to correct the issue. The process concludes with the construction of a final, permanent version of the data, alternative versions (such as in SAS, Stata, SPSS, or tab-delimited files), and final documentation. The final processing steps can also include the creation of versions for online analysis" (Albright and Lyle 2010:19-20).
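The first two pipeline steps quoted above, the confidentiality review and the check that the data match the documentation, can be illustrated with a minimal Python sketch. It is not ICPSR's actual tooling: the function name `review_study`, the `ReviewReport` structure, and the list of identifier column names are all hypothetical, and a real review would involve far more than matching column names. The sketch only shows the kind of automated checks such a process might begin with, including the check for values that "do not correspond with code categories" mentioned in the 1962-63 processing report.

```python
from dataclasses import dataclass, field

# Hypothetical column names treated as direct identifiers (an assumption for
# illustration; a real confidentiality review is not limited to name matching).
DIRECT_IDENTIFIERS = {"name", "ssn", "social_security_number", "telephone", "phone"}


@dataclass
class ReviewReport:
    """Collects the problems a data processor would raise with the depositor."""
    identifier_columns: list = field(default_factory=list)
    undocumented_columns: list = field(default_factory=list)
    wild_codes: dict = field(default_factory=dict)

    @property
    def clean(self):
        # True only when none of the three checks found a problem.
        return not (self.identifier_columns
                    or self.undocumented_columns
                    or self.wild_codes)


def review_study(rows, codebook):
    """Check a data file against its codebook.

    rows:     list of dicts, one per case (variable name -> value)
    codebook: dict mapping each documented variable to its set of valid codes
    """
    report = ReviewReport()
    columns = sorted({col for row in rows for col in row})
    for col in columns:
        # Step 1: confidentiality review, flag direct identifiers.
        if col.lower() in DIRECT_IDENTIFIERS:
            report.identifier_columns.append(col)
            continue
        # Step 2a: every variable in the data should be documented.
        if col not in codebook:
            report.undocumented_columns.append(col)
            continue
        # Step 2b: every value should correspond to a documented code category.
        bad = sorted({row[col] for row in rows
                      if col in row and row[col] not in codebook[col]},
                     key=repr)
        if bad:
            report.wild_codes[col] = bad
    return report


if __name__ == "__main__":
    rows = [
        {"name": "A. Voter", "vote": 1, "party": 2},
        {"name": "B. Voter", "vote": 9, "party": 1},
    ]
    codebook = {"vote": {0, 1}, "party": {1, 2, 3}}
    print(review_study(rows, codebook))
```

In this sketch the report is returned rather than acted upon, mirroring the quoted process in which the data processor works with the original researcher to resolve any issue instead of silently altering the data.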

Funding Models
The first area of interest for ICPSR is the issue of sustainable funding to pay for sharing, archiving, and preservation of research data. These activities carry real costs. While government mandates increase attention and help generate buzz, rarely do they provide additional financial assistance. Unfunded mandates can stretch already thin resources to the breaking point. Who will pay? Will they pay? How do they pay? The researchers, who must balance the present and pressing costs of research and analysis with future requests for storage and access? What if an investigator nears the end of her grant funding and desires to eke out a little more analysis at the expense of archiving and preservation? Will archives ultimately shoulder the financial burden? Many archives, including ICPSR, are partially reliant on grants and contracts that run on three- or five-year cycles. "Long-term access to data requires durable institutions that plan on a scale of decades and even generations" (Lyle, Alter, and Vardigan 2013), yet long-term planning is extremely challenging with short-term funding cycles.
To help address these issues, ICPSR, with funding from the Alfred P. Sloan Foundation, convened a fall 2013 meeting of 22 domain repositories spanning the social and natural sciences. As an outcome of the meeting, a white paper was drafted, which included a call for change: "Domain repositories must be funded as the essential piece of the U.S. research infrastructure that they are" (Ember and Hanisch 2013:3). Five principles were proposed to encourage sustainable data repositories: 1) Agencies need to support archiving and sharing with funding to assure the long-term preservation and viability of research data; 2) Cooperation among funding agencies, universities, domain repositories, journals and other stakeholders is essential; 3) Agencies must support trained professionals, organizations with the capacity to persist over time, and community standards for metadata and preservation; 4) Review criteria should be established to evaluate that data repositories are consistent with their mission as institutions entrusted with the long-term preservation of scientific memory; 5) Agencies need to incentivize investigators to archive their data (Ember and Hanisch 2013:11-12).

ICPSR continues to diversify funding and explore new funding models. In early 2014, ICPSR added a new topical archive, the National Archive of Data on Arts & Culture (NADAC), sponsored by the National Endowment for the Arts. 24 Membership funding continues to grow. Additional products and services are being offered, including a fee-based secure virtual data environment for accessing confidential data. 25 It is important to emphasize that ICPSR generates existing and new revenue to provide long-term, trusted access to scientific data, and not to generate profits.

Curation
Although recent research has shown that the number of data collections formally archived is very low (Pienta, Alter, and Lyle 2010), with many data collections lost (Vines et al. 2014), expectations are growing that more and more data will be shared, stored, and even preserved. This is a win-win for everyone: funders, researchers, and users. Secondary data use enables verification and replication of results, encourages new lines of research, discourages scientific misconduct, saves time, reduces costs, and maximizes access (Niu 2006). Not all data are alike, however. Many data collections, while publicly available, are in inaccessible formats, poorly described, and full of errors. Unusable data equal lost data (Peer, Green, and Stephenson 2014). To truly maximize access to data, one needs to curate, or enhance, them. Curation is the work taken to make data usable, complete, and self-explanatory. 26

ICPSR has curated data since its founding and continues to intensively curate many of its data collections. This careful attention to detail is one primary differentiator of ICPSR's product from other data providers 27, especially as the number of data repositories is growing at what seems like an exponential rate. 28

ICPSR recognizes the opportunity to promote curation to the community, and does so through publications (ICPSR 2012), Web sites 29, training 30 and advocacy. In response to the 2013 Office of Science and Technology Policy (OSTP) memorandum directing federal agencies to develop plans to support increased public access to data, ICPSR issued a formal public comment 31 encouraging federal agencies to, essentially, curate their funded data, by making data:

1. Discoverable with standardized, machine-actionable metadata.
2. Meaningful and usable through enhancement and cleaning.
3. Persistent, through proactive digital preservation.
4. Trustworthy, by encouraging deposits in community certified repositories.
5. Confidential (when applicable).
6. Citable, through adding basic references.

Open Access
Funders of research are increasingly requiring open access to the data they support. By one popular definition, open access data should be "available as a whole and at no more than a reasonable reproduction cost, (...) provided under terms that permit reuse and redistribution", and "everyone must be able to use, reuse and redistribute" 32.
ICPSR strongly supports maximizing access to data. In fact, many of ICPSR's data collections, including the topical archives, are available at no cost to users. The original ICPSR Memorandum of Organization states "In general, ICPSR data resources and technical services are available on an 'open access' basis" (Blalock et al. 1989:Appendix C).

26 The ISO Reference Model for an Open Archival Information System (OAIS), although not specifically referring to curation, captures the essence of curation when describing the role of an Archive: "The OAIS shall (...) Ensure that the information to be preserved is Independently Understandable to the Designated Community. In particular, the Designated Community should be able to understand the information without needing special resources such as the assistance of the experts who produced the information." (CCSDS 2012:3-1) (emphasis added)

27 ICPSR's users have come to expect the curation, as expressed in this June 2, 2014 tweet: "Horribly documented ICPSR datasets are among the few things that will incite an academic to lash out violently". https://twitter.com/jcpeterson13/status/473546934985052160 (as of: 06/26/2014).

Still, the Memorandum of Organization is explicit that ICPSR services, data or facilities funded by the consortium are provided "only in a manner that imposes no handicap on ICPSR members. In providing services, data or facilities, non-members will defray all associated costs plus additional charges required to maintain equity with members" (Blalock et al. 1989:Appendix C). This exposes the primary limitations of the consortium funding model: "Membership models do not provide public access to data, and they favor researchers at institutions with more resources" (Ember and Hanisch 2013:11).

In response to these requirements and calls for public access to federally funded research data, ICPSR recently launched openICPSR 33. "Currently in beta version, openICPSR is a research data-sharing service that allows depositors to rapidly self-publish research data, enabling the public to access the data without charge - or in the case of restricted-use data, for nominal charge" (Lyle 2014:55). "The openICPSR service is an alternative to the standard ICPSR deposit process, providing immediate, self-controlled distribution of data and metadata, albeit without the rich curation and preservation features; data in openICPSR are distributed and preserved as-is" (Lyle 2014:55). Data in openICPSR are licensed under a Creative Commons Attribution 4.0 International License, which permits use, reuse, and redistribution for any purpose, even commercial. Although openICPSR data are distributed as-is, they can still be selected for curation, either at the request of the depositor or upon selection by ICPSR staff.
The membership model is successful and has sustained ICPSR over fifty years. New models, including providing open access, are part of the organization's constant evolution to address the needs of funders, users, and members.

Conclusion
ICPSR, with over 750 consortial members from around the world, is a leader in data archiving. Much of the success of ICPSR can be traced back to the consortial model upon which the organization was founded, with members providing funding, input, and a sense of community. Standards and archival processes developed in the earlier years, such as intensive data curation, continue to define the ICPSR brand and enhance the digital assets available to the research community. Although challenges arise, and will continue to arise, ICPSR continues to derive opportunities from the challenges.

Figure 1: The ICPSR Pipeline Process: How ICPSR Acquires, Archives, and Disseminates a Typical Study

Figure 2: Screenshot of the openICPSR home page

Table 1: Growth of ICPSR

Membership revenues continue to form a stable, and ever growing, base for funding the consortium's activities. So, too, does ICPSR's Summer Program in Quantitative Methods of Social Research, which has grown from approximately 60 graduate student and faculty participants in 1963 to more than 1,100 participants in "over 90 beginning and advanced courses at our primary location in Ann Arbor and several 'off-site' institutions" 4 in 2014.

2.1 Diversified Funding

Topical archive funding 5, as well as grants and contracts 6, now comprise the majority of ICPSR's revenue and contribute to the further diversification of resources.

Topical archives are organized around specific topics or themes and typically are funded by agencies in the federal government and private foundations seeking to distribute data. The National Archive of Computerized Data on Aging (NACDA) 7 was the first topical archive established at ICPSR, in 1976. The National Archive of Criminal Justice Data (NACJD) 8 followed in 1978. NACDA is now funded by the National Institute on Aging; NACJD is funded by the Bureau of Justice Statistics, the National Institute of Justice, and the Office of Juvenile Justice and Delinquency Preven-