027.7 Zeitschrift für Bibliothekskultur 2,1 (2014): Konsortien & Konsorten, S. 19-29.

DOI: 10.12685/027.7-2-1-49

ISSN: 2296-0597

ICPSR: A Consortial Model to Advance and Expand Social and Behavioral Research

Jared Lyle


The Inter-university Consortium for Political and Social Research (ICPSR) was founded over 50 years ago "to further the development of research in political science" (Miller 1963:11). Since that summer of 1962, the scope and range of services ICPSR offers have expanded significantly to encompass the wider social and behavioral research community, and its holdings have grown from a handful of data collections to thousands. With over 750 consortial members from around the world, ICPSR is now a leader in preserving, curating, and providing access to scientific data so others can reuse the data and validate research findings. Much of the success of ICPSR can be traced back to the consortial model upon which the organization was founded, with members providing funding, input, and a sense of community. This article describes the history and current status of the consortium, and discusses upcoming challenges and opportunities.

1. Origins of the Consortium

The brainchild of renowned political scientist Warren Miller, the "Inter-university Consortium for Political Research" [1] was organized to archive and distribute the growing data collections generated in machine-readable formats, including the 1962 Congressional elections. At that time, community sharing of research data was not embraced, even as new computing infrastructure and data sharing technologies were being developed. According to one of the original members of the ICPSR staff, "the concept of giving access to all interested scholars to one’s basic (micro) data was so foreign as to be considered ’revolutionary’. Miller even likened it, in retrospect, to a violation of basic economic precepts: data were the scientist’s capital, and ’they weren’t about to share their capital’."

Despite the scholarly community’s hesitation, Miller was successful in organizing a consortium of partner universities to support the archiving and dissemination of data. At the organization’s inception, 21 universities "joined with each other and with the Survey Research Center of the University of Michigan to further the development of research in political science" (ICPSR 1963). Members initially came from the United States, each paying $2,500 in annual membership fees (Austin 2007), with international members joining in the years immediately following.

Each member university was represented by "one person [Official Representative] chosen by each participating unit" (ICPSR 1963). Official Representatives played an important role by serving as liaisons between their institutions and ICPSR; for the first several decades, data tapes and hard copy codebooks were requested by, mailed to, and stored locally by Official Representatives. The transfer of data was no small task. In 1965-66, ICPSR transferred 3,555,600 card images; transfers expanded dramatically over time, with 103,443,394 card images in 1975-76 and 3,741,396,924 card images in 1987-88. (Blalock et al. 1989:20)

The first ’Committee of Official Representatives’ "affirmed the interest of the participating schools in four general objectives: (1) the development of data resources; (2) the establishment of a formal training program for graduate students and faculty; (3) the stimulation and facilitation of new research; and (4) the operation of a clearing house for the improved communication of information about ongoing research" (ICPSR 1963). In essence, "the search, identification, and recording phases that had been basic for all research scholars in the social sciences to this point were to become a community exercise" (Blalock et al. 1989:42). The Official Representatives also elected an executive Council of five members "with authority to act on behalf of the member institutions." (ICPSR 1962) Early Council members assisted with recruitment of new ICPSR members, represented a diversity of fields within political science, and "served as chairs of advisory committees that not only provided lists of high quality data that needed to be archived but, equally important, they were well enough known within their fields that they could play more informal roles in inducing Principal Investigators to turn their data over to the ICPR." (Blalock et al. 1989:34)

Council also engaged in diplomacy to calm fears that centralization of data resources would give one institution – in this case, Michigan – "undue advantage (...) and deprive scholars seeking to build data libraries at their own institutions of ’trading stamps’ which could be used to obtain interesting data sets developed elsewhere." (Blalock et al. 1989:47) Just as individual researchers were territorial about the data they worked so hard to collect and analyze, so too did organizations fight for data resources. Eventually, more and more organizations recognized the value of the consortial model of collaboration. Researchers would get access to data "too expensive to generate locally" and ICPSR, in addition to the financial buoy of membership dues, would be able to mobilize "the research community in support of major grant applications." (Blalock et al. 1989:41) As Warren Miller wrote in 1966 to one early data requester, "in addition to the principle (...) that communal resources should be made as widely available as possible, the Consortium also rests on the principle of shared institutional support for the generation of such resources." (Miller 1966)

Early ICPSR funding came from a mix of membership dues and grant funding, and "grew both on the basis of shrewd and imaginative planning and unanticipated fortuitous circumstances." (Blalock et al. 1989:46) Funding sources included the National Science Foundation, the National Endowment for the Humanities, the Social Science Research Council, the Stern Family Fund, International Business Machines, the Mathematical Social Science Board, and the Ford Foundation. ICPSR’s growing Summer Program for training in quantitative methodology, established to train and connect with the upcoming generations of social scientists, was another revenue source. This diversified funding was critical to ICPSR’s financial viability given later economic ebbs and flows caused by government cutbacks and fiscal stringency at member institutions.

Collection development at ICPSR originally balanced collecting significant existing survey data with aggregating "data bearing on political and social phenomena in historical depth." (Austin 2007) Regarding criteria for acquired data, Council meeting minutes from October 1962 specified:

Acquired data should also have some obvious value either (1) as being sufficiently rich conceptually to bear heavy secondary analysis; (2) as training materials; and (3) as referring to sufficiently significant units of analysis (e.g., nation rather than some single country or community) that the material would provide benchmarks for later longitudinal study of change. Finally, it was recognized that these criteria might be relaxed somewhat in the case of data which were likely to disappear entirely if not stored. In sum, the Council advocated a selective program of domestic data acquisition following the above criteria. (ICPSR 1963)

Acquired data were typically processed, or ’cleaned’, to ensure that internal quality control standards were met, which helped improve usability and long-term preservation of these assets for member institutions. From an early report of data processing activities (1962-63):

"Cleaning has involved a variety of operations. Every column in each deck is checked for double punches, blank columns, and punches that do not correspond with code categories (...). A major part of the cleaning operation involves contingency checks (...). We feel that the potentially wide use of collected data, made possible by the Consortium, necessitates reducing ambiguities and inconsistencies insofar as possible." (ICPSR 1963)
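The checks named in this report can be illustrated with a minimal modern sketch. It is not ICPSR's historical procedure, which ran against punched-card decks; it only assumes that each column of a record is held as a string of the punches in that column, and the codebook, variable names, and contingency rule below are all hypothetical.

```python
# Hypothetical codebook: column name -> set of valid single-punch codes.
CODEBOOK = {
    "voted": {"1", "2"},             # 1 = yes, 2 = no
    "vote_choice": {"0", "1", "2"},  # 0 = inapplicable
}


def check_record(record, codebook):
    """Return (column, problem) pairs for blank columns, double punches,
    and punches that do not correspond with code categories."""
    problems = []
    for column, valid_codes in codebook.items():
        punches = record.get(column, "")
        if punches == "":
            problems.append((column, "blank"))
        elif len(punches) > 1:
            problems.append((column, "double punch"))
        elif punches not in valid_codes:
            problems.append((column, "wild code"))
    return problems


def check_contingency(record):
    """Hypothetical contingency rule: a respondent coded as not having
    voted must be coded 'inapplicable' on vote_choice."""
    if record.get("voted") == "2" and record.get("vote_choice") != "0":
        return [("vote_choice", "inconsistent with voted")]
    return []


def clean_report(records, codebook):
    """Map the index of each problem record to its list of problems."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record, codebook) + check_contingency(record)
        if problems:
            report[index] = problems
    return report


# Illustrative records only; these are not real survey data.
SAMPLE = [
    {"voted": "1", "vote_choice": "2"},  # clean
    {"voted": "2", "vote_choice": "1"},  # fails the contingency rule
    {"voted": "12", "vote_choice": ""},  # double punch and blank column
    {"voted": "1", "vote_choice": "7"},  # punch outside the code categories
]
```

Calling clean_report(SAMPLE, CODEBOOK) returns only the records that need attention, which mirrors the report's aim of reducing ambiguities and inconsistencies before data are distributed.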

2. The Consortium Today

Fifty-two years since its inception, ICPSR as an organization is more robust than ever, and continues to draw on many of the founding decisions and principles established by Warren Miller, the Council, and the Committee of Official Representatives.

Consortium membership now totals 752 institutions around the globe. [2] Growth of the organization, in terms of revenues, number of member institutions, and staff size, is outlined in Table 1 [3].

Year      Revenues      Member Institutions   Staff
1962-63   $64,300       25
1972-73   $900,300      148                   46
1982-83   $2,044,061    270                   62
2002-03   $9,883,979    373                   112
2012-13   $18,826,424   740                   115

Table 1: Growth of ICPSR

Membership revenues continue to form a stable, and ever growing, base for funding the consortium’s activities. So, too, does ICPSR’s Summer Program in Quantitative Methods of Social Research, which has grown from approximately 60 graduate student and faculty participants in 1963 to more than 1,100 participants in "over 90 beginning and advanced courses at our primary location in Ann Arbor and several ’off-site’ institutions" [4] in 2014.

2.1 Diversified Funding

Topical archive funding [5], as well as grants and contracts [6], now comprise the majority of ICPSR’s revenue and contribute to the further diversification of resources. Topical archives are organized around specific topics or themes and typically are funded by agencies in the federal government and private foundations seeking to distribute data. The National Archive of Computerized Data on Aging (NACDA) [7] was the first topical archive established at ICPSR, in 1976. The National Archive of Criminal Justice Data (NACJD) [8] followed in 1978. NACDA is now funded by the National Institute on Aging; NACJD is funded by the Bureau of Justice Statistics, the National Institute of Justice, and the Office of Juvenile Justice and Delinquency Prevention. More topical archives have been established in the years following, including: the Substance Abuse and Mental Health Data Archive (funded by United States Department of Health and Human Services’ Substance Abuse and Mental Health Services Administration) [9], the Health and Medical Care Archive (funded by the Robert Wood Johnson Foundation) [10], the Child Care & Early Education Research Connections archive (funded by the Office of Planning, Research and Evaluation, Administration for Children and Families, United States Department of Health and Human Services) [11], the Data Sharing for Demographic Research archive (funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development) [12], and the National Addiction & HIV Data Archive Program (funded by the National Institute on Drug Abuse) [13]. These topical archives in essence collaborate with the membership to create a shared infrastructure and a whole that is larger than the sum of its parts.

2.2 Governance

ICPSR’s governing Council is now composed of 12 persons (instead of 5) elected by the members to serve staggered four-year terms [14], and meets three times each year [15]. It continues to govern ICPSR and serves in an advisory role on policies and future directions. [16]

Like many organizations, ICPSR implements a Strategic Plan to work toward the ideals set out in its mission statement: "ICPSR advances and expands social and behavioral research, acting as a global leader in data stewardship and providing rich data resources and responsive educational opportunities for present and future generations." (ICPSR 2014) The latest Strategic Plan was implemented in January 2014 and consists of four strategic directions: "Enhancing Our Global Leadership", "Developing New and Responsive Products and Services", "Advancing Knowledge, Skills, and Tools for the Research Community", and "Expanding Organizational Capacity for Leadership and Innovation". [17]

2.3 Processes and Procedures

Over the years, physical data storage has moved from punch cards and tapes to mainframes and then servers; data formats have ranged from EBCDIC to ASCII to proprietary statistical formats. Data are now available in multiple formats for direct download from the ICPSR web site, so users no longer need to obtain data from an Official Representative. The site received over 5.1 million page views and had 284,411 unique visitors in fiscal year 2013. (ICPSR 2013) More than 64,000 datasets are available for download from over 7,448 data collections. ICPSR also links to over 64,000 data-related citations [18], "to encourage the replication of scientific results, to improve research standards, and to give proper credit to data producers" [19].

ICPSR’s systems and infrastructure are based on community and international standards, such as the Reference Model for an Open Archival Information System (OAIS) [20] and the Trustworthy Repositories Audit and Certification (TRAC) [21]. ICPSR also takes the lead in developing data standards, including the Data Documentation Initiative (DDI), an XML-based metadata specification for describing the entire data lifecycle. [22] Over forty institutions from around the world belong to the DDI Alliance, which develops the DDI specification. ICPSR is the lead partner in the Data Preservation Alliance for the Social Sciences (Data-PASS), "a voluntary partnership of organizations created to archive, catalog and preserve data used for social science research" [23].

ICPSR’s data processing and curation standards have been refined over the years and published as a community-embraced document, "Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle" (ICPSR 2012). Curation ensures that data collections are organized, described, cleaned, enhanced, and preserved for public use, and is a hallmark of ICPSR data collections. The "ICPSR Pipeline Process" (see Figure 1)

"(...) starts with a confidentiality review, designed to make sure that there are no direct identifiers (e.g., name, social security number, telephone number, etc.) in the data file. It continues with a check to make sure that the data match the documentation. If inconsistencies, errors, or confidentiality risks are found, the data processor works with the original researcher to correct the issue. The process concludes with the construction of a final, permanent version of the data, alternative versions (such as in SAS, Stata, SPSS, or tab-delimited files), and final documentation. The final processing steps can also include the creation of versions for online analysis" (Albright and Lyle 2010:19-20).

Figure 1: The ICPSR Pipeline Process: How ICPSR Acquires, Archives, and Disseminates a Typical Study
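The first two pipeline steps quoted above, the confidentiality review and the check that the data match the documentation, can be sketched in a few lines. This is only an illustration under stated assumptions, not ICPSR's actual tooling: the name-matching heuristic, the function names, and the sample variable names are all hypothetical, and in practice a data processor reviews the file contents rather than variable names alone.

```python
# Hypothetical substrings that suggest a variable holds a direct identifier.
DIRECT_IDENTIFIER_HINTS = ("name", "ssn", "social_security", "phone", "telephone")


def confidentiality_review(variable_names):
    """Return variables whose names suggest direct identifiers.
    A crude heuristic: a real review would also inspect the values."""
    flagged = []
    for variable in variable_names:
        lowered = variable.lower()
        if any(hint in lowered for hint in DIRECT_IDENTIFIER_HINTS):
            flagged.append(variable)
    return flagged


def documentation_check(variable_names, documented_names):
    """Compare the variables in the data file with those in the documentation."""
    data_vars = set(variable_names)
    doc_vars = set(documented_names)
    return {
        "undocumented": sorted(data_vars - doc_vars),
        "missing_from_data": sorted(doc_vars - data_vars),
    }


def review_study(variable_names, documented_names):
    """Run both checks and note whether the original researcher
    would need to be contacted to resolve anything."""
    issues = {"direct_identifiers": confidentiality_review(variable_names)}
    issues.update(documentation_check(variable_names, documented_names))
    issues["needs_follow_up"] = any(
        issues[key]
        for key in ("direct_identifiers", "undocumented", "missing_from_data")
    )
    return issues


# Illustrative variable lists only; these do not describe a real study.
DATA_VARIABLES = ["CASEID", "RESP_NAME", "AGE", "HomePhone", "INCOME"]
CODEBOOK_VARIABLES = ["CASEID", "AGE", "INCOME", "EDUC"]
```

Calling review_study(DATA_VARIABLES, CODEBOOK_VARIABLES) flags the two identifier-like variables and the mismatches with the codebook, which corresponds to the point in the pipeline where the processor works with the original researcher to correct the issue.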

3. Challenges and Opportunities

Data have increased in complexity and size in the 52 years since ICPSR’s founding, and technology has evolved. Developments in the past few years, especially government and funder mandates for open access to data, are drawing more and more attention to data stewardship. For instance, a 2013 memo from the United States Office of Science and Technology Policy, "Increasing Access to the Results of Federally Funded Scientific Research", directed Federal agencies with over $100 million in research and development expenditures to "develop a plan to support increased public access to the results of research funded by the Federal Government" (Holdren 2013:2). Plans are to address both publications and data. In the European Union, a similar requirement was issued for Horizon 2020 Open Research Data pilot projects. Researchers "are asked to make the underlying data needed to validate the results presented in scientific publications and other scientific information available for use by other researchers, innovative industries and citizens" (European Commission 2013).

Each year brings new challenges and opportunities for ICPSR to share, archive, and preserve data. Three issues of present interest to ICPSR are worth detailing: funding models, curation, and open access.

3.1 Funding Models

The first area of interest for ICPSR is the issue of sustainable funding to pay for sharing, archiving, and preservation of research data. These activities carry real costs. While government mandates increase attention and help generate buzz, rarely do they provide additional financial assistance. Unfunded mandates can stretch already thin resources to the breaking point.

Who will pay? Will they pay? How do they pay? The researchers, who must balance the present and pressing costs of research and analysis with future requests for storage and access? What if an investigator nears the end of her grant funding and desires to eke out a little more analysis at the expense of archiving and preservation? Will archives ultimately shoulder the financial burden? Many archives, including ICPSR, are partially reliant on grants and contracts that run on three- or five-year cycles. "Long-term access to data requires durable institutions that plan on a scale of decades and even generations" (Lyle, Alter, and Vardigan 2013), yet long-term planning is extremely challenging with short-term funding cycles.

To help address these issues, ICPSR, with funding from the Alfred P. Sloan Foundation, convened a fall 2013 meeting of 22 domain repositories spanning the social and natural sciences. As an outcome of the meeting, a white paper was drafted, which included a call for change: "Domain repositories must be funded as the essential piece of the U.S. research infrastructure that they are." (Ember and Hanisch 2013:3) Five principles were proposed to encourage sustainable data repositories: 1) Agencies need to support archiving and sharing with funding to assure the long-term preservation and viability of research data; 2) Cooperation among funding agencies, universities, domain repositories, journals and other stakeholders is essential; 3) Agencies must support trained professionals, organizations with the capacity to persist over time, and community standards for metadata and preservation; 4) Review criteria should be established to evaluate that data repositories are consistent with their mission as institutions entrusted with the long-term preservation of scientific memory; 5) Agencies need to incentivize investigators to archive their data. (Ember and Hanisch 2013:11-12)

ICPSR continues to diversify funding and explore new funding models. In early 2014, ICPSR added a new topical archive, the National Archive of Data on Arts & Culture (NADAC), sponsored by the National Endowment for the Arts. [24] Membership funding continues to grow. Additional products and services are being offered, including a fee-based secure virtual data environment for accessing confidential data. [25] It is important to emphasize that ICPSR generates existing and new revenue to provide long-term, trusted access to scientific data, and not to generate profits.

3.2 Curation

Although recent research has shown that the number of data collections formally archived is very low (Pienta, Alter, and Lyle 2010), with many data collections lost (Vines et al. 2014), expectations are growing that more and more data will be shared, stored, and even preserved. This is a win-win for everyone: funders, researchers, and users. Secondary data use enables verification and replication of results, encourages new lines of research, discourages scientific misconduct, saves time, reduces costs, and maximizes access. (Niu 2006)

Not all data are alike, however. Many data collections, while publicly available, are in inaccessible formats, poorly described, and full of errors. Unusable data equal lost data. (Peer, Green, and Stephenson 2014) To truly maximize access to data, one needs to curate, or enhance, them. Curation is the work taken to make data usable, complete, and self-explanatory. [26]

ICPSR has curated data since its founding and continues to intensively curate many of its data collections. This careful attention to detail is one primary differentiator of ICPSR’s product from other data providers [27], especially as the number of data repositories is growing at what seems like an exponential rate. [28]

ICPSR recognizes the opportunity to promote curation to the community, and does so through publications (ICPSR 2012), Web sites [29], training [30] and advocacy. In response to the 2013 Office of Science and Technology Policy (OSTP) memorandum directing federal agencies to develop plans to support increased public access to data, ICPSR issued a formal public comment [31] encouraging federal agencies to, essentially, curate their funded data, by making data:

  1. Discoverable with standardized, machine-actionable metadata.
  2. Meaningful and usable through enhancement and cleaning.
  3. Persistent, through proactive digital preservation.
  4. Trustworthy, by encouraging deposits in community certified repositories.
  5. Confidential (when applicable).
  6. Citable, through adding basic references.

3.3 Open Access

Funders of research are increasingly requiring open access to the data they support. By one popular definition, open access data should be "available as a whole and at no more than a reasonable reproduction cost, (...) provided under terms that permit reuse and redistribution", and "everyone must be able to use, reuse and redistribute" [32].

ICPSR strongly supports maximizing access to data. In fact, many of ICPSR’s data collections, including the topical archives, are available at no cost to users. The original ICPSR Memorandum of Organization states "In general, ICPSR data resources and technical services are available on an ’open access’ basis" (Blalock et al. 1989:Appendix C).

Still, the Memorandum of Organization is explicit that ICPSR services, data or facilities funded by the consortium are provided "only in a manner that imposes no handicap on ICPSR members. In providing services, data or facilities, non-members will defray all associated costs plus additional charges required to maintain equity with members" (Blalock et al. 1989:Appendix C). This exposes the primary limitations of the consortium funding model: "Membership models do not provide public access to data, and they favor researchers at institutions with more resources." (Ember and Hanisch 2013:11)

In response to these requirements and calls for public access to federally funded research data, ICPSR recently launched openICPSR [33]. "Currently in beta version, openICPSR is a research data-sharing service that allows depositors to rapidly self-publish research data, enabling the public to access the data without charge – or in the case of restricted-use data, for nominal charge." (Lyle 2014:55)

"The openICPSR service is an alternative to the standard ICPSR deposit process, providing immediate, self-controlled distribution of data and metadata, albeit without the rich curation and preservation features; data in openICPSR are distributed and preserved as-is." (Lyle 2014:55) Data in openICPSR are licensed under a Creative Commons Attribution 4.0 International License, which permits use, reuse and redistribution for any purpose, even commercial. Although openICPSR data are distributed as-is, they can still be selected for curation, either at the request of the depositor or upon selection by ICPSR staff.

Figure 2: Screenshot of the openICPSR home page

The membership model is successful and has sustained ICPSR over fifty years. New models, including providing open access, are part of the organization’s constant evolution to address the needs of funders, users, and members.

4. Conclusion

ICPSR, with over 750 consortial members from around the world, is a leader in data archiving. Much of the success of ICPSR can be traced back to the consortial model upon which the organization was founded, with members providing funding, input, and a sense of community. Standards and archival processes developed in the earlier years, such as intensive data curation, continue to define the ICPSR brand and enhance the digital assets available to the research community. Although challenges arise, and will continue to arise, ICPSR continues to derive opportunities from the challenges.

I thank Mary Vardigan for providing comments on this paper. I also thank Cole Whiteman for supplying the "ICPSR Pipeline Process" figure.

Jared Lyle, MSI, is director of curation services at the Inter-university Consortium for Political and Social Research (ICPSR), where he is responsible for developing and maintaining a comprehensive approach to data management and digital preservation policy. University of Michigan, Institute for Social Research. Tel. +1 734 763 60 75, E-Mail: