Editor: Shabiran Rahman · srahman@library.uwaterloo.ca

PHASE 1 OF NATIONAL DATA ARCHIVE CONSULTATION COMPLETE

Canada currently has no facility to collect, preserve, and distribute digital information generated from research (a National Data Archive). To address this issue, the National Archives and the Social Sciences and Humanities Research Council of Canada created a working group to assess the need for a National Data Archive and to recommend how such a facility might be implemented. Phase 1 of this process is now complete, and the group has reported on this substantial gap in Canada's research infrastructure. Their report outlines the current research environment, the needs of the research community in this country, and the possible benefits of creating a National Data Archive. A web site devoted to this process, along with the full text of the report, can be found at mmsd1.mms.nrcan.gc.ca/archives/.
The next phase of this process involves consultation with the many stakeholders, and details on how to contribute to this process may also be found at the web site. This group is actively seeking the input of interested groups and individuals, and anyone with an interest in the preservation of digital data is encouraged to participate.




CONTENTS

Phase 1 of National Data Archive

Canadian Geospatial & Profile Census Data

News

Your Contacts

Southwestern Ontario Research Data Centre

New Acquisitions

Sites of Interest



CANADIAN GEOSPATIAL AND PROFILE CENSUS DATA: PREPARING THE DATA FOR MAPPING

by Richard Pinnell, Head, University Map and Design Library, University of Waterloo
(For the complete article with figures, please visit tdr.tug-libraries.on.ca/HELPS/helpmain.htm)

This article will describe how to access and manipulate 1996 Canadian census data to the point where it becomes possible to visualize the data in map form.

The process of actually mapping the data (i.e., creating thematic maps) is beyond the scope of this article. In order to follow these step-by-step instructions you will need to download census data from the TriUniversity Data Resource (TDR) Web server and you will need to have access to software including:

  • geographical information system (GIS) software
  • Beyond 20/20 browser software
  • spreadsheet or database management software (e.g., Microsoft Excel)

I will illustrate using ArcView (version 3.2) desktop mapping software throughout the following exercise, which focuses on a common source of difficulty: attempting to “join” two data tables in order to begin mapping. I will illustrate this procedure by using census data for the Kitchener Census Metropolitan Area (CMA). I decided to work with data at the census-tract level because this is a popular level of aggregation for study and analysis, and also because the census-tract data is particularly difficult to join.

Downloading Data from TDR Web Server


1 Begin by downloading geospatial census boundary data from TDR. In this example we are interested in the census tract boundaries for the Kitchener CMA, an area that includes the cities of Kitchener, Waterloo, and Cambridge, and the municipal townships of Woolwich and North Dumfries.

  • On the TDR page tdr.tug-libraries.on.ca move the cursor over TDR Data, then over Geographic Files and click.

  • Scroll to Digital Boundary Files (DBF) and click on this link.

  • Scroll to the first of the two tables on this page and locate the desired link at the intersection of the Province column and the Census Tracts row; choose ArcInfo 1996 data.

  • You have now reached the point where you can select a data file and download it; the file to download is named gct_035a.exe (678k). Embedded in this filename is the Statistics Canada code for the province of Ontario (35). A page describing the complete set of geographic codes and abbreviations used by Statistics Canada may be found at tdr.uoguelph.ca/GEOG/abbrev96.htm

  • Download this file and uncompress it by double-clicking the filename; the file inflates to gct_035a.e00 (2833k). Students working on lab machines need to be aware that executable files may be blocked; if so, save downloaded data to a zip disk.

2 Next you will need to download the census profile data you wish to visualize. In this example we will download 1996 Profile data in Beyond 20/20 format. We want to map the total number of economic families in private households by census tract (within the Kitchener CMA).

  • On the TDR page tdr.tug-libraries.on.ca, move the cursor over TDR Data, then over Web Retrieval and click.

  • Once past the Data Access and Use Restrictions page you will want to highlight the topmost category (Canadian Census: Restricted Access) and then click Submit Group.

  • Scroll down to Census 1996 - Profile Series - B2020 Tables and highlight. Then click on the Submit Data Base button.

  • Navigate past the Data Access and Use Restrictions page.

  • Scroll through the list until you see the category entitled Private Households, Housing Costs, etc. and click on the second link—Private Households, Census Tracts—to begin the download.
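
The province code embedded in the boundary filename can also be read programmatically, which may be convenient when boundary files for several provinces are being handled. The following is only a minimal sketch: it assumes that other filenames follow the pattern of the single example given above (gct_ followed by a three-digit code), and the function name province_code is hypothetical.

```python
import re


def province_code(filename):
    """Return the numeric code embedded in a name such as 'gct_035a.exe'.

    Assumes the pattern 'gct_' + three digits, inferred from one example;
    returns None when the name does not fit that pattern.
    """
    match = re.match(r"gct_(\d{3})", filename)
    return int(match.group(1)) if match else None


print(province_code("gct_035a.exe"))  # 35
```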

Manipulating the Data Using ArcView and Beyond 20/20 Software

Now that you have downloaded the geospatial and profile data, you must manipulate the contents of these files so that they can be “joined.” The process of joining two files (or tables) involves, in this example, matching each census tract area (there are 82 of these areas, also called polygons, in the Kitchener CMA) with the corresponding census tract profile data; the correspondence is a one-to-one relationship. In order to join two files (or tables) there has to be a “common linking field,” a field that is common to both tables. The values in these two fields must match exactly; otherwise the joining procedure will abort or, worse, will provide erroneous results.
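
The idea of a one-to-one join on a common linking field can be sketched outside any GIS package. The example below is not ArcView's own procedure; it is a small illustration in plain Python, with a hypothetical field name (tract_id) and made-up sample values, showing why keys that differ by even one character fail to match.

```python
def join_on_key(polygons, profile, key):
    """Join two lists of row dicts one-to-one on an exactly matching key field.

    Returns the joined rows and the list of polygon keys with no match.
    """
    profile_by_key = {}
    for row in profile:
        if row[key] in profile_by_key:
            # A repeated key would break the one-to-one relationship.
            raise ValueError(f"duplicate key in profile table: {row[key]!r}")
        profile_by_key[row[key]] = row

    joined, unmatched = [], []
    for row in polygons:
        match = profile_by_key.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            joined.append({**row, **match})
    return joined, unmatched


# Made-up sample rows; the IDs and counts are illustrative only.
polygons = [
    {"tract_id": "5410001.00", "area": 1.2},
    {"tract_id": "5410002.00", "area": 0.8},
]
profile = [
    {"tract_id": "5410001.00", "families": 1000},
    {"tract_id": "5410002.0", "families": 1200},  # one character short: no exact match
]

joined, unmatched = join_on_key(polygons, profile, "tract_id")
print(len(joined), unmatched)  # 1 ['5410002.00']
```

Reporting the unmatched keys, rather than silently dropping them, makes it easier to spot the kind of mismatch that would otherwise produce an incomplete or erroneous map.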


1 Begin with the census boundary file, gct_035a.e00.

  • Use the Import utility accompanying ArcView to convert this .e00 file to an Arc coverage. The .e00 file is in Interchange (ASCII) format and cannot be read directly by the GIS software. In this example, I will name the coverage “ct_ontario.”

  • Open this polygonal coverage in a View using ArcView. You will notice that all of the tracted areas across Ontario open in this View.

  • Then select and save out the census tracts for the Kitchener CMA (we are not interested in the tracts for Toronto, London, Hamilton, etc. at this time). Open the attribute table for the current theme (i.e., the census tract polygons) by clicking the Open Theme Table button.

  • Select Query in the Table menu. Since the unique CMA ID code for Kitchener CMA is “541” we can use this value to make our selection. The query box should contain this statement: Cmauid="541"; then click New Set.

  • Switch back to the View and save out the Kitchener CMA tracts as a shapefile by clicking Convert to Shapefile in the Theme menu. Name this shapefile “ct_kitch” and bring it into the View when prompted to do so.

    You might want to take a look at the attribute table for this shapefile; again, click on the Open Theme Table button. If you look at the field named Ctuid you will see string values that take the form “541XXXX.XX”, where 541 is the CMA code for Kitchener and the remaining six characters are place holders for the census tract “names” (i.e., IDs). This field will be used to link the geospatial data and the profile data; it will be our common linking field.

2 Now open the Profile data in the Beyond 20/20 browser. The file you are opening is named pr9ct.ivt (1166k) if you downloaded the file I suggested above.

  • As a first step, “flip” the table by moving the Geography dimension to the rows and the Profile variables to the columns. You can do this by clicking anywhere in the Geography header along the top and then dragging the cursor to the left side of the screen. The Geography labels (i.e., the row labels) look complex and indeed they are! Again, we will want to select and save out data for the Kitchener CMA since we are not interested in the profile data for Toronto, Hamilton, etc. at this time.

  • To find the Kitchener census tracted data, click on Search in the Dimension menu; highlight English Desc and type “Kitchener” in the Text to Find box; click OK.

  • Select Show All in the Dimension menu.

  • Hold down the Control key, click on the first row of data for Kitchener CMA and drag to the last row of data for Kitchener CMA.

  • Still holding down the Control key, select the Profile variable(s) of interest. In this example, we will choose the first field (or column), the one to the immediate right of the Geography labels. This variable is entitled “Total number of economic families in private households.” Click in the header for this field to select it.

  • Now save out the data by choosing Save As in the File menu. Select a convenient directory for the datafile, select dBase Files from the list of file types, and name the file “data.dbf.”

Manipulating the Data Using Excel

At this point it is necessary to convert this file to dBase IV format; it is rather unfortunate but unless this is done, ArcView will not recognize the tabular profile data. An easy way to make this conversion (but by no means the only way) is to open data.dbf in spreadsheet software such as Excel.

  • Simply open the file in Excel then immediately save it as type DBF 4 (dBase IV). Excel asks if you wish to save your changes; click the Yes button and name the file profiled. The table is now ready for use in ArcView.
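
The selection on the CMA code and the structure of the Ctuid linking field described earlier can be checked with a few lines of code before attempting the join. This is a rough sketch rather than part of the ArcView workflow: the sample rows are made up, the helper names are hypothetical, and it assumes that the “X” place holders in “541XXXX.XX” are always digits.

```python
import re

# Three-digit CMA code, then a tract name of the form dddd.dd (assumed digits).
CTUID_PATTERN = re.compile(r"^(\d{3})(\d{4}\.\d{2})$")


def select_cma(rows, cma_code):
    """Keep the rows whose Cmauid equals cma_code (compared as strings)."""
    return [row for row in rows if row["Cmauid"] == cma_code]


def split_ctuid(ctuid):
    """Split a Ctuid into (CMA code, tract name); return None if malformed."""
    match = CTUID_PATTERN.match(ctuid)
    return (match.group(1), match.group(2)) if match else None


# Made-up sample rows; "999" is an arbitrary stand-in for another CMA.
rows = [
    {"Cmauid": "541", "Ctuid": "5410001.00"},
    {"Cmauid": "999", "Ctuid": "9990001.00"},
]

kitchener = select_cma(rows, "541")
print([split_ctuid(row["Ctuid"]) for row in kitchener])  # [('541', '0001.00')]
```

Running the same check against the linking field of the profile table would show at a glance whether its values take the same form as Ctuid.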



Bo Wandschneider, the driving force behind the creation of the University of Guelph Data Resource Centre and TriUniversity Data Resources, has accepted the position of Manager, Academic Services, Computing and Communication Services at the University of Guelph.

As part of his duties he will continue to be responsible for the administration of the CCS component of the DRC and TDR projects. We congratulate him and wish him success in his new position.



TDR MEMBERS ATTEND IASSIST CONFERENCE AT UNIVERSITY OF AMSTERDAM


Bo Wandschneider and Carol Perry attended the 2001 International Association for Social Science Information Services and Technology (IASSIST) Conference, held at the University of Amsterdam from May 14-19. The sessions this year laid special emphasis on the Data Documentation Initiative (DDI), icpsr.umich.edu/DDI/. It appears that this international project has matured to the stage where people are starting to adopt these standards. Bo and Carol attended workshops on creating DDI-compliant codebooks and XML. TDR staff is currently working on improving the metadata and implementing these standards in the TDR.

Other interesting sessions included a UK survey of how teachers use data in the classroom and a discussion of the progress being made toward a National Data Archive for Canada. To view some of the multimedia presentations, please visit niwi.knaw.nl/us/ia2001/home.htm

Next year the IASSIST conference will be held at the University of Connecticut.

 
YOUR CONTACTS FOR DATA SERVICE AT OUR THREE LOCATIONS
University of Guelph

Michelle Edwards
edwardsm@uoguelph.ca
519-824-4120 x4539

Wilfrid Laurier University

Wray Roulston
wroulsto@wlu.ca
519-884-0710 x3743

University of Waterloo

Susan Moskal
srmoskal@library.uwaterloo.ca
519-888-4567 x2890


Shabiran Rahman
srahman@library.uwaterloo.ca
519-888-4567 x2882




SOUTHWESTERN ONTARIO RESEARCH DATA CENTRE



Located at the University of Waterloo

The official opening of the Southwestern Ontario Research Data Centre (SWORDC) is scheduled for Fall 2001 at UW. Unavoidable construction delays have prevented an earlier opening date for the centre, which will be housed in the Psychology, Anthropology and Sociology (PAS) building.

Researchers with approved proposals and security clearance will be able to conduct research within the SWORDC, which will house Statistics Canada longitudinal survey data. The core of survey data sets will include the National Population Health Survey (NPHS), the Survey of Labour and Income Dynamics (SLID), and the National Longitudinal Survey of Children and Youth (NLSCY). The proposal submission process for the SWORDC has been initiated, with at least four research projects currently awaiting approval. An overview of the SWORDC program, including the online web application procedure, is available at sshrc.ca/rdc/english/overview.html. Interested Southwestern Ontario researchers are encouraged to start the proposal submission process now, since it may take up to eight weeks for a decision to be made.

Information about the SWORDC may be found at stats.uwaterloo.ca/Stats_Dept/SWORDC/swo_rdc.html.
The SWORDC has a Statistics Canada Analyst onsite:

Dr. Pat Newcombe-Welch
Phone: 519 888-4567 x 5504
panewcombe@uwaterloo.ca
Fax: 519 746-1875

 

 

A FEW OF OUR NEW ACQUISITIONS


The following datasets were added to the collection in July 2001:

Adult Education and Training Survey (AETS) 1998
Canadian Tobacco Use Monitoring Survey (CTUMS) 1999
General Social Survey (GSS) Cycle 14 2000
Postal Code Conversion Files (PCCF) Nov 2000 Version
Survey of Household Spending (SHS) 1999
Survey of Labour and Income Dynamics (SLID) 1998 EF, CF, PR

To view other recently acquired datasets please visit the following URL on our TDR site: tdr.tug-libraries.on.ca/NEWS/mainnews.htm#newacqu


SITES OF INTEREST FOR YOU TO EXPLORE


In this segment we present the URLs of a few sites that may be of interest to data users. Our readers recommended the sites included in this issue.

The following two sites are non-Canadian: one is from the UK and the other from the US. If you would like to recommend a site for inclusion in this segment, please send it to Shabiran Rahman, srahman@library.uwaterloo.ca

  1. Project on the Use of Numeric Data in Learning and Teaching datalib.ed.ac.uk/projects/datateach.html

    This site would be of particular interest to those who use data for teaching, particularly the link to case studies of current practice: datalib.ed.ac.uk/projects/datateach/casestudies.html

  2. Counting California
    countingcalifornia.cdlib.org
    This new and free service named Counting California enhances public access to a range of social science and economic information from government agencies.
