Data files from the 2001 Census of Population continue to be released in stages, by topic and by geographic area. Statistics Canada posts release dates for new data on its website. TDR receives the files for local distribution in the weeks following each initial release date. Locally held files contain more detail and lower levels of geography than the public census pages of Statistics Canada.
All files released to date are now available on TDR in B2020 format from the web retrieval page. Topics covered include population and dwelling counts, age and sex data, marital status, families and household living arrangements, housing, language composition, migration, immigration, ethnocultural portrait, aboriginal peoples, the workforce, place of work, commuting to work and language use at work. ASCII files will follow in the coming weeks.
Upcoming releases include educational data; earnings; income of individuals, families and households; social and economic characteristics; and religions in Canada.
The entire Census of Agriculture 2001 is now available on TDR in B2020 format and ASCII files through web retrieval. The B2020 files also provide tables of historical farm data at the federal and provincial levels. One file contains census data from 1921 to present while other tables compare census years from 1971 to 2001. Also included are prefabricated maps for Census Agricultural Regions, Census Divisions and Census Consolidated Subdivisions.
I am currently conducting research on estimating the impact of higher cigarette taxes on smoking in Canada. Raising cigarette taxes is widely acknowledged to be an effective policy instrument in reducing and discouraging smoking. Much of this research has been conducted with survey data from the United States; however, similar research has been quite limited in Canada.
Between 1989 and 1993, excise taxes at the federal and provincial level rose sharply. In the face of enormous smuggling, federal and provincial taxes were halved in 1994. In recent years, however, there has again been a trend towards increasing these taxes in response to reports that the decline in cigarette prices has resulted in a significant increase in smoking by Canadians.
In order to assess the effects of higher cigarette taxes, it is first important to estimate the impact of smuggling. I am attempting to do this through various surveys which have been conducted by Statistics Canada and which contain information on individual-level smoking patterns. The strategy is to match these patterns to the prices of legal cigarettes and then estimate the sensitivity of consumption patterns, in "smuggling" versus "non-smuggling" provinces, to price changes generated by tax policy. Of course, it is important to control not only for other province- and time-specific events but also for individual characteristics before drawing a conclusion on the magnitude of smuggling. And therein lies the importance of using survey data. Most Canadian surveys have very rich data on individual characteristics such as education, income, marital status and labour force status. All these factors could plausibly affect smoking patterns independently of cigarette prices (taxes) and are, therefore, important to control for.
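The article does not specify the estimation model. One common way to set up such a comparison is a linear regression in which price is interacted with a "smuggling province" indicator, alongside individual-level controls. The sketch below is a minimal illustration under that assumption; the function name, the argument names and the choice of ordinary least squares are all hypothetical and are not taken from the author's work.

```python
import numpy as np


def price_sensitivity(consumption, price, smuggling, controls):
    """Fit consumption ~ 1 + price + smuggling + price*smuggling + controls by OLS.

    Hypothetical helper for illustration only.
    consumption : cigarettes consumed per respondent
    price       : legal cigarette price faced by each respondent
    smuggling   : 1 if the respondent lives in a "smuggling" province, else 0
    controls    : individual characteristics (e.g. income, education), one
                  column per characteristic
    """
    y = np.asarray(consumption, dtype=float)
    price = np.asarray(price, dtype=float)
    smuggling = np.asarray(smuggling, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if controls.ndim == 1:
        controls = controls[:, None]

    # Design matrix: intercept, price, province indicator, interaction, controls.
    X = np.column_stack(
        [np.ones_like(price), price, smuggling, price * smuggling, controls]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    return {
        # Price response in non-smuggling provinces.
        "price": float(beta[1]),
        # Additional price response in smuggling provinces.
        "price_x_smuggling": float(beta[3]),
    }
```

In this setup, the coefficient on the interaction term measures how much the response of reported consumption to legal prices differs in smuggling provinces, holding the included individual characteristics fixed.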
My research has definitely been aided by the easy electronic access to survey data on the TDR website. Data from an abundance of surveys containing smoking-related questions can be downloaded quite easily from the website, including the General Social Surveys 1985, 1991 and 1996; the Survey on Smoking 1994; the National Alcohol and Drugs Survey 1994; and the National Population Health Surveys 1994, 1996 and 1998.
Prof. Anindya Sen, Department of Economics - University of Waterloo.
The Electronic Data Service Office has relocated within the Dana Porter Library to Room 502. The new location brings the EDS Office closer to the Government Information Service Desk and increases the visibility of the service. The EDS Group welcomes Sandra Keys as a new member. Sandra is the Liaison Librarian for Accountancy and Economics and has considerable experience as a librarian in both corporate and academic environments. She has worked for companies such as Datapoint Canada, CIBC World Markets and N M Rothschild & Sons Canada Limited. Her most recent position, prior to coming to Waterloo, was Public Services Librarian at the University of Toronto's Joseph L. Rotman School of Management. Sandra can be reached at skeys@uwaterloo.ca.
Sue Moskal, EDS - University of Waterloo
With an increase in the number of longitudinal files being developed, along with an increased concern for the confidentiality of survey respondents, researchers are finding that the PUMFs (Public Use Microdata Files) being released are in some cases becoming less useful. Variables are increasingly being grouped and suppressed, and, for some surveys, there are no PUMFs being produced at all.
Synthetic files are being introduced in an attempt to reach a balance between the needs of the research community and the confidentiality requirements put forth by Statistics Canada. These files fall somewhere between PUMFs and Master Files. They contain the same number of records and variables as the Master Files; however, the data is put through a series of transformations so that individual observations do not actually represent true records. Consequently, it becomes impossible to identify individuals. Unlike the PUMFs, the synthetic files tend to report the raw data instead of categorizing variables such as age, education or immigration status. The synthetic files also report more levels of geography than the PUMFs.
These files allow the user to work with the whole data set while not compromising the need for confidentiality. Empirical results are approximate, and as such, should not be used in publication. The degree of approximation will depend on the variables being analyzed. Users can develop and test models using the synthetic files in their own environment as the files are free to be distributed and have the same usage restrictions as the PUMFs. When models are finalized, the same programs can be run against the Master Files in the RDCs (Research Data Centres). It is costly to do all preparatory work in the secure RDC environment, so having access to these files should allow researchers easier access to the information.
At present, the synthetic files for the National Population Health Survey are available on the TDR website, and it is hoped that other author divisions within Statistics Canada will soon release more of these files. Surveys such as the Survey of Labour and Income Dynamics, the Workplace and Employment Survey, and the Youth in Transition Survey are expected to benefit greatly from having synthetic files. Users should always start with the PUMFs, as these are the easiest to use. For researchers requiring something more, moving to the synthetic files and subsequently the Master Files is an easy transition.
Bo Wandschneider, CCS - University of Guelph
Timing is everything when it comes to data, and timing for a lot of projects couldn't be better, with Census 2001 data being released gradually over the course of the semester. This has been the star dataset at the University of Guelph this semester. With data from seven Census years now available, many projects have been geared towards discovering trends in the Canadian population since the 1971 Census. Along with the growing trend of mapping data, projects are now mapping current and previous Census data to visualize these changes over the years.
Data retrieved from the Tri-University Data Resources (TDR) web page is often used to determine whether population dynamics of geographic areas match criteria set out by individual research proposals.
The National Population Health Survey data files and associated documentation have been used by graduate students and faculty while writing SSHRC proposals to access the master files at the Southwestern Research Data Centre (SWORDC). The data files and associated documentation have allowed researchers to run preliminary analyses and determine whether access to the master files is required.
Another popular database with both graduate students and faculty has been the World Trade Database, both Imports and Exports. Studies examining trade policies and historical trade have taken advantage of the time series nature of this database. Since the inception of the TDR we have seen the University of Guelph community's usage increase over the years.
The graph below shows how usage, measured as the number of hits for data retrievals only, has been steadily increasing over the years. The statistics for 2003 were compiled in the second week of March and show that this year's usage may well surpass 2002.
Michelle Edwards, CCS - University of Guelph
As the SWORDC welcomes new projects in 2003, researchers in various disciplines are increasingly rising to the challenge presented by the fact that the standard significance testing procedures available in popular software packages such as SAS and SPSS generally do not correctly estimate the variance for the complex sampling designs used by Statistics Canada. Programs which ignore the complex design tend to underestimate the variance and thus produce p-values which are smaller than they should be, sometimes resulting in incorrectly declaring a result to be statistically significant.
One advantage of conducting research at an RDC is that along with the master data files, Statistics Canada also provides the information necessary to calculate variances which correctly take into account the complex survey design. A re-sampling method for variance estimation, known as bootstrapping, may now be used for the National Population Health Survey (NPHS), the Survey of Labour and Income Dynamics (SLID) and for the National Longitudinal Survey of Children and Youth (NLSCY). SAS macros are provided to implement the bootstrapping procedure in order to estimate the bootstrap variance estimates for totals, ratios (which include means), differences in ratios, linear regression coefficients and logistic regression coefficients.
Knowledge of SAS on the researcher's part is required; however, the RDC analysts provide as much support as possible to help with the implementation of the macros. Currently, macros are available in SPSS only for the NPHS. It has been interesting to see how the use of the bootstrap variance estimation has changed the conclusions in two studies to date.
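The idea behind bootstrap variance estimation with replicate weights can be sketched in a few lines. The example below is a minimal illustration in Python, not the SAS macros supplied in the RDCs: it assumes the file carries one main survey weight and a set of bootstrap weights per respondent, and it uses a common form of the estimator (the average squared deviation of the replicate estimates from the full-sample estimate). The function names and the layout of the bootstrap weights are assumptions made for the illustration.

```python
import numpy as np


def weighted_mean(y, w):
    """Survey-weighted mean (a ratio of two weighted totals)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def bootstrap_variance(y, main_weight, bootstrap_weights):
    """Estimate a weighted mean and its bootstrap variance.

    bootstrap_weights is assumed to have one row per respondent and one
    column per bootstrap replicate (an assumed layout for this sketch).
    """
    theta = weighted_mean(y, main_weight)
    bw = np.asarray(bootstrap_weights, dtype=float)

    # Re-compute the same estimate once per set of bootstrap weights.
    replicates = np.array(
        [weighted_mean(y, bw[:, b]) for b in range(bw.shape[1])]
    )

    # Average squared deviation of the replicates from the full-sample estimate.
    variance = float(np.mean((replicates - theta) ** 2))
    return theta, variance
```

The same pattern extends to other estimates, such as regression coefficients: the estimate is re-computed once per set of bootstrap weights, and the spread of those replicate estimates gives the variance.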
DATA ACQUISITIONS:
The employer portion of the Workplace and Employment Survey (WES) is now available in the RDCs, in addition to the employee portion, which was made available earlier. WES is the only business survey currently allowed to be housed in the RDCs, owing to the confidentiality concerns associated with business data. The WES file has geographical variables suppressed in order to protect confidentiality. The availability of this additional segment of WES has generated research proposals from the field of psychology, where strong interest lies in linking the employee and employer portions of the survey.
Earlier this year, the Canadian Community Health Survey (CCHS) was placed in the RDCs, along with the Youth in Transition Survey (YITS).
This term we have had NLSCY researchers Tim Siefert and Henry Schultz visiting from Memorial University. Other projects actively under way include those of Beth Potter (UWO), using the National Population Health Survey; John Goyder (UW), using the Survey of Labour and Income Dynamics; Zenaida Ravanera and Rajulton Fernanda (UWO), using the National Survey of Giving, Volunteering and Participating (NSGVP); and Yigou Sun (UG), using the SLID, amongst others. Several new projects will commence in the near future.
Pat Newcombe-Welch, Analyst - SWORDC