UWDC is an acronym for the University of Washington Data Collaborative. It is a state-of-the-art computing platform for the storage and analysis of large, sensitive data, housed at the UW Center for Studies in Demography and Ecology.
The UWDC provides the infrastructure to harness innovative but hard-to-access data for the development of novel, high-quality research and evidence-driven policy making.
Please note that maintenance is performed on the UWDC Enclave systems on the third Friday of each month. If you are experiencing problems, first check whether today is the third Friday. If so, please wait a few hours and try again.
If this is not the third Friday of the month:
If you are having network problems (e.g., connecting, logging on, accessing systems), please send e-mail to dcollab_help@uw.edu, making sure to include any relevant information, such as the system you are trying to access, what you did, what you expected to happen and what did happen. Including error messages or screen captures is often helpful.
If you have access to the system but are having difficulties accessing data or performing data transfers, please send e-mail to dcollab@uw.edu. Include any relevant information, such as the system you are on, what data set(s) you are having problems with, what file system location you are using, etc.
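The third-Friday check described above can also be done programmatically. The following is a minimal sketch in Python, not an official UWDC tool; the function name `is_third_friday` is a hypothetical helper introduced here for illustration. It relies on the fact that the third Friday of any month always falls on day 15 through 21.

```python
import datetime


def is_third_friday(day: datetime.date) -> bool:
    """Return True if `day` is the third Friday of its month.

    Hypothetical helper for illustration only. The third occurrence of
    any weekday in a month always lands on day 15..21.
    """
    # datetime.date.weekday(): Monday is 0, so Friday is 4.
    return day.weekday() == 4 and 15 <= day.day <= 21


if __name__ == "__main__":
    today = datetime.date.today()
    if is_third_friday(today):
        print("Today is the third Friday: maintenance may be in progress.")
    else:
        print("Today is not the third Friday: contact support if problems persist.")
```

If the script reports that it is not a maintenance day, follow the contact instructions above.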
Make sure you have done the following:
- Downloaded, installed, and configured Husky OnNet.
- Specified one of the correct servers in Husky OnNet:
  - https://dept-huskyonnet-ns.uw.edu/dcollab
  - https://dept-huskyonnet.uw.edu/dcollab
- Set up two-factor authentication.
- Connected with Remote Desktop directly to one of the Enclave servers. Do not attempt to daisy-chain by connecting to the Enclave from within another remote computing session; you need to connect directly from the computer you are using.
UW-IT has a FAQ on Husky OnNet VPN, which may provide helpful information if you are experiencing problems with the VPN.
If you are having trouble connecting to the Enclaves with WinSCP, in the “Login” section, click “Edit” and then click “Advanced Site Settings”. Under SSH > Authentication, uncheck “Attempt GSSAPI authentication”.
You may need to increase the timeout for your SFTP client. Please see the instructions for increasing timeout settings for WinSCP and FileZilla.
You can restore files from the periodic file system snapshots. See Restoring accidentally deleted/changed/overwritten files.
Yes, although this requires copying \TinyTeX to %APPDATA%. On the Enclave you can run this command in R:
file.copy("/TinyTeX", Sys.getenv("APPDATA"), recursive = TRUE)
which will take a few minutes to run (it needs to copy many files). After this completes, you should be able to render R Markdown source files to compiled output files.
Alternatively, in R, you can run the script \TinyTeX\copy_tinytex_tree.R which will also copy the necessary files.
“Sensitive” data include elements that could identify specific individuals (e.g., citizens, patients, study subjects) or entities (e.g., businesses).
Release of such data could cause harm to the individuals or entities that the data represent.
By design, no outside network resources are accessible. This protects any confidential or sensitive data. If you need to have files transferred, there is a formal request process.
Many of the most important and intractable societal problems can only be addressed with interdisciplinary research conducted by teams of multiple investigators. The UWDC provides the technology to support this level of inquiry within a high-performance and secure computing environment.
In order for data sets to be housed on the UWDC, they need to show promise for supporting this type of research.
Data are protected using several technological controls.
- Enclave servers
  Enclave servers allow in-bound network connections using Remote Desktop Protocol, but they allow only carefully controlled out-bound connections. This means that once a user logs onto the Enclave server, there is no visible internet, no e-mail, and no arbitrarily mountable remote file systems. The Enclave servers host the sensitive data sets in read-only shares that are available only to authorized users. The servers also are equipped with the latest suite of analytic software (e.g., R, Stata, SAS, ArcGIS).
- End-to-end encryption
  Connections to the Enclave servers pass through a VPN session, so that transmissions between the client and Enclave server are encrypted. A user connected through an “open” WiFi connection would still have encrypted communication between their computer and the server.
- Dual-factor authentication
  The logon process includes two authentication steps. When the VPN session is initialized, users must provide their UW unique identifier (UWNetID) and password. At that time, a push is sent to their mobile phone requiring a confirmation for logon.
Gaining access to the UWDC is a multi-step process.
- Requesting general access to the UWDC requires an initial overall access application.
- Once general access is granted, the user must request access to specific data sets by uploading completed data use agreement forms. Obtaining access to specific data sets may also entail completing additional training and/or certifications.
Before being able to access any UWDC resources, non-UW personnel need to:
- First obtain a sponsored UWNetID.
- Follow general processes for obtaining approval to use the UWDC (see above).
- Obtain authorization to use DUO two-factor authentication.
If users need external data sets (e.g., GIS files, ancillary tables, scripts) to be loaded on the Enclave, they use a web form to request data upload. The data sets are reviewed by the UWDC Manager, and if they do not contain any forbidden elements, they are transferred to the user’s personal share.
When users complete their analysis, they similarly request a download. The files are reviewed by the UWDC Manager, and if they contain no forbidden elements, they are copied from the Enclave to a share external to the Enclave to which the user has access.
Yes. Send a message to dcollab_help@uw.edu with as many details as you can (name of module, URL, etc.). We will upload a zip file containing the module files for you to install. When you are notified that the zip file is in your X:\ drive, follow the procedure below.
On the Enclave server:
- Create a folder in the E: drive under \ado\plus to store your modules.
- Create a text file in C: under \users\myuwnetid\profile.do (where myuwnetid is your UWNetID) with one line:
  do "e:\profile.do"
  This will instruct Stata to run the e:\profile.do file when it starts.
- Create the text file in the E: drive under \profile.do containing any customizations or other code you want to run when Stata starts, but include the line
  adopath + "e:\ado\plus"
  This will instruct Stata to append your e:\ado\plus folder, where your modules are to be stored, to the ADOPATH.
- Unzip the archive and place the files in the E: drive under \ado\plus. By convention, files should go in the folder named with the first letter of the module, so for example you may create the folders E:\ado\plus\a, E:\ado\plus\b, and so on. However, this file system arrangement should not be necessary.
Following this procedure will make your Stata startup robust to file system changes. When the Enclave servers are rebuilt, your E:\ folder will get automatically mounted on the new server, so your customizations will be maintained. The only action that would need to be taken would be to recreate the one-line C:\users\myuwnetid\profile.do file. If you were to put customizations in C:\users\myuwnetid\profile.do, or to put modules in C:\ado or another location on the C: drive, this would not be robust against changes in the system, since the contents of C:\ do not get moved to the new system.
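The first-letter folder convention mentioned above can be illustrated with a short sketch. This is a hedged example in Python, not part of the UWDC procedure: the function name `ado_destination` and the sample file name `estout.ado` are hypothetical, and whether Python is available on a given Enclave server is an assumption. It only computes where a module file would go under e:\ado\plus; it does not copy anything.

```python
from pathlib import PureWindowsPath

# Module folder from the procedure above.
ADO_PLUS = PureWindowsPath("e:/ado/plus")


def ado_destination(filename: str) -> PureWindowsPath:
    """Return the conventional destination for a Stata module file.

    Hypothetical helper: by convention the file goes in a subfolder
    named after the first character of its name (lowercased here).
    """
    if not filename:
        raise ValueError("filename must not be empty")
    first_letter = filename[0].lower()
    return ADO_PLUS / first_letter / filename


if __name__ == "__main__":
    # Example file name chosen for illustration only.
    print(ado_destination("estout.ado"))
```

Because the convention is optional, placing all files directly in e:\ado\plus should also work, as noted above.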
We are working out financial models for how users will pay for having their data sets hosted. If you do have a data set you would like us to host, please see Request for UWDC data set hosting.
UWDC systems are currently aligned with NIST 800-171 and HIPAA. Note that we use the terminology “aligned” rather than “compliant” because we have not undergone external audits. However, we have had data use agreements authorized that are aligned with those standards.
Each data use agreement will have a number of detailed and specified controls, rather than blanket statements such as “must be HIPAA compliant.” We review each control to become compliant with the data use agreement rather than a specific standard.
There may be fees for specific uses of the UWDC. These are still being worked out and this FAQ will be updated when the fee structure is finalized.
You may get instructions from UWDC support personnel to “logout and log back in.”
To do this:
- Locate the Enclave server’s Windows task bar.
- Right-click the Windows start button.
- Click Shut down or sign out > Sign out.
This is different from just disconnecting and reconnecting your remote desktop session.