Update set-up-directories.md
kbeutel committed Feb 21, 2024
1 parent 9401b3e commit 88b75c6
Showing 1 changed file with 8 additions and 18 deletions: GetStarted/set-up-directories.md

# Set Up Directories

Create a directory in which your project will be stored (this main directory will be referred to as **[ProjectDir]** for the rest of the documentation). Inside it, create two sub-directories named **00src** and **data**.
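If you prefer to script this step, the layout can be created with a short Python sketch like the one below. The function name `set_up_project` is hypothetical and not part of Genvisis; it simply creates **[ProjectDir]**, a source sub-directory (default **00src**), and **data**, and is safe to rerun.

```python
from pathlib import Path


def set_up_project(project_dir: str, src_name: str = "00src") -> Path:
    """Create [ProjectDir] with its source-data and data/ sub-directories.

    src_name is configurable because the source directory can be named
    anything; "data" is fixed because Genvisis expects that name.
    """
    root = Path(project_dir)
    # parents=True creates missing parents; exist_ok=True makes reruns harmless.
    (root / src_name).mkdir(parents=True, exist_ok=True)
    (root / "data").mkdir(parents=True, exist_ok=True)
    return root
```

For example, `set_up_project("/path/to/ProjectDir")` creates `/path/to/ProjectDir/00src` and `/path/to/ProjectDir/data`.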

### **00src/**
This sub-directory stores your raw/source data files: GenomeStudio Final Report files (**.csv** or **.csv.gz** format) for Illumina data, or **.CEL** files for Affymetrix data. It can be named whatever you wish; we will use **00src/** throughout this documentation. No other files should be included in this directory.
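Because the source directory should hold only source data files, a quick check before running Genvisis can catch stray files. This is a minimal sketch; `unexpected_files` is a hypothetical helper, and treating extensions case-insensitively (so **.cel** passes as well as **.CEL**) is an assumption.

```python
from pathlib import Path

# Extensions named in this guide: GenomeStudio Final Reports (.csv, .csv.gz)
# for Illumina, .CEL for Affymetrix. Case-insensitive matching is an
# assumption of this sketch, not documented Genvisis behaviour.
ALLOWED_SUFFIXES = (".csv", ".csv.gz", ".cel")


def unexpected_files(src_dir: str) -> list[str]:
    """Return names of entries in src_dir that are not source data files."""
    bad = []
    for entry in sorted(Path(src_dir).iterdir()):
        name = entry.name.lower()
        # Sub-directories and files with any other extension are flagged.
        if not entry.is_file() or not name.endswith(ALLOWED_SUFFIXES):
            bad.append(entry.name)
    return bad
```

An empty list means the directory contains only files with the expected extensions.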

### **data/**
This sub-directory stores several required and optional files. It must be named **data/** for Genvisis to work properly.

Save your **[pedigree.txt](../#/documentation/GetStarted--set-up-pedigree-and-linker)** and **[linker.txt](../#/documentation/GetStarted--set-up-pedigree-and-linker)** files in this sub-directory.

Save your batch file (optional) in this sub-directory. If your samples were genotyped in batches, a batch file allows you to specify the groups in which they were genotyped. Genvisis can then detect whether a batch shows batch effects significant enough that the batch should be reclustered by itself. The batch file is a **.txt** file with no header that lists the sample ID (the same ID used in the **DNA** column of **linker.txt**) in the first column and an ID for its batch in the second column. For example:
```
ID001 Batch1
ID002 Batch1
...
ID005 Batch3
ID006 Batch3
```
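A batch file in this format can be sanity-checked with a small parser before you hand it to Genvisis. This sketch assumes the two columns are separated by whitespace (spaces or tabs) and, as an extra check of its own, rejects a sample ID listed twice; `read_batch_file` is a hypothetical helper, not a Genvisis function.

```python
from collections import defaultdict


def read_batch_file(path: str) -> dict[str, list[str]]:
    """Group sample IDs by batch from a two-column, headerless batch file.

    Columns are split on any whitespace, which accepts both spaces and
    tabs; the exact delimiter Genvisis expects is an assumption here.
    """
    batches = defaultdict(list)
    seen = set()
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # tolerate blank lines
            if len(fields) != 2:
                raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
            sample_id, batch_id = fields
            if sample_id in seen:
                raise ValueError(f"line {line_no}: duplicate sample ID {sample_id}")
            seen.add(sample_id)
            batches[batch_id].append(sample_id)
    return dict(batches)
```

The returned mapping makes it easy to confirm that each batch contains the samples you expect.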

Save your **marker subset file** (optional) in this sub-directory. If your source data files do not contain every marker present in the manifest, or if they contain markers that are not in the manifest, use this file to specify the markers shared between the manifest and the source files (i.e., the markers that Genvisis should use). The marker subset file is a **.txt** file with no header that lists the marker names in a single column, one marker per line.
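If you already have the marker names from the manifest and from the source files, the subset file can be written as their intersection. In this sketch, `write_marker_subset` is a hypothetical helper; how the two name lists are extracted is left out, and keeping manifest order is an arbitrary choice.

```python
def write_marker_subset(manifest_markers, source_markers, out_path: str) -> int:
    """Write markers present in both the manifest and the source files.

    Output is one marker name per line with no header. Manifest order is
    kept and repeated names are written once. Returns the count written.
    """
    in_source = set(source_markers)
    shared = []
    seen = set()
    for name in manifest_markers:
        if name in in_source and name not in seen:
            seen.add(name)
            shared.append(name)
    with open(out_path, "w") as handle:
        for name in shared:
            handle.write(name + "\n")
    return len(shared)
```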

### **Manifest** (Illumina data only)
For Illumina data, store the manifest file (the **.csv** version with probe sequences) in your **[ProjectDir]**. The manifest file unlocks many important features; if a manifest is not available or does not contain probe sequences, a GenomeStudio **SNP_Map.csv** file (which is created alongside the final reports) can be used instead.
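To decide between the manifest and **SNP_Map.csv**, you can check whether the manifest's column header includes a probe-sequence column. This sketch assumes that column is named `AlleleA_ProbeSeq`, the name commonly used in Illumina manifest .csv files, and that the header appears within the first 50 lines; verify both against your own manifest. `has_probe_sequences` is a hypothetical helper.

```python
from pathlib import Path

# Assumed probe-sequence column name; check it against your own manifest.
PROBE_COLUMN = "AlleleA_ProbeSeq"


def has_probe_sequences(manifest: Path, max_lines: int = 50) -> bool:
    """Return True if a line near the top lists PROBE_COLUMN as a CSV field."""
    with manifest.open(errors="replace") as handle:
        for i, line in enumerate(handle):
            if i >= max_lines:
                break
            # Strip the line ending so a final column name still matches.
            if PROBE_COLUMN in line.rstrip("\r\n").split(","):
                return True
    return False
```

If this returns False, the GenomeStudio **SNP_Map.csv** file is the fallback described above.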
