I need to make a bar chart to show the abundance of specific genes in samples.
In my large Excel file, each row is a different sample, and each sample has a different number of genes, so counting by column isn't practical. The abundance would be Number of Genes/Number of Samples.
Of course, I did consider a simple count(x) or contains(x), but I remembered that some samples have duplicated genes, and that could throw the calculation off.
So I want to first make a table where I can see how many times each gene appears in each sample:
| Sample | Gene 1 | Gene 2 | Gene 3 |
|---|---|---|---|
| S1 | 1 | 1 | 0 |
| S2 | 2 | 0 | 0 |
| S3 | 1 | 1 | 1 |
In other words: how can I count how many times gene X (from a list of genes Y) appears in each sample row and combine the counts into one table/dataframe?
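
For concreteness, here is a minimal pandas sketch of the kind of table described above. It rests on assumptions not stated in the post: the sheet is taken to have the sample ID in the first column and gene names in the remaining cells of each row, with shorter rows padded by empty cells. The `raw` frame, its column names `c1`..`c3`, and `genes_of_interest` are hypothetical stand-ins for the real Excel data (which would normally come from `pd.read_excel`). The last step assumes "abundance" means the share of samples carrying a gene at least once, which is one way to keep duplicates from inflating the result.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: first column is the sample ID,
# remaining cells hold gene names, with None where a row has fewer genes.
raw = pd.DataFrame(
    [
        ["S1", "Gene 1", "Gene 2", None],
        ["S2", "Gene 1", "Gene 1", None],   # duplicated gene in one sample
        ["S3", "Gene 1", "Gene 2", "Gene 3"],
    ],
    columns=["Sample", "c1", "c2", "c3"],
)

# Hypothetical list of genes to count (the "list of genes Y").
genes_of_interest = ["Gene 1", "Gene 2", "Gene 3"]

# Reshape to long form: one row per (sample, gene) occurrence.
long = raw.melt(id_vars="Sample", value_name="Gene").dropna(subset=["Gene"])
long = long[long["Gene"].isin(genes_of_interest)]

# Count occurrences per sample and gene; reindex so that samples or genes
# with no hits still appear, filled with 0.
counts = pd.crosstab(long["Sample"], long["Gene"]).reindex(
    index=raw["Sample"], columns=genes_of_interest, fill_value=0
)
print(counts)

# Assumed definition of abundance: fraction of samples in which the gene
# occurs at least once, so duplicates within a sample count only once.
abundance = (counts > 0).mean()
print(abundance)
```

From there, `abundance.plot.bar()` would give a bar chart, provided matplotlib is installed.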