Anderson, S.R., Silliman, K., Barbero, L., Gomez, F.A., Stauffer, B.A., Schnetzer A., Kelble, C.R., and Thompson, L.R. Assessing the effects of warming and carbonate chemistry parameters on marine microbes in the Gulf of Mexico through basin scale DNA metabarcoding.
This markdown file describes code used to process 16S DNA samples collected on the fourth Gulf of Mexico Ecosystems and Carbon Cycle (GOMECC-4) cruise that sailed across the entire Gulf of Mexico basin for 40 days in summer-fall of 2021. A suite of metadata, including variables related to carbonate chemistry, were collected simultaneously with DNA samples, allowing us to explore environmental correlates of diverse microbial groups. Samples were collected along 16 inshore-offshore transects and up to three depths per site, reflecting the surface, deep chlorophyll maximum (DCM), and near bottom.
The analysis was performed in R version 4.3.1. All
input files needed to run the code are available in the
data-input
folder on GitHub. ASV and count files (.qza)
were generated using Tourmaline, which
uses a Snakemake workflow and wraps QIIME 2 and DADA2.
library(tidyverse)
library(vegan)
library(qiime2R)
library(phyloseq)
library(reshape2)
library(fantaxtic)
library(RColorBrewer)
library(microbiome)
library(factoextra)
library(ggridges)
library(microeco)
library(file2meco)
library(MASS)
library(sjPlot)
library(coefplot)
library(ggpubr)
library(treemap)
library(rcartocolor)
library(indicspecies)
library(performance)
library(dplyr)
library(sjstats)
library(lmtest)
library(ranacapa)
Load 16S count and taxonomy files produced from Tourmaline.
# Load ASV count table
table <- read_qza(file = "/Users/seananderson/Gomecc-4/table-16S-merge.qza")
count_tab <- table$data %>%
as.data.frame() # Convert to data frame
# write.csv(count_tab, 'Count16S_all.csv',row.names=T)
# Load taxonomy file
taxonomy <- read_qza(file="/Users/seananderson/Gomecc-4/taxonomy-16S-merge.qza")
tax_tab_16S <- taxonomy$data %>% # Convert to data frame, tab separate and rename taxa levels, and remove row with confidence values
as.data.frame() %>%
separate(Taxon, sep = ";", c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")) %>%
column_to_rownames("Feature.ID") %>%
dplyr::select(-Confidence)
tax_tab_16S$Kingdom <- gsub("^.{0,3}", "", tax_tab_16S$Kingdom) # Clean up the taxonomy names for each level
tax_tab_16S$Phylum <- gsub("^.{0,4}", "", tax_tab_16S$Phylum)
tax_tab_16S$Class <- gsub("^.{0,4}", "", tax_tab_16S$Class)
tax_tab_16S$Order <- gsub("^.{0,4}", "", tax_tab_16S$Order)
tax_tab_16S$Family <- gsub("^.{0,4}", "", tax_tab_16S$Family)
tax_tab_16S$Genus <- gsub("^.{0,4}", "", tax_tab_16S$Genus)
tax_tab_16S$Species<- gsub("^.{0,4}", "", tax_tab_16S$Species)
#write.csv(tax_tab_16S, "Taxonomy16S_all.csv",row.names=T
Export the .csv count and taxonomy files and manually add a new column for 16S functional groups to the taxonomy file. These new .csv files will be uploaded again for downstream processing.
tax_new_16S = read.csv(file = "Taxonomy16S_all.csv.gz", header = T, row.names = NULL,
check.names = F, fileEncoding = "UTF-8-BOM") # File compressed to reduce size
count_new_16S = read.csv(file = "Count16S_all.csv.gz", header = T, row.names = NULL,
check.names = F, fileEncoding = "UTF-8-BOM") # File compressed to reduce size
count_new_16S = count_new_16S[order(match(count_new_16S$ASV, tax_new_16S$ASV)), ] # Match order of ASVs for both files
rownames(count_new_16S) <- count_new_16S$ASV # Rename row names
count_new_16S <- count_new_16S[-c(1)]
rownames(tax_new_16S) <- tax_new_16S$ASV # Rename row names to match count file
tax_new_16S <- tax_new_16S[-c(1)]
Load in metadata.
sample_info_tab <- read.csv("metadata_Aug2023.csv", header = T, row.names = NULL,
check.names = F, fileEncoding = "UTF-8-BOM")
row.names(sample_info_tab) <- sample_info_tab[, 1]
sample_info_tab <- sample_info_tab[, -1]
sample_info_tab$dic = as.numeric(sample_info_tab$dic)
sample_info_tab$carbonate = as.numeric(sample_info_tab$carbonate)
Merge all files into a phyloseq object, rename ASVs to be sequential, remove unwanted groups from the 16S dataset, and add an “unassigned” label to the lowest annotated level for easier interpretation of taxonomy.
ps1 <- phyloseq(tax_table(as.matrix(tax_new_16S)), otu_table(count_new_16S, taxa_are_rows = T),
sample_data(sample_info_tab)) # Create phyloseq object
taxa_names(ps1) <- paste0("bASV", seq(ntaxa(ps1))) # Rename ASVs to be sequential
ps_new_16S = subset_taxa(ps1, Order != "Chloroplast" | is.na(Order)) # The following are groups we want to remove
ps_new_16S = subset_taxa(ps_new_16S, Family != "Mitochondria" | is.na(Family))
ps_new_16S = subset_taxa(ps_new_16S, Kingdom != "Eukaryota" | is.na(Kingdom))
ps_new_16S <- subset_taxa(ps_new_16S, Phylum != "Unassigned", Prune = T)
ps_new_16S = name_na_taxa(ps_new_16S) # Add an 'unassigned' label to lowest annotation
Remove the filtered seawater and Milli-Q controls from the dataset for now. Subset to only include samples with >5,000 sequence reads.
These are 16S ASVs that were only observed once over all samples.
Estimate the mean, minimum, and maximum sequencing read counts in the 16S dataset after these curating steps.
Plot rarefaction curves for 16S samples, faceting by categorical depth (surface, DCM, and near bottom) and position on the continental shelf (< 200 m) vs. in open ocean regions (> 200 m).
rare_16S <- suppressWarnings(ggrare(ps_filt_16S, step = 100, plot = FALSE, parallel = FALSE,
se = FALSE))
rare_16S$data$depth_category <- factor(rare_16S$data$depth_category, levels = c("Surface",
"DCM", "Deep"))
rare_16S + theme(legend.position = "none") + theme_bw() + theme(legend.position = "right") +
facet_wrap(~depth_category + distance, scales = "free_y")
Perform hierarchical clustering based on Aitchison distances. This allowed us to better spatially group 16S samples with depth and their location on the shelf vs. open ocean.
ps_clr_16S <- microbiome::transform(ps_rare_16S, "clr") # Center log transform rarefied data
euc_16S = phyloseq::distance(ps_clr_16S, method = "euclidean") # Calculate Aitchison distance
euc.table <- as.matrix(dist(euc_16S)) # Convert to matrix
Cluster samples (Ward’s method) and plot average silhouette widths to determine optimal number of clusters.
spe.ward <- hclust(euc_16S, method = "ward.D2") # Hierarchical clustering with Ward's method
fviz_nbclust(euc.table, hcut, method = "silhouette", iter.max = 30)
# ggsave(filename = '16S_cluster_silhouette_v2.eps', plot = last_plot(), device
# = 'eps', path = NULL, scale = 1, width = 5, height = 4, dpi = 150)
Three clusters were found to be optimal. Similar clustering for 18S samples.
Cut the data into the three clusters based on composition.
These clusters have already been manually added into the metadata file.
## sub_grp
## 1 2 3
## 274 86 105
Plot the type of samples that were found within 16S (and 18S) sample clusters. Clusters largely reflect depth in the water column on the shelf and in the open GOM.
spatial <- read.csv("sample_info.csv", header = T, row.names = NULL, check.names = F,
fileEncoding = "UTF-8-BOM") # Load sample sheet
p <- ggplot(spatial, aes(x = Type, y = Count, fill = Cluster))
p$data$Type <- factor(p$data$Type, levels = c("Shelf_Surface", "Shelf_DCM", "Shelf_Bottom",
"Open_Surface", "Open_DCM", "Open_Bottom"))
p + geom_col(show.legend = FALSE, colour = "black") + theme_bw() + facet_grid(Cluster ~
Group) + theme(strip.text.x = element_text(margin = margin(2, 0, 2, 0))) + theme(axis.text.x = element_text(angle = 45,
hjust = 1, color = "black")) + scale_y_continuous(expand = c(0, 0), limits = c(0,
200)) + labs(y = "Number of DNA samples") + scale_fill_manual(values = c("#BB5566",
"#DDAA33", "#004488")) + theme(axis.title.x = element_blank()) + theme(text = element_text(size = 9))
# ggsave(filename = 'Sample_clusters_spatial_v2.pdf', plot = last_plot(),
# device = 'pdf', path = NULL, scale = 1, width = 5, height = 5, dpi = 150)
Cluster 1 = photic zone; Cluster 2 = DCM; Cluster 3 = aphotic zone.
Three samples were unexpectedly grouped (e.g., mesopelagic sample in Cluster 1), and so, these three were removed from downstream analysis. Same three samples as in the 18S dataset.
Plot the distribution of 16S samples in each cluster by absolute depth.
metadata <- as(sample_data(ps_subset_16S), "data.frame") # Subset metadata
ggplot(metadata, aes(x = depth_meters, y = cluster_16S, fill = cluster_16S)) + geom_density_ridges(rel_min_height = 0.005,
scale = 1, jittered_points = TRUE, point_size = 2, alpha = 0.5, point_alpha = 1,
point_shape = 21) + scale_fill_manual(values = c("#BB5566", "#DDAA33", "#004488")) +
theme(axis.title.y = element_blank()) + scale_x_continuous(name = "Depth (m)",
breaks = seq(0, 3500, 500)) + theme_bw()
Create new distance matrix after removing three samples.
ps_clr2_16S <- microbiome::transform(ps_subset_16S, "clr") # Center log transform the data
euc_16S = phyloseq::distance(ps_clr2_16S, method = "euclidean") # Aitchison distance
euc.table <- as.matrix(dist(euc_16S)) # Convert to matrix
Run PERMANOVA to explore significant changes in 16S community composition between transects, categorical depths, or position on the shelf vs. open ocean.
metadata$distance = as.factor(metadata$distance) # Convert to factor
metadata$region = as.factor(metadata$region) # Convert to factor
metadata$depth_category = as.factor(metadata$depth_category) # Convert to factor
adonis2(phyloseq::distance(ps_subset_16S, method = "euclidean") ~ distance, data = metadata,
perm = 9999) # Run with 9999 permutations
adonis2(phyloseq::distance(ps_subset_16S, method = "euclidean") ~ region, data = metadata,
perm = 9999)
adonis2(phyloseq::distance(ps_subset_16S, method = "euclidean") ~ depth_category,
data = metadata, perm = 9999)
Generate a PCoA plot and color samples by cluster.
ordu = ordinate(ps_subset_16S, "PCoA", distance = euc_16S)
p = plot_ordination(ps_subset_16S, ordu, color = "cluster_16S")
p$data$cluster_16S <- factor(p$data$cluster_16S, levels = c("Cluster 1", "Cluster 2",
"Cluster 3"))
p + theme_bw() + scale_fill_manual(values = c("#BB5566", "#DDAA33", "#004488")) +
geom_point(aes(fill = cluster_16S), size = 5, shape = 21, colour = "black") +
theme(text = element_text(size = 14))
Estimate richness (# of ASVs) and Shannon diversity index between clusters for 16S samples. Use Wilcoxon tests to determine significant differences in these measures between clusters.
rich_16S <- estimate_richness(ps_subset_16S, measures = c("Observed", "Shannon")) # Estimate richness
clus = sample_data(ps_subset_16S)$cluster_16S # Subset out cluster
region = sample_data(ps_subset_16S)$region # Subset out region
rich_all <- data.frame(rich_16S, region, clus) # Merge richness estimates with cluster and region
df2 = melt(rich_all) # Melt the data format for plotting
levels(df2$variable)[match("Observed", levels(df2$variable))] <- "# of 16S ASVs" # Change label
levels(df2$variable)[match("Shannon", levels(df2$variable))] <- "Shannon diversity index"
Plot richness and diversity in each cluster.
p <- ggplot(df2, aes(x = factor(clus), y = value, fill = clus))
p + geom_boxplot(alpha = 1, outlier.shape = NA, color = "black") + theme_bw() + theme(text = element_text(size = 14)) +
ylab("Diversity values") + theme(legend.position = "right") + scale_fill_manual(values = c("#BB5566",
"#DDAA33", "#004488")) + geom_point(aes(fill = clus), size = 5, shape = 21, colour = "black",
position = position_jitterdodge()) + theme(axis.text.x = element_text(angle = 45,
hjust = 1, color = "black")) + theme(axis.title.x = element_blank()) + stat_compare_means(method = "wilcox.test",
comparisons = list(c("Cluster 1", "Cluster 2"), c("Cluster 1", "Cluster 3"),
c("Cluster 2", "Cluster 3")), label = "p.signif", exact = TRUE) + facet_wrap(~variable,
scales = "free_y")
# ggsave(filename = 'Diversity_16S_cluster_final.pdf', plot = last_plot(),
# device = 'pdf', path = NULL, scale = 1, width = 9, height =5, dpi = 150)
Plot alpha diversity metrics within each cluster and with respect to sampling transects. Add regression lines (loess curves) to better observe spatial patterns.
p <- ggplot(df2, aes(x = factor(region), y = value, fill = region))
p$data$region <- factor(p$data$region, levels = c("27N", "FLSTRAITS", "CAPECORAL",
"TAMPA", "PANAMACITY", "PENSACOLA", "LA", "GALVESTON", "PAISNP", "BROWNSVILLE",
"TAMPICO", "VERACRUZ", "CAMPECHE", "MERIDA", "YUCATAN", "CATOCHE", "CANCUN")) # Set the transect order
p + geom_boxplot(alpha = 0.5, outlier.shape = NA, color = "black") + theme_bw() +
geom_smooth(method = "loess", se = TRUE, color = "black", aes(group = 1)) + theme(text = element_text(size = 14)) +
ylab("Diversity values") + theme(legend.position = "right") + scale_fill_manual(values = mycolors) +
geom_point(aes(fill = region), size = 3, shape = 21, colour = "black", position = position_jitterdodge(),
show.legend = FALSE) + theme(axis.text.x = element_text(angle = 45, hjust = 1,
color = "black")) + theme(axis.title.x = element_blank()) + facet_wrap(clus ~
variable, scales = "free_y")
Display 16S functional groups in each cluster. The percent of each functional group represents the mean proportion of reads attributed to that group.
Plot mean 16S relative abundance (%) stacked bar plots for each cluster and transect on GOMECC-4. We focused on the top 12 most relatively abundant Bacteria and Archaea at the order level across all samples. Less abundant organisms were grouped into an “others” category. Taxonomy was assigned to ASVs with SILVA (Version 138.1).
tax_table(ps_subset_16S) <- tax_table(ps_subset_16S)[,2:8] # Subset out the functional category column and focus on taxonomy levels
barplot <- ps_subset_16S %>%
tax_glom(taxrank = "Order", NArm=FALSE) %>% # Agglomerate to order level
transform_sample_counts(function(OTU) 100* OTU/sum(OTU)) %>% # Transform to relative abundance
psmelt() %>% # Melt data
group_by(region,Order,cluster_16S) %>% # Group by cluster and transect
summarise_at("Abundance", .funs = mean) # Summarize at the mean
focus <- c("SAR11_clade", "Synechococcales", "Marinimicrobia_(SAR406_clade)", "Nitrosopumilales", "SAR86_clade","Flavobacteriales","Actinomarinales","Marine_Group_II","Rhodospirillales","SAR324_clade(Marine_group_B)","Thiomicrospirales","Puniceispirillales") # Focus on top 12 groups
barplot$Order <- ifelse(barplot$Order %in% focus, barplot$Order, "Others") # Others category for plotting
barplot_16S = barplot # Rename
barplot_16S$Order<- as.character(barplot_16S$Order) # Convert to character
p <- ggplot(data=barplot_16S, aes(x=region, y=Abundance, fill=Order))
p$data$Order <- factor(p$data$Order, levels = c("Others","Puniceispirillales","Thiomicrospirales","SAR324_clade(Marine_group_B)","Rhodospirillales","Marine_Group_II","Actinomarinales","Flavobacteriales", "SAR86_clade","Nitrosopumilales","Marinimicrobia_(SAR406_clade)", "Synechococcales","SAR11_clade" )) # Set order of groups in the plot
p$data$region <- factor(p$data$region, levels = c("27N", "FLSTRAITS","CAPECORAL", "TAMPA", "PANAMACITY", "PENSACOLA", "LA", "GALVESTON", "PAISNP", "BROWNSVILLE", "TAMPICO", "VERACRUZ", "CAMPECHE", "MERIDA", "YUCATAN", "CATOCHE", "CANCUN")) # Set order of transects
p + geom_bar(aes(), stat="identity", position="fill", width = 0.9)+
scale_y_continuous(expand = c(0, 0))+ geom_hline(yintercept=0) + theme_bw()+
scale_fill_manual(values= rev(c("#88CCEE", "#CC6677", "#44AA99" , "#882255" , "#DDCC77","#332288" , "#117733","#999933" , "#AA4499","#F5793A","#F7CDA4", "#A5CFCC","#757575")))+ theme(axis.text.x=element_text(angle=45,vjust =1, hjust=1)) +
theme(legend.position="right") + theme(text = element_text(size = 12)) + guides(fill=guide_legend(nrow=14, ncol=1)) +
facet_wrap(~cluster_16S,scales="free_x") +labs(y = "Relative abundance (%)")
#ggsave(filename = "16S_stacked_class.pdf", plot = last_plot(), device = "pdf", path = NULL, scale = 1, width = 14, height = 5, dpi = 150)
Cyanobacteria are a major group in the GOM and species are known to occupy distinct environmental niches. Here, we look at the top genus level groups in cyanobacteria in the photic zone.
Note that strain level information for cyanobacteria, including Prochlorococcus and Synechococcus, is likely incorrect in the SILVA 138.1 database. As a result, we identify these groups to genus level to be conservative in final plots, realizing that there is ecotype level variation that we cannot infer with this data.
ps_clus1 = subset_samples(ps_subset_16S, cluster_16S == "Cluster 1") # Subset to Cluster 1
ps_syne = subset_taxa(ps_clus1, Order == "Synechococcales") # Subset to cyanobacteria
dataset <- phyloseq2meco(ps_syne) # Convert phyloseq to microeco object for plotting in that package
t1 <- trans_abund$new(dataset = dataset, taxrank = "Genus", ntaxa = 4, groupmean = "region") # Subset to top 4 families and group them by transect
g1 <- t1$plot_bar(bar_type = "full", use_alluvium = FALSE, clustering = FALSE, barwidth = NULL,
xtext_size = 12, color_values = group, others_color = "#757575", order_x = c("27N",
"FLSTRAITS", "CAPECORAL", "TAMPA", "PANAMACITY", "PENSACOLA", "LA", "GALVESTON",
"PAISNP", "BROWNSVILLE", "TAMPICO", "VERACRUZ", "CAMPECHE", "MERIDA", "YUCATAN",
"CATOCHE", "CANCUN")) # Order transects
g1 + geom_col(colour = "black") + labs(y = "Relative abundance (%)") + theme(text = element_text(size = 12)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
Plot taxonomy tree maps for the major 16S groups to explore dynamics at multiple taxonomic levels. Prepare the data and subset to each cluster.
ps <- tax_glom(ps_subset_16S, "Genus", NArm = FALSE) # Agglomerate to genus level
x1 = speedyseq::psmelt(ps) # Melt the data
focus <- c("Proteobacteria", "Cyanobacteria", "Marinimicrobia_(SAR406_clade)", "Crenarchaeota",
"Bacteroidota", "Actinobacteriota") # Focus on top phylum level groups
x1$Phylum <- ifelse(x1$Phylum %in% focus, x1$Phylum, "Others") # Others category for plotting
x1$Phylum <- factor(x1$Phylum, levels = c("Actinobacteriota", "Bacteroidota", "Crenarchaeota",
"Cyanobacteria", "Marinimicrobia_(SAR406_clade)", "Proteobacteria", "Others")) # Set order for plotting
cluster1 <- x1[x1[["cluster_16S"]] == "Cluster 1", ] # Subset to Cluster 1
cluster2 <- x1[x1[["cluster_16S"]] == "Cluster 2", ] # Subset to Cluster 2
cluster3 <- x1[x1[["cluster_16S"]] == "Cluster 3", ] # Subset to Cluster 3
Plot 16S tree map - Cluster 1 (photic zone).
# pdf('Cluster1_treemap_16S_v2.pdf', width = 12, height = 4)
treemap(dtf = cluster1, title = "Cluster 1 (2-99 m)", algorithm = "pivotSize", border.lwds = c(2,
0.5, 0.1), border.col = c("black", "black", "black"), mapping = c(0, 0, 0), index = c("Order",
"Genus"), vSize = "Abundance", vColor = "Phylum", palette = "Dark2", type = "categorical",
fontsize.labels = c(10, 8), fontface.labels = c(2, 1), fontcolor.labels = c("white",
"white"), bg.labels = 255, position.legend = "bottom", align.labels = list(c("left",
"top"), c("center", "center")), inflate.labels = F, overlap.labels = 0.8,
lowerbound.cex.labels = 0.2, force.print.labels = F)
Plot 16S tree map - Cluster 2 (DCM).
# pdf('Cluster2_treemap_16S_v2.pdf', width = 12, height = 4)
treemap(dtf = cluster2, title = "Cluster 2 (2-124 m)", algorithm = "pivotSize", border.lwds = c(2,
0.5, 0.1), border.col = c("black", "black", "black"), mapping = c(0, 0, 0), index = c("Order",
"Genus"), vSize = "Abundance", vColor = "Phylum", palette = "Dark2", type = "categorical",
fontsize.labels = c(10, 8), fontface.labels = c(2, 1), fontcolor.labels = c("white",
"white"), bg.labels = 255, position.legend = "bottom", align.labels = list(c("left",
"top"), c("center", "center")), inflate.labels = F, overlap.labels = 0.8,
lowerbound.cex.labels = 0.2, force.print.labels = F)
Plot 16S tree map - Cluster 3 (aphotic zone).
# pdf('Cluster3_treemap_16S_v2.pdf', width = 12, height = 4)
treemap(dtf = cluster3, title = "Cluster 3 (135-3326 m)", algorithm = "pivotSize",
border.lwds = c(2, 0.5, 0.1), border.col = c("black", "black", "black"), mapping = c(0,
0, 0), index = c("Order", "Genus"), vSize = "Abundance", vColor = "Phylum",
palette = "Dark2", type = "categorical", fontsize.labels = c(10, 8), fontface.labels = c(2,
1), fontcolor.labels = c("white", "white"), bg.labels = 255, position.legend = "bottom",
align.labels = list(c("left", "top"), c("center", "center")), inflate.labels = F,
overlap.labels = 0.8, lowerbound.cex.labels = 0.2, force.print.labels = F)
Construct generalized linear models (GLMs) for each of the major 16S taxa at the order level, focusing on Cluster 1 samples (photic zone). We focused on Cluster 1 because most carbonate chemistry variables in Clusters 2-3 were collinear with each other. We also construct GLMs for the two major cyanobacterial groups, Prochlorococcus and Synechococcus.
First, for order level 16S groups.
order_16S <- tax_glom(ps_subset_16S, taxrank = "Order",NArm = FALSE) # Aggregate at the order level
order_16S = transform_sample_counts(order_16S, function(OTU) 100* OTU/sum(OTU)) # Conver to relative abundance
x1 = psmelt(order_16S) # Melt the data
cluster1 <- x1[x1[["cluster_16S"]]== "Cluster 1", ] # Subset to Cluster 1
glm.1 = cluster1 %>% # Group the data based on class and preserve sample replication for the models
dplyr::group_by(station,depth_category, Order,replicate) %>%
summarise(temp=mean(temp),
salinity= mean(salinity),
oxygen= mean(oxygen),
phosphate= mean(phosphate),
nitrate= mean(nitrate),
nitrite= mean(nitrite),
silicate= mean(silicate),
nh4= mean(nh4),
pH_corrected= mean(pH_corrected),
total_alkalinity= mean(total_alkalinity),
OMEGA_AR= mean(OMEGA_AR),
dic= mean(dic),
pCO2_corrected= mean(pCO2_corrected),
carbonate= mean(carbonate),
fluorescence= mean(fluorescence),
Abundance= mean(Abundance),
.groups = 'drop')
Now for the two genus level cyanobacteria groups.
genus_16S <- tax_glom(ps_subset_16S, taxrank = "Genus",NArm = FALSE) # Aggregate at the genus level
genus_16S = transform_sample_counts(genus_16S, function(OTU) 100* OTU/sum(OTU)) # Relative abundance
x2 = psmelt(genus_16S) # Melt the data
cluster1 <- x2[x2[["cluster_16S"]]== "Cluster 1", ] # Subset to Cluster 1
glm.2 = cluster1 %>% # Group the data based on class and preserve sample replication for the models
dplyr::group_by(station,depth_category, Genus,replicate) %>%
summarise(temp=mean(temp),
salinity= mean(salinity),
oxygen= mean(oxygen),
phosphate= mean(phosphate),
nitrate= mean(nitrate),
nitrite= mean(nitrite),
silicate= mean(silicate),
nh4= mean(nh4),
pH_corrected= mean(pH_corrected),
total_alkalinity= mean(total_alkalinity),
OMEGA_AR= mean(OMEGA_AR),
dic= mean(dic),
pCO2_corrected= mean(pCO2_corrected),
carbonate= mean(carbonate),
fluorescence= mean(fluorescence),
Abundance= mean(Abundance),
.groups = 'drop')
Select variables that are not collinear. Use Spearman correlations and variance inflation factors (VIFs) to find non-collinear variables.
df1 = glm.1[, c(5:19)] # Subset to include environmental variables
df1 = na.omit(df1) # Omit any rows that have NA values
correlations = cor(df1, method = "spearman") # Perform Spearman correlations to assess collinearity
# write.csv(correlations, 'Cluster1_corr_16S.csv',row.names=T) # Write .csv
# file
df1_filt = glm.1[, c(3, 5, 6, 7:9, 12, 13, 16, 20)] # Initial list of variables that were not collinear (Spearman < 0.7 or > -0.7)
df1_filt = na.omit(df1_filt) # Omit any rows that have NA values
model1 <- lm(Abundance ~ ., data = df1_filt) # Test model for VIFs
car::vif(model1) # Display VIFs from test model - VIFs should be < 10
## GVIF Df GVIF^(1/(2*Df))
## Order 1.000000 317 1.000000
## temp 2.817183 1 1.678446
## salinity 7.010084 1 2.647656
## oxygen 3.144437 1 1.773256
## phosphate 5.710115 1 2.389585
## nitrate 4.666002 1 2.160093
## nh4 1.565053 1 1.251021
## pH_corrected 5.197755 1 2.279859
## dic 9.105369 1 3.017510
# Repeat this process for the genus level groups
df1_filt2 = glm.2[, c(3, 5, 6, 7:9, 12, 13, 16, 20)]
df1_filt2 = na.omit(df1_filt2) # Omit any rows that have NA values
model1 <- lm(Abundance ~ ., data = df1_filt2)
car::vif(model1) # VIFs look good
## GVIF Df GVIF^(1/(2*Df))
## Genus 1.000000 762 1.000000
## temp 2.817183 1 1.678446
## salinity 7.010084 1 2.647656
## oxygen 3.144437 1 1.773256
## phosphate 5.710115 1 2.389585
## nitrate 4.666002 1 2.160093
## nh4 1.565053 1 1.251021
## pH_corrected 5.197755 1 2.279859
## dic 9.105369 1 3.017510
cluster2 <- x1[x1[["cluster_16S"]]== "Cluster 2", ] # Subset to Cluster 2
clus.2 = cluster2 %>% # Group the data based on class and preserve sample replication for the models
dplyr::group_by(station,depth_category, Class,replicate) %>%
summarise(temp=mean(temp),
salinity= mean(salinity),
oxygen= mean(oxygen),
phosphate= mean(phosphate),
nitrate= mean(nitrate),
nitrite= mean(nitrite),
silicate= mean(silicate),
nh4= mean(nh4),
pH_corrected= mean(pH_corrected),
total_alkalinity= mean(total_alkalinity),
OMEGA_AR= mean(OMEGA_AR),
dic= mean(dic),
pCO2_corrected= mean(pCO2_corrected),
carbonate= mean(carbonate),
fluorescence= mean(fluorescence),
Abundance= mean(Abundance),
.groups = 'drop')
cluster3 <- x1[x1[["cluster_16S"]]== "Cluster 3", ] # Subset to Cluster 3
clus.3 = cluster3 %>% # Group the data based on class and preserve sample replication for the models
dplyr::group_by(station,depth_category, Class,replicate) %>%
summarise(temp=mean(temp),
salinity= mean(salinity),
oxygen= mean(oxygen),
phosphate= mean(phosphate),
nitrate= mean(nitrate),
nitrite= mean(nitrite),
silicate= mean(silicate),
nh4= mean(nh4),
pH_corrected= mean(pH_corrected),
total_alkalinity= mean(total_alkalinity),
OMEGA_AR= mean(OMEGA_AR),
dic= mean(dic),
pCO2_corrected= mean(pCO2_corrected),
carbonate= mean(carbonate),
fluorescence= mean(fluorescence),
Abundance= mean(Abundance),
.groups = 'drop')
# Spearman Cluster 2
df2 = clus.2[,c(5:19)] # Subset to include environmental variables
df2 = na.omit(df2) # Omit any rows that have NA values
correlations2 = cor(df2, method = "spearman") # Perform Spearman correlations to assess collinearity
#write.csv(correlations2, "Cluster2_corr_16S.csv",row.names=T) # Write .csv file
# Spearman Cluster 3
df3 = clus.3[,c(5:19)] # Subset to include environmental variables
df3 = na.omit(df3) # Omit any rows that have NA values
correlations3 = cor(df3, method = "spearman") # Perform Spearman correlations to assess collinearity
#write.csv(correlations3, "Cluster3_corr_16S.csv",row.names=T) # Write .csv file
More variables in Clusters 2-3 are collinear, so stick with Cluster 1 models.
The following variables were selected for initial models: temperature, oxygen, salinity, phosphate, nitrate, ammonium, DIC, and pH.
Subset to the taxonomic groups of interest.
df_sar <- df1_filt[df1_filt[["Order"]] == "SAR11_clade", ]
df_sar86 <- df1_filt[df1_filt[["Order"]] == "SAR86_clade", ]
df_flavo <- df1_filt[df1_filt[["Order"]] == "Flavobacteriales", ]
df_synecho <- df1_filt2[df1_filt2[["Genus"]] == "Synechococcus_CC9902", ] # Synechococcus
df_prochlo <- df1_filt2[df1_filt2[["Genus"]] == "Prochlorococcus_MIT9313", ] # Prochlorococcus
Run model based on Poisson and negative binomial distribution. Compare models and select the better option using AIC values.
df_synecho$Abundance = round(df_synecho$Abundance) # Round abundance
model_synecho <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_synecho, family = poisson) # Poisson model
summary(model_synecho) # Summary
model_synecho_nb <- glm.nb(Abundance ~ temp + salinity + oxygen + phosphate + nh4 +
pH_corrected + dic + nitrate, data = df_synecho) # NB model
summary(model_synecho_nb) # Summary
compare_performance(model_synecho, model_synecho_nb, verbose = FALSE) # Compare the two models - NB has lower AIC
Use negative binomial for Synechococcus.
Select significant environmental variables to find the best model fit.
model_synecho_nb <- glm.nb(Abundance ~ temp + salinity + oxygen + phosphate + nh4 +
pH_corrected + dic + nitrate, data = df_synecho) # Run NB model again
summary(model_synecho_nb)
stepAIC(model_synecho_nb, direction = "both", k = 3.8415) # Stepwise AIC selection
model_synecho_sig <- glm.nb(Abundance ~ oxygen + nitrate + pH_corrected + dic, data = df_synecho) # Final model with significant factors
summary(model_synecho_sig) # Summary
Check model performance for Synechococcus.
performance::r2(model_synecho_sig) # Pseudo R2
check_overdispersion(model_synecho_sig) # Overdispersion check
check_zeroinflation(model_synecho_sig) # Zero-inflation check
model_performance(model_synecho_sig) # AIC, R2, RMSE, and other scores
check_residuals(model_synecho_sig) # Uniform distribution check
# check_model(model_synecho_sig,plot_type = 2) # All model performance plots
Split and train relative abundance data to test the fit of the final Synechococcus model. Confirm fit with Pearson correlation.
set.seed(1234) # Randomization
synecho_split <- df_synecho %>% # Split the data 80:20
split(if_else(runif(nrow(.)) <= 0.8, "train", "test"))
map_int(synecho_split, nrow)
model_synecho_train <- glm.nb(Abundance ~ oxygen + pH_corrected + nitrate + dic, data = synecho_split$train) # NB model with 80% of data
test = synecho_split$test
obs = test$Abundance
m_nb_pred <- predict(model_synecho_train, synecho_split$test, type = "response") %>% # Predict the other 20% and compare with real test values
tibble(family = "Negative Binomial", pred = .) %>%
bind_cols(obs)
p <- ggscatter(m_nb_pred, x = "pred", y = "...3", cor.method = "pearson", cor.coef =T, size = 3, add.params = list(color = "black", fill = "darkgray"), add = "reg.line", conf.int = TRUE, ylab = "Test relative abundance (%)", xlab = "Predicted relative abundance (%)",title = "Synechococcus GLM model fit") +
geom_point(fill="#732633",size = 5, shape = 21, colour = "black") + theme(legend.position = "right") + theme(text = element_text(size = 18)) + theme_bw()
p
Run through similar steps with Prochlorococcus to find best model.
df_prochlo$Abundance = round(df_prochlo$Abundance) # Round abundance
model_prochlo <- glm(Abundance ~ temp + oxygen + salinity + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_prochlo, family = poisson) # Poisson model
summary(model_prochlo) # Summary
model_prochlo_nb <- glm.nb(Abundance ~ temp + oxygen + salinity + phosphate + nh4 +
pH_corrected + dic + nitrate, data = df_prochlo) # NB model
summary(model_prochlo_nb)
compare_performance(model_prochlo, model_prochlo_nb, verbose = FALSE) # Compare the two models - NB has lower AIC
Use negative binomial for Prochlorococcus.
Select environmental variables for the final model.
model_prochlo_nb <- glm.nb(Abundance ~ temp + oxygen + salinity + phosphate + nh4 +
pH_corrected + dic + nitrate, data = df_prochlo) # Run NB model again
summary(model_prochlo_nb)
model_prochlo_sig <- glm.nb(Abundance ~ salinity + phosphate + pH_corrected + nitrate,
data = df_prochlo) # Final NB model with significant factors
summary(model_prochlo_sig)
Check model performance for Prochlorococcus.
performance::r2(model_prochlo_sig)
check_overdispersion(model_prochlo_sig)
check_zeroinflation(model_prochlo_sig)
check_residuals(model_prochlo_sig)
model_performance(model_prochlo_sig)
# check_model(model_prochlo_sig,plot_type=2)
Split and train relative abundance data to test the fit of the final Prochlorococcus model. Same steps as with Synechococcus, split data 80:20.
set.seed(1234) # Randomization
prochlo_split <- df_prochlo %>%
split(if_else(runif(nrow(.)) <= 0.8, "train", "test"))
map_int(prochlo_split, nrow)
model_prochlo_train <- glm.nb(Abundance ~ salinity + phosphate + pH_corrected + nitrate, data = prochlo_split$train) # NB model
test = prochlo_split$test
obs = test$Abundance
m_nb_pred <- predict(model_prochlo_train, prochlo_split$test, type = "response") %>% # Predict other 20% and compare with real values
tibble(family = "Negative Binomial", pred = .) %>%
bind_cols(obs)
p <- ggscatter(m_nb_pred, x = "pred", y = "...3", cor.method = "pearson", cor.coef =T, size = 3, add.params = list(color = "black", fill = "darkgray"), add = "reg.line", conf.int = TRUE, ylab = "Test relative abundance (%)", xlab = "Predicted relative abundance (%)",title = "Prochloroccus GLM model fit") +
geom_point(fill="#CC6677",size = 5, shape = 21, colour = "black") + theme(legend.position = "right") + theme(text = element_text(size = 18)) + theme_bw()
p
Run through similar steps with SAR11.
df_sar$Abundance = round(df_sar$Abundance) # Round abundance
model_sar <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_sar, family = poisson) # Poisson model
summary(model_sar)
model_sar_nb <- glm.nb(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_sar) # NB model
summary(model_sar_nb) # Summary
compare_performance(model_sar, model_sar_nb, verbose = FALSE) # Compare the two models - Poisson has lower AIC
Use Poisson model for SAR11.
Select final model with only significant variables.
model_sar <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_sar, family = poisson) # Run Poisson model again
summary(model_sar)
stepAIC(model_sar, direction = "both", k = 3.8415) # Stepwise AIC for factor selection
model_sar_sig <- glm(Abundance ~ temp + oxygen + dic + nh4, data = df_sar, family = poisson) # Final model
summary(model_sar_sig) # Summary
Check model performance for SAR11. Same checks as other groups.
performance::r2(model_sar_sig)
check_overdispersion(model_sar_sig)
check_zeroinflation(model_sar_sig)
check_residuals(model_sar_sig)
model_performance(model_sar_sig)
check_model(model_sar_sig, check = "pp_check", type = "density") # Predictive plot
Split and train relative abundance data to test the fit of the final SAR11 model.
set.seed(1234) # Randomization
sar_split <- df_sar %>% # Split data
split(if_else(runif(nrow(.)) <= 0.8, "train", "test"))
map_int(sar_split, nrow)
model_sar_train <- glm(Abundance ~ temp + oxygen + dic + nh4, data = sar_split$train,family=poisson) # Poisson model
test = sar_split$test
obs = test$Abundance
m_nb_pred <- predict(model_sar_train, sar_split$test, type = "response") %>% # Predict
tibble(family = "Poisson", pred = .) %>%
bind_cols(obs)
p <- ggscatter(m_nb_pred, x = "pred", y = "...3", cor.method = "pearson", cor.coef =T, size = 3, add.params = list(color = "black", fill = "darkgray"), add = "reg.line", conf.int = TRUE, ylab = "Test relative abundance (%)", xlab = "Predicted relative abundance (%)",title = "SAR11 clade GLM model fit") +
geom_point(fill="#88CCEE",size = 5, shape = 21, colour = "black") + theme(legend.position = "right") + theme(text = element_text(size = 18)) + theme_bw()
p
Repeat steps for SAR86.
df_sar86$Abundance = round(df_sar86$Abundance) # Round abundance
model_sar86 <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_sar86, family = poisson) # Poisson model
summary(model_sar86)
model_sar86_nb <- glm.nb(Abundance ~ temp + salinity + oxygen + phosphate + nh4 +
pH_corrected + dic + nitrate, data = df_sar86) # NB model
summary(model_sar86_nb)
compare_performance(model_sar86, model_sar86_nb, verbose = FALSE) # Compare the two models - Poisson has lower AIC
Use Poisson model for SAR86 as well.
Select final model with only significant variables.
model_sar86 <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_sar86, family = poisson) # Run Poisson model again
summary(model_sar86)
stepAIC(model_sar86, direction = "both", k = 3.8415) # Stepwise selection
model_sar86_sig <- glm(Abundance ~ temp + oxygen + nh4 + pH_corrected + dic, data = df_sar86,
family = poisson) # Final model
summary(model_sar86_sig)
Check model performance.
performance::r2(model_sar86_sig)
check_overdispersion(model_sar86_sig)
check_zeroinflation(model_sar86_sig)
check_residuals(model_sar86_sig)
model_performance(model_sar86_sig)
# check_model(model_sar86_sig,plot_type=2)
Split and train relative abundance data to test the fit of the final SAR86 model.
set.seed(1234) # Randomization
sar86_split <- df_sar86 %>%
split(if_else(runif(nrow(.)) <= 0.8, "train", "test"))
map_int(sar86_split, nrow)
model_sar86_train <- glm(Abundance ~ temp + oxygen + nh4 + pH_corrected + dic, data = sar86_split$train,
family = poisson) # Poisson model
test = sar86_split$test
obs = test$Abundance
m_nb_pred <- predict(model_sar86_train, sar86_split$test, type = "response") %>%
tibble(family = "Poisson", pred = .) %>%
bind_cols(obs)
p <- ggscatter(m_nb_pred, x = "pred", y = "...3", cor.method = "pearson", cor.coef = T,
point = "FALSE", add.params = list(color = "black", fill = "darkgray"), add = "reg.line",
conf.int = TRUE, ylab = "Test relative abundance (%)", xlab = "Predicted relative abundance (%)",
title = "SAR86 clade GLM model fit") + geom_point(fill = "#DDCC77", size = 5,
shape = 21, colour = "black", position = "jitter") + theme(legend.position = "right") +
theme(text = element_text(size = 18)) + theme_bw()
p
Repeat steps for Flavobacteriales.
df_flavo$Abundance = round(df_flavo$Abundance) # Round abundance
model_flavo <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_flavo, family = poisson) # Poisson model
summary(model_flavo) # Summary
model_flavo_nb <- glm.nb(Abundance ~ temp + salinity + oxygen + phosphate + nh4 +
pH_corrected + dic + nitrate, data = df_flavo) # NB model
summary(model_flavo_nb)
compare_performance(model_flavo, model_flavo_nb, verbose = FALSE) # Compare the two models - Poisson has lower AIC
Use Poisson model for Flavobacteriales.
Select final model with only significant variables.
model_flavo <- glm(Abundance ~ temp + salinity + oxygen + phosphate + nh4 + pH_corrected +
dic + nitrate, data = df_flavo, family = poisson) # Run Poisson model again
summary(model_flavo)
stepAIC(model_flavo, direction = "both", k = 3.8415) # Stepwise selection of factors
model_flavo_sig <- glm(Abundance ~ salinity + dic, data = df_flavo, family = poisson) # Final model
summary(model_flavo_sig)
Check model performance.
performance::r2(model_flavo_sig)
check_overdispersion(model_flavo_sig)
check_zeroinflation(model_flavo_sig)
model_performance(model_flavo_sig)
check_residuals(model_flavo_sig)
# check_model(model_flavo_sig,plot_type = 2)
Split and train relative abundance data to test the fit of the final Flavobacteriales model.
set.seed(1234) # Randomization
flavo_split <- df_flavo %>%
split(if_else(runif(nrow(.)) <= 0.8, "train", "test"))
map_int(flavo_split, nrow)
model_flavo_train <- glm(Abundance ~ salinity + dic, data = flavo_split$train, family = poisson) # Poisson model
test = flavo_split$test
obs = test$Abundance
m_nb_pred <- predict(model_flavo_train, flavo_split$test, type = "response") %>%
tibble(family = "Poisson", pred = .) %>%
bind_cols(obs)
p <- ggscatter(m_nb_pred, x = "pred", y = "...3", cor.method = "pearson", cor.coef = T,
point = "FALSE", add.params = list(color = "black", fill = "darkgray"), add = "reg.line",
conf.int = TRUE, ylab = "Test relative abundance (%)", xlab = "Predicted relative abundance (%)",
title = "Flavobacteriales GLM model fit") + geom_point(fill = "#332288", size = 5,
shape = 21, colour = "black", position = "jitter") + theme(legend.position = "right") +
theme(text = element_text(size = 18)) + theme_bw()
p
Model coefficients from all final 16S GLMs need to be scaled to plot them on one single plot.
df_synecho$oxygen <- scale(df_synecho$oxygen) # Scale model variables for Synechococcus
df_synecho$nitrate <- scale(df_synecho$nitrate)
df_synecho$dic <- scale(df_synecho$dic)
df_synecho$pH_corrected <- scale(df_synecho$pH_corrected)
df_prochlo$phosphate <- scale(df_prochlo$phosphate) # Scale model variables for Prochlorococcus
df_prochlo$pH_corrected <- scale(df_prochlo$pH_corrected)
df_prochlo$nitrate <- scale(df_prochlo$nitrate)
df_prochlo$salinity <- scale(df_prochlo$salinity)
df_sar$dic <- scale(df_sar$dic) # Scale model variables for SAR11
df_sar$nh4 <- scale(df_sar$nh4)
df_sar$oxygen <- scale(df_sar$oxygen)
df_sar$temp <- scale(df_sar$temp)
df_sar86$temp <- scale(df_sar86$temp) # Scale model variables for SAR86
df_sar86$oxygen <- scale(df_sar86$oxygen)
df_sar86$nh4 <- scale(df_sar86$nh4)
df_sar86$dic <- scale(df_sar86$dic)
df_sar86$pH_corrected <- scale(df_sar86$pH_corrected)
df_flavo$salinity <- scale(df_flavo$salinity) # Scale model variables for Flavobacteriales
df_flavo$dic <- scale(df_flavo$dic)
Run the final models again with scaled parameters.
model_Synechococcus <- glm.nb(Abundance ~ oxygen + pH_corrected + dic + nitrate,
data = df_synecho)
model_Prochlorococcus <- glm.nb(Abundance ~ nitrate + pH_corrected + phosphate +
salinity, data = df_prochlo)
model_SAR11 <- glm(Abundance ~ temp + oxygen + nh4 + dic, data = df_sar, family = poisson)
model_SAR86 <- glm(Abundance ~ temp + oxygen + nh4 + dic + pH_corrected, data = df_sar86,
family = poisson)
model_Flavobacteriales <- glm(Abundance ~ salinity + dic, data = df_flavo, family = poisson)
Plot the model coefficients.
p <- multiplot(model_Synechococcus, model_Prochlorococcus, model_SAR11, model_SAR86,
model_Flavobacteriales, intercept = FALSE, title = "Model Effect on 16S groups",
decreasing = F, shape = 16, pointSize = 4, sort = NULL, dodgeHeight = 0.5, outerCI = 2)
p$data$Coefficient <- factor(p$data$Coefficient, levels = c("temp", "oxygen", "salinity",
"nh4", "phosphate", "nitrate", "pH_corrected", "dic")) # Order variables on y-axis
p + theme(legend.position = "bottom") + xlim(-1.5, 1.5) + scale_color_manual(values = c("#332288",
"#CC6677", "#88CCEE", "#DDCC77", "#732633")) + theme_bw() + theme(text = element_text(size = 12))
# ggsave(filename = '16S_glm_coefficients_final.pdf', plot = last_plot(),
# device = 'pdf', path = NULL, scale = 1, width = 6, height = 5, dpi = 150)
Focus on certain variables from final 16S models, with an emphasis on carbonate chemistry parameters (DIC and pH) and temperature. In some cases, the variable was not significant to a given model, and so, the confidence interval and regression line were omitted.
# Synechococcus
dic_synecho = plot_model(model_synecho_sig, type = "pred", terms = c("dic"), colors = "#732633",
alpha = 0.3, dot.size = 1, dot.alpha = 1, show.data = T, jitter = 0.1)
fig1 <- dic_synecho + ylab("Predicted response") + xlab("DIC") + theme_bw() + geom_line(lwd = 0.5,
colour = "#732633", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig1)
temp_synecho = plot_model(model_synecho_nb, type = "pred", terms = c("temp"), colors = "#FFFFFF",
rawdata = TRUE, show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1,
ci.lvl = NA)
fig2 <- temp_synecho + ylab("Predicted response") + xlab("Temperature") + theme_bw() +
geom_line(lwd = 0, colour = "#FFFFFF") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig2)
pH_synecho = plot_model(model_synecho_sig, type = "pred", terms = c("pH_corrected"),
colors = "#732633", show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1,
dot.alpha = 1, jitter = 0.1)
fig3 <- pH_synecho + ylab("Predicted response") + xlab("pH") + theme_bw() + geom_line(lwd = 0.5,
colour = "#732633", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig3)
# Prochlorococcus
dic_prochlo = plot_model(model_prochlo_nb, type = "pred", terms = c("dic"), colors = "#FFFFFF",
alpha = 0.3, dot.size = 1, dot.alpha = 1, show.data = T, jitter = 0.1, ci.lvl = NA)
fig4 <- dic_prochlo + ylab("Predicted response") + xlab("DIC") + theme_bw() + geom_line(lwd = 0,
colour = "#FFFFFF") + theme(axis.text.x = element_text(angle = 45, vjust = 1,
hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig4)
temp_prochlo = plot_model(model_prochlo_nb, type = "pred", terms = c("temp"), colors = "#FFFFFF",
rawdata = TRUE, show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1,
ci.lvl = NA)
fig5 <- temp_prochlo + ylab("Predicted response") + xlab("Temperature") + theme_bw() +
geom_line(lwd = 0, colour = "#FFFFFF") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig5)
pH_prochlo = plot_model(model_prochlo_sig, type = "pred", terms = c("pH_corrected"),
colors = "#CC6677", show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1,
dot.alpha = 1, jitter = 0.1)
fig6 <- pH_prochlo + ylab("Predicted response") + xlab("pH") + theme_bw() + geom_line(lwd = 0.5,
colour = "#CC6677", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig6)
# SAR11
dic_sar = plot_model(model_sar_sig, type = "pred", terms = c("dic"), colors = "#88CCEE",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1)
fig7 <- dic_sar + ylab("Predicted response") + xlab("DIC") + theme_bw() + geom_line(lwd = 0.5,
colour = "#88CCEE", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig7)
temp_sar = plot_model(model_sar_sig, type = "pred", terms = c("temp"), colors = "#88CCEE",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1)
fig8 <- temp_sar + ylab("Predicted response") + xlab("Temperature") + theme_bw() +
geom_line(lwd = 0.5, colour = "#88CCEE", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig8)
pH_sar = plot_model(model_sar, type = "pred", terms = c("pH_corrected"), colors = "#FFFFFF",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, ci.lvl = NA, jitter = 0.1)
fig9 <- pH_sar + ylab("Predicted response") + xlab("pH") + theme_bw() + geom_line(lwd = 0,
colour = "#FFFFFF") + theme(axis.text.x = element_text(angle = 45, vjust = 1,
hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig9)
# SAR86
dic_sar86 = plot_model(model_sar86_sig, type = "pred", terms = c("dic"), colors = "#DDCC77",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1)
fig10 <- dic_sar86 + ylab("Predicted response") + xlab("DIC") + theme_bw() + geom_line(lwd = 0.5,
colour = "#DDCC77", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig10)
temp_sar86 = plot_model(model_sar86_sig, type = "pred", terms = c("temp"), colors = "#DDCC77",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1)
fig11 <- temp_sar86 + ylab("Predicted response") + xlab("Temperature") + theme_bw() +
geom_line(lwd = 0.5, colour = "#DDCC77", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig11)
pH_sar86 = plot_model(model_sar86_sig, type = "pred", terms = c("pH_corrected"),
colors = "#DDCC77", show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1,
jitter = 0.1)
fig12 <- pH_sar86 + ylab("Predicted response") + xlab("pH") + theme_bw() + geom_line(lwd = 0.5,
colour = "#DDCC77", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig12)
# Flavobacteriales
dic_flavo = plot_model(model_flavo_sig, type = "pred", terms = c("dic"), colors = "#332288",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1)
fig13 <- dic_flavo + ylab("Predicted response") + xlab("DIC") + theme_bw() + geom_line(lwd = 0.5,
colour = "#332288", linetype = "solid") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig13)
temp_flavo = plot_model(model_flavo, type = "pred", terms = c("temp"), colors = "#FFFFFF",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, jitter = 0.1, ci.lvl = NA)
fig14 <- temp_flavo + ylab("Predicted response") + xlab("Temperature") + theme_bw() +
geom_line(lwd = 0, colour = "#FFFFFF") + theme(axis.text.x = element_text(angle = 45,
vjust = 1, hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig14)
pH_flavo = plot_model(model_flavo, type = "pred", terms = c("pH_corrected"), colors = "#FFFFFF",
show.data = T, alpha = 0.3, dot.size = 1, dot.alpha = 1, ci.lvl = NA, jitter = 0.1)
fig15 <- pH_flavo + ylab("Predicted response") + xlab("pH") + theme_bw() + geom_line(lwd = 0,
colour = "#FFFFFF") + theme(axis.text.x = element_text(angle = 45, vjust = 1,
hjust = 1)) + theme(text = element_text(size = 12), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(colour = "black")) +
ggtitle("")
# plot(fig15)
ggarrange(fig1, fig2, fig3, fig4, fig5, fig6, fig7, fig8, fig9, fig10, fig11, fig12,
fig13, fig14, fig15, nrow = 5, ncol = 3) # Arrange all plots on single panel figure
Models for 16S groups were used to predict relative abundance at 135 GOMECC-4 sites at the surface (< 10 m). Data for 6 sites were not available at the surface. This resulted in 84 sites where DNA was not collected, allowing for an expanded view of microbial distribution in the Gulf of Mexico surface waters at this time.
gomecc_all <- read.csv("GOMECC-all-sites-final.csv", header = T, row.names = NULL,
check.names = F, fileEncoding = "UTF-8-BOM") # Load file of metadata from all surface sites
gomecc_all_synecho = gomecc_all[, c(9, 11, 13, 15)] # Subset to variables in final Synechococcus model
new_val_synecho = predict(model_synecho_sig, newdata = gomecc_all_synecho, type = "response") # Predict new values based on GLM
new_val_synecho = as.data.frame(new_val_synecho)
odv_synecho = cbind(new_val_synecho, gomecc_all[4:6])
gomecc_all_prochlo = gomecc_all[, c(8, 11, 15, 16)] # Subset to variables in final Prochlorococcus model
new_val_prochlo = predict(model_prochlo_sig, newdata = gomecc_all_prochlo, type = "response") # Predict new values based on GLM
new_val_prochlo = as.data.frame(new_val_prochlo)
odv_prochlo = cbind(new_val_prochlo, gomecc_all[4:6])
gomecc_all_sar11 = gomecc_all[, c(7, 9, 12:13)] # Subset to variables in final SAR11 model
new_val_sar11 = predict(model_sar_sig, newdata = gomecc_all_sar11, type = "response") # Predict new values based on GLM
new_val_sar11 = as.data.frame(new_val_sar11)
odv_sar11 = cbind(new_val_sar11, gomecc_all[4:6])
gomecc_all_sar86 = gomecc_all[, c(7, 9, 12:13, 15)] # Subset to variables in final SAR86 model
new_val_sar86 = predict(model_sar86_sig, newdata = gomecc_all_sar86, type = "response")
new_val_sar86 = as.data.frame(new_val_sar86)
odv_sar86 = cbind(new_val_sar86, gomecc_all[4:6])
gomecc_all_flavo = gomecc_all[, c(8, 13)] # Subset to variables in final Flavobacteriales model
new_val_flavo = predict(model_flavo_sig, newdata = gomecc_all_flavo, type = "response")
new_val_flavo = as.data.frame(new_val_flavo)
odv_flavo = cbind(new_val_flavo, gomecc_all[4:6])
all_odv_16S = cbind(odv_sar11, new_val_synecho, new_val_prochlo, new_val_sar86, new_val_flavo) # Combine new predicted values
# write.csv(all_odv_16S, 'ODV_16S_abund_v5.csv',row.names=T) # Write .csv file
This file with predicted values will be uploaded and analyzed in ODV (ODV files provided on GitHub).
Explore microbes at the ASV level that are indicative of different acidic conditions in the Gulf of Mexico at this time. As with the models, focus on samples from the photic zone (Cluster 1). We used the TA:DIC (total alkalinity:dissolved inorganic carbon) ratio as it is a good proxy of OA, indicating the buffering capacity of seawater. A lower TA:DIC = less buffered waters (more acidic), while a higher TA:DIC = more buffered waters.
The cutoff for TA:DIC was 1.16 and this was manually set by inspecting histograms. See the 18S workflow for the TA:DIC histogram and plot of TA:DIC vs. in situ pH values.
Run the core indicator analysis program using 16S ASVs.
ps_dsq <- subset_samples(ps_subset_16S, cluster_16S == "Cluster 1") # Subset to Cluster 1
ps_dsq <- subset_samples(ps_dsq, alk_dic_oa == "High" | alk_dic_oa == "Low") # Split samples based on high or low TA:DIC (see above)
ASV = otu_table(ps_dsq) # Subset ASVs
ASV = as.data.frame(ASV) # Convert to data frame
if (taxa_are_rows(ps_dsq)) {
ASV <- t(ASV)
} # Flip table for analysis
ASV = as.data.frame(ASV) # Convert to df again
data <- as(sample_data(ps_dsq), "data.frame") # Subset metadata
data = data[!is.na(data$alk_dic_oa), ] # Remove TA:DIC NA values
bin = data$alk_dic_oa # Bin based on low vs. high categories
inv = multipatt(ASV, bin, func = "r.g", control = how(nperm = 999)) # Core function using 999 permutations
summary(inv, alpha = 0.001) # Filter for significance
##
## Multilevel pattern analysis
## ---------------------------
##
## Association function: r.g
## Significance level (alpha): 0.001
##
## Total number of species: 41876
## Selected number of species: 364
## Number of species associated to 1 group: 364
##
## List of species associated to each combination:
##
## Group High #sps. 136
## stat p.value
## bASV34469 0.602 0.001 ***
## bASV25500 0.594 0.001 ***
## bASV38618 0.547 0.001 ***
## bASV2782 0.536 0.001 ***
## bASV15407 0.520 0.001 ***
## bASV19985 0.520 0.001 ***
## bASV64330 0.518 0.001 ***
## bASV33394 0.511 0.001 ***
## bASV59407 0.506 0.001 ***
## bASV36177 0.498 0.001 ***
## bASV51551 0.481 0.001 ***
## bASV57209 0.480 0.001 ***
## bASV14844 0.472 0.001 ***
## bASV63192 0.472 0.001 ***
## bASV861 0.469 0.001 ***
## bASV50572 0.457 0.001 ***
## bASV23115 0.452 0.001 ***
## bASV7705 0.449 0.001 ***
## bASV5199 0.447 0.001 ***
## bASV44906 0.440 0.001 ***
## bASV16174 0.437 0.001 ***
## bASV64489 0.432 0.001 ***
## bASV54460 0.428 0.001 ***
## bASV226 0.428 0.001 ***
## bASV54629 0.426 0.001 ***
## bASV10486 0.417 0.001 ***
## bASV24327 0.416 0.001 ***
## bASV5372 0.415 0.001 ***
## bASV39737 0.411 0.001 ***
## bASV53062 0.409 0.001 ***
## bASV49874 0.408 0.001 ***
## bASV1863 0.396 0.001 ***
## bASV44663 0.391 0.001 ***
## bASV56818 0.389 0.001 ***
## bASV16332 0.389 0.001 ***
## bASV64418 0.383 0.001 ***
## bASV7977 0.383 0.001 ***
## bASV20352 0.375 0.001 ***
## bASV13884 0.375 0.001 ***
## bASV5042 0.373 0.001 ***
## bASV35455 0.373 0.001 ***
## bASV568 0.372 0.001 ***
## bASV8852 0.370 0.001 ***
## bASV53318 0.365 0.001 ***
## bASV61792 0.363 0.001 ***
## bASV11961 0.363 0.001 ***
## bASV26915 0.360 0.001 ***
## bASV45849 0.355 0.001 ***
## bASV19343 0.350 0.001 ***
## bASV28302 0.346 0.001 ***
## bASV57962 0.345 0.001 ***
## bASV2543 0.345 0.001 ***
## bASV32865 0.344 0.001 ***
## bASV59673 0.340 0.001 ***
## bASV36253 0.338 0.001 ***
## bASV22426 0.332 0.001 ***
## bASV31852 0.330 0.001 ***
## bASV62363 0.326 0.001 ***
## bASV37687 0.325 0.001 ***
## bASV20058 0.325 0.001 ***
## bASV46630 0.323 0.001 ***
## bASV34644 0.315 0.001 ***
## bASV15828 0.308 0.001 ***
## bASV4542 0.307 0.001 ***
## bASV56333 0.300 0.001 ***
## bASV59490 0.298 0.001 ***
## bASV54102 0.291 0.001 ***
## bASV32308 0.291 0.001 ***
## bASV28382 0.290 0.001 ***
## bASV57616 0.289 0.001 ***
## bASV45968 0.289 0.001 ***
## bASV8975 0.285 0.001 ***
## bASV29765 0.285 0.001 ***
## bASV17378 0.283 0.001 ***
## bASV64130 0.281 0.001 ***
## bASV53134 0.280 0.001 ***
## bASV38043 0.277 0.001 ***
## bASV59075 0.275 0.001 ***
## bASV5449 0.271 0.001 ***
## bASV46386 0.267 0.001 ***
## bASV7560 0.267 0.001 ***
## bASV20149 0.267 0.001 ***
## bASV63597 0.266 0.001 ***
## bASV14749 0.266 0.001 ***
## bASV13864 0.264 0.001 ***
## bASV4599 0.262 0.001 ***
## bASV36054 0.259 0.001 ***
## bASV13983 0.257 0.001 ***
## bASV15554 0.257 0.001 ***
## bASV38183 0.256 0.001 ***
## bASV15709 0.255 0.001 ***
## bASV13753 0.253 0.001 ***
## bASV12796 0.252 0.001 ***
## bASV61256 0.251 0.001 ***
## bASV12106 0.250 0.001 ***
## bASV18527 0.249 0.001 ***
## bASV39984 0.248 0.001 ***
## bASV14688 0.248 0.001 ***
## bASV7176 0.246 0.001 ***
## bASV20871 0.244 0.001 ***
## bASV34960 0.239 0.001 ***
## bASV19485 0.238 0.001 ***
## bASV41020 0.237 0.001 ***
## bASV28929 0.236 0.001 ***
## bASV34464 0.235 0.001 ***
## bASV2438 0.234 0.001 ***
## bASV60666 0.232 0.001 ***
## bASV21271 0.231 0.001 ***
## bASV3970 0.231 0.001 ***
## bASV18017 0.228 0.001 ***
## bASV36864 0.227 0.001 ***
## bASV40171 0.225 0.001 ***
## bASV11769 0.222 0.001 ***
## bASV53975 0.222 0.001 ***
## bASV10546 0.221 0.001 ***
## bASV668 0.220 0.001 ***
## bASV146 0.220 0.001 ***
## bASV12056 0.219 0.001 ***
## bASV58239 0.218 0.001 ***
## bASV16816 0.217 0.001 ***
## bASV5107 0.216 0.001 ***
## bASV39398 0.215 0.001 ***
## bASV52950 0.214 0.001 ***
## bASV10947 0.211 0.001 ***
## bASV4835 0.210 0.001 ***
## bASV60665 0.209 0.001 ***
## bASV52865 0.205 0.001 ***
## bASV52298 0.205 0.001 ***
## bASV17058 0.203 0.001 ***
## bASV7111 0.202 0.001 ***
## bASV43620 0.201 0.001 ***
## bASV14960 0.197 0.001 ***
## bASV60080 0.194 0.001 ***
## bASV13680 0.193 0.001 ***
## bASV5554 0.193 0.001 ***
## bASV43964 0.181 0.001 ***
##
## Group Low #sps. 228
## stat p.value
## bASV39694 0.568 0.001 ***
## bASV56004 0.526 0.001 ***
## bASV7300 0.514 0.001 ***
## bASV26955 0.489 0.001 ***
## bASV20831 0.455 0.001 ***
## bASV23197 0.454 0.001 ***
## bASV46867 0.452 0.001 ***
## bASV37663 0.444 0.001 ***
## bASV63182 0.424 0.001 ***
## bASV15747 0.416 0.001 ***
## bASV35656 0.415 0.001 ***
## bASV57491 0.404 0.001 ***
## bASV22910 0.401 0.001 ***
## bASV32420 0.397 0.001 ***
## bASV57336 0.391 0.001 ***
## bASV49524 0.390 0.001 ***
## bASV47158 0.387 0.001 ***
## bASV6522 0.386 0.001 ***
## bASV55335 0.385 0.001 ***
## bASV330 0.385 0.001 ***
## bASV44319 0.380 0.001 ***
## bASV34171 0.379 0.001 ***
## bASV3765 0.369 0.001 ***
## bASV47397 0.365 0.001 ***
## bASV3819 0.364 0.001 ***
## bASV29048 0.358 0.001 ***
## bASV5090 0.353 0.001 ***
## bASV57651 0.351 0.001 ***
## bASV19020 0.348 0.001 ***
## bASV51640 0.347 0.001 ***
## bASV38839 0.346 0.001 ***
## bASV62614 0.341 0.001 ***
## bASV51620 0.339 0.001 ***
## bASV42763 0.337 0.001 ***
## bASV29109 0.334 0.001 ***
## bASV50435 0.331 0.001 ***
## bASV691 0.330 0.001 ***
## bASV61991 0.330 0.001 ***
## bASV36684 0.330 0.001 ***
## bASV45475 0.328 0.001 ***
## bASV33634 0.328 0.001 ***
## bASV58460 0.327 0.001 ***
## bASV58715 0.325 0.001 ***
## bASV8268 0.324 0.001 ***
## bASV20840 0.320 0.001 ***
## bASV9869 0.315 0.001 ***
## bASV7873 0.313 0.001 ***
## bASV31646 0.312 0.001 ***
## bASV49903 0.309 0.001 ***
## bASV38494 0.308 0.001 ***
## bASV21661 0.306 0.001 ***
## bASV52762 0.301 0.001 ***
## bASV35759 0.300 0.001 ***
## bASV40299 0.299 0.001 ***
## bASV44115 0.297 0.001 ***
## bASV49082 0.295 0.001 ***
## bASV6434 0.295 0.001 ***
## bASV63151 0.295 0.001 ***
## bASV18000 0.294 0.001 ***
## bASV44046 0.294 0.001 ***
## bASV23523 0.292 0.001 ***
## bASV39070 0.291 0.001 ***
## bASV45052 0.289 0.001 ***
## bASV59440 0.288 0.001 ***
## bASV18705 0.286 0.001 ***
## bASV6733 0.284 0.001 ***
## bASV46293 0.284 0.001 ***
## bASV8490 0.281 0.001 ***
## bASV54663 0.280 0.001 ***
## bASV58795 0.278 0.001 ***
## bASV11009 0.277 0.001 ***
## bASV45399 0.275 0.001 ***
## bASV2342 0.275 0.001 ***
## bASV402 0.274 0.001 ***
## bASV3834 0.270 0.001 ***
## bASV14991 0.268 0.001 ***
## bASV24507 0.268 0.001 ***
## bASV9838 0.265 0.001 ***
## bASV24243 0.265 0.001 ***
## bASV26478 0.265 0.001 ***
## bASV3636 0.262 0.001 ***
## bASV13379 0.260 0.001 ***
## bASV20119 0.260 0.001 ***
## bASV23412 0.260 0.001 ***
## bASV11555 0.260 0.001 ***
## bASV45972 0.259 0.001 ***
## bASV40348 0.259 0.001 ***
## bASV36425 0.258 0.001 ***
## bASV17673 0.256 0.001 ***
## bASV43883 0.253 0.001 ***
## bASV49769 0.252 0.001 ***
## bASV17653 0.251 0.001 ***
## bASV14662 0.248 0.001 ***
## bASV26713 0.248 0.001 ***
## bASV42445 0.248 0.001 ***
## bASV17698 0.247 0.001 ***
## bASV53828 0.247 0.001 ***
## bASV39643 0.247 0.001 ***
## bASV49480 0.244 0.001 ***
## bASV63661 0.241 0.001 ***
## bASV39460 0.241 0.001 ***
## bASV2859 0.240 0.001 ***
## bASV27877 0.240 0.001 ***
## bASV30649 0.239 0.001 ***
## bASV4625 0.239 0.001 ***
## bASV39342 0.238 0.001 ***
## bASV31808 0.237 0.001 ***
## bASV25598 0.236 0.001 ***
## bASV54990 0.234 0.001 ***
## bASV52151 0.234 0.001 ***
## bASV53165 0.232 0.001 ***
## bASV12780 0.232 0.001 ***
## bASV12942 0.232 0.001 ***
## bASV34216 0.231 0.001 ***
## bASV28962 0.230 0.001 ***
## bASV59639 0.230 0.001 ***
## bASV31294 0.230 0.001 ***
## bASV13452 0.229 0.001 ***
## bASV43295 0.229 0.001 ***
## bASV31888 0.227 0.001 ***
## bASV27304 0.226 0.001 ***
## bASV48128 0.226 0.001 ***
## bASV23139 0.224 0.001 ***
## bASV30091 0.224 0.001 ***
## bASV29416 0.224 0.001 ***
## bASV59814 0.223 0.001 ***
## bASV22405 0.223 0.001 ***
## bASV32264 0.222 0.001 ***
## bASV24509 0.222 0.001 ***
## bASV4620 0.221 0.001 ***
## bASV23092 0.221 0.001 ***
## bASV16329 0.220 0.001 ***
## bASV5901 0.219 0.001 ***
## bASV61147 0.218 0.001 ***
## bASV5450 0.217 0.001 ***
## bASV65032 0.215 0.001 ***
## bASV29115 0.213 0.001 ***
## bASV10768 0.213 0.001 ***
## bASV40921 0.213 0.001 ***
## bASV54823 0.212 0.001 ***
## bASV64042 0.212 0.001 ***
## bASV17751 0.212 0.001 ***
## bASV35563 0.211 0.001 ***
## bASV8556 0.211 0.001 ***
## bASV55128 0.210 0.001 ***
## bASV25753 0.210 0.001 ***
## bASV5656 0.209 0.001 ***
## bASV2395 0.208 0.001 ***
## bASV13968 0.208 0.001 ***
## bASV18231 0.208 0.001 ***
## bASV46499 0.208 0.001 ***
## bASV1428 0.207 0.001 ***
## bASV36268 0.207 0.001 ***
## bASV20135 0.207 0.001 ***
## bASV9988 0.205 0.001 ***
## bASV32840 0.204 0.001 ***
## bASV59779 0.203 0.001 ***
## bASV25815 0.202 0.001 ***
## bASV7671 0.201 0.001 ***
## bASV37232 0.201 0.001 ***
## bASV3421 0.200 0.001 ***
## bASV34943 0.200 0.001 ***
## bASV43596 0.200 0.001 ***
## bASV14089 0.199 0.001 ***
## bASV60447 0.199 0.001 ***
## bASV28971 0.198 0.001 ***
## bASV37879 0.198 0.001 ***
## bASV41931 0.197 0.001 ***
## bASV11147 0.196 0.001 ***
## bASV62483 0.196 0.001 ***
## bASV51514 0.196 0.001 ***
## bASV57573 0.196 0.001 ***
## bASV6930 0.195 0.001 ***
## bASV658 0.194 0.001 ***
## bASV25222 0.194 0.001 ***
## bASV9837 0.193 0.001 ***
## bASV64397 0.193 0.001 ***
## bASV30059 0.192 0.001 ***
## bASV52753 0.192 0.001 ***
## bASV62445 0.191 0.001 ***
## bASV61349 0.191 0.001 ***
## bASV17801 0.191 0.001 ***
## bASV62569 0.190 0.001 ***
## bASV23490 0.190 0.001 ***
## bASV28161 0.189 0.001 ***
## bASV52774 0.188 0.001 ***
## bASV43230 0.188 0.001 ***
## bASV61342 0.187 0.001 ***
## bASV24940 0.186 0.001 ***
## bASV8959 0.186 0.001 ***
## bASV53790 0.186 0.001 ***
## bASV37371 0.185 0.001 ***
## bASV45412 0.185 0.001 ***
## bASV332 0.184 0.001 ***
## bASV30532 0.182 0.001 ***
## bASV55020 0.182 0.001 ***
## bASV24935 0.182 0.001 ***
## bASV50126 0.180 0.001 ***
## bASV12618 0.180 0.001 ***
## bASV53814 0.180 0.001 ***
## bASV10014 0.179 0.001 ***
## bASV15659 0.179 0.001 ***
## bASV2302 0.179 0.001 ***
## bASV27598 0.179 0.001 ***
## bASV19797 0.178 0.001 ***
## bASV3165 0.177 0.001 ***
## bASV31211 0.177 0.001 ***
## bASV46497 0.177 0.001 ***
## bASV3110 0.176 0.001 ***
## bASV46176 0.176 0.001 ***
## bASV35145 0.176 0.001 ***
## bASV48611 0.175 0.001 ***
## bASV59276 0.175 0.001 ***
## bASV57133 0.172 0.001 ***
## bASV43211 0.170 0.001 ***
## bASV24936 0.169 0.001 ***
## bASV3067 0.169 0.001 ***
## bASV18617 0.168 0.001 ***
## bASV23901 0.165 0.001 ***
## bASV16122 0.164 0.001 ***
## bASV61793 0.159 0.001 ***
## bASV55450 0.156 0.001 ***
## bASV18616 0.155 0.001 ***
## bASV38275 0.155 0.001 ***
## bASV21109 0.148 0.001 ***
## bASV7841 0.146 0.001 ***
## bASV33547 0.145 0.001 ***
## bASV23375 0.138 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Results of indicator analysis (ASV, stat, and p-value) combined in a single .csv file using the summary above. Upload this file into R (provided on GitHub). Match indicator values table for each ASV with the relative abundance of the same ASV in the photic zone via the ASV count table. Data frames need to match.
indic_OA <- read.csv("OA_indic_16S.csv", header=T, row.names = NULL, check.names=F,fileEncoding="UTF-8-BOM") # Indicator ASVs file
colnames(indic_OA)[1] <- "OTU" # Change column names
indic.high = indic_OA[indic_OA[["OA"]]== "High", ] # Subset to new data frame with only high TA:DIC
indic.low = indic_OA[indic_OA[["OA"]]== "Low", ] # Subset to new df with only low TA:DIC
ps4 <- transform_sample_counts(ps_dsq, function(x) 100*x / sum(x) ) # Transform to relative abundance
x1 = speedyseq::psmelt(ps4) # Melt the data
OA.1 = x1 %>% # Group to the mean relative abundance at the ASV level
dplyr::group_by(OTU,alk_dic_oa,Phylum, Genus, Species) %>%
dplyr::summarise_at(.vars = c("Abundance"), .funs = mean)
OA.high = OA.1[OA.1[["alk_dic_oa"]]== "High", ] # Subset ASV relative abundance based on high TA:DIC
OA.low = OA.1[OA.1[["alk_dic_oa"]]== "Low", ] # Subset ASV based on low TA:DIC
df_h <- OA.high[OA.high$OTU %in% indic.high$OTU,] # Keep ASVs if they match indicator dataset
df_L <- OA.low[OA.low$OTU %in% indic.low$OTU,] # Keep ASVs if they match indicator dataset
#write.csv(df_h, "ASVs_16S_abund_high.csv",row.names=T) # Write .csv file
#write.csv(df_L, "ASVs_16S_abund_low.csv",row.names=T) # Write .csv file
The previous two files were used in combination with the indicator file to create a master .csv file that has ASV IDs, indicator values, and relative abundance for each specific 16S ASV in the photic zone. This file to be loaded into R (provided on GitHub).
indic_OA2 <- read.csv("OA_indic2_16S.csv", header = T, row.names = NULL, check.names = F,
fileEncoding = "UTF-8-BOM") # Upload file
p <- ggplot(indic_OA2, aes(x = stat, y = Abundance)) + geom_point(aes(fill = Phylum),
shape = 21, size = 5) + theme_bw() + theme(text = element_text(size = 14)) +
scale_y_continuous(limits = c(0, NA)) + facet_wrap(~OA, scales = "free") + scale_fill_brewer(palette = "Dark2")
p$data$Phylum <- factor(p$data$Phylum, levels = c("Actinobacteriota", "Bacteroidota",
"Crenarchaeota", "Cyanobacteria", "Marinimicrobia_(SAR406_clade)", "Proteobacteria",
"Others")) # Set order
p
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggpubfigs_0.0.1 geosphere_1.5-18
## [3] ranacapa_0.1.0 lmtest_0.9-40
## [5] zoo_1.8-12 sjstats_0.19.0
## [7] performance_0.11.0 indicspecies_1.7.14
## [9] DESeq2_1.40.2 SummarizedExperiment_1.30.2
## [11] Biobase_2.60.0 MatrixGenerics_1.12.3
## [13] matrixStats_1.3.0 GenomicRanges_1.52.1
## [15] GenomeInfoDb_1.36.4 IRanges_2.34.1
## [17] S4Vectors_0.38.2 BiocGenerics_0.46.0
## [19] mdatools_0.14.1 rcartocolor_2.1.1
## [21] treemap_2.4-4 ggpubr_0.6.0
## [23] coefplot_1.2.8 sjPlot_2.8.16
## [25] MASS_7.3-60 file2meco_0.7.1
## [27] microeco_1.7.1 ggridges_0.5.6
## [29] factoextra_1.0.7 microbiome_1.22.0
## [31] RColorBrewer_1.1-3 fantaxtic_0.2.0
## [33] reshape2_1.4.4 viridis_0.6.5
## [35] viridisLite_0.4.2 phyloseq_1.44.0
## [37] qiime2R_0.99.6 vegan_2.6-6.1
## [39] lattice_0.22-6 permute_0.9-7
## [41] lubridate_1.9.3 forcats_1.0.0
## [43] stringr_1.5.1 dplyr_1.1.4
## [45] purrr_1.0.2 readr_2.1.5
## [47] tidyr_1.3.1 tibble_3.2.1
## [49] ggplot2_3.5.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.3.1 later_1.3.2 plotwidgets_0.5.1
## [4] bitops_1.0-7 datawizard_0.10.0 rpart_4.1.23
## [7] lifecycle_1.0.4 rstatix_0.7.2 insight_0.19.11
## [10] backports_1.5.0 magrittr_2.0.3 Hmisc_5.1-2
## [13] sass_0.4.9 rmarkdown_2.27 jquerylib_0.1.4
## [16] yaml_2.3.8 useful_1.2.6.1 httpuv_1.6.15
## [19] sp_2.1-4 ade4_1.7-22 abind_1.4-5
## [22] zlibbioc_1.46.0 Rtsne_0.17 RCurl_1.98-1.14
## [25] nnet_7.3-19 GenomeInfoDbData_1.2.10 ggrepel_0.9.5
## [28] codetools_0.2-20 DelayedArray_0.26.7 DT_0.33
## [31] ggtext_0.1.2 xml2_1.3.6 tidyselect_1.2.1
## [34] farver_2.1.2 ggeffects_1.6.0 base64enc_0.1-3
## [37] jsonlite_1.8.8 multtest_2.56.0 Formula_1.2-5
## [40] survival_3.6-4 iterators_1.0.14 foreach_1.5.2
## [43] tools_4.3.1 Rcpp_1.0.12 glue_1.7.0
## [46] gridExtra_2.3 xfun_0.44 mgcv_1.9-1
## [49] withr_3.0.0 formatR_1.14 fastmap_1.2.0
## [52] rhdf5filters_1.12.1 fansi_1.0.6 digest_0.6.35
## [55] truncnorm_1.0-9 timechange_0.3.0 R6_2.5.1
## [58] mime_0.12 colorspace_2.1-0 utf8_1.2.4
## [61] generics_0.1.3 data.table_1.15.4 htmlwidgets_1.6.4
## [64] S4Arrays_1.0.6 pkgconfig_2.0.3 gtable_0.3.5
## [67] zCompositions_1.5.0-3 XVector_0.40.0 htmltools_0.5.8.1
## [70] carData_3.0-5 biomformat_1.28.0 scales_1.3.0
## [73] knitr_1.45 rstudioapi_0.16.0 tzdb_0.4.0
## [76] checkmate_2.3.1 nlme_3.1-164 ggnested_0.1.0
## [79] cachem_1.1.0 rhdf5_2.44.0 sjlabelled_1.2.0
## [82] parallel_4.3.1 foreign_0.8-86 pillar_1.9.0
## [85] grid_4.3.1 vctrs_0.6.5 promises_1.3.0
## [88] car_3.1-2 xtable_1.8-4 cluster_2.1.6
## [91] speedyseq_0.5.3.9018 htmlTable_2.4.2 evaluate_0.23
## [94] locfit_1.5-9.9 cli_3.6.2 compiler_4.3.1
## [97] rlang_1.1.3 crayon_1.5.2 ggsignif_0.6.4
## [100] labeling_0.4.3 plyr_1.8.9 sjmisc_2.8.10
## [103] stringi_1.8.4 BiocParallel_1.34.2 gridBase_0.4-7
## [106] munsell_0.5.1 Biostrings_2.68.1 Matrix_1.5-0
## [109] hms_1.1.3 NADA_1.6-1.1 Rhdf5lib_1.22.1
## [112] shiny_1.8.1.1 highr_0.10 gridtext_0.1.5
## [115] igraph_2.0.3 broom_1.0.6 bslib_0.7.0
## [118] ape_5.8