I have two elevation datasets: a low-resolution one (~0.1 degree resolution) and a high-resolution one (~0.001 degree resolution). Let's assume the datasets look something like this (in reality they are much larger, so I need a memory-efficient way to do this, for example through chunking):
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
lr_elev = np.random.uniform(low=0,high=4000,size=(11,11))
hr_elev = np.random.uniform(low=0,high=4000,size=(1001,1001))
lr_lats = np.linspace(44.0,45.0,11)
lr_lons = np.linspace(-115.0,-114.0,11)
hr_lats = np.linspace(44.0,45.0,1001)
hr_lons = np.linspace(-115.0,-114.0,1001)
lr_ds = xr.Dataset(
    data_vars=dict(
        elevation=(['lat', 'lon'], lr_elev)
    ),
    coords=dict(
        lon=('lon', lr_lons),
        lat=('lat', lr_lats)
    )
)
hr_ds = xr.Dataset(
    data_vars=dict(
        elevation=(['lat', 'lon'], hr_elev)
    ),
    coords=dict(
        lon=('lon', hr_lons),
        lat=('lat', hr_lats)
    )
)
print(lr_ds)
print(hr_ds)
I know how to find the nearest elevation value from the low resolution dataset for each of the high resolution lat/lon pairs:
# Treat the midpoint of each HR grid cell as a point, then find the nearest LR grid cell as if the points were stations
nearest_lr_point = lr_ds.sel(lat=hr_ds.lat, lon=hr_ds.lon, method='nearest')
print(nearest_lr_point)
However, I haven't been able to figure out how to return the index of each nearest value rather than the value itself. I would like to match each high-resolution lat/lon pair with its corresponding low-resolution grid cell (for later calculations). Is there a way to return the index for each lat/lon pair instead?
I can't easily convert this into a dataframe, since the real high-resolution dataset is about 700 GB in size.
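To make the desired output concrete, here is a minimal sketch of one possible approach. It is not a confirmed solution, and it assumes both grids are rectilinear (1-D, monotonic `lat` and `lon` coordinates, as in the example above). Under that assumption the nearest-cell lookup separates by axis, so the pandas indexes behind the xarray coordinates can be queried with `get_indexer(..., method='nearest')`, which returns integer positions instead of values. The names `lat_idx`, `lon_idx`, `lr_lat_idx`, `lr_lon_idx`, and `matched` are made up for illustration.

```python
import numpy as np
import xarray as xr

# Small stand-in grids with the same layout as lr_ds / hr_ds in the question
rng = np.random.default_rng(0)
lr_lats = np.linspace(44.0, 45.0, 11)
lr_lons = np.linspace(-115.0, -114.0, 11)
hr_lats = np.linspace(44.0, 45.0, 1001)
hr_lons = np.linspace(-115.0, -114.0, 1001)

lr_ds = xr.Dataset(
    data_vars=dict(elevation=(['lat', 'lon'], rng.uniform(0, 4000, (11, 11)))),
    coords=dict(lon=('lon', lr_lons), lat=('lat', lr_lats)),
)
hr_ds = xr.Dataset(
    data_vars=dict(elevation=(['lat', 'lon'], rng.uniform(0, 4000, (1001, 1001)))),
    coords=dict(lon=('lon', hr_lons), lat=('lat', hr_lats)),
)

# Integer position of the nearest LR coordinate for every HR coordinate.
# Each result is 1-D: one entry per HR latitude / per HR longitude.
lat_idx = lr_ds.indexes['lat'].get_indexer(hr_ds['lat'].values, method='nearest')
lon_idx = lr_ds.indexes['lon'].get_indexer(hr_ds['lon'].values, method='nearest')

# Keep the mapping alongside the HR grid as extra (non-index) coordinates
hr_ds = hr_ds.assign_coords(
    lr_lat_idx=('lat', lat_idx),
    lr_lon_idx=('lon', lon_idx),
)

# Positional lookup with the stored indices gives the matching LR values
matched = lr_ds['elevation'].isel(lat=lat_idx, lon=lon_idx)
print(hr_ds)
print(matched.shape)
```

Because the indices are stored per axis, the mapping needs only `len(lat) + len(lon)` integers rather than one entry per high-resolution cell; the LR cell for the HR cell at position `(i, j)` is `(lat_idx[i], lon_idx[j])`. This would not carry over to curvilinear grids with 2-D coordinates.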