Wax Apple Hyperspectral-Image Open Dataset
Materials and Data Preparation
In this study, 136 wax apple samples were collected from Meishan, Fengshan, Liouguei, and Jiadong. The samples were cut into slices (1034 slices in total), and a hyperspectral image (HSI) of each slice was acquired with the coaxial heterogeneous HSI system [1]. Each raw HSI is a three-dimensional array (W, L, Λ), where W is the image width, L is the image length, and Λ is the number of spectral bands: 1367 bands covering 400 to 1700 nm.
Each slice's HSI was calibrated with spatial calibration and white/dark light calibration and smoothed with a Savitzky-Golay filter; each HSI cube was then randomly cropped to a spatial size of 20×20. The resulting 3-D cubes are used to train the 2D-CNN models. For the FNN model, each 3-D cube is averaged over its width and length to give a 1-D spectral array.
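The cropping and averaging steps described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names `random_crop` and `spatial_mean` are hypothetical, and the cube layout (width, length, bands) with the spectral axis last is an assumption based on the (W, L, Λ) description.

```python
import numpy as np

def random_crop(cube, size=20, rng=None):
    """Cut a random size x size spatial patch from a (W, L, bands) cube.

    The (W, L, bands) axis order is an assumption, not confirmed by the dataset.
    """
    rng = np.random.default_rng() if rng is None else rng
    width, length, _ = cube.shape
    if width < size or length < size:
        raise ValueError("cube is smaller than the requested crop size")
    # Pick the top-left corner so the patch stays inside the image.
    x0 = int(rng.integers(0, width - size + 1))
    y0 = int(rng.integers(0, length - size + 1))
    return cube[x0:x0 + size, y0:y0 + size, :]

def spatial_mean(cube):
    """Average a (W, L, bands) cube over width and length, leaving one value per band."""
    return cube.mean(axis=(0, 1))

# Synthetic example: a small fake cube with 1367 spectral bands.
fake_cube = np.random.default_rng(0).random((32, 28, 1367))
patch = random_crop(fake_cube, size=20, rng=np.random.default_rng(1))
spectrum = spatial_mean(patch)
```

Here `patch` is the kind of input a 2D-CNN would receive, while `spectrum` is the per-band average that an FNN would take.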
In addition, each wax apple slice was squeezed to extract a juice sample, and the ground-truth Brix value (the label) was measured with a commercial refractometer, the ATAGO PAL-1.
After the datasets were prepared, they were randomly split into training, validation, and test sets for modeling (some samples were randomly discarded to balance the dataset). The size of each set is listed below.
Training set: 621 slices; validation set: 209 slices; test set: 204 slices.
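A random split with these set sizes could look like the sketch below. It is only an illustration: the helper `split_indices` and the fixed seed are hypothetical, and the authors' actual procedure (including how samples were discarded for balancing) is not described here, so this is a plain shuffle-and-slice.

```python
import numpy as np

def split_indices(n_samples, n_train, n_val, seed=0):
    """Shuffle sample indices and slice them into train/val/test index arrays."""
    if n_train + n_val > n_samples:
        raise ValueError("train and validation sizes exceed the number of samples")
    order = np.random.default_rng(seed).permutation(n_samples)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

# Set sizes taken from the text: 621 + 209 + 204 = 1034 slices.
train_idx, val_idx, test_idx = split_indices(1034, 621, 209)
```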
Below is the file structure of the HSI data samples and their corresponding labels, which are stored as pickle files.
├── Hyper
│   ├── 3d_data
│   │   └── 400_1700
│   │       ├── data_test.pkl
│   │       ├── data_train.pkl
│   │       ├── data_val.pkl
│   │       ├── label_test.pkl
│   │       ├── label_train.pkl
│   │       └── label_val.pkl
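Loading one split from this layout might look like the following sketch. The helper `load_split` and its parameters are hypothetical. It assumes only that each `.pkl` file holds a single pickled object; the type and shape of that object are not specified here.

```python
import pickle
from pathlib import Path

def load_split(root, split, band_range="400_1700"):
    """Load the data and label pickles for one split ("train", "val" or "test").

    `root` is the directory that contains the `Hyper` folder (assumption).
    """
    folder = Path(root) / "Hyper" / "3d_data" / band_range
    with open(folder / f"data_{split}.pkl", "rb") as f:
        data = pickle.load(f)
    with open(folder / f"label_{split}.pkl", "rb") as f:
        labels = pickle.load(f)
    return data, labels
```

For example, `data_train, label_train = load_split(".", "train")` would read `Hyper/3d_data/400_1700/data_train.pkl` and `label_train.pkl` relative to the current directory. Because `pickle.load` can execute arbitrary code, only load files obtained from a source you trust.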
Convert Hyperspectral Data to Multispectral Data
In the coaxial heterogeneous HSI system, the VIS spectrometer has a spectral resolution of ~0.5 nm over the 400 to 1000 nm wavelength range, and the SWIR spectrometer has a spectral resolution of ~2.5 nm over the 900 to 1700 nm range.
The hyperspectral data were converted to multispectral data, RM(x, y, λc), using the following formula:
where RM(x, y, λc) is the mean of the integrated intensity of the central band, λc; the central band λc ranges from 400 to 1700 nm at intervals of the bandwidth, w.
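One way to write a band average that matches this description is given below. It is a reading of the definition above under stated assumptions, not a reproduction of the authors' formula: the symbol R_H for the hyperspectral intensity is introduced here for illustration, and the normalization by the band width 2w is assumed from the phrase "mean of the integrated intensity".

```latex
R_M(x, y, \lambda_c) = \frac{1}{2w} \int_{\lambda_c - w}^{\lambda_c + w} R_H(x, y, \lambda)\, d\lambda
```

For sampled data, the integral becomes an average over the N hyperspectral bands whose wavelengths fall within λc ± w:

```latex
R_M(x, y, \lambda_c) \approx \frac{1}{N} \sum_{\lambda_i \in [\lambda_c - w,\ \lambda_c + w]} R_H(x, y, \lambda_i)
```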
A total of six multispectral datasets were derived from the hyperspectral data, one for each of the six bandwidths w: ±2.5 nm, ±5 nm, ±7.5 nm, ±10 nm, ±12.5 nm, and ±15 nm.
The hyper2multi function converts hyperspectral image data to multispectral image data. Use it to convert the training, validation, and test datasets.
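The sketch below shows one possible shape for such a conversion. Only the name `hyper2multi` comes from the text; the signature, the `wavelengths` argument, the spectral-axis-last layout, the half-open band edges, and the default spacing of 2w between central bands are all assumptions made for illustration.

```python
import numpy as np

def hyper2multi(cube, wavelengths, half_width, step=None, start=400.0, stop=1700.0):
    """Average hyperspectral bands into wider multispectral bands.

    cube        : array of shape (..., n_bands), spectral axis last (assumption).
    wavelengths : 1-D array with the wavelength in nm of each of the n_bands.
    half_width  : w in nm; each output band covers [lc - w, lc + w).
    step        : spacing between central bands; defaults to 2 * w (assumption).
    Returns the multispectral array and the central wavelengths actually used.
    """
    cube = np.asarray(cube, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    if cube.shape[-1] != wavelengths.shape[0]:
        raise ValueError("last axis of cube must match the number of wavelengths")
    if step is None:
        step = 2.0 * half_width
    centers = np.arange(start + half_width, stop - half_width + 1e-9, step)
    bands, kept = [], []
    for lc in centers:
        # Half-open window so a band on an edge is counted only once.
        mask = (wavelengths >= lc - half_width) & (wavelengths < lc + half_width)
        if mask.any():  # skip windows that contain no measured band
            bands.append(cube[..., mask].mean(axis=-1))
            kept.append(lc)
    if not bands:
        raise ValueError("no wavelengths fall inside the requested range")
    return np.stack(bands, axis=-1), np.asarray(kept)
```

Under these assumptions, calling the function once per bandwidth (w = 2.5, 5, 7.5, 10, 12.5, and 15 nm) on each of the training, validation, and test arrays would give six multispectral versions of the dataset. The wavelength of each of the 1367 bands has to be supplied by the user, since it is not listed here.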
References
1. Yu-Hsiang Tsai, Yung-Jhe Yan, Yi-Sheng Li, Chao-Hsin Chang, Chi-Cho Huang, Tzung-Cheng Chen, Shiou-Gwo Lin, and Mang Ou-Yang*, “Development and verification of the coaxial heterogeneous hyperspectral imaging system,” Review of Scientific Instruments, Vol. 93, pp. 063105-1-17, Jun. 2022. DOI: 10.1109/I2MTC.2019.8826836
2. Chih-Jung Chen, Yung-Jhe Yan, Chi-Cho Huang, Jen-Tzung Chien, Chang-Ting Chu, Je-Wei Jang, Tzung-Cheng Chen, Shiou-Gwo Lin, Ruei-Siang Shih, and Mang Ou-Yang*, “Sugariness Prediction of Syzygium samarangense using Convolutional Learning of Hyperspectral Images,” Scientific Reports, 12:2774, 17 Feb. 2022. DOI: 10.1038/s41598-022-06679-6
3. Yung-Jhe Yan, Weng-Keong Wong, Chih-Jung Chen, Chi-Cho Huang, Jen-Tzung Chien, and Mang Ou-Yang*, “Hyperspectral signature-band extraction and learning: an example of sugar content prediction of Syzygium samarangense,” Scientific Reports, 13:15100, 12 Sep. 2023. DOI: 10.1038/s41598-022-06679-6
Acknowledgements
This work is supported by grants from the Agricultural Research Institute, Council of Agriculture, Executive Yuan, ROC, by the National Science and Technology Council, Taiwan, and by National Yang Ming Chiao Tung University. This work was contributed by the following authors: Chih-Jung Chen, Yung-Jhe Yan, Weng-Keong Wong, Chi-Cho Huang, Jen-Tzung Chien, Chang-Ting Chu, Je-Wei Jang, Tzung-Cheng Chen, Shiou-Gwo Lin, Ruei-Siang Shih, and Mang Ou-Yang.
For more details, please refer to our GitHub page:
https://github.com/EBILNYCU/Wax-Apple-Hyperspectral-Image-Open-Dataset/tree/main
Open dataset download link:
https://drive.google.com/drive/folders/12J3CuofxMXng8Nd6omLaFPB71twuBMPG?usp=drive_link