Development of Europe-Wide Models for Particle Elemental Composition Using Supervised Linear Regression and Random Forest
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Development of Europe-Wide Models for Particle Elemental Composition Using Supervised Linear Regression and Random Forest. / Chen, Jie; De Hoogh, Kees; Gulliver, John; Hoffmann, Barbara; Hertel, Ole; Ketzel, Matthias; Weinmayr, Gudrun; Bauwelinck, Mariska; Van Donkelaar, Aaron; Hvidtfeldt, Ulla A.; Atkinson, Richard; Janssen, Nicole A.H.; Martin, Randall V.; Samoli, Evangelia; Andersen, Zorana J.; Oftedal, Bente M.; Stafoggia, Massimo; Bellander, Tom; Strak, Maciej; Wolf, Kathrin; Vienneau, Danielle; Brunekreef, Bert; Hoek, Gerard.
In: Environmental Science and Technology, Vol. 54, No. 24, 2020, p. 15698-15709.
RIS
TY - JOUR
T1 - Development of Europe-Wide Models for Particle Elemental Composition Using Supervised Linear Regression and Random Forest
AU - Chen, Jie
AU - De Hoogh, Kees
AU - Gulliver, John
AU - Hoffmann, Barbara
AU - Hertel, Ole
AU - Ketzel, Matthias
AU - Weinmayr, Gudrun
AU - Bauwelinck, Mariska
AU - Van Donkelaar, Aaron
AU - Hvidtfeldt, Ulla A.
AU - Atkinson, Richard
AU - Janssen, Nicole A.H.
AU - Martin, Randall V.
AU - Samoli, Evangelia
AU - Andersen, Zorana J.
AU - Oftedal, Bente M.
AU - Stafoggia, Massimo
AU - Bellander, Tom
AU - Strak, Maciej
AU - Wolf, Kathrin
AU - Vienneau, Danielle
AU - Brunekreef, Bert
AU - Hoek, Gerard
N1 - Publisher Copyright: © 2020 American Chemical Society.
PY - 2020
Y1 - 2020
N2 - We developed Europe-wide models of long-term exposure to eight elements (copper, iron, potassium, nickel, sulfur, silicon, vanadium, and zinc) in particulate matter with diameter <2.5 μm (PM2.5) using standardized measurements for one-year periods between October 2008 and April 2011 in 19 study areas across Europe, with supervised linear regression (SLR) and random forest (RF) algorithms. Potential predictor variables were obtained from satellites, chemical transport models, land-use, traffic, and industrial point source databases to represent different sources. Overall model performance across Europe was moderate to good for all elements with hold-out-validation R-squared ranging from 0.41 to 0.90. RF consistently outperformed SLR. Models explained within-area variation much less than the overall variation, with similar performance for RF and SLR. Maps proved a useful additional model evaluation tool. Models differed substantially between elements regarding major predictor variables, broadly reflecting known sources. Agreement between the two algorithm predictions was generally high at the overall European level and varied substantially at the national level. Applying the two models in epidemiological studies could lead to different associations with health. If both between- and within-area exposure variability are exploited, RF may be preferred. If only within-area variability is used, both methods should be interpreted equally.
U2 - 10.1021/acs.est.0c06595
DO - 10.1021/acs.est.0c06595
M3 - Journal article
C2 - 33237771
AN - SCOPUS:85097473908
VL - 54
SP - 15698
EP - 15709
JO - Environmental Science & Technology
JF - Environmental Science & Technology
SN - 0013-936X
IS - 24
ER -
ID: 269668847