Loading NCDC data into R


The National Climatic Data Center (NCDC) is a great source of meteorological and climate data in the US. There’s a huge need for datasets like their Local Climatological Data (LCD) reports: long-term, high-resolution data gathered simultaneously at many locations.

Overall, I think their data are well archived and consistently reported. However, a couple of shortcomings limit their usefulness.

First, the website that mediates access to LCD reports makes downloading them time-consuming. This is especially a problem if you want monthly reports from multiple years or stations, since there’s apparently no way to download more than a single station-month of data at a time.

A second obstacle is that the LCD reports aren’t formatted to be machine-readable. Annual reports (and monthly reports older than about 15 years) can only be downloaded as PDFs, which adds a PDF-parsing step before the text can be worked with. (Errors introduced during parsing derailed an earlier attempt I made at digitizing the annual summary LCDs.) Monthly reports are usually also available as comma-delimited text files, but even these are organized suboptimally.

It seems genuinely challenging to offer data that are useful to analysts (whose methods and demands are constantly changing) while also presenting those data consistently from year to year. On the whole, I think NCDC does a good job of reconciling these potentially competing goals.

After admiring NCDC’s data for a while, I finally had occasion to write some R code that reads monthly Local Climatological Data reports and extracts some simple summary data. I think this code could easily be extended to extract any of the other data in the LCDs.
