I thought it was probably best to share this text database with others, mostly to save people from having to do all the text parsing themselves. I've converted it into JSON-LD (http://json-ld.org/), which is a JavaScript Object Notation (JSON) file with some extra information describing what the data are. I even had some helpful input from one of the authors of the JSON-LD specification to make sure my output was valid. You can find it via https://github.com/slowe/apod/ (the raw version of the file is around 14 MB). Here is an example of what an individual APOD entry looks like:
{
  "id": "http://apod.nasa.gov/apod/ap140313.html",
  "date": "2014-03-13",
  "title": "Messier 63: The Sunflower Galaxy",
  "image": "http://apod.nasa.gov/apod/image/1403/M63_PS1V10snyder900.jpg",
  "thumb": "http://apod.nasa.gov/apod/calendar/S_140313.jpg",
  "text": "A bright spiral galaxy of the northern sky, <a href=\"http://messier.seds.org/m/m063.html\">Messier 63</a> is about 25 million light-years distant in the loyal constellation <a href=\"http://en.wikipedia.org/wiki/Canes_Venatici\">Canes Venatici</a>. Also cataloged as NGC 5055, the majestic <a href=\"ap100109.html\">island universe</a> is nearly 100,000 light-years across. That's about the size of our own <a href=\"ap080104.html\">Milky Way</a> Galaxy. Known by the popular moniker, The Sunflower Galaxy, M63 sports a bright yellowish core in <a href=\"http://billsnyderastrophotography.com/?page_id=4163\">this sharp, colorful galaxy portrait.</a> Its sweeping blue spiral arms are streaked with cosmic dust lanes and dotted with pink star forming regions. A dominant member of a known <a href=\"http://www.atlasoftheuniverse.com/galgrps/m101.html\">galaxy group</a>, M63 has faint, extended features that could be the result of gravitational <a href=\"http://burro.cwru.edu/JavaLab/GalCrashWeb/\">interactions</a> with nearby galaxies. In fact, M63 <a href=\"http://coolcosmos.ipac.caltech.edu/cosmic_classroom/ multiwavelength_astronomy/multiwavelength_astronomy/\">shines across</a> the electromagnetic spectrum and is thought to have <a href=\"http://arxiv.org/abs/astro-ph/0701125\">undergone</a> bursts of intense <a href=\"http://cass.ucsd.edu/public/tutorial/Starbursts.html\">star formation</a>.",
  "credit": "<a href=\"http://billsnyderastrophotography.com/?page_id=2\">Bill Snyder</a> (at <a href=\"http://www.sierra-remote.com/index.php\">Sierra Remote Observatories</a>)",
  "objects": [
    {
      "name": "M63",
      "ra": "198.9555375",
      "dec": "42.0292889",
      "category": [
        "5.3.2.4"
      ]
    },
    {
      "name": "Sunflower Galaxy",
      "ra": "198.9555375",
      "dec": "42.0292889",
      "category": [
        "5.3.2.4"
      ]
    },
    {
      "name": "NGC 5055",
      "ra": "198.9555375",
      "dec": "42.0292889",
      "category": [
        "5.3.2.4"
      ]
    }
  ]
}
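The "text" and "credit" fields hold raw HTML, and some links inside them (e.g. "ap100109.html") are relative to the entry's own page, so a little post-processing helps. Below is a minimal sketch using only the Python standard library. The names LinkCollector and summarise_entry are hypothetical helpers, and it assumes "ra" and "dec" are decimal degrees stored as strings, as they appear to be for M63 above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the visible text and the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "ap100109.html" against the entry's own URL
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text_parts.append(data)


def summarise_entry(entry):
    """Hypothetical helper: plain text, absolute links and numeric coordinates for one entry."""
    parser = LinkCollector(entry["id"])
    parser.feed(entry.get("text", ""))
    parser.close()  # flush any buffered text
    coords = [
        # Assumption: "ra"/"dec" are decimal degrees stored as strings
        (obj["name"], float(obj["ra"]), float(obj["dec"]))
        for obj in entry.get("objects", [])
    ]
    return {
        "plain_text": "".join(parser.text_parts),
        "links": parser.links,
        "coords": coords,
    }


# A cut-down copy of the entry above
entry = {
    "id": "http://apod.nasa.gov/apod/ap140313.html",
    "text": 'Also cataloged as NGC 5055, the majestic <a href="ap100109.html">island universe</a> is nearly 100,000 light-years across.',
    "objects": [{"name": "M63", "ra": "198.9555375", "dec": "42.0292889", "category": ["5.3.2.4"]}],
}
summary = summarise_entry(entry)
print(summary["plain_text"])
print(summary["links"])
print(summary["coords"])
```

The same LinkCollector can be fed the "credit" field to pull out the photographer and observatory links.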
Once downloaded, the file can be read into Python with:
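import json

# Load the whole APOD file (around 14 MB) into memory; "with" closes the file afterwards
with open('apod.json', encoding='utf-8') as json_data:
    data = json.load(json_data)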
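The top-level layout of apod.json isn't shown above, so the sketch below assumes it is either a plain list of entries or a JSON-LD object whose "@graph" key holds that list; check the file and adjust iter_entries if it differs. It indexes entries by their "date" and finds every entry whose "objects" list names a given object. All function names here are hypothetical.

```python
def iter_entries(data):
    """Yield APOD entries from a bare list or a JSON-LD "@graph" (both layouts are assumptions)."""
    if isinstance(data, list):
        yield from data
    elif isinstance(data, dict) and "@graph" in data:
        yield from data["@graph"]
    else:
        raise ValueError("Unrecognised top-level structure")


def entries_by_date(data):
    """Map "YYYY-MM-DD" date strings to entries (a later duplicate date would overwrite an earlier one)."""
    return {entry["date"]: entry for entry in iter_entries(data)}


def entries_mentioning(data, object_name):
    """Return entries whose "objects" list contains object_name, compared case-insensitively."""
    wanted = object_name.casefold()
    return [
        entry
        for entry in iter_entries(data)
        if any(obj.get("name", "").casefold() == wanted for obj in entry.get("objects", []))
    ]


# Tiny stand-in for the real file so the sketch runs on its own
sample = {"@graph": [
    {"id": "http://apod.nasa.gov/apod/ap140313.html", "date": "2014-03-13",
     "title": "Messier 63: The Sunflower Galaxy",
     "objects": [{"name": "M63"}, {"name": "Sunflower Galaxy"}, {"name": "NGC 5055"}]},
]}
by_date = entries_by_date(sample)
print(by_date["2014-03-13"]["title"])
print([e["date"] for e in entries_mentioning(sample, "ngc 5055")])
```

With the real file, pass the `data` loaded above instead of `sample`, e.g. `entries_mentioning(data, "M63")`.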
Clear skies,
Stuart