NASA Open API for APOD

geckzilla
Ocular Digitator
Posts: 9180
Joined: Wed Sep 12, 2007 12:42 pm
Location: Modesto, CA

Re: NASA Open API for APOD

Post by geckzilla » Fri Dec 13, 2019 1:17 am

just my two cents, but it's more important to actually have some classes and ids defined within the html than it is to have the css file itself.
Just call me "geck" because "zilla" is like a last name.

RJN
Baffled Boffin
Posts: 1675
Joined: Sat Jul 24, 2004 1:58 pm
Location: Michigan Tech

Re: NASA Open API for APOD

Post by RJN » Sat Dec 14, 2019 7:21 pm

geckzilla wrote: Fri Dec 13, 2019 1:17 am just my two cents, but it's more important to actually have some classes and ids defined within the html than it is to have the css file itself.
Thank you. But why? Is it so that codes that ingest the HTML will better be able to figure out details of what information they are ingesting, like copyright information?

geckzilla
Ocular Digitator
Posts: 9180
Joined: Wed Sep 12, 2007 12:42 pm
Location: Modesto, CA

Re: NASA Open API for APOD

Post by geckzilla » Sun Dec 15, 2019 6:31 am

RJN wrote: Sat Dec 14, 2019 7:21 pm
geckzilla wrote: Fri Dec 13, 2019 1:17 am just my two cents, but it's more important to actually have some classes and ids defined within the html than it is to have the css file itself.
Thank you. But why? Is it so that codes that ingest the HTML will better be able to figure out details of what information they are ingesting, like copyright information?
Yes. If you've ever worked with databases or even a spreadsheet, in order to select a particular item in the data, it has to have *some* form of id. It's like that with HTML. The way to disambiguate every item within the page is to assign it an id. The same way CSS uses the id to select that item, so too can anyone who wants to parse the data select it based on the same id.

PawelPleskaczynski
Asternaut
Posts: 9
Joined: Fri Nov 22, 2019 5:35 pm

Re: NASA Open API for APOD

Post by PawelPleskaczynski » Sun Dec 15, 2019 11:20 am

Sorry, I've been offline for a while, but I see that it's going in a good direction. I see that your intention is to keep the website looking the same (so straight from the 90s) but change the core code. In previous posts I suggested rewriting everything from scratch, using a database and modern development tools. But that approach means the website wouldn't work on older devices and browsers, and existing web scraping tools, like the official APOD API or my own, would stop working. Your approach lets web scrapers keep working normally (I'll test whether my APOD API works on this test site), and it'll work on older browsers to some extent.

Also please, really, please add classes or ids to elements on the website. Here's my suggestion: the APOD picture/video/something else would always have the ID "apod-content", and other elements would always have an ID describing what they are, e.g. the title would have the "title" ID and the copyright field would have the "copyright" ID. That way it'd be much, much easier to scrape this website. For now, to select, for example, the copyright field, I have to do something like this: `copyright = body('center').eq(1).text();` which isn't elegant at all and is prone to errors if the website's layout changes. If elements had proper IDs, this would be as easy as: `copyright = body('#copyright').text();`.
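
To make the contrast between position-based and id-based scraping concrete, here is a small sketch. It uses naive regular expressions as a stand-in for a real HTML parser, and both the sample markup and the ids in it are assumptions for illustration, not what APOD currently serves.

```javascript
// Hypothetical markup: the ids below are the proposal, not APOD's actual page source.
const sampleHtml = `
<center><b id="title">A Galaxy Far Away</b></center>
<center id="copyright">Image Credit: Jane Doe</center>
`;

// Strip tags and collapse whitespace so only the visible text remains.
function visibleText(fragment) {
  return fragment.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Position-based: take the n-th <center> element (0-based).
// This silently returns the wrong text if the layout gains or loses a <center>.
function textOfNthCenter(html, n) {
  const matches = [...html.matchAll(/<center\b[^>]*>([\s\S]*?)<\/center>/gi)];
  return matches[n] ? visibleText(matches[n][1]) : null;
}

// Id-based: find the element that carries the given id, wherever it sits.
// Naive on purpose: no nesting of the same tag, and `id` is assumed to be a plain token.
function textById(html, id) {
  const re = new RegExp(
    `<(\\w+)\\b[^>]*\\bid=["']${id}["'][^>]*>([\\s\\S]*?)</\\1>`,
    "i"
  );
  const m = html.match(re);
  return m ? visibleText(m[2]) : null;
}

console.log(textOfNthCenter(sampleHtml, 1));
console.log(textById(sampleHtml, "copyright"));
```

Both calls find the same text on this sample, but only the id-based lookup would keep working if another `<center>` block were inserted above the credit line.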

videotizer
Ensign
Posts: 21
Joined: Thu Nov 21, 2019 10:24 pm

Re: NASA Open API for APOD

Post by videotizer » Sun Dec 15, 2019 12:02 pm

@geckzilla @PawelPleskaczynski

Did you take a look at my previous post? it covers all the points regarding the use of semantic elements, ids, and classes where applicable. I also created a revised home page which you can see here https://gist.github.com/videotizer/0493 ... e257678724.

PawelPleskaczynski
Asternaut
Posts: 9
Joined: Fri Nov 22, 2019 5:35 pm

Re: NASA Open API for APOD

Post by PawelPleskaczynski » Sun Dec 15, 2019 2:05 pm

@videotizer

Sorry, I didn't notice your post.. But yes, I fully agree with your points, and the website's code you provided looks good.

videotizer
Ensign
Posts: 21
Joined: Thu Nov 21, 2019 10:24 pm

Re: NASA Open API for APOD

Post by videotizer » Sun Dec 15, 2019 3:00 pm

@PawelPleskaczynski

No worries. Glad we agree.

geckzilla
Ocular Digitator
Posts: 9180
Joined: Wed Sep 12, 2007 12:42 pm
Location: Modesto, CA

Re: NASA Open API for APOD

Post by geckzilla » Sun Dec 15, 2019 7:37 pm

videotizer wrote: Sun Dec 15, 2019 12:02 pm @geckzilla @PawelPleskaczynski

Did you take a look at my previous post? it covers all the points regarding the use of semantic elements, ids, and classes where applicable. I also created a revised home page which you can see here https://gist.github.com/videotizer/0493 ... e257678724.
yes, I saw it, and I wish you the same luck that I wish all people on this same endeavor. maybe you will be the lucky one.

RJN
Baffled Boffin
Posts: 1675
Joined: Sat Jul 24, 2004 1:58 pm
Location: Michigan Tech

Re: NASA Open API for APOD

Post by RJN » Mon Dec 16, 2019 4:31 pm

Another strange idea. Would it be possible to write a program (say, in Python) that ingests the current APOD HTML and outputs an HTML file that has all the useful semantic tags? Alternatively the output could be an XML file. Then Jerry and I could continue to write and edit (nearly) the same APOD HTML file but have a more ingestion-friendly file created automatically that APIs and the like can then utilize. The two APOD HTML files together would allow both kinds of ingesters -- those hard-coded to the old HTML, and newer, more easily coded ingesting programs -- to work simultaneously. We at APOD central would alert mirror sites and world language sites that ingest the old HTML file that this file will soon go away, and so they should switch over to working with the new HTML file. However, they will have a grace period of a year or so where both APOD HTML files will exist simultaneously so that everything works for everyone.

Later, after the transition, we at APOD central can write/edit ONLY an XML file which will be ingested by a program (say, in Python) that generates the HTML file that is then picked up by browsers, mirror sites, smartphone apps, etc. Thoughts?

Chris Peterson
Abominable Snowman
Posts: 18617
Joined: Wed Jan 31, 2007 11:13 pm
Location: Guffey, Colorado, USA

Re: NASA Open API for APOD

Post by Chris Peterson » Mon Dec 16, 2019 6:20 pm

RJN wrote: Mon Dec 16, 2019 4:31 pm Another strange idea. Would it be possible to write a code (say, in Python) that ingests that current APOD HTML and outputs an HTML file that has all the useful semantic tags. Alternatively the output could be an XML file. Then Jerry and I could continue to write and edit (nearly) the same APOD HTML file but have automatically created a more ingestion-friendly file that APIs and the like can then utilize. The two APOD HTML files together would allow both ingesters -- one hard-coded to the old HTML, and the other newly more easily coded ingesting programs -- to work simultaneously. We at APOD central would alert mirror sites and world language sites that ingest the old HTML file that this file will soon go away, and so they should switch over to working with the new HTML file. However, they will have a grace period of a year or so where both APOD HTML files will exist simultaneously so that everything works for everyone.

Later, after the transition, we at APOD central can write/edit ONLY an XML file which will be ingested by a program (say, in Python) that generates the HTML file that is then picked up by browsers, mirror sites, smartphone apps, etc. Thoughts?
If someone is going to sit down and do a bit of serious coding, I think the clean solution is to provide a tool where you or Jerry simply fill in a bunch of fields (could be a standalone program or one that runs on a web server) and outputs the XML file you're talking about. That XML file is the source document for an APOD. Then all you need is a simple renderer that reads that in and outputs HTML. Much simpler than going from HTML to XML and back to HTML again. It's likely to take longer to figure out what fields are necessary (although the API work has already done most of this) than to actually produce the code.
Chris

*****************************************
Chris L Peterson
Cloudbait Observatory
https://www.cloudbait.com
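
The fields-to-XML-to-HTML pipeline described in the post above could look roughly like the sketch below. The field names, the `<apod>` root element, and the ids in the rendered page are all assumptions made up for illustration; nobody in the thread has specified a schema. The XML reader is a naive one that only handles this flat layout.

```javascript
// Hypothetical field names, loosely modeled on what the API already exposes.
const FIELDS = ["date", "title", "credit", "image", "explanation"];

// Escape characters that are special in XML/HTML text and attribute values.
function escapeMarkup(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Reverse of escapeMarkup; "&amp;" goes last so nothing is unescaped twice.
function unescapeMarkup(value) {
  return value
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Step 1: form fields -> XML source document.
function toXml(entry) {
  const body = FIELDS
    .map(name => `  <${name}>${escapeMarkup(entry[name] ?? "")}</${name}>`)
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<apod>\n${body}\n</apod>\n`;
}

// Step 2: XML source document -> field values (flat schema only, not a general parser).
function fromXml(xml) {
  const entry = {};
  for (const name of FIELDS) {
    const m = xml.match(new RegExp(`<${name}>([\\s\\S]*?)</${name}>`));
    entry[name] = m ? unescapeMarkup(m[1]) : "";
  }
  return entry;
}

// Step 3: field values -> HTML page with stable ids for scrapers.
function renderHtml(entry) {
  const e = Object.fromEntries(
    FIELDS.map(name => [name, escapeMarkup(entry[name] ?? "")])
  );
  return [
    "<!DOCTYPE html>",
    `<html><head><meta charset="utf-8"><title>APOD: ${e.date} - ${e.title}</title></head>`,
    "<body>",
    `<h1 id="title">${e.title}</h1>`,
    `<img id="apod-content" src="${e.image}" alt="${e.title}">`,
    `<p id="credit">${e.credit}</p>`,
    `<p id="explanation">${e.explanation}</p>`,
    "</body></html>",
  ].join("\n");
}
```

Note that this sketch treats every field as plain text. An explanation containing links or other inline markup would need a richer schema than a flat list of strings.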

videotizer
Ensign
Posts: 21
Joined: Thu Nov 21, 2019 10:24 pm

Re: NASA Open API for APOD

Post by videotizer » Mon Dec 16, 2019 6:50 pm

Chris Peterson wrote: Mon Dec 16, 2019 6:20 pm I think the clean solution is to provide a tool where you or Jerry simply fill in a bunch of fields (could be a standalone program or one that runs on a web server)
If that's the case I would suggest using a proven Content Management System (CMS) such as WordPress. It has builtin features that cover most of what's needed, such as:
  • Posts management and editing with possibility for multiple authors
  • Taxonomy management for tagging and categorizing
  • Archive
  • User management
  • RSS feed
  • API
BTW, I noticed that some mirror/world language sites are already using WordPress, such as: If you're interested I'll be more than happy to setup a demo server.

Chris Peterson
Abominable Snowman
Posts: 18617
Joined: Wed Jan 31, 2007 11:13 pm
Location: Guffey, Colorado, USA

Re: NASA Open API for APOD

Post by Chris Peterson » Mon Dec 16, 2019 7:10 pm

videotizer wrote: Mon Dec 16, 2019 6:50 pm
Chris Peterson wrote: Mon Dec 16, 2019 6:20 pm I think the clean solution is to provide a tool where you or Jerry simply fill in a bunch of fields (could be a standalone program or one that runs on a web server)
If that's the case I would suggest using a proven Content Management System (CMS) such as WordPress. It has builtin features that cover most of what's needed, such as:
  • Posts management and editing with possibility for multiple authors
  • Taxonomy management for tagging and categorizing
  • Archive
  • User management
  • RSS feed
  • API
BTW, I noticed that some mirror/world language sites are already using WordPress, such as: If you're interested I'll be more than happy to setup a demo server.
I think that using WP is massive overkill here (and it's based on SQL databases, not XML). For something as simple as APOD, XML makes a lot more sense and it's a lot more portable. I run a number of WP based sites, and I'll say that there's a fair bit of overhead involved with keeping them up-to-date, and some updates break things. I'm sure that the editors don't want to deal with anything like that. It would border on trivial to write a form processor that extracted the fields to an XML file, and also to write a parser that converted the XML file to HTML. And these would be very stable, not dependent upon third party software like WP, and require essentially no maintenance.

videotizer
Ensign
Posts: 21
Joined: Thu Nov 21, 2019 10:24 pm

Re: NASA Open API for APOD

Post by videotizer » Mon Dec 16, 2019 7:54 pm

Chris Peterson wrote: Mon Dec 16, 2019 7:10 pm I think that using WP is massive overkill here (and it's based on SQL databases, not XML. For something as simple as APOD, XML makes a lot more sense and it's a lot more portable. I run a number of WP based sites, and I'll say that there's a fair bit of overhead involved with keeping them up-to-date, and some updates break things. I'm sure that the editors don't want to deal with anything like that. It would border on trivial to write a form processor that extracted the fields to an XML file, and also to write a parser that converted the XML file to HTML. And these would be very stable, not dependent upon third party software like WP, and require essentially no maintenance.
Hi Chris,

I agree that it has its cons, just like everything else, but if you think long term and think about the issues related to maintaining the API among other things, then using a CMS would become more feasible than trying to come up with hybrid solutions that would not provide a proper centralized workflow - we don't need to reinvent the wheel. There's always something that needs to be maintained/updated/upgraded/etc.; even APOD in its current HTML-only form requires that. The question is what's the most efficient way to maintain and manage a particular website, and in the case of APOD, I believe it has grown large enough to require a proper CMS to achieve that task.

Perhaps if @RJN would explain the current workflow for posting the daily picture, it would become easier to compare different proposed solutions.

PawelPleskaczynski
Asternaut
Posts: 9
Joined: Fri Nov 22, 2019 5:35 pm

Re: NASA Open API for APOD

Post by PawelPleskaczynski » Sun Dec 29, 2019 4:00 pm

I don't really like where this is going. In my opinion, the website shouldn't be modified, other than maybe upgrading to HTML5 (though I feel that's not very important, because the website would stop working on older browsers) and adding IDs to some sections, like the title or description. If the current workflow of adding a new APOD is convenient for the admins, it makes no sense to change anything. Also, the website works pretty well with current APIs, including the official one (though the official one is poorly maintained); changing the layout and using a CMS or other system would break them, and they would need to be updated.

RJN
Baffled Boffin
Posts: 1675
Joined: Sat Jul 24, 2004 1:58 pm
Location: Michigan Tech

Re: NASA Open API for APOD

Post by RJN » Mon Dec 30, 2019 3:04 am

PawelPleskaczynski wrote: Sun Dec 29, 2019 4:00 pm ... website shouldn't be modified, other than maybe upgrading to HTML5 ... and adding IDs to some sections, like title or description.
Wow, that's pretty close to my view! In fact, a few weeks ago, I created a minimal-change, HTML5-compatible version of an APOD page. It can be found here: https://apod.nasa.gov/apod/fap/pagetest5.html . I have been thinking of adding a copyright ID tag, but as usual I worry about a domino effect to some of APOD's mirror sites.

When editing new APODs this past week, I remembered again why we edit the main HTML each day and do not rely on some form or XML meta-version. It is because there are, many times, unusual changes, such as rollover images, videos, links to future lectures, and possibly other things that are easily coded right in the HTML but would not be easily incorporated into an XML-type meta-page.

- RJN

geckzilla
Ocular Digitator
Posts: 9180
Joined: Wed Sep 12, 2007 12:42 pm
Location: Modesto, CA

Re: NASA Open API for APOD

Post by geckzilla » Tue Dec 31, 2019 6:49 pm

Using templating or a CMS system doesn't necessarily mean you can't still do that, though. A widely distributed CMS requires constant updating, though, and that's what would get you in the end. That, or you'd get hacked eventually if you failed to update.

I'm just at a loss as to why even the most simple changes such as adding IDs and a css page can't just as easily be dealt with by the mirror sites. APOD literally breaks itself on a regular basis by human error, and everyone deals with it just fine.

Gimmeslack12
Asternaut
Posts: 6
Joined: Mon Jan 31, 2011 4:58 am

Re: NASA Open API for APOD

Post by Gimmeslack12 » Tue Feb 11, 2020 4:08 am

Overall the API runs great and its options are generally good. I wish it had search functionality, or the ability to query multiple, non-sequential dates. I also wish the concept_tags option was enabled to provide more categorical descriptions of the APODs.
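
Until the API supports that directly, non-sequential dates can be fetched client-side with one request per date. The sketch below assumes the public endpoint `https://api.nasa.gov/planetary/apod` with its `api_key` and `date` query parameters; the helper names are made up for illustration. `DEMO_KEY` is rate-limited, so a personal key is the better choice for anything beyond a quick test.

```javascript
const API_ENDPOINT = "https://api.nasa.gov/planetary/apod";

// Build the request URL for a single date given as YYYY-MM-DD.
function apodUrl(date, apiKey = "DEMO_KEY") {
  const params = new URLSearchParams({ api_key: apiKey, date });
  return `${API_ENDPOINT}?${params}`;
}

// Fetch several unrelated dates in parallel.
// A date that fails (network error or non-2xx status) yields null
// instead of rejecting the whole batch.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function fetchDates(dates, apiKey = "DEMO_KEY", fetchImpl = fetch) {
  return Promise.all(
    dates.map(async date => {
      try {
        const response = await fetchImpl(apodUrl(date, apiKey));
        return response.ok ? await response.json() : null;
      } catch {
        return null;
      }
    })
  );
}

// Usage (performs real requests):
// fetchDates(["2019-12-16", "2003-11-06"]).then(entries => console.log(entries));
```

Every date costs one request against the key's quota, so this is a workaround for small batches rather than a substitute for server-side support.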

I have also discovered that you can get thumbnails of the APODs by visiting the link https://apod.nasa.gov/apod/calendar/S_<DATE>.jpg, where <DATE> is in the format `YYMMDD` (e.g. `031106`). This is the same date format that APOD page URLs use. Example:

APOD Link
https://apod.nasa.gov/apod/ap031106.html

Thumbnail
https://apod.nasa.gov/apod/calendar/S_031106.jpg
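
The URL pattern above can be wrapped in a couple of small helpers. This is only a sketch based on the two example URLs in this post; the function names are made up, and whether a thumbnail exists for every date is not something the sketch checks.

```javascript
const APOD_BASE = "https://apod.nasa.gov/apod/";

// Convert "YYYY-MM-DD" into APOD's "YYMMDD" stamp.
function toApodStamp(isoDate) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!m) throw new Error(`Expected YYYY-MM-DD, got: ${isoDate}`);
  return m[1].slice(2) + m[2] + m[3];
}

// URL of the APOD page for a given date.
function pageUrl(isoDate) {
  return `${APOD_BASE}ap${toApodStamp(isoDate)}.html`;
}

// URL of the calendar thumbnail for a given date.
function thumbnailUrl(isoDate) {
  return `${APOD_BASE}calendar/S_${toApodStamp(isoDate)}.jpg`;
}

console.log(pageUrl("2003-11-06"));      // https://apod.nasa.gov/apod/ap031106.html
console.log(thumbnailUrl("2003-11-06")); // https://apod.nasa.gov/apod/calendar/S_031106.jpg
```

Two-digit years are ambiguous in principle, but since APOD began in 1995, stamps starting with 95 to 99 can only refer to the 1990s.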
For the past 2 years I've been using the APOD API to power my Chrome/Firefox Extension called "APOD By the Trav". It adds APOD as your new tab page with a number of (what I think) are cool features. I am very excited to share it wherever I can but don't want to be too spammy about it.

Chrome - https://chrome.google.com/webstore/deta ... pdcmlfjcdj
Firefox - https://addons.mozilla.org/en-US/firefo ... -the-trav/
MS Edge - https://microsoftedge.microsoft.com/add ... ngnlfipjdh

I am still pretty active in developing it and any thoughts or feature ideas anyone has I'm more than happy to hear about.

Gimmeslack12
Asternaut
Posts: 6
Joined: Mon Jan 31, 2011 4:58 am

Re: NASA Open API for APOD

Post by Gimmeslack12 » Tue Feb 11, 2020 4:44 am

Ok, I just realized the conversation that was happening in this thread after I posted my post above ^^^ about my extension (and that I wasn't really adding to the topic at hand). So I went and read through everything to understand what's going on.

All I'll say in regards to breaking older browsers is that if server side rendering from the API was supported then you could still output a minimally upgraded, HTML5 compliant, APOD site yet have the power of modern development tools. I didn't catch exactly how APOD is updated each day (maybe it's an old HTML template that's copied each day?) but that cut and paste approach could be deprecated with the use of a simple admin page form for adding the date/description/hdurl/url/etc. that uploads to the API database. I could even see the ability to line up a week or month of APOD's in advance and have that switch over at midnight EST (or whenever!).
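
As a rough sketch of the "line up a week in advance and switch over at midnight" idea: if prepared entries are keyed by date, the server only needs to work out which calendar day it currently is in the chosen time zone. The queue shape and function names below are assumptions for illustration, and `America/New_York` stands in for "midnight EST".

```javascript
// Return the calendar date (YYYY-MM-DD) that `instant` falls on in the given time zone.
function dateInZone(instant, timeZone = "America/New_York") {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const get = type => parts.find(p => p.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// `queue` maps YYYY-MM-DD to a prepared entry (hypothetical shape).
// Returns the entry scheduled for "today" in the given zone, or null if none is queued.
function currentEntry(queue, instant = new Date(), timeZone = "America/New_York") {
  return queue[dateInZone(instant, timeZone)] ?? null;
}

// Example queue prepared ahead of time.
const queue = {
  "2020-02-11": { title: "Entry for Tuesday" },
  "2020-02-12": { title: "Entry for Wednesday" },
};

// One minute before and exactly at midnight Eastern time (UTC-5 in February).
console.log(currentEntry(queue, new Date("2020-02-12T04:59:00Z")).title);
console.log(currentEntry(queue, new Date("2020-02-12T05:00:00Z")).title);
```

Because the lookup is driven by the clock rather than by a deploy step, nobody has to be online at the moment of the switchover.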

Whatever the case, I'm a willing front end developer that is at your service.

RJN
Baffled Boffin
Posts: 1675
Joined: Sat Jul 24, 2004 1:58 pm
Location: Michigan Tech

Re: NASA Open API for APOD

Post by RJN » Tue Feb 11, 2020 5:28 pm

Gimmeslack12 wrote: Tue Feb 11, 2020 4:44 am Whatever the case, I'm a willing front end developer that is at your service.
Thanks! Your offer is appreciated. Things may move on this front this summer.

Alexei
Asternaut
Posts: 1
Joined: Thu Aug 13, 2020 6:12 pm

Re: NASA Open API for APOD

Post by Alexei » Thu Aug 13, 2020 7:14 pm

This API has a flaw: it doesn't provide keywords/tags, which are crucial for social networks. Seven years ago I wrote a custom parser for APOD pages, which translates each APOD page into a post in the group https://vk.com/nasa_apod on a Russian social network.

Example of this feature:
For example https://vk.com/nasa_apod/cometneowise will return post tagged with #cometneowise
https://vk.com/nasa_apod/hubble will return post tagged with #hubble
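
A minimal sketch of how such tags could be derived from the `<meta name="keywords">` line of an APOD page is shown below. This is not the parser described in this post; the function names and the hashtag normalization (lowercase, letters and digits only) are assumptions chosen to match the `#cometneowise` style. The content match deliberately tolerates a missing closing quote.

```javascript
// Pull the comma-separated keywords out of <meta name="keywords" content="...">.
function extractKeywords(html) {
  const tag = html.match(/<meta\b[^>]*\bname\s*=\s*["']keywords["'][^>]*>/i);
  if (!tag) return [];
  // Optional opening quote, then everything up to the next quote or ">",
  // so a missing closing quote does not lose the whole list.
  const content = tag[0].match(/\bcontent\s*=\s*["']?([^"'>]*)/i);
  if (!content) return [];
  return content[1]
    .split(",")
    .map(keyword => keyword.trim())
    .filter(Boolean);
}

// "comet NEOWISE" -> "#cometneowise"; returns null if nothing usable remains.
function toHashtag(keyword) {
  const slug = keyword.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, "");
  return slug ? `#${slug}` : null;
}

// All distinct hashtags for a page, in order of first appearance.
function hashtagsFor(html) {
  const tags = extractKeywords(html).map(toHashtag).filter(Boolean);
  return [...new Set(tags)];
}

const page = '<head><meta name="keywords" content="comet NEOWISE, Hubble, hubble"></head>';
console.log(hashtagsFor(page)); // [ '#cometneowise', '#hubble' ]
```

A keyword that itself contains a quote character would be cut short by this tolerant match, which seems an acceptable trade-off for a tag list.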

Also, I'd say that manual editing of HTML pages is a bad idea; maybe XML-to-HTML translation would be a better solution. Problems I've seen:
  • https://apod.nasa.gov/apod/ap090225.html is missing <html><hea at the start
  • Three times I saw a missing " symbol in the meta keywords (in https://apod.nasa.gov/apod/ap080315.html https://apod.nasa.gov/apod/ap080427.html https://apod.nasa.gov/apod/ap130719.html )
  • About twice I saw a new APOD image with yesterday's date

tanialuky
Asternaut
Posts: 1
Joined: Thu Nov 05, 2020 10:32 am

Re: NASA Open API for APOD

Post by tanialuky » Sun Feb 07, 2021 4:28 am

ok

jgosses
Asternaut
Posts: 1
Joined: Tue Mar 08, 2022 3:38 am

Re: NASA Open API for APOD

Post by jgosses » Thu Mar 10, 2022 3:26 pm

Hello, I used to work as a NASA contractor, and one of the things I did was maintain the APOD API on api.nasa.gov.

I only recently learned of this comment board. I thought it might be useful to provide some context about the APOD API, why it was created, how it is maintained, and how it relates to NASA's organizational structure.

API.NASA.GOV, DATA.NASA.GOV, and CODE.NASA.GOV are all run under the NASA Office of the Chief Information Officer by a group internally called "Open Innovation". Be aware there is another group under the NASA Chief Technology Officer called "Open Innovation" that does public contests.

All federal agencies are required to have DATA.<insert-agency-name>.GOV and CODE.<insert-agency-name>.GOV sites that hold metadata that describes all of their agencies' public data and open source code respectively. Getting to and keeping it at a state of "everything" is hard as you might imagine.

You can think of API.<insert-agency-name>.GOV as an extension of data.nasa.gov. Technically, it isn't required to exist but many agencies have them as seen on https://api.data.gov/

NASA uses api.nasa.gov as a place for beginner-friendly APIs. If we attempted to put all APIs there, (1) it would always end up being a partial, out-of-date list, and (2) complicated APIs not of use to most people would swamp the beginner-friendly ones. This is basically what happens when students try to find things on data.nasa.gov. The signal-to-noise ratio is low.

Most of the APIs on api.nasa.gov are not created or run by Open Innovation group. Instead API.nasa.gov basically acts as a pass through service. It provides the public face, the key management, the DDOS attack protection, etc. All this is actually done by GSA (General Services Administration) for all the API.<insert-agency-name>.GOV as a free service, which is pretty awesome.

The APOD API is one of the few created and maintained by Open Innovation. It was created entirely separately from the APOD website itself before my time there.

- The API is at api.nasa.gov.
- The github repository is https://github.com/nasa/apod-api
- The API is currently deployed on AWS Elastic Beanstalk (it used to be on Heroku, but that became harder to pay for due to government procurement stuff)
- The API can be deployed by yourself on whatever cloud service you like. Several people do this actually.

The APOD API is the most used API and dataset at NASA.

I don't have access to the current numbers, but it wasn't uncommon to have tens of thousands of hits on that API a month and several hundred individual developer keys in use a month. If you search on GitHub for the URL of the API, you'll get 14,000 results. https://github.com/search?q=api.nasa.go ... &type=code

The long-time stability of the API interface is a key part of this popularity.

I should note that Open Innovation at time of this writing is currently down to one person plus a little extra help of others. This impacts the degree to which pull requests and issues can be responded to quickly.

I'm no longer working as a NASA contractor and no longer have direct edit rights to the repo. However, I can provide some context. Any discussion of bugs or enhancements should happen in the repo issues, though you can cross-post here for visibility if you like.

mewhoRob
Asternaut
Posts: 1
Joined: Sun Mar 10, 2024 10:09 am

Re: NASA Open API for APOD

Post by mewhoRob » Sun Mar 10, 2024 1:12 pm

hello all,
Just found out about this APOD forum. It's exciting to see the discussions about maintaining the APOD site while improving its API. I'm curious if there have been any advancements in adding the "credits" dataset to NASA APOD API?

Recently I was able to make a Star Trek themed webpage by using NASA APOD API. I paired up Trek's UI with APOD's pretty images and Prof Nemiroff's explanation. Unfortunately, the credits info is absent from the API. I suppose I will attempt to parse that from APOD's web pages instead.
https://mewho.com/apod

I looked at the source code on a few APOD webpages -- in order to add an id attribute for the credits, a new tag would be necessary because of the current html code setup. Inserting this new tag into the body might potentially cause issues with some parsers out there. How about adding a custom meta tag in the <head> section instead? Though one downside to this approach is that the same credit content would be maintained in two places (html text & meta tag) :-/
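
To illustrate the meta tag idea, here is a sketch of both sides: producing the tag from the credit text and reading it back in a parser. The tag name `apod-credit` is purely hypothetical, as are the function names. Generating the tag from the same string used in the page body would be one way to soften the "maintained in two places" downside.

```javascript
// Hypothetical tag name; nothing like it exists on APOD pages today.
const CREDIT_META = "apod-credit";

// Escape characters that would break out of a double-quoted attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Editor side: build the <meta> tag from the credit string used in the page body.
function creditMetaTag(credit) {
  return `<meta name="${CREDIT_META}" content="${escapeAttr(credit)}">`;
}

// Parser side: read the credit back, or return null so callers can
// fall back to scraping the page body on pages that lack the tag.
// Naive: expects name before content and double-quoted values, as creditMetaTag emits.
function readCreditMeta(html) {
  const re = new RegExp(
    `<meta\\b[^>]*\\bname="${CREDIT_META}"[^>]*\\bcontent="([^"]*)"`,
    "i"
  );
  const m = html.match(re);
  if (!m) return null;
  return m[1]
    .replace(/&quot;/g, '"')
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

const head = `<head>${creditMetaTag('Jane "JD" Doe & Co.')}</head>`;
console.log(head);
console.log(readCreditMeta(head));
```

Existing parsers that only look at the page body would be unaffected, since nothing in the body changes.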

tgx
Asternaut
Posts: 4
Joined: Tue Nov 19, 2024 4:23 am

Re: NASA Open API for APOD

Post by tgx » Fri Dec 06, 2024 12:34 am

mewhoRob wrote: Sun Mar 10, 2024 1:12 pm hello all,
Just found out about this APOD forum. It's exciting to see the discussions about maintaining the APOD site while improving its API. I'm curious if there have been any advancements in adding the "credits" dataset to NASA APOD API?

Recently I was able to make a Star Trek themed webpage by using NASA APOD API. I paired up Trek's UI with APOD's pretty images and Prof Nemiroff's explanation. Unfortunately, the credits info is absent from the API. I suppose I will attempt to parse that from APOD's web pages instead.
https://mewho.com/apod

I looked at the source code on a few APOD webpages -- in order to add an id attribute for the credits, a new tag would be necessary because of the current html code setup. Inserting this new tag into the body might potentially cause issues with some parsers out there. How about adding a custom meta tag in the <head> section instead? Though one downside to this approach is that the same credit content would be maintained in two places (html text & meta tag) :-/
I ran into the same issue when building https://apod.akatgx.link. Initially I used NASA's APOD API but ended up creating my own TypeScript client-side parser instead as a workaround. It fetches APOD data directly from apod.nasa.gov through a CORS proxy (a CORS proxy is necessary due to the CORS policy on the main APOD site, which prevents direct fetching from the browser). I set up a simple proxy using a Cloudflare worker, so feel free to use it if your site doesn't experience heavy traffic. I took the parsing techniques used in NASA's APOD API and improved them for better parsing, and added parsing for credits.
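
For anyone who would rather run their own proxy, the sketch below shows one way such a worker could be written. It is not the code behind the proxy mentioned above: the `?apod=<url>` query shape is taken from the `corsProxy` constant in the listing that follows, while the allow-list check, the function names, and the `proxy.example` host are assumptions.

```javascript
// Only proxy pages under https://apod.nasa.gov/apod/ so the worker is not an open relay.
const ALLOWED_ORIGIN = "https://apod.nasa.gov";
const ALLOWED_PATH_PREFIX = "/apod/";

// Extract and validate the target from a request URL of the form ...?apod=<url>.
// Returns the normalized target URL, or null if it is missing or not allowed.
function resolveTarget(requestUrl) {
  const raw = new URL(requestUrl).searchParams.get("apod");
  if (!raw) return null;
  let target;
  try {
    target = new URL(raw);
  } catch {
    return null; // not a valid absolute URL
  }
  const allowed =
    target.origin === ALLOWED_ORIGIN &&
    target.pathname.startsWith(ALLOWED_PATH_PREFIX);
  return allowed ? target.href : null;
}

// Fetch the target page and hand it back with a permissive CORS header.
async function handleRequest(request, fetchImpl = fetch) {
  const target = resolveTarget(request.url);
  if (!target) {
    return new Response("Only apod.nasa.gov/apod/ URLs are proxied", { status: 400 });
  }
  const upstream = await fetchImpl(target);
  const headers = new Headers(upstream.headers);
  headers.set("Access-Control-Allow-Origin", "*");
  return new Response(upstream.body, { status: upstream.status, headers });
}

// In a Cloudflare Worker using module syntax, this would be wired up as:
//   export default { fetch: request => handleRequest(request) }

console.log(resolveTarget("https://proxy.example/?apod=https://apod.nasa.gov/apod/ap241206.html"));
```

Validating the parsed URL rather than the raw string means a path such as `/apod/../other` is normalized before the prefix check.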

Here's a translated Javascript version that you could easily copy paste and use directly:

const urlBase = "https://apod.nasa.gov/apod/"
const corsProxy = "https://apodcors.tiggerx04.workers.dev/?apod="

// Cache manager class
class APODCache {
  constructor() {
    this.cache = new Map()
    this.enabled =
      typeof window !== "undefined" && window.localStorage !== undefined
    this.cacheKey = "apod_cache"
    this.loadCache()
  }

  loadCache() {
    if (!this.enabled) return
    try {
      const cached = localStorage.getItem(this.cacheKey)
      if (cached) {
        const parsed = JSON.parse(cached)
        this.cache = new Map(Object.entries(parsed))
      }
    } catch (e) {
      console.warn("Failed to load APOD cache:", e)
    }
  }

  saveCache() {
    if (!this.enabled) return
    try {
      const obj = Object.fromEntries(this.cache)
      localStorage.setItem(this.cacheKey, JSON.stringify(obj))
    } catch (e) {
      console.warn("Failed to save APOD cache:", e)
    }
  }

  get(key) {
    return this.cache.get(key)
  }

  set(key, value) {
    this.cache.set(key, value)
    this.saveCache()
  }

  has(key) {
    return this.cache.has(key)
  }
}

const cache = new APODCache()

// Format a date into APOD's required format (YYMMDD)
function formatDate(dt) {
  const year = dt
    .getFullYear()
    .toString()
    .slice(-2)
  const month = (dt.getMonth() + 1).toString().padStart(2, "0")
  const day = dt
    .getDate()
    .toString()
    .padStart(2, "0")
  return year + month + day
}

// Parse the HTML of an APOD page to extract metadata.
async function parseAPODPage(url, date) {
  const response = await fetch(`${corsProxy}${url}`)
  if (response.status === 404) return null

  const text = await response.text()
  const parser = new DOMParser()
  const doc = parser.parseFromString(text, "text/html")

  const props = {}
  let mediaType = "image"
  let mediaUrl = ""
  let hdUrl = ""

  const img = doc.querySelector("img")
  const iframe = doc.querySelector("iframe")

  if (img) {
    const imgParentAnchor = img.closest("a")
    mediaUrl = img.getAttribute("src") || ""
    if (!mediaUrl.startsWith("http")) {
      mediaUrl = urlBase + mediaUrl
    }

    if (
      imgParentAnchor &&
      imgParentAnchor.getAttribute("href")?.startsWith("image/")
    ) {
      hdUrl = urlBase + imgParentAnchor.getAttribute("href")
    } else {
      hdUrl = mediaUrl
    }
  } else if (iframe) {
    mediaType = "video"
    mediaUrl = iframe.src
  } else {
    mediaType = "other"
  }

  const title = extractTitle(doc)
  const explanation = extractExplanation(doc)
  const { credits, copyright } = extractCredits(doc)

  if (mediaUrl) props.url = mediaUrl
  if (hdUrl) props.hdurl = hdUrl
  if (title) props.title = title
  if (explanation) props.explanation = explanation
  props.credits = credits
  props.copyright = copyright

  props.media_type = mediaType

  const year = date.getFullYear()
  const month = String(date.getMonth() + 1).padStart(2, "0")
  const day = String(date.getDate()).padStart(2, "0")
  props.date = year + "-" + month + "-" + day

  // `date` is already a Date, so format it directly instead of round-tripping
  // through toString(), where replacing "-" can corrupt the "GMT-0500" offset.
  props.link = `${urlBase}ap${formatDate(date)}.html`

  props.error = !title || !explanation || !mediaUrl

  return props
}

// Extract the APOD title
function extractTitle(doc) {
  const centerElement = doc.querySelector("center")
  if (centerElement) {
    const boldText = centerElement.querySelector("b")
    if (boldText) {
      return boldText.textContent?.trim() || ""
    }
  }
  const pageTitle =
    doc.title
      .split(" - ")
      .pop()
      ?.trim() || ""
  return pageTitle.replace(/^APOD:\s*\d{4}\s*\w+\s*\d+\s*[–-]\s*/, "").trim()
}

// Extract the explanation text
function extractExplanation(doc) {
  const explanationHeader = Array.from(doc.querySelectorAll("b")).find(
    el => el.textContent?.trim().toLowerCase() === "explanation:"
  )

  if (!explanationHeader) return ""

  let explanation = ""
  let currentNode = explanationHeader.nextSibling

  while (
    currentNode &&
    !currentNode.textContent?.includes("Tomorrow's picture")
  ) {
    if (
      currentNode.nodeType === Node.TEXT_NODE ||
      currentNode.nodeType === Node.ELEMENT_NODE
    ) {
      explanation += currentNode.textContent || ""
    }
    currentNode = currentNode.nextSibling
  }

  return explanation
    .replace(/\n/g, " ")
    .replace(/\s+/g, " ")
    .trim()
}

// Extract credits and copyright information
function extractCredits(doc) {
  const centerElements = doc.querySelectorAll("center")
  let credits = null
  let copyright = null

  for (const centerElement of centerElements) {
    const creditElements = centerElement.querySelectorAll("b")
    for (const element of creditElements) {
      if (
        element.textContent?.toLowerCase().includes("credit") ||
        element.textContent?.toLowerCase().includes("copyright")
      ) {
        const nodes = Array.from(element.parentElement?.childNodes || [])
        const labelIndex = nodes.indexOf(element)
        const nodesAfter = nodes.slice(labelIndex + 1)

        let text = nodesAfter
          .filter(node => {
            return (
              node instanceof Node &&
              (node.nodeType === Node.TEXT_NODE ||
                (node.nodeType === Node.ELEMENT_NODE &&
                  node.nodeName !== "BR" &&
                  !node.textContent?.toLowerCase().includes("credit") &&
                  !node.textContent?.toLowerCase().includes("copyright")))
            )
          })
          // Text and element nodes are handled identically: take the trimmed text.
          .map(node => node.textContent?.trim())
          .filter(text => text)
          .join(" ")
          .trim()

        if (text) {
          const formattedText = text
            // Normalize spacing around separators
            .replace(/\s*,\s*/g, ", ")
            .replace(/\s*;\s*/g, "; ")
            .replace(/\s*&\s*/g, " & ")
            .replace(/\(\s*/g, "(")
            .replace(/\s*\)/g, ")")
            .replace(/\s*\/\s*/g, " / ")
            // Collapse any whitespace runs left over from the steps above
            .replace(/\s+/g, " ")
            .trim()

          if (element.textContent?.toLowerCase().includes("copyright")) {
            copyright = formattedText
          } else if (element.textContent?.toLowerCase().includes("credit")) {
            credits = formattedText
          }
        }
      }
    }

    if (credits && copyright && credits === copyright) {
      credits = null
    }

    if (credits && copyright) break
  }

  return { credits: credits || null, copyright: copyright || null }
}

async function fetchAPOD(startDate, endDate = null) {
  try {
    const getMichiganTime = () => {
      return new Date(
        new Date().toLocaleString("en-US", { timeZone: "America/Detroit" })
      )
    }

    if (!endDate) {
      // Single date fetch
      let date = startDate
        ? new Date(startDate.toString().replace(/-/g, "/"))
        : getMichiganTime()

      // If fetching latest and Michigan time is still on previous day, roll back one day
      if (!startDate) {
        const michiganDate = getMichiganTime()
        if (michiganDate.getHours() < 5) {
          // If before 5 AM Michigan time
          date.setDate(date.getDate() - 1)
        }
      }

      let formattedDate = startDate ? formatDate(date) : null

      // Check cache first
      if (formattedDate && cache.has(formattedDate)) {
        return cache.get(formattedDate) || null
      }

      let url = formattedDate
        ? `${urlBase}ap${formattedDate}.html`
        : `${urlBase}astropix.html`

      let result = await parseAPODPage(url, date)

      // Only cache dated requests; the "latest" page has no date key to look up
      if (result && formattedDate) {
        cache.set(formattedDate, result)
      }
      return result
    }

    // Date range fetch
    const start = new Date(startDate.toString().replace(/-/g, "/"))
    const end = new Date(endDate.toString().replace(/-/g, "/"))
    const dates = []
    let currentDate = start

    while (currentDate <= end) {
      dates.push(new Date(currentDate))
      currentDate.setDate(currentDate.getDate() + 1)
    }

    // Fetch in parallel with concurrency limit
    const concurrencyLimit = 5
    const results = []

    for (let i = 0; i < dates.length; i += concurrencyLimit) {
      const chunk = dates.slice(i, i + concurrencyLimit)
      const promises = chunk.map(async date => {
        const formattedDate = formatDate(date)

        // Check cache first
        if (cache.has(formattedDate)) {
          return cache.get(formattedDate)
        }

        const url = `${urlBase}ap${formattedDate}.html`
        const result = await parseAPODPage(url, date)
        if (result) {
          cache.set(formattedDate, result)
          return result
        }
        return null
      })

      const chunkResults = await Promise.allSettled(promises)
      results.push(
        ...chunkResults
          .filter(
            result => result.status === "fulfilled" && result.value !== null
          )
          .map(result => result.value)
      )
    }

    return results.reverse()
  } catch (err) {
    console.error("Error fetching APOD:", err)
    throw err
  }
}
Usage example:

Code: Select all

await fetchAPOD(); // Fetches the latest APOD

await fetchAPOD("2024-11-16"); // Fetches the APOD for a specific date

await fetchAPOD("2024-10-16", "2024-11-16"); // Fetches a date range
Example response (the "credits" field holds the credits when an image/video is not copyrighted):

Code: Select all

{
    "date": "2024-12-03",
    "title": "Ice Clouds over a Red Planet",
    "explanation": "If you could stand on Mars -- what might you see? You m...",
    "credits": "NASA, JPL-Caltech, Kevin M. Gill; Processing: Rogelio Bernal Andreo",
    "copyright": null,
    "url": "https://apod.nasa.gov/apod/image/2412/MarsClouds_Perseverance_960.jpg",
    "hdurl": "https://apod.nasa.gov/apod/image/2412/MarsClouds_Perseverance_2048.jpg",
    "link": "https://apod.nasa.gov/apod/ap241203.html",
    "media_type": "image",
    "error": false
}
This of course isn't 100% perfect yet, but from my testing it seems to parse the data better than NASA's APOD API, which occasionally includes some unrelated text in the explanation. It seems faster when fetching a date range too.

Cool site btw!
Last edited by tgx on Fri Dec 06, 2024 2:22 am, edited 2 times in total.

tgx
Asternaut
Posts: 4
Joined: Tue Nov 19, 2024 4:23 am

Re: NASA Open API for APOD

Post by tgx » Fri Dec 06, 2024 1:40 am

RJN wrote: Mon Dec 16, 2019 4:31 pm Another strange idea. Would it be possible to write a code (say, in Python) that ingests that current APOD HTML and outputs an HTML file that has all the useful semantic tags. Alternatively the output could be an XML file. Then Jerry and I could continue to write and edit (nearly) the same APOD HTML file but have automatically created a more ingestion-friendly file that APIs and the like can then utilize. The two APOD HTML files together would allow both ingesters -- one hard-coded to the old HTML, and the other newly more easily coded ingesting programs -- to work simultaneously. We at APOD central would alert mirror sites and world language sites that ingest the old HTML file that this file will soon go away, and so they should switch over to working with the new HTML file. However, they will have a grace period of a year or so where both APOD HTML files will exist simultaneously so that everything works for everyone.

Later, after the transition, we at APOD central can write/edit ONLY an XML file which will be ingested by a program (say, in Python) that generates the HTML file that is then picked up by browsers, mirror sites, smartphone apps, etc. Thoughts?
I have a similar solution that I would like to propose:

Currently, the way to access an APOD is through a URL like this:

https://apod.nasa.gov/apod/ap240101.html
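
As a rough sketch of that URL scheme, here is how a page address can be built from an ISO date. The helper name apodPageUrl and the constant APOD_BASE are hypothetical, and the sketch assumes the apYYMMDD.html naming seen in the example holds for whatever date is passed in.

```javascript
// Hypothetical helper: builds the APOD page URL for a YYYY-MM-DD date string.
// Assumes the apYYMMDD.html file naming shown in the example URL.
const APOD_BASE = "https://apod.nasa.gov/apod/"

function apodPageUrl(isoDate) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate)
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got: ${isoDate}`)
  }
  const [, year, month, day] = match
  // Only the last two digits of the year appear in the file name
  return `${APOD_BASE}ap${year.slice(2)}${month}${day}.html`
}

console.log(apodPageUrl("2024-01-01"))
// https://apod.nasa.gov/apod/ap240101.html
```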

I think that you should offer a more machine-readable version at the same address by simply changing the file extension, for example by providing JSON and XML versions:

https://apod.nasa.gov/apod/ap240101.json would return:

Code: Select all

{
    "date": "2024-01-01",
    "title": "NGC 1232: A Grand Design Spiral Galaxy",
    "explanation": "Galaxies ar...",
    "credits": "FORS, 8.2-meter VLT Antu, ESO",
    "copyright": null,
    "url": "https://apod.nasa.gov/apod/image/2401/ngc1232b_vlt_960.jpg",
    "hdurl": "https://apod.nasa.gov/apod/image/2401/ngc1232b_vlt_3969.jpg",   
    "media_type": "image"
    ...
}
And for the XML version:

https://apod.nasa.gov/apod/ap240101.xml would return:

Code: Select all

<apod>
    <date>2024-01-01</date>
    <title>NGC 1232: A Grand Design Spiral Galaxy</title>
    <explanation>Galaxies ar...</explanation>
    <credits>FORS, 8.2-meter VLT Antu, ESO</credits>
    <copyright></copyright>
    <url>https://apod.nasa.gov/apod/image/2401/ngc1232b_vlt_960.jpg</url>
    <hdurl>https://apod.nasa.gov/apod/image/2401/ngc1232b_vlt_3969.jpg</hdurl>
    <media_type>image</media_type>
    ...
</apod>
Implementing this would require no extra effort on your part when creating HTML files each day. A script could be created to monitor for the daily addition of APOD HTML files, parse them, and generate the corresponding XML and JSON files. You could continue to produce HTML files as you currently do, while also providing machine-friendly versions directly accessible from apod.nasa.gov. This would eliminate the need for APIs and the associated overhead. Those who parse directly from your HTML could continue without disruption, and anyone developing applications that use APOD content would get easier access.
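
To make the generation step a bit more concrete, here is a minimal sketch of turning one already-parsed record into the JSON and XML text shown above. The function names (toJson, toXml, escapeXml) and the FIELDS list are my own assumptions, and only the fields visible in the examples are covered; it is not a finished implementation.

```javascript
// Sketch only: serializes one parsed APOD record into the proposed formats.
// Field names follow the example payloads above; function names are hypothetical.
const FIELDS = [
  "date", "title", "explanation", "credits",
  "copyright", "url", "hdurl", "media_type",
]

// Escape the characters that would break XML element content
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
}

function toJson(record) {
  const out = {}
  for (const field of FIELDS) {
    // Missing values become null, as in the JSON example
    out[field] = record[field] ?? null
  }
  return JSON.stringify(out, null, 4)
}

function toXml(record) {
  const lines = FIELDS.map(field => {
    // Missing values become empty elements, as in the XML example
    const value = record[field] ?? ""
    return `    <${field}>${escapeXml(value)}</${field}>`
  })
  return ["<apod>", ...lines, "</apod>"].join("\n")
}

const sample = {
  date: "2024-01-01",
  title: "NGC 1232: A Grand Design Spiral Galaxy",
  credits: "FORS, 8.2-meter VLT Antu, ESO",
  copyright: null,
  media_type: "image",
}
console.log(toJson(sample))
console.log(toXml(sample))
```

The daily script would then write both strings next to the HTML file under the matching .json and .xml names.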

Additionally, archivepix.json and archivepix.xml could be created to include every APOD in a single file, which would be updated daily by the script, making searching and indexing much simpler.
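
A possible shape for that daily update, as a sketch: the archive is treated as an array of records, and the hypothetical upsertArchive function replaces or adds the entry for one date. Reading and writing the actual archivepix.json file is left out, and the oldest-first ordering is just an assumption.

```javascript
// Sketch only: keeps an in-memory archive (an array of records) current.
// A real script would read and write archivepix.json around this call.
function upsertArchive(archive, record) {
  // Drop any existing entry for the same date so re-runs do not duplicate it
  const others = archive.filter(entry => entry.date !== record.date)
  // YYYY-MM-DD strings sort chronologically when compared as text
  return [...others, record].sort((a, b) => a.date.localeCompare(b.date))
}

let archive = [
  { date: "2024-01-02", title: "Second" },
  { date: "2024-01-01", title: "First" },
]
archive = upsertArchive(archive, { date: "2024-01-02", title: "Second (re-parsed)" })
console.log(archive.map(entry => `${entry.date} ${entry.title}`))
```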

I also understand that explanations & image credits/copyright typically include hyperlinks, which are removed in the APOD API to create a simple paragraph. Therefore, we could include a separate key that retains the content with the hyperlinks.
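
For illustration, the two variants could be produced side by side like this. The key name explanation_html and the function names are hypothetical. The regex-based tag handling is only a sketch; a DOM-based approach like the parser code earlier in the thread would be more robust.

```javascript
// Sketch only: regex tag handling is fragile; a real parser would be safer.
function stripAllTags(html) {
  return html.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim()
}

function keepOnlyLinks(html) {
  // Remove every tag except <a ...> and </a>
  return html
    .replace(/<(?!\/?a[\s>])[^>]*>/gi, "")
    .replace(/\s+/g, " ")
    .trim()
}

function explanationFields(html) {
  return {
    explanation: stripAllTags(html), // plain paragraph, as in the current API
    explanation_html: keepOnlyLinks(html), // hypothetical key keeping the links
  }
}

// Made-up fragment, not taken from a real APOD page
const fragment = 'See <a href="https://example.com/x">this <b>link</b></a><br>\n for more.'
console.log(explanationFields(fragment))
```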

In my opinion, this approach is more efficient than the current API.

I would be more than happy to code this solution for you. I would also develop a script to process the older APODs and generate files for them, as well as build a more robust parser to handle unusual APODs that the API may struggle with, such as those embedding iframes other than YouTube & Vimeo (e.g., here). I could also create a GUI for you to quickly edit any APOD data in case the parser did not parse it correctly.

Thoughts?