Policy on saving NSL data

Post by RJN » Fri Jan 28, 2005 8:17 pm

The NSL array of nine 75 Gig SCSI drives keeps filling up. We keep saving data to DVD leaving the FITS files from the last two months, but the drives are too frequently 90% filled or more. What should we do?

1. Bigger disk drives. Switch to large IDE drives. Pricegrabber.com lists external 500G disk drives for under $1 a gig. For perhaps $5,000, we could increase the NSL data-saving capacity by a factor of a few. One problem is that our system administrator and general computer guru is a big fan of SCSI drives, and upgrading those is much more expensive.

2. Save less data. Specifically, save less than 2 months of old FITS files, as is done now. This will help a little bit but be annoying for people wishing to see a FITS file from the month before last.

3. Faster DVD burner. The old one we have now was purchased two years ago, and even though we can't find the actual speed, surely a faster one is available today. This will likely cost around $500.

4. Ring buffer. Start automatically deleting FITS and moving GIF data that is older than two months. JPGs can stay, for now. On the down side, this means losing real data and possibly re-defining the work-study job for one of our key undergraduates. On the up side, this solution automates everything and no humans will be needed to intervene on a regular basis.
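
A minimal sketch of the deletion half of option 4, for illustration only. The `.fits` suffix, the 60-day reading of "two months", and the function names `find_expired` and `purge` are assumptions, not part of the existing NSL setup, and moving the GIF data is not shown. It defaults to a dry run so nothing is deleted until someone has checked the list.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired(root, suffixes=(".fits",), max_age_days=60, now=None):
    """Return files under root with a matching suffix and an old modification time."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() not in suffixes:
            continue  # JPGs and everything else stay where they are
        if path.stat().st_mtime < cutoff:
            expired.append(path)
    return sorted(expired)


def purge(root, dry_run=True, **kwargs):
    """Delete expired files; with dry_run=True only report what would be removed."""
    expired = find_expired(root, **kwargs)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Run nightly from a scheduler, `purge("/path/to/archive", dry_run=False)` would keep the disks at a roughly constant fill level, at the cost of the real data loss mentioned above.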

Any thoughts would be much appreciated!


Post by The Meal » Fri Jan 28, 2005 8:35 pm

Buy more hard drives! Lots and lots and lots and lots of 'em!!

Thoughts on data-saving

Post by TJ » Fri Jan 28, 2005 9:15 pm

I wouldn't say I'm against IDE drives, just that whatever system we choose should make sense in terms of scalability and reliability. The problem with IDE isn't the speed or capacity, but rather the limit on the number of devices -- namely 4. The system drive takes one spot, leaving 3. I'm not sure the case allows space for 3 more drives, and I'm not aware of an external IDE case/mounting solution for a Sun -- hence the SCSI array.

Ideally, we would look at some sort of "real" disk array -- probably attached to the server via SCSI, but probably running with IDE disks inside. The drives that we have are not really in an array, rather a string of 9 individual disks on a SCSI bus. Real arrays, properly implemented, scale quite well -- either through the addition of drives until all slots are populated, or through the connection of an additional drive chassis.

It's all a matter of balancing cost against requirements. More expensive solutions are often more flexible, while cheaper ones (backup tapes come to mind) don't allow fast access to archived data. I'd recommend keeping these principles in mind, regardless of technical limitations like the number of IDE devices.

Size: how much do you want to keep?
Lifetime: how long do you want to keep it?
Access: how much trouble are you willing to go to in order to look at archived data?
alternately, how immediate should access be? online, nearline, or offline?
Cost: what is it worth to do this?

Post by lior » Fri Jan 28, 2005 9:31 pm

Changing the hardware is a long-term solution. In the short term we will probably have to pay with on-line data. I was thinking about writing a short program that compresses the older files in the archives. On second thought, the JPGs are already compressed, so I need to check whether the additional disk space (if any) provided by using gzip is worth changing our web site.
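
One way to run that check before touching the web site is to measure how much gzip actually saves on a sample of each file type. This is only a sketch: the helper name `gzip_saving` and the default compression level are assumptions, and it reads each file fully into memory, which is fine for images of a few megabytes.

```python
import gzip
from pathlib import Path


def gzip_saving(path, level=6):
    """Return the fraction of bytes saved by gzip (can be slightly negative)."""
    data = Path(path).read_bytes()
    if not data:
        return 0.0
    compressed = gzip.compress(data, compresslevel=level)
    return 1.0 - len(compressed) / len(data)
```

Already-compressed formats such as JPG would be expected to show a saving near zero, while uncompressed FITS files may do noticeably better; the numbers for the actual archive would have to be measured.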

Post by Emoticon Fury » Sat Jan 29, 2005 1:26 am

Couldn't you load a full tower with several DVD burners with dual-layer capability? That way you could cram twice as much on one disc and burn more than one disc at a time. Also, has anyone looked into file compression technologies like ZIP, ACE or RAR?

Post by Matt Merlo » Sat Jan 29, 2005 8:15 pm

I would go with getting a new DVD burner. This way there is no data lost, but a huge amount of money is not needed.

Post by Vic Muzzin » Sat Jan 29, 2005 9:03 pm

Can we do SATA?
I have this hard drive from newegg
http://www.newegg.com/app/ViewProductDe ... 59&depa=1I
63 cents per gig.
It may just be my imagination but I believe my SATA drive responds much faster than my IDE drive.
I also recommend this DVD burner.
http://www.newegg.com/app/ViewProductDe ... 962&depa=1
Maybe with careful shopping we could employ both solutions, thereby best preparing the system for future expansion.

What is the speed of the current burner?

Post by nbrosch » Mon Jan 31, 2005 4:52 pm

I suggest NOT going SCSI, but using the cheaper alternative. We are regularly using 200GB ATA disks and are pretty happy. Note though that we are doubling the data up on two disks, so that a crash does not destroy irreplaceable observation data.
Noah Brosch
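
A rough sketch of the "double the data up on two disks" idea, for readers who want to script it rather than rely on hardware mirroring. The function names and the use of SHA-256 to verify each copy are assumptions for illustration, not a description of the setup in the post above.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_mirrored(source, primary_dir, mirror_dir):
    """Copy source into both directories and verify each copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for target_dir in (primary_dir, mirror_dir):
        target_dir = Path(target_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

The two directories only protect against a disk crash if they sit on physically separate drives.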

Post by dnabost » Wed Aug 29, 2007 6:59 pm

With the way technology is growing at an exponential rate, storage is becoming cheaper and cheaper. It should not cost you an arm and a leg.

I would probably say to go with the DVD burner though.

Disks vs. DVDs

Post by nbrosch » Wed Aug 29, 2007 7:14 pm

The disk advantage is the on-line availability of the images and the reduction of hassle in mounting DVDs. One can now have 4x500 GB disks in a single enclosure, for 2 TB of storage (1 TB with redundancy). However, images grew as well. The CONCAM IV now in testing at the Wise Observatory produces ~13MB images and we probably will go to a faster cadence than the lower CONCAM models. This implies a data generation of some 6 GB per night, and even 1 TB of storage will fill up in six months...
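
The fill-up estimate can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB), observing on every night, and an average month of 30.4 days; none of those assumptions come from the post itself.

```python
IMAGE_MB = 13        # approximate CONCAM IV image size, from the post
GB_PER_NIGHT = 6     # approximate nightly data volume, from the post
CAPACITY_GB = 1000   # 1 TB of redundant storage, assuming decimal units
DAYS_PER_MONTH = 30.4


def nights_until_full(capacity_gb, gb_per_night):
    """Number of observing nights before the given capacity is used up."""
    return capacity_gb / gb_per_night


images_per_night = GB_PER_NIGHT * 1000 / IMAGE_MB
nights = nights_until_full(CAPACITY_GB, GB_PER_NIGHT)
months = nights / DAYS_PER_MONTH

print(f"{images_per_night:.0f} images per night")
print(f"{nights:.0f} nights, or about {months:.1f} months, to fill {CAPACITY_GB} GB")
```

Under these assumptions the disk fills after roughly 167 nights, a little under six months, which matches the figure quoted above.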