[Sluglug] new to the list
cerise at armory.com
Tue Oct 4 00:13:48 PDT 2005
On Mon, Oct 03, 2005 at 05:31:26PM -0700, Ignacio Solis wrote:
> * cerise at armory.com (cerise at armory.com) said:
> > In response to my suggestion that the filesystem spread files out around the
> > disk, you responded:
>
> Either we disagree what "spread ... around the disk" means or you're wrong.
Well, since you apparently failed to define your terms, I'll define them
for you. Spreading files around the disk means placing files around the surface
of the disk. For the sake of argument, we'll forget about interlacing and assume
that we mean placing contiguous files around the surface of the disk.
An exercise for the reader: contrast "spread files out around the disk" vs.
"spread the contents of files around the disk."
> > Fast access tends to imply minimal disk latency. Minimal disk latency in
> > this situation implies that you follow my recommendation in my last post.
>
> First sentence true, the second doesn't explain itself.
That's because it already did. I assume you're capable of finding the post I
specified.
> Do you know how addressing in a disk works? What the relation of address 1 is
> to address 100? How cylinders and sectors are organized? What is considered to
> be "close" to each other?
I understand perfectly well that there's no necessary relation. However, most
disk drive manufacturers have understood that good filesystems make their disks
look good. Since filesystems tend to assume that the access time for address x
differs only minimally from that for address x+1, manufacturers tend to map
consecutive addresses across cylinders, then sectors in a track, then across
adjacent tracks (in order from smaller increments to larger ones).
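For reference, the idealized textbook conversion between a logical block
address and a cylinder/head/sector triple can be sketched as below. The
geometry constants are hypothetical, and real drives (zoned recording,
remapped sectors) need not follow this layout, so treat it only as a picture
of why address x and address x+1 usually sit next to each other, not as a
description of any particular drive.

```python
# Hypothetical geometry, chosen only for illustration.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63


def lba_to_chs(lba):
    """Idealized LBA -> (cylinder, head, sector); sectors are 1-based."""
    cylinder = lba // (HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head = (lba // SECTORS_PER_TRACK) % HEADS_PER_CYLINDER
    sector = lba % SECTORS_PER_TRACK + 1
    return cylinder, head, sector


def chs_to_lba(cylinder, head, sector):
    """Inverse of lba_to_chs under the same assumed geometry."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


if __name__ == "__main__":
    # Consecutive addresses step through sectors first, then heads, then cylinders.
    for lba in (0, 1, 62, 63, 1007, 1008):
        print(lba, lba_to_chs(lba))
```

In this idealized scheme the sector number changes fastest, so small address
increments stay on the same track.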
> > > Schedulers are a different beast, but they still benefit from files being
> > > close by.
> >
> > False.
>
> No.
You're...uhh...going to have to try just a wee bit harder there. I followed
mine with an explanation!
> > Schedulers attempt to optimize reads and writes based on the current
> > position of the head over the disk. If the frequently accessed files are
> > close by (read: localized to one region of the disk), then it will have a lot
> > of work for the disk when it happens to be in that region and almost no work
> > for it the rest of the time.
>
> Hence it will spend more time in that region and waste no time on seeking to
> the other side of the disk.
Err, this may come as a surprise to you, but in the current state of magnetic
media, the disk drive doesn't really have a choice about rotation. It can't stop
in one place, so it can't "spend more time" in a region. It has a limited, maximum
amount of time that it can spend in a region per rotation. It's true that some
disk drives will slow down the rotation, but that doesn't change anything about the
argument.
So, in short, it will in fact waste time seeking to the other side of the disk because
once that maximum amount of time has been exceeded (and with speeds >5400 RPM, that
time is short), it will have to wait for the rest of the disk to rotate around.
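To put rough numbers on that, here is a minimal sketch of the arithmetic,
assuming the idealized platter used so far: constant rotation speed and a
region that covers a fixed fraction of the circumference. The function names
and the 25% region size are illustrative assumptions, not measurements.

```python
def rotation_period_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def time_over_region_ms(rpm, fraction):
    """Time per revolution that a region covering `fraction` of the
    circumference spends under the head."""
    return rotation_period_ms(rpm) * fraction


def wait_for_region_ms(rpm, fraction):
    """Time the rest of the platter takes to pass by before the region
    comes back under the head."""
    return rotation_period_ms(rpm) * (1.0 - fraction)


if __name__ == "__main__":
    fraction = 0.25  # assumed: region spans a quarter of the circumference
    for rpm in (5400, 7200, 10000):
        print(
            f"{rpm} RPM: revolution {rotation_period_ms(rpm):.2f} ms, "
            f"over region {time_over_region_ms(rpm, fraction):.2f} ms, "
            f"wait for return {wait_for_region_ms(rpm, fraction):.2f} ms"
        )
```

At 5400 RPM a revolution takes about 11.1 ms, so whatever is not finished while
the region is under the head waits for the remainder of that revolution.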
> > It runs the rather believable risk that it will
> > have too much work to do in that region and not complete all reads/writes
> > while the head is over that region.
>
> What? You mean that it will have less work if on top of that it has to seek to
> another track?
Funny, I never said seeking to another track. In fact, up to this point, I
hadn't spoken about drive geometries at all apart from the idealized form of a
platter rotating at constant speed.
Now the contention here, as I've stated a few times, is that putting all the
data in a pie-slice-shaped region of the disk is slower than spreading the data out
over the entire disk.
Judging by your statements, you either have real problems in reading English
or you have problems understanding how hard disk drives work at the moment.
> It seems clearer to me that we disagree on what "spread around the disk" means.
> To me it means that on a disk with addresses ranging from 1 to 100, any one
> address will have the same probability of holding data, hence "spread around". If
> you mean to say that it is on a track/cylinder (hence from one point, you turn
> the disk and cover some space), then that's not "spread around", that is
> defragmented into contiguous space. Obviously you want to fill all the track,
> and if you need more space go to the next track and so forth.
Discussing it in terms of addresses is meaningless because there is no direct
relation between addresses and disk geometry.
"spread around the disk" means literally what it sounds like. The ideal situation
is to place frequently accessed files as evenly as possible around the circumference
of the disk.
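As a small illustration of "as evenly as possible around the circumference",
the sketch below assigns n files equally spaced starting angles and measures
the largest angular gap between neighbours. It assumes the same idealized
single platter; the function names are hypothetical, and the gap only
quantifies how even a placement is. It does not by itself demonstrate any
access-time result.

```python
def even_angles(n_files):
    """Starting angles in degrees for n_files spaced evenly around the platter."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    step = 360.0 / n_files
    return [i * step for i in range(n_files)]


def largest_gap_deg(angles):
    """Largest angular stretch, in degrees, between neighbouring start angles."""
    ordered = sorted(a % 360.0 for a in angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Close the circle: gap from the last angle back around to the first.
    gaps.append(360.0 - ordered[-1] + ordered[0])
    return max(gaps)


if __name__ == "__main__":
    spread = even_angles(4)             # evenly spaced placement
    clustered = [0.0, 10.0, 20.0, 30.0]  # assumed "pie slice" placement
    print("spread:", spread, "largest gap:", largest_gap_deg(spread))
    print("clustered:", clustered, "largest gap:", largest_gap_deg(clustered))
```

With four files, even spacing leaves no gap larger than 90 degrees, while the
assumed clustered placement leaves a 330 degree stretch with none of them in it.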
I could repeat my comments from two posts ago, but since you're already supposed
to reread that post, I'll assume you're capable of following this pointer to it.
> I've no idea what trivially means to you, but if what you want is proof that
> putting everything together is the best thing, here is the proof done at IBM
> in 1973:
>
> "Placement of Records on a Secondary Storage Device to Minimize Access Time"
> Grossman D.D., Silverman H.F., Journal of the ACM July 1973.
> http://portal.acm.org/citation.cfm?id=321775
Well, I could `webster trivially` for you, but I assume you can manage that for
yourself. They must teach you _something_ in grad school.
As for papers, I can cite ones much more recent than 1973. If I really cared, I
could cite papers that I'm credited on.
Amazingly, I don't. If you can't be bothered citing the bits that you think are
important, then I can't be bothered to refute them.
-Phil/CERisE