Here's the raw data of an HTTP GET, an e-mail, and a MIME encoded picture. Tell me if you see any patterns.
First, here is the raw source of a short e-mail. Note that this is also the format in which it is kept on my hard drive, as the Mozilla mail client uses the mbox format.
From - Tue Apr 15 21:22:51 2003
X-UIDL: 3e9bf91300000055
X-Mozilla-Status: 0001
X-Mozilla-Status2: 02000000
Return-Path: <jo.....working.org>
Message-ID: <3E9C665A.603030....working.org>
Date: Tue, 15 Apr 2003 16:06:50 -0400
From: Joe Gregorio <jo.....working.org>
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Joe Gregorio <jo.....working.org>
Subject: Test
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Status: RO

This is a test.
Now here is the raw response to a simple HTTP GET on the URL http://bitworking.org/news/1?xml:
HTTP/1.1 200 OK
Date: Thu, 17 Apr 2003 03:37:56 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) mod_throttle/3.1.2 PHP/4.1.2 DAV/1.0.2 mod_ssl/2.8.12 OpenSSL/0.9.6
Transfer-Encoding: chunked
Content-Type: text/xml
Connection: close
Proxy-Connection: close

<?xml version="1.0" ?>
...
Here is the source of another mail message, this time with an attachment. (To keep things short I have removed some of the SMTP headers.)
From - Tue Apr 15 21:22:50 2003
Return-Path: <joe....working.org>
Message-ID: <3E9C652E.403060...working.org>
Date: Tue, 15 Apr 2003 16:01:50 -0400
From: Joe Gregorio <jo....working.org>
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.3) Gecko/20030312
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Joe Gregorio <jo....working.org>
Subject: Picture
Content-Type: multipart/mixed;
 boundary="------------030605020005060907080306"
Status: RO

This is a multi-part message in MIME format.
--------------030605020005060907080306
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Here is a picture.
-joe
--------------030605020005060907080306
Content-Type: image/gif;
 name="Picture1.gif"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="Picture1.gif"

R0lGODlhCgAKAPcAAP//////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////
/////////////////////////////////yH5BAEAAAEALAAAAAAKAAoAAAgSAAMIHEiwoMGD
CBMqXMiwYcKAADs=
--------------030605020005060907080306--
And of course, who can forget Mark Pilgrim:
C:\>curl --include http://diveintomark.org
HTTP/1.1 200 OK
Date: Thu, 17 Apr 2003 03:58:23 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) PHP/4.1.2 mod_gzip/1.3.26.1a DAV/1.0.3 mod_ssl/2.8.12 OpenSSL/0.9.6b
Vary: Accept-Encoding
X-Clerks: I'm not even supposed to BE here today!
Last-Modified: Thu, 17 Apr 2003 03:25:43 GMT
Transfer-Encoding: chunked
Content-Type: text/html
Connection: close
Proxy-Connection: close

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
...
Hat tip to Curioso for pointing out another example in Usenet news messages:
Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
From: jerry@eagle.uucp (Jerry Schwarz)
Newsgroups: net.general
Subject: Usenet Etiquette -- Please Read
Message-ID: <642@eagle.UUCP>
Date: Friday, 19-Nov-82 16:14:55 EST
Followup-To: net.news
Expires: Saturday, 1-Jan-83 00:00:00 EST
Date-Received: Friday, 19-Nov-82 16:59:30 EST
Organization: Bell Labs, Murray Hill

The body of the article comes here, after a blank line.
The pattern, if you missed it, is all those headers of the form:
Header: Value
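To make the pattern concrete, here is a minimal sketch in Python of how such a block might be read. The helper name parse_headers is hypothetical, and the rules it assumes are only the ones visible in the examples above: a name, a colon, a value; continuation lines that start with whitespace; and a blank line before the body. It is not a full RFC 822 parser.

```python
def parse_headers(raw):
    """Split a 'Header: Value' style message into (headers, body).

    Hypothetical helper for illustration only; it ignores most of
    the details a real RFC 822 parser has to handle.
    """
    # Treat CRLF (network form) and LF (mbox on disk) alike.
    raw = raw.replace("\r\n", "\n")
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and headers:
            # A line starting with whitespace continues the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        name, sep, value = line.partition(":")
        # Skip anything that is not "Name: value" (for example the
        # mbox "From - ..." separator, whose "name" contains spaces).
        if sep and name and " " not in name:
            headers.append((name, value.strip()))
    return headers, body


sample = (
    "Subject: Picture\n"
    "Content-Type: multipart/mixed;\n"
    ' boundary="------------030605020005060907080306"\n'
    "Status: RO\n"
    "\n"
    "This is a multi-part message in MIME format.\n"
)

headers, body = parse_headers(sample)
for name, value in headers:
    print(name, "=>", value)
```

The headers are kept as a list of pairs rather than a dictionary because the same header name may legitimately appear more than once.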
Now those headers had their start in RFC 822, which is, and this is my point, one of the unsung pillars of the internet. Like HTML, it is theoretically the worst of all possible formats. It is 7 bit ASCII. Fixed line length. No centrally controlled way to add custom headers. But here it is today, the meta-data transport of choice for HTTP, SMTP and MIME. It has since been updated from its humble 7 bit ASCII roots by RFC 2822, and MIME has its own cleaned-up version, but they all owe their roots to 822.
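One way to see the shared ancestry is to feed both kinds of header block to the same parser. The sketch below uses the email package from Python's standard library, which was written for mail, on a trimmed copy of the HTTP response above as well as on the short test e-mail. The variable names are illustrative, and the status line is split off by hand first because it is not in Header: Value form.

```python
from email import message_from_string

# Trimmed copy of the HTTP response shown earlier.
http_response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Thu, 17 Apr 2003 03:37:56 GMT\r\n"
    "Transfer-Encoding: chunked\r\n"
    "Content-Type: text/xml\r\n"
    "Connection: close\r\n"
    "\r\n"
    '<?xml version="1.0" ?>\r\n'
)

# The status line is HTTP-specific; everything after it is
# ordinary "Header: Value" lines followed by a blank line.
status_line, _, rest = http_response.partition("\r\n")
http_msg = message_from_string(rest)

# Trimmed copy of the short test e-mail shown earlier.
mail = (
    "MIME-Version: 1.0\n"
    "Subject: Test\n"
    "Content-Type: text/plain; charset=us-ascii; format=flowed\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "This is a test.\n"
)
mail_msg = message_from_string(mail)

# The same accessors work on both; header lookup is case-insensitive.
print(status_line)
print(http_msg["content-type"], http_msg["Connection"])
print(mail_msg["Subject"], mail_msg.get_content_type(), mail_msg.get_param("format"))
```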
Posted by Joe on 2003-04-18
Thu, 17 Apr 2003
title: Google Phonebook Search Gives Some the Willies
link: http://www.raelity.org/archives/computers/internet/www/search_engines/google/phonebook_too_close_to_home.html
subject: /computers/internet/www/search_engines/google
date: 2003-04-17
Posted by Curioso on 2003-04-18