Repair Gosling emacs.tap from Universität Stuttgart #17

Open
wants to merge 3 commits into
base: sources

thaliaarchi

Gosling Emacs file ftp.informatik.uni-stuttgart.de/pub/cm/dec/decus/emacs.tap is corrupt, as reported in #15. I repair two broken parts.

Every 4096 bytes, the byte sequence "\0\x10\0\0\0\x10\0\0" is inserted into the stream, and the stream starts with "\0\x10\0\0". This causes extraction to fail with at least GNU and BSD tar. This appears to have been introduced after the tar was created. I remove these sequences.

Additionally, the files in the emacs.tap tar have their basenames truncated to 14 bytes. The filenames would have been truncated before the tar was created, but I assume this does not reflect how Gosling Emacs was originally distributed. Patching the names invalidates the respective header checksums, so I recompute and update them, matching the original formatting. The three affected files are:

  • ./emacs4.2/maclib/{electric-lisp. => electric-lisp.ml}
  • ./emacs4.2/man/{introduction.m => introduction.mss}
  • ./emacs4.2/man/{incr-search.ms => incr-search.mss}
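
As a rough illustration of how such names can be spotted, here is a minimal sketch that flags paths whose basename is exactly 14 bytes long. The function names `basename` and `possibly_truncated` are hypothetical and are not part of the repair script. A 14-byte basename only marks a candidate: names that legitimately fit in 14 bytes (for example `buffer-edit.ml`) are flagged too, so the result still needs a manual look at the extension.

```rust
/// Longest basename seen in this archive; truncated names end exactly here.
const MAX_BASENAME: usize = 14;

/// Returns the final path component (everything after the last `/`).
fn basename(path: &[u8]) -> &[u8] {
    match path.iter().rposition(|&b| b == b'/') {
        Some(i) => &path[i + 1..],
        None => path,
    }
}

/// Heuristic only: a basename of exactly 14 bytes *may* have been cut off.
/// Complete names that happen to be 14 bytes long are flagged as well.
fn possibly_truncated(path: &[u8]) -> bool {
    basename(path).len() == MAX_BASENAME
}
```

All three affected files are flagged by this check, while shorter names such as `./emacs4.2/man/emacs.doc` are not.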

I made these changes in a copy of emacs.tap to hopefully make the provenance clearer. I am not sure whether the copy should sit adjacent to the original, remaining in the uni-stuttgart.de hierarchy, even though it was never hosted there.

It now extracts with both GNU and BSD tar. The tar is still truncated, so only a portion of man/emacs.doc is extracted. I leave its length as is in the header, to keep that clear. In comparing it to the versions from Brian Reed and der Mouse, it seems not much else is missing, if anything.
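
To make the truncation concrete, here is a minimal sketch that walks the 512-byte headers of an already-unframed tar stream and reports members whose declared size exceeds the bytes actually present. It is separate from the repair script; `find_truncated` and this `parse_octal` are hypothetical helpers, and the sketch assumes the old V7-style header layout (name at offset 0, size at offset 124) that the script also relies on.

```rust
/// Parse a NUL-terminated, possibly space-padded octal header field.
fn parse_octal(field: &[u8]) -> Option<u64> {
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    let text = std::str::from_utf8(&field[..end]).ok()?.trim();
    u64::from_str_radix(text, 8).ok()
}

/// Lists members whose declared size exceeds the bytes actually present,
/// as `(path, declared_size, available_bytes)`.
fn find_truncated(tar: &[u8]) -> Vec<(String, u64, u64)> {
    let mut truncated = Vec::new();
    let mut pos = 0usize;
    while pos + 512 <= tar.len() {
        let header = &tar[pos..pos + 512];
        if header.iter().all(|&b| b == 0) {
            break; // all-zero block: end-of-archive marker
        }
        let name_end = header[..100].iter().position(|&b| b == 0).unwrap_or(100);
        let name = String::from_utf8_lossy(&header[..name_end]).into_owned();
        let size = match parse_octal(&header[124..136]) {
            Some(size) => size,
            None => break, // not a header this sketch understands
        };
        pos += 512;
        let available = ((tar.len() - pos) as u64).min(size);
        if available < size {
            truncated.push((name, size, available));
            break; // nothing can follow a truncated member
        }
        // Contents are padded to a whole number of 512-byte blocks.
        let padded = (size + 511) / 512 * 512;
        pos = pos.saturating_add(padded as usize);
    }
    truncated
}
```

Run against the repaired stream, a check like this would be expected to report only man/emacs.doc, though I have not included that output here.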

For reproducibility, I include the script I wrote in Rust to make the repairs. I err on the side of over-asserting, so I have high confidence in its correctness, despite its hacked-together lack of abstraction.


src/main.rs:

use std::{
    env::args_os,
    ffi::CStr,
    fs::{self, File, FileTimes},
    io::{self, Write},
    path::PathBuf,
    process, str,
    time::{Duration, SystemTime},
};

use bstr::ByteSlice;

fn usage() {
    eprintln!("Usage: emacs-tap-fix [--list] [--extract] [--dump] [--fix-paths]");
    process::exit(2);
}

fn main() {
    let args = args_os().skip(1);
    if args.len() == 0 {
        usage();
    }
    let mut list = false;
    let mut extract = false;
    let mut dump = false;
    let mut fix_paths = false;
    for arg in args {
        match arg.as_encoded_bytes() {
            b"--list" => list = true,
            b"--extract" => extract = true,
            b"--dump" => dump = true,
            b"--fix-paths" => fix_paths = true,
            _ => usage(),
        }
    }

    let data = fs::read("emacs.tap").unwrap();

    // Every 4096 bytes, this 8-byte sequence is inserted.
    let data = data
        .strip_prefix(b"\0\x10\0\0")
        .unwrap_or(&data)
        .split_str(b"\0\x10\0\0\0\x10\0\0")
        .inspect(|chunk| assert!(chunk.len() == 4096 || chunk.len() == 0))
        .flatten()
        .copied()
        .collect::<Vec<_>>();

    let mut stdout = io::stdout().lock();
    let mut stderr = io::stderr().lock();

    let mut i = 0;
    while i < data.len() {
        let header = &data[i..i + 512];
        let (path_raw, rest) = header.split_at(100);
        let (mode_raw, rest) = rest.split_at(8);
        let (uid_raw, rest) = rest.split_at(8);
        let (gid_raw, rest) = rest.split_at(8);
        let (size_raw, rest) = rest.split_at(12);
        let (mtime_raw, rest) = rest.split_at(12);
        let (cksum_raw, rest) = rest.split_at(8);
        let (typeflag_raw, rest) = rest.split_at(1);

        let path = path_raw.trim_end_with(|b| b == '\0');
        let mode = parse_octal(mode_raw);
        let uid = parse_octal(uid_raw);
        let gid = parse_octal(gid_raw);
        let size: usize = parse_octal(size_raw).try_into().unwrap();
        let mtime = parse_octal(mtime_raw);
        let cksum = parse_octal(cksum_raw);
        assert!(rest.iter().all(|&b| b == b'\0'));

        let patched_path: &[u8] = if fix_paths {
            // Basenames are truncated to 14 bytes.
            match path {
                b"./emacs4.2/maclib/electric-lisp." => b"./emacs4.2/maclib/electric-lisp.ml",
                b"./emacs4.2/man/introduction.m" => b"./emacs4.2/man/introduction.mss",
                b"./emacs4.2/man/incr-search.ms" => b"./emacs4.2/man/incr-search.mss",
                _ => path,
            }
        } else {
            path
        };

        if list {
            stdout.flush().unwrap();
            writeln!(
                stderr,
                "{path:36} mode=0{mode:<3o} uid={uid:<2} gid={gid:<3} size={size:<6} \
                mtime={mtime:?} cksum={cksum:<4} typeflag={typeflag:?}",
                path = format!("{:?}", patched_path.as_bstr()),
                typeflag = typeflag_raw.as_bstr(),
            )
            .unwrap();
            stderr.flush().unwrap();
        }

        i += 512;
        let contents = &data[i..(i + size).min(data.len())];
        let rounded_size = ((size + 511) / 512) * 512;
        let rounded_contents = &data[i..(i + rounded_size).min(data.len())];

        if extract {
            let mtime = SystemTime::UNIX_EPOCH + Duration::from_secs(mtime as u64);
            let path = PathBuf::from(str::from_utf8(patched_path).unwrap());
            if let Some(dir) = path.parent() {
                fs::create_dir_all(dir).unwrap();
            }
            let mut f = File::create(&path).unwrap();
            f.write_all(contents).unwrap();
            if contents.len() < size {
                // Pad truncated files (in this case, man/emacs.doc) with zeros
                // like BSD tar (but unlike GNU tar).
                let zeros = vec![b'\0'; size - contents.len()];
                f.write_all(&zeros).unwrap();
            }
            f.set_times(FileTimes::new().set_modified(mtime)).unwrap();
        }

        if dump {
            if patched_path == path {
                stdout.write_all(&header).unwrap();
            } else {
                let mut patched_header = Vec::with_capacity(512);
                patched_header.extend_from_slice(patched_path);
                patched_header.resize(100, b'\0');
                patched_header.extend_from_slice(&header[100..148]);
                patched_header.extend_from_slice(b"        "); // Checksum placeholder
                patched_header.push(header[156]); // typeflag lives at offset 156
                patched_header.extend_from_slice(&data[i - (512 - 157)..i]);
                assert_eq!(patched_header.len(), 512);
                let patched_cksum: u64 = patched_header.iter().map(|&b| b as u64).sum();
                let mut patched_cksum = format!("{patched_cksum:o}\0").into_bytes();
                assert!(patched_cksum.len() <= 8);
                let leading_spaces = cksum_raw.iter().take_while(|&&b| b == b' ').count();
                for _ in 0..leading_spaces {
                    if patched_cksum.len() >= 8 {
                        break;
                    }
                    patched_cksum.insert(0, b' ');
                }
                patched_cksum.resize(8, b' ');
                for i in 0..patched_cksum.len() {
                    patched_header[148 + i] = patched_cksum[i];
                }
                stdout.write_all(&patched_header).unwrap();
            }
            stdout.write_all(rounded_contents).unwrap();
        }

        i += rounded_size as usize;
    }
}

fn parse_octal(s: &[u8]) -> i32 {
    let s = CStr::from_bytes_until_nul(s)
        .map(CStr::to_bytes)
        .unwrap_or(s);
    let s = s.trim_end_with(|b| b == ' ').trim_start_with(|b| b == ' ');
    i32::from_str_radix(str::from_utf8(s).unwrap(), 8).unwrap()
}

Cargo.toml:

[package]
name = "emacs-tap-fix"
version = "0.1.0"
edition = "2021"

[dependencies]
bstr = "1.9.1"

Remove the block size markers from the .tap file. Before and after each
4096-byte block is the byte sequence "\0\x10\0\0", 4096 in little
endian. Create a copy of the original, with just the underlying stream,
so it can be read as a tar.

The files in the emacs.tap tar have their basenames truncated to 14
bytes. Fix the three affected files:

  ./emacs4.2/maclib/{electric-lisp. => electric-lisp.ml}
  ./emacs4.2/man/{introduction.m => introduction.mss}
  ./emacs4.2/man/{incr-search.ms => incr-search.mss}

This modification invalidates the respective header checksums, so those
are recomputed and replaced, matching the original formatting.
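
For readers unfamiliar with the tar header checksum, here is a minimal sketch of the rule being applied: the checksum is the unsigned sum of all 512 header bytes, with the 8-byte checksum field (offsets 148 to 155) counted as ASCII spaces. The names `tar_checksum`, `stored_checksum`, and `checksum_ok` are hypothetical and are not taken from the repair script.

```rust
const CKSUM_START: usize = 148;
const CKSUM_END: usize = 156;

/// Unsigned byte sum of the header, treating the checksum field as spaces.
fn tar_checksum(header: &[u8; 512]) -> u32 {
    header
        .iter()
        .enumerate()
        .map(|(i, &b)| {
            if (CKSUM_START..CKSUM_END).contains(&i) {
                u32::from(b' ')
            } else {
                u32::from(b)
            }
        })
        .sum()
}

/// Reads the octal checksum stored in the header, if it parses.
fn stored_checksum(header: &[u8; 512]) -> Option<u32> {
    let field = &header[CKSUM_START..CKSUM_END];
    let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
    let text = std::str::from_utf8(&field[..end]).ok()?.trim();
    u32::from_str_radix(text, 8).ok()
}

/// True when the stored checksum matches the recomputed one.
fn checksum_ok(header: &[u8; 512]) -> bool {
    stored_checksum(header) == Some(tar_checksum(header))
}
```

Changing even one byte of the path changes the sum, which is why each renamed entry needs its checksum field rewritten.
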
@larsbrinkhoff
Owner

First of all, note that it's a ".tap" file, not a tar file. The "\0\x10\0\0" you see is the tape block length, 4096 in little endian. The block length is both before and after each block. Fixing the file names is, of course, entirely valid and appropriate. With this in mind, I think we can leave the .tap file alone and place your tar file next to it.
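
The framing described above can be sketched as a small decoder, using only the standard library. The thread confirms only that each block carries its little-endian length before and after it. The other details are assumptions based on my recollection of the SIMH tape format and should be checked against the specification: that a length of 0 is a tape mark, that 0xFFFFFFFF is end of medium, and that odd-length records are padded to an even size. `tap_to_raw` here is a hypothetical function, not the DPS8M utility of the same name.

```rust
/// Reads a little-endian u32 at `pos`, or None if fewer than 4 bytes remain.
fn read_u32_le(buf: &[u8], pos: usize) -> Option<u32> {
    let b = buf.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Concatenates the payloads of all data records in a .tap image.
/// Sketch only: markers with any of the top 8 bits set are rejected.
fn tap_to_raw(tap: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(len) = read_u32_le(tap, pos) {
        pos += 4;
        if len == 0 {
            continue; // assumed: tape mark, carries no data
        }
        if len == 0xFFFF_FFFF {
            break; // assumed: end of medium
        }
        if len & 0xFF00_0000 != 0 {
            return Err(format!("unhandled marker {len:#010x}"));
        }
        let len = len as usize;
        let data = tap.get(pos..pos + len).ok_or("record data truncated")?;
        out.extend_from_slice(data);
        pos += len + (len & 1); // assumed: odd lengths are padded to even
        let trailer = read_u32_le(tap, pos).ok_or("missing trailing length")?;
        if trailer as usize != len {
            return Err(format!("trailing length {trailer} != {len}"));
        }
        pos += 4;
    }
    Ok(out)
}
```

For emacs.tap, every record would be 4096 bytes long, so each frame is `\0\x10\0\0`, 4096 data bytes, then `\0\x10\0\0` again.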

@thaliaarchi
Author

Interesting. I'll rebase my commit messages to note this information in a bit. Is there software you typically use to read .tap files?

@larsbrinkhoff
Owner

I believe the most frequent use is with emulators for older machines, like SIMH.

@thaliaarchi
Author

Alright, I've reworded the commits and it should be ready.

Also, I noticed that {http,ftp}://ftp.informatik.uni-stuttgart.de/pub/cm/dec/decus/emacs.tap is not live anymore and wasn't saved on the Internet Archive. Do you check to make sure these sources are archived on IA?

@johnsonjh

johnsonjh commented Jul 13, 2024

@thaliaarchi

We (the DPS8M Developers) have created a naive tap2raw utility that can convert DPS8M variant tap files to raw binary streams.

Generally, as our software was originally derived from SIMH, and we generally follow the "industry de facto standard" tap format, it should work. Please be aware, though, that this utility does not implement the real tap specification and lacks any kind of error handling or proper implementation of the various record and marker classes; it is intended only for processing tap files produced by DPS8M, not "archival" tap streams.

Though, in this case, our utility is sufficient, i.e.:

$ tap2raw < emacs.tap > emacs.tar
Position 1046524     Mark 1        Block 1        256 records

$ bsdtar --version
bsdtar 3.7.2 - libarchive 3.7.2 zlib/1.3.1.zlib-ng liblzma/5.4.6 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6

$ bsdtar tvf emacs.tap.tar
-rw-r--r--  0 0      200      3266 May 29  1983 ./emacs4.2/src/Trm.h
-rw-r--r--  0 0      200      4062 May 24  1983 ./emacs4.2/src/TrmAmb.c
-rw-r--r--  0 0      200      4001 May 24  1983 ./emacs4.2/src/TrmBG.c
-rw-r--r--  0 0      200      6132 May 24  1983 ./emacs4.2/src/TrmC100.c
-rw-r--r--  0 0      200      4491 May 24  1983 ./emacs4.2/src/TrmCLNZ.c
-rw-r--r--  0 0      200      4882 May 24  1983 ./emacs4.2/src/TrmI400.c
-rw-r--r--  0 0      200      1726 May 24  1983 ./emacs4.2/src/TrmMiniB.c
-rw-r--r--  0 0      200        31 Feb  1  1984 ./emacs4.2/src/dbcreate
-rw-r--r--  0 0      200      4400 May 24  1983 ./emacs4.2/src/TrmTEK4025.c
-rw-r--r--  0 0      200      8223 May 24  1983 ./emacs4.2/src/TrmTERM.c
-rw-r--r--  0 0      200      3751 May 24  1983 ./emacs4.2/src/TrmVT100.c
-rw-r--r--  0 0      200      8588 May 24  1983 ./emacs4.2/src/abbrev.c
-rw-r--r--  0 0      200      1257 May 24  1983 ./emacs4.2/src/abbrev.h
-rw-r--r--  0 0      200      7882 May 24  1983 ./emacs4.2/src/abspath.c
-rw-r--r--  0 0      200      9558 May 24  1983 ./emacs4.2/src/arithmetic.c
-rw-r--r--  0 0      200     14791 Jun  7  1983 ./emacs4.2/src/buffer.c
-rw-r--r--  0 0      200      8820 May 24  1983 ./emacs4.2/src/buffer.h
-rw-r--r--  0 0      200      2466 May 24  1983 ./emacs4.2/src/casefiddle.c
-rw-r--r--  0 0      200      4990 May 24  1983 ./emacs4.2/src/collectmail.c
-rw-r--r--  0 0      200      1340 May 24  1983 ./emacs4.2/src/columns.c
-rw-r--r--  0 0      200     10993 Jun  6  1983 ./emacs4.2/src/compile.c
-rw-r--r--  0 0      200      3544 May 24  1983 ./emacs4.2/src/compile.h
-rw-r--r--  0 0      200      3699 Jul  1  1983 ./emacs4.2/src/config.h
-rw-r--r--  0 0      200       517 May 24  1983 ./emacs4.2/src/dbadd.c
-rw-r--r--  0 0      200      1251 May 24  1983 ./emacs4.2/src/dblist.c
-rw-r--r--  0 0      200      5441 May 24  1983 ./emacs4.2/src/dbmanager.c
-rw-r--r--  0 0      200       454 May 24  1983 ./emacs4.2/src/dbprint.c
-rw-r--r--  0 0      200     26130 Jan 27  1984 ./emacs4.2/src/display.c
-rw-r--r--  0 0      200      1753 May 31  1983 ./emacs4.2/src/display.h
-rw-r--r--  0 0      200       790 Jan 27  1984 ./emacs4.2/src/dsp.c
-rw-r--r--  0 0      200       546 May 24  1983 ./emacs4.2/src/eipcname.c
-rw-r--r--  0 0      200     12712 Jun  6  1983 ./emacs4.2/src/emacs.c
-rw-r--r--  0 0      200       519 May 24  1983 ./emacs4.2/src/emacsedit.c
-rw-r--r--  0 0      200      2926 May 24  1983 ./emacs4.2/src/errlog.c
-rw-r--r--  0 0      200      7158 Jun 19  1983 ./emacs4.2/src/filecomp.c
-rw-r--r--  0 0      200     18256 Jun  6  1983 ./emacs4.2/src/fileio.c
-rw-r--r--  0 0      200      3930 May 24  1983 ./emacs4.2/src/filesort.c
-rw-r--r--  0 0      200     23572 Jun 28  1984 ./emacs4.2/src/mchan.c
-rw-r--r--  0 0      200      2091 Jan 27  1984 ./emacs4.2/src/mchan.h
-rw-r--r--  0 0      200      8062 Feb 10  1984 ./emacs4.2/src/subproc.c
-rw-r--r--  0 0      200      3850 Feb  1  1984 ./emacs4.2/src/makefile
-rw-r--r--  0 0      200      9507 Jan 27  1984 ./emacs4.2/src/keyboard.c
-rw-r--r--  0 0      200      8071 May 24  1983 ./emacs4.2/src/keyboard.h
-rw-r--r--  0 0      200     12183 Jun 20  1983 ./emacs4.2/src/lispfuncs.c
-rw-r--r--  0 0      200      2635 Mar  8  1984 ./emacs4.2/src/loadst.c
-rw-r--r--  0 0      200      5119 Jun 20  1983 ./emacs4.2/src/macros.c
-rw-r--r--  0 0      200       399 May 24  1983 ./emacs4.2/src/macros.h
-rw-r--r--  0 0      200     16329 Jan 27  1984 ./emacs4.2/src/options.c
-rw-r--r--  0 0      200      2326 Mar  8  1984 ./emacs4.2/src/makemail.c
-rw-r--r--  0 0      200      8123 Jan 27  1984 ./emacs4.2/src/syntax.c
-rw-r--r--  0 0      200      4670 May 24  1983 ./emacs4.2/src/metacoms.c
-rw-r--r--  0 0      200     12702 May 24  1983 ./emacs4.2/src/minibuf.c
-rw-r--r--  0 0      200     23056 Jun 22  1983 ./emacs4.2/src/mlisp.c
-rw-r--r--  0 0      200      4208 May 24  1983 ./emacs4.2/src/mlisp.h
-rw-r--r--  0 0      200     16976 May 24  1983 ./emacs4.2/src/ndbm.c
-rw-r--r--  0 0      200      1078 May 24  1983 ./emacs4.2/src/ndbm.h
-rw-r--r--  0 0      200       642 Feb  1  1984 ./emacs4.2/src/sindex.c
-rw-r--r--  0 0      200      2073 May 24  1983 ./emacs4.2/src/quit.c
-rw-r--r--  0 0      200     18883 May 24  1983 ./emacs4.2/src/search.c
-rw-r--r--  0 0      200      2552 May 24  1983 ./emacs4.2/src/search.h
-rw-r--r--  0 0      200     14736 May 24  1983 ./emacs4.2/src/simplecoms.c
-rw-r--r--  0 0      200      1231 Feb 16  1984 ./emacs4.2/src/syntax.h
-rw-r--r--  0 0      200      3149 May 24  1983 ./emacs4.2/src/undo.c
-rw-r--r--  0 0      200      1007 May 24  1983 ./emacs4.2/src/undo.h
-rw-r--r--  0 0      200       348 May 24  1983 ./emacs4.2/src/useripc.c
-rw-r--r--  0 0      200        75 May 24  1983 ./emacs4.2/src/version.c
-rw-r--r--  0 0      200     20938 May 24  1983 ./emacs4.2/src/window.c
-rw-r--r--  0 0      200      3523 May 24  1983 ./emacs4.2/src/window.h
-rw-r--r--  0 0      200      5887 May 24  1983 ./emacs4.2/src/windowman.c
-rw-r--r--  0 0      200      2468 Jan 27  1984 ./emacs4.2/src/README
-rw-r--r--  0 0      200        66 Jun 28  1984 ./emacs4.2/src/mchan.c.CKP
-rw-r--r--  0 75     0        1345 Jul 21  1983 ./emacs4.2/maclib/DesWord.ml
-rw-r--r--  0 75     0         391 Jun 24  1983 ./emacs4.2/maclib/abbrev.ml
-rw-r--r--  0 75     0         515 Jun 24  1983 ./emacs4.2/maclib/add-log.ml
-rw-r--r--  0 75     0         919 Jun 24  1983 ./emacs4.2/maclib/auto-arg.ml
-rw-r--r--  0 75     0        4330 Jun 24  1983 ./emacs4.2/maclib/bb-aux.ml
-rw-r--r--  0 75     0        4201 Jun 24  1983 ./emacs4.2/maclib/buff.ml
-rw-r--r--  0 75     0       10294 Jun 24  1983 ./emacs4.2/maclib/buffer-edit.ml
-rw-r--r--  0 75     0         609 Jun 24  1983 ./emacs4.2/maclib/c-mode.ml
-rw-r--r--  0 75     0        1079 Jun 24  1983 ./emacs4.2/maclib/capword.ml
-rw-r--r--  0 75     0         440 Jun 24  1983 ./emacs4.2/maclib/centre-line.ml
-rw-r--r--  0 75     0        2196 Jun 24  1983 ./emacs4.2/maclib/cmucs-misc.ml
-rw-r--r--  0 75     0         906 Jun 24  1983 ./emacs4.2/maclib/cmucs-modes.ml
-rw-r--r--  0 75     0         925 Jun 24  1983 ./emacs4.2/maclib/cmucs-smart.ml
-rw-r--r--  0 75     0        8131 Jun 24  1983 ./emacs4.2/maclib/cmucs.ml
-rw-r--r--  0 75     0        2399 Jun 24  1983 ./emacs4.2/maclib/comments.ml
-rw-r--r--  0 75     0        6042 Jun 24  1983 ./emacs4.2/maclib/crypt.ml
-rw-r--r--  0 75     0        1616 Jun 24  1983 ./emacs4.2/maclib/debug.ml
-rw-r--r--  0 75     0         882 Jul 21  1983 ./emacs4.2/maclib/describeX.ml
-rw-r--r--  0 75     0        5986 Aug  4  1983 ./emacs4.2/maclib/dired.ml
-rw-r--r--  0 75     0       25610 Jun 24  1983 ./emacs4.2/maclib/elec-c.ml
-rw-r--r--  0 75     0        4497 Jun 24  1983 ./emacs4.2/maclib/electric-c.ml
-rw-r--r--  0 75     0         206 Jun 24  1983 ./emacs4.2/maclib/expandX.ml
-rw-r--r--  0 75     0        3968 Jun 24  1983 ./emacs4.2/maclib/ftp-visit.ml
-rw-r--r--  0 75     0         528 Jun 24  1983 ./emacs4.2/maclib/generate.ml
-rw-r--r--  0 75     0        1491 Jun 24  1983 ./emacs4.2/maclib/goto.ml
-rw-r--r--  0 75     0        5730 Jun 24  1983 ./emacs4.2/maclib/incr-search.ml
-rw-r--r--  0 75     0        1304 Jun 24  1983 ./emacs4.2/maclib/ind-region.ml
-rw-r--r--  0 75     0       12457 Jul 21  1983 ./emacs4.2/maclib/info.ml
-rw-r--r--  0 75     0         875 Jun 24  1983 ./emacs4.2/maclib/itc-proto.ml
-rw-r--r--  0 75     0        1550 Jun 24  1983 ./emacs4.2/maclib/justify.ml
-rw-r--r--  0 75     0        1848 Jun 24  1983 ./emacs4.2/maclib/kill.ml
-rw-r--r--  0 75     0        6994 Jun 24  1983 ./emacs4.2/maclib/killring.ml
-rw-r--r--  0 75     0         288 Jul 21  1983 ./emacs4.2/maclib/learn.ml
-rw-r--r--  0 75     0        3031 Jun 24  1983 ./emacs4.2/maclib/lisp-mode.ml
-rw-r--r--  0 75     0        2297 Jun 24  1983 ./emacs4.2/maclib/mail-draft.ml
-rw-r--r--  0 75     0        1703 Jun 24  1983 ./emacs4.2/maclib/man.ml
-rw-r--r--  0 75     0         758 Jun 24  1983 ./emacs4.2/maclib/mark-ring.ml
-rw-r--r--  0 75     0        3242 Jul 21  1983 ./emacs4.2/maclib/post.ml
-rw-r--r--  0 75     0         805 Jun 24  1983 ./emacs4.2/maclib/mouse.ml
-rw-r--r--  0 75     0       14178 Jul 21  1983 ./emacs4.2/maclib/bboard.ml
-rw-r--r--  0 75     0        6143 Jun 24  1983 ./emacs4.2/maclib/new-el-mode.ml
-rw-r--r--  0 75     0        2112 Jun 24  1983 ./emacs4.2/maclib/newcompile.ml
-rw-r--r--  0 75     0         183 Jun 24  1983 ./emacs4.2/maclib/normal-mode.ml
-rw-r--r--  0 75     0        3149 Jun 24  1983 ./emacs4.2/maclib/occur.ml
-rw-r--r--  0 75     0         625 Jun 24  1983 ./emacs4.2/maclib/paragraphs.ml
-rw-r--r--  0 75     0        8659 Jun 24  1983 ./emacs4.2/maclib/pascal.ml
-rw-r--r--  0 75     0        9330 Jul 21  1983 ./emacs4.2/maclib/process.ml
-rw-r--r--  0 75     0        1072 Jun 24  1983 ./emacs4.2/maclib/profile.ml
-rw-r--r--  0 75     0        1197 Jun 24  1983 ./emacs4.2/maclib/pwd.ml
-rw-r--r--  0 75     0         482 Jun 24  1983 ./emacs4.2/maclib/readonly.ml
-rw-r--r--  0 75     0       14280 Jul 21  1983 ./emacs4.2/maclib/rmail.ml
-rw-r--r--  0 75     0        6618 Jun 24  1983 ./emacs4.2/maclib/scribe-bib.ml
-rw-r--r--  0 75     0        6420 Jun 24  1983 ./emacs4.2/maclib/scribe.ml
-rw-r--r--  0 75     0         842 Jun 24  1983 ./emacs4.2/maclib/sentences.ml
-rw-r--r--  0 75     0         408 Jun 24  1983 ./emacs4.2/maclib/shift.ml
-rw-r--r--  0 75     0       11472 Jul 21  1983 ./emacs4.2/maclib/smail.ml
-rw-r--r--  0 75     0        1063 Jun 24  1983 ./emacs4.2/maclib/spell.ml
-rw-r--r--  0 75     0        5651 Jun 24  1983 ./emacs4.2/maclib/squeeze.ml
-rw-r--r--  0 75     0        1497 Jun 24  1983 ./emacs4.2/maclib/srccom.ml
-rw-r--r--  0 75     0        4895 Jun 24  1983 ./emacs4.2/maclib/tags.ml
-rw-r--r--  0 75     0         324 Jun 24  1983 ./emacs4.2/maclib/text-mode.ml
-rw-r--r--  0 75     0        2736 Jul 21  1983 ./emacs4.2/maclib/time.ml
-rw-r--r--  0 75     0        4383 Jun 24  1983 ./emacs4.2/maclib/transp.ml
-rw-r--r--  0 75     0         282 Jun 24  1983 ./emacs4.2/maclib/undo.ml
-rw-r--r--  0 75     0        1913 Jun 24  1983 ./emacs4.2/maclib/vi.ml
-rw-r--r--  0 75     0       10606 Jun 24  1983 ./emacs4.2/maclib/whist.ml
-rw-r--r--  0 75     0         250 Jun 24  1983 ./emacs4.2/maclib/writeregion.ml
-rw-r--r--  0 75     0        5493 Jun 24  1983 ./emacs4.2/maclib/electric-lisp.
-rw-r--r--  0 75     0        3311 Jun 24  1983 ./emacs4.2/maclib/aton.ml
-rw-rw-rw-  0 0      0         109 Jan 27  1984 ./emacs4.2/maclib/README
-rw-rw-rw-  0 75     232      2491 Jan 23  1983 ./emacs4.2/man/introduction.m
-rw-rw-rw-  0 75     232      6128 Apr 21  1983 ./emacs4.2/man/contents.mss
-rw-rw-rw-  0 75     232      2432 Nov 28  1981 ./emacs4.2/man/summary.mss
-rw-rw-rw-  0 75     232       368 Aug  5  1982 ./emacs4.2/man/abbrev.mss
-rw-rw-rw-  0 75     232     11333 Jan 17  1983 ./emacs4.2/man/emacsm.mak
-rw-rw-rw-  0 75     232     55000 Apr 10  1983 ./emacs4.2/man/hints.mss
-rw-rw-rw-  0 75     232        43 Jan  7  1981 ./emacs4.2/man/tutorial.mss
-rw-rw-rw-  0 75     232     45643 Apr  9  1983 ./emacs4.2/man/basics.mss
-rw-rw-rw-  0 75     232      6167 Apr 21  1983 ./emacs4.2/man/lcontents.mss
-rw-rw-rw-  0 75     232      1841 Feb  6  1983 ./emacs4.2/man/emacs.mss
-rw-rw-rw-  0 75     232       845 Jun 30  1981 ./emacs4.2/man/emacs.1
-rw-rw-rw-  0 75     232      9602 Apr 10  1983 ./emacs4.2/man/process.mss
-rw-rw-rw-  0 75     232      8977 Apr 21  1983 ./emacs4.2/man/emacs.otl
-rw-rw-rw-  0 75     232      7352 Dec  6  1982 ./emacs4.2/man/rmail.mss
-rw-rw-rw-  0 75     232      5280 Apr 10  1983 ./emacs4.2/man/refcard.mss
-rw-rw-rw-  0 75     232      3603 Aug  5  1982 ./emacs4.2/man/bufferedit.mss
-rw-rw-rw-  0 75     232       714 Aug  5  1982 ./emacs4.2/man/capword.mss
-rw-rw-rw-  0 75     232      1216 Aug  5  1982 ./emacs4.2/man/ind-region.mss
-rw-rw-rw-  0 75     232      4481 Jan 16  1983 ./emacs4.2/man/incr-search.ms
-rw-rw-rw-  0 75     232      2401 Apr 21  1983 ./emacs4.2/man/emacs.err
-rw-rw-rw-  0 75     232    246924 Apr 21  1983 ./emacs4.2/man/emacs.doc

There are probably many other utilities that handle tap files, but SIMH3 and Open SIMH are the canonical tap producers and consumers.

@thaliaarchi
Author

Thanks for the spec reference! I couldn't find it searching with the keywords I knew. All I could find details for related to some TAP format was a Commodore waveform-based TAP format; obviously not the format used here.

@johnsonjh

johnsonjh commented Jul 13, 2024

@thaliaarchi

According to some authorities (whom I won't name) "SIMH tap" is an awful format, and I don't completely disagree:

"SIMH .tap is an abomination that should have been replaced decades ago. It, for example, is completely inadequate to describe a tape that was read with a bad block."

In the "basic magtape representation" (the only part that is actually required, and thus the only part really implemented as a global "de facto" standard in wide use), this is still the case.

While the "Extended Specification" does provide a facility for describing bad and possibly bad reads via class 8 markers, it also enables "private application specific" extensions that don't make a whole lot of sense generically.

The tap spec does answer back to the complaint noted above, with a "Standard Specification" as a subset of the above noted extended specification, but error handling is still unfortunately complex, and this "Standard" specification isn't widely implemented by third party tools. Even when implemented correctly, actual tap files are often non-compliant or malformed, especially those produced by ad hoc tooling and not a SIMH simulator.

The real problem with a new base "standard" for the tap format, derived from the two previous standards, is that you now just have three standards. We never learn.

On the other hand, IMO, the tap format works well enough in the emulation context, just not so well beyond that.

A better portable interchange standard is, again IMHO, the IBM AWSTAPE format. There are good 3rd party tools (at least for intact tapes) that support making AWSTAPE files from real tape, such as Hercules tapecopy.

The true "gold standard" for archiving and preservation of tape streams, including those with damage or unknown data, is Eric Smith's tapeutils. I think this should be the primary tool and format used for recovery, with the resulting files able to be transformed to the desired format. I don't actually know if such tools are widely available, if they are available at all.

This is all purely academic, of course, because the file we have here is a "tap", after all, but I thought this was all at least worth mentioning, as someone might stumble upon it in the future, and find the commentary useful.

@johnsonjh

johnsonjh commented Jul 13, 2024

@thaliaarchi

I also agree that "tap" is a terrible name and file extension, since "TAP" is another widely used format for (different) tapes, as you found out, and "TAP" is in even wider use as the "Test Anything Protocol". I also seem to recall there were some differences between "old" and "new" SIMH tap files, but don't quote me. I wish these files had been called ".sht" or ".simtape" by default.

There are many more tape file formats that exist too, such as the TPC tape format, the tape files used by the E11 PDP-11 emulator, and the "P7B" format used to archive 7-track tape data. Some utilities to manipulate these various formats can be found at simh/simtools. Other 3rd party tools exist at prirun/p50em/util, muntap, and elsewhere.

I wrote the originally mentioned tap2raw because every tool available had at least one shortcoming or bug, at least in the very specific context of handling DPS8M tap files, and we wanted our users to have a portable and supported utility to decapsulate them.

Welcome to the obscure world of legacy tape archiving.

@thaliaarchi
Author

> Welcome to the obscure world of legacy tape archiving.

Thanks for the introduction! I had some hints that there was a need for such a format when I was experimenting with writing an archival-quality tar reader and reading the V7 sources, seeing the constraints of tape drives make it into the format like the 512 byte block size, and also given the mass of historical software archived on tapes. Unfortunately or not, I haven't had the pleasure of using a physical tape drive.

This info is all wonderfully collected. You should organize it into a resource accessible for others to benefit from, perhaps as a documentation companion to your reader.

@johnsonjh

johnsonjh commented Jul 13, 2024

@thaliaarchi @larsbrinkhoff I should, but the result would likely end up offending everyone (and at least someone). This is a small world, so it might not be worth it.

There is also tension to say the least - really, a never-ending battle - between the often contrary goals of emulation and preservation, with hard battles often being fought where these two meet. There is a battle of personalities, because, like I said before, it's a small world. There is also a battle between simplicity and generality, where the more general something becomes (for either preservation or emulation), needed complexity starts to explode.

It was in all of these places, although in the context of disks and not tapes, that one such battle was fought. It was one of the big reasons Open SIMH forked from SIMH4, further splitting the SIMH community from two branches into three, which shows there can be very real consequences. (Many contributors still work on all three projects, though.)

@johnsonjh

This could potentially be a future topic for a post at https://dps8m.gitlab.io/blog/

@johnsonjh

johnsonjh commented Jul 13, 2024

@thaliaarchi

> Unfortunately or not, I haven't had the pleasure of using a physical tape drive.

Soon, nobody really will, because to my knowledge, no one is making new 7 or 9-track tape media.

While the production of new media is likely non-trivial, even if I'm wrong, it surely wouldn't be cost effective.

DECtape and QIC still see use, and modern formats like LTO live on.
