Fast YM file format

The FYM tools are now available on GitHub (source code + binaries). v05 released on 17/05/2020!

When I started to work on Latecomer, the first problem I tried to solve was getting some music.

Fortunately, Grouik/FT did some amazing work back in 2015 by taking Atari ST songs and replaying them on the Mockingboard. That’s actually what convinced me to try making demos on the Apple II: I discovered that we could play real tunes.

The Mockingboard is a sound card released in 1981 for all Apple IIs (except the IIc, of course). It consists mostly of two 6522 VIAs connected to two AY-3-8910 chips (for the left/right channels) running at the system frequency (~1 MHz). The VIAs are very important too, because each one exposes two timers that can trigger the 6502 interrupt…
A timer is very convenient for playing music, and the Apple II doesn’t have one.

I will talk about those VIAs later, because they are the reason a Mockingboard is mandatory for « modern » Apple II effects.

The main issue when it comes to playing music on the Apple II is… well, having a player. And there isn’t much choice, because all computers that had AY or YM sound chips were based on the Z80 or the 68000 – except for the Oric.
On the other hand, the other widely available 6502 computer, the C64, has its own SID chip.

That leaves us with almost nothing, so people recently started to come up with their own solutions.

PT3 (Vortex) player

In 2019, deater released a PT3 (ProTracker) player for the Apple II… and French Touch also released a demo with another PT3 player.

It’s a very common format in the ZX Spectrum demoscene and it has many advantages.

First, it has a compatible Windows tracker, Vortex Tracker, which can handle dual-AY setups with 6 channels, since such a card exists for the ZX Spectrum.

Vortex Tracker in a 3-channel setup

Next, you can set the chip frequency in the tracker. This means you can compose for the 1 MHz Mockingboard, while the ZX Spectrum’s AY runs at about 1.77 MHz (all the sound frequencies are derived from the chip’s clock).
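To see why the clock matters, here is a small Python sketch based on the AY-3-8910 tone relation f = clock / (16 × period). The clock values (1,023,000 Hz for the Apple II/Mockingboard and 1,773,400 Hz for the Spectrum 128’s AY) are approximations I’m assuming for illustration. `tone_period` and `tone_frequency` are hypothetical helpers and are not part of any tracker.

```python
# Assumed clocks, in Hz (approximate values, for illustration only).
MOCKINGBOARD_CLOCK_HZ = 1_023_000   # Apple II system clock (~1 MHz)
SPECTRUM_AY_CLOCK_HZ = 1_773_400    # ZX Spectrum 128 AY clock (~1.77 MHz)


def tone_period(note_hz: float, clock_hz: int) -> int:
    """Hypothetical helper: 12-bit AY tone period for a note (f = clock / (16 * period))."""
    period = round(clock_hz / (16 * note_hz))
    if not 1 <= period <= 0xFFF:
        raise ValueError("note is outside the AY's 12-bit tone range")
    return period


def tone_frequency(period: int, clock_hz: int) -> float:
    """Frequency actually produced by a given tone period."""
    return clock_hz / (16 * period)


# The same A4 (440 Hz) needs a different period on each machine.
mb_period = tone_period(440.0, MOCKINGBOARD_CLOCK_HZ)
zx_period = tone_period(440.0, SPECTRUM_AY_CLOCK_HZ)
# Replaying the Spectrum's period on the Mockingboard gives a much lower note.
wrong_hz = tone_frequency(zx_period, MOCKINGBOARD_CLOCK_HZ)
print(mb_period, zx_period, round(wrong_hz, 1))
```

A tune composed for the wrong clock therefore plays out of tune. Setting the clock in the tracker avoids this, because the periods it writes already match the Mockingboard.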

And of course, it’s a very compact format, so it fits into the Apple II memory nicely.

The drawback is that the PT3 format is poorly documented. French Touch’s 6502 player is actually a conversion of the Z80 player while deater’s was written from scratch.

The player also takes a lot of CPU time, and its execution time can vary greatly from one frame to another. In other words, if you want a really fast player with a nearly fixed execution time, it won’t fit.

ARKOS Tracker 2 / CHPNSFX

These are French Touch’s tests of porting players for the ARKOS Tracker 2 and CHPNSFX tracker formats.

CHPNSFX is a very lightweight format, so it can be useful in some cases. However, you need to find a composer who knows how to use the tracker.

ARKOS Tracker can export soundtracks as RAW data, and that’s what Grouik replayed, using a lot of tricks. Of course, the 6502 is used at 100% when playing the tune 😀

Don’t try this at home

YM Format

The YM format was the first one to be played on the Mockingboard. It was created by Leonard/Oxygene for the first ever Windows AY emulator (ST-Sound), and it is as simple as it is well documented.

Basically, it’s a dump of all AY registers for every video frame, which is then packed with LZH. There are a few more things to know about special effects and about how the registers are laid out in the file, but that’s how it works.
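Much of "how the registers are laid out" comes down to interleaving. As far as I know, a YM body can store the dump register by register (all frames of R0, then all frames of R1, and so on) instead of frame by frame. The sketch below converts between the two orders. It assumes an already LZH-decompressed body and a frame count supplied by the caller. It keeps only the 14 AY registers, since later YM versions store extra bytes per frame for effects. `deinterleave` and `interleave` are hypothetical names.

```python
AY_REGISTERS = 14  # R0..R13; any extra per-frame bytes (effects) are ignored


def deinterleave(dump: bytes, num_frames: int,
                 regs_per_frame: int = AY_REGISTERS) -> list[bytes]:
    """Convert a register-major ("interleaved") dump into per-frame register sets.

    Interleaved layout: all frames of register 0, then all frames of
    register 1, and so on. Only the 14 AY registers are returned.
    """
    if len(dump) < regs_per_frame * num_frames:
        raise ValueError("dump is shorter than regs_per_frame * num_frames")
    return [
        bytes(dump[reg * num_frames + frame] for reg in range(AY_REGISTERS))
        for frame in range(num_frames)
    ]


def interleave(frames: list[bytes]) -> bytes:
    """Inverse operation: per-frame register sets back to register-major order."""
    return bytes(frame[reg] for reg in range(AY_REGISTERS) for frame in frames)
```

The register-major order is also a convenient starting point for a converter that works on each register’s stream separately.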

Of course, it makes a very, very good candidate for an Apple II player, since we can dump a YM file from almost any other AY format.

(you can use AY_Emul to convert any supported format to YM)

The main issue with the YM format is the LZH compression, which is not suited to old platforms. You can either unpack it before playing, or unpack it on the fly (which is impossible for a 6502 at 1 MHz).

For a 3-minute song, the unpacked data takes around 126 kB of RAM… so, well, we can’t use it directly.
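The 126 kB figure is simply frames × registers. The sketch below assumes one register dump per 50 Hz (PAL) frame and 14 bytes per frame. Even a short tune is larger than the whole 64 kB address space of the 6502.

```python
AY_REGISTERS = 14        # one byte per register per frame
PAL_FRAME_RATE_HZ = 50   # assumed replay rate (one dump per 50 Hz frame)


def unpacked_ym_size(seconds: float,
                     frame_rate_hz: int = PAL_FRAME_RATE_HZ,
                     bytes_per_frame: int = AY_REGISTERS) -> int:
    """Bytes needed to hold the raw register dump of a tune in RAM."""
    frames = round(seconds * frame_rate_hz)
    return frames * bytes_per_frame


three_minutes = unpacked_ym_size(3 * 60)   # 9,000 frames * 14 bytes
print(three_minutes)
```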

MYM Format

The MYM format is basically the YM format with a different compression scheme. It was developed by Marq for the MSX and ported to the 6502 by DBUG/Defence Force for the Oric. It’s tailored for low-end platforms.

It packs registers together by discarding bits that are unused by the AY (read the description here).
The player then unpacks data on-the-fly and rebuilds the register data from the unpacked stream.
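Here is what "discarding unused bits" can look like. The sketch packs each frame’s 14 registers into an MSB-first bitstream and keeps only the bits an AY register actually uses. This is not MYM’s real bitstream, which is described in its own documentation. The per-register widths in `REG_BITS` are my reading of the AY-3-8910 register map and should be treated as an assumption.

```python
# Assumed meaningful bits per AY register R0..R13:
# tone periods 8+4 bits, noise 5, mixer 8, volumes 5, envelope period 8+8, shape 4.
REG_BITS = (8, 4, 8, 4, 8, 4, 5, 8, 5, 5, 5, 8, 8, 4)


def pack_frames(frames: list[bytes]) -> bytes:
    """Concatenate the useful bits of every register of every frame, MSB first."""
    acc, nbits = 0, 0          # bit accumulator and number of pending bits
    out = bytearray()
    for frame in frames:
        for value, width in zip(frame, REG_BITS):
            acc = (acc << width) | (value & ((1 << width) - 1))
            nbits += width
            while nbits >= 8:  # flush whole bytes
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    if nbits:                  # pad the last partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_frames(data: bytes, num_frames: int) -> list[bytes]:
    """Inverse of pack_frames: rebuild 14 register values per frame."""
    acc, nbits = 0, 0
    stream = iter(data)
    frames = []
    for _ in range(num_frames):
        regs = bytearray()
        for width in REG_BITS:
            while nbits < width:   # pull more bytes until the field is available
                acc = (acc << 8) | next(stream)
                nbits += 8
            nbits -= width
            regs.append((acc >> nbits) & ((1 << width) - 1))
            acc &= (1 << nbits) - 1
        frames.append(bytes(regs))
    return frames
```

With these widths, a frame takes 84 bits instead of 112 (25% less) before any other compression. The price is the bit shifting on the player side.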

I don’t really know how fast the player is. The source code of the converter is available here.

Grouik used the MYM format for his 6-channel player in « Pure Noise » (2016).

6-channel MYM replay

FYM format

The YM and MYM formats seemed good, but I was bothered by the fact that they take a « dumb » approach: they pack the data without making any sense of it, and without looking for redundancy first.

And we’re talking about chiptune music, which is generated by a tracker that works on a sequence/pattern basis. So OF COURSE there should be some form of redundancy in the final register dump.

After many tests and iterations, I ended up with the FYM file format. It is a very light format that can easily be unpacked on the fly on a 6502 and uses as little memory as possible. It’s also byte-based, whereas MYM needs bit-level operations.

YM>FYM Converter

The converter takes a YM file as input (YM3 to YM6). Special effects (digi-drums, SID…) are not handled.

Then it looks for redundancy, based on a given pattern length.

Let’s say we start with a length of 16. The algorithm is:

  • Take the dump of each of the 14 registers individually, for the whole tune
  • Break each dump into patterns of 16 bytes (16 frames)
  • Look for redundancy among all 16-byte patterns, regardless of register
  • Pack each unique pattern using PackBits (byte-based RLE)
  • Reorganize the packed patterns with a bin packing algorithm so that none of them crosses a 256-byte boundary (better for the 6502 player)
  • Replace the original dump with 16-bit pointers to the packed patterns
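The steps above can be sketched in Python as follows. The sketch assumes the input is already split into 14 register-major dumps of equal length, and the names are hypothetical. The last incomplete pattern is padded by repeating its final value, which is my assumption. The bin packing step is replaced by a simpler next-fit layout that skips to the next 256-byte page whenever a pattern would cross one.

```python
HEADER_SIZE = 4   # assumed: reserved byte, pattern length, 16-bit partition offset
PAGE = 256


def packbits(data: bytes) -> bytes:
    """Classic PackBits: 0..127 = copy n+1 literal bytes, 129..255 = repeat next byte 257-n times."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:                      # repeat block
            out += bytes((257 - run, data[i]))
            i += run
            continue
        start = i                         # literal block: stop before the next run
        i += 1
        while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
            i += 1
        out.append(i - start - 1)
        out += data[start:i]
    return bytes(out)


def convert(registers: list[bytes], length: int):
    """Deduplicate fixed-length patterns across all registers and lay them out in pages."""
    unique: dict[bytes, int] = {}         # raw pattern -> pattern id (insertion order)
    partition: list[list[int]] = []       # one row of 14 pattern ids per `length` frames
    for start in range(0, len(registers[0]), length):
        row = []
        for dump in registers:
            chunk = dump[start:start + length]
            chunk += chunk[-1:] * (length - len(chunk))   # pad the last pattern
            row.append(unique.setdefault(chunk, len(unique)))
        partition.append(row)

    packed = [packbits(p) for p in unique]
    offsets, pos = [], HEADER_SIZE
    for blob in packed:                   # next-fit: never cross a page boundary
        if pos // PAGE != (pos + len(blob) - 1) // PAGE:
            pos = (pos // PAGE + 1) * PAGE
        offsets.append(pos)
        pos += len(blob)

    pointers = [[offsets[pid] for pid in row] for row in partition]
    return list(unique), packed, pointers, offsets
```

Next-fit wastes more space than a real bin packing pass. It does keep the main property the player relies on: a packed pattern can be walked with an 8-bit index without crossing pages.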

In the end, for 14 registers × 16 bytes (224 bytes), we only store 14 × 2 bytes (28 bytes) of pointers, so the original dump is reduced by a factor of 8!

But of course, you also need to add the packed pattern data. And the shorter the pattern length, the more pointers you need to add to the file. On the other hand, a longer length results in less deduplication.

To find the right balance, I simply use brute force and try every length between 16 and 128. As it turned out, every resulting « best length » was a power of 2 and fit perfectly with the rhythm of the tune. COULD YOU BELIEVE IT.
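The brute-force search can be sketched like this. To keep it short, this hypothetical cost function measures only deduplication: unique raw pattern bytes, plus a 2-byte pointer per register and per row. It ignores PackBits and page alignment, so it approximates the real file size rather than reproducing the converter’s actual metric.

```python
def fym_size_estimate(registers: list[bytes], length: int) -> int:
    """Rough FYM size for one pattern length: unique raw patterns + pointer table.
    PackBits and page alignment are ignored (simplifying assumption)."""
    unique: set[bytes] = set()
    rows = 0
    for start in range(0, len(registers[0]), length):
        rows += 1
        for dump in registers:
            chunk = dump[start:start + length]
            unique.add(chunk + chunk[-1:] * (length - len(chunk)))  # pad last pattern
    pointer_bytes = rows * len(registers) * 2     # one 16-bit pointer per register per row
    return len(unique) * length + pointer_bytes


def best_length(registers: list[bytes], lengths=range(16, 129)) -> int:
    """Try every candidate length and keep the one giving the smallest estimate."""
    return min(lengths, key=lambda n: fym_size_estimate(registers, n))


# A synthetic "tune": each register repeats its own 32-frame motif 16 times.
tune = [bytes(r * 7 + i * 3 for i in range(32)) * 16 for r in range(14)]
print(best_length(tune), fym_size_estimate(tune, 32))
```

On this synthetic input, the winner is the motif length itself, which mirrors the observation that the best length tends to match the rhythm of the tune.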

Results for Rollout2.ym:

  • YM (LZH-packed) – 3,305 bytes
  • Unpacked – 170,488 bytes
  • MYM – 11,740 bytes
  • FYM – 13,568 bytes

The final file can also be packed using a 6502-friendly compression scheme.
For rollout2.ym (the example above), the final results with LZ4 are:

  • MYM+LZ4 – 8,722 bytes
  • FYM+LZ4 – 5,481 bytes

FYM Player

The first thing the player has to do is to relocate the pointers (the « partition ») because the pointers are actually offsets from the start of the file.
On the Apple II, only the high byte of each pointer needs to change, because the file is always loaded on a 0x0100 (page) boundary.
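In Python terms, relocation amounts to adding the load page to the high byte of each pointer. This sketch assumes the pointers are plain file offsets and that the load address is page-aligned. `relocate` is a hypothetical name.

```python
def relocate(offsets: list[int], load_address: int) -> list[int]:
    """Turn file offsets into absolute addresses by patching only the high byte."""
    if load_address & 0xFF:
        raise ValueError("the file must be loaded on a 256-byte page boundary")
    page = load_address >> 8
    relocated = []
    for off in offsets:
        high, low = off >> 8, off & 0xFF
        # Adding a multiple of 256 leaves the low byte unchanged.
        relocated.append((((high + page) & 0xFF) << 8) | low)
    return relocated


print([hex(a) for a in relocate([0x0004, 0x01A0, 0x0B10], 0x6000)])
```

On the 6502, this presumably becomes a short loop that adds the page number to each high byte of the partition, with no carry handling needed.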

The pattern length is given at the start of the file, and it never changes. This lets the player stop unpacking patterns and fetch the next batch of pointers based on that length, instead of looking for a 0x80 end marker in the PackBits stream (more bytes saved!).
Then it has to maintain the state for unpacking 14 patterns in parallel, which is not a lot of work and is very 6502-friendly.
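Here is a model of the per-register state, written in Python. Each stream yields one byte per frame. A frame counter, not an end marker, decides when to fetch the next row of pointers. `PackBitsStream` and `play_row` are hypothetical names. The sketch assumes the converter never emits the 0x80 control byte.

```python
class PackBitsStream:
    """Decodes one PackBits-packed pattern, one byte per frame.

    Models the few bytes of state a 6502 player would keep per register."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0          # read position in the packed data
        self.count = 0        # bytes left in the current run or literal block
        self.repeat = False   # True: repeat self.value, False: copy literals
        self.value = 0

    def next_byte(self) -> int:
        if self.count == 0:                  # read a new control byte
            h = self.data[self.pos]
            self.pos += 1
            if h < 128:                      # literal block of h + 1 bytes
                self.count, self.repeat = h + 1, False
            else:                            # run of 257 - h bytes (0x80 assumed unused)
                self.count, self.repeat = 257 - h, True
                self.value = self.data[self.pos]
                self.pos += 1
        if not self.repeat:
            self.value = self.data[self.pos]
            self.pos += 1
        self.count -= 1
        return self.value


def play_row(patterns: list[bytes], length: int) -> list[list[int]]:
    """Produce `length` frames of register values from one row of packed patterns."""
    streams = [PackBitsStream(p) for p in patterns]
    return [[s.next_byte() for s in streams] for _ in range(length)]
```

Because the length is fixed, no stream ever has to check for the end of its data. After `length` frames, all 14 streams are simply re-created from the next row of pointers.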

The only drawback is that when the player reaches the end of the current patterns, it must fetch 14 pointers and reset 14 PackBits decoder states, which is the worst case you can get. This could be avoided by preparing that data progressively beforehand.

The white bar is the CPU time taken by the final player (average case, around 20 lines, 7% on a PAL config).
Yes, you can see that on an Apple II. Remember, it’s a 1 MHz 6502.

The player was based on French Touch’s first YM player, with additional data displayed. The reverb comes from the fact that it’s not possible to program the two AYs at exactly the same time.

The source code for the generator (C#) and the 6502 player (+binaries) is available on GitHub.

  • Registers (line 1) = register number (00, 11, 22… DD)
  • Registers (line 2) = register value
  • Patterns ptr (line 1) = High byte of current pattern pointer for this register
  • Patterns ptr (line 2) = Low byte of current pattern pointer for this register
  • Partition ptr = Pointer to the current set of 14 pattern pointers, followed by the remaining length to decode
  • CPU = cycles used by the stereo player (complete frame: 20280 cycles in PAL, 17030 cycles in NTSC)

You can see that the « sequence » pointers progress only when the registers change, which is of course a consequence of the Packbits/RLE scheme.

FYM File Format

Given the explanations above, you can easily make your own file format, but here’s the one I used for v05:

  • 1 byte = 0x00 (reserved)
  • 1 byte = Pattern length
  • 2 bytes = Offset from the start of the file to the start of the partition
  • n bytes = Packed register patterns
  • n × [ 14 bytes (high bytes) + 14 bytes (low bytes) ] = Partition: for each row, the offset of the packed pattern for each register
  • 2 bytes = 0x0000, end of partition
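Here is a minimal reader (and writer) for this layout, useful for inspecting or editing files. Two points are assumptions, because the list above doesn’t spell them out. First, the 16-bit partition offset is read as little-endian (the 6502’s native order). Second, the partition is taken to run up to the 0x0000 terminator at the very end of the file. A row whose first two high bytes happen to be 0x00 could look like the terminator, so the sketch relies on the file length instead of scanning for it. `parse_fym` and `build_fym` are hypothetical names.

```python
import struct
from dataclasses import dataclass

NUM_REGS = 14
ROW_SIZE = 2 * NUM_REGS   # 14 high bytes followed by 14 low bytes


@dataclass
class FymFile:
    pattern_length: int
    rows: list[list[int]]   # per row: offset of each register's packed pattern


def parse_fym(data: bytes) -> FymFile:
    reserved, pattern_length, partition_start = struct.unpack_from("<BBH", data, 0)
    if reserved != 0:
        raise ValueError("unexpected reserved byte")
    body = data[partition_start:]
    # Assumption: partition rows fill the file up to the final 2-byte 0x0000.
    if len(body) < 2 or body[-2:] != b"\x00\x00" or (len(body) - 2) % ROW_SIZE:
        raise ValueError("malformed partition")
    rows = []
    for start in range(0, len(body) - 2, ROW_SIZE):
        hi = body[start:start + NUM_REGS]
        lo = body[start + NUM_REGS:start + ROW_SIZE]
        rows.append([(h << 8) | l for h, l in zip(hi, lo)])
    return FymFile(pattern_length, rows)


def build_fym(pattern_length: int, patterns: bytes, rows: list[list[int]]) -> bytes:
    partition_start = 4 + len(patterns)
    out = bytearray(struct.pack("<BBH", 0, pattern_length, partition_start))
    out += patterns
    for row in rows:
        out += bytes(off >> 8 for off in row) + bytes(off & 0xFF for off in row)
    out += b"\x00\x00"
    return bytes(out)
```

Because the reader jumps straight to the partition offset, any filler between the pattern data and the partition is simply skipped.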

This format can still be improved: in some tunes, some registers are not used, so it’s pointless to save them (removing one register reduces the partition size by about 7%, i.e. 1/14).

One of the advantages of this file format is that you can easily rework any tune, since the pattern length often matches the logical length of the original tracker patterns.

That’s what I did in Latecomer: I cut out a part of the song (with Big Alec’s permission) to make the demo shorter. And it was originally a YM file! It’s only a matter of deleting some pointers in the regenerated « partition ». I actually put an array in the generator containing the final partition I wanted. As I was out of time, though, I shortened the partition without regenerating the packed patterns, so the filler between the partition and the data is huge and there are surely unused patterns left. But the final file size is 12 kB anyway!

PYM format

A late addition to this article, because I find it interesting.

Grouik/FT worked on a very similar file format, except that the player is even faster than the FYM one.

It uses basically the same deduplication scheme as FYM for the YM dump, except that he uses a simplified version of PackBits/RLE that only handles runs of repeated bytes. So a pattern with no redundancy, like 01 02 03 04 05 06 07…, always packs as 01 01 01 02 01 03 01 04 01 05 01 06 01 07 (n runs of 1 byte) and ends up twice the size of the original data. On the other hand, the player code is far simpler, because there are only two cases to handle (duplicate the byte, or load a control byte as the new counter), whereas PackBits has three, with a more complex way of handling the control byte. As a result, the PYM player is also more compact, and it’s far easier to unroll its code.
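That simplified RLE fits in a few lines. In this sketch, the byte order (count first, then value) is chosen to match the example above, and the maximum count of 255 is my assumption. PYM’s actual encoding details may differ.

```python
def pym_rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) pairs; count is 1..255 (assumed limit)."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def pym_rle_decode(packed: bytes) -> bytes:
    """Two cases only: read a counter, then repeat the value that many times."""
    out = bytearray()
    for count, value in zip(packed[0::2], packed[1::2]):
        out += bytes([value]) * count
    return bytes(out)


print(pym_rle_encode(bytes(range(1, 8))).hex(" "))
```

Every pair has the same shape, so the decoder never branches on the kind of control byte. That is what makes unrolling easy.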

Grouik also used a more 6502-friendly file layout, with pointers split into high and low bytes and a maximum of 256 lines in the main tune map, so that the 6502’s indexed addressing mode can be used. In the end, it allows for around 65k frames (enough for about 21 minutes of music), so I guess the FYM file format could take advantage of the same pointer list layout.
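The split tables and the 21-minute figure can be checked with a short sketch. It assumes a 50 Hz replay rate and assumes that each of the 256 lines can cover up to 256 frames, which gives 65,536 frames. `split_pointer_table` and `max_minutes` are hypothetical names.

```python
FRAME_RATE_HZ = 50  # assumed PAL replay rate


def split_pointer_table(pointers: list[int]) -> tuple[bytes, bytes]:
    """Split 16-bit pointers into separate low/high byte tables, so a 6502
    player can read entry i with LDA lo,X / LDA hi,X (X = line index)."""
    if len(pointers) > 256:
        raise ValueError("an 8-bit index register can address at most 256 lines")
    lo = bytes(p & 0xFF for p in pointers)
    hi = bytes(p >> 8 for p in pointers)
    return lo, hi


def max_minutes(lines: int = 256, frames_per_line: int = 256) -> float:
    """Longest tune the map can describe, in minutes."""
    return lines * frames_per_line / FRAME_RATE_HZ / 60


print(round(max_minutes(), 1))
```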

In short, PYM is simpler and better optimized for the 6502, and its player is faster than the FYM one. The drawback is that the resulting files are around 20% larger than FYM files.
