I Replaced JSON With a Custom Binary Format in PHP
We needed a better way to store rich text for our apps and websites.
At first, we stored raw HTML in our database. Later, we switched to JSON to separate editing from printing. JSON worked for a while, but it created new problems as we grew.
JSON became too slow for our needs.
When we needed to search and update old links, we had to parse and rebuild entire arrays. This process was slow and inefficient. We needed a format that allowed for faster data manipulation and streaming.
I decided to build a custom binary format using PHP.
Many developers think PHP is not built for low-level byte management. However, PHP has built-in functions for this:
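- pack(): Converts numbers or strings into a string of raw bytes, following a format string that sets each value's size and byte order.
- unpack(): Reads those bytes back into numbers or strings, using the same format codes, and returns them as an array.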
I stopped using multibyte string operations. Instead, I focused on reading specific chunks of bytes. For example, an unsigned 64-bit integer requires exactly 8 bytes. Precision is key when working with binary data.
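As a minimal sketch of what byte-level reading looks like, the snippet below packs an unsigned 64-bit integer and a length-prefixed string, then reads both back by byte offset. The little-endian format codes ("P" and "V") and the helper names writeString() and readString() are assumptions chosen for illustration, not the layout used in the original project. The 64-bit codes require a 64-bit PHP build.

```php
<?php
// Hypothetical helpers for illustration; not the author's actual format.

// Write a string as a 4-byte little-endian length followed by its raw bytes.
function writeString(string $text): string
{
    // strlen() counts bytes, not characters, which is what a binary format needs.
    return pack('V', strlen($text)) . $text;
}

// Read a length-prefixed string at $offset and advance $offset past it.
function readString(string $buffer, int &$offset): string
{
    $length = unpack('Vlen', $buffer, $offset)['len'];
    $offset += 4; // skip the 4-byte length header
    $text = substr($buffer, $offset, $length);
    $offset += $length;
    return $text;
}

// An unsigned 64-bit integer always occupies exactly 8 bytes ("P" = little-endian).
// "h\xC3\xA9llo" is "héllo" in UTF-8: 5 characters, but 6 bytes.
$buffer = pack('P', 1234567890123) . writeString("h\xC3\xA9llo");

$offset = 0;
$id = unpack('Pid', $buffer, $offset)['id'];
$offset += 8; // skip the 8-byte integer
$text = readString($buffer, $offset);
```

Because every field has a known size or an explicit length prefix, the reader always knows how far to advance. No character decoding is involved at any point.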
I also changed how I structured the data.
Most people try to keep documents in a deep, nested tree structure. This is often a mistake. I switched to a flat list of elements like text, tags, and lists. I use a simple tree of offsets to rebuild the HTML. This makes tasks like finding all links or stripping HTML very fast.
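To illustrate why a flat list makes these tasks cheap, here is a rough sketch under an assumed record layout: one type byte, a 4-byte little-endian payload length, then the payload. The type constants and function names are hypothetical, and the offset tree used to rebuild HTML is left out. The post does not publish the real format, so this only shows the general idea.

```php
<?php
// Hypothetical record layout, for illustration only:
//   1 byte type | 4 bytes little-endian payload length | payload bytes
const TYPE_TEXT = 1;
const TYPE_LINK = 2; // payload is the link URL

function encodeRecord(int $type, string $payload): string
{
    return pack('CV', $type, strlen($payload)) . $payload;
}

// Walk the flat buffer once, yielding type => payload for each record.
function records(string $buffer): Generator
{
    $offset = 0;
    $end = strlen($buffer);
    while ($offset < $end) {
        $header = unpack('Ctype/Vlen', $buffer, $offset);
        $offset += 5; // 1 type byte + 4 length bytes
        yield $header['type'] => substr($buffer, $offset, $header['len']);
        $offset += $header['len'];
    }
}

// Finding all links is a single linear scan; no tree traversal is needed.
function findLinks(string $buffer): array
{
    $links = [];
    foreach (records($buffer) as $type => $payload) {
        if ($type === TYPE_LINK) {
            $links[] = $payload;
        }
    }
    return $links;
}

// Stripping markup means keeping only the text records.
function plainText(string $buffer): string
{
    $text = '';
    foreach (records($buffer) as $type => $payload) {
        if ($type === TYPE_TEXT) {
            $text .= $payload;
        }
    }
    return $text;
}

$buffer = encodeRecord(TYPE_TEXT, 'Hello ')
    . encodeRecord(TYPE_LINK, 'https://example.com/docs')
    . encodeRecord(TYPE_TEXT, 'world');

$links = findLinks($buffer);
$text = plainText($buffer);
```

In a nested tree, both operations would have to recurse through every level. Here, each one is a single pass over the records.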
The results from 10,000 loops show a clear winner:
- Old JSON encoding: 2.18s
- Old JSON decoding: 0.86s
- New binary encoding: 1.19s
- New binary decoding: 0.67s
The new format is faster in both directions: encoding is about 1.8 times faster and decoding about 1.3 times faster.
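The post does not say how these timings were taken. One plausible way to measure them is a small loop around a monotonic clock, sketched below. The benchmark() helper and the sample document are assumptions for illustration, and only the JSON side is shown because the binary encoder itself is not published.

```php
<?php
// Time a callable over a fixed number of iterations; returns elapsed seconds.
function benchmark(callable $fn, int $iterations = 10000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

// Hypothetical sample document; the original benchmark data is not shown.
$document = [
    'type' => 'paragraph',
    'children' => [
        ['type' => 'text', 'value' => str_repeat('lorem ', 200)],
    ],
];
$json = json_encode($document);

$encodeSeconds = benchmark(fn () => json_encode($document));
$decodeSeconds = benchmark(fn () => json_decode($json, true));

printf("JSON encode: %.4fs, decode: %.4fs\n", $encodeSeconds, $decodeSeconds);
```

Passing the binary encoder and decoder to the same helper would give directly comparable numbers, as long as both formats are run against the same documents.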
The cost is size: the binary format is larger than JSON or HTML, taking up about twice the space of JSON. Since we use server-side rendering, this storage increase does not affect our performance.
The trade-off is worth it. We can now change links and clean up HTML with simple, fast functions.
If you build a custom format, write a clear specification. You will need it when you return to the code later.
Source: https://dev.to/tomj/i-replaced-json-with-a-custom-binary-format-in-php-mok