If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Lossless file compression

Problem

Byte pair encoding is a compression algorithm that replaces the repeated pairs of characters in a string with a character that isn't in the data and creates a table of replacement mappings.
Here's the output from a byte pair encoding:
Ze mXe Zat yB rFd, Ze mXe Zings yB wiJ know

Ze mXe Zat yB lFrn, Ze mXe places yB’J go
Here's the replacements table:
originalreplacement
thZ
orX
ouB
eaF
llJ
Decode the compressed string to discover the original string, a quote from Dr. Seuss:
"
e m
e
at y
r
d,
e m
e
ings y
wi
know
e m
e
at y
l
rn,
e m
e places y
go"
Stuck?
Stuck?