The LZW algorithm

أرسل من قبل Nadia في الثلاثاء, 2004/06/29 - 12:26pm.
عضو فعال
صورة Nadia

تاريخ التسجيل: 2004-04-29
مشاركات: 568

الجامعة: تشرين
الكلية: الهندسة المعلوماتية
المرحلة: السنة الخامسة
الاختصاص: هندسة برمجيات

The LZW (Lempel-Ziv Welch) algorithm is a very common compression technique.It is used in GIF files.

This algorithm is surprisingly simple, LZW compression replaces strings of characters with single codes. It does not do any analysis of the incoming text. Instead, it just adds every new string of characters it sees to a table of strings. Compression occurs when a single code is output instead of a string of characters.

One reason for the efficiency of the LZW algorithm is that it does not need to pass the string table to the decompression code. The table can be built exactly as it was during compression, using the input stream as data.

By the way the LZW algorithm protect patent .. that means if you want to sell a software using the algorithm LZW for data processing, must have the licence from Unisys Corporation and this licence is paid. Only end users and non-commercial organizations are released from payment.

To be continued...

 
دخول أو تسجيل لإرسال التعليقات | قراءة: 918

خيارات عرض التعليقات

اختر طريقتك المفضلة لعرض التعليقات و اضغط "حفظ الإعدادات" لتفعيل تغييراتك.
الثلاثاء, 2004/06/29 - 12:58pm
عضو فعال
صورة Firas

تاريخ التسجيل: 2004-02-27
مشاركات: 1475

الجامعة: دمشق
الكلية: الهندسة المعلوماتية
المرحلة: متخرج
الاختصاص: ذكاء صنعي

Adobe use this Algorithm in Acrobat reader ,PDF Files

Some PDF files use a compression algorithm called LZW to store text. This text cannot be extracted without using LZW decompression. Unisys Corp. claims a patent on LZW decompression and therefore controls the terms under which we can make PDF-LZW support available

Not all PDF files contain LZW-compressed text. Newer PDF files -- some PDF files created with Acrobat 3, and nearly all PDF files created with Acrobat 4 and 5 -- use "Flate" compression, which is not patented. Therefore, LZW support is only needed to index older PDF files.

i think it's a Dutch Algorithm Rolling Eyes

 
دخول أو تسجيل لإرسال التعليقات
الثلاثاء, 2004/06/29 - 5:05pm
مشرف
صورة Bilal

تاريخ التسجيل: 2004-02-19
مشاركات: 353

الجامعة: دمشق
الكلية: الهندسة المعلوماتية
المرحلة: السنة الخامسة
الاختصاص: ذكاء صنعي

Great!
Thank you.
I , or another, may speak a little about Huffman's Algorithm

 
دخول أو تسجيل لإرسال التعليقات
الخميس, 2004/07/01 - 5:35pm
عضو فعال
صورة strontium90

تاريخ التسجيل: 2004-04-21
مشاركات: 3100

الجامعة: حلب
الكلية: الهندسة المعلوماتية
المرحلة: ماجستير
الاختصاص: ذكاء صنعي


Interesting thread. Smile

Yestreday I read this bit in GNU gzip's (a compression/decompression utility) manual page:

Gzip reduces the size of the named files using Lempel-Ziv coding algorithm (LZ77) also in used in zip and PKZIP. The amount of compression obtained depends on the size of the input and the distribution of common substrings. Typically, text such as source code or English is reduced by 60−70%. Compression is generally much better than that achieved by LZW (as used in compress), Huffman coding (as used in pack), or adaptive Huffman coding (used in compact).

.....

Anybody knows what algorithm does WinZip rely on?

P.S.
Nadia let's hear the rest of the story. Wink

Read the rules
Use the search engine

Believe in healthy, hearty laughter, at the expense of the whole human race, if needs be.
H. Allen Smith

 
دخول أو تسجيل لإرسال التعليقات
الخميس, 2004/07/01 - 8:55pm
عضو فعال
صورة Nadia

تاريخ التسجيل: 2004-04-29
مشاركات: 568

الجامعة: تشرين
الكلية: الهندسة المعلوماتية
المرحلة: السنة الخامسة
الاختصاص: هندسة برمجيات


Thank you فراس Bilal and strontium90 for your comments. Smile

Let's get started.
Here are some definitions you may need to understand the algorithm.

Character: a fundamental data element. In normal text files, this is just a single byte. In raster images, which is what we're interested in, it's an index that specifies the color of a given pixel. We'll refer to an arbitray character as "K".
Charstream: a stream of characters, as in a data file.
String: a number of continuous characters, anywhere from one to very many characters in length. We can specify an arbitrary string as "[...]K".
Prefix: almost the same as a string, but with the implication that a prefix immediately precedes a character, and a prefix can have a length of zero. So, a prefix and a character make up a string. We will refer to an arbitrary prefix as "[...]".
Root: a single-character string. For most purposes, this is a character, but we may occasionally make a distinction. It is [...]K, where [...] is empty.
Code: a number, specified by a known number of bits, which maps to a string.
Codestream: the output stream of codes, as in the "raster data"
Entry: a code and its string.
String table: a list of entries; usually, but not necessarily, unique.

>>First we initialize our string table.
To do this, we need to choose a code size (how many bits) and know how many values our characters can possibly take.
Let's say our code size is 12 bits, meaning we can store 0->FFF, or 4096 entries in our string table.
Lets also say that we have 32 possible different characters. (This corresponds to, say, a picture in which there are 32 different colors possible for each pixel.)
To initialize the table, we set code#0 to character#0, code #1 to character#1, and so on, until code#31 to character#31.

>>Now we start compressing data.
Let's first define something called the "current prefix". It's just a prefix that we'll store things in and compare things to now and then. We will refer to it as "[.c.]". Initially, the current prefix has nothing in it.
Let's also define a "current string", which will be the current prefix plus the next character in the charstream. We will refer to the current string as "[.c.]K", where K is some character.
OK, look at the first character in the charstream. Call it P.
Make [.c.]P the current string. (At this point, of course, it's just the root P.)
Now search through the string table to see if [.c.]P appears in it. Of course, it does now, because our string table is initialized to have all roots. So we don't do anything.
Now make [.c.]P the current prefix.
Look at the next character in the charstream. Call it Q. Add it to the current prefix to form [.c.]Q, the current string.
Now search through the string table to see if [.c.]Q appears in it. In this case, of course, it doesn't.
Aha! Now we get to do something. Add [.c.]Q (which is PQ in this case) to the string table for code#32, and output the code for [.c.] to the codestream.
Now start over again with the current prefix being just the root Q.
Keep adding characters to [.c.] to form [.c.]K, until you can't find [.c.]K in the string table. Then output the code for [.c.] and add [.c.]K to the string table

The algorithm goes something like this:

[1] Initialize string table;
[2] [.c.] <- empty;
[3] K <- next character in charstream;
[4] Is [.c.]K in string table?
(yes: [.c.] <- [.c.]K;
go to [3];
)
(no: add [.c.]K to the string table;
output the code for [.c.] to the codestream;
[.c.] <- K;
go to [3];
)
It's as simple as that! Of course, when you get to step [3] and there aren't any more characters left, you just output the code for [.c.] and throw the table away. You're done. You don’t need the table anymore Cool

To be continued with an example

I know it's long but I couldn't make it any shorter
I am ready for questions Rolling Eyes

 
دخول أو تسجيل لإرسال التعليقات
الجمعة, 2004/07/02 - 4:38pm
عضو فعال
صورة strontium90

تاريخ التسجيل: 2004-04-21
مشاركات: 3100

الجامعة: حلب
الكلية: الهندسة المعلوماتية
المرحلة: ماجستير
الاختصاص: ذكاء صنعي


Thanks, Nadia, for taking the time and writing the above algorithm.

Allow me two inquiries:
- No coding scheme was explicitly stated. Is it some hashing technique ? Or, as simple as the integral equivalent of the current character in the machine's character set?
- Do you intend to present an examlpe in source form, or just outline one ? (Asking this because I'm itching to code something as I haven't done so for some time now..)

Read the rules
Use the search engine

Believe in healthy, hearty laughter, at the expense of the whole human race, if needs be.
H. Allen Smith

 
دخول أو تسجيل لإرسال التعليقات
السبت, 2004/07/03 - 1:06pm
عضو فعال

تاريخ التسجيل: 2004-06-13
مشاركات: 86

nadia ,thanks alot for yuor efforts & explaning
but would you please add an example
Embarassed if you don't mind of course

 
دخول أو تسجيل لإرسال التعليقات
السبت, 2004/07/03 - 2:43pm
عضو فعال
صورة Kinan

تاريخ التسجيل: 2004-04-23
مشاركات: 667

الجامعة: تشرين
الكلية: الهندسة المعلوماتية
المرحلة: متخرج
الاختصاص: هندسة برمجيات

LZW is use by Winzip ....and Acrobat reader...
if u have the 6th version of the program u can notice taht in the start up logo..
for more information us this link..

http://www.dogma.net/markn/articles/lzw/lzw.htm

 
دخول أو تسجيل لإرسال التعليقات
السبت, 2004/07/03 - 3:28pm
عضو فعال
صورة Nadia

تاريخ التسجيل: 2004-04-29
مشاركات: 568

الجامعة: تشرين
الكلية: الهندسة المعلوماتية
المرحلة: السنة الخامسة
الاختصاص: هندسة برمجيات


Thanks smile and strontium90
Kinan this is a useful link thanks.. Smile
strontium90 I think this example will answer .

Let’s have an alphabet {A,B,C,D}
And a charstream ABACABA ..
Now we want to compress it.
First we initialize our string table to: #0=A, #1=B, #2=C, #3=D then we look at the first character A is it in the string table?Yes it is,so [.c.] becomes A
Next we get AB which is not in the table ,so we output code #0 (for [.c.]), and add AB to the string table as code #4. [.c.] becomes B .
Next we get [.c.]A = BA, which is not in the string table, so output code #1, and add BA to the string table as code #5. [.c.] becomes A.
Next we get AC, which is not in the string table. Output code #0, and add AC to the string table as code #6. Now [.c.] becomes C.
we get [.c.]A = CA, which is not in the table. Output #2 for C, and add CA to table as code#7. Now [.c.] becomes A.
we get AB, which is in the string table, so [.c.] gets AB, and we look at ABA, which is not in the string table, so output the code for AB, which is #4, and add ABA to the string table as code #8. [.c.] becomes A.
We can't get any more characters, so we just output #0 for the code for A, and we're done. So, the codestream is #0#1#0#2#4#0.

If the algorithm in Kinan's link was not clear I will try to answer Mr. Green

 
دخول أو تسجيل لإرسال التعليقات
الأحد, 2004/07/04 - 11:11am
عضو فعال
صورة rima

تاريخ التسجيل: 2004-06-18
مشاركات: 185

غريبة ........ فهمتا .........
شكراً كتير ناديا ........ كانت ممتعة .......

 
دخول أو تسجيل لإرسال التعليقات
الأحد, 2004/07/04 - 12:27pm
عضو فعال
صورة Nadia

تاريخ التسجيل: 2004-04-29
مشاركات: 568

الجامعة: تشرين
الكلية: الهندسة المعلوماتية
المرحلة: السنة الخامسة
الاختصاص: هندسة برمجيات

كتب rima:
غريبة ........ فهمتا .........
شكراً كتير ناديا ........ كانت ممتعة .......
ليش غريبة ما شاء الله عليك Smile
ولو أنا جاهزة لأي سؤال.
لسا في خوارزمية فك الضغط و هي موجودة باللينك تبع كنان.

 
دخول أو تسجيل لإرسال التعليقات
السبت, 2004/07/24 - 1:59pm
عضو فعال
صورة Kinan

تاريخ التسجيل: 2004-04-23
مشاركات: 667

الجامعة: تشرين
الكلية: الهندسة المعلوماتية
المرحلة: متخرج
الاختصاص: هندسة برمجيات

http://www.cs.sfu.ca/cs/CC/365/li/squeeze/LZW.html

و هادا لينك تاني فيه applete توضيحي

 
دخول أو تسجيل لإرسال التعليقات
الأربعاء, 2007/09/05 - 12:05am

تاريخ التسجيل: 2007-08-06
مشاركات: 3

الجامعة: غير ذلك
الكلية: غير ذلك
المرحلة: السنة الرابعة
الاختصاص: غير ذلك

الى الاخت ناديا انا بحاجة الى شرح كامل ودقيق عن ال huffman code وطريقة عملة لضغط الملفات اذا سمحتي

 
دخول أو تسجيل لإرسال التعليقات
الجمعة, 2007/09/07 - 10:44pm

تاريخ التسجيل: 2007-08-06
مشاركات: 3

الجامعة: غير ذلك
الكلية: غير ذلك
المرحلة: السنة الرابعة
الاختصاص: غير ذلك

i need to know more about huffman code and how i can programming it in c sharp and how to deals with pointers in same c sharp

 
دخول أو تسجيل لإرسال التعليقات
الجمعة, 2008/04/04 - 6:07pm

تاريخ التسجيل: 2008-04-04
مشاركات: 2

الجامعة: غير ذلك
الكلية: الهندسة المعلوماتية
المرحلة: السنة الأولى
الاختصاص: غير ذلك

Hi everyone, I just need a help in this demonstration, I can't understand just how it become like this: (000, 1)(000, 0)(001, 0)(001, 1)(010, 1)(011, 1)(101, 0)(110, 1) pls anyone can help?

This is the explanation: Example 8 Derive the code for the string: 101011011010101011 Begin by parsing the string into comma-separated phrases that represent strings that can be represented by a previous string as a prefix plus a bit. The first bit has no predecessor, so it is assigned a null prefix string and the extra one bit is itself: 1, 01011011010101011 The next bit is a 0 and we have: 1, 0, 1011011010101011 The dictionary contains the strings ”1” and ”0”. The next bit is a ”1” and we already encounter it. So, we proceed further and read ”10” which is a combination of the prefix ”1” and a ”0”. Hence our parsing becomes: 1, 0, 10, 11011010101011 Finally we end up with: 1, 0, 10, 11, 01, 101, 010, 1011 With the eight phrases we can use a three-bit code to label the addresses of the phrases in the dictionary Next we write the parsed string in terms of the location of the prefix phrase plus the added bit required to generate a new phrase. (000, 1)(000, 0)(001, 0)(001, 1)(010, 1)(011, 1)(101, 0)(110, 1) The coded string is without the comma: 00010000001000110101011110101101

 
دخول أو تسجيل لإرسال التعليقات
السبت, 2008/04/05 - 5:37am

تاريخ التسجيل: 2008-04-04
مشاركات: 2

الجامعة: غير ذلك
الكلية: الهندسة المعلوماتية
المرحلة: السنة الأولى
الاختصاص: غير ذلك

Hi again, I just want to explain that I understand everything except the last step which is (000, 1)(000, 0)(001, 0)(001, 1)(010, 1)(011, 1)(101, 0)(110, 1) how he mead like this???

 
دخول أو تسجيل لإرسال التعليقات