How to read a snappy-compressed text file?


innowireless TaeYun Kim



This may be a newbie question: how do I read a snappy-compressed text file in Spark?


The OS is Windows 7.

So far, I have taken the following steps:


1. Built Hadoop 2.4.0 with the snappy option.

The ‘hadoop checknative’ command displays the following line:

snappy: true D:\hadoop-2.4.0\bin\snappy.dll

So, I assume hadoop can do snappy compression.

BTW, snappy.dll was copied from the snappy64.dll file in snappy-windows-


2. Added the following configurations to both core-site.xml and yarn-site.xml.






3. Added the following environment variable.


Since I use IntelliJ, the above line was added to the Environment variables section of the Run Configuration.


4. Compressed the input text file with snzip.exe which was included in snappy-windows-


5. Wrote the code.

sc.textFile(compressed_file_name);  // no other argument.


Now when I run my spark application, the results are as follows:


1. The string ‘snappy’ does not appear anywhere in the DEBUG log.

The most relevant logs are as follows:

14/06/12 18:57:55 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...

14/06/12 18:57:55 DEBUG NativeCodeLoader: Loaded the native-hadoop library

2. The application fails. The log is as follows:

14/06/12 18:57:57 WARN: int from string failed for: [(some binary characters)]


So apparently sc.textFile() does not recognize the file format and reads the file as-is, so the map function receives garbage.
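One thing that might help narrow this down is checking which snappy container layout the compressed file actually uses. As far as I understand, Hadoop chooses a codec from the file name extension (".snappy" for SnappyCodec) and expects its own block layout, while the Snappy framing format starts with a fixed stream identifier (0xff 0x06 0x00 0x00 followed by "sNaPpY"). The following is only a minimal sketch under those assumptions: the class SnappyFormatCheck and its methods are hypothetical helpers, not part of Spark, Hadoop, or snzip, and I have not confirmed which layout snzip.exe writes by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical diagnostic helper; not part of Spark, Hadoop, or snzip.
class SnappyFormatCheck {
    // Stream identifier chunk that opens a file in the Snappy framing format:
    // 0xff 0x06 0x00 0x00 followed by the ASCII bytes "sNaPpY".
    static final byte[] FRAMING_MAGIC = {
        (byte) 0xff, 0x06, 0x00, 0x00, 's', 'N', 'a', 'P', 'p', 'Y'
    };

    // Returns true if the given bytes start with the framing-format identifier.
    static boolean looksLikeFramingFormat(byte[] header) {
        if (header.length < FRAMING_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, FRAMING_MAGIC.length);
        return Arrays.equals(prefix, FRAMING_MAGIC);
    }

    // Reads at most the first FRAMING_MAGIC.length bytes of the file.
    static byte[] readHeader(String fileName) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(fileName))) {
            byte[] buf = new byte[FRAMING_MAGIC.length];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // file is shorter than the identifier
                }
                n += r;
            }
            return Arrays.copyOf(buf, n);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: SnappyFormatCheck <compressed file>");
            return;
        }
        byte[] header = readHeader(args[0]);
        System.out.println(looksLikeFramingFormat(header)
            ? "Framing-format stream identifier found"
            : "No framing-format identifier; the file uses some other layout");
    }
}
```

If the identifier is found, the file would not be in the block layout that Hadoop's SnappyCodec reads, which could explain the binary garbage even with the native library loaded. If it is not found, the file extension is the next thing I would check.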


How can I fix this?