Does the SequenceFile.Reader support LzoCodec? #12

Open
ekta1007 opened this issue May 27, 2015 · 1 comment

Comments

@ekta1007

I have a sequence file compressed with LzoCodec that I am unable to read with this module.

from hadoop.io import SequenceFile
fh='/home/ekta/my_file'
reader = SequenceFile.Reader(fh)

The first few lines of the file I am trying to read:

SEQ org.apache.hadoop.io.Text com.bloomreach.proto.PwfPixelLog #com.hadoop.compression.lzo.LzoCodec
[binary data follows]
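The readable part of that dump is the SequenceFile header, and it already names the codec the reader will try to load. Below is a minimal sketch of pulling those fields out of a header. It assumes header version 5 or later and the usual Hadoop layout (magic, version byte, key class, value class, two compression flags, then the codec class name). The function names `read_vint`, `read_text`, and `read_header` are made up for this example and are not part of python-hadoop. The `#` in front of the codec name is the length prefix: byte 0x23 is 35, which is the length of `com.hadoop.compression.lzo.LzoCodec`.

```python
import io
import struct


def read_vint(stream):
    """Decode one Hadoop variable-length integer (WritableUtils layout)."""
    first = struct.unpack("b", stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    # Number of payload bytes that follow the first byte.
    size = (-120 - first) if negative else (-112 - first)
    value = 0
    for byte in stream.read(size):
        value = (value << 8) | byte
    return ~value if negative else value


def read_text(stream):
    """Read a length-prefixed UTF-8 string (the Text.readString layout)."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_header(stream):
    """Return the leading header fields of a SequenceFile (sketch only)."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = stream.read(1)[0]
    if version < 5:
        raise ValueError("this sketch only handles header version 5 or later")
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"
    # The codec class name is only present when compression is enabled.
    codec = read_text(stream) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


def _text(value):
    """Encode a short string the way the header stores it (length < 128)."""
    raw = value.encode("utf-8")
    return bytes([len(raw)]) + raw


# Synthetic header modelled on the dump above; the version byte and the
# compression flags are assumptions, since they are not printable in the post.
sample = (
    b"SEQ\x06"
    + _text("org.apache.hadoop.io.Text")
    + _text("com.bloomreach.proto.PwfPixelLog")
    + b"\x01\x00"
    + _text("com.hadoop.compression.lzo.LzoCodec")
)

header = read_header(io.BytesIO(sample))
print(header["codec"])
```

On a real file, `read_header(open(path, "rb"))` should report the same codec string that later shows up in the ImportError, which is a quick way to check what a reader needs before opening the file.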

It seems to be searching for a decompressor but is unable to find one. If this is supported, what am I doing wrong?
Also, I installed hadoop-lzo from https://github.com/twitter/hadoop-lzo, though I see that the
com.hadoop.compression.lzo

Traceback (most recent call last):
  File "/home/ekta/CUSTOM_WORK/protobuf.py", line 3, in <module>
    reader = SequenceFile.Reader(fh)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/SequenceFile.py", line 288, in __init__
    self._initialize(path, start, length)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/SequenceFile.py", line 478, in _initialize
    self._codec = CodecPool().getDecompressor(codec_class)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/io/compress/CodecPool.py", line 34, in getDecompressor
    codec_class = ReflectionUtils.hadoopClassFromName(class_path)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/util/ReflectionUtils.py", line 24, in hadoopClassFromName
    return classFromName(class_path)
  File "/home/ekta/Downloads/Hadoop/python-hadoop/hadoop/util/ReflectionUtils.py", line 44, in classFromName
    module = __import__(module_name, globals(), locals(), [str(class_name)], -1)
ImportError: No module named com.hadoop.compression.lzo

In the hadoop-lzo package, I do see "com.hadoop.compression.lzo". Is the program unable to find this class in hadoop-lzo? In dist-packages, I have Hadoop-0.1.4-py2.7.egg, _lzo.so*, lzo.py, and python_lzo-1.0.egg-info.

I believe that com.hadoop.compression.lzo.LzoCodec.java might be needed to read the file shown above.

:~/Downloads/hadoop-lzo$ tree
[..more ]

| | |-- com
| | | | |-- hadoop
| | | | | |-- compression
| | | | | | `-- lzo
| | | | | | |-- CChecksum.java
| | | | | | |-- DChecksum.java
| | | | | | |-- DistributedLzoIndexer.java
| | | | | | |-- GPLNativeCodeLoader.java
| | | | | | |-- LzoCodec.java
| | | | | | |-- LzoCompressor.java
| | | | | | |-- LzoDecompressor.java
| | | | | | |-- LzoIndex.java
| | | | | | |-- LzoIndexer.java
| | | | | | |-- LzoInputFormatCommon.java
| | | | | | |-- LzopCodec.java
| | | | | | |-- LzopDecompressor.java
| | | | | | |-- LzopInputStream.java
| | | | | | |-- LzopOutputStream.java
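The last frame of the traceback is a plain Python `__import__` call, which suggests the reader takes the codec name from the file header and tries to import it as a Python module. Here is a minimal sketch of that kind of lookup, to show why a Java class name fails there. The function `class_from_name` is a stand-in written for this example and is an assumption about the mechanism, not the library's actual code. The `.java` files listed above are Java sources, so a Python import cannot load them no matter where hadoop-lzo is installed.

```python
import importlib


def class_from_name(class_path):
    """Resolve 'pkg.module.ClassName' by importing 'pkg.module' (sketch)."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A dotted name that matches an importable Python module resolves normally.
print(class_from_name("collections.OrderedDict"))

# The codec name in the header is a Java class name. Unless a Python package
# named 'com.hadoop.compression.lzo' is importable, the lookup raises
# ImportError, which matches the last line of the traceback.
try:
    class_from_name("com.hadoop.compression.lzo.LzoCodec")
except ImportError as exc:
    print("lookup failed:", exc)
```

Under that assumption, the file would only be readable if a Python module providing an LZO codec were importable under the name the reader looks up. Whether python-hadoop ships or supports such a module is exactly the open question in this issue.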

@talglobus

Running into this same issue
