Manu-sh/http_normalizer

Http Normalizer

# installing dependencies for tests
yay -S --noconfirm doctest

# clone the project and fetch any submodules
git clone git@github.com:Manu-sh/http_normalizer.git
cd http_normalizer
git submodule update --init --recursive 




# to build only http_normalizer
cd http_normalizer
mkdir -p build && cd build
cmake ..
make -j`nproc --all` && make test

# to build and install the php-extension
cd php-extension
make
sudo make install

Building the PHP extension requires PHP-CPP (phpcpp).

The trailing slash is always removed.
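
To illustrate the trailing-slash rule, here is a minimal standalone sketch. It does not use http_normalizer's actual API: `strip_trailing_slash` is a hypothetical helper written only for this example, and it assumes the slash to drop is the last character of the path, just before any `?` query or `#` fragment.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, NOT part of http_normalizer's API.
// Removes a single trailing '/' from the path portion of a URL,
// i.e. the part that precedes any '?' query or '#' fragment.
std::string strip_trailing_slash(const std::string &url) {
    // The path ends where the query or fragment starts, or at the end of the string.
    const std::size_t path_end = std::min(url.find_first_of("?#"), url.size());

    // Nothing to strip if the path is empty or does not end with '/'.
    if (path_end == 0 || url[path_end - 1] != '/')
        return url;

    // Leave the "://" scheme separator untouched (e.g. a bare "http://").
    if (path_end >= 3 && url.compare(path_end - 3, 3, "://") == 0)
        return url;

    // Rebuild the URL without the slash, keeping any query/fragment as-is.
    return url.substr(0, path_end - 1) + url.substr(path_end);
}
```

Under these assumptions, `strip_trailing_slash("http://example.com/a/b/?q=1")` yields `http://example.com/a/b?q=1`, while a URL without a trailing slash is returned unchanged. The library itself may handle more cases than this sketch does.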

http_normalizer relies on http_normalizer_parts. For a comprehensive list of the normalizations performed, see http_normalizer_parts. You can find more examples here.

Copyright © 2020, Manu-sh, [email protected]. Released under the MIT license.

About

http url normalization for web crawlers
