Search #26

gerbrent · 2022-07-05T01:26:14Z

Figure out a good way to integrate search. Client-side search like Lunr.js will probably not perform well due to index size.

@theZMC recommends:

Since golang seems to be a theme here (which it should be), maybe zinc would fit the bill?

elreydetoda · 2022-07-18T05:43:50Z

If zinc is used (from my understanding) it looks like it'll need to be implemented on the server side (at least for the server which holds all the references to the objects/records).

Also, of note it appears that zinc is still in beta (src):

Project Status:
ZincSearch is in Pre GA (General Availability) and will be marked as production ready at v1.0.0 .

So, it's possible they'll have major breaking changes before the 1.0 release, which means we'll need to make sure we pin the version and read the changelog before upgrading to see if anything's going to break.

While it's not a self-hosted option (and will probably cost money because of how many episodes JB has), as a temporary solution, we could use Algolia. I've used them before, and overall it was pretty easy. I've actually got a GH action for doing CI with Hugo content as well: https://github.com/Climate-Refugee-Stories/crs-website/blob/c82f394a620b4631bb43de5ca4433d33a51bb292/.github/workflows/cd.yml#L91-L126
(figure we probably won't go with this, but just figured I'd mention it).

elreydetoda · 2022-07-22T03:56:46Z

Meta: BTW, @gerbrent you might want to add a "JB - action needed" tag to this issue since it's discussing cost of running an extra service on a server specifically for search, and if that's something they want to contemplate (because that'll be another service to maintain).

reesericci · 2022-08-04T19:26:04Z

Typesense might be a good option

ironicbadger · 2022-08-15T16:51:08Z

@RealOrangeOne usually has some pretty strong opinions on search.

gerbrent · 2022-08-15T20:16:27Z

I have opinions as well, from a functionality and end-user perspective.

The search results at notes.jupiterbroadcasting.com are not at all to my liking pretty much every single time I try to use it, which is generally to answer a question like "I remember we mentioned that in the last few months, let's see which episode that was from". This generally gives results sorted by "relevance", which never gets me what I want (and yet I keep trying.....)

I would far prefer chronologically sorted search results.

See "Search listing seemingly random, prefer chronological" (selfhostedshow/show-notes#16).

Also, on a slow connection, the UX of the current search is poor: it isn't obvious that results are presented as a pop-up in the search bar until they finally load. It's annoying, and slow enough to make me wonder more than once, "is this working?"

theZMC · 2022-08-15T22:59:16Z

I'm willing to start putting some serious development work into this. Some clarifying questions:

Do we need full text search?
Any plans for some sort of transcription process (automatic or manual) and if so is there a timeline for when that would be in place?

ironicbadger · 2022-08-15T23:35:49Z

See the search at notes.jupiterbroadcasting.com - that's the type of thing I think is needed. @ChrisLAS @noblepayne or @gerbrent feel free to jump in here.

gerbrent · 2022-08-16T14:39:04Z

see here why search via recency is not supported nor desired by mkdocs:

selfhostedshow/show-notes#16 (comment)

RealOrangeOne · 2022-08-16T15:08:23Z

I agree lunr probably isn't ideal, as the index will be huge. I've written client-side search with Hugo, and it's very simple, but the index may be large given the show history

mkdocs's search is lunr-based. The issue with mkdocs is that the pages have no sense of date, as opposed to being a technological issue. Hugo, however, does have dates as a concept, so it could be done there.

For search, I suspect we'd want something server-side to do it. For ease (of local dev and hosting), scraping the content into sqlite and using its fulltext search would probably be very simple, very powerful and scalable.
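A minimal sketch of the SQLite idea above, assuming the local SQLite build ships with the FTS5 extension. The table name `episodes`, the column names, and the sample rows are made up for illustration; they are not taken from the site. It also shows that results can be ordered by date instead of relevance, which was requested earlier in this thread.

```python
import sqlite3


def build_index(episodes):
    """Create an in-memory FTS5 index from (title, published, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # 'published' is stored but not tokenized, so it is only used for sorting.
    conn.execute(
        "CREATE VIRTUAL TABLE episodes USING fts5(title, published UNINDEXED, body)"
    )
    conn.executemany("INSERT INTO episodes VALUES (?, ?, ?)", episodes)
    return conn


def search(conn, query, newest_first=True):
    """Return matching (title, published) rows, by date or by FTS5 relevance."""
    # ISO 8601 date strings sort correctly as plain text.
    order = "published DESC" if newest_first else "rank"
    sql = (
        "SELECT title, published FROM episodes "
        f"WHERE episodes MATCH ? ORDER BY {order}"
    )
    return conn.execute(sql, (query,)).fetchall()


if __name__ == "__main__":
    # Hypothetical sample data, not real show notes.
    sample = [
        ("Episode 1", "2022-01-10", "We talk about backups and ZFS snapshots."),
        ("Episode 2", "2022-06-21", "A deep dive into ZFS on Linux."),
        ("Episode 3", "2022-08-02", "Home automation, no filesystems this week."),
    ]
    conn = build_index(sample)
    for title, published in search(conn, "zfs"):
        print(published, title)
```

In a real setup the database would presumably be a file generated at build time from the Hugo content and queried by a small server-side endpoint; that part is left out here.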

Elasticsearch etc are definitely options, but they're very heavy for what we need. As are hosted tools like Algolia, but given the name of one of our shows, that's a less desirable option.

ironicbadger · 2022-08-16T15:42:25Z

Could we run some mock ups with elastic and get a sense of just how heavy? We have the infra to do it I'd wager.

RealOrangeOne · 2022-08-17T14:51:20Z

It's not just heavy in terms of resources. It also makes local development much more of a pain, not to mention that it's more complex to set up and work with anyway. The container alone is ~550 MB compressed.

theZMC · 2022-08-17T15:22:29Z

> It's not just heavy in terms of resources. It also makes local development much more of a pain, not to mention that it's more complex to set up and work with anyway. The container alone is ~550 MB compressed.

Unfortunately when it comes to search, I think it's the classic pick two between fast, good, and inexpensive. Though I do agree that full fat elastic is a bit too heavy-handed for our needs.

CGBassPlayer · 2022-09-08T15:49:58Z

So I just found a tool that might be worth using if search is still something we are after. It's called Pagefind, and it is a single binary that indexes the site after it is built. There is a video on the home page of their site showing how it works and a basic example.

It's also written in 🎉 Rust 🎉

elreydetoda · 2022-09-08T16:49:35Z

That looks pretty awesome! It's nice that we could just bundle that in an artifact as well. It just comes with the new site build! 🥳

gerbrent · 2022-09-08T18:14:40Z

this DOES look fascinating!

The demo at the top of the page at https://pagefind.app/ is fast - much faster than our current notes.jupiterbroadcasting.com for me on a low end internet connection and low-end hardware.

Pagefind can run a full-text search on a 10,000 page site with a total network payload under 300KB, including the Pagefind library itself. For most sites, this will be closer to 100KB.

that sounds like us ; )

🎯 Another lovely demo: https://xkcd.pagefind.app/

gerbrent · 2022-09-08T18:17:46Z

My big question - can results be sorted by date/recency? I see Pagefind has the concept of "date"

CGBassPlayer · 2022-09-08T18:19:17Z

I don't see why not since it is content on the page. I wonder if we will need a piece of metadata for the date.

But I found this tool about 15 minutes before I commented (Just long enough to watch the video)

gerbrent · 2022-09-08T18:22:21Z

That can be very handy for the JB Archive (a distinct hugo instance):

Pagefind can be configured to search across multiple sites, merging results and filters into a single response. Multisite search configuration happens entirely in the browser, by pointing one Pagefind instance at multiple search bundles.

The following examples reflect Pagefind running on a website at blog.example.com that wants to include pages from docs.example.com in the search results.

https://pagefind.app/docs/multisite/

Changing the weighting of individual indexes

When searching across multiple sites you may want to rank each index higher or lower than the others. This can be achieved by passing an indexWeight option for each index:

https://pagefind.app/docs/multisite/#changing-the-weighting-of-individual-indexes

FlakM · 2022-10-30T05:33:01Z

Hello all! Have you considered https://www.meilisearch.com/ ? It's also an open source project with a valid source of income (they recently received a 15M round of funding). It is very easy to deploy. I'd be more than happy to write a backend RSS watcher and some mockups for the front end.
As for costs, times are crazy, but I think I can commit to covering a year of runtime and on-call support as value for value 🥰

gerbrent · 2022-11-01T13:21:41Z

oh wow, very generous @FlakM !!!

I'll be curious to hear what others think of MeiliSearch - def worth considering!

FlakM · 2022-11-02T07:52:40Z

I've recently been reviewing alternatives to the more traditional ELK stack and hosted options for my employer. Meilisearch came up in This Week in Rust, so I looked into it as well. Here are some reasons why I think it would be a good fit here:

It uses very solid backing technology, i.e. LMDB, which was designed as an embedded database for OpenLDAP by very smart people. If you prefer podcasts, here is a great episode about it.
It has a rest API so it could be used without any other backend services apart from the component that will keep data in sync (and maybe some nginx to add some rate limiting/tls etc)
It is dead simple to deploy and maintain - just a single container
It has a front-end code already written so including it is also very simple
It has all of those nice features like typo safety, synonyms etc
It is blazingly fast 🚀 🦀
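To illustrate the REST API point in the list above, here is a rough sketch of what a search call could look like from a small client. It uses Meilisearch's `POST /indexes/{index_uid}/search` endpoint with a bearer key. The base URL, the index name `episodes`, the `published` field, and the key are all placeholders, and sorting on `published` assumes that field has been configured as a sortable attribute on the index.

```python
import json
import urllib.request


def build_search_request(base_url, index_uid, api_key, query, limit=20):
    """Build (but do not send) a Meilisearch search request, newest first."""
    payload = {
        "q": query,
        "limit": limit,
        # Assumption: 'published' is listed in the index's sortable attributes.
        "sort": ["published:desc"],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/indexes/{index_uid}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host, index name, and key; replace with real values.
    req = build_search_request(
        "http://localhost:7700", "episodes", "example-key", "zfs"
    )
    print(req.get_method(), req.full_url)
    # To run the query against a live instance:
    # with urllib.request.urlopen(req) as resp:
    #     hits = json.load(resp)["hits"]
```

In production a search-only key would be a better fit for the browser than the master key, with nginx in front for rate limiting and TLS as suggested above.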

For your convenience, I've deployed a sample service and loaded the index with the contents of all the RSS feeds. It's available here (BTW, it is a proper use of Linode credits); the secret key is MASTER_KEY. Keep in mind that it is the result of a quick and dirty effort. For a full-blown index, I think it would be useful to also add transcriptions (I have experience with DeepSpeech, so that's probably not a big problem) and more complete show notes. The current showcase version of the code that loads data from the RSS feeds is available here.
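The loader code linked above is not reproduced in this thread. As a stand-in, here is a hedged sketch of how RSS items might be flattened into documents for a search index using only the Python standard library. The field names (`guid`, `title`, `description`, `published`) and the sample feed are assumptions for illustration, not the structure the actual loader uses.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def rss_to_documents(rss_xml):
    """Turn RSS <item> elements into flat dicts suitable for indexing."""
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        docs.append({
            # Fall back to the link when the feed has no <guid>.
            "guid": item.findtext("guid") or item.findtext("link"),
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            # RSS dates are RFC 822 style; a Unix timestamp is easy to sort on.
            "published": (
                int(parsedate_to_datetime(pub_date).timestamp())
                if pub_date else None
            ),
        })
    return docs


if __name__ == "__main__":
    # Hypothetical feed, not a real JB feed.
    sample = """<rss version="2.0"><channel><title>Example Show</title>
    <item><title>Episode 1</title><link>https://example.com/1</link>
    <guid>ep-1</guid><pubDate>Tue, 05 Jul 2022 01:26:14 GMT</pubDate>
    <description>Search options.</description></item>
    </channel></rss>"""
    for doc in rss_to_documents(sample):
        print(doc)
```

Transcripts or fuller show notes could be added as extra fields on each document before it is pushed to the index.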

gerbrent · 2022-11-03T17:59:01Z

amazing again @FlakM !! Will look at this further in a few days.. thank you!!

elreydetoda · 2023-02-19T13:37:08Z

@kylepotts suggests start of convo & end of convo:

What things have we tried for searching transcriptions? I wonder if taking the output of the transcription and putting it inside something like ElasticSearch/Opensearch and exposing it via an API is overkill? Or if a product like that already exists. Definitely will require a unique way to have a "dynamic" results page in Hugo from where you search.

gerbrent added the "enhancement" label Jul 5, 2022

gerbrent added this to the Hugo Website milestone Jul 5, 2022

gerbrent mentioned this issue Jul 12, 2022

Hosts, Guests - implement filtering on listing page #92

Closed

elreydetoda mentioned this issue Jul 19, 2022

On-air alert functionality (w JBot! sorta...) #41

Closed

gerbrent added the "JB - need decision" and "question" labels Jul 22, 2022

gerbrent modified the milestones: JB.com 1.0, JB.com 2.0 Aug 6, 2022

FlakM mentioned this issue Nov 3, 2022

Transcriptions #301

Open

elreydetoda mentioned this issue Feb 19, 2023

Initial transcription support #494

Open

CGBassPlayer linked a pull request Jan 17, 2024 that will close this issue

Initial Search Support #576

Draft
