Keeps re-creating my embeddings. #496

Closed
bbecausereasonss opened this issue Feb 29, 2024 · 111 comments

Comments

@bbecausereasonss

I'm using Obsidian on Desktop/Mac and syncing with Dropbox. My embeddings keep getting re-created, seemingly every day, sometimes fully. Not sure why; this never used to happen. Also, when I click 'save' now, nothing happens, whereas previous versions used to save an embedding file.

@brianpetro
Owner

@bbecausereasonss are you talking about the "save" by the API key? There isn't currently a method for manually saving. However, checking the developer console logs should let us know whether the file is being saved. It should say something like "Saved in XXXXms".

@bbecausereasonss
Author

No, I mean the 'save' for the .smart-connections data.

@brianpetro
Owner

That button simply saves the setting. It's a manual save for the setting because if you change the folder, it needs to rename it, which you don't want to trigger automatically.

@DantesHub

CleanShot 2024-03-04 at 20 28 27
Hey @brianpetro, I'm having the same problem: this prompt keeps appearing every time I reopen Obsidian.

@DantesHub

CleanShot 2024-03-04 at 20 30 03

@robwheatley

I have the same problem. My vault is saved to iCloud rather than Dropbox, and I assume it has something to do with the file dates. I'm wondering if iCloud/Dropbox messes with the dates on files so that Smart Connections thinks the embeddings are out of date. I came here to see if that was the case. Not found anything yet, but perhaps this will give people a clue?

@sh4d0wl3ss

Same issue. I am using Obsidian Sync, but I also generate daily journal entries and have about 8k notes in the vault. I'm spending much more time with degraded Obsidian performance while it embeds notes in the same folder every day than I actually spend using the Smart Connections features. The juice isn't worth the squeeze. It didn't do this before the latest Obsidian update.

@brianpetro
Owner

brianpetro commented Mar 5, 2024

@robwheatley @DantesHub @sh4d0wl3ss

Keep an eye on the dev console during the embedding process. If there's an error with saving, screenshot and I can fix that.

If everything saves correctly, then something else is clearing the embeddings. Check the dev console immediately after startup to check for loading issues.

Otherwise, it might be something that's specific to your setup. So also include any third-party syncing that you're using, like how @robwheatley mentioned he is using iCloud.

Thanks for your help in solving this issue,
🌴

@brianpetro
Owner

Also, which operating system you're using will help narrow down the issue.

@robwheatley

robwheatley commented Mar 5, 2024

Hi @brianpetro I'm new to Obsidian and didn't realise that there was a dev console. I've just taken a look and have spotted a few errors being thrown out by smart connections. Not sure if this helps you, but I can provide more info on request...

I'm on macOS 14.3.1, Obsidian 1.5.8 and Smart Connections 1.0.128. Also, as mentioned, I have my vault in iCloud.

As a test, I moved my vault out of iCloud and onto my regular drive. When I did that, the dev log did look a little different - I saw more entries for "Embedded X inputs..." but I still got the final undefined 'last_history' at the end of the log like below. Also, on restart of Obsidian, all my notes required embedding again - so it doesn't look like an iCloud-specific problem.

Here is the log when running on iCloud...

Edit: Please only upload screenshots of logs

@brianpetro
Owner

@robwheatley if you could screenshot the console, that would be much easier for me to look through.

Also, toggle this on for better logs
Screenshot 2024-02-02 at 10 34 25 PM
🌴

@robwheatley

@brianpetro here is the 1st screenshot - taken after start-up with the debugging on. The 2nd screenshot is from after I clicked the 'create embeddings' button up to the point it finished

Shot 1
Screenshot 2024-03-05 at 14 50 39

Shot 2
Screenshot 2024-03-05 at 14 58 12

@brianpetro
Owner

Does anyone that this has happened to remember hitting the pause button prior to losing the embeddings?

It seems like under some conditions, like pausing/restarting, multiple embedding processes could be executed at once. This is visualized by the denominator in the progress notification changing between multiple values.

So far this has only happened once for me during development, so it'll require more testing.

There could be some other situations where this happens. I'm continuing to investigate.
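
For anyone following along, one common way to rule out overlapping runs is a "single-flight" guard: while a run is in progress, further calls get the same promise back instead of starting a second process. This is only a minimal sketch of that general pattern; `singleFlight`, `embedAll` and the `starts` counter are hypothetical names for illustration and are not taken from the plugin.

```javascript
// Hypothetical single-flight guard; names are illustrative, not the plugin's API.
function singleFlight(task) {
  let inFlight = null; // promise for the run currently in progress, if any
  return function run(...args) {
    if (inFlight) return inFlight; // reuse the active run instead of starting another
    inFlight = Promise.resolve()
      .then(() => task(...args))
      .finally(() => {
        inFlight = null; // allow a fresh run once this one settles
      });
    return inFlight;
  };
}

// Stand-in "embedding" job that counts how often it actually starts.
let starts = 0;
const embedAll = singleFlight(async () => {
  starts += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return starts;
});
```

With a guard like this, pressing pause/resume quickly would call `embedAll()` twice but only one underlying run would start, so the progress denominator would have a single source.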

@robwheatley

I'm 99% sure I didn't pause. OK, 98%

@bbecausereasonss
Author

I hit the pause button once, but that was weeks ago and it's been happening ever since.

@brianpetro
Owner

@robwheatley @bbecausereasonss thanks for letting me know!

🌴

@brianpetro
Owner

Thinking out loud here:

Something else that was recently changed was replacing a file hash (b/c incompatible with mobile) with checking both the file's size and last change time.

In theory, this should not cause a noticeable difference because even if the time is modified by some other process, the file size should stay the same.

But, in practice, maybe the file size is also being altered even without changing the note.

I'm going to need to come up with some sort of test for this.
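
As a starting point for such a test, here is a rough sketch of a size-plus-mtime check. It assumes one reading of the description above: a note only counts as changed when both the size and the modification time differ, so a sync tool that touches the timestamp alone would not trigger re-embedding. `fingerprint` and `needsReembedding` are hypothetical names, not the plugin's functions.

```javascript
const fs = require("node:fs");

// Hypothetical helper: cheap fingerprint of a file from its metadata only.
function fingerprint(filePath) {
  const stat = fs.statSync(filePath);
  return { size: stat.size, mtime: stat.mtimeMs };
}

// Assumed rule: re-embed only if BOTH size and mtime differ from the saved values.
// Trade-off: an edit that leaves the byte size unchanged would go unnoticed.
function needsReembedding(saved, current) {
  if (!saved) return true; // never embedded before
  return saved.size !== current.size && saved.mtime !== current.mtime;
}
```

Logging the saved and current fingerprints for a note that was not edited would show whether the sync service is changing the size as well as the timestamp.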

🌴

@robwheatley

I've been doing some messing about with a new Vault saved directly onto my hard drive. Clean install, with only the Smart Connections plugin installed. And I've been adding notes to that to see what it does....

I think what's happening is that when the embeddings.ajson file is loaded on start-up, the JSON parser doesn't like something and reports the error below. That results in the embeddings.ajson file being deleted, so you have to start the embedding again.

SyntaxError: Bad control character in string literal in JSON at position 3054377 (line 1 column 3054378)
    at JSON.parse (<anonymous>)
    at ObsAJSON2.load (plugin:smart-connections:105:31)
    at async SmartNotes.load (plugin:smart-connections:2054:9)
    at async eval (plugin:smart-connections:2767:92)
    at async Promise.all (index 0)
    at async ScBrain2.init (plugin:smart-connections:2767:9)
    at async ScBrain2.reload (plugin:smart-connections:2750:9)
    at async SmartView2.load_brain (plugin:smart-connections:4293:9)
    at async SmartView2.initialize (plugin:smart-connections:4283:9)

I don't know what the bad character is - I can't see anything obviously wrong with the file (I can do more testing later) and I don't know what put it there in the first place. I wonder if I could hack something to prevent the file from being deleted when it discovers the error to see what's going on....?
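
One way to hunt for the offending character, sketched here with made-up data: JSON does not allow raw control characters (U+0000 to U+001F) inside string literals, so scanning the file text for them and printing their positions should point at the culprit. `findControlChars` and the sample string are hypothetical. In a file that is all on one line, as the "line 1 column ..." in the trace suggests, any hit other than a trailing newline is suspect.

```javascript
// A raw (unescaped) tab inside a JSON string literal is invalid JSON.
const broken = '{"notes/a\tb.md": {"vec": [0.1, 0.2]}}';

// Hypothetical helper: list raw control characters (U+0000 to U+001F) and their positions.
function findControlChars(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x20) hits.push({ position: i, code });
  }
  return hits;
}

let parseError = null;
try {
  JSON.parse(broken);
} catch (err) {
  parseError = err; // a SyntaxError; the exact wording varies by engine
}
console.log(parseError && parseError.name, findControlChars(broken));
```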

@brianpetro
Owner

@robwheatley good catch!

That error doesn't specifically delete the file, but the embeddings fail to load and then the reprocessing overwrites the existing file.

Solving the source of the issue: I'm thinking it's this line

get ajson() { return `"${this.key.replace(/"/g, '\\"')}": ${JSON.stringify(this.data)}`; }

Specifically, "${this.key.replace(/"/g, '\\"')}": should be ${JSON.stringify(this.key)} to handle any control characters.

I'll get this change shipped in the next update, today if I can fit it in.
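
To illustrate the difference, here is a small standalone comparison (not the plugin's code; `manualKey`, `safeKey` and the sample key are made up). Escaping only double quotes leaves a raw tab in the output, which `JSON.parse` rejects, while `JSON.stringify` escapes it.

```javascript
// Hand-rolled escaping: only double quotes are handled, as in the line quoted above.
const manualKey = (key) => `"${key.replace(/"/g, '\\"')}"`;
// JSON.stringify escapes quotes, backslashes and control characters.
const safeKey = (key) => JSON.stringify(key);

const key = "notes/tab\there.md"; // hypothetical key containing a raw tab

function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const manualRecord = `{${manualKey(key)}: 1}`;
const safeRecord = `{${safeKey(key)}: 1}`;
console.log(parses(manualRecord), parses(safeRecord)); // prints: false true
```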

Another thing that can help situations like this: saving the file so that records (or batches of records) are separated by newlines. This way the erroneous record/batch can be thrown out while preserving the rest. This would likely have a negative impact on start-up performance, but would still probably be worth it to prevent this embedding-rewrite headache.
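
A rough sketch of that idea, assuming a hypothetical format with one `"key": value` record per line (the real file layout may differ, and `serialize` and `loadTolerant` are illustrative names): each line is parsed on its own, so a corrupt record is skipped and reported instead of invalidating the whole file.

```javascript
// Hypothetical format: one `"key": value` record per line.
// JSON.stringify never emits raw newlines, so each record stays on a single line.
function serialize(records) {
  return Object.entries(records)
    .map(([key, data]) => `${JSON.stringify(key)}: ${JSON.stringify(data)}`)
    .join("\n");
}

// Parse each line separately so one corrupt record cannot sink the whole file.
function loadTolerant(text) {
  const loaded = {};
  const rejected = [];
  text.split("\n").forEach((line, index) => {
    if (!line.trim()) return; // ignore blank lines
    try {
      Object.assign(loaded, JSON.parse(`{${line}}`));
    } catch {
      rejected.push(index + 1); // remember the 1-based line number for logging
    }
  });
  return { loaded, rejected };
}
```

The start-up cost comes from calling `JSON.parse` once per line instead of once per file, which is the trade-off mentioned above.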

🌴

@robwheatley

@brianpetro It's great that you are looking into this. I've just spent the last few hours seeing if I could add any more info. I went down a bit of a rabbit hole TBH!

From my clean install, I started to add notes in from my 'real' vault. I wondered if a particular note was causing the issue. After lots of messing about I thought I found something. When I added a specific note, I started to get errors. But it turned out to be nothing special about that note. If I just added 'one more note' of any sort, I would cause the issue. Basically, I got in the situation where I had 236 notes, but adding a 237th would make things fail.

I then started to look at other things, because adding the 237th note doesn't 100% reproduce the problem. So I then started to add more content to notes when I just had 236. I'm not convinced that this actually got me anywhere though!!

I did run into a few odd things along the way though. For example, when I added a new note, I got the alert to say that it was being embedded, but the alert never went away, even though I could see in the console that some sort of embedding had been done. Sometimes I saw a time-out on this single file (dunno why, I was using a super simple embedding and I'm on a speedy machine). Also, on start-up, I sometimes get asked if I want to re-embed all my notes, even though I can see there is a valid embeddings file and there have been no parsing errors. Quit and restart sorts that out (on next run, I'm not asked to re-embed). I'm not sure how you are keeping track of what's been embedded or not. Maybe it's the embeddings file itself, and these issues were being caused by file-on-disk mismatches. No idea, and I realise these ramblings won't help!

I'm super-keen to get this plugin working though. I've only just moved to Obsidian, and although I'm putting some structure in place for new notes, the old ones I have imported are a mess, so this would be really useful!

@brianpetro
Owner

@robwheatley thanks for sharing all that! Your rabbit hole can be my gold mine. It's not very often (considering how many people have downloaded Smart Connections) that I get such detailed feedback 😊

You definitely managed to point out some curiosities.

Separating metadata from embeddings files is something I've played around with in the past, and could be a way to thwart some of these issues.

If you ever have a note that seems to cause an issue, but you can't figure out why, please do share the note with me. If you need a private channel to do so, I can accommodate. But being able to see some of these issues myself can be invaluable to the debugging process.

There is still a lot of legacy code in v2.0, but I'm continuing to modularize the processes, enabling useful test processes, so the stability will only improve (though things tend to get worse before they get better...). More importantly, these design decisions should also allow contributing by community members long into the future.

PS- all these GitHub issues end up in my personal obsidian vault, and Smart Connections enables me to resurface them at the right times. So any notes you make on your experience, much like what you just shared, will be useful even if they aren't specifically addressed right away.

Thanks for your help in making Smart Connections better!
🌴

@brianpetro
Owner

@robwheatley @bbecausereasonss @DantesHub @sh4d0wl3ss latest update v2.0.129 implements #496 (comment)

@robwheatley

@brianpetro No joy I'm afraid. I updated to the latest version, scrubbed everything and started embedding from scratch.

My local vault created the embeddings file, but on restart it wants me to re-do them all again (even though there is a valid 2 MB file there, which doesn't get overwritten). I can get more from the logs on this later.

My iCloud vault created the embeddings (it took a lot longer as I have 4x more notes in this one), and the console said that it saved the file, but the saved file was zero bytes, and obviously I get asked to re-do them on restart.

Must be something else causing the issue. I don't think I will have much time to play tonight, but will over the weekend if you don't work it out before then....

@brianpetro
Owner

@robwheatley bummer, but thanks for letting me know.

When you get a chance, let me know if you're still seeing the same error or if it's something new.

🌴

@brianpetro
Owner

@robwheatley @bbecausereasonss @DantesHub @sh4d0wl3ss released another update v2.0.130 to try to address this. In short, a temporary file is created on save to prevent overwriting the existing file 🌴

@robwheatley

@brianpetro I tried .30 and it looked like it worked, but I saw a couple of funny things. I wasn't paying much attention as I was waiting in the car for my daughter. Just got back in the house and saw that you have released a .31 version.

Happy to report that I've hit no issues so far (local or iCloud). To flex it a bit, I've just switched to a beefier model to see if a larger file size causes any issues. It's chugging away as I type and I can see that you are saving the file after a few notes have been processed and recording the file sizes as you go. At least I can see the size increasing, so that's encouraging!

YES! IT WORKED - Nice one!!
I've quit and restarted several times and not been asked to re-embed everything again

@robwheatley

@brianpetro Spoke too soon. It creates the note embeddings fine, but it's failing to create block embeddings at the moment. I'm getting this error. Just thought I should let you know.

plugin:smart-connections:2520 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'last_history')
    at SmartBlock.init (plugin:smart-connections:2520:37)
    at SmartBlocks.create_or_update (plugin:smart-connections:763:14)
    at eval (plugin:smart-connections:2460:42)
    at Array.forEach (<anonymous>)
    at eval (plugin:smart-connections:2460:18)
    at async Promise.all (index 284)
    at async SmartBlocks.import (plugin:smart-connections:2457:9)
init @ plugin:smart-connections:2520
create_or_update @ plugin:smart-connections:763
eval @ plugin:smart-connections:2460
eval @ plugin:smart-connections:2460
await in eval (async)
ensure_embeddings @ plugin:smart-connections:2277

@brianpetro
Owner

@robwheatley, thanks for the update! Seems like we're at least making some progress 😊

Please toggle on this option:

Screenshot 2024-02-27 at 6 13 25 PM

It makes the logs provide useful line numbers (the other ones are based on a compiled file).

I just made an update (v2.0.134) to log the blocks causing the above error. This update should give us a better idea of what's going on with that error by letting us see the blocks that are causing it.

Thanks for your help
🌴

@robwheatley

I've just deleted the previous comment saying that all is well. My embeddings file just got overwritten with an empty file after a quit and restart. I wasn't paying attention as I was doing other things at the time. I will keep an eye on things and add more info when I can...Sorry to be giving you bad news on a weekend..

@brianpetro
Owner

@robwheatley you jinxed it! Lol.

In the latest version, I added logic so that, when new embeddings are being saved, the disk writes happen in a new temporary file. That new temporary file should only replace the existing "working" file if it is at least 50% of the size of the "working" file. So it's weird that you would end up with a completely empty file.
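
For readers who want to picture that logic, a minimal sketch follows. It is not the plugin's actual code: `safeSave`, the `.tmp` suffix and the use of Node's synchronous `fs` calls are assumptions made for illustration. Only the "at least 50% of the working file's size" rule comes from the description above.

```javascript
const fs = require("node:fs");

// Hypothetical save routine; the 0.5 default mirrors the "at least 50%" rule described above.
function safeSave(filePath, contents, minRatio = 0.5) {
  const tempPath = `${filePath}.tmp`; // assumed naming for the temporary file
  fs.writeFileSync(tempPath, contents);

  const newSize = fs.statSync(tempPath).size;
  const oldSize = fs.existsSync(filePath) ? fs.statSync(filePath).size : 0;

  if (newSize < oldSize * minRatio) {
    fs.unlinkSync(tempPath); // suspiciously small: keep the existing working file
    return false;
  }
  fs.renameSync(tempPath, filePath); // replace the working file with the new one
  return true;
}
```

Note that a guard like this cannot help once the working file is already empty, because any new file is at least 50% of zero bytes. It also does nothing about another process, such as a sync client, rewriting the file afterwards.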

A few things to check:

  1. Make sure you are on the latest version so we can rule that out. The latest version is v2.0.134.

  2. If you're a supporter, are you also running the Smart Connect software when this happens?

  3. Are there any third-party syncing processes that might be overwriting the file? If so, can the .smart-connections folder be excluded from that syncing?

  4. Screenshots of anything that might be of interest are always helpful.

Thanks for the update
🌴

@brianpetro
Owner

@bbecausereasonss it might be worth it for you to try this too #528 (reply in thread)

@Hopsakee

@Hopsakee this was just shared with me #528 (reply in thread)

I think it's worth trying out! Let me know if it helps.

🌴

Thanks for sharing. I'm trying it right now.
Trying a smaller model unfortunately didn't solve my issue on my laptop.

@bbecausereasonss
Author

bbecausereasonss commented Mar 30, 2024

Okay, this is really driving me nuts. My Obsidian is literally open all night long (didn't change a single thing). I open the window back up and it tells me I need to redo the embeddings. 3rd day in a row.

image

@brianpetro
Owner

@bbecausereasonss wow that's annoying!

I hope we manage to get to the bottom of this soon. It's very frustrating.

🌴

@bbecausereasonss
Author

I often notice 0kb files in the folder, is this normal?

image

@bbecausereasonss
Author

bbecausereasonss commented Mar 30, 2024

Also, during the re-embedding process I get a ton of 'exceed max tokens' errors, which is odd. Shouldn't embeddings not have a max? Isn't that the whole point?

image

Definitely some strange stuff going on lately.

image

@bbecausereasonss
Author

It would not let me create the embeddings today; it kept asking me to delete them and going in a loop of failing to save. I nuked the folder and started fresh; now there are no more 0kb files and the embedding file is much larger.

I have done this 2x already this past week though so feel like something is going to mess it up again.

image

@brianpetro
Owner

@bbecausereasonss all embedding models have a max content length, including the OpenAI embeddings; there just isn't a log associated with the truncating in that case. This local model truncating log will be turned off in the next version.

The EBUSY error is interesting. It is likely indicating that some external software is blocking access to the file. Possibly while syncing.

I won't be able to make any changes today, but I might be able to add some sort of logic to catch that error and retry in the future.
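
A catch-and-retry along those lines could look roughly like the sketch below. It is only an illustration: `retryOnBusy` and its options are hypothetical, and the only grounded detail is that Node reports this condition with the error code `EBUSY`.

```javascript
// Hypothetical retry wrapper: retries only when the error code is EBUSY.
async function retryOnBusy(operation, { retries = 3, delayMs = 200 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on other errors, or once the retry budget is used up.
      if (err.code !== "EBUSY" || attempt >= retries) throw err;
      // Wait a little longer after each failed attempt before trying again.
      await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
    }
  }
}
```

A save could then be wrapped as `await retryOnBusy(() => fs.promises.writeFile(filePath, contents))`, which gives a sync client a moment to release the file.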

🌴

@brianpetro
Owner

@bbecausereasonss @Hopsakee I just shipped an experimental feature in v2.1.23 that you might want to try.

Screenshot 2024-04-02 at 1 32 26 PM

Instead of one large file, the experimental feature creates a file-per-note.

Note: Switching will require re-embedding.

If we still have the same issue with the many files, then that would at least narrow the possible issues down to a much smaller range of possibilities.

🌴

@bbecausereasonss
Author

bbecausereasonss commented Apr 2, 2024

Awesome. Thanks for pushing this. I'll give this a shot.

@robwheatley

robwheatley commented Apr 3, 2024 via email

@quicly

quicly commented Apr 21, 2024

I have the same problem, that is described in this thread.
Right now I'm trying the new per-file saving feature, but unfortunately it made the situation even worse: files now just keep being saved again and again, indefinitely.

One strange thing I noticed is that many files contain repeated identical lines, for example
image

@brianpetro
Owner

Hi @quicly

Thanks for letting me know about that!

That tells me the issue is happening during the "loading" process rather than the "saving" process.

What should happen is that, when loading, the last identical line, based on the first part, the file name in quotes, should replace all of the others.

An issue in the loading process also explains why re-embedding keeps happening since the previous embedding was never loaded in the first place.
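
To make the intended "last line wins" behavior concrete, here is a small sketch under the assumption that each line is a `"key": value` record. `loadLastWins` and `compact` are hypothetical names and the actual loader may differ.

```javascript
// Hypothetical loader: each line is a `"key": value` record; later lines override earlier ones.
function loadLastWins(text) {
  const records = new Map();
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    let parsed;
    try {
      parsed = JSON.parse(`{${line}}`);
    } catch {
      continue; // skip unparseable lines rather than aborting the whole load
    }
    for (const [key, value] of Object.entries(parsed)) {
      records.set(key, value); // a repeated key replaces the earlier record
    }
  }
  return records;
}

// Rewrite the text with one line per key, dropping the superseded duplicates.
function compact(text) {
  return [...loadLastWins(text)]
    .map(([key, value]) => `${JSON.stringify(key)}: ${JSON.stringify(value)}`)
    .join("\n");
}
```

If loading works like this, repeated lines on disk are harmless apart from file size, so re-embedding after every restart would suggest the load step is failing before it gets this far.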

It would be helpful if you could open the developer console, disable and re-enable Smart Connections, and screenshot the logs, which should show us some errors.

Before doing this, please turn on the "Debug at startup time" setting in the Obsidian Community plugin settings. This will help make sure the logs are as detailed as possible.

Thanks for your help in figuring this out,
🌴

@quicly

quicly commented Apr 21, 2024

I'll do it after my current re-embedding finishes. Right now I have the block model disabled, because last time the note model finished its work while the block model just kept creating files again and again.

I'm seeing in real time how an already embedded note is saved again after each iteration with duplicate content. For example, it added new lines to this file just a minute ago:

image

Meanwhile, nothing unusual shows up in the process
image

Here's what I see right now at dev console at startup
[6 images]

@quicly

quicly commented Apr 21, 2024

So I just tried to disable and enable Smart Connections. It didn't even launch

image

Meanwhile, files in the folder keep being saved again and again with new duplicate lines

UPD: It hasn't launched after 5 minutes. After I disabled plugin, here's what was in console
image

@brianpetro
Owner

@quicly thanks for the screenshots. I'll have to review them further and see if I can find anything that might be causing the issue and get back to you 🌴

@brianpetro
Owner

@quicly After reviewing the logs, I made some updates that might help solve this in version 2.1.49.

If the latest version doesn't automatically clear things up, I recommend manually deleting the .smart-connections folder and restarting Obsidian.

If you're still encountering issues, please screenshot the new errors/logs so I can further investigate the cause.

Thanks for your help in figuring this out,
🌴

@quicly

quicly commented Apr 22, 2024

I deleted the folder and started embedding anew. Unfortunately, the doubling of lines in the files still continues.
The embedding process finished nevertheless. But as soon as I change at least 1 note, it starts to save many other files again
[4 images]

Just as in the previous attempt, after I closed Obsidian and reopened it, Smart Connections didn't show any connections

[3 images]

Meanwhile, Smart Connections kept updating the files and the folder size became very big. I turned it off and the console showed the following errors
image

image

@brianpetro
Owner

@quicly interesting, thanks for sharing those.

The errors after disabling the plugin can be attributed to the main process being discontinued prior to other processes finishing. Besides that, the other errors are OK and shouldn't cause issues.

If you're still on the "Embedding file per note (EXPERIMENTAL)" setting, I would try turning that off now that I have made some updates, which may have solved the reason you turned it on in the first place.

Sorry that you're still having trouble with this, I know it's frustrating!

🌴

@quicly

quicly commented Apr 22, 2024

@brianpetro I turned "Embedding file per note" off. At least for now, everything's working fine. Moreover, saving time is greatly reduced and there is no freezing while saving, which had been a problem even before the re-embedding problem started.

UPD: although I'm not sure why, block model doesn't start embedding for now. It happened before and then it just started working, so I'll wait

@bbecausereasonss
Author

This is interesting because for me, after my embedding file got rather large, turning that feature "ON" is what saved me.

@brianpetro
Owner

@bbecausereasonss it seems to depend on a lot of factors. I've been keeping the file per note feature ON, mostly because I believe it will be the default eventually, so I want to make sure it's working well.

But it isn't quite ready for mainstream use yet and also lacks some features, so I'm not surprised if some people have issues with it.

🌴

@quicly

quicly commented Apr 23, 2024

@brianpetro
Follow up
For the last 12 hours, embeddings have been kept in place and have not been deleted. However, there are two other problems (which might be separate, but I'll mention them here in case they are connected to the current one):

  1. After activating the block model (I'm using BGE micro for both right now), the saving time for both the note and block model becomes frustratingly slow, leading to Obsidian freezing for 10-15 seconds.
  2. When I create a new "Untitled" note, SC starts embedding immediately, but when I make significant changes to a note, sometimes no new embeddings occur until I restart SC. I tried changing the "minimum size," but it had no effect on this behavior.

@brianlaughlin

@brianpetro
I want to report that since using version 2.1.47 I have not seen it redo the embeddings. I am not using Blocks and am using OpenAI Text-3 Large with Model gpt-4-turbo-preview.

Thought I would share the good news.

@brianpetro
Owner

🤞😊🌴

@vanishrap

vanishrap commented Jun 6, 2024

I have a similar issue.

Scenario:

  1. Using various embeddings: reproduced similar behavior with both local and different OpenAI models. Blocks are not used, as it is even less stable with blocks.
  2. Embeddings are required for 2900 notes, at first; the process of embedding is completed without errors.
  3. When selecting some of the notes, it says that embedding does not exist, and restarts the process for a smaller subset.
    Pasted image 20240606175255
    Pasted image 20240606175423

This process continues 5-10 times until all embeddings are completed.

  4. Obsidian is restarted: all embeddings are gone. The .ajson files in the Smart Connections folder are all 1 byte in size.

@quicly

quicly commented Jun 6, 2024

Everything worked fine for 2 months. But a new update required me to recreate all my embeddings. As I see it right now, it saves everything in separate files, but these files are not stored and are missing after creation.

UPD: I tested it more. It is very similar to what @vanishrap describes.

@brianpetro
Owner

@vanishrap @quicly

Update to v2.1.69 and this should be fixed.

Unfortunately, I screwed up a line of code that prevented the new embeddings from saving. You will need to re-embed again, however, this time they will save using the improved embeddings file system.

Thanks for bringing this to my attention
🌴

@brianlaughlin

Thanks for jumping on this so quickly @brianpetro!
