-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BD-19] Transition to the new Elasticsearch libs version for cs_comments_service #327
[BD-19] Transition to the new Elasticsearch libs version for cs_comments_service #327
Conversation
Thanks for the pull request, @idegtiarov! I've created BLENDED-561 to keep track of it in Jira. More details are on the BD-19 project page. When this pull request is ready, tag your edX technical lead. |
1be5be7
to
0b4ceb4
Compare
0b4ceb4
to
e92abfb
Compare
When we search by title or body ES allow to make 2 mistakes by default ( Notes: before start to test forum locally you have to rebuild forum indices. It can be made by command |
Edit: The below is resolved. It's unclear what fixed it for me, unfortunately. So I'm currently seeing the following error when I try to create a new post:
Not quite sure what that's about - I'm seeing documentation on Elasticsearch's site about the removal of types in Elasticsearch 7. https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html I'm not sure how this affects the declarations of mappings in comment.rb and comment_thread.rb / how it may work with the ruby elasticsearch library. Does creating a post work for you? Curious if there's just something I'm missing. (To be clear: I've rebuilt my indices as described in your last comment) |
.travis/elasticsearch.yml
Outdated
@@ -0,0 +1,5 @@ | |||
cluster.name: "docker-cluster" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the cluster.name
necessary here? I think "docker-cluster" may be the default...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster.name
is not necessary, by default cluster.name
is elasticsearch
, link to the doc.
Done
fd188ce
to
77f3eba
Compare
.travis/docker-compose-travis.yml
Outdated
- discovery.type=single-node | ||
- bootstrap.memory_lock=true | ||
- "ES_JAVA_OPTS=-Xms512m -Xmx512m" | ||
ulimits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, the container I'm using from devstack doesn't include these entire entries:
- ulimits
- ports
And seems to work fine. Are these particularly necessary for travis or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It is not necessary
.travis/docker-compose-travis.yml
Outdated
|
||
volumes: | ||
data01: | ||
driver: local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't seem to need this driver: local
line either, far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
README.rst
Outdated
|
||
.. code-block:: bash | ||
|
||
$ bin/rake search:initialize | ||
|
||
To validate the 'content' alias exists and contains the proper mappings: | ||
To validate indices existence and content of the proper mappings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"To validate indices exist and contain the proper mappings:" would be with the spirit of the original sentence, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/task_helpers.rb
Outdated
LOG.info "Unable to delete non-existent index: #{name}." | ||
end | ||
def self.delete_indices | ||
Elasticsearch::Model.client.indices.delete(index: INDEX_NAMES, ignore_unavailable: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks)
lib/task_helpers.rb
Outdated
end | ||
def self.delete_indices | ||
Elasticsearch::Model.client.indices.delete(index: INDEX_NAMES, ignore_unavailable: true) | ||
LOG.info "Delete indices." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should maybe be "deleted" instead of "delete", since it's saying it's been completed, rather than starting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
LOG.error "Error indexing. Response was: #{response}" | ||
end | ||
LOG.info "Imported batch #{batch_number} into the index" | ||
sleep(sleep_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it we no longer need to sleep here? (I'm pre-supposing there was a good reason, but maybe there wasn't?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we load data to the index ES is acknowledged that index was updated. ES periodically refresh index.
Indices refresh description doc.
lib/task_helpers.rb
Outdated
def self.exists_alias(alias_name) | ||
Elasticsearch::Model.client.indices.exists_alias(name: alias_name) | ||
def self.exists_indices | ||
Elasticsearch::Model.client.indices.exists(index: INDEX_NAMES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indices.exists
function can accept an array? What does it do if one exists but the other doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble finding any documentation that these functions like exists
and refresh
take an array as their index
option...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, okay, I think I found it:
], | ||
query: { | ||
bool: { | ||
must: must, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this changed the search behavior, splitting the filters
from the old code into must
and should
? From line 41 on the other side, I would have expected they were all "musts" before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'll just generalize that question to ask - have the changes to this file resulted in any differences in how we perform our queries/what results they return? Can you tell me a little about what needed to change here? It's hard to follow from the diff alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see from test results we get expected result of search behaviour.
The filtered query is replaced by the bool query. Instead of the following. I use should to optimise filter query which look like, for example:
filter: {
bool: {
must: filters
}
}
where filters
{
bool: {
should: [
{:not => {:exists => {:field => :group_id}}},
{:terms => {:group_id => group_ids}}
]
}
}
And ES haven't critical difference between should
and filter
clauses. Useful doc link which explain should
and filter
difference.
Now search query looks simpler, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think this all looks good - I have some questions I just wanted to talk through before Approving and moving forward. Thanks!
027656e
to
d46589e
Compare
I'm hearing from our team that sharding of the index is something we should be thinking about, as the older version of Elasticsearch handled it automatically, but the newer one doesn't. Admittedly, I don't know much about this, so I'm not going to be a great resource. I'd suggest if we need some help deciding what needs to be done here, we query the group in our shared Slack channel. |
d46589e
to
fb39d1e
Compare
fb39d1e
to
b18045f
Compare
…e-alias-functionality returns move alias functionality
@davidjoy I returned move_alias at the end of rebuild indices, as it should be. Thanks for your comments! |
Use a different SEARCH_SERVER variable for ES7 code.
5cdb90f
to
baf9e7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready to go! Tests all pass for me, and everything I'm able to test through the UI and the rake commands works too. Great job!
Description:
Updates versions of the ES and the ES DSL libs from V1.5.2 to V7.8.0
Splits content index with comment and comment_thread doc_types to comment and comment_thread indices.
Was done:
update ES libs version;
rewrite index documents for compatibility with ES7;
update make commands;
update travis ci.
Current code coverage: 95.78%.
Reviewers:
@mikix
@dianakhuang
Post merge: