Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BD-19] Transition to the new Elasticsearch libs version for cs_comments_service #327

Conversation

idegtiarov
Copy link

Description:

Updates versions of the ES and the ES DSL libs from V1.5.2 to V7.8.0
Splits content index with comment and comment_thread doc_types to comment and comment_thread indices.

Was done:

update ES libs version;
rewrite index documents for compatibility with ES7;
update make commands;
update travis ci.

Current code coverage: 95.78%.

Reviewers:
@mikix
@dianakhuang

Post merge:

  • Delete working branch

@openedx-webhooks
Copy link

openedx-webhooks commented Aug 26, 2020

Thanks for the pull request, @idegtiarov! I've created BLENDED-561 to keep track of it in Jira. More details are on the BD-19 project page.

When this pull request is ready, tag your edX technical lead.

@openedx-webhooks openedx-webhooks added blended PR is managed through 2U's blended developmnt program needs triage labels Aug 26, 2020
@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch from 1be5be7 to 0b4ceb4 Compare August 31, 2020 07:07
@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch from 0b4ceb4 to e92abfb Compare August 31, 2020 07:22
@Golub-Sergey
Copy link
Contributor

Golub-Sergey commented Sep 8, 2020

When we search by title or body ES allow to make 2 mistakes by default (max_edits option), do we need increase amount of edits?
Doc link

Notes: before start to test forum locally you have to rebuild forum indices. It can be made by command bin/rake search:rebuild_indices. It can be used once during running forum service.

@davidjoy
Copy link

davidjoy commented Sep 22, 2020

Edit: The below is resolved. It's unclear what fixed it for me, unfortunately.

So I'm currently seeing the following error when I try to create a new post:

Elasticsearch::Transport::Transport::Errors::BadRequest at /api/v1/i4x-edx-eiorguegnru-course-foobarbaz/threads
[400] {"error":"MapperParsingException[mapping [_doc]]; nested: InvalidTypeNameException[mapping type name [_doc] can't start with '_']; ","status":400}

Not quite sure what that's about - I'm seeing documentation on Elasticsearch's site about the removal of types in Elasticsearch 7.

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html

I'm not sure how this affects the declarations of mappings in comment.rb and comment_thread.rb / how it may work with the ruby elasticsearch library.

Does creating a post work for you? Curious if there's just something I'm missing. (To be clear: I've rebuilt my indices as described in your last comment)

@@ -0,0 +1,5 @@
cluster.name: "docker-cluster"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the cluster.name necessary here? I think "docker-cluster" may be the default...

Copy link
Contributor

@Golub-Sergey Golub-Sergey Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster.name is not necessary, by default cluster.name is elasticsearch, link to the doc.
Done

@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch from fd188ce to 77f3eba Compare September 24, 2020 13:00
- discovery.type=single-node
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
ulimits:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, the container I'm using from devstack doesn't include these entire entries:

  • ulimits
  • ports

And seems to work fine. Are these particularly necessary for travis or something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It is not necessary


volumes:
data01:
driver: local

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't seem to need this driver: local line either, far as I can tell.

Copy link
Contributor

@Golub-Sergey Golub-Sergey Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

README.rst Outdated

.. code-block:: bash

$ bin/rake search:initialize

To validate the 'content' alias exists and contains the proper mappings:
To validate indices existence and content of the proper mappings:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"To validate indices exist and contain the proper mappings:" would be with the spirit of the original sentence, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LOG.info "Unable to delete non-existent index: #{name}."
end
def self.delete_indices
Elasticsearch::Model.client.indices.delete(index: INDEX_NAMES, ignore_unavailable: true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks)

end
def self.delete_indices
Elasticsearch::Model.client.indices.delete(index: INDEX_NAMES, ignore_unavailable: true)
LOG.info "Delete indices."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should maybe be "deleted" instead of "delete", since it's saying it's been completed, rather than starting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LOG.error "Error indexing. Response was: #{response}"
end
LOG.info "Imported batch #{batch_number} into the index"
sleep(sleep_time)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it we no longer need to sleep here? (I'm pre-supposing there was a good reason, but maybe there wasn't?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we load data to the index ES is acknowledged that index was updated. ES periodically refresh index.
Indices refresh description doc.

def self.exists_alias(alias_name)
Elasticsearch::Model.client.indices.exists_alias(name: alias_name)
def self.exists_indices
Elasticsearch::Model.client.indices.exists(index: INDEX_NAMES)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indices.exists function can accept an array? What does it do if one exists but the other doesn't?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble finding any documentation that these functions like exists and refresh take an array as their index option...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

],
query: {
bool: {
must: must,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this changed the search behavior, splitting the filters from the old code into must and should? From line 41 on the other side, I would have expected they were all "musts" before.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'll just generalize that question to ask - have the changes to this file resulted in any differences in how we perform our queries/what results they return? Can you tell me a little about what needed to change here? It's hard to follow from the diff alone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see from test results we get expected result of search behaviour.
The filtered query is replaced by the bool query. Instead of the following. I use should to optimise filter query which look like, for example:

filter: {
                  bool: {
                      must: filters
                  }
              }

where filters

{
            bool: {
                should: [
                    {:not => {:exists => {:field => :group_id}}},
                    {:terms => {:group_id => group_ids}}
                ]
            }
        }

And ES haven't critical difference between should and filter clauses. Useful doc link which explain should and filter difference.
Now search query looks simpler, I think.

Copy link

@davidjoy davidjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this all looks good - I have some questions I just wanted to talk through before Approving and moving forward. Thanks!

@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch 2 times, most recently from 027656e to d46589e Compare September 25, 2020 16:16
@davidjoy
Copy link

I'm hearing from our team that sharding of the index is something we should be thinking about, as the older version of Elasticsearch handled it automatically, but the newer one doesn't. Admittedly, I don't know much about this, so I'm not going to be a great resource. I'd suggest if we need some help deciding what needs to be done here, we query the group in our shared Slack channel.

@Golub-Sergey
Copy link
Contributor

@davidjoy pr to the rg forum which can be useful if the alias functionality should be returned.

Review can go on, thanks for your helps, comments and questions !)

@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch from d46589e to fb39d1e Compare October 1, 2020 09:46
@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch from fb39d1e to b18045f Compare October 1, 2020 09:51
lib/tasks/search.rake Outdated Show resolved Hide resolved
lib/task_helpers.rb Outdated Show resolved Hide resolved
@Golub-Sergey
Copy link
Contributor

@davidjoy I returned move_alias at the end of rebuild indices, as it should be. Thanks for your comments!

Use a different SEARCH_SERVER variable for ES7 code.
lib/tasks/search.rake Outdated Show resolved Hide resolved
@Golub-Sergey Golub-Sergey force-pushed the golub-sergey/OeX_ElasticSearch/feature/transition-from-ES1.5-to-ES7 branch from 5cdb90f to baf9e7f Compare October 8, 2020 22:26
Copy link

@davidjoy davidjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go! Tests all pass for me, and everything I'm able to test through the UI and the rake commands works too. Great job!

@dianakhuang dianakhuang merged commit 573567e into openedx:master Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blended PR is managed through 2U's blended developmnt program merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants