Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to get total facets count #67

Open
markomilivojevic opened this issue Nov 10, 2021 · 4 comments
Open

Add a method to get total facets count #67

markomilivojevic opened this issue Nov 10, 2021 · 4 comments

Comments

@markomilivojevic
Copy link
Contributor

The library doesn't have a method to get total count of facets.

Here is the code example, imagine an index jobs with columns employer_id and employer_name:

$client = new \Manticoresearch\Client();
$index = $client->index('jobs');

$query = new \Manticoresearch\Query\BoolQuery();
$query->must(new \Manticoresearch\Query\MatchQuery('amazon', 'employer_name'));

$search = $index->search($query);
$search->limit(0);
$search->offset($offset);
$search->facet('employer_id', null, $per_page);
$result = $search->get();

var_dump($result->getTotal()); // this returns total number of rows, not total number of facets
var_dump($result->getFacets()); // this returns facets

So, adding a method similar to this one could solve the issue:

$result->getTotalFacets()
@sanikolaev
Copy link
Collaborator

Please elaborate more on why it's important. Can't you just calculate count of $result->getFacets() or did I get you wrong on what you mean by "total count of facets" ?

@markomilivojevic
Copy link
Contributor Author

After experimenting with manticore json API, I realised that the issue belongs to manticoresearch core library rather than php client.

To clarify issue:

  • Index jobs can have 10M records, and can have 10k distinct employers.
  • In order to make pagination for employers results I need to set limit and offset, as described above.

The code above will hit /json/search endpoint, and it will produce two queries on manticore side:

  1. SELECT * FROM jobs WHERE MATCH('{"bool":{"must":[{"match":{"employer_name":"amazon"}}]}}') LIMIT 0,0
  2. SELECT employer_id FROM jobs WHERE MATCH('{"bool":{"must":[{"match":{"employer_name":"amazon"}}]}}') GROUP BY employer_id WITHIN GROUP ORDER BY weight() desc ORDER BY @groupby desc LIMIT 0,30

And behind the scene, it executes show meta only for the first query, in order to get total number of found rows, which is useless in my case. What I need is show meta from the second query which will return total number of found employers.

To answer your question: $result->getFacets() returns limited number of facets, I set in variable $per_page. Total count of facets needs to be returned by /json/search endpoint.

@sanikolaev
Copy link
Collaborator

OK, I think I see your point. Please review and confirm I understood it right:

So the minimal reproducible example is:

mysql -P9306 -h0 -e "drop table if exists t; create table t(f text, a int); insert into t(a) values(1),(2),(1)";

No "total" returned in "aggregations":

curl -sX POST http://localhost:9308/search  -d '{"index":"t","limit":0,"offset":0,"aggs":{"a":{"terms":{"field":"a","size":10}}}}'|jq .
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 3,
    "hits": []
  },
  "aggregations": {
    "a": {
      "buckets": [
        {
          "key": 2,
          "doc_count": 1
        },
        {
          "key": 1,
          "doc_count": 2
        }
      ]
    }
  }
}

(should be 2 in this case).

and when the number of facets is limited it's impossible to know how many can be returned in total:

curl -sX POST http://localhost:9308/search  -d '{"index":"t","limit":0,"offset":0,"aggs":{"a":{"terms":{"field":"a","size":1}}}}'|jq .
{
  "took": 0,
  "timed_out": false,
  "hits": {
    "total": 3,
    "hits": []
  },
  "aggregations": {
    "a": {
      "buckets": [
        {
          "key": 2,
          "doc_count": 1
        }
      ]
    }
  }
}

In SQL it can be seen in SHOW META after select ... group by:

mysql> select * from t group by a;
+---------------------+------+------+
| id                  | a    | f    |
+---------------------+------+------+
| 1514488646869188929 |    1 |      |
| 1514488646869188930 |    2 |      |
+---------------------+------+------+
2 rows in set (0.00 sec)

mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 2     |
| total_found   | 2     |
| time          | 0.000 |
+---------------+-------+
3 rows in set (0.00 sec)

Since the JSON protocol doesn't return it, the php client can't return it too.

@markomilivojevic
Copy link
Contributor Author

Yes, you understood it correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants