Expand domain.* wildcards in domain modifiers for iOS, Safari and MV3 #964

Closed
2 tasks done
ameshkov opened this issue Jun 28, 2024 · 1 comment · Fixed by #965
Labels
Feature Request New feature or request P1: Critical

ameshkov commented Jun 28, 2024

Prerequisites

  • I checked the documentation and understood it;
  • I checked to make sure that this issue has not already been filed;

Problem description

iOS 17 has a very nasty bug that is still not fixed in the latest update: https://discussions.apple.com/thread/255240753
This bug effectively limits the size of a content blocker JSON file to 10MB.

AdGuard filters recently stopped fitting into this limit, mainly because of the way domain.* wildcards are handled when the rules are converted. Currently, each wildcard is simply expanded into a list built from the 200 most popular TLDs.

Here's what a simple rule looks like after conversion:

  • Original: docviewer.yandex.*##.js-doc-html > div[class^=\"pages_\"] > div[class*=\" \"]:empty

  • Converted:

        {
            "trigger": {
                "url-filter": ".*",
                "if-domain": [
                    "*docviewer.yandex.com",
                    "*docviewer.yandex.ru",
                    "*docviewer.yandex.net",
                    "*docviewer.yandex.org",
                    "*docviewer.yandex.ir",
                    "*docviewer.yandex.in",
                    "*docviewer.yandex.com.au",
                    "*docviewer.yandex.com.tr",
                    "*docviewer.yandex.co.uk",
                    "*docviewer.yandex.io",
                    "*docviewer.yandex.co",
                    "*docviewer.yandex.gr",
                    "*docviewer.yandex.ca",
                    "*docviewer.yandex.com.ua",
                    "*docviewer.yandex.vn",
                    "*docviewer.yandex.info",
                    "*docviewer.yandex.de",
                    "*docviewer.yandex.fr",
                    "*docviewer.yandex.me",
                    "*docviewer.yandex.by",
                    "*docviewer.yandex.jp",
                    "*docviewer.yandex.xyz",
                    "*docviewer.yandex.ua",
                    "*docviewer.yandex.com.tw",
                    "*docviewer.yandex.co.za",
                    "*docviewer.yandex.co.il",
                    "*docviewer.yandex.online",
                    "*docviewer.yandex.eu",
                    "*docviewer.yandex.it",
                    "*docviewer.yandex.tv",
                    "*docviewer.yandex.id",
                    "*docviewer.yandex.xn--p1ai",
                    "*docviewer.yandex.edu",
                    "*docviewer.yandex.com.br",
                    "*docviewer.yandex.es",
                    "*docviewer.yandex.ch",
                    "*docviewer.yandex.co.in",
                    "*docviewer.yandex.kz",
                    "*docviewer.yandex.com.vn",
                    "*docviewer.yandex.biz",
                    "*docviewer.yandex.app",
                    "*docviewer.yandex.co.id",
                    "*docviewer.yandex.nl",
                    "*docviewer.yandex.pro",
                    "*docviewer.yandex.us",
                    "*docviewer.yandex.pl",
                    "*docviewer.yandex.cl",
                    "*docviewer.yandex.com.mx",
                    "*docviewer.yandex.ro",
                    "*docviewer.yandex.club",
                    "*docviewer.yandex.co.jp",
                    "*docviewer.yandex.co.nz",
                    "*docviewer.yandex.ma",
                    "*docviewer.yandex.com.ar",
                    "*docviewer.yandex.su",
                    "*docviewer.yandex.site",
                    "*docviewer.yandex.cc",
                    "*docviewer.yandex.rs",
                    "*docviewer.yandex.cn",
                    "*docviewer.yandex.ae",
                    "*docviewer.yandex.co.kr",
                    "*docviewer.yandex.mx",
                    "*docviewer.yandex.pk",
                    "*docviewer.yandex.se",
                    "*docviewer.yandex.gov.in",
                    "*docviewer.yandex.com.my",
                    "*docviewer.yandex.cz",
                    "*docviewer.yandex.shop",
                    "*docviewer.yandex.lk",
                    "*docviewer.yandex.live",
                    "*docviewer.yandex.tw",
                    "*docviewer.yandex.ai",
                    "*docviewer.yandex.com.sg",
                    "*docviewer.yandex.top",
                    "*docviewer.yandex.gov",
                    "*docviewer.yandex.ac.id",
                    "*docviewer.yandex.com.co",
                    "*docviewer.yandex.co.th",
                    "*docviewer.yandex.ac.in",
                    "*docviewer.yandex.be",
                    "*docviewer.yandex.in.ua",
                    "*docviewer.yandex.store",
                    "*docviewer.yandex.org.ua",
                    "*docviewer.yandex.org.tr",
                    "*docviewer.yandex.dk",
                    "*docviewer.yandex.hu",
                    "*docviewer.yandex.az",
                    "*docviewer.yandex.gov.ua",
                    "*docviewer.yandex.edu.vn",
                    "*docviewer.yandex.am",
                    "*docviewer.yandex.uz",
                    "*docviewer.yandex.com.pk",
                    "*docviewer.yandex.news",
                    "*docviewer.yandex.md",
                    "*docviewer.yandex.tech",
                    "*docviewer.yandex.nic.in",
                    "*docviewer.yandex.go.id",
                    "*docviewer.yandex.com.hk",
                    "*docviewer.yandex.ge",
                    "*docviewer.yandex.com.cn",
                    "*docviewer.yandex.ac.ir",
                    "*docviewer.yandex.sg",
                    "*docviewer.yandex.org.uk",
                    "*docviewer.yandex.my",
                    "*docviewer.yandex.no",
                    "*docviewer.yandex.go.th",
                    "*docviewer.yandex.pw",
                    "*docviewer.yandex.com.bd",
                    "*docviewer.yandex.to",
                    "*docviewer.yandex.gov.tr",
                    "*docviewer.yandex.dev",
                    "*docviewer.yandex.kiev.ua",
                    "*docviewer.yandex.mk",
                    "*docviewer.yandex.com.ng",
                    "*docviewer.yandex.ie",
                    "*docviewer.yandex.asia",
                    "*docviewer.yandex.at",
                    "*docviewer.yandex.co.ke",
                    "*docviewer.yandex.com.np",
                    "*docviewer.yandex.ph",
                    "*docviewer.yandex.sch.id",
                    "*docviewer.yandex.fi",
                    "*docviewer.yandex.tk",
                    "*docviewer.yandex.lv",
                    "*docviewer.yandex.space",
                    "*docviewer.yandex.life",
                    "*docviewer.yandex.pe",
                    "*docviewer.yandex.sk",
                    "*docviewer.yandex.ng",
                    "*docviewer.yandex.lt",
                    "*docviewer.yandex.tn",
                    "*docviewer.yandex.hk",
                    "*docviewer.yandex.link",
                    "*docviewer.yandex.vip",
                    "*docviewer.yandex.cloud",
                    "*docviewer.yandex.gov.bd",
                    "*docviewer.yandex.website",
                    "*docviewer.yandex.kr",
                    "*docviewer.yandex.sa",
                    "*docviewer.yandex.media",
                    "*docviewer.yandex.edu.in",
                    "*docviewer.yandex.pt",
                    "*docviewer.yandex.gg",
                    "*docviewer.yandex.blog",
                    "*docviewer.yandex.com.ph",
                    "*docviewer.yandex.hr",
                    "*docviewer.yandex.mobi",
                    "*docviewer.yandex.org.au",
                    "*docviewer.yandex.fun",
                    "*docviewer.yandex.bg",
                    "*docviewer.yandex.com.sa",
                    "*docviewer.yandex.ac.th",
                    "*docviewer.yandex.mn",
                    "*docviewer.yandex.ws",
                    "*docviewer.yandex.ee",
                    "*docviewer.yandex.one",
                    "*docviewer.yandex.uk",
                    "*docviewer.yandex.kg",
                    "*docviewer.yandex.ba",
                    "*docviewer.yandex.com.pe",
                    "*docviewer.yandex.al",
                    "*docviewer.yandex.today",
                    "*docviewer.yandex.fm",
                    "*docviewer.yandex.ml",
                    "*docviewer.yandex.edu.tr",
                    "*docviewer.yandex.bel.tr",
                    "*docviewer.yandex.ac.uk",
                    "*docviewer.yandex.net.ua",
                    "*docviewer.yandex.dz",
                    "*docviewer.yandex.win",
                    "*docviewer.yandex.org.tw",
                    "*docviewer.yandex.gov.co",
                    "*docviewer.yandex.guru",
                    "*docviewer.yandex.org.il",
                    "*docviewer.yandex.edu.pk",
                    "*docviewer.yandex.world",
                    "*docviewer.yandex.gov.vn",
                    "*docviewer.yandex.is",
                    "*docviewer.yandex.com.uy",
                    "*docviewer.yandex.gov.np",
                    "*docviewer.yandex.gob.mx",
                    "*docviewer.yandex.or.id",
                    "*docviewer.yandex.gov.my",
                    "*docviewer.yandex.edu.co",
                    "*docviewer.yandex.si",
                    "*docviewer.yandex.in.th",
                    "*docviewer.yandex.gen.tr",
                    "*docviewer.yandex.network",
                    "*docviewer.yandex.org.in",
                    "*docviewer.yandex.ga",
                    "*docviewer.yandex.digital",
                    "*docviewer.yandex.edu.au",
                    "*docviewer.yandex.web.id",
                    "*docviewer.yandex.work",
                    "*docviewer.yandex.best",
                    "*docviewer.yandex.agency",
                    "*docviewer.yandex.edu.ua",
                    "*docviewer.yandex.net.au",
                    "*docviewer.yandex.icu",
                    "*docviewer.yandex.sh"
                ]
            },
            "action": {
                "type": "css-display-none",
                "selector": ".js-doc-html > div[class^=\"pages_\"] > div[class*=\" \"]:empty"
            }
        }

Proposed solution

I suggest transforming domain.* rules when the filters are compiled in the FiltersRegistry. Each domain.* wildcard should be replaced with a shorter list of domains that are actually alive, which should shrink the content blocker considerably.

It should be done for these three platforms:

  • iOS (obvious)
  • Desktop Safari (it does not suffer from the same bug, but I think it's still a good idea)
  • MV3 (it does not support domain.* modifiers, so this proposal may be useful there as well).

Brief explanation

  1. Extract domain wildcards from the filter list, i.e. extract domain.*.
  2. Use the most popular TLD list to construct multiple domain names, i.e. domain.com, domain.cn, etc.
  3. Check which of these domains are alive.
  4. Replace the original wildcard with the resulting list of domains. Don't forget to handle the case where the resulting rule ends up with no domains at all.

However, I suggest splitting this process into two parts to make it more manageable and less error-prone.

What TLDs to use

I suggest keeping a list of popular TLDs as a separate file that can be modified by filters maintainers in the future.
For starters, we can use the same 200 TLDs that are used in SafariConverterLib.
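
A minimal sketch of how such a TLD file could be read and applied to a wildcard. The file format (one TLD per line, `#` for comments) and the function names are assumptions made for illustration, not something FiltersRegistry or SafariConverterLib defines.

```python
def parse_tld_list(text: str) -> list[str]:
    """Parse a TLD file: one TLD per line, '#' starts a comment (assumed format)."""
    tlds: list[str] = []
    for line in text.splitlines():
        # Drop comments, whitespace and an optional leading dot.
        line = line.split("#", 1)[0].strip().lstrip(".").lower()
        if line and line not in tlds:
            tlds.append(line)
    return tlds


def expand_wildcard(wildcard: str, tlds: list[str]) -> list[str]:
    """Turn 'domain.*' into ['domain.com', 'domain.ru', ...]."""
    if not wildcard.endswith(".*"):
        raise ValueError(f"not a TLD wildcard: {wildcard!r}")
    base = wildcard[:-2]
    return [f"{base}.{tld}" for tld in tlds]


# Hypothetical file content; the real list would hold about 200 entries.
SAMPLE_TLDS = """
# most popular TLDs first
com
ru
.net
co.uk
"""

candidates = expand_wildcard("docviewer.yandex.*", parse_tld_list(SAMPLE_TLDS))
print(candidates)
```

Keeping the list in a plain text file means filter maintainers can add or remove a TLD without touching the compiler code.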

Part 1: Domains map

Let's build a dictionary that we will use later in the compilation process.

  • Input: all compiled filters in the filters registry.

  • Input: a list of the most popular TLDs.

  • Output: a JSON file that maps every domain wildcard found in the filters to a list of actual domains.

    {
        "google.*": [ "google.com", "google.co.uk" ]
    }

We need to do this:

  1. Go through every filter in FiltersRegistry
  2. Extract all domain wildcards from it (use AGTree for working with filters there)
  3. Compose a list of domain names from the wildcard and the list of the most popular TLDs.
  4. Check which of them are alive. To do this, I suggest using the same approach as in DeadDomainsLinter, i.e. the urlfilter service.
  5. Save the resulting map to a file.

Important: a wildcard may turn out to have no alive domains after the check. Such wildcards must be removed from the compiled list, so print a warning to the console and add them to the map with an empty array, i.e. "example.*": []
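
The steps of Part 1 could be sketched roughly as follows. This is an illustration under several assumptions: the regular expressions are a crude stand-in for AGTree and only cover the domain list of cosmetic rules and the `$domain=` modifier, and `is_alive` is a hypothetical callback standing in for the urlfilter service query, whose API is not shown here.

```python
import json
import re
import sys
from typing import Callable, Iterable

# Simplified stand-ins for a real rule parser (assumption, not AGTree).
COSMETIC_MARKER = re.compile(r"#@?[$?%]{0,2}#")      # ##, #@#, #?#, #$#, ...
DOMAIN_MODIFIER = re.compile(r"[$,]domain=([^,]+)")  # $domain=a.*|~b.com


def extract_wildcards(rule: str) -> set[str]:
    """Return all 'domain.*' wildcards used in the rule's domain list."""
    if rule.startswith("!"):  # comment line
        return set()
    domains: list[str] = []
    marker = COSMETIC_MARKER.search(rule)
    if marker:
        domains = rule[:marker.start()].split(",")
    else:
        modifier = DOMAIN_MODIFIER.search(rule)
        if modifier:
            domains = modifier.group(1).split("|")
    return {d.strip().lstrip("~") for d in domains if d.strip().endswith(".*")}


def build_domains_map(
    rules: Iterable[str],
    tlds: list[str],
    is_alive: Callable[[str], bool],
) -> dict[str, list[str]]:
    """Map every wildcard found in the rules to the domains that are alive."""
    result: dict[str, list[str]] = {}
    for rule in rules:
        for wildcard in sorted(extract_wildcards(rule)):
            if wildcard in result:
                continue
            base = wildcard[:-2]
            alive = [f"{base}.{tld}" for tld in tlds if is_alive(f"{base}.{tld}")]
            if not alive:
                # Keep the wildcard with an empty list so Part 2 can drop it.
                print(f"warning: no alive domains for {wildcard}", file=sys.stderr)
            result[wildcard] = alive
    return result


# Made-up input data, for illustration only.
RULES = [
    "google.*##.ad",
    "! comment mentioning example.*##x",
    "||tracker.example^$domain=dead-site.*",
]
LIVE = {"google.com", "google.co.uk"}  # pretend result of the aliveness check
domains_map = build_domains_map(RULES, ["com", "co.uk", "ru"], LIVE.__contains__)
print(json.dumps(domains_map, indent=4))
```

Passing the aliveness check as a callback keeps the map builder testable offline and lets the real service be plugged in later.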

Part 2: Filters post-processing

  1. Go through the platform filters.
  2. For every domain wildcard, check if we have a mapping for it in the file prepared during part 1.
  3. If we do, replace the domain wildcard with the list of domains that we got from the mapping file.
  4. It may happen that the rule becomes redundant after the changes are made, it needs to be removed in this case. Relevant parts of code in DDL: cosmetic, network.

Additional information

In order to verify the result, use the command-line version of SafariConverterLib to run the conversion and compare the output size.
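
The size comparison itself could be scripted along these lines. This sketch does not invoke SafariConverterLib; it only measures already converted JSON. The helper names and the sample domains are made up, and reading "10MB" as 10 MiB is an assumption.

```python
import json

LIMIT_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def blocker_size(rules: list[dict]) -> int:
    """Size in bytes of the content blocker JSON, serialized compactly."""
    return len(json.dumps(rules, separators=(",", ":")).encode("utf-8"))


def make_rule(domains: list[str]) -> dict:
    """Build a css-display-none rule shaped like the converted example above."""
    return {
        "trigger": {"url-filter": ".*", "if-domain": ["*" + d for d in domains]},
        "action": {"type": "css-display-none", "selector": ".ad"},
    }


# Synthetic data: 200 placeholder TLDs versus two domains assumed to be alive.
before = [make_rule([f"example.tld{i}" for i in range(200)])]
after = [make_rule(["example.com", "example.org"])]

for label, rules in (("before", before), ("after", after)):
    size = blocker_size(rules)
    print(f"{label}: {size} bytes, fits the limit: {size <= LIMIT_BYTES}")
```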

@Alex-302

This comment was marked as outdated.

maximtop added a commit that referenced this issue Jul 3, 2024