Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Urlize mode bug fixes #840

Merged
merged 6 commits into from
Feb 12, 2020
Merged

Urlize mode bug fixes #840

merged 6 commits into from
Feb 12, 2020

Conversation

dverdin
Copy link
Contributor

@dverdin dverdin commented Jan 10, 2020

Fixing bugs in urlize mode :

1- message text was treated as attachment and remained on the server as either "msg.0.bin" or "msg.0.txt" files. Fixed by testing the part type before urlizing it.
2- some special characters in message id led to the attachment to be considered missing by Sympa.

I also introduce a test file for Message.t, testing only urlization. this test is set in travis and the test succeeds. Note that commits 065312d and fcadbac are related to travis. The first one to enable the test, the second one to have Sympa::Constants without running make.

@dverdin
Copy link
Contributor Author

dverdin commented Jan 10, 2020

All remarks made in PR #830 are included in the current PR.

@dverdin dverdin added the bug label Jan 10, 2020
t/stub/Sympa/Constants.pm Outdated Show resolved Hide resolved
dverdin added a commit to dverdin/sympa that referenced this pull request Jan 14, 2020
…omment)): Improving consistency of group and user definition in the Sympa::Constants stud.
t/Message.t Show resolved Hide resolved
@ikedas
Copy link
Member

ikedas commented Jan 17, 2020

I think simply recursive urlization of subparts is not always appropriate.

We would be better to consider more precise cases where the subparts are treated as "attachment":

  • multipart/mixed

    This type on the top level may be special. Subparts may be treated as either ---

    • The first text/* part should be treated as "text", and the others should be "attachment" even if it is a text.
    • Or, text/* part that is decodable (i.e. with proper charset parameter) and does not have "content-disposition: attachment" should be treated as "text", the others should be "attachment".

    --- and only "attachment" can be urlized.

  • multipart/alternative, multipart/related
    Urlizing any subparts is not useful, because MUA is expected to show only most suitable subpart, or to render message using all subparts. Entire part (multipart) should be urlized if necessary.

  • multipart/digest, multipart/report etc.
    Urlizing any subparts is not useful, because a multipart itself is an integrated structure. Entire part (multipart) should be urlized if necessary.

  • message/rfc822, message/global
    Urlizing any parts this part contains is not useful, because this type represents embedded e-mail message and its content should not be broken. Entire part (message) should be urlized if necessary.

Thus, IMO only multipart/mixed may be modified recursively (with restrictions described in above) to "urlize" messages, and all others can be "urlized" by its own (if it is not treated as "text") or be intact (if it is treated as "text" or it is smaller than urlize_min_size).

@dverdin
Copy link
Contributor Author

dverdin commented Jan 19, 2020

Great review @ikedas !
I'll confront it to the problems I met in the sample files provided in the PR to check whether your suggestions work with them. These were originally messages for which urlization did not work correctly.
More news in a day or two!
David

@dverdin
Copy link
Contributor Author

dverdin commented Jan 22, 2020

Okie dokie... I had a look and I still have a problem with nested multiparts.

Apple Mail sometimes sends emails with attachments within multipart/alternative messages.
In those messages, the first part is text/plain, the second is multipart/mixed, and contains the text/html version of the first part and any attachments.

I think this is wrong because it makes no sense to put the attachments within an alternative part, however it is a fact. I'd like to simply ignore such misbehaviours but end users will probably consider it is Sympa's fault is urlize mode does not work.

I'd the like to consider multipart/related and multipart/alternative as potential attachments bearers.

That would lead to the following (changes to your version are in bold font):

  • multipart/mixed

    This type on the top level may be special. Subparts may be treated as either ---

    • The first text/* part should be treated as "text", and the others should be "attachment" even if it is a text. (I prefer the scond option, simpler to implement)
    • text part that is decodable (i.e. with proper charset parameter) and does not have "content-disposition: attachment" should be treated as "text", the others should be "attachment".

    --- and only "attachment" can be urlized.

  • multipart/alternative, multipart/related
    Don't urlize any subpart except if it is a multipart/mixed.
    Urlizing any subparts is not useful, because MUA is expected to show only most suitable subpart, or to render message using all subparts. Entire part (multipart) should be urlized if necessary.

  • multipart/digest, multipart/report etc.
    Urlizing any subparts is not useful, because a multipart itself is an integrated structure. Entire part (multipart) should be urlized if necessary.

  • message/rfc822, message/global
    Urlizing any parts this part contains is not useful, because this type represents embedded e-mail message and its content should not be broken. Entire part (message) should be urlized if necessary.

@dverdin
Copy link
Contributor Author

dverdin commented Jan 23, 2020

OK, I have implemented what we discussed. A slight change : I urlize sub-parts of multipart/alternative, whichever they are, bu only one level below the message top level part. That means that if they contains anything other than valid text, it will be urlized. That leaves a chance to catch a multipart/mixed inside, but we won't go further.

This is not satisfactory yet, though it works well with the test files. It still needs working.

I have reorganized the test files to make their usage more explicit.

@ikedas ikedas added this to the 6.2.54 milestone Jan 27, 2020
@dverdin
Copy link
Contributor Author

dverdin commented Jan 28, 2020

OK, now it does exactly what I wanted. Only going to urlize sub parts of a multipart/related if it is a multipart/mixed.
I'll wait to see whether all the tests complete, then maybe squash some commits to make the whole PR more understandable.

1- message text was treated as attachment and remained on the server as either "msg.0.bin" or "msg.0.txt" files. Fixed by testing the part type before urlizing it.
2- some special characters in message id led to the attachment to be considered missing by Sympa.

I had to refactor the code a bit to allow urlizing of nested multiparts.

The detail of urlization treatment is as follows:

1- any multipart/mixed message part will be analyzed to look for urlizable parts,
2- any mulitpart/related message part will be be analyzed to look for multiârt/mixed sub parts,
3- within the multipart/mixed parts, any text/* part whose "Content-Disposition" header is not "attachment" and has a correct Content-Type.charset value is kept untouched,
4- any other subpart is kept on the server an replaced by a download link.
  - t/Message.t,
  - t/samples/ contains six different messages for testing urlize,
  - t/data/ now contains a test list configuration file.
…onstants because most of them are needed when using

 Sympa libraries. Specifying the sources 'default' directory for 'DEFAULTDIR' constant to allow access to distributed templates.
@dverdin
Copy link
Contributor Author

dverdin commented Jan 28, 2020

OK, I squashed the commits to keep only four commits, combining the code modifications we compromised over. It should be easier to understand fr future developers.

src/lib/Sympa/Message.pm Outdated Show resolved Hide resolved
…attr()' and, therefore, a regular expression by 'eq'.
Copy link
Member

@ikedas ikedas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I don't have comments any more.

@dverdin
Copy link
Contributor Author

dverdin commented Feb 3, 2020

Actually, I still found some cases in production, just today, that don't come out of urlization as expected.
So I'll have to dig in them a little to understand why my code don't process them correctly.
I'll post an update here soon.

Regards,

…g a multipart/alternative; In that case, with the current code the multipart/related was urlized as a 'msg.0.bin' attachment, which was wrong.

Fixing this behaviour by:
1- allowing to go down two levels of multiparts
2- preventing any urlization if the part is not included in a 'multipart/mixed' parent to prevent urlizing the content of multipart/related or multipart/alternative.
@dverdin
Copy link
Contributor Author

dverdin commented Feb 5, 2020

OK, now that seems right. I had to go one level further in the message architecture but I added an explicit rule in _urlize_one_part that will prevent urlization to occur when the parent part is not a multipart/mixed.
As usual, I put this code in production right now.

@ikedas ikedas merged commit a176097 into sympa-community:sympa-6.2 Feb 12, 2020
@ikedas
Copy link
Member

ikedas commented Feb 12, 2020

I merged. Thanks @dverdin !

@dverdin
Copy link
Contributor Author

dverdin commented Feb 12, 2020

Thanks @ikedas !
Glad to see this behind me...
Now, I'm heading towards the next bunch of PR...
Regards,
David

@dverdin dverdin deleted the urlize2 branch February 12, 2020 12:59
@ikedas ikedas mentioned this pull request Feb 14, 2020
ikedas added a commit that referenced this pull request Feb 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants