From 461a96ea163b144ea2898d088efe65fce311d5be Mon Sep 17 00:00:00 2001
From: Mike Dalessio
Date: Fri, 15 Mar 2024 11:03:26 -0400
Subject: [PATCH] fix: Reader#read sets @encoding if it is unset

This allows Reader#encoding to remain unchanged while the libxml2
implementation of encoding reporting has changed in v2.12.6.
---
 CHANGELOG.md              | 5 +++++
 ext/nokogiri/xml_reader.c | 1 +
 2 files changed, 6 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b70cf13543..da6e227e65 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,6 +11,11 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA
 
 * [CRuby] Vendored libxml2 is updated to [v2.12.6](https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.12.6) from v2.12.5. (@flavorjones)
 
+### Changed
+
+* [CRuby] `XML::Reader` sets the `@encoding` instance variable during reading if it is not passed into the initializer. Previously, it would remain `nil`. The behavior of `Reader#encoding` has not changed. This works around changes to how libxml2 reports the encoding used in v2.12.6.
+
+
 ## v1.16.2 / 2024-02-04
 
 ### Security
diff --git a/ext/nokogiri/xml_reader.c b/ext/nokogiri/xml_reader.c
index 0520f8f7d5..c987e2bd3d 100644
--- a/ext/nokogiri/xml_reader.c
+++ b/ext/nokogiri/xml_reader.c
@@ -537,6 +537,7 @@ read_more(VALUE self)
       if (RTEST(constructor_encoding)) {
         c_document->encoding = xmlStrdup(BAD_CAST StringValueCStr(constructor_encoding));
       } else {
+        rb_iv_set(self, "@encoding", NOKOGIRI_STR_NEW2("UTF-8"));
         c_document->encoding = xmlStrdup(BAD_CAST "UTF-8");
       }
     }