Your WordPress instance is leaking data (also with Gutenberg)

In the world of GDPR, an information leak is a serious problem, and it can happen at any moment, even in places you would not think of.

WordPress has two built-in sources of information leaks (both are documented and discussed, but you probably don’t know about them):

  • REST API
  • Search

Let’s look at them and at how they affect every WordPress instance by default!

I am a bit afraid, because I am explaining how to get access to the data of a WP website without any special knowledge (including the websites of important people).

Don’t forget that I contribute to WordPress through VVV and GlotPress (maintainer) and GlotDict & BulkRejectGP (author); I am a co-organizer of the WP Roma and Terni meetups, was an organizer of WordCamp Rome 2018, and do many other things that you can find on my website (I am also one of the ClassicPress founders). I also have a web agency whose business is based on WordPress (my associate has a tattoo too) and we sell plugins.

I want you, the reader, to know that I am a long-time contributor and not someone without any background, as often happens in rants or discussions around Gutenberg.

At the end you will find how to fix it, what the community can do, what you can do, and a recap.

REST API

Since WordPress 4.7 the REST API is built into every WP instance; the problem is that it is also turned on for non-logged-in users!

Let’s start with WordPress.org as an example (also because the code is open source and publicly available).

If you open https://wordpress.org/wp-json (Firefox gives you a nicer view of the JSON output) you can see 143 endpoints:

/	
/oembed/1.0	
/oembed/1.0/embed	
/oembed/1.0/proxy	
/wporg/v1	
/wporg/v1/data-exporter	
/wporg/v1/data-erase	
/wporg/v1/data-erase-preflight	
/akismet/v1	
/akismet/v1/key	
/akismet/v1/settings	
/akismet/v1/stats	
/akismet/v1/stats/(?P<interval>[\w+])	
/akismet/v1/alert	
/jetpack/v4	
/jetpack/v4/plans	
/jetpack/v4/jitm	
/jetpack/v4/verify_registration	
/jetpack/v4/remote_authorize	
/jetpack/v4/connection	
/jetpack/v4/connection/test	
/jetpack/v4/connection/test-wpcom	
/jetpack/v4/rewind	
/jetpack/v4/connection/url	
/jetpack/v4/connection/data	
/jetpack/v4/connection/owner	
/jetpack/v4/tracking/settings	
/jetpack/v4/connection/user	
/jetpack/v4/site	
/jetpack/v4/site/features	
/jetpack/v4/identity-crisis/confirm-safe-mode	
/jetpack/v4/identity-crisis/start-fresh	
/jetpack/v4/identity-crisis/migrate	
/jetpack/v4/module/all	
/jetpack/v4/module/all/active	
/jetpack/v4/module/(?P<slug>[a-z\-]+)	
/jetpack/v4/module/(?P<slug>[a-z\-]+)/active	
/jetpack/v4/module/(?P<slug>[a-z\-]+)/data	
/jetpack/v4/module/(?P<service>[a-z\-]+)/key/check	
/jetpack/v4/settings	
/jetpack/v4/settings/(?P<slug>[a-z\-]+)	
/jetpack/v4/options/(?P<options>[a-z\-]+)	
/jetpack/v4/jumpstart	
/jetpack/v4/updates/plugins	
/jetpack/v4/notice/(?P<notice>[a-z\-_]+)	
/jetpack/v4/plugins	
/jetpack/v4/plugin/(?P<plugin>[a-z\/\.\-_]+)	
/jetpack/v4/widgets/(?P<id>[0-9a-z\-_]+)	
/jetpack/v4/verify-site/(?P<service>[a-z\-_]+)	
/jetpack/v4/verify-site/(?P<service>[a-z\-_]+)/(?<keyring_id>[0-9]+)	
/jetpack/v4/service-api-keys/(?P<service>[a-z\-_]+)	
/wpcom/v2	
/wpcom/v2/business-hours/localized-week	
/wpcom/v2/mailchimp	
/wpcom/v2/gutenberg/available-extensions	
/wpcom/v2/hello	
/wpcom/v2/publicize/connections	
/wpcom/v2/publicize/connection-test-results	
/wpcom/v2/publicize/services	
/wpcom/v2/service-api-keys/(?P<service>[a-z\-_]+)	
/jetpack/v4/hints	
/wp/v2	
/wp/v2/posts	
/wp/v2/posts/(?P<id>[\d]+)	
/wp/v2/posts/(?P<parent>[\d]+)/revisions	
/wp/v2/posts/(?P<parent>[\d]+)/revisions/(?P<id>[\d]+)	
/wp/v2/posts/(?P<id>[\d]+)/autosaves	
/wp/v2/posts/(?P<parent>[\d]+)/autosaves/(?P<id>[\d]+)	
/wp/v2/pages	
/wp/v2/pages/(?P<id>[\d]+)	
/wp/v2/pages/(?P<parent>[\d]+)/revisions	
/wp/v2/pages/(?P<parent>[\d]+)/revisions/(?P<id>[\d]+)	
/wp/v2/pages/(?P<id>[\d]+)/autosaves	
/wp/v2/pages/(?P<parent>[\d]+)/autosaves/(?P<id>[\d]+)	
/wp/v2/media	
/wp/v2/media/(?P<id>[\d]+)	
/wp/v2/blocks	
/wp/v2/blocks/(?P<id>[\d]+)	
/wp/v2/blocks/(?P<id>[\d]+)/autosaves	
/wp/v2/blocks/(?P<parent>[\d]+)/autosaves/(?P<id>[\d]+)	
/wp/v2/feedback	
/wp/v2/feedback/(?P<id>[\d]+)	
/wp/v2/feedback/(?P<id>[\d]+)/autosaves	
/wp/v2/feedback/(?P<parent>[\d]+)/autosaves/(?P<id>[\d]+)	
/wp/v2/jp_pay_order	
/wp/v2/jp_pay_order/(?P<id>[\d]+)	
/wp/v2/jp_pay_order/(?P<id>[\d]+)/autosaves	
/wp/v2/jp_pay_order/(?P<parent>[\d]+)/autosaves/(?P<id>[\d]+)	
/wp/v2/jp_pay_product	
/wp/v2/jp_pay_product/(?P<id>[\d]+)	
/wp/v2/jp_pay_product/(?P<id>[\d]+)/autosaves	
/wp/v2/jp_pay_product/(?P<parent>[\d]+)/autosaves/(?P<id>[\d]+)	
/wp/v2/types	
/wp/v2/types/(?P<type>[\w-]+)	
/wp/v2/statuses	
/wp/v2/statuses/(?P<status>[\w-]+)	
/wp/v2/taxonomies	
/wp/v2/taxonomies/(?P<taxonomy>[\w-]+)	
/wp/v2/categories	
/wp/v2/categories/(?P<id>[\d]+)	
/wp/v2/tags	
/wp/v2/tags/(?P<id>[\d]+)	
/wp/v2/users	
/wp/v2/users/(?P<id>[\d]+)	
/wp/v2/users/me	
/wp/v2/comments	
/wp/v2/comments/(?P<id>[\d]+)	
/wp/v2/search	
/wp/v2/block-renderer/(?P<name>core/block)	
/wp/v2/block-renderer/(?P<name>core/latest-comments)	
/wp/v2/block-renderer/(?P<name>jetpack/business-hours)	
/wp/v2/block-renderer/(?P<name>jetpack/contact-info)	
/wp/v2/block-renderer/(?P<name>jetpack/address)	
/wp/v2/block-renderer/(?P<name>jetpack/email)	
/wp/v2/block-renderer/(?P<name>jetpack/phone)	
/wp/v2/block-renderer/(?P<name>jetpack/gif)	
/wp/v2/block-renderer/(?P<name>jetpack/mailchimp)	
/wp/v2/block-renderer/(?P<name>jetpack/map)	
/wp/v2/block-renderer/(?P<name>jetpack/repeat-visitor)	
/wp/v2/block-renderer/(?P<name>jetpack/slideshow)	
/wp/v2/block-renderer/(?P<name>jetpack/contact-form)	
/wp/v2/block-renderer/(?P<name>jetpack/field-text)	
/wp/v2/block-renderer/(?P<name>jetpack/field-name)	
/wp/v2/block-renderer/(?P<name>jetpack/field-email)	
/wp/v2/block-renderer/(?P<name>jetpack/field-url)	
/wp/v2/block-renderer/(?P<name>jetpack/field-date)	
/wp/v2/block-renderer/(?P<name>jetpack/field-telephone)	
/wp/v2/block-renderer/(?P<name>jetpack/field-textarea)	
/wp/v2/block-renderer/(?P<name>jetpack/field-checkbox)	
/wp/v2/block-renderer/(?P<name>jetpack/field-checkbox-multiple)	
/wp/v2/block-renderer/(?P<name>jetpack/field-radio)	
/wp/v2/block-renderer/(?P<name>jetpack/field-select)	
/wp/v2/block-renderer/(?P<name>core/archives)	
/wp/v2/block-renderer/(?P<name>core/calendar)	
/wp/v2/block-renderer/(?P<name>core/categories)	
/wp/v2/block-renderer/(?P<name>core/latest-posts)	
/wp/v2/block-renderer/(?P<name>core/rss)	
/wp/v2/block-renderer/(?P<name>core/search)	
/wp/v2/block-renderer/(?P<name>core/shortcode)	
/wp/v2/block-renderer/(?P<name>core/tag-cloud)	
/wp/v2/block-renderer/(?P<name>core/video)	
/wp/v2/settings	
/wp/v2/themes
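
As a rough illustration of how a list like this can be summarised, here is a minimal Python sketch that counts the routes per namespace in a decoded /wp-json index. It assumes the index is a JSON object with a `routes` map whose entries carry a `namespace` field; `summarize_routes` and the tiny sample are hypothetical names and data made up for this example, not taken from any real site.

```python
import json
from collections import Counter


def summarize_routes(index: dict) -> Counter:
    """Count how many routes each namespace exposes in a /wp-json index."""
    counts = Counter()
    for route, info in index.get("routes", {}).items():
        # The root route "/" has an empty namespace, so give it a label.
        namespace = info.get("namespace") or "(root)"
        counts[namespace] += 1
    return counts


# Hand-written sample shaped like the index (assumption, not real data).
sample_index = json.loads("""
{
  "namespaces": ["oembed/1.0", "wp/v2"],
  "routes": {
    "/": {"namespace": "", "methods": ["GET"]},
    "/oembed/1.0": {"namespace": "oembed/1.0", "methods": ["GET"]},
    "/wp/v2": {"namespace": "wp/v2", "methods": ["GET"]},
    "/wp/v2/users": {"namespace": "wp/v2", "methods": ["GET", "POST"]},
    "/wp/v2/posts": {"namespace": "wp/v2", "methods": ["GET", "POST"]}
  }
}
""")

for namespace, total in summarize_routes(sample_index).most_common():
    print(f"{namespace}: {total}")
```

On a real index, namespaces such as jetpack/v4 or akismet/v1 already tell a visitor which plugins are installed.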

Testing the various endpoints a bit, we can see that this instance is somewhat protected, but what about other websites?

For example, Wired Italia is exposing a few users with their nicknames, so now that we have them a brute-force attack becomes possible. It is also easy to find the admin user (or to get all the users, including the ones hidden in the frontend).
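
To audit your own site for this, a small Python sketch like the following can help. It builds the URL of the users collection and pulls the nickname-like fields out of the decoded response. It assumes the public user objects carry `id`, `slug` and `name` fields; `users_url`, `exposed_nicknames` and the sample data are hypothetical, and the slug is often but not always the login name.

```python
from urllib.parse import urlencode


def users_url(site: str, page: int = 1, per_page: int = 100) -> str:
    """Build the URL of the public users collection of a WP site."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/users?{query}"


def exposed_nicknames(users: list) -> list:
    """Return (id, slug, display name) for every user in a decoded response."""
    return [(user["id"], user["slug"], user["name"]) for user in users]


# Hand-written sample shaped like a users response (assumption, not real data).
sample_users = [
    {"id": 1, "slug": "admin", "name": "Site Admin"},
    {"id": 7, "slug": "mario-rossi", "name": "Mario Rossi"},
]

print(users_url("https://example.com/"))
for user_id, slug, name in exposed_nicknames(sample_users):
    print(user_id, slug, name)
```

If the URL printed above returns a list on your site while you are logged out, the nicknames are public.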

We shouldn’t forget that Gravatar uses an MD5 hash of the email address, so it is possible to recover the address when you know that a website’s emails share the same domain and follow a per-user pattern, and later use it for phishing (thanks to Giuseppe Mazzapica for the hint). With websites like http://wordpressexpose.chrisgherbert.com/ it is possible to reverse the hash and get the email from a Gravatar (thanks to Marco Chiesi for sharing the link)!

Looking through the list of websites built with WordPress, we can see that Facebook is somewhat protected, while Alanis Morissette’s website exposes a few things, such as the use of Redirection (a plugin for redirects that uses REST API endpoints in the dashboard but doesn’t need them to be public) and the users.

Also, Wil Wheaton (I will not knock on his door like Sheldon) is exposing a lot of stuff, as are the Obama Foundation, the Mozilla Foundation, Sylvester Stallone (I am afraid of this one, Rambo V is coming to cinemas soon), the Walt Disney Company and so on.

So scraping a website is quite easy if you know how WP works (the documentation is available online), but I think that until today no one considered that scraping is possible without going through the web pages (like a crawler does). With the REST API it is possible to get more information, straight from the database, through a query system, because it is native.

You can get a specific post by ID, all the articles of a category, but also all the users. I am not sure about the blocks part, which comes from Gutenberg, because I am not using it.

Consider that tools that let you scrape a WP website through the REST API are already available on GitHub and let you do it without using a browser. You can also clone a website, because you get access to all the articles, which causes problems on the SEO side, for example.
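
To make the Gravatar point concrete, here is a minimal Python sketch of the guessing approach. It assumes the classic Gravatar scheme (MD5 of the trimmed, lowercased address) and that the hash is the last path segment of the avatar URL; the function names, the addresses and the URL are all invented for the example.

```python
import hashlib
from typing import Iterable, Optional
from urllib.parse import urlparse


def gravatar_hash(email: str) -> str:
    """MD5 of the trimmed, lowercased address (classic Gravatar scheme)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def hash_from_avatar_url(avatar_url: str) -> str:
    """Take the last path segment of an avatar URL, assumed to be the hash."""
    return urlparse(avatar_url).path.rstrip("/").rsplit("/", 1)[-1]


def match_email(avatar_url: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate address whose hash matches the avatar URL."""
    target = hash_from_avatar_url(avatar_url)
    for email in candidates:
        if gravatar_hash(email) == target:
            return email.strip().lower()
    return None


# Invented data: an avatar URL as it could appear in a user object.
avatar = f"https://secure.gravatar.com/avatar/{gravatar_hash('mario.rossi@example.com')}?s=96"
guesses = ["m.rossi@example.com", "mario@example.com", "Mario.Rossi@example.com"]
print(match_email(avatar, guesses))
```

The guesses only need the exposed name and the company domain, which is why a predictable address pattern makes the hash easy to reverse.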
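
As a small sketch of what those queries look like, the following Python helpers build the two URLs. They assume the standard `wp/v2` posts routes listed above and the `categories`, `per_page` and `page` query parameters; the helper names and the example site are hypothetical.

```python
from urllib.parse import urlencode


def post_url(site: str, post_id: int) -> str:
    """URL of a single post, addressed by its numeric ID."""
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts/{int(post_id)}"


def category_posts_url(site: str, category_id: int, page: int = 1, per_page: int = 10) -> str:
    """URL of one page of the posts assigned to a category ID."""
    query = urlencode({"categories": int(category_id), "per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


print(post_url("https://example.com", 42))
print(category_posts_url("https://example.com", 5))
```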
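
To show how little such a tool needs, here is a hedged Python sketch of the pagination loop, meant for checking what your own site gives away. It assumes the collection reports its page count in the `X-WP-TotalPages` response header; `fetch_posts_page`, `iter_collection` and the fake fetcher are hypothetical names, and the demo runs on fake data so nothing touches the network.

```python
import json
from typing import Callable, Iterator, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

Page = Tuple[List[dict], int]  # (items on this page, total number of pages)


def fetch_posts_page(site: str, page: int, per_page: int = 100) -> Page:
    """Fetch one page of posts over HTTP (defined but not called below)."""
    query = urlencode({"per_page": per_page, "page": page})
    url = f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"
    with urlopen(url, timeout=10) as response:
        total_pages = int(response.headers.get("X-WP-TotalPages", "1"))
        return json.load(response), total_pages


def iter_collection(fetch_page: Callable[[int], Page]) -> Iterator[dict]:
    """Yield every item of a paginated collection, page after page."""
    page, total_pages = 1, 1
    while page <= total_pages:
        items, total_pages = fetch_page(page)
        yield from items
        page += 1


def fake_fetch(page: int) -> Page:
    """Stand-in for the network call: two pages of invented posts."""
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return pages[page], len(pages)


print([post["id"] for post in iter_collection(fake_fetch)])
```

Against a real site you would pass something like `lambda page: fetch_posts_page("https://example.com", page)` instead of `fake_fetch`.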

Search

The search system inside WordPress uses an SQL query that looks inside post_content, where there are shortcodes but also the horrible Gutenberg blocks (yes, horrible, because you haven’t seen how they save the content).


Because post_content includes them, a query can also match their HTML content, the shortcodes and the block parameters. You will probably not see them in the frontend, but they are still there, so it is quite simple for an intruder to understand how the website works.

Take the WordCamp Rome 2018 website as an example: with the /?s= parameter it is possible to search inside the website even if the search widget is not present.

The term background does not appear in the text of the post (also because the website is in Italian, after all), but the search still matches it, and we can use the same trick to look for the paragraph block and understand where it is used.

So basically we can find the post type and where the content we are looking for is, because Gutenberg doesn’t save the parameters in post meta (for example). This way we can find, for instance, pages containing blocks that should not be visible to non-logged-in users, and so on. An attacker can therefore gather a lot of information before starting to attack your website.

PS: in XML-RPC Gutenberg content is not rendered, so a filter is required to process it.
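
To give an idea of what sits in post_content, here is a minimal Python sketch that extracts the block names and their parameters from a saved post. It assumes the usual serialization of blocks as HTML comments of the form `<!-- wp:name {json} -->`; the regular expression is a simplification, and the sample content is invented.

```python
import json
import re

# Opening (or self-closing) block comment, with an optional JSON attribute object.
BLOCK_OPEN = re.compile(
    r"<!--\s+wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+"
    r"(?:(?P<attrs>\{.*?\})\s+)?/?-->",
    re.DOTALL,
)


def list_blocks(post_content: str) -> list:
    """Return (block name, attributes) for every block found in the content."""
    blocks = []
    for match in BLOCK_OPEN.finditer(post_content):
        attrs = json.loads(match.group("attrs")) if match.group("attrs") else {}
        blocks.append((match.group("name"), attrs))
    return blocks


# Invented sample of what a saved post could look like in the database.
sample_content = """<!-- wp:paragraph {"backgroundColor":"vivid-red"} -->
<p class="has-background has-vivid-red-background-color">Ciao a tutti</p>
<!-- /wp:paragraph -->
<!-- wp:latest-posts {"postsToShow":3} /-->"""

for name, attrs in list_blocks(sample_content):
    print(name, attrs)
```

None of these parameters are part of the visible text, yet they live in the same column that the search looks at.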
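
The effect can be simulated with a few lines of Python. The sketch below compares a substring match on the raw post_content (a simplification of what a `LIKE '%term%'` query does) with a match on the text a visitor actually reads. The helper names and the sample post are hypothetical, and real matching also depends on the database collation.

```python
import re
from urllib.parse import urlencode


def search_url(site: str, term: str) -> str:
    """Front-end search URL, usable even when no search widget is shown."""
    return f"{site.rstrip('/')}/?{urlencode({'s': term})}"


def visible_text(post_content: str) -> str:
    """Drop HTML comments and tags, keeping roughly what a visitor reads."""
    without_comments = re.sub(r"<!--.*?-->", "", post_content, flags=re.DOTALL)
    return re.sub(r"<[^>]+>", "", without_comments).strip()


def matches(term: str, text: str) -> bool:
    """Case-insensitive substring match, a stand-in for LIKE '%term%'."""
    return term.lower() in text.lower()


# Invented post: Italian text, English block parameters.
raw = (
    '<!-- wp:paragraph {"backgroundColor":"vivid-red"} -->\n'
    '<p class="has-background">Ciao a tutti</p>\n'
    "<!-- /wp:paragraph -->"
)

print(search_url("https://example.com", "background"))
print(matches("background", raw))                # match on the stored content
print(matches("background", visible_text(raw)))  # match on the readable text
```

The first match succeeds and the second fails, which is exactly the gap between what is stored and what is shown.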

What the community can do

Rant like me, in a constructive way, about the choices that someone is making in the WP ecosystem without thinking at 360 degrees.

I am joking, but at WordCamp Bari 2019 (where the idea for this article was born) we had a streamed video call (during the contributor day, from the Accessibility table) with the Accessibility Days in Ancona, because it was Global Accessibility Awareness Day.
I was working on VVV at the Meta table, but I could hear what they were saying. One of them was asking the community why someone chose to put Gutenberg in a tool used by 33% of the web, with all the accessibility issues it still has. We shouldn’t forget that it still has a lot of them, including issues in how the project is managed as open source (like closing tickets even when the bug is confirmed but cannot be addressed in time). They also asked who chose to make a tool with all of these issues mandatory for so many people.

This goes against the Code of Conduct of an open source project, because it is not inclusive and doesn’t improve diversity. Also, we shouldn’t forget that WP core worked hard in previous years to make the project WCAG compatible, and with Gutenberg all of this work was thrown in the trash bin.

The WP project lost more than a year because of Gutenberg (patches that were approved and merged got reverted, and everything in WP 5.0 that didn’t involve GB was moved to the next release). For example, we are only now getting the Fatal Error protection in 5.2, when it had been ready for a while. We got the bump to PHP 5.6 as the minimum now instead of a year ago. How much time we wasted!

It is quite easy to see how many contributors have left the community or no longer develop for WP. WP without coders is a project that cannot move forward with implementing new things, improving performance, refactoring the core (WP is basically a big legacy app that has been under rewrite for years) and bug fixing.

The project is focusing on things the community did not ask for and leaving other things uncovered, like the REST API, which was the focus a few years ago, is still incomplete today and doesn’t natively include a good authentication system.

WP keeps adding more stuff without thinking about what was added before and without considering the feelings of the users. The majority of WP users are not on WP.com but on WP.org, so they don’t need to follow the plans of that project.

WordPress doesn’t really have a community and a project management led by the community like other projects: LibreOffice, Debian, Linux, NodeJS and so on.

I followed only one meeting at WpGovernance, because my feeling was that in WP there are a lot of people who want to help but don’t know how open source works. I see that at meetups but also at other events, and it is possible to see it in the community too.

What you can do

Disabling the REST API for non-logged-in users is the first step, and there are a lot of plugins that do that.

Check my slides about Hacking (and secure) a WordPress website, which can help you understand the various issues that you can face.

This discussion opens the door to a lot more questions:

  • Can my website’s data be copied easily with the REST API open? Yes!
  • Is it possible to get the list of all the users with their nicknames even if I disabled the author archive page? Yes!
  • Is it possible to find other information that wasn’t intended to be public, like an unlisted page? Yes!
  • Is it the community’s fault? No!
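
After installing such a plugin it is worth checking the result while logged out. The Python sketch below classifies the answer of an anonymous request to an endpoint like /wp-json/wp/v2/users. It assumes that a protected site answers with HTTP 401 or 403 and a JSON error object, which depends on the plugin used; `classify_anonymous_access` and the sample bodies are hypothetical.

```python
def classify_anonymous_access(status: int, body) -> str:
    """Classify the response an anonymous visitor gets from a REST endpoint."""
    if status in (401, 403):
        # Typical of a site that asks for authentication first.
        return "restricted"
    if status == 200 and isinstance(body, list) and body:
        # A non-empty list means the collection is readable by anyone.
        return "open"
    return "unknown"


# Hand-written sample responses (assumptions, not captured from a real site).
open_response = (200, [{"id": 1, "slug": "admin", "name": "Site Admin"}])
closed_response = (401, {"code": "rest_not_logged_in", "data": {"status": 401}})

print(classify_anonymous_access(*open_response))
print(classify_anonymous_access(*closed_response))
```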

Conclusion

Basically every WP instance is exposing information in various ways and shapes, and requires plugins to change this behaviour.

To minimize the effect, WP Core should be able to natively hide REST API endpoints (not only disable them; those are two different things), as suggested for ClassicPress, and also change the SQL query to ignore what is inside HTML comments, although that can be expensive for a DB. So maybe, instead of saving everything in post_content, it could use other meta, or save the rendered content there and the Gutenberg data in another area.

The real priorities for WordPress should be different from “blocks everywhere”: improving performance, improving security with built-in features (Site Health is cool and interesting but not enough), improving code quality, adding more unit tests, refactoring old stuff and not creating issues that have to be fixed with a plugin.

Thanks

First to Andrea Gandino, because he showed me the issue with the s parameter and what it is possible to do with it and Gutenberg. Andrea is, like me, a long-time WordPress Core contributor, part of the Italian community, a developer with a company that does business with WP development, and he also hates Gutenberg for the technical choices behind it.

Second to Enrico Sorcinelli, who explained to me that XML-RPC also exposes stuff and how easy it is to protect a WP website with his plugin, REST XML-RPC Data Checker. Enrico is a core contributor and works mainly with legacy stuff built with WP, so he knows a lot about the strange behaviors of the latest WP versions. For his job he had to protect everything and study backwards compatibility, but also the exposure of data, which is not acceptable in enterprise (where WP is used a lot).

Thanks also to Andrea Cardinali, because his talk about the difference between AJAX and the REST API made me understand that performance is quite an important topic with them too.
You probably don’t know that the REST API uses another rewrite engine instead of WP Core’s WP_Rewrite, for a lot of different technical reasons, and that both AJAX and REST load all the plugins even when that is not required.

2 thoughts on “Your WordPress instance is leaking data (also with Gutenberg)”

  1. Its not just that .. Whenever I’m visiting a website for the first time, I’m playing a mental contest with myself called “Is this a WordPress website?” .. and if so, WHICH version is installed?

    Even a “security through obscurity” site is leaking so much data and information ..
    The usual give-away is the xml-rpc, the wp.org or “http://gmpg.org/xfn/11” mentioned right in the source code (where else?), others are version numbers for non-version-numbered JS and CSS assets, the “wp-content” directory showing up somewhere …

    … and even IF ALL THAT is somehow hidden away (eg. by using a CDN and forcibly changing the default paths via constant etc. pp) – just by looking up which jQuery version is being used, can be a strong contender for the WP version in use.

    cu, w0lf.

    1. The point is not detect if the website is built with WordPress but get the information from that site that shouldn’t be public.
