Internet-Draft | web-bot-auth-use-cases | April 2025 |
Hoyland & Hendrickson | Expires 18 October 2025 | [Page] |
TODO Abstract¶
This work originated from a discussion that occurred at an IETF 122 side meeting on authenticating bot traffic on the web.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-jhoyla-bot-auth-use-cases/.¶
Discussion of this document takes place on the WG Working Group mailing list (mailto:WG@example.com), which is archived at https://example.com/WG.¶
Source for this draft and an issue tracker can be found at https://github.com/jhoyla/draft-jhoyla-bot-auth-use-cases.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 18 October 2025.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The Web is increasingly accessed not just by human users operating browsers, but also by automated, non-human clients. These range from traditional web crawlers to AI agent systems.¶
Currently, authenticating the operator of these non-human clients often relies on IP address checking, reverse DNS lookups, and inspecting the User-Agent request header. These methods are brittle (e.g., rotating IP addresses), easily spoofed (e.g., User-Agent strings), and can negatively impact security and network operations (due to IP address sharing or rotation).¶
Any alternative authentication solution must operate within the context of the existing Web infrastructure, primarily designed for human interaction via web browsers. Solutions must be layered onto protocols like HTTP and TLS without breaking existing sites or causing detrimental user experiences (e.g., unexpected authentication prompts for humans).¶
This document is designed to serve as an informational starting point for potential future work in this area.¶
The document is specifically focused on authenticating non-human clients on the human-oriented Web. Note this may include automated systems acting on behalf of a human user.¶
This intentionally excludes:¶
Enable sites to reliably identify known, well-behaved crawlers used for purposes like search engine indexing, URL scanning services, web archiving, LLM training, and LLM inference. This allows sites or their subcomponents (e.g., CAPTCHA providers) to grant them different access levels or rate limits compared to unidentified traffic. This is implemented today through several common practices:¶
As mentioned in Section 1, common practices today include:¶
IP Address Allow/Block Lists: Maintaining lists of IP addresses or CIDR ranges associated with known bots. This is fragile due to dynamic IP allocation (e.g., on cloud platforms), the lack of a consistent IP list discovery mechanism, stale IP lists, and shared IPs behind CGNATs.¶
User-Agent String Inspection: Checking the User-Agent request header for strings declared by known bots. These strings are easily copied and offer no cryptographic assurance.¶
These methods lack cryptographic robustness, are often inaccurate, do not offer an owner discovery mechanism, and can produce both false positives (human traffic misclassified as non-human) and false negatives (bot traffic passing as human).¶
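The fragility of IP-based checks can be illustrated with a short sketch. The CIDR ranges below are documentation placeholders, not a real crawler's published list; in practice such lists go stale and cannot distinguish a crawler from other tenants of a shared or rotated address.

```python
# Sketch of an IP allow-list check against a hypothetical published
# crawler range list. Placeholder ranges only (RFC 5737 / RFC 3849
# documentation addresses).
import ipaddress

KNOWN_CRAWLER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),   # placeholder IPv4 range
    ipaddress.ip_network("2001:db8::/32"),  # placeholder IPv6 range
]

def is_known_crawler(remote_addr: str) -> bool:
    """Return True if the client IP falls inside a listed crawler range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in KNOWN_CRAWLER_RANGES)

print(is_known_crawler("192.0.2.10"))    # True: inside the listed range
print(is_known_crawler("198.51.100.1"))  # False: a rotated or CGNAT address
                                         # of the same crawler is invisible
```

Note that the check is purely positional: any client that acquires an address in a listed range is "authenticated", and a legitimate crawler that moves off the list is not.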
Using HTTP Message Signatures [RFC9421] (or a similar JWT-based approach) allows a client to sign parts of an HTTP request using an asymmetric key pair, providing authenticity and integrity assurance at the application layer. Key considerations include:¶
Replay Attacks: Signatures could potentially be replayed across different hosts or contexts. Mitigation requires signing components that bind the signature to the specific request context (e.g., the Host header, a timestamp, a nonce).¶
Key Management: Similar to mTLS, a mechanism is needed for discovering, provisioning, and managing the keys used for signing.¶
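The considerations above can be sketched in code. The example below builds a simplified RFC 9421-style signature base covering the method, authority, and path, and binds in a creation timestamp and nonce to limit replay. It uses the symmetric hmac-sha256 algorithm so it runs with only the standard library; a real bot would typically use an asymmetric algorithm (e.g., ed25519) so verifiers hold only a public key. The key id, shared key, and covered-component choices are illustrative, not normative.

```python
# Simplified sketch of an HTTP Message Signature (RFC 9421 style),
# using hmac-sha256 purely so the example is stdlib-runnable.
import base64
import hashlib
import hmac
import secrets
import time

SHARED_KEY = b"example-shared-key"  # placeholder; key provisioning is out of scope

def sign_request(authority: str, method: str, path: str) -> dict:
    created = int(time.time())
    nonce = secrets.token_urlsafe(16)
    # Covered components bind the signature to this request's context,
    # mitigating replay across hosts or paths.
    params = (f'("@method" "@authority" "@path")'
              f';created={created};nonce="{nonce}";keyid="bot-key-1"')
    signature_base = (
        f'"@method": {method}\n'
        f'"@authority": {authority}\n'
        f'"@path": {path}\n'
        f'"@signature-params": {params}'
    )
    tag = hmac.new(SHARED_KEY, signature_base.encode(), hashlib.sha256).digest()
    return {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{base64.b64encode(tag).decode()}:",
    }

headers = sign_request("example.com", "GET", "/robots.txt")
print(headers["Signature-Input"])
```

A verifier that knows the key for `bot-key-1` recomputes the same signature base from the received request and checks the tag, rejecting stale `created` values or reused nonces.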
Using TLS client certificates (mTLS) [RFC8446] for authentication provides cryptographic authentication tied to the TLS layer. Key challenges include:¶
User Experience: Browsers often present confusing or intrusive certificate selection dialogs to users if mTLS is requested unconditionally. draft-jhoyla-req-mtls-flag was presented as a potential way to mitigate this by allowing servers to signal optional client certificate authentication support, potentially avoiding issues for users without certificates (or clients that do not support them). Support for this flag would be needed in both TLS servers and clients using this method to positively authorize an interaction.¶
Deployment: Provisioning, managing, and revoking client certificates at scale can be complex, and must be carefully considered by the client owner.¶
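From the client's side, presenting a certificate is straightforward; the deployment burden noted above lies in issuing, rotating, and revoking the credentials. A minimal sketch using Python's stdlib `ssl` module, with placeholder file paths and host names:

```python
# Sketch of a bot presenting a TLS client certificate (mTLS).
# Certificate and key paths are placeholders; issuance, rotation,
# and revocation of these credentials are the operational challenge.
import ssl

def make_client_context(cert_path: str, key_path: str) -> ssl.SSLContext:
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Load the bot's certificate chain and private key; the server
    # identifies the bot operator by validating this chain during
    # the TLS handshake.
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx

# Usage (placeholder paths and host):
# import socket
# ctx = make_client_context("bot-cert.pem", "bot-key.pem")
# with socket.create_connection(("example.com", 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#         tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
```

Whether the server requests the certificate unconditionally or via an optional-authentication signal (as in draft-jhoyla-req-mtls-flag) is a server-side policy decision; the client-side code is unchanged either way.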
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
TODO Security¶
This document has no IANA actions.¶
TODO acknowledge.¶