From b169cb876790c324728af660a539b2d6830dc2f6 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Tue, 6 Feb 2024 17:19:39 -0700
Subject: [PATCH 01/16] Add UTF-8 over unsafe contiguous storage proposal

---
 proposals/nnnn-utf-8-unsafe-contiguous.md | 559 ++++++++++++++++++++++
 1 file changed, 559 insertions(+)
 create mode 100644 proposals/nnnn-utf-8-unsafe-contiguous.md
diff --git a/proposals/nnnn-utf-8-unsafe-contiguous.md b/proposals/nnnn-utf-8-unsafe-contiguous.md
new file mode 100644
index 0000000000..79f1d10935
--- /dev/null
+++ b/proposals/nnnn-utf-8-unsafe-contiguous.md
@@ -0,0 +1,559 @@
+<!-- utf8_processing.md -->
+
+# UTF-8 Processing Over Unsafe Contiguous Bytes
+
+## Introduction and Motivation
+
+Native `String`s are stored as validly-encoded UTF-8 bytes in a contiguous memory buffer. The standard library implements `String` functionality on top of this buffer, taking advantage of the validly-encoded invariant and specialized Unicode knowledge. We propose exposing this functionality as API for more advanced libraries and developers.
+
+This pitch focuses on a portion of the broader API and functionality discussed in [Pitch: Unicode Processing APIs](https://forums.swift.org/t/pitch-unicode-processing-apis/69294). That broader pitch can be divided into 3 kinds of API additions:
+
+1. Unicode processing API for working with contiguously-stored valid UTF-8 bytes
+2. `Element`-based stream processing functionality. E.g., a stream of `UInt8` can be turned into a stream of `Unicode.Scalar` or `Character`s.
+3. Stream-of-buffers processing functionality, which provides a lower-level / more efficient implementation for the second area.
+
+This pitch focuses on the first.
+
+## Proposed Solution
+
+We propose `UnsafeValidUTF8BufferPointer`, which exposes an API surface similar to `String`'s for validly-encoded UTF-8 code units in contiguous memory.
+
+
+## Detailed Design
+
+`UnsafeValidUTF8BufferPointer` consists of a (non-optional) raw pointer and a length, with some flags bit-packed in.
+
+```swift
+/// An unsafe buffer pointer to validly-encoded UTF-8 code units stored in
+/// contiguous memory.
+///
+/// UTF-8 validity is checked upon creation.
+///
+/// `UnsafeValidUTF8BufferPointer` does not manage the memory or guarantee
+/// memory safety. Any overlapping writes into the memory can lead to undefined 
+/// behavior.
+///
+@frozen
+public struct UnsafeValidUTF8BufferPointer {
+  @usableFromInline
+  internal var _baseAddress: UnsafeRawPointer
+
+  // A bit-packed count and flags (such as isASCII)
+  @usableFromInline
+  internal var _countAndFlags: UInt64
+}
+```
+
+It differs from `UnsafeRawBufferPointer` in that its contents, upon construction, are guaranteed to be validly-encoded UTF-8. This guarantee speeds up processing significantly relative to performing validation on every read. It is unsafe because it is an API surface on top of `UnsafeRawPointer`: it inherits all the unsafety therein, and developers must manually guarantee invariants such as lifetimes and exclusivity. It is further based on `UnsafeRawPointer` instead of `UnsafePointer<UInt8>` so as not to [bind memory to a type](https://developer.apple.com/documentation/swift/unsaferawpointer#Typed-Memory).
+
+
+### Validation and creation
+
+`UnsafeValidUTF8BufferPointer` is validated at initialization time, and encoding errors are thrown.
+
+```swift
+extension Unicode.UTF8 {
+  @frozen
+  public enum EncodingErrorKind: Error {
+    case unexpectedContinuationByte
+    case expectedContinuationByte
+    case overlongEncoding
+    case invalidCodePoint
+
+    case invalidStarterByte
+
+    case unexpectedEndOfInput
+  }
+}
+```
+
+```swift
+// All the initializers below are `throw`ing, as they validate the contents
+// upon construction.
+extension UnsafeValidUTF8BufferPointer {
+  @frozen
+  public struct DecodingError: Error, Sendable, Hashable, Codable {
+    public var kind: UTF8.EncodingErrorKind
+    public var offsets: Range<Int>
+  }
+
+  // ABI traffics in `Result`
+  @usableFromInline
+  internal static func _validate(
+    baseAddress: UnsafeRawPointer, length: Int
+  ) -> Result<UnsafeValidUTF8BufferPointer, DecodingError>
+
+  @_alwaysEmitIntoClient
+  public init(baseAddress: UnsafeRawPointer, length: Int) throws(DecodingError)
+
+  @_alwaysEmitIntoClient
+  public init(nulTerminatedCString: UnsafeRawPointer) throws(DecodingError)
+
+  @_alwaysEmitIntoClient
+  public init(nulTerminatedCString: UnsafePointer<CChar>) throws(DecodingError)
+
+  @_alwaysEmitIntoClient
+  public init(_: UnsafeRawBufferPointer) throws(DecodingError)
+
+  @_alwaysEmitIntoClient
+  public init(_: UnsafeBufferPointer<UInt8>) throws(DecodingError)
+}
+```
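+
+To illustrate the kind of information a `DecodingError` carries, here is a minimal sketch built only on the existing `Unicode.UTF8.ForwardParser`. The helper name `firstUTF8Error(in:)` is hypothetical and not part of this proposal. It reports only the byte range of the first error, not an `EncodingErrorKind`, and it makes no claim about how the proposed initializers would be implemented.
+
+```swift
+/// Hypothetical helper (not proposed API): returns the byte range of the
+/// first UTF-8 encoding error in `bytes`, or `nil` if the input is valid.
+func firstUTF8Error(in bytes: [UInt8]) -> Range<Int>? {
+  var parser = Unicode.UTF8.ForwardParser()
+  var iterator = bytes.makeIterator()
+  var offset = 0
+  while true {
+    switch parser.parseScalar(from: &iterator) {
+    case .valid(let encoded):
+      // Advance past the code units of the scalar just decoded.
+      offset += encoded.count
+    case .error(let length):
+      // `length` is the number of code units making up the ill-formed sequence.
+      return offset ..< offset + length
+    case .emptyInput:
+      return nil
+    }
+  }
+}
+
+let valid = Array("héllo".utf8)
+let invalid: [UInt8] = [0x61, 0xFF, 0x62] // 0xFF never appears in UTF-8
+
+let validResult = firstUTF8Error(in: valid)
+let invalidResult = firstUTF8Error(in: invalid)
+```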
+
+#### Unsafety and encoding validity
+
+Every way to construct an `UnsafeValidUTF8BufferPointer` ensures that its contents are validly-encoded UTF-8. Thus, it has no new source of unsafety beyond the unsafety inherent in unsafe pointers' requirement that lifetime and exclusive access be manually enforced by the programmer. A write into this memory which violates encoding validity would also violate exclusivity.
+
+If we did not guarantee UTF-8 encoding validity, we'd be open to new security and safety concerns beyond unsafe pointers.
+
+With invalidly-encoded contents, memory safety would become more nuanced. An ill-formed leading byte can dictate a scalar length that is longer than the memory buffer. The bounds associated with the buffer may then differ from the bounds dictated by its contents.
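+
+The mismatch between buffer bounds and content-dictated bounds can be sketched as follows. The function `claimedScalarLength(leadingByte:)` is a hypothetical helper written for this illustration. It only inspects the leading byte, which is exactly what a decoder that assumes validity would do.
+
+```swift
+/// Hypothetical helper: the number of bytes a UTF-8 leading byte claims for
+/// its scalar, or `nil` if the byte cannot start a scalar.
+func claimedScalarLength(leadingByte: UInt8) -> Int? {
+  switch leadingByte {
+  case 0x00...0x7F: return 1
+  case 0xC2...0xDF: return 2
+  case 0xE0...0xEF: return 3
+  case 0xF0...0xF4: return 4
+  default: return nil // continuation byte or invalid starter byte
+  }
+}
+
+// "a", followed by only the first two bytes of a four-byte scalar.
+let truncated: [UInt8] = [0x61, 0xF0, 0x9F]
+let lastScalarStart = 1
+
+let claimed = claimedScalarLength(leadingByte: truncated[lastScalarStart])
+let available = truncated.count - lastScalarStart
+
+// A decoder that trusted `claimed` without checking `available` would read
+// past the end of the buffer.
+let wouldOverrun = (claimed ?? 0) > available
+```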
+
+Additionally, a particular scalar value in valid UTF-8 has only one encoding, but invalid UTF-8 could have the same value encoded as an [overlong encoding](https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings), which would compromise code that checks for the presence of a scalar value by looking at the encoded bytes (or that does a byte-wise comparison).
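+
+As a small illustration of the overlong-encoding concern, the sketch below uses only existing standard library API. The bytes `0xC0 0xAF` are an ill-formed two-byte encoding of U+002F (`/`). A byte-wise scan does not see a `/`, and a validating or repairing decoder must not produce one either.
+
+```swift
+// 0xC0 0xAF is an overlong (ill-formed) two-byte encoding of U+002F "/".
+let overlongSlash: [UInt8] = [0xC0, 0xAF]
+
+// A byte-wise scan for "/" (0x2F) finds nothing.
+let foundByScan = overlongSlash.contains(0x2F)
+
+// `String(decoding:as:)` repairs ill-formed input with U+FFFD rather than
+// decoding the overlong form to "/".
+let repaired = String(decoding: overlongSlash, as: UTF8.self)
+let decodedSlash = repaired.contains("/" as Character)
+```
+
+A decoder that skipped validation and accepted the overlong form would disagree with the byte-wise scan about whether the content contains `/`.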
+
+`UnsafeValidUTF8BufferPointer` is unsafe in all the ways that unsafe pointers are unsafe, but not in more ways.
+
+
+### Accessing contents
+
+Flags and raw contents can be accessed:
+
+```swift
+extension UnsafeValidUTF8BufferPointer {
+  /// Returns whether the validated contents were all-ASCII. This is checked at
+  /// initialization time and remembered.
+  @inlinable
+  public var isASCII: Bool
+
+  /// Access the underlying raw bytes
+  @inlinable
+  public var rawBytes: UnsafeRawBufferPointer
+}
+```
+
+Like `String`, `UnsafeValidUTF8BufferPointer` provides views for accessing `Unicode.Scalar`s, `UTF16.CodeUnit`s, and `Character`s.
+
+```swift
+extension UnsafeValidUTF8BufferPointer {
+  /// A view of the buffer's contents as a bidirectional collection of `Unicode.Scalar`s.
+  @frozen
+  public struct UnicodeScalarView {
+    public var buffer: UnsafeValidUTF8BufferPointer
+
+    @inlinable
+    public init(_ buffer: UnsafeValidUTF8BufferPointer)
+  }
+
+  @inlinable
+  public var unicodeScalars: UnicodeScalarView
+
+  /// A view of the buffer's contents as a bidirectional collection of `Character`s.
+  @frozen
+  public struct CharacterView {
+    public var buffer: UnsafeValidUTF8BufferPointer
+
+    @inlinable
+    public init(_ buffer: UnsafeValidUTF8BufferPointer)
+  }
+
+  @inlinable
+  public var characters: CharacterView
+
+  /// A view of the buffer's contents as a bidirectional collection of transcoded
+  /// `UTF16.CodeUnit`s.
+  @frozen
+  public struct UTF16View {
+    public var buffer: UnsafeValidUTF8BufferPointer
+
+    @inlinable
+    public init(_ buffer: UnsafeValidUTF8BufferPointer)
+  }
+
+  @inlinable
+  public var utf16: UTF16View
+}
+```
+
+These are bidirectional collections, as in `String`. Their indices, however, are distinct from each other because they mean different things. For example, a scalar-view index is scalar aligned but not necessarily `Character` aligned, and a transcoded index which points mid-scalar doesn't have a corresponding position in the raw bytes.
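+
+The alignment distinctions above can be observed with today's `String`, whose views share a single index type. The sketch below uses only existing API and is meant to show why one view's position may have no counterpart in another view. It does not use the proposed types.
+
+```swift
+// "e" + combining acute accent (one Character), followed by an emoji.
+let s = "e\u{301}\u{1F600}"
+
+let byteCount = s.utf8.count              // 1 + 2 + 4 = 7
+let scalarCount = s.unicodeScalars.count  // 3
+let utf16Count = s.utf16.count            // 1 + 1 + 2 = 4
+let characterCount = s.count              // 2
+
+// The combining accent starts a scalar but not a Character.
+let accent = s.unicodeScalars.index(after: s.unicodeScalars.startIndex)
+let accentAsCharacterIndex = accent.samePosition(in: s)
+
+// The emoji's trailing surrogate is a UTF-16 position in the middle of a
+// scalar, so it has no scalar-aligned equivalent.
+let trailingSurrogate = s.utf16.index(s.utf16.startIndex, offsetBy: 3)
+let surrogateAsScalarIndex = trailingSurrogate.samePosition(in: s.unicodeScalars)
+```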
+
+```swift
+extension UnsafeValidUTF8BufferPointer.UnicodeScalarView: BidirectionalCollection {
+  public typealias Element = Unicode.Scalar
+
+  @frozen
+  public struct Index: Comparable, Hashable {
+    @usableFromInline
+    internal var _byteOffset: Int
+
+    @inlinable
+    public var byteOffset: Int { get }
+
+    @inlinable
+    public static func < (lhs: Self, rhs: Self) -> Bool
+
+    @inlinable
+    internal init(_uncheckedByteOffset offset: Int)
+  }
+
+  @inlinable
+  public subscript(position: Index) -> Element { _read }
+
+  @inlinable
+  public func index(after i: Index) -> Index
+
+  @inlinable
+  public func index(before i: Index) -> Index
+
+  @inlinable
+  public var startIndex: Index
+
+  @inlinable
+  public var endIndex: Index
+}
+
+
+extension UnsafeValidUTF8BufferPointer.CharacterView: BidirectionalCollection {
+  public typealias Element = Character
+
+  @frozen
+  public struct Index: Comparable, Hashable {
+    @usableFromInline
+    internal var _byteOffset: Int
+
+    @inlinable
+    public var byteOffset: Int { get }
+
+    @inlinable
+    public static func < (lhs: Self, rhs: Self) -> Bool
+
+    @inlinable
+    internal init(_uncheckedByteOffset offset: Int)
+  }
+
+  // Custom-defined for performance to avoid double-measuring
+  // grapheme cluster length
+  @frozen
+  public struct Iterator: IteratorProtocol {
+    @usableFromInline
+    internal var _buffer: UnsafeValidUTF8BufferPointer
+
+    @usableFromInline
+    internal var _position: Index
+
+    @inlinable
+    public var buffer: UnsafeValidUTF8BufferPointer { get }
+
+    @inlinable
+    public var position: Index { get }
+
+    public typealias Element = Character
+
+    public mutating func next() -> Character?
+
+    @inlinable
+    internal init(
+      _buffer: UnsafeValidUTF8BufferPointer, _position: Index
+    )
+  }
+
+  @inlinable
+  public func makeIterator() -> Iterator
+
+  @inlinable
+  public subscript(position: Index) -> Element { _read }
+
+  @inlinable
+  public func index(after i: Index) -> Index
+
+  @inlinable
+  public func index(before i: Index) -> Index
+
+  @inlinable
+  public var startIndex: Index
+
+  @inlinable
+  public var endIndex: Index
+}
+
+extension UnsafeValidUTF8BufferPointer.UTF16View: BidirectionalCollection {
+  public typealias Element = UTF16.CodeUnit
+
+  @frozen
+  public struct Index: Comparable, Hashable {
+    // Bitpacked byte offset and transcoded offset
+    @usableFromInline
+    internal var _byteOffsetAndTranscodedOffset: UInt64
+
+    /// Offset of the first byte of the currently-indexed scalar
+    @inlinable
+    public var byteOffset: Int { get }
+
+    /// Offset of the transcoded code unit within the currently-indexed scalar
+    @inlinable
+    public var transcodedOffset: Int { get }
+
+    @inlinable
+    public static func < (lhs: Self, rhs: Self) -> Bool
+
+    @inlinable
+    internal init(
+      _uncheckedByteOffset offset: Int, _transcodedOffset: Int
+    )
+  }
+
+  @inlinable
+  public subscript(position: Index) -> Element { _read }
+
+  @inlinable
+  public func index(after i: Index) -> Index
+
+  @inlinable
+  public func index(before i: Index) -> Index
+
+  @inlinable
+  public var startIndex: Index
+
+  @inlinable
+  public var endIndex: Index
+}
+```
+
+### Canonical equivalence
+
+```swift
+// Canonical equivalence
+extension UnsafeValidUTF8BufferPointer {
+  /// Whether `self` is equivalent to `other` under Unicode Canonical Equivalence
+  public func isCanonicallyEquivalent(
+    to other: UnsafeValidUTF8BufferPointer
+  ) -> Bool
+
+  /// Whether `self` orders less than `other` (under Unicode Canonical Equivalence
+  /// using normalized code-unit order)
+  public func isCanonicallyLessThan(
+    _ other: UnsafeValidUTF8BufferPointer
+  ) -> Bool
+}
+```
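+
+For reference, `String`'s `==` already compares under canonical equivalence, which is the behavior `isCanonicallyEquivalent(to:)` would expose for buffers. The following sketch uses only existing `String` API to show two canonically equivalent strings whose UTF-8 bytes differ.
+
+```swift
+let precomposed = "\u{E9}"   // "é" as a single scalar
+let decomposed = "e\u{301}"  // "e" followed by a combining acute accent
+
+// The encoded bytes differ...
+let sameBytes = Array(precomposed.utf8) == Array(decomposed.utf8)
+
+// ...but the strings are equal under canonical equivalence.
+let canonicallyEqual = precomposed == decomposed
+```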
+
+
+
+## Alternatives Considered
+
+### Other names
+
+We're not particularly attached to the name `UnsafeValidUTF8BufferPointer`. Other names could include:
+
+- `UnsafeValidUTF8CodeUnitBufferPointer`
+- `UTF8.UnsafeValidBufferPointer`
+- `UTF8.UnsafeValidCodeUnitBufferPointer`
+- `UTF8.ValidlyEncodedCodeUnitUnsafeBufferPointer`
+- `UnsafeContiguouslyStoredValidUTF8CodeUnitsBuffer`
+
+etc.
+
+For `isCanonicallyLessThan`, another name could be `canonicallyPrecedes`, `lexicographicallyPrecedesUnderNFC`, etc.
+
+### Static methods instead of initializers
+
+`UnsafeValidUTF8BufferPointer`s could instead be created by static methods on `UTF8`:
+
+```swift
+extension Unicode.UTF8 {
+  static func validate(
+    ...
+  ) throws -> UnsafeValidUTF8BufferPointer
+}
+```
+
+### Hashable and other conformances
+
+`UnsafeValidUTF8BufferPointer` follows `UnsafeRawBufferPointer` and `UnsafeBufferPointer` in not conforming to `Sendable`, `Hashable`, `Equatable`, `Comparable`, `Codable`, etc.
+
+### `UTF8.EncodingErrorKind` as a `struct`
+
+We may want to use the [raw-representable struct pattern](https://github.com/apple/swift-system/blob/9a812b5fef1e7f27f8594fee5463bd88c5b691ec/Sources/System/Errno.swift#L14) for `UTF8.EncodingErrorKind` instead of an exhaustive enum. That is, we may want to define it as:
+
+```swift
+extension Unicode.UTF8 {
+  @frozen
+  public struct EncodingErrorKind: Error, Sendable, Hashable, Codable {
+    public var rawValue: UInt8
+
+    @inlinable
+    public init(rawValue: UInt8) {
+      self.rawValue = rawValue
+    }
+
+    @inlinable
+    public static var unexpectedContinuationByte: Self {
+      .init(rawValue: 0x01)
+    }
+
+    @inlinable
+    public static var overlongEncoding: Self {
+      .init(rawValue: 0x02)
+    }
+
+    // ...
+  }
+}
+```
+
+This would allow us to grow the kinds of errors or else add some error-nuance in the future, at the loss of exhaustive switches inside `catch`es.
+
+For example, an unexpected-end-of-input error, which happens when a scalar is in the process of being decoded but not enough bytes have been read, could be reported in different ways. It could be reported as a distinct kind of error (particularly useful for stream processing which may want to resume with more content) or it could be an `expectedContinuationByte` covering the end-of-input position. As a value, it could have a distinct value or be an alias to the same value.
+
+
+
+
+## Future Directions
+
+### A non-escapable `ValidUTF8BufferView`
+
+Future improvements to Swift may enable a non-escapable type (["BufferView"](https://github.com/atrick/swift-evolution/blob/fd63292839808423a5062499f588f557000c5d15/visions/language-support-for-BufferView.md)) to provide safely-unmanaged buffers via dependent lifetimes for use within a limited scope. We should add a corresponding type for validly-encoded UTF-8 contents, following the same API shape.
+
+
+### Shared-ownership buffer
+
+We could propose a managed or shared-ownership validly-encoded UTF-8 buffer. E.g.:
+
+```swift
+struct ValidlyEncodedUTF8SharedBuffer {
+  var contents: UnsafeValidUTF8BufferPointer
+  var owner: AnyObject?
+}
+```
+
+where "shared" denotes that ownership is shared with the `owner` field, as opposed to an allocation exclusively managed by this type (the way `Array` or `String` would). Thus, it could be backed by a native `String`, an instance of `Data` or `Array<UInt8>` (if ensured to be validly encoded), etc., which participate fully in their COW semantics by retaining their storage.
+
+This would enable us to create shared strings, e.g.
+
+```swift
+extension String {
+  /// Does not copy the given storage, rather shares it
+  init(sharing: ValidlyEncodedUTF8SharedBuffer)
+}
+```
+
+Also, this could allow us to present API which repairs invalid contents, since a repair operation would need to create and manage its own allocation.
+
+
+#### Alternative: More general formulation (💥🐮)
+
+We could add the more general ["deconstructed COW"](https://forums.swift.org/t/idea-bytes-literal/44124/50)
+
+```swift
+/// A buffer of `T`s in contiguous memory
+struct SharedContiguousStorage<T> {
+  var rawContents: UnsafeRawBufferPointer
+  var owner: AnyObject?
+}
+```
+
+where the choice of `Raw` pointers is necessary to avoid type-binding the memory, but other designs are possible too. 
+
+However, this type alone loses static knowledge of the UTF-8 validity, so we'd still need a separate type for validly encoded UTF-8.
+
+Instead, we could parameterize over an unsafe-buffer-pointer-like protocol:
+
+```swift
+struct SharedContiguousStorage<UnsafeBuffer: UnsafeBufferPointerProtocol> {
+  var contents: UnsafeBuffer
+  var owner: AnyObject?    
+}
+
+extension String {
+  /// Does not copy the given storage, rather shares it
+  init(sharing: SharedContiguousStorage<UnsafeValidUTF8BufferPointer>)
+}
+```
+
+Accessing the stored pointer would still need to be done carefully, as it would have lifetime dependent on `owner`. In current Swift, that would likely need to be done via a closure-taking API.
+
+
+### `protocol ContiguouslyStoredValidUTF8`
+
+We could define a protocol for validly-encoded UTF-8 bytes in contiguous memory, somewhat analogous to a low-level `StringProtocol`. Both an unsafe and a shared-ownership type could conform to provide the same API.
+
+However, we'd want to be careful to future-proof such a protocol so that a `ValidUTF8BufferView` could conform as well. In the meantime, even if we go with adding a shared-ownership type, Unicode processing operations can be performed by accessing the unsafe buffer pointer.
+
+### Extend to `Element`-based or buffer-based streams
+
+We could define a segment of validly encoded UTF-8, which is not necessarily aligned across any particular boundary. This would be a significantly different API shape than `String`'s views. Accessing the start of content would require passing in initial state and reaching the end would produce a state to be fed into the next segment. 
+
+It would make an awkward fit directly on top of `Collection`, so this would be a new API shape. For example, it could be akin to a `StatefulCollection` that in addition to having `startIndex/endIndex` would have `startState/endState`. Concerns such as bidirectionality and where exactly `endIndex` points to (the start or end of the partial value at the tail) require further thought.
+
+### Regex or regex-like support
+
+A future API addition would be support for `Regex`es on such buffers.
+
+Another future direction could be to add many routines corresponding to the underlying operations performed by the regex engine, such as:
+
+```swift
+extension UnsafeValidUTF8BufferPointer.CharacterView {
+  func matchCharacterClass(
+    _: CharacterClass,
+    startingAt: Index,
+    limitedBy: Index    
+  ) throws -> Index?
+
+  func matchQuantifiedCharacterClass(
+    _: CharacterClass,
+    _: QuantificationDescription,
+    startingAt: Index,
+    limitedBy: Index    
+  ) throws -> Index?
+}
+```
+
+which would be useful for parser-combinator libraries that wish to expose `String`'s model of Unicode by using the stdlib's accelerated implementation.
+
+### Transcoded views, normalized views, case-folded views, etc
+
+We could provide lazily transcoded, normalized, case-folded, etc., views. If we do any of these for `UnsafeValidUTF8BufferPointer`, we should consider adding equivalents on `String`, `Substring`, etc. If we were to make any new protocols or changes to protocols, we'd want to also future-proof for a `ValidUTF8BufferView`.
+
+For example, transcoded views can be generalized:
+
+```swift
+extension UnsafeValidUTF8BufferPointer {
+  /// A view of the buffer's contents as a bidirectional collection of transcoded
+  /// `Encoding.CodeUnit`s.
+  @frozen
+  public struct TranscodedView<Encoding: _UnicodeEncoding> {
+    public var buffer: UnsafeValidUTF8BufferPointer
+
+    @inlinable
+    public init(_ buffer: UnsafeValidUTF8BufferPointer)
+  }
+}
+```
+
+Note that UTF-16 has such historical significance that, even with a fully-generic transcoded view, we'd likely want a dedicated, specialized type for UTF-16.
+
+We could similarly provide lazily-normalized views of code units or scalars under NFC or NFD (which the stdlib already distributes data tables for), possibly generic via a protocol for 3rd party normal forms.
+
+Finally, case-folded functionality can be accessed in today's Swift via [scalar properties](https://developer.apple.com/documentation/swift/unicode/scalar/properties-swift.struct), but we could provide convenience collections ourselves as well.
+
+
+### UTF-8 to/from UTF-16 breadcrumbs API
+
+String's implementation caches distances between UTF-8 and UTF-16 views, as some imported Cocoa APIs use random access to the UTF-16 view. We could formalize and expose API for this.
+
+
+### `NUL`-termination concerns and C bridging
+
+`UnsafeValidUTF8BufferPointer` is capable of housing interior `NUL` characters, just like `String`. We could add additional flags and initialization options to detect a trailing `NUL` byte beyond the count and treat it as a terminator. In those cases, we could provide a `withCStringIfAvailable` style API.
+
+### Index rounding operations
+
+Unlike `String`, the `Index` types of `UnsafeValidUTF8BufferPointer`'s views are distinct, which avoids a [mess of problems](https://forums.swift.org/t/string-index-unification-vs-bidirectionalcollection-requirements/55946). Interesting additions to both `UnsafeValidUTF8BufferPointer` and `String` would be explicit index-rounding for a desired behavior.
+
+
+### Canonical Spaceships
+
+Should a `ComparisonResult` (or [spaceship](https://forums.swift.org/t/pitch-comparison-reform/5662)) be added to Swift, we could support that operation under canonical equivalence in a single pass rather than subsequent calls to `isCanonicallyEquivalent(to:)` and `isCanonicallyLessThan(_:)`.
+
+
+### Other Unicode functionality
+
+For the purposes of this pitch, we're not looking to expand the scope of functionality beyond what the stdlib already does in support of `String`'s API. Other functionality can be considered future work.

From b9f727aa9fe57959f1b588d447c55370010ed68a Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Tue, 6 Feb 2024 17:21:43 -0700
Subject: [PATCH 02/16] Header

---
 proposals/nnnn-utf-8-unsafe-contiguous.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/proposals/nnnn-utf-8-unsafe-contiguous.md b/proposals/nnnn-utf-8-unsafe-contiguous.md
index 79f1d10935..92bf3eafd4 100644
--- a/proposals/nnnn-utf-8-unsafe-contiguous.md
+++ b/proposals/nnnn-utf-8-unsafe-contiguous.md
@@ -1,7 +1,13 @@
-<!-- utf8_processing.md -->
-
 # UTF-8 Processing Over Unsafe Contiguous Bytes
 
+* Proposal: [SE-NNNN](nnnn-utf-8-unsafe-contiguous.md)
+* Authors: [Michael Ilseman](https://github.com/milseman)
+* Review Manager: TBD
+* Status: **Awaiting implementation**
+* Implementation: (pending)
+* Upcoming Feature Flag: (pending)
+* Review: ([pitch](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715))
+
 ## Introduction and Motivation
 
 Native `String`s are stored as validly-encoded UTF-8 bytes in a contiguous memory buffer. The standard library implements `String` functionality on top of this buffer, taking advantage of the validly-encoded invariant and specialized Unicode knowledge. We propose exposing this functionality as API for more advanced libraries and developers.

From 7542cc5a289d9d134c819316e1702cdbe34b2f31 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 6 May 2024 11:44:36 -0600
Subject: [PATCH 03/16] Update to be a span

---
 proposals/nnnn-utf-8-unsafe-contiguous.md | 565 ----------------
 proposals/nnnn-utf8-span.md               | 756 ++++++++++++++++++++++
 2 files changed, 756 insertions(+), 565 deletions(-)
 delete mode 100644 proposals/nnnn-utf-8-unsafe-contiguous.md
 create mode 100644 proposals/nnnn-utf8-span.md

diff --git a/proposals/nnnn-utf-8-unsafe-contiguous.md b/proposals/nnnn-utf-8-unsafe-contiguous.md
deleted file mode 100644
index 92bf3eafd4..0000000000
--- a/proposals/nnnn-utf-8-unsafe-contiguous.md
+++ /dev/null
@@ -1,565 +0,0 @@
-# UTF-8 Processing Over Unsafe Contiguous Bytes
-
-* Proposal: [SE-NNNN](nnnn-utf-8-unsafe-contiguous.md)
-* Authors: [Michael Ilseman](https://github.com/milseman)
-* Review Manager: TBD
-* Status: **Awaiting implementation**
-* Implementation: (pending)
-* Upcoming Feature Flag: (pending)
-* Review: ([pitch](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715))
-
-## Introduction and Motivation
-
-Native `String`s are stored as validly-encoded UTF-8 bytes in a contiguous memory buffer. The standard library implements `String` functionality on top of this buffer, taking advantage of the validly-encoded invariant and specialized Unicode knowledge. We propose exposing this functionality as API for more advanced libraries and developers.
-
-This pitch focuses on a portion of the broader API and functionality discussed in [Pitch: Unicode Processing APIs](https://forums.swift.org/t/pitch-unicode-processing-apis/69294). That broader pitch can be divided into 3 kinds of API additions:
-
-1. Unicode processing API for working with contiguously-stored valid UTF-8 bytes
-2. `Element`-based stream processing functionality. E.g., a stream of `UInt8` can be turned into a stream of `Unicode.Scalar` or `Character`s.
-3. Stream-of-buffers processing functionality, which provides a lower-level / more efficient implementation for the second area.
-
-This pitch focuses on the first.
-
-## Proposed Solution
-
-We propose `UnsafeValidUTF8BufferPointer`, which exposes an API surface similar to `String`'s for validly-encoded UTF-8 code units in contiguous memory.
-
-
-## Detailed Design
-
-`UnsafeValidUTF8BufferPointer` consists of a (non-optional) raw pointer and a length, with some flags bit-packed in.
-
-```swift
-/// An unsafe buffer pointer to validly-encoded UTF-8 code units stored in
-/// contiguous memory.
-///
-/// UTF-8 validity is checked upon creation.
-///
-/// `UnsafeValidUTF8BufferPointer` does not manage the memory or guarantee
-/// memory safety. Any overlapping writes into the memory can lead to undefined 
-/// behavior.
-///
-@frozen
-public struct UnsafeValidUTF8BufferPointer {
-  @usableFromInline
-  internal var _baseAddress: UnsafeRawPointer
-
-  // A bit-packed count and flags (such as isASCII)
-  @usableFromInline
-  internal var _countAndFlags: UInt64
-}
-```
-
-It differs from `UnsafeRawBufferPointer` in that its contents, upon construction, are guaranteed to be validly-encoded UTF-8. This guarantee speeds up processing significantly relative to performing validation on every read. It is unsafe because it is an API surface on top of `UnsafeRawPointer`: it inherits all the unsafety therein, and developers must manually guarantee invariants such as lifetimes and exclusivity. It is further based on `UnsafeRawPointer` instead of `UnsafePointer<UInt8>` so as not to [bind memory to a type](https://developer.apple.com/documentation/swift/unsaferawpointer#Typed-Memory).
-
-
-### Validation and creation
-
-`UnsafeValidUTF8BufferPointer` is validated at initialization time, and encoding errors are thrown.
-
-```swift
-extension Unicode.UTF8 {
-  @frozen
-  public enum EncodingErrorKind: Error {
-    case unexpectedContinuationByte
-    case expectedContinuationByte
-    case overlongEncoding
-    case invalidCodePoint
-
-    case invalidStarterByte
-
-    case unexpectedEndOfInput
-  }
-}
-```
-
-```swift
-// All the initializers below are `throw`ing, as they validate the contents
-// upon construction.
-extension UnsafeValidUTF8BufferPointer {
-  @frozen
-  public struct DecodingError: Error, Sendable, Hashable, Codable {
-    public var kind: UTF8.EncodingErrorKind
-    public var offsets: Range<Int>
-  }
-
-  // ABI traffics in `Result`
-  @usableFromInline
-  internal static func _validate(
-    baseAddress: UnsafeRawPointer, length: Int
-  ) -> Result<UnsafeValidUTF8BufferPointer, DecodingError>
-
-  @_alwaysEmitIntoClient
-  public init(baseAddress: UnsafeRawPointer, length: Int) throws(DecodingError)
-
-  @_alwaysEmitIntoClient
-  public init(nulTerminatedCString: UnsafeRawPointer) throws(DecodingError)
-
-  @_alwaysEmitIntoClient
-  public init(nulTerminatedCString: UnsafePointer<CChar>) throws(DecodingError)
-
-  @_alwaysEmitIntoClient
-  public init(_: UnsafeRawBufferPointer) throws(DecodingError)
-
-  @_alwaysEmitIntoClient
-  public init(_: UnsafeBufferPointer<UInt8>) throws(DecodingError)
-}
-```
-
-#### Unsafety and encoding validity
-
-Every way to construct an `UnsafeValidUTF8BufferPointer` ensures that its contents are validly-encoded UTF-8. Thus, it has no new source of unsafety beyond the unsafety inherent in unsafe pointers' requirement that lifetime and exclusive access be manually enforced by the programmer. A write into this memory which violates encoding validity would also violate exclusivity.
-
-If we did not guarantee UTF-8 encoding validity, we'd be open to new security and safety concerns beyond unsafe pointers.
-
-With invalidly-encoded contents, memory safety would become more nuanced. An ill-formed leading byte can dictate a scalar length that is longer than the memory buffer. The bounds associated with the buffer may then differ from the bounds dictated by its contents.
-
-Additionally, a particular scalar value in valid UTF-8 has only one encoding, but invalid UTF-8 could have the same value encoded as an [overlong encoding](https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings), which would compromise code that checks for the presence of a scalar value by looking at the encoded bytes (or that does a byte-wise comparison).
-
-`UnsafeValidUTF8BufferPointer` is unsafe in all the ways that unsafe pointers are unsafe, but not in more ways.
-
-
-### Accessing contents
-
-Flags and raw contents can be accessed:
-
-```swift
-extension UnsafeValidUTF8BufferPointer {
-  /// Returns whether the validated contents were all-ASCII. This is checked at
-  /// initialization time and remembered.
-  @inlinable
-  public var isASCII: Bool
-
-  /// Access the underlying raw bytes
-  @inlinable
-  public var rawBytes: UnsafeRawBufferPointer
-}
-```
-
-Like `String`, `UnsafeValidUTF8BufferPointer` provides views for accessing `Unicode.Scalar`s, `UTF16.CodeUnit`s, and `Character`s.
-
-```swift
-extension UnsafeValidUTF8BufferPointer {
-  /// A view of the buffer's contents as a bidirectional collection of `Unicode.Scalar`s.
-  @frozen
-  public struct UnicodeScalarView {
-    public var buffer: UnsafeValidUTF8BufferPointer
-
-    @inlinable
-    public init(_ buffer: UnsafeValidUTF8BufferPointer)
-  }
-
-  @inlinable
-  public var unicodeScalars: UnicodeScalarView
-
-  /// A view of the buffer's contents as a bidirectional collection of `Character`s.
-  @frozen
-  public struct CharacterView {
-    public var buffer: UnsafeValidUTF8BufferPointer
-
-    @inlinable
-    public init(_ buffer: UnsafeValidUTF8BufferPointer)
-  }
-
-  @inlinable
-  public var characters: CharacterView
-
-  /// A view of the buffer's contents as a bidirectional collection of transcoded
-  /// `UTF16.CodeUnit`s.
-  @frozen
-  public struct UTF16View {
-    public var buffer: UnsafeValidUTF8BufferPointer
-
-    @inlinable
-    public init(_ buffer: UnsafeValidUTF8BufferPointer)
-  }
-
-  @inlinable
-  public var utf16: UTF16View
-}
-```
-
-These are bidirectional collections, as in `String`. Their indices, however, are distinct from each other because they mean different things. For example, a scalar-view index is scalar aligned but not necessarily `Character` aligned, and a transcoded index which points mid-scalar doesn't have a corresponding position in the raw bytes.
-
-```swift
-extension UnsafeValidUTF8BufferPointer.UnicodeScalarView: BidirectionalCollection {
-  public typealias Element = Unicode.Scalar
-
-  @frozen
-  public struct Index: Comparable, Hashable {
-    @usableFromInline
-    internal var _byteOffset: Int
-
-    @inlinable
-    public var byteOffset: Int { get }
-
-    @inlinable
-    public static func < (lhs: Self, rhs: Self) -> Bool
-
-    @inlinable
-    internal init(_uncheckedByteOffset offset: Int)
-  }
-
-  @inlinable
-  public subscript(position: Index) -> Element { _read }
-
-  @inlinable
-  public func index(after i: Index) -> Index
-
-  @inlinable
-  public func index(before i: Index) -> Index
-
-  @inlinable
-  public var startIndex: Index
-
-  @inlinable
-  public var endIndex: Index
-}
-
-
-extension UnsafeValidUTF8BufferPointer.CharacterView: BidirectionalCollection {
-  public typealias Element = Character
-
-  @frozen
-  public struct Index: Comparable, Hashable {
-    @usableFromInline
-    internal var _byteOffset: Int
-
-    @inlinable
-    public var byteOffset: Int { get }
-
-    @inlinable
-    public static func < (lhs: Self, rhs: Self) -> Bool
-
-    @inlinable
-    internal init(_uncheckedByteOffset offset: Int)
-  }
-
-  // Custom-defined for performance to avoid double-measuring
-  // grapheme cluster length
-  @frozen
-  public struct Iterator: IteratorProtocol {
-    @usableFromInline
-    internal var _buffer: UnsafeValidUTF8BufferPointer
-
-    @usableFromInline
-    internal var _position: Index
-
-    @inlinable
-    public var buffer: UnsafeValidUTF8BufferPointer { get }
-
-    @inlinable
-    public var position: Index { get }
-
-    public typealias Element = Character
-
-    public mutating func next() -> Character?
-
-    @inlinable
-    internal init(
-      _buffer: UnsafeValidUTF8BufferPointer, _position: Index
-    )
-  }
-
-  @inlinable
-  public func makeIterator() -> Iterator
-
-  @inlinable
-  public subscript(position: Index) -> Element { _read }
-
-  @inlinable
-  public func index(after i: Index) -> Index
-
-  @inlinable
-  public func index(before i: Index) -> Index
-
-  @inlinable
-  public var startIndex: Index
-
-  @inlinable
-  public var endIndex: Index
-}
-
-extension UnsafeValidUTF8BufferPointer.UTF16View: BidirectionalCollection {
-  public typealias Element = Unicode.Scalar
-
-  @frozen
-  public struct Index: Comparable, Hashable {
-    // Bitpacked byte offset and transcoded offset
-    @usableFromInline
-    internal var _byteOffsetAndTranscodedOffset: UInt64
-
-    /// Offset of the first byte of the currently-indexed scalar
-    @inlinable
-    public var byteOffset: Int { get }
-
-    /// Offset of the transcoded code unit within the currently-indexed scalar
-    @inlinable
-    public var transcodedOffset: Int { get }
-
-    @inlinable
-    public static func < (lhs: Self, rhs: Self) -> Bool
-
-    @inlinable
-    internal init(
-      _uncheckedByteOffset offset: Int, _transcodedOffset: Int
-    )
-  }
-
-  @inlinable
-  public subscript(position: Index) -> Element { _read }
-
-  @inlinable
-  public func index(after i: Index) -> Index
-
-  @inlinable
-  public func index(before i: Index) -> Index
-
-  @inlinable
-  public var startIndex: Index
-
-  @inlinable
-  public var endIndex: Index
-}
-```
-
-### Canonical equivalence
-
-```swift
-// Canonical equivalence
-extension UnsafeValidUTF8BufferPointer {
-  /// Whether `self` is equivalent to `other` under Unicode Canonical Equivalance
-  public func isCanonicallyEquivalent(
-    to other: UnsafeValidUTF8BufferPointer
-  ) -> Bool
-
-  /// Whether `self` orders less than `other` (under Unicode Canonical Equivalance
-  /// using normalized code-unit order)
-  public func isCanonicallyLessThan(
-    _ other: UnsafeValidUTF8BufferPointer
-  ) -> Bool
-}
-```
-
-
-
-## Alternatives Considered
-
-### Other names
-
-We're not particularly attached to the name `UnsafeValidUTF8BufferPointer`. Other names could include:
-
-- `UnsafeValidUTF8CodeUnitBufferPointer`
-- `UTF8.UnsafeValidBufferPointer`
-- `UTF8.UnsafeValidCodeUnitBufferPointer`
-- `UTF8.ValidlyEncodedCodeUnitUnsafeBufferPointer`
-- `UnsafeContiguouslyStoredValidUTF8CodeUnitsBuffer`
-
-etc.
-
-For `isCanonicallyLessThan`, another name could be `canonicallyPrecedes`, `lexicographicallyPrecedesUnderNFC`, etc.
-
-### Static methods instead of initializers
-
-`UnsafeValidUTF8BufferPointer`s could instead be created by static methods on `UTF8`:
-
-```swift
-extension Unicode.UTF8 {
-  static func validate(
-    ...
-  ) throws -> UnsafeValidUTF8BufferPointer
-}
-```
-
-### Hashable and other conformances
-
-`UnsafeValidUTF8BufferPointer` follows `UnsafeRawBufferPointer` and `UnsafeBufferPointer` in not conforming to `Sendable`, `Hashable`, `Equatable`, `Comparable`, `Codable`, etc.
-
-### `UTF8.EncodingErrorKind` as a `struct`
-
-We may want to use the [raw-representable struct pattern](https://github.com/apple/swift-system/blob/9a812b5fef1e7f27f8594fee5463bd88c5b691ec/Sources/System/Errno.swift#L14) for `UTF8.EncodingErrorKind` instead of an exhaustive enum. That is, we may want to define it as:
-
-```swift
-extension Unicode.UTF8 {
-  @frozen
-  public struct EncodingErrorKind: Error, Sendable, Hashable, Codable {
-    public var rawValue: UInt8
-
-    @inlinable
-    public init(rawValue: UInt8) {
-      self.rawValue = rawValue
-    }
-
-    @inlinable
-    public static var unexpectedContinuationByte: Self {
-      .init(rawValue: 0x01)
-    }
-
-    @inlinable
-    public static var overlongEncoding: Self {
-      .init(rawValue: 0x02)
-    }
-
-    // ...
-  }
-}
-```
-
-This would allow us to grow the kinds or errors or else add some error-nuance to the future, at the loss of exhaustive switches inside `catch`es.
-
-For example, an unexpected-end-of-input error, which happens when a scalar is in the process of being decoded but not enough bytes have been read, could be reported in different ways. It could be reported as a distinct kind of error (particularly useful for stream processing which may want to resume with more content) or it could be a `expectedContinuationByte` covering the end-of-input position. As a value, it could have a distinct value or be an alias to the same value.
-
-
-
-
-## Future Directions
-
-### A non-escapable `ValidUTF8BufferView`
-
-Future improvements to Swift enable a non-escapable type (["BufferView"](https://github.com/atrick/swift-evolution/blob/fd63292839808423a5062499f588f557000c5d15/visions/language-support-for-BufferView.md)) to provide safely-unmanaged buffers via dependent lifetimes for use within a limited scope. We should add a corresponding type for validly-encoded UTF-8 contents, following the same API shape.
-
-
-### Shared-ownership buffer
-
-We could propose a managed or shared-ownership validly-encoded UTF-8 buffer. E.g.:
-
-```swift
-struct ValidlyEncodedUTF8SharedBuffer {
-  var contents: UnsafeValidlyEncodedUTF8BufferPointer
-  var owner: AnyObject?
-}
-```
-
-where "shared" denotes that ownership is shared with the `owner` field, as opposed to an allocation exclusively managed by this type (the way `Array` or `String` would). Thus, it could be backed by a native `String`, an instance of `Data` or `Array<UInt8>` (if ensured to be validly encoded), etc., which participate fully in their COW semantics by retaining their storage.
-
-This would enable us to create shared strings, e.g.
-
-```swift
-extension String {
-  /// Does not copy the given storage, rather shares it
-  init(sharing: ValidlyEncodedUTF8SharedBuffer)
-}
-```
-
-Also, this could allow us to present API which repairs invalid contents, since a repair operation would need to create and manage its own allocation.
-
-
-#### Alternative: More general formulation (💥🐮)
-
-We could add the more general ["deconstructed COW"](https://forums.swift.org/t/idea-bytes-literal/44124/50)
-
-```swift
-/// A buffer of `T`s in contiguous memory
-struct SharedContiguousStorage<T> {
-  var rawContents: UnsafeRawBufferPointer
-  var owner: AnyObject?
-}
-```
-
-where the choice of `Raw` pointers is necessary to avoid type-binding the memory, but other designs are possible too. 
-
-However, this type alone loses static knowledge of the UTF-8 validity, so we'd still need a separate type for validly encoded UTF-8.
-
-Instead, we could parameterize over a unsafe-buffer-pointer-like protocol:
-
-```swift
-struct SharedContiguousStorage<UnsafeBuffer: UnsafeBufferPointerProtocol> {
-  var contents: UnsafeBuffer
-  var owner: AnyObject?    
-}
-
-extension String {
-  /// Does not copy the given storage, rather shares it
-  init(sharing: SharedContiguousStorage<UnsafeValidUTF8BufferPointer>)
-}
-```
-
-Accessing the stored pointer would still need to be done carefully, as it would have lifetime dependent on `owner`. In current Swift, that would likely need to be done via a closure-taking API.
-
-
-### `protocol ContiguouslyStoredValidUTF8`
-
-We could define a protocol for validly-encoded UTF-8 bytes in contiguous memory, somewhat analogous to a low-level `StringProtocol`. Both an unsafe and a shared-ownership type could conform to provide the same API.
-
-However, we'd want to be careful to future-proof such a protocol so that a  `ValidUTF8BufferView` could conform as well. In the mean-time, even if we go with adding a shared-ownership type, Unicode processing operations can be performed by accessing the unsafe buffer pointer.
-
-### Extend to `Element`-based or buffer-based streams
-
-We could define a segment of validly encoded UTF-8, which is not necessarily aligned across any particular boundary. This would be a significantly different API shape than `String`'s views. Accessing the start of content would require passing in initial state and reaching the end would produce a state to be fed into the next segment. 
-
-It would make an awkward fit directly on top of `Collection`, so this would be a new API shape. For example, it could be akin to a `StatefulCollection` that in addition to having `startIndex/endIndex` would have `startState/endState`. Concerns such as bidirectionality, where exactly `endIndex` points to (the start or end of the partial value at the tail), etc, requires further thought.
-
-### Regex or regex-like support
-
-Future API additions would be to support `Regex`es on such buffers. 
-
-Another future direction could be to add many routines corresponding to the underlying operations performed by the regex engine, such as:
-
-```swift
-extension UnsafeValidUTF8BufferPointer.CharacterView {
-  func matchCharacterClass(
-    _: CharacterClass,
-    startingAt: Index,
-    limitedBy: Index    
-  ) throws -> Index?
-
-  func matchQuantifiedCharacterClass(
-    _: CharacterClass,
-    _: QuantificationDescription,
-    startingAt: Index,
-    limitedBy: Index    
-  ) throws -> Index?
-}
-```
-
-which would be useful for parser-combinator libraries who wish to expose `String`'s model of Unicode by using the stdlib's accelerated implementation.
-
-### Transcoded views, normalized views, case-folded views, etc
-
-We could provide lazily transcoded, normalized, case-folded, etc., views. If we do any of these for `UnsafeValidUTF8BufferPointer`, we should consider adding equivalents on `String`, `Substring`, etc. If we were to make any new protocols or changes to protocols, we'd want to also future-proof for a `ValidUTF8BufferView`.
-
-For example, transcoded views can be generalized:
-
-```swift
-extension UnsafeValidUTF8BufferPointer {
-  /// A view off the buffer's contents as a bidirectional collection of transcoded
-  /// `Encoding.CodeUnit`s.
-  @frozen
-  public struct TranscodedView<Encoding: _UnicodeEncoding> {
-    public var buffer: UnsafeValidUTF8BufferPointer
-
-    @inlinable
-    public init(_ buffer: UnsafeValidUTF8BufferPointer)
-  }
-}
-```
-
-Note that since UTF-16 has such historical significance that even with a fully-generic transcoded view, we'd likely want a dedicated, specialized type for UTF-16.
-
-We could similarly provide lazily-normalized views of code units or scalars under NFC or NFD (which the stdlib already distributes data tables for), possibly generic via a protocol for 3rd party normal forms.
-
-Finally, case-folded functionality can be accessed in today's Swift via [scalar properties](https://developer.apple.com/documentation/swift/unicode/scalar/properties-swift.struct), but we could provide convenience collections ourselves as well.
-
-
-### UTF-8 to/from UTF-16 breadcrumbs API
-
-String's implementation caches distances between UTF-8 and UTF-16 views, as some imported Cocoa APIs use random access to the UTF-16 view. We could formalize and expose API for this.
-
-
-### `NUL`-termination concerns and C bridging
-
-`UnsafeValidUTF8BufferPointer` is capable of housing interior `NUL` characters, just like `String`. We could add additional flags and initialization options to detect a trailing `NUL` byte beyond the count and treat it as a terminator. In those cases, we could provide a `withCStringIfAvailable` style API.
-
-### Index rounding operations
-
-Unlike String, `UnsafeValidUTF8BufferPointer`'s view's `Index` types are distinct, which avoids a [mess of problems](https://forums.swift.org/t/string-index-unification-vs-bidirectionalcollection-requirements/55946). Interesting additions to both `UnsafeValidUTF8BufferPointer` and `String` would be explicit index-rounding for a desired behavior.
-
-
-### Canonical Spaceships
-
-Should a `ComparisonResult` (or [spaceship](https://forums.swift.org/t/pitch-comparison-reform/5662)) be added to Swift, we could support that operation under canonical equivalence in a single pass rather than subsequent calls to `isCanonicallyEquivalent(to:)` and `isCanonicallyLessThan(_:)`.
-
-
-### Other Unicode functionality
-
-For the purposes of this pitch, we're not looking to expand the scope of functionality beyond what the stdlib already does in support of `String`'s API. Other functionality can be considered future work.
diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
new file mode 100644
index 0000000000..84e8c5d0fd
--- /dev/null
+++ b/proposals/nnnn-utf8-span.md
@@ -0,0 +1,756 @@
+# Safe Access to Contiguous UTF-8 Storage
+
+* Proposal: [SE-NNNN](nnnn-utf8-span.md)
+* Authors: [Michael Ilseman](https://github.com/milseman), [Guillaume Lessard](https://github.com/glessard)
+* Review Manager: TBD
+* Status: **Awaiting implementation**
+* Bug: rdar://48132971, rdar://96837923
+* Implementation: (pending)
+* Upcoming Feature Flag: (pending)
+* Review: ([pitch 1](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715))
+
+## Introduction
+
+We introduce `UTF8Span` for efficient and safe Unicode processing over contiguous storage.
+
+Native `String`s are stored as validly-encoded UTF-8 bytes in an internal contiguous memory buffer. The standard library implements `String`'s API as internal methods which operate on top of this buffer, taking advantage of the validly-encoded invariant and specialized Unicode knowledge. We propose making this UTF-8 buffer and its methods public as API for more advanced libraries and developers.
+
+## Motivation
+
+Currently, if a developer wants to do `String`-like processing over UTF-8 bytes, they have to make an instance of `String`, which allocates a native storage class and copies all the bytes. The developer would then need to operate within the new `String`'s views and map between `String.Index` and byte offsets in the original buffer.
+
+For example, if these bytes were part of a data structure, the developer would need to decide to either cache such a new `String` instance or recreate it on the fly. Caching more than doubles the size and adds caching complexity. Recreating it on the fly adds a linear time factor and class instance allocation/deallocation.
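+
+As a rough illustration of that status quo, the sketch below finds the byte offset of a `Character` by first copying the bytes into a `String`. The helper name `byteOffset(of:in:)` is hypothetical and only for illustration; it uses nothing beyond today's `String` API, and the resulting offset is only meaningful if the input was already valid UTF-8, since `String(decoding:as:)` repairs invalid bytes.
+
+```swift
+/// Hypothetical helper: locate `target` in `bytes` by going through `String`.
+func byteOffset(of target: Character, in bytes: [UInt8]) -> Int? {
+  // Copies (and, if needed, repairs) every byte into new `String` storage.
+  let copy = String(decoding: bytes, as: UTF8.self)
+  guard let i = copy.firstIndex(of: target) else { return nil }
+  // Map the `String.Index` back to an offset into the UTF-8 code units.
+  return copy.utf8.distance(from: copy.startIndex, to: i)
+}
+
+let bytes = Array("na\u{EF}ve caf\u{E9}".utf8)
+if let offset = byteOffset(of: "\u{E9}", in: bytes) {
+  print("found at byte offset \(offset)")
+}
+```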
+
+Furthermore, `String` may not be available on all embedded platforms because its conformances to `Comparable` and `Collection` depend on data tables bundled with the stdlib. `UTF8Span` is a more appropriate type for these platforms, and only some explicit API make use of data tables.
+
+
+
+### UTF-8 validity and efficiency
+
+UTF-8 validation is a particularly common concern and the subject of a fair amount of [research](https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/). Once an input is known to be validly encoded UTF-8, subsequent operations such as decoding, grapheme breaking, comparison, etc., can be implemented much more efficiently under this assumption of validity. Swift's `String` type's native storage is guaranteed-valid-UTF8 for this reason.
+
+Failure to guarantee UTF-8 encoding validity creates security and safety concerns. With invalidly-encoded contents, memory safety would become more nuanced. An ill-formed leading byte can dictate a scalar length that is longer than the memory buffer. The buffer may have bounds associated with it that differ from the bounds dictated by its contents.
+
+Additionally, a particular scalar value in valid UTF-8 has only one encoding, but invalid UTF-8 could have the same value encoded as an [overlong encoding](https://en.wikipedia.org/wiki/UTF-8#Overlong_encodings), which would compromise code that checks for the presence of a scalar value by looking at the encoded bytes (or that does a byte-wise comparison).
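+
+The overlong-encoding hazard can be sketched with the two-byte sequence `0xC0 0xAF`, an overlong form of `/` (U+002F). The lax decoder `laxDecodeTwoByte` below is a hypothetical stand-in for code that skips validation; it is not part of this proposal or of the stdlib.
+
+```swift
+/// Hypothetical lax decoder: combines payload bits without any validation.
+func laxDecodeTwoByte(_ b0: UInt8, _ b1: UInt8) -> UInt32 {
+  (UInt32(b0 & 0x1F) << 6) | UInt32(b1 & 0x3F)
+}
+
+let overlongSlash: [UInt8] = [0xC0, 0xAF]
+
+// A byte-wise scan for "/" (0x2F) does not see it...
+let byteScanFindsSlash = overlongSlash.contains(0x2F)
+
+// ...yet a decoder that skips validation produces U+002F.
+let laxValue = laxDecodeTwoByte(overlongSlash[0], overlongSlash[1])
+
+// `String(decoding:as:)` validates and repairs, so no "/" appears.
+let repaired = String(decoding: overlongSlash, as: UTF8.self)
+let repairedHasSlash = repaired.unicodeScalars.contains("/")
+
+print(byteScanFindsSlash, laxValue == 0x2F, repairedHasSlash)
+```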
+
+
+## Proposed solution
+
+We propose a non-escapable `UTF8Span` which exposes a similar API surface as `String` for validly-encoded UTF-8 code units in contiguous memory.
+
+
+## Detailed design
+
+`UTF8Span` is a borrowed view into contiguous memory containing validly-encoded UTF-8 code units.
+
+```swift
+@frozen
+public struct UTF8Span: Copyable, ~Escapable {
+  @usableFromInline
+  internal var _start: Index
+
+  /*
+   A bit-packed count and flags (such as isASCII)
+
+   ┌───────┬──────────┬───────┐
+   │ b63   │ b62:56   │ b55:0 │
+   ├───────┼──────────┼───────┤
+   │ ASCII │ reserved │ count │
+   └───────┴──────────┴───────┘
+
+   Future bits could be used for an all-<0x300-scalar (aka all-<0xCC-byte)
+   flag, which denotes the quickest NFC check, a quickCheck NFC
+   flag (using Unicode data tables), a full-check NFC flag,
+   single-scalar-grapheme-clusters flag, etc.
+
+   */
+  @usableFromInline
+  internal var _countAndFlags: UInt64
+}
+```
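+
+A minimal sketch of how such a bit-packed field could be read and written, assuming the ASCII flag lives in bit 63 and the count in the low 56 bits. The type name `SketchCountAndFlags` and its members are hypothetical and only mirror the layout comment; the actual implementation may differ.
+
+```swift
+/// Hypothetical model of a bit-packed count-and-flags word.
+struct SketchCountAndFlags {
+  var rawValue: UInt64
+
+  static let asciiBit: UInt64 = 1 << 63
+  static let countMask: UInt64 = (1 << 56) - 1
+
+  init(count: Int, isASCII: Bool) {
+    // The count must fit in the low 56 bits.
+    precondition(count >= 0 && UInt64(count) <= Self.countMask)
+    rawValue = UInt64(count) | (isASCII ? Self.asciiBit : 0)
+  }
+
+  /// Number of UTF-8 code units, read from the low 56 bits.
+  var count: Int { Int(rawValue & Self.countMask) }
+
+  /// Whether the ASCII flag (bit 63) is set.
+  var isASCII: Bool { rawValue & Self.asciiBit != 0 }
+}
+
+let packed = SketchCountAndFlags(count: 12, isASCII: true)
+print(packed.count, packed.isASCII)
+```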
+
+### Creation and validation
+
+`UTF8Span` is validated at initialization time, and encoding errors are diagnosed and thrown.
+
+```swift
+extension Unicode.UTF8 {
+  /// The kind of encoding error encountered during validation
+  @frozen
+  public struct EncodingErrorKind: Error, Sendable, Hashable, Codable {
+    public var rawValue: UInt8
+
+    @inlinable
+    public init(rawValue: UInt8)
+
+    @_alwaysEmitIntoClient
+    public static var unexpectedContinuationByte: Self { get }
+
+    @_alwaysEmitIntoClient
+    public static var overlongEncoding: Self { get }
+
+    @_alwaysEmitIntoClient
+    public static var invalidCodePoint: Self { get }
+  }
+}
+```
+
+**TODO**: Check all the kinds of errors we'd like to diagnose. Since this is a `RawRepresentable` struct, we can still extend it with a (finite) number of error kinds in the future.
+
+```swift
+extension UTF8Span {
+  /// The kind and location of invalidly-encoded UTF-8 bytes
+  @frozen
+  public struct EncodingError: Error, Sendable, Hashable, Codable {
+    /// The kind of encoding error
+    public var kind: Unicode.UTF8.EncodingErrorKind
+
+    /// The range of offsets into our input containing the error
+    public var range: Range<Int>
+  }
+
+  public init(
+    validating codeUnits: Span<UInt8>
+  ) throws(EncodingError) -> dependsOn(codeUnits) Self
+
+  public init<Owner: ~Copyable & ~Escapable>(
+    nulTerminatedCString: UnsafeRawPointer,
+    owner: borrowing Owner
+  ) throws(EncodingError) -> dependsOn(owner) Self
+
+  public init<Owner: ~Copyable & ~Escapable>(
+    nulTerminatedCString: UnsafePointer<CChar>,
+    owner: borrowing Owner
+  ) throws(EncodingError) -> dependsOn(owner) Self
+}
+```
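+
+To make the kind-plus-range error shape concrete, here is a simplified validator over `[UInt8]`. Everything in it is an assumption for illustration: the `Sketch`-prefixed types, the extra `truncatedScalar` kind, and the choice of which bytes each error range covers are not specified by this proposal, and a real implementation would be considerably faster.
+
+```swift
+enum SketchErrorKind {
+  case unexpectedContinuationByte, overlongEncoding, invalidCodePoint
+  case truncatedScalar  // hypothetical kind, not listed in the proposal
+}
+
+struct SketchEncodingError: Error, Equatable {
+  var kind: SketchErrorKind
+  var range: Range<Int>
+}
+
+/// Returns whether the input is all ASCII; throws on the first encoding error.
+func validateUTF8(_ bytes: [UInt8]) throws -> Bool {
+  var isASCII = true
+  var i = 0
+  while i < bytes.count {
+    let b0 = bytes[i]
+    if b0 < 0x80 { i += 1; continue }
+    isASCII = false
+    let length: Int
+    let minValue: UInt32  // smallest scalar that needs `length` bytes
+    var value: UInt32
+    switch b0 {
+    case 0x80...0xBF:
+      throw SketchEncodingError(
+        kind: .unexpectedContinuationByte, range: i..<(i + 1))
+    case 0xC0...0xDF: length = 2; minValue = 0x80; value = UInt32(b0 & 0x1F)
+    case 0xE0...0xEF: length = 3; minValue = 0x800; value = UInt32(b0 & 0x0F)
+    case 0xF0...0xF7: length = 4; minValue = 0x1_0000; value = UInt32(b0 & 0x07)
+    default:
+      throw SketchEncodingError(kind: .invalidCodePoint, range: i..<(i + 1))
+    }
+    for k in 1..<length {
+      // Each trailing byte must exist and look like 0b10xx_xxxx.
+      guard i + k < bytes.count, bytes[i + k] & 0xC0 == 0x80 else {
+        throw SketchEncodingError(kind: .truncatedScalar, range: i..<(i + k))
+      }
+      value = (value << 6) | UInt32(bytes[i + k] & 0x3F)
+    }
+    let range = i..<(i + length)
+    if value < minValue {
+      throw SketchEncodingError(kind: .overlongEncoding, range: range)
+    }
+    if value > 0x10_FFFF || (0xD800...0xDFFF).contains(value) {
+      throw SketchEncodingError(kind: .invalidCodePoint, range: range)
+    }
+    i += length
+  }
+  return isASCII
+}
+
+do {
+  _ = try validateUTF8([0x2F, 0xC0, 0xAF])
+} catch {
+  print(error)
+}
+```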
+
+### Views
+
+Similarly to `String`, `UTF8Span` exposes different ways to view the UTF-8 contents.
+
+`UTF8Span.UnicodeScalarView` corresponds to `String.UnicodeScalarView` for read-only purposes; however, it is not `RangeReplaceable`, as `UTF8Span` provides read-only access. Similarly, `UTF8Span.CharacterView` corresponds to `String`'s character view (i.e. its default view), `UTF8Span.UTF16View` to `String.UTF16View`, and `UTF8Span.CodeUnits` to `String.UTF8View`.
+
+```swift
+extension UTF8Span {
+  public typealias CodeUnits = Span<UInt8>
+
+  @inlinable
+  public var codeUnits: CodeUnits { get }
+
+  @frozen
+  public struct UnicodeScalarView: ~Escapable {
+    public let span: UTF8Span
+
+    @inlinable
+    public init(_ span: UTF8Span)
+  }
+
+  @inlinable
+  public var unicodeScalars: UnicodeScalarView { _read }
+
+  @frozen
+  public struct CharacterView: ~Escapable {
+    public let span: UTF8Span
+
+    @inlinable
+    public init(_ span: UTF8Span)
+  }
+
+  @inlinable
+  public var characters: CharacterView { _read }
+
+  @frozen
+  public struct UTF16View: ~Escapable {
+    public let span: UTF8Span
+
+    @inlinable
+    public init(_ span: UTF8Span)
+  }
+
+  @inlinable
+  public var utf16: UTF16View { _read }
+}
+```
+
+**TODO**: `_read` vs `get`? `@inlinable` vs `@_alwaysEmitIntoClient`?
+
+##### `Collection`-like API:
+
+Like `Span`, `UTF8Span` provides index and `Collection`-like API:
+
+
+```swift
+extension UTF8Span {
+  public typealias Index = RawSpan.Index
+}
+
+extension UTF8Span.UnicodeScalarView {
+  @frozen
+  public struct Index: Comparable, Hashable {
+    public var position: UTF8Span.Index
+
+    @inlinable
+    public init(_ position: UTF8Span.Index)
+
+    @inlinable
+    public static func < (
+      lhs: UTF8Span.UnicodeScalarView.Index,
+      rhs: UTF8Span.UnicodeScalarView.Index
+    ) -> Bool
+  }
+
+  public typealias Element = Unicode.Scalar
+
+  @frozen
+  public struct Iterator: ~Escapable {
+    public typealias Element = Unicode.Scalar
+
+    public let span: UTF8Span
+
+    public var position: UTF8Span.Index
+
+    @inlinable
+    init(_ span: UTF8Span)
+
+    @inlinable
+    public mutating func next() -> Unicode.Scalar?
+  }
+
+  @inlinable
+  public borrowing func makeIterator() -> Iterator
+
+  @inlinable
+  public var startIndex: Index { get }
+
+  @inlinable
+  public var endIndex: Index { get }
+
+  @inlinable
+  public var count: Int { get }
+
+  @inlinable
+  public var isEmpty: Bool { get }
+
+  @inlinable
+  public var indices: Range<Index> { get }
+
+  @inlinable
+  public func index(after i: Index) -> Index
+
+  @inlinable
+  public func index(before i: Index) -> Index
+
+  @inlinable
+  public func index(
+    _ i: Index, offsetBy distance: Int, limitedBy limit: Index
+  ) -> Index?
+
+  @inlinable
+  public func formIndex(after i: inout Index)
+
+  @inlinable
+  public func formIndex(before i: inout Index)
+
+  @inlinable
+  public func index(_ i: Index, offsetBy distance: Int) -> Index
+
+  @inlinable
+  public func formIndex(_ i: inout Index, offsetBy distance: Int)
+
+  @inlinable
+  public func formIndex(
+    _ i: inout Index, offsetBy distance: Int, limitedBy limit: Index
+  ) -> Bool
+
+  @inlinable
+  public subscript(position: Index) -> Element { borrowing _read }
+
+  @inlinable
+  public subscript(unchecked position: Index) -> Element { 
+    borrowing _read
+  }
+
+  @inlinable
+  public subscript(bounds: Range<Index>) -> Self { get }
+
+  @inlinable
+  public subscript(unchecked bounds: Range<Index>) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(bounds: some RangeExpression<Index>) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(
+    unchecked bounds: some RangeExpression<Index>
+  ) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(x: UnboundedRange) -> Self {
+    borrowing get
+  }
+
+  @inlinable
+  public func distance(from start: Index, to end: Index) -> Int
+
+  @inlinable
+  public func elementsEqual(_ other: Self) -> Bool
+
+  @inlinable
+  public func elementsEqual(_ other: some Sequence<Element>) -> Bool
+}
+
+extension UTF8Span.CharacterView {
+  @frozen
+  public struct Index: Comparable, Hashable {
+    public var position: UTF8Span.Index
+
+    @inlinable
+    public init(_ position: UTF8Span.Index)
+
+    @inlinable
+    public static func < (
+      lhs: UTF8Span.CharacterView.Index,
+      rhs: UTF8Span.CharacterView.Index
+    ) -> Bool
+  }
+
+  public typealias Element = Character
+
+  @frozen
+  public struct Iterator: ~Escapable {
+    public typealias Element = Character
+
+    public let span: UTF8Span
+
+    public var position: UTF8Span.Index
+
+    @inlinable
+    init(_ span: UTF8Span)
+
+    @inlinable
+    public mutating func next() -> Character?
+  }
+
+  @inlinable
+  public borrowing func makeIterator() -> Iterator
+
+  @inlinable
+  public var startIndex: Index { get }
+
+  @inlinable
+  public var endIndex: Index { get }
+
+  @inlinable
+  public var count: Int { get }
+
+  @inlinable
+  public var isEmpty: Bool { get }
+
+  @inlinable
+  public var indices: Range<Index> { get }
+
+  @inlinable
+  public func index(after i: Index) -> Index
+
+  @inlinable
+  public func index(before i: Index) -> Index
+
+  @inlinable
+  public func index(
+    _ i: Index, offsetBy distance: Int, limitedBy limit: Index
+  ) -> Index?
+
+  @inlinable
+  public func formIndex(after i: inout Index)
+
+  @inlinable
+  public func formIndex(before i: inout Index)
+
+  @inlinable
+  public func index(_ i: Index, offsetBy distance: Int) -> Index
+
+  @inlinable
+  public func formIndex(_ i: inout Index, offsetBy distance: Int)
+
+  @inlinable
+  public func formIndex(
+    _ i: inout Index, offsetBy distance: Int, limitedBy limit: Index
+  ) -> Bool
+
+  @inlinable
+  public subscript(position: Index) -> Element { borrowing _read }
+
+  @inlinable
+  public subscript(unchecked position: Index) -> Element { 
+    borrowing _read
+  }
+
+  @inlinable
+  public subscript(bounds: Range<Index>) -> Self { get }
+
+  @inlinable
+  public subscript(unchecked bounds: Range<Index>) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(bounds: some RangeExpression<Index>) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(
+    unchecked bounds: some RangeExpression<Index>
+  ) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(x: UnboundedRange) -> Self {
+    borrowing get
+  }
+
+  @inlinable
+  public func distance(from start: Index, to end: Index) -> Int
+
+  @inlinable
+  public func elementsEqual(_ other: Self) -> Bool
+
+  @inlinable
+  public func elementsEqual(_ other: some Sequence<Element>) -> Bool
+}
+
+extension UTF8Span.UTF16View {
+  @frozen
+  public struct Index: Comparable, Hashable {
+    @usableFromInline
+    internal var _rawValue: UInt64
+
+    @inlinable
+    public var position: UTF8Span.Index { get }
+
+    /// Whether this index is referring to the second code unit of a non-BMP
+    /// Unicode Scalar value.
+    @inlinable
+    public var secondCodeUnit: Bool { get }
+
+    @inlinable
+    public init(_ position: UTF8Span.Index, secondCodeUnit: Bool)
+
+    @inlinable
+    public static func < (
+      lhs: UTF8Span.UTF16View.Index,
+      rhs: UTF8Span.UTF16View.Index
+    ) -> Bool
+  }
+
+  public typealias Element = UInt16
+
+  @frozen
+  public struct Iterator: ~Escapable {
+    public typealias Element = UInt16
+
+    public let span: UTF8Span
+
+    public var index: UTF8Span.UTF16View.Index
+
+    @inlinable
+    init(_ span: UTF8Span)
+
+    @inlinable
+    public mutating func next() -> UInt16?
+  }
+
+  @inlinable
+  public borrowing func makeIterator() -> Iterator
+
+  @inlinable
+  public var startIndex: Index { get }
+
+  @inlinable
+  public var endIndex: Index { get }
+
+  @inlinable
+  public var count: Int { get }
+
+  @inlinable
+  public var isEmpty: Bool { get }
+
+  @inlinable
+  public var indices: Range<Index> { get }
+
+  @inlinable
+  public func index(after i: Index) -> Index
+
+  @inlinable
+  public func index(before i: Index) -> Index
+
+  @inlinable
+  public func index(
+    _ i: Index, offsetBy distance: Int, limitedBy limit: Index
+  ) -> Index?
+
+  @inlinable
+  public func formIndex(after i: inout Index)
+
+  @inlinable
+  public func formIndex(before i: inout Index)
+
+  @inlinable
+  public func index(_ i: Index, offsetBy distance: Int) -> Index
+
+  @inlinable
+  public func formIndex(_ i: inout Index, offsetBy distance: Int)
+
+  @inlinable
+  public func formIndex(
+    _ i: inout Index, offsetBy distance: Int, limitedBy limit: Index
+  ) -> Bool
+
+  @inlinable
+  public subscript(position: Index) -> Element { borrowing _read }
+
+  @inlinable
+  public subscript(unchecked position: Index) -> Element { 
+    borrowing _read
+  }
+
+  @inlinable
+  public subscript(bounds: Range<Index>) -> Self { get }
+
+  @inlinable
+  public subscript(unchecked bounds: Range<Index>) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(bounds: some RangeExpression<Index>) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(
+    unchecked bounds: some RangeExpression<Index>
+  ) -> Self {
+    borrowing get
+  }
+
+  @_alwaysEmitIntoClient
+  public subscript(x: UnboundedRange) -> Self {
+    borrowing get
+  }
+
+  @inlinable
+  public func distance(from start: Index, to end: Index) -> Int
+
+  @inlinable
+  public func elementsEqual(_ other: Self) -> Bool
+
+  @inlinable
+  public func elementsEqual(_ other: some Sequence<Element>) -> Bool
+}
+```
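+
+The `secondCodeUnit` bit on the UTF-16 view's index can be pictured with the sketch below, which models an index as a byte offset plus a flag and transcodes a single scalar on demand. `SketchUTF16Index` and `utf16CodeUnit(of:secondCodeUnit:)` are hypothetical names for illustration; only the `UTF16.width`, `UTF16.leadSurrogate`, and `UTF16.trailSurrogate` calls are existing stdlib API.
+
+```swift
+/// Hypothetical index: where the scalar starts in the UTF-8 bytes, plus
+/// whether we address the second (trail surrogate) code unit.
+struct SketchUTF16Index: Comparable {
+  var byteOffset: Int
+  var secondCodeUnit: Bool
+
+  static func < (lhs: Self, rhs: Self) -> Bool {
+    (lhs.byteOffset, lhs.secondCodeUnit ? 1 : 0)
+      < (rhs.byteOffset, rhs.secondCodeUnit ? 1 : 0)
+  }
+}
+
+/// Transcodes one scalar lazily, yielding the requested UTF-16 code unit.
+func utf16CodeUnit(
+  of scalar: Unicode.Scalar, secondCodeUnit: Bool
+) -> UInt16 {
+  if UTF16.width(scalar) == 1 {
+    // BMP scalars have a single code unit, so there is no second one.
+    precondition(!secondCodeUnit)
+    return UInt16(scalar.value)
+  }
+  return secondCodeUnit
+    ? UTF16.trailSurrogate(scalar) : UTF16.leadSurrogate(scalar)
+}
+
+let grinning: Unicode.Scalar = "\u{1F600}"
+let units = [
+  utf16CodeUnit(of: grinning, secondCodeUnit: false),
+  utf16CodeUnit(of: grinning, secondCodeUnit: true),
+]
+print(units == Array("\u{1F600}".utf16))
+```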
+
+### Queries
+
+```swift
+extension UTF8Span {
+  /// Returns whether the validated contents were all-ASCII. This is checked at
+  /// initialization time and remembered.
+  @inlinable
+  public var isASCII: Bool { get }
+
+  /// Whether `i` is on a boundary between Unicode scalar values
+  @inlinable
+  public func isScalarAligned(_ i: UTF8Span.Index) -> Bool
+
+  /// Whether `i` is on a boundary between `Character`s, i.e. extended grapheme clusters.
+  @inlinable
+  public func isCharacterAligned(_ i: UTF8Span.Index) -> Bool
+
+  /// Whether `self` is equivalent to `other` under Unicode Canonical Equivalence
+  public func isCanonicallyEquivalent(to other: UTF8Span) -> Bool
+
+  /// Whether `self` orders less than `other` under Unicode Canonical Equivalence
+  /// using normalized code-unit order
+  public func isCanonicallyLessThan(_ other: UTF8Span) -> Bool
+}
+```
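+
+Two of these queries can be approximated with today's API, as a sketch. The free function `isScalarAligned(_:_:)` over `[UInt8]` is a hypothetical stand-in that assumes valid UTF-8 input, and canonical equivalence is shown through `String`'s `==`, which already compares under canonical equivalence.
+
+```swift
+/// Hypothetical stand-in: in valid UTF-8, an offset is scalar aligned if
+/// it is the end of the buffer or does not point at a continuation byte.
+func isScalarAligned(_ bytes: [UInt8], _ i: Int) -> Bool {
+  precondition(i >= 0 && i <= bytes.count)
+  if i == bytes.count { return true }
+  return bytes[i] & 0xC0 != 0x80
+}
+
+let precomposed = Array("\u{E9}".utf8)   // U+00E9
+let decomposed = Array("e\u{301}".utf8)  // U+0065 U+0301
+
+// Offset 1 of the precomposed form is in the middle of the scalar.
+print(isScalarAligned(precomposed, 0), isScalarAligned(precomposed, 1))
+
+// The byte sequences differ, yet the contents are canonically equivalent.
+let bytesEqual = precomposed == decomposed
+let canonicallyEqual =
+  String(decoding: precomposed, as: UTF8.self)
+  == String(decoding: decomposed, as: UTF8.self)
+print(bytesEqual, canonicallyEqual)
+```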
+
+### Additions to `String` and `RawSpan`
+
+We extend `String` with the ability to access its backing `UTF8Span`:
+
+```swift
+extension String {
+  // TODO: note that a copy may happen if `String` is not native...
+  public var utf8Span: UTF8Span {
+    // TODO: how to do this well, considering we also have small 
+    //       strings
+  }
+}
+extension Substring {
+  // TODO: needs scalar alignment (check Substring's invariants)
+  // TODO: note that a copy may happen if `String` is not native...
+  public var utf8Span: UTF8Span {
+    // TODO: how to do this well, considering we also have small 
+    //       strings
+  }
+}
+```
+
+Additionally, we extend `RawSpan`'s byte parsing support with helpers for parsing validly-encoded UTF-8.
+
+```swift
+extension RawSpan {
+  public func parseUTF8(
+    _ position: inout Index, length: Int
+  ) throws -> UTF8Span
+
+  public func parseNullTerminatedUTF8(
+    _ position: inout Index
+  ) throws -> UTF8Span
+}
+
+extension RawSpan.Cursor {
+  public mutating func parseUTF8(length: Int) throws -> UTF8Span
+
+  public mutating func parseNullTerminatedUTF8() throws -> UTF8Span
+}
+```
+
+## Source compatibility
+
+This proposal is additive and source-compatible with existing code.
+
+## ABI compatibility
+
+This proposal is additive and ABI-compatible with existing code.
+
+## Implications on adoption
+
+The additions described in this proposal require a new version of the standard library and runtime.
+
+## Future directions
+
+
+### More alignments
+
+Future API could include whether an index is "word aligned" (either [simple](https://www.unicode.org/reports/tr18/#Simple_Word_Boundaries) or [default](https://www.unicode.org/reports/tr18/#Default_Word_Boundaries)), "line aligned", etc.
+
+### Normalization
+
+Future API could include checks for whether the content is in a normal form. These could take the form of thorough checks, quick checks, and even mutating check-and-update-flag checks.
+
+### Transcoded views, normalized views, case-folded views, etc
+
+We could provide lazily transcoded, normalized, case-folded, etc., views. If we do any of these for `UTF8Span`, we should consider adding equivalents on `String`, `Substring`, etc.
+
+For example, transcoded views can be generalized:
+
+```swift
+extension UTF8Span {
+  /// A view of the span's contents as a bidirectional collection of
+  /// transcoded `Encoding.CodeUnit`s.
+  @frozen
+  public struct TranscodedView<Encoding: _UnicodeEncoding> {
+    public var span: UTF8Span
+
+    @inlinable
+    public init(_ span: UTF8Span)
+
+    ...
+  }
+}
+```
+
+Note: UTF-16 has such historical significance that, even with a fully-generic transcoded view, we'd still want a dedicated, specialized type for UTF-16.
+
+We could similarly provide lazily-normalized views of code units or scalars under NFC or NFD (which the stdlib already distributes data tables for), possibly generic via a protocol for 3rd party normal forms.
+
+Finally, case-folded functionality can be accessed in today's Swift via [scalar properties](https://developer.apple.com/documentation/swift/unicode/scalar/properties-swift.struct), but we could provide convenience collections ourselves as well.
+
+
+### Regex or regex-like support
+
+Future API additions could support `Regex`es on such spans.
+
+Another future direction could be to add many routines corresponding to the underlying operations performed by the regex engine, such as:
+
+```swift
+extension UTF8Span.CharacterView {
+  func matchCharacterClass(
+    _: CharacterClass,
+    startingAt: Index,
+    limitedBy: Index    
+  ) throws -> Index?
+
+  func matchQuantifiedCharacterClass(
+    _: CharacterClass,
+    _: QuantificationDescription,
+    startingAt: Index,
+    limitedBy: Index    
+  ) throws -> Index?
+}
+```
+
+which would be useful for parser-combinator libraries that wish to expose `String`'s model of Unicode by using the stdlib's accelerated implementation.
+
+
+### Index rounding operations
+
+Unlike `String`, the `Index` types of `UTF8Span`'s views are distinct, which avoids a [mess of problems](https://forums.swift.org/t/string-index-unification-vs-bidirectionalcollection-requirements/55946). Interesting additions to both `UTF8Span` and `String` would be explicit index-rounding operations for a desired behavior.
+
+### Canonical Spaceships
+
+Should a `ComparisonResult` (or [spaceship](https://forums.swift.org/t/pitch-comparison-reform/5662)) be added to Swift, we could support that operation under canonical equivalence in a single pass rather than subsequent calls to `isCanonicallyEquivalent(to:)` and `isCanonicallyLessThan(_:)`.
+
+
+### Other Unicode functionality
+
+For the purposes of this pitch, we're not looking to expand the scope of functionality beyond what the stdlib already does in support of `String`'s API. Other functionality can be considered future work.
+
+
+## Alternatives considered
+
+
+
+### Use the same Index type across views
+
+
+
+
+### Deprecate `String.withUTF8`
+
+... mutating... 
+
+### Alternate places or representations for UTF-8 `EncodingError`s
+
+**TODO**: Should `EncodingError.range` be a range of span indices instead, and we only have a span-based init? Should it be generic over the index type? Should it be inside of `Unicode.UTF8` instead?
+
+
+
+- put it on `UTF8.EncodingError`
+- make it generic over index type 
+  - (but doesn't necessarily make more sense for null-terminated UTF-8 pointer)
+
+
+
+
+### An unsafe UTF8 Buffer Pointer type
+
+An [earlier pitch](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715) proposed an unsafe version of `UTF8Span`. 
+
+...
+
+## Acknowledgments
+
+Karoy Lorentey, Karl, Geordie_J, and fclout contributed to this proposal with their clarifying questions and discussions.
+

From 41cec56cae9cae1c8a2263bd1e46e4ed19d4c366 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 24 Jun 2024 17:28:42 -0600
Subject: [PATCH 04/16] Update to be a span

---
 proposals/nnnn-utf8-span.md | 1236 +++++++++++++++++++++--------------
 1 file changed, 759 insertions(+), 477 deletions(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 84e8c5d0fd..bdfad02e39 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -9,20 +9,22 @@
 * Upcoming Feature Flag: (pending)
 * Review: ([pitch 1](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715))
 
+
 ## Introduction
 
-We introduce `UTF8Span` for efficient and safe Unicode processing over contiguous storage.
+We introduce `UTF8Span` for efficient and safe Unicode processing over contiguous storage. `UTF8Span` is a memory-safe, non-escapable type similar to `Span` (**TODO**: link span proposal).
 
 Native `String`s are stored as validly-encoded UTF-8 bytes in an internal contiguous memory buffer. The standard library implements `String`'s API as internal methods which operate on top of this buffer, taking advantage of the validly-encoded invariant and specialized Unicode knowledge. We propose making this UTF-8 buffer and its methods public as API for more advanced libraries and developers.
 
 ## Motivation
 
-Currently, if a developer wants to do `String`-like processing over UTF-8 bytes, they have to make an instance of `String`, which allocates a native storage class and copies all the bytes. The developer would then need to operate within the new `String`'s views and map between `String.Index` and byte offsets in the original buffer.
+Currently, if a developer wants to do `String`-like processing over UTF-8 bytes, they have to make an instance of `String`, which allocates a native storage class, copies all the bytes, and is reference counted. The developer would then need to operate within the new `String`'s views and map between `String.Index` and byte offsets in the original buffer.
 
-For example, if these bytes were part of a data structure, the developer would need to decide to either cache such a new `String` instance or recreate it on the fly. Caching more than doubles the size and adds caching complexity. Recreating it on the fly adds a linear time factor and class instance allocation/deallocation.
+For example, if these bytes were part of a data structure, the developer would need to decide to either cache such a new `String` instance or recreate it on the fly. Caching more than doubles the size and adds caching complexity. Recreating it on the fly adds a linear time factor, class instance allocation/deallocation, and potentially reference counting.
 
 Furthermore, `String` may not be available on all embedded platforms due to the fact that it's conformance to `Comparable` and `Collection` depend on data tables bundled with the stdlib. `UTF8Span` is a more appropriate type for these platforms, and only some explicit API make use of data tables.
 
+**TODO** annotate those API as unavailable on embedded
 
 
 ### UTF-8 validity and efficiency
@@ -36,8 +38,7 @@ Additionally, a particular scalar value in valid UTF-8 has only one encoding, bu
 
 ## Proposed solution
 
-We propose a non-escapable `UTF8Span` which exposes a similar API surface as `String` for validly-encoded UTF-8 code units in contiguous memory.
-
+We propose a non-escapable `UTF8Span` which exposes a similar API surface as `String` for validly-encoded UTF-8 code units in contiguous memory. We also propose rich API describing the kind and location of encoding errors.
 
 ## Detailed design
 
@@ -46,585 +47,830 @@ We propose a non-escapable `UTF8Span` which exposes a similar API surface as `St
 ```swift
 @frozen
 public struct UTF8Span: Copyable, ~Escapable {
-  @usableFromInline
-  internal var _start: Index
+  public var unsafeBaseAddress: UnsafeRawPointer
 
   /*
    A bit-packed count and flags (such as isASCII)
 
-   ┌───────┬──────────┬───────┐
-   │ b63   │ b62:56   │ b56:0 │
-   ├───────┼──────────┼───────┤
-   │ ASCII │ reserved │ count │
-   └───────┴──────────┴───────┘
-
-   Future bits could be used for all <0x300 scalar (aka <0xC0 byte)
-   flag which denotes the quickest NFC check, a quickCheck NFC
-   flag (using Unicode data tables), a full-check NFC flag,
-   single-scalar-grapheme-clusters flag, etc.
+   ╔═══════╦═════╦═════╦══════════╦═══════╗
+   ║  b63  ║ b62 ║ b61 ║  b60:56  ║ b56:0 ║
+   ╠═══════╬═════╬═════╬══════════╬═══════╣
+   ║ ASCII ║ NFC ║ SSC ║ reserved ║ count ║
+   ╚═══════╩═════╩═════╩══════════╩═══════╝
 
+   ASCII means the contents are all-ASCII (<0x80).
+   NFC means contents are in normal form C for fast comparisons.
+   SSC means single-scalar Characters (i.e. grapheme clusters): every
+     `Character` holds only a single `Unicode.Scalar`.
    */
   @usableFromInline
   internal var _countAndFlags: UInt64
+
+  @inlinable @inline(__always)
+  init<Owner: ~Copyable & ~Escapable>(
+    _unsafeAssumingValidUTF8 start: UnsafeRawPointer,
+    _countAndFlags: UInt64,
+    owner: borrowing Owner
+  ) -> dependsOn(owner) Self { }
 }
+
 ```
 
+**TODO**: dependsOn(owner) or omit?
+
+**TODO**: Should we have null-termination support? A null-terminated UTF8Span has a NUL byte after its contents and contains no interior NULs. How would we ensure the NUL byte is exclusively borrowed by us?
+
+**TODO**: Should we track contains-newlines or only-newline-terminated? That would speed up Regex `.*` matching considerably.
+
 ### Creation and validation
 
 `UTF8Span` is validated at initialization time, and encoding errors are diagnosed and thrown.
 
 ```swift
 extension Unicode.UTF8 {
+  /**
+
+   The kind and location of a UTF-8 encoding error.
+
+   Valid UTF-8 is represented by this table:
+
+   ╔════════════════════╦════════╦════════╦════════╦════════╗
+   ║    Scalar value    ║ Byte 0 ║ Byte 1 ║ Byte 2 ║ Byte 3 ║
+   ╠════════════════════╬════════╬════════╬════════╬════════╣
+   ║ U+0000..U+007F     ║ 00..7F ║        ║        ║        ║
+   ║ U+0080..U+07FF     ║ C2..DF ║ 80..BF ║        ║        ║
+   ║ U+0800..U+0FFF     ║ E0     ║ A0..BF ║ 80..BF ║        ║
+   ║ U+1000..U+CFFF     ║ E1..EC ║ 80..BF ║ 80..BF ║        ║
+   ║ U+D000..U+D7FF     ║ ED     ║ 80..9F ║ 80..BF ║        ║
+   ║ U+E000..U+FFFF     ║ EE..EF ║ 80..BF ║ 80..BF ║        ║
+   ║ U+10000..U+3FFFF   ║ F0     ║ 90..BF ║ 80..BF ║ 80..BF ║
+   ║ U+40000..U+FFFFF   ║ F1..F3 ║ 80..BF ║ 80..BF ║ 80..BF ║
+   ║ U+100000..U+10FFFF ║ F4     ║ 80..8F ║ 80..BF ║ 80..BF ║
+   ╚════════════════════╩════════╩════════╩════════╩════════╝
+
+   ### Classifying errors
+
+   An *unexpected continuation* is when a continuation byte (`10xxxxxx`) occurs
+   in a position that should be the start of a new scalar value. Unexpected
+   continuations can often occur when the input contains arbitrary data
+   instead of textual content. An unexpected continuation at the start of
+   input might mean that the input was not correctly sliced along scalar
+   boundaries or that it does not contain UTF-8.
+
+   A *truncated scalar* is a multi-byte sequence that is the start of a valid
+   multi-byte scalar but is cut off before ending correctly. A truncated
+   scalar at the end of the input might mean that only part of the entire
+   input was received.
+
+   A *surrogate code point* (`U+D800..U+DFFF`) is invalid UTF-8. Surrogate
+   code points are used by UTF-16 to encode scalars in the supplementary
+   planes. Their presence may mean the input was encoded in a different 8-bit
+   encoding, such as CESU-8, WTF-8, or Java's Modified UTF-8.
+
+   An *invalid non-surrogate code point* is any code point higher than
+   `U+10FFFF`. This can often occur when the input is arbitrary data instead
+   of textual content.
+
+   An *overlong encoding* occurs when a scalar value that could have been
+   encoded using fewer bytes is encoded in a longer byte sequence. Overlong
+   encodings are invalid UTF-8 and can lead to security issues if not
+   correctly detected:
+
+   - https://nvd.nist.gov/vuln/detail/CVE-2008-2938
+   - https://nvd.nist.gov/vuln/detail/CVE-2000-0884
+
+   An overlong encoding of `NUL`, `0xC0 0x80`, is used in Java's Modified
+   UTF-8 but is invalid UTF-8. Overlong encoding errors often catch attempts
+   to bypass security measures.
+
+   ### Reporting the range of the error
+
+   The range of the error reported follows the *Maximal subpart of an
+   ill-formed subsequence* algorithm in which each error is either one byte
+   long or ends before the first byte that is disallowed. See "U+FFFD
+   Substitution of Maximal Subparts" in the Unicode Standard. Unicode started
+   recommending this algorithm in version 6, and it is adopted by the W3C.
+
+   The maximal subpart algorithm will produce a single multi-byte range for a
+   truncated scalar (a multi-byte sequence that is the start of a valid
+   multi-byte scalar but is cut off before ending correctly). For all other
+   errors (including overlong encodings, surrogates, and invalid code
+   points), it will produce an error per byte.
+
+   Since overlong encodings, surrogates, and invalid code points are erroneous
+   by the second byte (at the latest), the above definition produces the same
+   ranges as defining such a sequence as a truncated scalar error followed by
+   unexpected continuation byte errors. The more semantically-rich
+   classification is reported.
+
+   For example, a surrogate code point sequence `ED A0 80` will be reported
+   as three `.surrogateCodePointByte` errors rather than a `.truncatedScalar`
+   followed by two `.unexpectedContinuationByte` errors.
+
+   Other commonly reported error ranges can be constructed from this result.
+   For example, PEP 383's error-per-byte can be constructed by mapping over
+   the reported range. Similarly, a single error for the longest invalid byte
+   range can be constructed by joining adjacent error ranges.
+
+   ╔═════════════════╦══════╦═════╦═════╦═════╦═════╦═════╦═════╦══════╗
+   ║                 ║  61  ║ F1  ║ 80  ║ 80  ║ E1  ║ 80  ║ C2  ║  62  ║
+   ╠═════════════════╬══════╬═════╬═════╬═════╬═════╬═════╬═════╬══════╣
+   ║ Longest range   ║ U+61 ║ err ║     ║     ║     ║     ║     ║ U+62 ║
+   ║ Maximal subpart ║ U+61 ║ err ║     ║     ║ err ║     ║ err ║ U+62 ║
+   ║ Error per byte  ║ U+61 ║ err ║ err ║ err ║ err ║ err ║ err ║ U+62 ║
+   ╚═════════════════╩══════╩═════╩═════╩═════╩═════╩═════╩═════╩══════╝
+
+   */
+  @frozen
+  public struct EncodingError: Error, Sendable, Hashable, Codable {
+    /// The kind of encoding error
+    public var kind: Unicode.UTF8.EncodingError.Kind
+
+    /// The range of offsets into our input containing the error
+    public var range: Range<Int>
+
+    @_alwaysEmitIntoClient
+    public init(
+      _ kind: Unicode.UTF8.EncodingError.Kind,
+      _ range: some RangeExpression<Int>
+    )
+
+    @_alwaysEmitIntoClient
+    public init(_ kind: Unicode.UTF8.EncodingError.Kind, at: Int)
+  }
+}
+
+extension UTF8.EncodingError {
   /// The kind of encoding error encountered during validation
   @frozen
-  public struct EncodingErrorKind: Error, Sendable, Hashable, Codable {
+  public struct Kind: Error, Sendable, Hashable, Codable, RawRepresentable {
     public var rawValue: UInt8
 
     @inlinable
     public init(rawValue: UInt8)
 
+    /// A continuation byte (`10xxxxxx`) outside of a multi-byte sequence
+    @_alwaysEmitIntoClient
+    public static var unexpectedContinuationByte: Self
+
+    /// A byte in a surrogate code point (`U+D800..U+DFFF`) sequence
+    @_alwaysEmitIntoClient
+    public static var surrogateCodePointByte: Self
+
+    /// A byte in an invalid, non-surrogate code point (`>U+10FFFF`) sequence
     @_alwaysEmitIntoClient
-    public static var unexpectedContinuationByte: Self { get }
+    public static var invalidNonSurrogateCodePointByte: Self
 
+    /// A byte in an overlong encoding sequence
     @_alwaysEmitIntoClient
-    public static var overlongEncoding: Self { get }
+    public static var overlongEncodingByte: Self
 
+    /// A multi-byte sequence that is the start of a valid multi-byte scalar
+    /// but is cut off before ending correctly
     @_alwaysEmitIntoClient
-    public static var invalidCodePoint: Self { get }
+    public static var truncatedScalar: Self
   }
 }
-```
-
-**TODO**: Check all the kinds of errors we'd like to diagnose. Since this is a `RawRepresentable` struct, we can still extend it with a (finite) number of error kinds in the future.
 
-```swift
-extension UTF8Span {
-  /// The kind and location of invalidly-encoded UTF-8 bytes
-  @frozen
-  public struct EncodingError: Error, Sendable, Hashable, Codable {
-    /// The kind of encoding error
-    public var kind: Unicode.UTF8.EncodingErrorKind
+extension UTF8.EncodingError.Kind: CustomStringConvertible {
+  public var description: String { get }
+}
 
-    /// The range of offsets into our input containing the error
-    public var range: Range<Int>
-  }
+extension UTF8.EncodingError: CustomStringConvertible {
+  public var description: String { get }
+}
 
+extension UTF8Span {
   public init(
     validating codeUnits: Span<UInt8>
   ) throws(EncodingError) -> dependsOn(codeUnits) Self
-
-  public init<Owner: ~Copyable & ~Escapable>(
-    nulTerminatedCString: UnsafeRawPointer,
-    owner: borrowing Owner
-  ) throws(EncodingError) -> dependsOn(owner) Self
-
-  public init<Owner: ~Copyable & ~Escapable>(
-    nulTerminatedCString: UnsafePointer<CChar>,
-    owner: borrowing Owner
-  ) throws(EncodingError) -> dependsOn(owner) Self
 }
 ```
 
-### Views
+**TODO**: null-terminated strings where we borrow and remember the terminator (and ensure there are no interior nulls)?
 
-Similarly to `String`, `UTF8Span` exposes different ways to view the UTF-8 contents.
+### Basic operations
 
-`UTF8Span.UnicodeScalarView` corresponds to `String.UnicodeScalarView` for read-only purposes, however it is not `RangeReplaceable` as `UTF8Span` provides read-only access. Similarly, `UTF8Span.CharacterView` corresponds to `String`'s character view (i.e. its default view), `UTF8Span.UTF16View` to `String.UTF16View`, and `UTF8Span.CodeUnits` to `String.UTF8View`.
+#### Core Scalar API
 
 ```swift
 extension UTF8Span {
-  public typealias CodeUnits = Span<UInt8>
-
-  @inlinable
-  public var codeUnits: CodeUnits { get }
-
-  @frozen
-  public struct UnicodeScalarView: ~Escapable {
-    public let span: UTF8Span
-
-    @inlinable
-    public init(_ span: UTF8Span)
-  }
-
-  @inlinable
-  public var unicodeScalars: UnicodeScalarView { _read }
-
-  @frozen
-  public struct CharacterView: ~Escapable {
-    public let span: UTF8Span
-
-    @inlinable
-    public init(_ span: UTF8Span)
-  }
+  /// Whether `i` is on a boundary between Unicode scalar values.
+  @_alwaysEmitIntoClient
+  public func isScalarAligned(_ i: Int) -> Bool
 
-  @inlinable
-  public var characters: CharacterView { _read }
+  /// Whether `i` is on a boundary between Unicode scalar values.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func isScalarAligned(unchecked i: Int) -> Bool
 
-  @frozen
-  public struct UTF16View: ~Escapable {
-    public let span: UTF8Span
+  /// Whether `range`'s bounds are aligned to `Unicode.Scalar` boundaries.
+  @_alwaysEmitIntoClient
+  public func isScalarAligned(_ range: Range<Int>) -> Bool
 
-    @inlinable
-    public init(_ span: UTF8Span)
-  }
+  /// Whether `range`'s bounds are aligned to `Unicode.Scalar` boundaries.
+  ///
+  /// This function does not validate that `range` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func isScalarAligned(unchecked range: Range<Int>) -> Bool
 
-  @inlinable
-  public var utf16: UTF16View { _read }
+  /// Returns the start of the next `Unicode.Scalar` after the one starting at
+  /// `i`, or the end of the span if `i` denotes the final scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  @_alwaysEmitIntoClient
+  public func nextScalarStart(_ i: Int) -> Int
+
+  /// Returns the start of the next `Unicode.Scalar` after the one starting at
+  /// `i`, or the end of the span if `i` denotes the final scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func nextScalarStart(unchecked i: Int) -> Int
+
+  /// Returns the start of the next `Unicode.Scalar` after the one starting at
+  /// `i`, or the end of the span if `i` denotes the final scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  /// This function does not validate that `i` is scalar-aligned; this is an
+  /// unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func nextScalarStart(
+    uncheckedAssumingAligned i: Int
+  ) -> Int
+
+  /// Returns the start of the `Unicode.Scalar` ending at `i`, i.e. the scalar
+  /// before the one starting at `i` or the last scalar if `i` is the end of
+  /// the span.
+  ///
+  /// `i` must be scalar-aligned.
+  @_alwaysEmitIntoClient
+  public func previousScalarStart(_ i: Int) -> Int
+
+  /// Returns the start of the `Unicode.Scalar` ending at `i`, i.e. the scalar
+  /// before the one starting at `i` or the last scalar if `i` is the end of
+  /// the span.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func previousScalarStart(unchecked i: Int) -> Int
+
+  /// Returns the start of the `Unicode.Scalar` ending at `i`, i.e. the scalar
+  /// before the one starting at `i` or the last scalar if `i` is the end of
+  /// the span.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  ///
+  /// This function does not validate that `i` is scalar-aligned; this is an
+  /// unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func previousScalarStart(
+    uncheckedAssumingAligned i: Int
+  ) -> Int
+
+  /// Decode the `Unicode.Scalar` starting at `i`. Return it and the start of
+  /// the next scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  @_alwaysEmitIntoClient
+  public func decodeNextScalar(
+    _ i: Int
+  ) -> (Unicode.Scalar, nextScalarStart: Int)
+
+  /// Decode the `Unicode.Scalar` starting at `i`. Return it and the start of 
+  /// the next scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func decodeNextScalar(
+    unchecked i: Int
+  ) -> (Unicode.Scalar, nextScalarStart: Int)
+
+  /// Decode the `Unicode.Scalar` starting at `i`. Return it and the start of
+  /// the next scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  ///
+  /// This function does not validate that `i` is scalar-aligned; this is an
+  /// unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func decodeNextScalar(
+    uncheckedAssumingAligned i: Int
+  ) -> (Unicode.Scalar, nextScalarStart: Int)
+
+  /// Decode the `Unicode.Scalar` ending at `i`, i.e. the previous scalar.
+  /// Return it and the start of that scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  @_alwaysEmitIntoClient
+  public func decodePreviousScalar(
+    _ i: Int
+  ) -> (Unicode.Scalar, previousScalarStart: Int)
+
+  /// Decode the `Unicode.Scalar` ending at `i`, i.e. the previous scalar.
+  /// Return it and the start of that scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func decodePreviousScalar(
+    unchecked i: Int
+  ) -> (Unicode.Scalar, previousScalarStart: Int)
+
+  /// Decode the `Unicode.Scalar` ending at `i`, i.e. the previous scalar.
+  /// Return it and the start of that scalar.
+  ///
+  /// `i` must be scalar-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  ///
+  /// This function does not validate that `i` is scalar-aligned; this is an
+  /// unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func decodePreviousScalar(
+    uncheckedAssumingAligned i: Int
+  ) -> (Unicode.Scalar, previousScalarStart: Int)
 }
-```
-
-**TOOD**: `_read` vs `get`? `@inlinable` vs `@_alwaysEmitIntoClient`?
 
-##### `Collection`-like API:
-
-Like `Span`, `UTF8Span` provides index and `Collection`-like API:
+```
 
+#### Core Character API
 
 ```swift
 extension UTF8Span {
-  public typealias Index = RawSpan.Index
-}
-
-extension UTF8Span.UnicodeScalarView {
-  @frozen
-  public struct Index: Comparable, Hashable {
-    public var position: UTF8Span.Index
-
-    @inlinable
-    public init(_ position: UTF8Span.Index)
-
-    @inlinable
-    public static func < (
-      lhs: UTF8Span.UnicodeScalarView.Index,
-      rhs: UTF8Span.UnicodeScalarView.Index
-    ) -> Bool
-  }
-
-  public typealias Element = Unicode.Scalar
-
-  @frozen
-  public struct Iterator: ~Escapable {
-    public typealias Element = Unicode.Scalar
-
-    public let span: UTF8Span
-
-    public var position: UTF8Span.Index
-
-    @inlinable
-    init(_ span: UTF8Span)
-
-    @inlinable
-    public mutating func next() -> Unicode.Scalar?
-  }
-
-  @inlinable
-  public borrowing func makeIterator() -> Iterator
-
-  @inlinable
-  public var startIndex: Index { get }
-
-  @inlinable
-  public var endIndex: Index { get }
-
-  @inlinable
-  public var count: Int { get }
-
-  @inlinable
-  public var isEmpty: Bool { get }
-
-  @inlinable
-  public var indices: Range<Index> { get }
-
-  @inlinable
-  public func index(after i: Index) -> Index
-
-  @inlinable
-  public func index(before i: Index) -> Index
-
-  @inlinable
-  public func index(
-    _ i: Index, offsetBy distance: Int, limitedBy limit: Index
-  ) -> Index?
-
-  @inlinable
-  public func formIndex(after i: inout Index)
-
-  @inlinable
-  public func formIndex(before i: inout Index)
-
-  @inlinable
-  public func index(_ i: Index, offsetBy distance: Int) -> Index
-
-  @inlinable
-  public func formIndex(_ i: inout Index, offsetBy distance: Int)
-
-  @inlinable
-  public func formIndex(
-    _ i: inout Index, offsetBy distance: Int, limitedBy limit: Index
-  ) -> Bool
-
-  @inlinable
-  public subscript(position: Index) -> Element { borrowing _read }
-
-  @inlinable
-  public subscript(unchecked position: Index) -> Element { 
-    borrowing _read
-  }
-
-  @inlinable
-  public subscript(bounds: Range<Index>) -> Self { get }
-
-  @inlinable
-  public subscript(unchecked bounds: Range<Index>) -> Self {
-    borrowing get
-  }
-
+  /// Whether `i` is on a boundary between `Character`s (i.e. grapheme
+  /// clusters).
   @_alwaysEmitIntoClient
-  public subscript(bounds: some RangeExpression<Index>) -> Self {
-    borrowing get
-  }
+  public func isCharacterAligned(_ i: Int) -> Bool
 
+  /// Whether `i` is on a boundary between `Character`s (i.e. grapheme
+  /// clusters).
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
   @_alwaysEmitIntoClient
-  public subscript(
-    unchecked bounds: some RangeExpression<Index>
-  ) -> Self {
-    borrowing get
-  }
+  public func isCharacterAligned(unchecked i: Int) -> Bool
 
+  /// Returns the start of the next `Character` (i.e. grapheme cluster) after
+  /// the one starting at `i`, or the end of the span if `i` denotes the final
+  /// `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
   @_alwaysEmitIntoClient
-  public subscript(x: UnboundedRange) -> Self {
-    borrowing get
-  }
-
-  @inlinable
-  public func distance(from start: Index, to end: Index) -> Int
-
-  @inlinable
-  public func elementsEqual(_ other: Self) -> Bool
+  public func nextCharacterStart(_ i: Int) -> Int
+
+  /// Returns the start of the next `Character` (i.e. grapheme cluster) after
+  /// the one starting at `i`, or the end of the span if `i` denotes the final
+  /// `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func nextCharacterStart(unchecked i: Int) -> Int
+
+  /// Returns the start of the next `Character` (i.e. grapheme cluster) after
+  /// the one starting at `i`, or the end of the span if `i` denotes the final
+  /// `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  /// This function does not validate that `i` is `Character`-aligned; this is
+  /// an unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func nextCharacterStart(
+    uncheckedAssumingAligned i: Int
+  ) -> Int
+
+  /// Returns the start of the `Character` (i.e. grapheme cluster) ending at
+  /// `i`, i.e. the `Character` before the one starting at `i` or the last
+  /// `Character` if `i` is the end of the span.
+  ///
+  /// `i` must be `Character`-aligned.
+  @_alwaysEmitIntoClient
+  public func previousCharacterStart(_ i: Int) -> Int
+
+  /// Returns the start of the `Character` (i.e. grapheme cluster) ending at
+  /// `i`, i.e. the `Character` before the one starting at `i` or the last
+  /// `Character` if `i` is the end of the span.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func previousCharacterStart(unchecked i: Int) -> Int
+
+  /// Returns the start of the `Character` (i.e. grapheme cluster) ending at
+  /// `i`, i.e. the `Character` before the one starting at `i` or the last
+  /// `Character` if `i` is the end of the span.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  /// This function does not validate that `i` is `Character`-aligned; this is
+  /// an unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func previousCharacterStart(
+    uncheckedAssumingAligned i: Int
+  ) -> Int
+
+  /// Decode the `Character` starting at `i`. Return it and the start of the
+  /// next `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  @_alwaysEmitIntoClient
+  public func decodeNextCharacter(
+    _ i: Int
+  ) -> (Character, nextCharacterStart: Int)
+
+  /// Decode the `Character` starting at `i`. Return it and the start of the
+  /// next `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func decodeNextCharacter(
+    unchecked i: Int
+  ) -> (Character, nextCharacterStart: Int)
+
+  /// Decode the `Character` starting at `i`. Return it and the start of the
+  /// next `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  /// This function does not validate that `i` is `Character`-aligned; this is
+  /// an unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func decodeNextCharacter(
+    uncheckedAssumingAligned i: Int
+  ) -> (Character, nextCharacterStart: Int)
+
+  /// Decode the `Character` (i.e. grapheme cluster) ending at `i`, i.e. the
+  /// previous `Character`. Return it and the start of that `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  @_alwaysEmitIntoClient
+  public func decodePreviousCharacter(_ i: Int) -> (Character, Int)
+
+  /// Decode the `Character` (i.e. grapheme cluster) ending at `i`, i.e. the
+  /// previous `Character`. Return it and the start of that `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func decodePreviousCharacter(
+    unchecked i: Int
+  ) -> (Character, Int)
+
+  /// Decode the `Character` (i.e. grapheme cluster) ending at `i`, i.e. the
+  /// previous `Character`. Return it and the start of that `Character`.
+  ///
+  /// `i` must be `Character`-aligned.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  ///
+  /// This function does not validate that `i` is `Character`-aligned; this is
+  /// an unsafe operation if `i` isn't.
+  @_alwaysEmitIntoClient
+  public func decodePreviousCharacter(
+    uncheckedAssumingAligned i: Int
+  ) -> (Character, Int)
 
-  @inlinable
-  public func elementsEqual(_ other: some Sequence<Element>) -> Bool
 }
 
-extension UTF8Span.CharacterView {
-  @frozen
-  public struct Index: Comparable, Hashable {
-    public var position: UTF8Span.Index
-
-    @inlinable
-    public init(_ position: UTF8Span.Index)
+```
 
-    @inlinable
-    public static func < (
-      lhs: UTF8Span.CharacterView.Index,
-      rhs: UTF8Span.CharacterView.Index
-    ) -> Bool
-  }
+#### Derived Scalar operations
 
-  public typealias Element = Character
+```swift
+extension UTF8Span {
+  /// Find the nearest scalar-aligned position `<= i`.
+  @_alwaysEmitIntoClient
+  public func scalarAlignBackwards(_ i: Int) -> Int
 
-  @frozen
-  public struct Iterator: ~Escapable {
-    public typealias Element = Character
+  /// Find the nearest scalar-aligned position `<= i`.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func scalarAlignBackwards(unchecked i: Int) -> Int
 
-    public let span: UTF8Span
+  /// Find the nearest scalar-aligned position `>= i`.
+  @_alwaysEmitIntoClient
+  public func scalarAlignForwards(_ i: Int) -> Int
 
-    public var position: UTF8Span.Index
+  /// Find the nearest scalar-aligned position `>= i`.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func scalarAlignForwards(unchecked i: Int) -> Int
+}
+```
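+
+As a rough illustration of what scalar alignment means for valid UTF-8, the sketch below rounds a byte offset down to the nearest scalar boundary by skipping continuation bytes (bit pattern `0b10xx_xxxx`). It operates on a plain `[UInt8]` rather than a `UTF8Span`. The free function `alignBackwards(_:in:)` is a hypothetical stand-in, and accepting `bytes.count` as a position is an assumption rather than something this proposal specifies.
+
+```swift
+/// Hypothetical helper, not part of the proposed API: rounds `i` down to
+/// the nearest scalar boundary in validly-encoded UTF-8 `bytes`.
+func alignBackwards(_ i: Int, in bytes: [UInt8]) -> Int {
+  precondition(i >= 0 && i <= bytes.count, "position out of bounds")
+  // Assumption: the end position counts as scalar-aligned.
+  if i == bytes.count { return i }
+  var j = i
+  // In valid UTF-8, a scalar starts at the first non-continuation byte.
+  while j > 0 && (bytes[j] & 0xC0) == 0x80 {
+    j -= 1
+  }
+  return j
+}
+
+// "a" (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes): 6 bytes in total.
+let sample = Array("a\u{E9}\u{20AC}".utf8)
+print(alignBackwards(2, in: sample))
+print(alignBackwards(5, in: sample))
+```
+
+Because the input is assumed valid, the loop moves back at most three bytes, which is why alignment can be cheap once validation has been done up front.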
 
-    @inlinable
-    init(_ span: UTF8Span)
+#### Derived Character operations
 
-    @inlinable
-    public mutating func next() -> Character?
-  }
+```swift
+extension UTF8Span {
+  /// Find the nearest `Character` (i.e. grapheme cluster)-aligned position
+  /// that is `<= i`.
+  @_alwaysEmitIntoClient
+  public func characterAlignBackwards(_ i: Int) -> Int
 
-  @inlinable
-  public borrowing func makeIterator() -> Iterator
+  /// Find the nearest `Character` (i.e. grapheme cluster)-aligned position
+  /// that is `<= i`.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func characterAlignBackwards(unchecked i: Int) -> Int
 
-  @inlinable
-  public var startIndex: Index { get }
+  /// Find the nearest `Character` (i.e. grapheme cluster)-aligned position
+  /// that is `>= i`.
+  @_alwaysEmitIntoClient
+  public func characterAlignForwards(_ i: Int) -> Int
 
-  @inlinable
-  public var endIndex: Index { get }
+  /// Find the nearest `Character` (i.e. grapheme cluster)-aligned position
+  /// that is `>= i`.
+  ///
+  /// This function does not validate that `i` is within the span's bounds;
+  /// this is an unsafe operation.
+  @_alwaysEmitIntoClient
+  public func characterAlignForwards(unchecked i: Int) -> Int
+}
+```
 
-  @inlinable
-  public var count: Int { get }
+### Collection-like API
 
-  @inlinable
-  public var isEmpty: Bool { get }
+#### Comparisons
 
-  @inlinable
-  public var indices: Range<Index> { get }
+```swift
+extension UTF8Span {
+  /// Whether this span has the same bytes as `other`.
+  @_alwaysEmitIntoClient
+  public func bytesEqual(to other: UTF8Span) -> Bool
 
-  @inlinable
-  public func index(after i: Index) -> Index
+  /// Whether this span has the same bytes as `other`.
+  @_alwaysEmitIntoClient
+  public func bytesEqual(to other: some Sequence<UInt8>) -> Bool
 
-  @inlinable
-  public func index(before i: Index) -> Index
+  /// Whether this span has the same `Unicode.Scalar`s as `other`.
+  @_alwaysEmitIntoClient
+  public func scalarsEqual(
+    to other: some Sequence<Unicode.Scalar>
+  ) -> Bool
 
-  @inlinable
-  public func index(
-    _ i: Index, offsetBy distance: Int, limitedBy limit: Index
-  ) -> Index?
+  /// Whether this span has the same `Character`s as `other`.
+  @_alwaysEmitIntoClient
+  public func charactersEqual(
+    to other: some Sequence<Character>
+  ) -> Bool
 
-  @inlinable
-  public func formIndex(after i: inout Index)
+}
+```
 
-  @inlinable
-  public func formIndex(before i: inout Index)
+**TODO**: lexicographically less than? `std::mismatch`? others?
 
-  @inlinable
-  public func index(_ i: Index, offsetBy distance: Int) -> Index
+#### Canonical equivalence and ordering
 
-  @inlinable
-  public func formIndex(_ i: inout Index, offsetBy distance: Int)
+`UTF8Span` can perform Unicode canonical equivalence checks (i.e. the semantics of `String.==` and `Character.==`).
 
-  @inlinable
-  public func formIndex(
-    _ i: inout Index, offsetBy distance: Int, limitedBy limit: Index
+```swift
+extension UTF8Span {
+  /// Whether `self` is equivalent to `other` under Unicode Canonical
+  /// Equivalence.
+  public func isCanonicallyEquivalent(
+    to other: UTF8Span
   ) -> Bool
 
-  @inlinable
-  public subscript(position: Index) -> Element { borrowing _read }
-
-  @inlinable
-  public subscript(unchecked position: Index) -> Element { 
-    borrowing _read
-  }
+  /// Whether `self` orders less than `other` under Unicode Canonical 
+  /// Equivalence using normalized code-unit order (in NFC).
+  public func isCanonicallyLessThan(
+    _ other: UTF8Span
+  ) -> Bool
+}
+```
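+
+The difference between byte or scalar equality and canonical equivalence can be seen with today's `String`, whose `==` already has the semantics described above. The snippet below uses only existing standard library API and stands in for the proposed `bytesEqual(to:)`, `scalarsEqual(to:)`, and `isCanonicallyEquivalent(to:)`; it is a sketch of the semantics, not of the proposed implementation.
+
+```swift
+let precomposed = "caf\u{E9}"   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
+let decomposed = "cafe\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT
+
+// Canonical equivalence: `String.==` treats both spellings as equal.
+let canonicallyEqual = (precomposed == decomposed)
+
+// Byte-wise and scalar-wise comparisons see different contents.
+let sameBytes = precomposed.utf8.elementsEqual(decomposed.utf8)
+let sameScalars = precomposed.unicodeScalars.elementsEqual(
+  decomposed.unicodeScalars)
+
+print(canonicallyEqual, sameBytes, sameScalars)
+```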
 
-  @inlinable
-  public subscript(bounds: Range<Index>) -> Self { get }
+#### Extracting sub-spans
 
-  @inlinable
-  public subscript(unchecked bounds: Range<Index>) -> Self {
-    borrowing get
-  }
+Similarly to `Span`, we support subscripting and extracting sub-spans. Since a `UTF8Span` is always validly-encoded UTF-8, extracting must happen along Unicode scalar boundaries.
 
+```swift
+extension UTF8Span {
+  /// Constructs a new `UTF8Span` over the bytes within the supplied
+  /// range of positions within this span.
+  ///
+  /// `bounds` must be scalar aligned.
+  ///
+  /// The returned span's first item is always at offset 0; unlike buffer
+  /// slices, extracted spans do not generally share their indices with the
+  /// span from which they are extracted.
+  ///
+  /// - Parameter bounds: A valid range of positions. Every position in
+  ///     this range must be within the bounds of this `Span`.
+  ///
+  /// - Returns: A `UTF8Span` over the bytes within `bounds`.
   @_alwaysEmitIntoClient
-  public subscript(bounds: some RangeExpression<Index>) -> Self {
-    borrowing get
-  }
-
+  public func extracting(_ bounds: some RangeExpression<Int>) -> Self
+
+  /// Constructs a new `UTF8Span` over the bytes within the supplied
+  /// range of positions within this span.
+  ///
+  /// `bounds` must be scalar aligned.
+  ///
+  /// This function does not validate that `bounds` is within the span's
+  /// bounds; this is an unsafe operation.
+  ///
+  /// The returned span's first item is always at offset 0; unlike buffer
+  /// slices, extracted spans do not generally share their indices with the
+  /// span from which they are extracted.
+  ///
+  /// - Parameter bounds: A valid range of positions. Every position in
+  ///     this range must be within the bounds of this `Span`.
+  ///
+  /// - Returns: A `UTF8Span` over the bytes within `bounds`.
   @_alwaysEmitIntoClient
-  public subscript(
-    unchecked bounds: some RangeExpression<Index>
-  ) -> Self {
-    borrowing get
-  }
-
+  public func extracting(
+    unchecked bounds: some RangeExpression<Int>
+  ) -> Self
+
+  /// Constructs a new `UTF8Span` over the bytes within the supplied
+  /// range of positions within this span.
+  ///
+  /// This function does not validate that `bounds` is within the span's
+  /// bounds; this is an unsafe operation.
+  ///
+  /// This function does not validate that `bounds` is scalar aligned; this
+  /// is an unsafe operation if it isn't.
+  ///
+  /// The returned span's first item is always at offset 0; unlike buffer
+  /// slices, extracted spans do not generally share their indices with the
+  /// span from which they are extracted.
+  ///
+  /// - Parameter bounds: A valid range of positions. Every position in
+  ///     this range must be within the bounds of this `Span`.
+  ///
+  /// - Returns: A `UTF8Span` over the bytes within `bounds`.
   @_alwaysEmitIntoClient
-  public subscript(x: UnboundedRange) -> Self {
-    borrowing get
-  }
-
-  @inlinable
-  public func distance(from start: Index, to end: Index) -> Int
-
-  @inlinable
-  public func elementsEqual(_ other: Self) -> Bool
-
-  @inlinable
-  public func elementsEqual(_ other: some Sequence<Element>) -> Bool
+  public func extracting(
+    uncheckedAssumingAligned bounds: some RangeExpression<Int>
+  ) -> Self
 }
 
-extension UTF8Span.UTF16View {
-  @frozen
-  public struct Index: Comparable, Hashable {
-    @usableFromInline
-    internal var _rawValue: UInt64
-
-    @inlinable
-    public var position: UTF8Span.Index { get }
-
-    /// Whether this index is referring to the second code unit of a non-BMP
-    /// Unicode Scalar value.
-    @inlinable
-    public var secondCodeUnit: Bool { get }
-
-    @inlinable
-    public init(_ position: UTF8Span.Index, secondCodeUnit: Bool)
-
-    @inlinable
-    public static func < (
-      lhs: UTF8Span.UTF16View.Index,
-      rhs: UTF8Span.UTF16View.Index
-    ) -> Bool
-  }
-
-  public typealias Element = UInt16
-
-  @frozen
-  public struct Iterator: ~Escapable {
-    public typealias Element = UInt16
-
-    public let span: UTF8Span
-
-    public var index: UTF8Span.UTF16View.Index
-
-    @inlinable
-    init(_ span: UTF8Span)
-
-    @inlinable
-    public mutating func next() -> UInt16?
-  }
-
-  @inlinable
-  public borrowing func makeIterator() -> Iterator
-
-  @inlinable
-  public var startIndex: Index { get }
-
-  @inlinable
-  public var endIndex: Index { get }
+```
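+
+To see why extraction has to happen along scalar boundaries, consider cutting a byte buffer in the middle of a multi-byte scalar. The sketch below uses a plain `[UInt8]` and the existing repairing initializer `String(decoding:as:)` as a stand-in. The proposed `extracting(_:)` is not used here, and how it reports a misaligned range is not assumed.
+
+```swift
+// "a" followed by U+00E9, encoded as 61 C3 A9.
+let bytes = Array("a\u{E9}".utf8)
+
+// Offset 2 falls between the two bytes of U+00E9, so this slice is not
+// valid UTF-8; decoding substitutes U+FFFD for the truncated scalar.
+let misaligned = String(decoding: bytes[0..<2], as: UTF8.self)
+
+// Offset 3 is scalar-aligned, so the slice round-trips unchanged.
+let aligned = String(decoding: bytes[0..<3], as: UTF8.self)
+
+print(misaligned == "a\u{FFFD}")
+print(aligned == "a\u{E9}")
+```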
 
-  @inlinable
-  public var count: Int { get }
+#### Misc.
 
-  @inlinable
+```swift
+extension UTF8Span {
+  @_alwaysEmitIntoClient
   public var isEmpty: Bool { get }
 
-  @inlinable
-  public var indices: Range<Index> { get }
-
-  @inlinable
-  public func index(after i: Index) -> Index
-
-  @inlinable
-  public func index(before i: Index) -> Index
-
-  @inlinable
-  public func index(
-    _ i: Index, offsetBy distance: Int, limitedBy limit: Index
-  ) -> Index?
-
-  @inlinable
-  public func formIndex(after i: inout Index)
-
-  @inlinable
-  public func formIndex(before i: inout Index)
-
-  @inlinable
-  public func index(_ i: Index, offsetBy distance: Int) -> Index
-
-  @inlinable
-  public func formIndex(_ i: inout Index, offsetBy distance: Int)
-
-  @inlinable
-  public func formIndex(
-    _ i: inout Index, offsetBy distance: Int, limitedBy limit: Index
-  ) -> Bool
-
-  @inlinable
-  public subscript(position: Index) -> Element { borrowing _read }
-
-  @inlinable
-  public subscript(unchecked position: Index) -> Element { 
-    borrowing _read
-  }
-
-  @inlinable
-  public subscript(bounds: Range<Index>) -> Self { get }
-
-  @inlinable
-  public subscript(unchecked bounds: Range<Index>) -> Self {
-    borrowing get
-  }
-
   @_alwaysEmitIntoClient
-  public subscript(bounds: some RangeExpression<Index>) -> Self {
-    borrowing get
-  }
+  public var storage: Span<UInt8> { get }
 
+  /// Whether `i` is in bounds
   @_alwaysEmitIntoClient
-  public subscript(
-    unchecked bounds: some RangeExpression<Index>
-  ) -> Self {
-    borrowing get
+  public func boundsCheck(_ i: Int) -> Bool {
+    i >= 0 && i < count
   }
 
+  /// Whether `bounds` is in bounds
   @_alwaysEmitIntoClient
-  public subscript(x: UnboundedRange) -> Self {
-    borrowing get
+  public func boundsCheck(_ bounds: Range<Int>) -> Bool
+
+  /// Calls a closure with a pointer to the viewed contiguous storage.
+  ///
+  /// The buffer pointer passed as an argument to `body` is valid only
+  /// during the execution of `withUnsafeBufferPointer(_:)`.
+  /// Do not store or return the pointer for later use.
+  ///
+  /// - Parameter body: A closure with an `UnsafeBufferPointer` parameter
+  ///   that points to the viewed contiguous storage. If `body` has
+  ///   a return value, that value is also used as the return value
+  ///   for the `withUnsafeBufferPointer(_:)` method. The closure's
+  ///   parameter is valid only for the duration of its execution.
+  /// - Returns: The return value of the `body` closure parameter.
+  @_alwaysEmitIntoClient
+  borrowing public func withUnsafeBufferPointer<
+    E: Error, Result: ~Copyable & ~Escapable
+  >(
+    _ body: (_ buffer: borrowing UnsafeBufferPointer<UInt8>) throws(E) -> Result
+  ) throws(E) -> dependsOn(self) Result {
+    try body(unsafeBaseAddress._ubp(0..<count))
   }
-
-  @inlinable
-  public func distance(from start: Index, to end: Index) -> Int
-
-  @inlinable
-  public func elementsEqual(_ other: Self) -> Bool
-
-  @inlinable
-  public func elementsEqual(_ other: some Sequence<Element>) -> Bool
 }
 ```
 
 ### Queries
 
+`UTF8Span` checks at construction time and remembers whether its contents are all ASCII. Additional checks can be requested and remembered.
+
 ```swift
 extension UTF8Span {
   /// Returns whether the validated contents were all-ASCII. This is checked at
   /// initialization time and remembered.
-  @inlinable
+  @inlinable @inline(__always)
   public var isASCII: Bool { get }
 
-  /// Whether `i` is on a boundary between Unicode scalar values
-  @inlinable
-  public func isScalarAligned(_ i: UTF8Span.Index) -> Bool
-
-  /// Whether `i` is on a boundary between `Character`s, i.e. extended grapheme clusters.
-  @inlinable
-  public func isCharacterAligned(_ i: UTF8Span.Index) -> Bool
-
-  /// Whether `self` is equivalent to `other` under Unicode Canonical Equivalance
-  public func isCanonicallyEquivalent(to other: UTF8Span) -> Bool
+  /// Returns whether the contents are known to be NFC. This is not
+  /// always checked at initialization time and is set by `checkForNFC`.
+  @inlinable @inline(__always)
+  public var isKnownNFC: Bool { get }
+
+  /// Do a scan checking for whether the contents are in Normal Form C.
+  /// When the contents are in NFC, canonical equivalence checks are much
+  /// faster.
+  ///
+  /// `quickCheck` will check for a subset of NFC contents using the 
+  /// NFCQuickCheck algorithm, which is faster than the full normalization
+  /// algorithm. However, it cannot detect all NFC contents.
+  ///
+  /// Updates the `isKnownNFC` bit.
+  public mutating func checkForNFC(
+    quickCheck: Bool
+  ) -> Bool
 
-  /// Whether `self` orders less than `other` under Unicode Canonical Equivalance
-  /// using normalized code-unit order
-  public func isCanonicallyLessThan(_ other: UTF8Span) -> Bool
+  /// Returns whether every `Character` (i.e. grapheme cluster)
+  /// is known to be comprised of a single `Unicode.Scalar`.
+  ///
+  /// This is not always checked at initialization time. It is set by
+  /// `checkForSingleScalarCharacters`.
+  @inlinable @inline(__always)
+  public var isKnownSingleScalarCharacters: Bool { get }
+
+  /// Do a scan, checking whether every `Character` (i.e. grapheme cluster)
+  /// is comprised of only a single `Unicode.Scalar`. When a span contains
+  /// only single-scalar characters, character operations are much faster.
+  ///
+  /// `quickCheck` will check for a subset of single-scalar character contents
+  /// using a faster algorithm than the full grapheme breaking algorithm.
+  /// However, it cannot detect all single-scalar `Character` contents.
+  ///
+  /// Updates the `isKnownSingleScalarCharacters` bit.
+  public mutating func checkForSingleScalarCharacters(
+    quickCheck: Bool
+  ) -> Bool
 }
 ```
 
-### Additions to `String` and `RawSpan`
-
-We extend `String` with the ability to access its backing `UTF8Span`:
+### Spans from strings
 
 ```swift
 extension String {
-  // TODO: note that a copy may happen if `String` is not native...
-  public var utf8Span: UTF8Span {
-    // TODO: how to do this well, considering we also have small 
-    //       strings
-  }
+  /// ... note that a copy may happen if `String` is not native...
+  public var utf8Span: UTF8Span { _read }
 }
 extension Substring {
-  // TODO: needs scalar alignment (check Substring's invariants)
-  // TODO: note that a copy may happen if `String` is not native...
-  public var utf8Span: UTF8Span {
-    // TODO: how to do this well, considering we also have small 
-    //       strings
-  }
+  /// ... note that a copy may happen if `Substring` is not native...
+  public var utf8Span: UTF8Span { _read }
 }
 ```
 
-Additionally, we extend `RawSpan`'s byte parsing support with helpers for parsing validly-encoded UTF-8.
 
-```swift
-extension RawSpan {
-  public func parseUTF8(
-    _ position: inout Index, length: Int
-  ) throws -> UTF8Span
-
-  public func parseNullTermiantedUTF8(
-    _ position: inout Index
-  ) throws -> UTF8Span
-}
-
-extension RawSpan.Cursor {
-  public mutating func parseUTF8(length: Int) throws -> UTF8Span
-
-  public mutating func parseNullTermiantedUTF8() throws -> UTF8Span
-}
-```
 
 ## Source compatibility
 
@@ -640,14 +886,32 @@ The additions described in this proposal require a new version of the standard l
 
 ## Future directions
 
-
 ### More alignments
 
 Future API could include whether an index is "word aligned" (either [simple](https://www.unicode.org/reports/tr18/#Simple_Word_Boundaries) or [default](https://www.unicode.org/reports/tr18/#Default_Word_Boundaries)), "line aligned", etc.
 
 ### Normalization
 
-Future API could include checks for whether the content is in a normal form. These could take the form of thorough checks, quick checks, and even mutating check-and-update-flag checks.
+Future API could include checks for whether the content is in a particular normal form (not just NFC).
+
+### UnicodeScalarView and CharacterView
+
+Like `Span`, we are deferring adding any collection-like types to non-escapable `UTF8Span`. Future work includes adding view types and corresponding iterators.   
+
+For an example implementation of those see **TODO**: link to test in repo
+
+### Returning all the encoding errors
+
+Future work includes returning all the encoding errors found in a given input.
+
+```swift
+extension UTF8 {
+  public static func checkAllErrors(
+    _ s: some Sequence<UInt8>
+  ) -> some Sequence<UTF8.EncodingError>
+}
+```
+
+See **TODO**: link to example implementation
 
 ### Transcoded views, normalized views, case-folded views, etc
 
@@ -657,7 +921,7 @@ For example, transcoded views can be generalized:
 
 ```swift
 extension UTF8Span {
-  /// A view off the span's contents as a bidirectional collection of 
+  /// A view of the span's contents as a bidirectional collection of 
   /// transcoded `Encoding.CodeUnit`s.
   @frozen
   public struct TranscodedView<Encoding: _UnicodeEncoding> {
@@ -671,8 +935,6 @@ extension UTF8Span {
 }
 ```
 
-Note: UTF-16 has such historical significance that, even with a fully-generic transcoded view, we'd still want a dedicated, specialized type for UTF-16.
-
 We could similarly provide lazily-normalized views of code units or scalars under NFC or NFD (which the stdlib already distributes data tables for), possibly generic via a protocol for 3rd party normal forms.
 
 Finally, case-folded functionality can be accessed in today's Swift via [scalar properties](https://developer.apple.com/documentation/swift/unicode/scalar/properties-swift.struct), but we could provide convenience collections ourselves as well.
@@ -680,7 +942,7 @@ Finally, case-folded functionality can be accessed in today's Swift via [scalar
 
 ### Regex or regex-like support
 
-Future API additions would be to support `Regex`es on such spans. 
+Future API additions would be to support `Regex`es on `UTF8Span`. We'd expose grapheme-level semantics, scalar-level semantics, and introduce byte-level semantics.
 
 Another future direction could be to add many routines corresponding to the underlying operations performed by the regex engine, such as:
 
@@ -704,10 +966,6 @@ extension UTF8Span.CharacterView {
 which would be useful for parser-combinator libraries who wish to expose `String`'s model of Unicode by using the stdlib's accelerated implementation.
 
 
-### Index rounding operations
-
-Unlike String, `UTF8Span`'s view's `Index` types are distinct, which avoids a [mess of problems](https://forums.swift.org/t/string-index-unification-vs-bidirectionalcollection-requirements/55946). Interesting additions to both `UTF8Span` and `String` would be explicit index-rounding for a desired behavior.
-
 ### Canonical Spaceships
 
 Should a `ComparisonResult` (or [spaceship](https://forums.swift.org/t/pitch-comparison-reform/5662)) be added to Swift, we could support that operation under canonical equivalence in a single pass rather than subsequent calls to `isCanonicallyEquivalent(to:)` and `isCanonicallyLessThan(_:)`.
@@ -718,39 +976,63 @@ Should a `ComparisonResult` (or [spaceship](https://forums.swift.org/t/pitch-com
 For the purposes of this pitch, we're not looking to expand the scope of functionality beyond what the stdlib already does in support of `String`'s API. Other functionality can be considered future work.
 
 
-## Alternatives considered
+### Exposing `String`'s storage class
+
+String's internal storage class is null-terminated valid UTF-8 (by substituting replacement characters) and implements range-replaceable operations along scalar boundaries. We could consider exposing the storage class itself, which might be useful for embedded platforms that don't have `String`.
 
+### Yield UTF8Spans in byte parsers
 
+Span's proposal mentions a future direction of byte parsing helpers on a `Cursor` or `Iterator` type (**TODO**: link to span proposal section). We could extend these types (or analogous types on `Span<UInt8>`) with UTF-8 parsing code:
 
-### Use the same Index type across views
+```swift
+extension RawSpan.Cursor {
+  public mutating func parseUTF8(length: Int) throws -> UTF8Span
 
+  public mutating func parseNullTerminatedUTF8() throws -> UTF8Span
+}
+```
 
 
 
-### Deprecate `String.withUTF8`
+## Alternatives considered
 
-... mutating... 
+### Invalid start / end of input UTF-8 encoding errors
 
-### Alternate places or representations for UTF-8 `EncodingError`s
+Earlier prototypes had `.invalidStartOfInput` and `.invalidEndOfInput` UTF-8 validation errors to communicate that the input was perhaps incomplete or not sliced along scalar boundaries. In this scenario, `.invalidStartOfInput` is equivalent to `.unexpectedContinuation` with the range's lower bound equal to 0 and `.invalidEndOfInput` is equivalent to `.truncatedScalar` with the range's upper bound equal to `count`.
 
-**TODO**: Should `EncodingError.range` be a range of span indices instead, and we only have a span-based init? Should it be generic over the index type? Should it be inside of `Unicode.UTF8` instead?
+This was rejected so as to not have two ways to encode the same error. There is no loss of information and `.unexpectedContinuation`/`.truncatedScalar` with ranges are more semantically precise.
 
+### An unsafe UTF8 Buffer Pointer type
 
+An [earlier pitch](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715) proposed an unsafe version of `UTF8Span`. Now that we have `~Escapable`, a memory-safe `UTF8Span` is better.
 
-- put it on `UTF8.EncodingError`
-- make it generic over index type 
-  - (but doesn't necessarily make more sense for null-terminated UTF-8 pointer)
+### Other names for basic operations
 
+An alternative name for `nextScalarStart(_:)` and `previousScalarStart(_:)` could be something like `scalarEnd(startingAt:)` and `scalarStart(endingAt:)`. Similarly, `decodeNextScalar(_:)` and `decodePreviousScalar(_:)` could be `decodeScalar(startingAt:)` and `decodeScalar(endingAt:)`. These names are similar to `index(after:)` and `index(before:)`.
 
+However, in practice this buries the direction deeper into the argument label and is more confusing than the `index(before/after:)` analogues. This is especially true when the argument label contains `unchecked` or `uncheckedAssumingAligned`.
 
+That being said, these names are definitely bikesheddable and we'd like suggestions from the community.
 
-### An unsafe UTF8 Buffer Pointer type
 
-An [earlier pitch](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715) proposed an unsafe version of `UTF8Span`. 
+### Other bounds or alignment checked formulations
+
+For many operations that take an index that needs to be appropriately aligned, we propose `foo(_:)`, `foo(unchecked:)`, and `foo(uncheckedAssumingAligned:)`. 
+
+`foo(_:)` and `foo(unchecked:)` have analogues in `Span` and `foo(uncheckedAssumingAligned:)` is the lowest level interface that a type such as `Iterator` would call (since it maintains index validity and alignment as an invariant).
+
+We could additionally have a `foo(assumingAligned:)` overload that does bounds checking, but it's unclear what the use case would be.
+
+Another alternative is to only have a variant that skips both bounds and alignment checks and call it `foo(unchecked:)`. However, this use of `unchecked:` is far more nuanced than `Span`'s and it's not the case that any `i` in `0..<count` would be valid.
+
+We could also only offer `foo(_:)` and `foo(uncheckedAssumingAligned:)`. Unaligned API such as `isScalarAligned(_:)` and `isScalarAligned(unchecked:)` would keep their names.
+
 
-...
 
 ## Acknowledgments
 
 Karoy Lorentey, Karl, Geordie_J, and fclout contributed to this proposal with their clarifying questions and discussions.
 
+
+
+

From 98b50b50dc854a60422aaf5797e11c9b5d6b0126 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 24 Jun 2024 18:21:31 -0600
Subject: [PATCH 05/16] Link to impl

---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index bdfad02e39..df897c9b0b 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -5,7 +5,7 @@
 * Review Manager: TBD
 * Status: **Awaiting implementation**
 * Bug: rdar://48132971, rdar://96837923
-* Implementation: (pending)
+* Implementation: [Prototype](https://github.com/apple/swift-collections/pull/394)
 * Upcoming Feature Flag: (pending)
 * Review: ([pitch 1](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715))
 

From 7e2657ab85d46e9ea6dd81cbdf5bd97f6407f375 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Tue, 25 Jun 2024 09:05:30 -0600
Subject: [PATCH 06/16] Clean up todos

---
 proposals/nnnn-utf8-span.md | 67 ++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index df897c9b0b..e0f4f46d0b 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -12,7 +12,7 @@
 
 ## Introduction
 
-We introduce `UTF8Span` for efficient and safe Unicode processing over contiguous storage. `UTF8Span` is a memory safe non-escapable type similar to `Span` (**TODO**: link span proposal).
+We introduce `UTF8Span` for efficient and safe Unicode processing over contiguous storage. `UTF8Span` is a memory safe non-escapable type [similar to `Span`](https://github.com/swiftlang/swift-evolution/pull/2307).
 
 Native `String`s are stored as validly-encoded UTF-8 bytes in an internal contiguous memory buffer. The standard library implements `String`'s API as internal methods which operate on top of this buffer, taking advantage of the validly-encoded invariant and specialized Unicode knowledge. We propose making this UTF-8 buffer and its methods public as API for more advanced libraries and developers.
 
@@ -24,9 +24,6 @@ For example, if these bytes were part of a data structure, the developer would n
 
 Furthermore, `String` may not be available on all embedded platforms because its conformances to `Comparable` and `Collection` depend on data tables bundled with the stdlib. `UTF8Span` is a more appropriate type for these platforms, and only some explicit API make use of data tables.
 
-**TODO** annotate those API as unavailable on embedded
-
-
 ### UTF-8 validity and efficiency
 
 UTF-8 validation is a particularly common concern and the subject of a fair amount of [research](https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/). Once an input is known to be validly encoded UTF-8, subsequent operations such as decoding, grapheme breaking, comparison, etc., can be implemented much more efficiently under this assumption of validity. Swift's `String` type's native storage is guaranteed-valid-UTF8 for this reason.
@@ -58,7 +55,7 @@ public struct UTF8Span: Copyable, ~Escapable {
    ║ ASCII ║ NFC ║ SSC ║ reserved ║ count ║
    ╚═══════╩═════╩═════╩══════════╩═══════╝
 
-   ASCII means the contents are all-ASCII (<0x7F). 
+   ASCII means the contents are all-ASCII (<=0x7F).
    NFC means contents are in normal form C for fast comparisons.
    SSC means single-scalar Characters (i.e. grapheme clusters): every
      `Character` holds only a single `Unicode.Scalar`.
@@ -76,12 +73,6 @@ public struct UTF8Span: Copyable, ~Escapable {
 
 ```
 
-**TODO**: dependsOn(owner) or omit?
-
-**TODO**: Should we have null-termination support? A null-terminated UTF8Span has a NUL byte after its contents and contains no interior NULs. How would we ensure the NUL byte is exclusively borrowed by us?
-
-**TODO**: Should we track contains-newlines or only-newline-terminated? That would speed up Regex `.*` matching considerably.
-
 ### Creation and validation
 
 `UTF8Span` is validated at initialization time, and encoding errors are diagnosed and thrown.
@@ -232,10 +223,12 @@ extension UTF8.EncodingError {
   }
 }
 
+@_unavailableInEmbedded
 extension UTF8.EncodingError.Kind: CustomStringConvertible {
   public var description: String { get }
 }
 
+@_unavailableInEmbedded
 extension UTF8.EncodingError: CustomStringConvertible {
   public var description: String { get }
 }
@@ -247,8 +240,6 @@ extension UTF8Span {
 }
 ```
 
-**TODO**: null-terminated strings where we borrow and remember the terminator (and ensure there's no interior nulls)?
-
 ### Basic operations
 
 #### Core Scalar API
@@ -354,7 +345,7 @@ extension UTF8Span {
     _ i: Int
   ) -> (Unicode.Scalar, nextScalarStart: Int)
 
-  /// Decode the `Unicode.Scalar` starting at `i`. Return it and the start of 
+  /// Decode the `Unicode.Scalar` starting at `i`. Return it and the start of
   /// the next scalar.
   ///
   /// `i` must be scalar-aligned.
@@ -425,6 +416,7 @@ extension UTF8Span {
 #### Core Character API
 
 ```swift
+@_unavailableInEmbedded
 extension UTF8Span {
   /// Whether `i` is on a boundary between `Character`s (i.e. grapheme
   /// clusters).
@@ -614,6 +606,7 @@ extension UTF8Span {
 #### Derived Character operations
 
 ```swift
+@_unavailableInEmbedded
 extension UTF8Span {
   /// Find the nearest `Character` (i.e. grapheme cluster)-aligned position
   /// that is `<= i`.
@@ -664,6 +657,7 @@ extension UTF8Span {
   ) -> Bool
 
   /// Whether this span has the same `Character`s as `other`.
+  @_unavailableInEmbedded
   @_alwaysEmitIntoClient
   public func charactersEqual(
     to other: some Sequence<Character>
@@ -672,8 +666,6 @@ extension UTF8Span {
 }
 ```
 
-**TODO**: lexicographically less than? `std::mismatch`? others?
-
 #### Canonical equivalence and ordering
 
 `UTF8Span` can perform Unicode canonical equivalence checks (i.e. the semantics of `String.==` and `Character.==`).
@@ -682,12 +674,14 @@ extension UTF8Span {
 extension UTF8Span {
   /// Whether `self` is equivalent to `other` under Unicode Canonical
   /// Equivalance.
+  @_unavailableInEmbedded
   public func isCanonicallyEquivalent(
     to other: UTF8Span
   ) -> Bool
 
-  /// Whether `self` orders less than `other` under Unicode Canonical 
+  /// Whether `self` orders less than `other` under Unicode Canonical
   /// Equivalance using normalized code-unit order (in NFC).
+  @_unavailableInEmbedded
   public func isCanonicallyLessThan(
     _ other: UTF8Span
   ) -> Bool
@@ -819,17 +813,19 @@ extension UTF8Span {
   /// Returns whether the contents are known to be NFC. This is not
   /// always checked at initialization time and is set by `checkForNFC`.
   @inlinable @inline(__always)
+  @_unavailableInEmbedded
   public var isKnownNFC: Bool { get }
 
   /// Do a scan checking for whether the contents are in Normal Form C.
   /// When the contents are in NFC, canonical equivalence checks are much
   /// faster.
   ///
-  /// `quickCheck` will check for a subset of NFC contents using the 
+  /// `quickCheck` will check for a subset of NFC contents using the
   /// NFCQuickCheck algorithm, which is faster than the full normalization
   /// algorithm. However, it cannot detect all NFC contents.
   ///
   /// Updates the `isKnownNFC` bit.
+  @_unavailableInEmbedded
   public mutating func checkForNFC(
     quickCheck: Bool
   ) -> Bool
@@ -839,6 +835,7 @@ extension UTF8Span {
   ///
   /// This is not always checked at initialization time. It is set by
   /// `checkForSingleScalarCharacters`.
+  @_unavailableInEmbedded
   @inlinable @inline(__always)
   public var isKnownSingleScalarCharacters: Bool { get }
 
@@ -851,6 +848,7 @@ extension UTF8Span {
   /// However, it cannot detect all single-scalar `Character` contents.
   ///
   /// Updates the `isKnownSingleScalarCharacters` bit.
+  @_unavailableInEmbedded
   public mutating func checkForSingleScalarCharacters(
     quickCheck: Bool
   ) -> Bool
@@ -860,10 +858,13 @@ extension UTF8Span {
 ### Spans from strings
 
 ```swift
+@_unavailableInEmbedded
 extension String {
   /// ... note that a copy may happen if `String` is not native...
   public var utf8Span: UTF8Span { _read }
 }
+
+@_unavailableInEmbedded
 extension Substring {
   // ... note that a copy may happen if `Substring` is not native...
   public var utf8Span: UTF8Span { _read }
@@ -896,11 +897,19 @@ Future API could include checks for whether the content is in a particular norma
 
 ### UnicodeScalarView and CharacterView
 
-Like `Span`, we are deferring adding any collection-like types to non-escapable `UTF8Span`. Future work includes adding view types and corresponding iterators.   
+Like `Span`, we are deferring adding any collection-like types to non-escapable `UTF8Span`. Future work includes adding view types and corresponding iterators.
+
+For an example implementation of those see [the `UTFSpanViews.swift` test file](https://github.com/apple/swift-collections/pull/394).
+
+### More collection-like algorithms
+
+We propose equality checks (e.g. `scalarsEqual`), as those are incredibly common and useful operations. We have (tentatively) deferred other algorithms until non-escapable collections are figured out.
 
-For an example implementation of those see **TODO**: link to test in repo
+However, we can add select high-value algorithms if motivated by the community. We'd want to
 
-### Returning all the encoding errors
+
+
+### More validation API
 
 Future work includes returning all the encoding errors found in a given input.
 
@@ -911,7 +920,7 @@ extension UTF8 {
   ) -> some Sequence<UTF8.EncodingError>
 ```
 
-See **TODO**: link to example implementation
+See [`_checkAllErrors` in `UTF8EncodingError.swift`](https://github.com/apple/swift-collections/pull/394).
 
 ### Transcoded views, normalized views, case-folded views, etc
 
@@ -921,7 +930,7 @@ For example, transcoded views can be generalized:
 
 ```swift
 extension UTF8Span {
-  /// A view of the span's contents as a bidirectional collection of 
+  /// A view of the span's contents as a bidirectional collection of
   /// transcoded `Encoding.CodeUnit`s.
   @frozen
   public struct TranscodedView<Encoding: _UnicodeEncoding> {
@@ -951,14 +960,14 @@ extension UTF8Span.CharacterView {
   func matchCharacterClass(
     _: CharacterClass,
     startingAt: Index,
-    limitedBy: Index    
+    limitedBy: Index
   ) throws -> Index?
 
   func matchQuantifiedCharacterClass(
     _: CharacterClass,
     _: QuantificationDescription,
     startingAt: Index,
-    limitedBy: Index    
+    limitedBy: Index
   ) throws -> Index?
 }
 ```
@@ -982,7 +991,7 @@ String's internal storage class is null-terminated valid UTF-8 (by substituting
 
 ### Yield UTF8Spans in byte parsers
 
-Span's proposal mentions a future direction of byte parsing helpers on a `Cursor` or `Iterator` type (**TODO**: link to span proposal section). We could extend these types (or analogous types on `Span<UInt>`) with UTF-8 parsing code:
+Span's proposal mentions a future direction of byte parsing helpers on a `Cursor` or `Iterator` type on `RawSpan`. We could extend these types (or analogous types on `Span<UInt8>`) with UTF-8 parsing code:
 
 ```swift
 extension RawSpan.Cursor {
@@ -992,6 +1001,9 @@ extension RawSpan.Cursor {
 }
 ```
 
+### Track other bits
+
+Future work include tracking whether the contents are NULL-terminated (useful for C bridging), whether the contents contain any newlines or only a single newline at the end (useful for accelerating Regex `.`), etc.
 
 
 ## Alternatives considered
@@ -1017,7 +1029,7 @@ That being said, these names are definitely bikesheddable and we'd like suggesti
 
 ### Other bounds or alignment checked formulations
 
-For many operations that take an index that needs to be appropriately aligned, we propose `foo(_:)`, `foo(unchecked:)`, and `foo(uncheckedAssumingAligned:)`. 
+For many operations that take an index that needs to be appropriately aligned, we propose `foo(_:)`, `foo(unchecked:)`, and `foo(uncheckedAssumingAligned:)`.
 
 `foo(_:)` and `foo(unchecked:)` have analogues in `Span` and `foo(uncheckedAssumingAligned:)` is the lowest level interface that a type such as `Iterator` would call (since it maintains index validity and alignment as an invariant).
 
@@ -1029,6 +1041,7 @@ We could also only offer `foo(_:)` and `foo(uncheckedAssumingAligned:)`. Unalign
 
 
 
+
 ## Acknowledgments
 
 Karoy Lorentey, Karl, Geordie_J, and fclout, contributed to this proposal with their clarifying questions and discussions.

From 229732f2d044ba3c59bc31f2059983bbf1195248 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Tue, 25 Jun 2024 11:12:47 -0600
Subject: [PATCH 07/16] title

---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index e0f4f46d0b..573acd915f 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -1,4 +1,4 @@
-# Safe Access to Contiguous UTF-8 Storage
+# Safe UTF-8 Processing Over Contiguous Bytes
 
 * Proposal: [SE-NNNN](nnnn-utf8-span.md)
 * Authors: [Michael Ilseman](https://github.com/milseman), [Guillaume Lessard](https://github.com/glessard)

From 3c3a4b6d41ce52c9f07bac98260af5a83cb1f929 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Wed, 26 Jun 2024 14:32:06 -0600
Subject: [PATCH 08/16] Update future directions

---
 proposals/nnnn-utf8-span.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 573acd915f..fd1fe1bad0 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -905,9 +905,7 @@ For an example implementation of those see [the `UTFSpanViews.swift` test file](
 
 We propose equality checks (e.g. `scalarsEqual`), as those are incredibly common and useful operations. We have (tentatively) deferred other algorithms until non-escapable collections are figured out.
 
-However, we can add select high-value algorithms if motivated by the community. We'd want to
-
-
+However, we can add select high-value algorithms if motivated by the community.
 
 ### More validation API
 
@@ -1005,6 +1003,9 @@ extension RawSpan.Cursor {
 
 Future work include tracking whether the contents are NULL-terminated (useful for C bridging), whether the contents contain any newlines or only a single newline at the end (useful for accelerating Regex `.`), etc.
 
+### Putting more API on String
+
+`String` would also benefit from the query API, such as `isKnownNFC` and corresponding scan methods. Because a string may be a lazily-bridged instance of `NSString`, we don't always have the bits available to query or set, but this may become via pending future improvements in bridging.
 
 ## Alternatives considered
 

From d66a1ec3af75460ed8f09695e403dcb32c294748 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Wed, 26 Jun 2024 14:33:41 -0600
Subject: [PATCH 09/16] typo

---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index fd1fe1bad0..373cd323d6 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -1005,7 +1005,7 @@ Future work include tracking whether the contents are NULL-terminated (useful fo
 
 ### Putting more API on String
 
-`String` would also benefit from the query API, such as `isKnownNFC` and corresponding scan methods. Because a string may be a lazily-bridged instance of `NSString`, we don't always have the bits available to query or set, but this may become via pending future improvements in bridging.
+`String` would also benefit from the query API, such as `isKnownNFC` and corresponding scan methods. Because a string may be a lazily-bridged instance of `NSString`, we don't always have the bits available to query or set, but this may become viable pending future improvements in bridging.
 
 ## Alternatives considered
 

From 3cd5b2850e9a87853222e33a72f6635d27188bfa Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Wed, 26 Jun 2024 14:37:56 -0600
Subject: [PATCH 10/16] Future direction of printing and logging facilities

---
 proposals/nnnn-utf8-span.md | 4 ++++
 1 file changed, 4 insertions(+)
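
Note: the section added below suggests logging in terms of UTF-8 bytes rather than `String`. A minimal sketch of that idea follows. `BufferSink` and `log(_:to:)` are hypothetical names invented for illustration and are not part of the proposal; the only standard library call relied on is `StaticString.withUTF8Buffer`.

```swift
// Hypothetical byte sink; a real one might write to a UART or a file.
struct BufferSink {
  var storage: [UInt8] = []

  // Append raw UTF-8 code units without ever forming a `String`.
  mutating func write(_ bytes: UnsafeBufferPointer<UInt8>) {
    storage.append(contentsOf: bytes)
  }
}

// Hypothetical logging entry point that stays in terms of UTF-8 bytes.
func log(_ message: StaticString, to sink: inout BufferSink) {
  // String literals are stored as UTF-8, so no transcoding is needed.
  message.withUTF8Buffer { sink.write($0) }
  sink.storage.append(UInt8(ascii: "\n"))
}
```

The design choice here is that the sink only ever sees code units, so it does not depend on any `String`-based protocol.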

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 373cd323d6..7e519d102d 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -1007,6 +1007,10 @@ Future work include tracking whether the contents are NULL-terminated (useful fo
 
 `String` would also benefit from the query API, such as `isKnownNFC` and corresponding scan methods. Because a string may be a lazily-bridged instance of `NSString`, we don't always have the bits available to query or set, but this may become viable pending future improvements in bridging.
 
+### Generalize printing and logging facilities
+
+Many printing and logging protocols and facilities operate in terms of `String`. They could be generalized to work in terms of UTF-8 bytes instead, which is important for Embedded Swift.
+
 ## Alternatives considered
 
 ### Invalid start / end of input UTF-8 encoding errors

From 2101841b2bf7f5116dc60ea401c6dda8301f782d Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 1 Jul 2024 16:04:26 -0600
Subject: [PATCH 11/16] Update proposals/nnnn-utf8-span.md

Co-authored-by: Ben Rimmington <me@benrimmington.com>
---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 7e519d102d..a6ca412f63 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -7,7 +7,7 @@
 * Bug: rdar://48132971, rdar://96837923
 * Implementation: [Prototype](https://github.com/apple/swift-collections/pull/394)
 * Upcoming Feature Flag: (pending)
-* Review: ([pitch 1](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715))
+* Review: ([pitch 1](https://forums.swift.org/t/pitch-utf-8-processing-over-unsafe-contiguous-bytes/69715)) ([pitch 2](https://forums.swift.org/t/pitch-safe-utf-8-processing-over-contiguous-bytes/72742))
 
 
 ## Introduction

From 6d4516e5dea7d23c3f5a1de6e9b91f9f572ac4f7 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 1 Jul 2024 16:04:34 -0600
Subject: [PATCH 12/16] Update proposals/nnnn-utf8-span.md

Co-authored-by: Ben Rimmington <me@benrimmington.com>
---
 proposals/nnnn-utf8-span.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
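
Note: this patch drops the body of `boundsCheck(_:)` from the proposal text, leaving only the signature. The removed body, `i >= 0 && i < count`, still documents the intended semantics. A minimal sketch on a hypothetical stand-in type (`SpanBounds` is not part of the proposal, and the `Range` overload's rule is an assumption, since the proposal shows no body for it):

```swift
// Hypothetical stand-in for a span: only the byte count matters here.
struct SpanBounds {
  var count: Int

  // Same condition as the body removed by this patch:
  // a valid offset lies in 0..<count.
  func boundsCheck(_ i: Int) -> Bool {
    i >= 0 && i < count
  }

  // Assumption: a range is in bounds when it lies within 0...count,
  // so the empty range count..<count is accepted.
  func boundsCheck(_ bounds: Range<Int>) -> Bool {
    bounds.lowerBound >= 0 && bounds.upperBound <= count
  }
}
```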

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index a6ca412f63..2a0084f5f7 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -768,9 +768,7 @@ extension UTF8Span {
 
   /// Whether `i` is in bounds
   @_alwaysEmitIntoClient
-  public func boundsCheck(_ i: Int) -> Bool {
-    i >= 0 && i < count
-  }
+  public func boundsCheck(_ i: Int) -> Bool
 
   /// Whether `bounds` is in bounds
   @_alwaysEmitIntoClient

From c6f01f3554572d0e6152814cd255daca743c06e2 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 1 Jul 2024 16:04:46 -0600
Subject: [PATCH 13/16] Update proposals/nnnn-utf8-span.md

Co-authored-by: Ben Rimmington <me@benrimmington.com>
---
 proposals/nnnn-utf8-span.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 2a0084f5f7..2554a6b4f9 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -791,9 +791,7 @@ extension UTF8Span {
     E: Error, Result: ~Copyable & ~Escapable
   >(
     _ body: (_ buffer: borrowing UnsafeBufferPointer<UInt8>) throws(E) -> Result
-  ) throws(E) -> dependsOn(self) Result {
-    try body(unsafeBaseAddress._ubp(0..<count))
-  }
+  ) throws(E) -> dependsOn(self) Result
 }
 ```
 

From 5d9543926d1a3b857a57c5c60db6c297cd60eebb Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 1 Jul 2024 16:04:52 -0600
Subject: [PATCH 14/16] Update proposals/nnnn-utf8-span.md

Co-authored-by: Ben Rimmington <me@benrimmington.com>
---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
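
Note: the doc comment corrected below refers to Unicode canonical equivalence, which the proposal describes as the semantics of `String.==`. A short illustration using only existing `String` API (the constant names are arbitrary): two strings can be canonically equivalent while differing scalar-by-scalar and byte-by-byte.

```swift
// U+00E9 (precomposed) versus "e" followed by U+0301 (combining acute).
let precomposed = "\u{E9}"
let decomposed = "e\u{301}"

// `String.==` uses canonical equivalence, so these compare equal.
let canonicallyEqual = precomposed == decomposed

// The scalar sequences and the UTF-8 code units differ.
let scalarsEqual = precomposed.unicodeScalars.elementsEqual(
  decomposed.unicodeScalars)
let bytesEqual = Array(precomposed.utf8) == Array(decomposed.utf8)
```

This is the distinction between an equivalence check such as `isCanonicallyEquivalent(to:)` and an exact check such as `scalarsEqual`.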

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 2554a6b4f9..81cfb9c3bc 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -673,7 +673,7 @@ extension UTF8Span {
 ```swift
 extension UTF8Span {
   /// Whether `self` is equivalent to `other` under Unicode Canonical
-  /// Equivalance.
+  /// Equivalence.
   @_unavailableInEmbedded
   public func isCanonicallyEquivalent(
     to other: UTF8Span

From 3c01eb63113c22db9205a3a7d6f40298591c97eb Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 1 Jul 2024 16:05:06 -0600
Subject: [PATCH 15/16] Update proposals/nnnn-utf8-span.md

Co-authored-by: Ben Rimmington <me@benrimmington.com>
---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index 81cfb9c3bc..ea81787393 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -680,7 +680,7 @@ extension UTF8Span {
   ) -> Bool
 
   /// Whether `self` orders less than `other` under Unicode Canonical
-  /// Equivalance using normalized code-unit order (in NFC).
+  /// Equivalence using normalized code-unit order (in NFC).
   @_unavailableInEmbedded
   public func isCanonicallyLessThan(
     _ other: UTF8Span

From ff456c6a7822ef7240c242451fdb5d6dba00e942 Mon Sep 17 00:00:00 2001
From: Michael Ilseman <michael.ilseman@gmail.com>
Date: Mon, 1 Jul 2024 16:05:25 -0600
Subject: [PATCH 16/16] Update proposals/nnnn-utf8-span.md

Co-authored-by: Ben Rimmington <me@benrimmington.com>
---
 proposals/nnnn-utf8-span.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
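
Note: the renamed `parseNullTerminatedUTF8()` is only a signature on a future `RawSpan.Cursor`. A rough sketch of the behavior it implies, written against a plain array: `ByteCursor`, `ParseError`, and `isValidUTF8` are hypothetical names for illustration, the cursor returns a slice rather than a `UTF8Span`, and validation uses the existing `UTF8` codec rather than the proposal's validator.

```swift
enum ParseError: Error { case invalidUTF8, missingTerminator, outOfBounds }

// Validate by decoding scalar by scalar with the standard `UTF8` codec.
func isValidUTF8<S: Sequence>(_ bytes: S) -> Bool where S.Element == UInt8 {
  var decoder = UTF8()
  var iterator = bytes.makeIterator()
  while true {
    switch decoder.decode(&iterator) {
    case .scalarValue: continue
    case .emptyInput: return true
    case .error: return false
    }
  }
}

// Hypothetical cursor over an array of bytes.
struct ByteCursor {
  let bytes: [UInt8]
  var position = 0

  // Validate the next `length` bytes and advance past them on success.
  mutating func parseUTF8(length: Int) throws -> ArraySlice<UInt8> {
    guard length >= 0, length <= bytes.count - position else {
      throw ParseError.outOfBounds
    }
    let slice = bytes[position ..< position + length]
    guard isValidUTF8(slice) else { throw ParseError.invalidUTF8 }
    position += length
    return slice
  }

  // Parse up to the next NUL byte, then consume the terminator.
  mutating func parseNullTerminatedUTF8() throws -> ArraySlice<UInt8> {
    guard let nul = bytes[position...].firstIndex(of: 0) else {
      throw ParseError.missingTerminator
    }
    let slice = try parseUTF8(length: nul - position)
    position += 1
    return slice
  }
}
```

The cursor leaves `position` unchanged when parsing fails, so a caller can report the error and retry with a different strategy.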

diff --git a/proposals/nnnn-utf8-span.md b/proposals/nnnn-utf8-span.md
index ea81787393..1c31567f7c 100644
--- a/proposals/nnnn-utf8-span.md
+++ b/proposals/nnnn-utf8-span.md
@@ -991,7 +991,7 @@ Span's proposal mentions a future direction of byte parsing helpers on a `Cursor
 extension RawSpan.Cursor {
   public mutating func parseUTF8(length: Int) throws -> UTF8Span
 
-  public mutating func parseNullTermiantedUTF8() throws -> UTF8Span
+  public mutating func parseNullTerminatedUTF8() throws -> UTF8Span
 }
 ```