This page collects notes on what the interface of a string type should be. Unicode is complex, and it would be a mistake to assume that strings can expose it through a simple interface such as an array of characters.
A string is a piece of text. Strings are not sequences. The term "character" is widely used but not well defined: depending on context it can mean a code unit, a code point, or a grapheme. It should therefore be avoided.
The way strings are encoded in memory should not influence their interface, although the encoding may be configurable by the programmer for performance reasons. While strings are not sequences themselves, sequences can be extracted from them. These sequences can be implemented lazily or eagerly, and can be iterated over. Examples of such sequences are lists of code points, lists of graphemes, and lists of UTF-8 code units.
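To make the difference between these sequences concrete, here is a minimal sketch that extracts two of them lazily from a UTF-8 encoded OCaml `string`. It is not part of the proposed interface: the names `utf8_code_units` and `code_points` are hypothetical, and the sketch assumes OCaml 4.14 or later for `String.get_utf_8_uchar`. Grapheme segmentation is left out because it needs Unicode data tables that the standard library does not provide.

```ocaml
(* Sketch only; assumes OCaml >= 4.14 and that the string holds UTF-8. *)

(* UTF-8 code units: each byte of the string, produced lazily. *)
let utf8_code_units (s : string) : int Seq.t =
  Seq.map Char.code (String.to_seq s)

(* Code points, decoded lazily. Malformed input decodes to U+FFFD. *)
let code_points (s : string) : Uchar.t Seq.t =
  let rec from i () =
    if i >= String.length s then Seq.Nil
    else
      let d = String.get_utf_8_uchar s i in
      Seq.Cons (Uchar.utf_decode_uchar d, from (i + Uchar.utf_decode_length d))
  in
  from 0

let () =
  (* "e" followed by U+0301 COMBINING ACUTE ACCENT *)
  let s = "e\xCC\x81" in
  let units = List.of_seq (utf8_code_units s) in
  let points = List.of_seq (code_points s) in
  Printf.printf "%d code units, %d code points\n"
    (List.length units) (List.length points)
  (* prints "3 code units, 2 code points" *)
```

The same text yields three code units and two code points, and a reader would see it as a single grapheme. This is why a single notion of "length" or "nth character" does not fit a string.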
A reference interface is shown below.
    module Code_point : sig
      type t
      val from_int : int -> t option (* not all ints are valid code points *)
      val to_int : t -> int
    end = struct
      (* … *)
    end

    module Grapheme : sig
      (* … *)
    end = struct
      (* … *)
    end

    module String : sig
      type t
      val encode_utf8 : t -> byte list
      val encode_utf16 : t -> short list
      val code_points : t -> Code_point.t list
      val graphemes : t -> Grapheme.t list
      (* … *)
    end = struct
      (* … *)
    end
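As an illustration of why `from_int` returns an option, the `Code_point` module could be filled in along the following lines. This is one possible sketch, not the reference implementation: representing a code point as an `int` is an assumption made here, and the only check performed is the Unicode code point range U+0000 to U+10FFFF.

```ocaml
module Code_point : sig
  type t
  val from_int : int -> t option
  val to_int : t -> int
end = struct
  (* Assumed representation: the code point's integer value. *)
  type t = int

  (* Unicode code points span U+0000 to U+10FFFF inclusive. *)
  let from_int n = if n >= 0 && n <= 0x10FFFF then Some n else None

  let to_int cp = cp
end

let () =
  (* Out-of-range integers are rejected. *)
  assert (Code_point.from_int 0x110000 = None);
  assert (Code_point.from_int (-1) = None);
  (* In-range integers round-trip. *)
  match Code_point.from_int 0x1F600 with
  | Some cp -> assert (Code_point.to_int cp = 0x1F600)
  | None -> assert false
```

Because `t` is abstract, callers can only obtain a `Code_point.t` through `from_int`, so every value of the type is known to be in range. A stricter variant could also reject the surrogate range U+D800 to U+DFFF, which are code points but not scalar values.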