Exploring How the Windows Clipboard Supports Multiple Formats

Preface

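Have you ever run into situations like these?

  • You copy a passage from a web page into Word and find that the formatting, hyperlinks, and images all come along with it. Paste it into Notepad, though, and only the text arrives.
  • You copy code into another editor and find that the syntax highlighting, the formatting, and even the black background are copied too. Paste it into Notepad and, again, only the text arrives.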

This post tries to work out how that behavior comes about.

Reading the documentation

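The first stop for a question like this is Microsoft's developer documentation (I could not find official Simplified or Traditional Chinese versions of this material).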

A window can place more than one object on the clipboard, each representing the same information in a different clipboard format. Users need not be aware of the clipboard formats used for an object on the clipboard.

Clipboard Formats – Win32 apps | Microsoft Docs

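In other words, the clipboard can hold several data objects at once. Each one represents the same information in a different format, and users do not need to be aware of any of this.

So which formats are supported? According to the official documentation, they include standard formats, formats registered by applications themselves (such as QQ_Unicode_RichEdit_Format), private formats, multiple formats, synthesized formats, cloud clipboard formats, and clipboard history formats.

For this post we mainly care about "multiple formats". Here is how the documentation describes them.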

A window can place more than one clipboard object on the clipboard, each representing the same information in a different clipboard format. When placing information on the clipboard, the window should provide data in as many formats as possible. To find out how many formats are currently used on the clipboard, call the CountClipboardFormats function.

Clipboard formats that contain the most information should be placed on the clipboard first, followed by less descriptive formats. A window pasting information from the clipboard typically retrieves a clipboard object in the first format it recognizes. Because clipboard formats are enumerated in the order they are placed on the clipboard, the first recognized format is also the most descriptive.

For example, suppose a user copies styled text from a word-processing document. The window containing the document might first place data on the clipboard in a registered format, such as RTF. Subsequently, the window would place data on the clipboard in a less descriptive format, such as text (CF_TEXT).

When the content of the clipboard is pasted into another window, the window retrieves data in the most descriptive format it recognizes. If the window recognizes RTF, the corresponding data is pasted into the document. Otherwise, the text data is pasted into the document and the formatting information is lost.

Clipboard Formats – Win32 apps | Microsoft Docs

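The gist is that the clipboard can hold several data objects that represent the same information in different formats. The format carrying the most information should be placed first, with the remaining formats following in descending order of richness, so the first format on the clipboard is the richest one. An application that pastes should in turn walk through the formats in order until it reaches one it recognizes.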
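The selection rule described above can be sketched in a few lines. This is only an illustration of the idea, not how any particular application is implemented: `FormatPicker` and `PickFirstRecognized` are hypothetical names, and the format names are plain strings rather than real clipboard handles.

```csharp
using System;
using System.Collections.Generic;

public static class FormatPicker
{
    // Walks the offered formats in clipboard order and returns the first one
    // the pasting application recognizes, or null when none is recognized.
    // Both the class and the method are hypothetical helpers for illustration.
    public static string PickFirstRecognized(
        IEnumerable<string> offeredInOrder, ISet<string> recognized)
    {
        foreach (string format in offeredInOrder)
        {
            if (recognized.Contains(format))
            {
                return format;
            }
        }
        return null;
    }
}
```

Given the offered list "HTML Format", "UnicodeText", "Text", a consumer that recognizes HTML stops at "HTML Format", while one that only recognizes "UnicodeText" and "Text" skips ahead to "UnicodeText". That is the Word versus Notepad difference in miniature.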

Experiment

Experiment code

To follow along, create a new C# Windows Forms App in Visual Studio.

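// Requires: using System; using System.Windows.Forms;
// Run this on an STA thread, e.g. inside a button's Click handler of the form.

/* Get the data object currently on the clipboard (null if the clipboard is empty) */
var dataObj = Clipboard.GetDataObject();
if (dataObj != null)
{
    /* Get the formats the data object offers */
    var formats = dataObj.GetFormats();
    foreach (string format in formats)
    {
        string str;
        try
        {
            str = string.Format("Format:{0} \nContent: {1}\n========", format, dataObj.GetData(format).ToString());
        }
        catch (Exception)
        {
            // GetData can throw, or return null, for formats it cannot convert.
            // "无法打印的数据格式" means "unprintable data format"; the literal is
            // kept as-is so it matches the captured output in this post.
            str = string.Format("Format:{0} \nContent: {1}\n========", format, "无法打印的数据格式");
        }
        Console.WriteLine(str);
    }
}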

Environment

  • OS: Windows 10 x64 2004 Home
  • Office: 2016 Professional x64
  • Browser: Edge version 85.0.564.63 (official build) (64-bit)
  • Visual Studio: 2019

Procedure

Copying from a web page

Copy any ordinary piece of text from a web page, then run the program and look at the result.

Format:HTML Format
Content: Version:0.9
StartHTML:0000000217
EndHTML:0000000923
StartFragment:0000000253
EndFragment:0000000887
SourceURL:https://docs.microsoft.com/zh-cn/windows/win32/dataxchg/clipboard-formats#multiple-clipboard-formats
<html>
<body>
<!--StartFragment--><span style="color: rgb(23, 23, 23); font-family: &quot;Segoe UI&quot;, SegoeUI, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">The following topics describe the clipboard formats.</span><!--EndFragment-->
</body>
</html>?
========
Format:System.String
Content: The following topics describe the clipboard formats.
========
Format:UnicodeText
Content: The following topics describe the clipboard formats.
========
Format:Text
Content: The following topics describe the clipboard formats.
========
Format:Locale
Content: System.IO.MemoryStream
========
Format:OEMText
Content: The following topics describe the clipboard formats.
========
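The HTML Format entry above begins with a small text header (Version, StartHTML, EndHTML, StartFragment, EndFragment). As far as I understand Microsoft's description of the HTML Clipboard Format, those numbers are byte offsets into the UTF-8 payload, and the StartFragment/EndFragment pair delimits the part the user actually selected. Below is a minimal sketch of pulling that fragment out. `HtmlFormatFragment` and its methods are hypothetical names, and it assumes the payload string was decoded correctly as UTF-8, which may not hold for every .NET version when the text is not ASCII.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlFormatFragment
{
    // Reads a decimal header field such as "StartFragment:0000000253".
    // The comment markers in the body (<!--StartFragment-->) are not matched
    // because they are not followed by a colon and digits.
    private static int ReadOffset(string payload, string key)
    {
        Match match = Regex.Match(payload, key + @":(\d+)");
        if (!match.Success)
        {
            throw new FormatException("Missing header field: " + key);
        }
        return int.Parse(match.Groups[1].Value);
    }

    // Returns the selected fragment. Offsets are treated as byte offsets
    // into the UTF-8 encoding of the whole payload (an assumption based on
    // the format description, not verified against every producer).
    public static string Extract(string payload)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(payload);
        int start = ReadOffset(payload, "StartFragment");
        int end = ReadOffset(payload, "EndFragment");
        if (end < start || end > bytes.Length)
        {
            throw new FormatException("Fragment offsets are out of range.");
        }
        return Encoding.UTF8.GetString(bytes, start, end - start);
    }
}
```

Passing the string returned by `dataObj.GetData("HTML Format")` to `HtmlFormatFragment.Extract` should yield just the markup between the two fragment markers, without the header or the surrounding html and body tags.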

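In the order the program reports them, the formats are: HTML, a C# string object (System.String), a Unicode string, an ANSI string, the locale identifier associated with the text, and an OEM string. Per the documentation quoted earlier, this order is meant to run from the richest format to the least rich. System.String is the name Windows Forms uses for its own string format, so it most likely comes from the .NET data object rather than from the browser. The documentation describes the last two formats, Locale (CF_LOCALE) and OEMText (CF_OEMTEXT), as follows.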

The data is a handle (HGLOBAL) to the locale identifier (LCID) associated with text in the clipboard. When you close the clipboard, if it contains CF_TEXT data but no CF_LOCALE data, the system automatically sets the CF_LOCALE format to the current input language. You can use the CF_LOCALE format to associate a different locale with the clipboard text.

An application that pastes text from the clipboard can retrieve this format to determine which character set was used to generate the text.

Note that the clipboard does not support plain text in multiple character sets. To achieve this, use a formatted text data type such as RTF instead.

The system uses the code page associated with CF_LOCALE to implicitly convert from CF_TEXT to CF_UNICODETEXT. Therefore, the correct code page table is used for the conversion.

Standard Clipboard Formats (Winuser.h) – Win32 apps | Microsoft Docs

Text format containing characters in the OEM character set. Each line ends with a carriage return/linefeed (CR-LF) combination. A null character signals the end of the data.

Standard Clipboard Formats (Winuser.h) – Win32 apps | Microsoft Docs

Copying an image from a web page

Format:HTML Format
Content: Version:0.9
StartHTML:0000000179
EndHTML:0000000977
StartFragment:0000000215
EndFragment:0000000941
SourceURL:https://www.example.com/path/to/demo.html
<html>
<body>
<!--StartFragment--><img src="https://www.example.com/path/to/demo.png" alt="" style="margin: 0px; padding: 0px; border: 0px; height: auto; max-width: 100%; color: rgb(0, 0, 0); font-family: &quot;PingFang SC&quot;, &quot;Microsoft YaHei&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-size: 13px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;"><!--EndFragment-->
</body>
</html>?
========

This time the only data reported is in HTML format.

Copying code from Visual Studio

Format:System.String
Content: var dataObj = Clipboard.GetDataObject();
========
Format:UnicodeText
Content: var dataObj = Clipboard.GetDataObject();
========
Format:Text
Content: var dataObj = Clipboard.GetDataObject();
========
Format:Rich Text Format
Content: {\rtf\ansi{\fonttbl{\f0 NSimSun;}}{\colortbl;\red0\green0\blue255;\red0\green0\blue0;}\f0 \fs19 \cf1 \cb0 \highlight0 var\cf2 dataObj = Clipboard.GetDataObject();}
========

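In the order reported, the formats are: a C# string object, a Unicode string, an ANSI string, and Rich Text Format. Note that Rich Text Format is listed last here even though it carries the most information (the font and colors), so the reported order does not always run from richest to least rich.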

Copying text from Word

Format:Object Descriptor
Content: System.IO.MemoryStream
========
Format:Rich Text Format
Content: [omitted: extremely long; saving it once took the site down]
========
Format:HTML Format
Content: [omitted: also extremely long; it took the site down a second time]
========
Format:System.String
Content: 这是 word 中的一段文本。
========
Format:UnicodeText
Content: 这是 word 中的一段文本。
========
Format:Text
Content: 这是 word 中的一段文本。
========
Format:EnhancedMetafile
Content: [this format cannot be printed]

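In the order reported, the formats are: a block of in-memory data (Object Descriptor), Rich Text Format, HTML, a C# string object, a Unicode string, an ANSI string, and EnhancedMetafile.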

Copying from PowerPoint

Format:Preferred DropEffect
Content: System.IO.MemoryStream
========
Format:InShellDragLoop
Content: System.IO.MemoryStream
========
Format:Object Descriptor
Content: System.IO.MemoryStream
========
Format:PowerPoint 12.0 Internal Theme
Content: System.IO.MemoryStream
========
Format:PowerPoint 12.0 Internal Color Scheme
Content: System.IO.MemoryStream
========
Format:PowerPoint 12.0 Internal Shapes
Content: System.IO.MemoryStream
========
Format:Art::GVML ClipFormat
Content: System.IO.MemoryStream
========
Format:PNG
Content: System.IO.MemoryStream
========
Format:JFIF
Content: System.IO.MemoryStream
========
Format:GIF
Content: System.IO.MemoryStream
========
Format:System.Drawing.Bitmap
Content: System.Drawing.Bitmap
========
Format:Bitmap
Content: System.Drawing.Bitmap
========
Format:EnhancedMetafile
Content: 无法打印的数据格式
========
Format:MetaFilePict
Content: 无法打印的数据格式
========

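There are more unfamiliar entries this time, but the list can still be summarized. In the order reported: several blocks of unidentified in-memory data, then the image in PNG, JFIF, GIF, and Bitmap formats, followed by EnhancedMetafile and MetaFilePict.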

Copying from Excel

Format:EnhancedMetafile Content: 无法打印的数据格式 ======== Format:MetaFilePict Content: System.IO.MemoryStream ======== Format:System.Drawing.Bitmap Content: System.Drawing.Bitmap ======== Format:Bitmap Content: System.Drawing.Bitmap ======== Format:Biff12 Content: System.IO.MemoryStream ======== Format:Biff8 Content: System.IO.MemoryStream ======== Format:Biff5 Content: System.IO.MemoryStream ======== Format:SymbolicLink Content: System.IO.MemoryStream ======== Format:DataInterchangeFormat Content: System.IO.MemoryStream ======== Format:XML Spreadsheet Content: System.IO.MemoryStream ======== Format:HTML Format Content: Version:1.0 StartHTML:0000000191 EndHTML:0000002529 StartFragment:0000002042 EndFragment:0000002477 SourceURL:file://新建%20Microsoft%20Excel%20工作表.xlsx <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40"> <head> <meta http-equiv=Content-Type content="text/html; charset=utf-8"> <meta name=ProgId content=Excel.Sheet> <meta name=Generator content="Microsoft Excel 15"> <link id=Main-File rel=Main-File href="file:///C:/Users/call_/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> <link rel=File-List href="file:///C:/Users/call_/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> <style> <!--table {mso-displayed-decimal-separator:"\."; mso-displayed-thousand-separator:"\,";} @page {margin:.75in .7in .75in .7in; mso-header-margin:.3in; mso-footer-margin:.3in;} tr {mso-height-source:auto; mso-ruby-visibility:none;} col {mso-width-source:auto; mso-ruby-visibility:none;} br {mso-data-placement:same-cell;} td {padding-top:1px; padding-right:1px; padding-left:1px; mso-ignore:padding; color:black; font-size:11.0pt; font-weight:400; font-style:normal; text-decoration:none; font-family:等线; mso-generic-font-family:auto; mso-font-charset:134; mso-number-format:General; text-align:general; vertical-align:bottom; border:none; 
mso-background-source:auto; mso-pattern:auto; mso-protection:locked visible; white-space:nowrap; mso-rotate:0;} ruby {ruby-align:left;} rt {color:windowtext; font-size:9.0pt; font-weight:400; font-style:normal; text-decoration:none; font-family:等线; mso-generic-font-family:auto; mso-font-charset:134; mso-char-type:none; display:none;} --> </style> </head> <body link="#0563C1" vlink="#954F72"> <table border=0 cellpadding=0 cellspacing=0 width=192 style='border-collapse: collapse;width:144pt'> <!--StartFragment--> <col width=64 span=3 style='width:48pt'> <tr height=18 style='height:13.8pt'> <td height=18 align=right width=64 style='height:13.8pt;width:48pt'>1</td> <td align=right width=64 style='width:48pt'>2</td> <td align=right width=64 style='width:48pt'>3</td> </tr> <tr height=18 style='height:13.8pt'> <td height=18 align=right style='height:13.8pt'>4</td> <td align=right>5</td> <td align=right>6</td> </tr> <!--EndFragment--> </table> </body> </html> ======== Format:System.String Content: 1 2 3 4 5 6 ======== Format:UnicodeText Content: 1 2 3 4 5 6 ======== Format:Text Content: 1 2 3 4 5 6 ======== Format:Csv Content: System.IO.MemoryStream ======== Format:Hyperlink Content: System.IO.MemoryStream ======== Format:Rich Text Format Content: {\rtf1\ansi \ansicpg936 {\fonttbl{\f0\fnil \fcharset134 等线;}{\f1\fnil \fcharset134 等线;}{\f2\fnil \fcharset134 等线;}{\f3\fnil \fcharset134 等线;}{\f4\fnil \fcharset134 等线;}{\f5\fnil \fcharset134 等线;}{\f6\fnil \fcharset134 等线 Light;}{\f7\fnil \fcharset134 等线;}{\f8\fnil \fcharset134 等线;}{\f9\fnil \fcharset134 等线;}{\f10\fnil \fcharset134 等线;}{\f11\fnil \fcharset134 等线;}{\f12\fnil \fcharset134 等线;}{\f13\fnil \fcharset134 等线;}{\f14\fnil \fcharset134 等线;}{\f15\fnil \fcharset134 等线;}{\f16\fnil \fcharset134 等线;}{\f17\fnil \fcharset134 等线;}{\f18\fnil \fcharset134 等线;}{\f19\fnil \fcharset134 等线;}{\f20\fnil \fcharset134 等线;}{\f21\fnil \fcharset134 等线;}{\f22\fnil \fcharset134 等线;}{\f23\fnil \fcharset134 等线;}{\f24\fnil \fcharset134 等线;}{\f25\fnil 
\fcharset134 等线;}} {\info{\id220}}\plain {\colortbl\red0\green0\blue0;\red255\green255\blue255;\red255\green0\blue0;\red0\green255\blue0;\red0\green0\blue255;\red255\green255\blue0;\red255\green0\blue255;\red0\green255\blue255;\red0\green0\blue0;\red255\green255\blue255;\red255\green0\blue0;\red0\green255\blue0;\red0\green0\blue255;\red255\green255\blue0;\red255\green0\blue255;\red0\green255\blue255;\red128\green0\blue0;\red0\green128\blue0;\red0\green0\blue128;\red128\green128\blue0;\red128\green0\blue128;\red0\green128\blue128;\red192\green192\blue192;\red128\green128\blue128;\red153\green153\blue255;\red153\green51\blue102;\red255\green255\blue204;\red204\green255\blue255;\red102\green0\blue102;\red255\green128\blue128;\red0\green102\blue204;\red204\green204\blue255;\red0\green0\blue128;\red255\green0\blue255;\red255\green255\blue0;\red0\green255\blue255;\red128\green0\blue128;\red128\green0\blue0;\red0\green128\blue128;\red0\green0\blue255;\red0\green204\blue255;\red204\green255\blue255;\red204\green255\blue204;\red255\green255\blue153;\red153\green204\blue255;\red255\green153\blue204;\red204\green153\blue255;\red255\green204\blue153;\red51\green102\blue255;\red51\green204\blue204;\red153\green204\blue0;\red255\green204\blue0;\red255\green153\blue0;\red255\green102\blue0;\red102\green102\blue153;\red150\green150\blue150;\red0\green51\blue102;\red51\green153\blue102;\red0\green51\blue0;\red51\green51\blue0;\red153\green51\blue0;\red153\green51\blue102;\red51\green51\blue153;\red51\green51\blue51;;\red255\green255\blue255;\red100\green100\blue100;\red240\green240\blue240;\red0\green0\blue0;\red255\green255\blue255;\red160\green160\blue160;\red0\green120\blue215;\red0\green0\blue0;\red200\green200\blue200;\red55\green55\blue55;\red255\green255\blue255;\red100\green100\blue100;\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue0;\red255\green255\blue225;\red0\green0\blue0;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;\red192\green192\blue192;\red150\green150\blue150;\red128\green128\blue128;\red102\green102\blue102;\red51\green51\blue51;\red50\green106\blue199;\red192\green53\blue62;\red129\green87\blue183;\red0\green124\blue32;\red176\green62\blue132;\red182\green73\blue0;\red38\green115\blue146;\red51\green102\blue153;\red128\green0\blue0;\red0\green128\blue0;\red0\green0\blue128;\red128\green128\blue0;\red128\green0\blue128;\red0\green128\blue128;\red0\green0\blue208;\red212\green212\blue212;} \trowd \trgaph30\trleft-30\trrh250\cellx995\cellx2020\cellx3044\pard \intbl \qr \f0\fs22 \cf8 1\cell \qr 2\cell \qr 3\cell \pard \intbl \row\trowd \trgaph30\trleft-30\trrh250\cellx995\cellx2020\cellx3044\pard \intbl \qr 4\cell \qr 5\cell \qr 6\cell \pard \intbl \row} ======== Format:Embed Source Content: System.IO.MemoryStream ======== Format:Object Descriptor Content: System.IO.MemoryStream ======== Format:Link Source Content: System.IO.MemoryStream ======== Format:Link Source Descriptor Content: System.IO.MemoryStream ======== Format:Link Content: System.IO.MemoryStream ======== Format:Format129 Content: System.IO.MemoryStream ========

Yet another big pile of unfamiliar formats; a quick skim is enough.

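Conclusions

The experiment is rough, and no guarantees are made about the data.

When copying from the browser, HTML is listed as the first format. From Visual Studio, a C# string is listed first. From Word, a block of in-memory data comes first, followed by rich text and HTML. From PowerPoint, a block of in-memory data also comes first, followed by various image formats. From Excel, EnhancedMetafile comes first, followed by in-memory data, HTML, and others.

So programs really can write several representations of the same content to the clipboard, each with a different level of information richness, depending on their needs.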
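For the producer side, here is a minimal Windows Forms sketch of placing the same content on the clipboard in two formats, richest first. `DataObject`, `DataFormats`, and `Clipboard.SetDataObject` are the Windows Forms types I believe apply here; `MultiFormatCopy`, `Build`, and `CopyToClipboard` are hypothetical names. One caveat: I am not sure the .NET `DataObject` preserves insertion order when it hands the formats to the system, so treat the ordering as intent rather than a guarantee.

```csharp
using System;
using System.Windows.Forms;

public static class MultiFormatCopy
{
    // Builds one data object that carries the same content in two formats.
    // The richer format (RTF) is added first, the plain-text fallback second.
    public static DataObject Build(string plainText, string rtf)
    {
        var data = new DataObject();
        data.SetData(DataFormats.Rtf, rtf);
        data.SetData(DataFormats.UnicodeText, plainText);
        return data;
    }

    // Must run on an STA thread, e.g. from a button's Click handler.
    // Passing true asks for the data to stay on the clipboard after the
    // application exits.
    public static void CopyToClipboard(string plainText, string rtf)
    {
        Clipboard.SetDataObject(Build(plainText, rtf), true);
    }
}
```

For example, calling `MultiFormatCopy.CopyToClipboard("Hello, clipboard.", @"{\rtf1\ansi Hello, {\b clipboard}.}")` and then running the experiment code from earlier should list both Rich Text Format and the text formats. A rich editor can then take the RTF, while Notepad falls back to the plain text.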

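Summary

When the user copies something, a Windows program writes data to the clipboard in several formats that all express the same information, choosing the formats according to its needs. When a program handles a paste, it can pick the most suitable of those formats and use it. That is why content copied from a web page shows up in Word with its formatting almost fully intact, while in Notepad only the text remains.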

Author: ADD-SP
Link: https://www.addesp.com/archives/1685
License: Unless otherwise stated, all posts on this blog are licensed under CC-BY-NC-SA 4.0.