Exploring How the Windows Clipboard Supports Multiple Formats

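Preface

Have you ever run into either of the following?

  • You copy a passage from a web page into Word, and the formatting, hyperlinks, and images all come along. Paste the same content into Notepad, and only the plain text remains.
  • You copy code into another editor, and the syntax highlighting, formatting, and even the dark background are copied too. Paste it into Notepad, and again only the plain text remains.

This article explores how this behavior comes about.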

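Reading the documentation

The first stop for a question like this is Microsoft's developer documentation (I could not find official Simplified or Traditional Chinese versions of these pages).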

A window can place more than one object on the clipboard, each representing the same information in a different clipboard format. Users need not be aware of the clipboard formats used for an object on the clipboard.

Clipboard Formats – Win32 apps | Microsoft Docs

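In other words, the clipboard can hold several data objects at once. Each one represents the same information in a different format, and the user does not need to be aware of any of this.

So which formats are supported? According to the official documentation, the categories are standard formats, formats registered by applications (for example QQ_Unicode_RichEdit_Format), private formats, multiple formats, synthesized formats, and cloud clipboard and clipboard history formats.

This article is mainly concerned with multiple clipboard formats. Here is how the documentation describes them.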

A window can place more than one clipboard object on the clipboard, each representing the same information in a different clipboard format. When placing information on the clipboard, the window should provide data in as many formats as possible. To find out how many formats are currently used on the clipboard, call the CountClipboardFormats function.

Clipboard formats that contain the most information should be placed on the clipboard first, followed by less descriptive formats. A window pasting information from the clipboard typically retrieves a clipboard object in the first format it recognizes. Because clipboard formats are enumerated in the order they are placed on the clipboard, the first recognized format is also the most descriptive.

For example, suppose a user copies styled text from a word-processing document. The window containing the document might first place data on the clipboard in a registered format, such as RTF. Subsequently, the window would place data on the clipboard in a less descriptive format, such as text (CF_TEXT).

When the content of the clipboard is pasted into another window, the window retrieves data in the most descriptive format it recognizes. If the window recognizes RTF, the corresponding data is pasted into the document. Otherwise, the text data is pasted into the document and the formatting information is lost.

Clipboard Formats – Win32 apps | Microsoft Docs

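In short, the clipboard can hold several data objects that represent the same information in different formats. The most descriptive format should be placed first, followed by the others in decreasing order of descriptiveness, so the first format on the clipboard is the richest one. A pasting application, in turn, should walk through the formats in order until it finds one it recognizes.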
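The paste-side rule can be sketched in a few lines of C#. This is only an illustration of the "first recognized format wins" idea, not how any particular application implements it. The class name FormatPicker and its method are hypothetical, and the list of offered formats is assumed to already be in clipboard order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: models the paste-side rule from the documentation.
public static class FormatPicker
{
    // Returns the first offered format that the pasting application
    // recognizes, or null when none of them match. "offered" is assumed
    // to be in clipboard order, that is, most descriptive format first.
    public static string PickFirstRecognized(
        IEnumerable<string> offered, ICollection<string> recognized)
    {
        foreach (string format in offered)
        {
            if (recognized.Contains(format))
            {
                return format;
            }
        }
        return null;
    }
}
```

With the offered list "Rich Text Format", "UnicodeText", "Text", an editor that only understands the two plain-text formats would get "UnicodeText", while one that also understands RTF would get "Rich Text Format".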

Experiment

Experiment code

To follow along, create a C# Windows Forms App project in Visual Studio.

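/* Get the data object on the clipboard (null when the clipboard is empty) */
var dataObj = Clipboard.GetDataObject();
/* Get the formats in which the data object is available */
var formats = dataObj != null ? dataObj.GetFormats() : new string[0];
foreach (string format in formats)
{
    string str;
    try
    {
        str = string.Format("Format:{0} \nContent: {1}\n========", format, dataObj.GetData(format).ToString());
    }
    catch (Exception)
    {
        /* GetData returned null or threw, so there is nothing printable */
        str = string.Format("Format:{0} \nContent: {1}\n========", format, "Unprintable data format");
    }
    Console.WriteLine(str);
}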

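Test environment

  • OS: Windows 10 Home x64, version 2004
  • Office: 2016 Professional x64
  • Browser: Edge 85.0.564.63 (Official build) (64-bit)
  • Visual Studio: 2019

Procedure

Copying from a web page

Copy an ordinary piece of text from any web page, then run the program and look at the result.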

Format:HTML Format 
Content: Version:0.9
StartHTML:0000000217
EndHTML:0000000923
StartFragment:0000000253
EndFragment:0000000887
SourceURL:https://docs.microsoft.com/zh-cn/windows/win32/dataxchg/clipboard-formats#multiple-clipboard-formats
<html>
<body>
<!--StartFragment--><span style="color: rgb(23, 23, 23); font-family: &quot;Segoe UI&quot;, SegoeUI, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">The following topics describe the clipboard formats.</span><!--EndFragment-->
</body>
</html>?
========
Format:System.String 
Content: The following topics describe the clipboard formats.
========
Format:UnicodeText 
Content: The following topics describe the clipboard formats.
========
Format:Text 
Content: The following topics describe the clipboard formats.
========
Format:Locale 
Content: System.IO.MemoryStream
========
Format:OEMText 
Content: The following topics describe the clipboard formats.
========

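In the order listed, the formats are HTML, the C# string object, Unicode text, ANSI text, the locale identifier associated with the text, and OEM text. A few caveats apply. System.String is a .NET-side name that GetFormats() reports because the data can be converted to a string; it is not a format the browser placed on the clipboard. Locale describes the text rather than being a less rich copy of it. And Windows can synthesize the Text, UnicodeText, and OEMText formats from one another, so not every entry was necessarily written by the browser. The documentation's notes on the last two formats, Locale (CF_LOCALE) and OEMText (CF_OEMTEXT), are quoted below.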

The data is a handle (HGLOBAL) to the locale identifier (LCID) associated with text in the clipboard. When you close the clipboard, if it contains CF_TEXT data but no CF_LOCALE data, the system automatically sets the CF_LOCALE format to the current input language. You can use the CF_LOCALE format to associate a different locale with the clipboard text.

An application that pastes text from the clipboard can retrieve this format to determine which character set was used to generate the text.

Note that the clipboard does not support plain text in multiple character sets. To achieve this, use a formatted text data type such as RTF instead.

The system uses the code page associated with CF_LOCALE to implicitly convert from CF_TEXT to CF_UNICODETEXT. Therefore, the correct code page table is used for the conversion.

Standard Clipboard Formats (Winuser.h) – Win32 apps | Microsoft Docs
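In the output above, the Locale entry only prints as System.IO.MemoryStream. Since an LCID is a 32-bit value, the stream can be decoded by reading four bytes. The following is a minimal sketch under that assumption; the class name LocaleFormat and its method are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper: decodes the payload of the "Locale" (CF_LOCALE) format.
public static class LocaleFormat
{
    // Reads a 32-bit LCID from the first four bytes of the stream.
    // The bytes are combined explicitly as little-endian, which is the
    // byte order Windows uses on x86 and x64.
    public static int ReadLcid(Stream stream)
    {
        if (stream.CanSeek)
        {
            stream.Position = 0;
        }
        var bytes = new byte[4];
        int total = 0;
        while (total < 4)
        {
            int read = stream.Read(bytes, total, 4 - total);
            if (read == 0)
            {
                throw new InvalidDataException("Expected at least 4 bytes of LCID data.");
            }
            total += read;
        }
        return bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    }
}
```

In the experiment program, the stream would come from dataObj.GetData("Locale") cast to a Stream. The resulting number can then be passed to new System.Globalization.CultureInfo(lcid) to get a readable culture name; for example, the bytes 04 08 00 00 decode to 0x0804.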

Text format containing characters in the OEM character set. Each line ends with a carriage return/linefeed (CR-LF) combination. A null character signals the end of the data.

Standard Clipboard Formats (Winuser.h) – Win32 apps | Microsoft Docs
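One caveat about the listing for the copied web text: it comes from the .NET wrapper, whose GetFormats() by default also reports formats the data can be converted to. To look at the formats as Windows itself enumerates them, one option is to call the Win32 function EnumClipboardFormats through P/Invoke. The sketch below assumes a Windows machine; the class name NativeClipboard and the small name table are my own additions, and the table covers only a handful of the standard format IDs from Winuser.h.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper: lists clipboard formats in Win32 enumeration order.
public static class NativeClipboard
{
    [DllImport("user32.dll", SetLastError = true)]
    private static extern bool OpenClipboard(IntPtr hWndNewOwner);

    [DllImport("user32.dll", SetLastError = true)]
    private static extern bool CloseClipboard();

    [DllImport("user32.dll", SetLastError = true)]
    private static extern uint EnumClipboardFormats(uint format);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetClipboardFormatName(
        uint format, StringBuilder name, int maxCount);

    // Standard formats have fixed IDs and no registered name, so a few of
    // them are mapped by hand here. Returns null for any other ID.
    public static string StandardFormatName(uint id)
    {
        switch (id)
        {
            case 1: return "CF_TEXT";
            case 2: return "CF_BITMAP";
            case 3: return "CF_METAFILEPICT";
            case 7: return "CF_OEMTEXT";
            case 8: return "CF_DIB";
            case 13: return "CF_UNICODETEXT";
            case 14: return "CF_ENHMETAFILE";
            case 16: return "CF_LOCALE";
            default: return null;
        }
    }

    // Returns the names of all formats currently on the clipboard,
    // in the order EnumClipboardFormats reports them.
    public static List<string> EnumerateFormats()
    {
        var result = new List<string>();
        if (!OpenClipboard(IntPtr.Zero))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        try
        {
            uint format = 0;
            while ((format = EnumClipboardFormats(format)) != 0)
            {
                string name = StandardFormatName(format);
                if (name == null)
                {
                    // Registered formats carry a name; fall back to the ID.
                    var buffer = new StringBuilder(256);
                    int length = GetClipboardFormatName(format, buffer, buffer.Capacity);
                    name = length > 0 ? buffer.ToString() : "#" + format;
                }
                result.Add(name);
            }
        }
        finally
        {
            CloseClipboard();
        }
        return result;
    }
}
```

Calling NativeClipboard.EnumerateFormats() right after copying something and comparing the result with the GetFormats() listing shows which entries exist only on the .NET side. I have not rerun the experiments in this article with it, so no output is shown here.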

Copying an image from a web page

Format:HTML Format 
Content: Version:0.9
StartHTML:0000000179
EndHTML:0000000977
StartFragment:0000000215
EndFragment:0000000941
SourceURL:https://www.example.com/path/to/demo.html
<html>
<body>
<!--StartFragment--><img src="https://www.example.com/path/to/demo.png" alt="" style="margin: 0px; padding: 0px; border: 0px; height: auto; max-width: 100%; color: rgb(0, 0, 0); font-family: &quot;PingFang SC&quot;, &quot;Microsoft YaHei&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-size: 13px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;"><!--EndFragment-->
</body>
</html>?
========

This time the only format is HTML.
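Both HTML Format dumps above start with a small text header (Version, StartHTML, EndHTML, StartFragment, EndFragment, SourceURL) before the markup. The offsets in that header are byte positions counted from the start of the data, and StartFragment/EndFragment delimit the part the user actually selected. Here is a minimal sketch of reading them, assuming the payload is available as UTF-8 bytes; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: reads the header of the "HTML Format" payload.
public static class HtmlClipboardFormat
{
    // Collects the "Key:Value" header lines that precede the markup.
    // Parsing stops at the first line that starts with '<'.
    public static Dictionary<string, string> ParseHeader(string data)
    {
        var header = new Dictionary<string, string>();
        foreach (string rawLine in data.Split('\n'))
        {
            string line = rawLine.TrimEnd('\r');
            if (line.StartsWith("<", StringComparison.Ordinal))
            {
                break;
            }
            int colon = line.IndexOf(':');
            if (colon > 0)
            {
                header[line.Substring(0, colon)] = line.Substring(colon + 1);
            }
        }
        return header;
    }

    // Returns the selected fragment. The offsets are applied to the raw
    // bytes, not to the decoded string, because they count bytes.
    public static string ExtractFragment(byte[] utf8Data)
    {
        var header = ParseHeader(Encoding.UTF8.GetString(utf8Data));
        int start = int.Parse(header["StartFragment"], CultureInfo.InvariantCulture);
        int end = int.Parse(header["EndFragment"], CultureInfo.InvariantCulture);
        return Encoding.UTF8.GetString(utf8Data, start, end - start);
    }
}
```

Applying the offsets to bytes rather than characters matters as soon as the page contains non-ASCII text, since one character may then occupy several bytes.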

Copying code from Visual Studio

Format:System.String 
Content: var dataObj = Clipboard.GetDataObject();
========
Format:UnicodeText 
Content: var dataObj = Clipboard.GetDataObject();
========
Format:Text 
Content: var dataObj = Clipboard.GetDataObject();
========
Format:Rich Text Format 
Content: {\rtf\ansi{\fonttbl{\f0 NSimSun;}}{\colortbl;\red0\green0\blue255;\red0\green0\blue0;}\f0 \fs19 \cf1 \cb0 \highlight0 var\cf2  dataObj = Clipboard.GetDataObject();}
========

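In the order listed, the formats are the C# string object, Unicode text, ANSI text, and rich text (RTF). Note that RTF, the richest of the four, is listed last here, so this listing does not follow the richest-first order that the documentation recommends.

Copying text from Word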

Format:Object Descriptor 
Content: System.IO.MemoryStream
========
Format:Rich Text Format 
Content: (Extremely long. Saving it once brought the site down, so it is omitted.)
========
Format:HTML Format 
Content: (Also extremely long. It took the site down a second time, so it is omitted too.)
========
Format:System.String 
Content: 这是 word 中的一段文本。
========
Format:UnicodeText 
Content: 这是 word 中的一段文本。
========
Format:Text 
Content: 这是 word 中的一段文本。
========
Format:EnhancedMetafile
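Content: This format cannot be printed.

In the order listed, the formats are a block of in-memory data (Object Descriptor), rich text, HTML, the C# string object, Unicode text, ANSI text, and EnhancedMetafile.

Copying from PowerPoint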

Format:Preferred DropEffect 
Content: System.IO.MemoryStream
========
Format:InShellDragLoop 
Content: System.IO.MemoryStream
========
Format:Object Descriptor 
Content: System.IO.MemoryStream
========
Format:PowerPoint 12.0 Internal Theme 
Content: System.IO.MemoryStream
========
Format:PowerPoint 12.0 Internal Color Scheme 
Content: System.IO.MemoryStream
========
Format:PowerPoint 12.0 Internal Shapes 
Content: System.IO.MemoryStream
========
Format:Art::GVML ClipFormat 
Content: System.IO.MemoryStream
========
Format:PNG 
Content: System.IO.MemoryStream
========
Format:JFIF 
Content: System.IO.MemoryStream
========
Format:GIF 
Content: System.IO.MemoryStream
========
Format:System.Drawing.Bitmap 
Content: System.Drawing.Bitmap
========
Format:Bitmap 
Content: System.Drawing.Bitmap
========
Format:EnhancedMetafile 
Content: Unprintable data format
========
Format:MetaFilePict 
Content: Unprintable data format
========

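There are some unfamiliar entries this time, but the list can still be summarized. In the order listed, it contains several blocks of in-memory data that I cannot identify, followed by the image as PNG, JFIF, GIF, and Bitmap, and finally the EnhancedMetafile and MetaFilePict formats.

Copying from Excel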

Format:EnhancedMetafile 
Content: Unprintable data format
========
Format:MetaFilePict 
Content: System.IO.MemoryStream
========
Format:System.Drawing.Bitmap 
Content: System.Drawing.Bitmap
========
Format:Bitmap 
Content: System.Drawing.Bitmap
========
Format:Biff12 
Content: System.IO.MemoryStream
========
Format:Biff8 
Content: System.IO.MemoryStream
========
Format:Biff5 
Content: System.IO.MemoryStream
========
Format:SymbolicLink 
Content: System.IO.MemoryStream
========
Format:DataInterchangeFormat 
Content: System.IO.MemoryStream
========
Format:XML Spreadsheet 
Content: System.IO.MemoryStream
========
Format:HTML Format 
Content: Version:1.0
StartHTML:0000000191
EndHTML:0000002529
StartFragment:0000002042
EndFragment:0000002477
SourceURL:file://新建%20Microsoft%20Excel%20工作表.xlsx

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/call_/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/call_/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
<style>
<!--table
	{mso-displayed-decimal-separator:"\.";
	mso-displayed-thousand-separator:"\,";}
@page
	{margin:.75in .7in .75in .7in;
	mso-header-margin:.3in;
	mso-footer-margin:.3in;}
tr
	{mso-height-source:auto;
	mso-ruby-visibility:none;}
col
	{mso-width-source:auto;
	mso-ruby-visibility:none;}
br
	{mso-data-placement:same-cell;}
td
	{padding-top:1px;
	padding-right:1px;
	padding-left:1px;
	mso-ignore:padding;
	color:black;
	font-size:11.0pt;
	font-weight:400;
	font-style:normal;
	text-decoration:none;
	font-family:等线;
	mso-generic-font-family:auto;
	mso-font-charset:134;
	mso-number-format:General;
	text-align:general;
	vertical-align:bottom;
	border:none;
	mso-background-source:auto;
	mso-pattern:auto;
	mso-protection:locked visible;
	white-space:nowrap;
	mso-rotate:0;}
ruby
	{ruby-align:left;}
rt
	{color:windowtext;
	font-size:9.0pt;
	font-weight:400;
	font-style:normal;
	text-decoration:none;
	font-family:等线;
	mso-generic-font-family:auto;
	mso-font-charset:134;
	mso-char-type:none;
	display:none;}
-->
</style>
</head>

<body link="#0563C1" vlink="#954F72">

<table border=0 cellpadding=0 cellspacing=0 width=192 style='border-collapse:
 collapse;width:144pt'>
<!--StartFragment-->
 <col width=64 span=3 style='width:48pt'>
 <tr height=18 style='height:13.8pt'>
  <td height=18 align=right width=64 style='height:13.8pt;width:48pt'>1</td>
  <td align=right width=64 style='width:48pt'>2</td>
  <td align=right width=64 style='width:48pt'>3</td>
 </tr>
 <tr height=18 style='height:13.8pt'>
  <td height=18 align=right style='height:13.8pt'>4</td>
  <td align=right>5</td>
  <td align=right>6</td>
 </tr>
<!--EndFragment-->
</table>

</body>

</html>

========
Format:System.String 
Content: 1	2	3
4	5	6

========
Format:UnicodeText 
Content: 1	2	3
4	5	6

========
Format:Text 
Content: 1	2	3
4	5	6

========
Format:Csv 
Content: System.IO.MemoryStream
========
Format:Hyperlink 
Content: System.IO.MemoryStream
========
Format:Rich Text Format 
Content: {\rtf1\ansi \ansicpg936
{\fonttbl{\f0\fnil \fcharset134 等线;}{\f1\fnil \fcharset134 等线;}{\f2\fnil \fcharset134 等线;}{\f3\fnil \fcharset134 等线;}{\f4\fnil \fcharset134 等线;}{\f5\fnil \fcharset134 等线;}{\f6\fnil \fcharset134 等线 Light;}{\f7\fnil \fcharset134 等线;}{\f8\fnil \fcharset134 等线;}{\f9\fnil \fcharset134 等线;}{\f10\fnil \fcharset134 等线;}{\f11\fnil \fcharset134 等线;}{\f12\fnil \fcharset134 等线;}{\f13\fnil \fcharset134 等线;}{\f14\fnil \fcharset134 等线;}{\f15\fnil \fcharset134 等线;}{\f16\fnil \fcharset134 等线;}{\f17\fnil \fcharset134 等线;}{\f18\fnil \fcharset134 等线;}{\f19\fnil \fcharset134 等线;}{\f20\fnil \fcharset134 等线;}{\f21\fnil \fcharset134 等线;}{\f22\fnil \fcharset134 等线;}{\f23\fnil \fcharset134 等线;}{\f24\fnil \fcharset134 等线;}{\f25\fnil \fcharset134 等线;}}
{\info{\id220}}\plain {\colortbl\red0\green0\blue0;\red255\green255\blue255;\red255\green0\blue0;\red0\green255\blue0;\red0\green0\blue255;\red255\green255\blue0;\red255\green0\blue255;\red0\green255\blue255;\red0\green0\blue0;\red255\green255\blue255;\red255\green0\blue0;\red0\green255\blue0;\red0\green0\blue255;\red255\green255\blue0;\red255\green0\blue255;\red0\green255\blue255;\red128\green0\blue0;\red0\green128\blue0;\red0\green0\blue128;\red128\green128\blue0;\red128\green0\blue128;\red0\green128\blue128;\red192\green192\blue192;\red128\green128\blue128;\red153\green153\blue255;\red153\green51\blue102;\red255\green255\blue204;\red204\green255\blue255;\red102\green0\blue102;\red255\green128\blue128;\red0\green102\blue204;\red204\green204\blue255;\red0\green0\blue128;\red255\green0\blue255;\red255\green255\blue0;\red0\green255\blue255;\red128\green0\blue128;\red128\green0\blue0;\red0\green128\blue128;\red0\green0\blue255;\red0\green204\blue255;\red204\green255\blue255;\red204\green255\blue204;\red255\green255\blue153;\red153\green204\blue255;\red255\green153\blue204;\red204\green153\blue255;\red255\green204\blue153;\red51\green102\blue255;\red51\green204\blue204;\red153\green204\blue0;\red255\green204\blue0;\red255\green153\blue0;\red255\green102\blue0;\red102\green102\blue153;\red150\green150\blue150;\red0\green51\blue102;\red51\green153\blue102;\red0\green51\blue0;\red51\green51\blue0;\red153\green51\blue0;\red153\green51\blue102;\red51\green51\blue153;\red51\green51\blue51;;\red255\green255\blue255;\red100\green100\blue100;\red240\green240\blue240;\red0\green0\blue0;\red255\green255\blue255;\red160\green160\blue160;\red0\green120\blue215;\red0\green0\blue0;\red200\green200\blue200;\red55\green55\blue55;\red255\green255\blue255;\red100\green100\blue100;\red0\green0\blue0;\red255\green255\blue255;\red0\green0\blue0;\red255\green255\blue225;\red0\green0\blue0;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;\red192\green192\blue192;\red150\green150\blue150;\red128\green128\blue128;\red102\green102\blue102;\red51\green51\blue51;\red50\green106\blue199;\red192\green53\blue62;\red129\green87\blue183;\red0\green124\blue32;\red176\green62\blue132;\red182\green73\blue0;\red38\green115\blue146;\red51\green102\blue153;\red128\green0\blue0;\red0\green128\blue0;\red0\green0\blue128;\red128\green128\blue0;\red128\green0\blue128;\red0\green128\blue128;\red0\green0\blue208;\red212\green212\blue212;}
\trowd \trgaph30\trleft-30\trrh250\cellx995\cellx2020\cellx3044\pard \intbl \qr \f0\fs22 \cf8 1\cell \qr 2\cell \qr 3\cell 
\pard \intbl \row\trowd \trgaph30\trleft-30\trrh250\cellx995\cellx2020\cellx3044\pard \intbl \qr 4\cell \qr 5\cell \qr 6\cell \pard \intbl \row}

========
Format:Embed Source 
Content: System.IO.MemoryStream
========
Format:Object Descriptor 
Content: System.IO.MemoryStream
========
Format:Link Source 
Content: System.IO.MemoryStream
========
Format:Link Source Descriptor 
Content: System.IO.MemoryStream
========
Format:Link 
Content: System.IO.MemoryStream
========
Format:Format129 
Content: System.IO.MemoryStream
========

Again there are many formats I do not recognize, so just skim through them.

Findings

This is a rough experiment, so treat the data as indicative only.

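When copying from a browser, HTML is listed first. When copying from Visual Studio, the C# string is listed first. When copying from Word, a block of in-memory data (Object Descriptor) comes first, followed by rich text and HTML. When copying from PowerPoint, blocks of in-memory data come first, followed by several image formats. When copying from Excel, EnhancedMetafile comes first, followed by MetaFilePict, bitmaps, several blocks of in-memory data, and HTML, among others.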

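So applications really can write several representations of the same data, with different levels of richness, to the clipboard depending on their needs.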
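For completeness, here is what the copy side might look like in a Windows Forms program. This is a sketch rather than a description of what Word or a browser does internally. The class name MultiFormatCopy and the sample strings are made up for illustration, and I have not verified that the wrapper preserves the order of the SetData calls on the real clipboard.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: offers one sentence in two clipboard formats.
public static class MultiFormatCopy
{
    // Builds a data object that carries the same sentence twice:
    // once as rich text with bold markup, once as plain Unicode text.
    public static DataObject BuildDataObject()
    {
        var data = new DataObject();
        // Richer format first, as the documentation recommends.
        data.SetData(DataFormats.Rtf, @"{\rtf1\ansi Hello, \b clipboard\b0 !}");
        // Plain-text fallback for applications that do not understand RTF.
        data.SetData(DataFormats.UnicodeText, "Hello, clipboard!");
        return data;
    }
}
```

Calling Clipboard.SetDataObject(MultiFormatCopy.BuildDataObject(), true) from the UI thread (the clipboard requires an STA thread) places the data on the clipboard. Pasting into Word should then keep the bold text, while Notepad should receive only the plain sentence.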

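Summary

When the user copies something, a Windows program can write the same information to the clipboard in several different formats, depending on its needs. When it receives a paste request, a program can pick the most suitable of the formats on the clipboard and use that one. This is why content copied from a web page arrives in Word with its formatting almost intact, while in Notepad only the text is left.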

Author: ADD-SP
Link: https://www.addesp.com/archives/1685
License: Unless otherwise stated, all posts on this blog are licensed under CC-BY-NC-SA 4.0.