I need to parse HTML. More specifically, I need to parse every cell of every row in every table. Each row represents a single object, and each cell represents one of its properties. I want to extract these values and write them to an XML file that contains only the data (without the surrounding HTML markup). I can already parse each column from the HTML file, but I don't know what my options are for writing the result to an XML file. I am baffled.
HTML:
<tr><tr>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF">
1
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="left">
<a href="/ice/player.htm?id=8471675">Sidney Crosby</a>
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center">
PIT
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center">
C
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
39
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
32
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
33
</td>
<td class="statBox sorted" style="border-width:0px 1px 1px 0px; background-color: #E0E0E0" align="right">
<font color="#000000">
65
</font>
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
20
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
29
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
10
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
1
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
3
</td>
<td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right">
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
0
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
154
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
20.8
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
21:54
</td>
<td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right">
22.6
</td>
<td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right">
55.7
</td>
</tr></tr>
C#:
using System;
using HtmlAgilityPack;

namespace Stats
{
    class StatsParser
    {
        private string htmlCode;
        private static string fileName = "[" + DateTime.Now.ToShortDateString() + " NHL Stats].xml";

        public StatsParser(string htmlCode)
        {
            this.htmlCode = htmlCode;
            this.ParseHtml();
        }

        public void ParseHtml()
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(htmlCode);
            try
            {
                // Get all tables in the document
                HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
                // Iterate over all rows in the first table
                HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
                for (int i = 0; i < rows.Count; ++i)
                {
                    // Iterate over all columns in this row
                    HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='statBox']");
                    for (int j = 0; j < cols.Count; ++j)
                    {
                        // Get the value of the column and display it
                        string value = cols[j].InnerText;
                        if (value != "")
                            System.Windows.MessageBox.Show(value);
                    }
                }
            }
            catch (NullReferenceException)
            {
                // SelectNodes returns null when nothing matches
                System.Windows.Forms.MessageBox.Show("Exception!!");
            }
        }
    }
}
XML:
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-01">
  <Player Rank="1">
    <Name>Sidney Crosby</Name>
    <Team>PIT</Team>
    <Position>C</Position>
    <GamesPlayed>39</GamesPlayed>
    <Goals>32</Goals>
    <Assists>33</Assists>
  </Player>
</Stats>
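One possible way to get the shape above is LINQ to XML, which builds the tree in memory and saves it in one call. The following is only a sketch under assumptions: the class name `StatsXmlSketch`, the method `BuildStats`, and the idea that the trimmed cell texts arrive as a `string[]` in column order are all hypothetical and not part of the original code.

```csharp
using System;
using System.Xml.Linq;

public static class StatsXmlSketch
{
    // Builds the <Stats> document for one parsed row.
    // Assumption: 'cells' holds the trimmed cell texts in this order:
    // rank, name, team, position, games played, goals, assists.
    public static XDocument BuildStats(string date, string[] cells)
    {
        var player = new XElement("Player",
            new XAttribute("Rank", cells[0]),
            new XElement("Name", cells[1]),
            new XElement("Team", cells[2]),
            new XElement("Position", cells[3]),
            new XElement("GamesPlayed", cells[4]),
            new XElement("Goals", cells[5]),
            new XElement("Assists", cells[6]));

        return new XDocument(
            new XDeclaration("1.0", "utf-8", null),
            new XElement("Stats", new XAttribute("Date", date), player));
    }

    public static void Main()
    {
        string[] row = { "1", "Sidney Crosby", "PIT", "C", "39", "32", "33" };
        XDocument doc = BuildStats("2011-01-01", row);

        // ToString() prints the element tree without the XML declaration.
        Console.WriteLine(doc);

        // Save() would write the file including the declaration:
        // doc.Save("stats.xml");
    }
}
```

For larger tables, a streaming writer such as `XmlWriter` avoids holding the whole tree in memory; for a single stats page either approach should be adequate.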
After looking around MSDN, I found a solution to my problem using XmlWriter:
using System;
using System.Xml;
using HtmlAgilityPack;

namespace HockeyStats
{
    class StatsParser
    {
        private string htmlCode;
        // Note: ToShortDateString() is culture-dependent and may contain '/',
        // which is a path separator and not valid inside a file name.
        private static string fileName = "[" + DateTime.Now.ToShortDateString() + " NHL Stats].xml";

        public StatsParser(string htmlCode)
        {
            this.htmlCode = htmlCode;
            this.ParseHtml();
        }

        public void ParseHtml()
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(htmlCode);
            XmlWriter writer = null;
            try
            {
                // Create an XmlWriterSettings object with the correct options.
                XmlWriterSettings settings = new XmlWriterSettings();
                settings.Indent = true;
                settings.IndentChars = " ";
                settings.OmitXmlDeclaration = false;

                // Create the XmlWriter object and write the root element.
                writer = XmlWriter.Create(@"..\..\" + fileName, settings);
                writer.WriteStartElement("Stats");
                writer.WriteAttributeString("Date", DateTime.Now.ToShortDateString());

                // Iterate over all rows nested within another row
                HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr/tr");
                for (int i = 0; i < rows.Count; ++i)
                {
                    // Iterate over all columns in this row.
                    // @class='statBox' is an exact string comparison, so the
                    // class="statBox sorted" cell is not selected.
                    HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='statBox']");
                    if (cols == null) continue; // no matching cells in this row

                    for (int j = 0; j < cols.Count; ++j)
                    {
                        switch (j)
                        {
                            case 0:
                                {
                                    writer.WriteStartElement("Player");
                                    writer.WriteAttributeString("Rank", cols[j].InnerText.Trim()); break;
                                }
                            case 1: writer.WriteElementString("Name", cols[j].InnerText.Trim()); break;
                            case 2: writer.WriteElementString("Team", cols[j].InnerText.Trim()); break;
                            case 3: writer.WriteElementString("Pos", cols[j].InnerText.Trim()); break;
                            case 4: writer.WriteElementString("GP", cols[j].InnerText.Trim()); break;
                            case 5: writer.WriteElementString("G", cols[j].InnerText.Trim()); break;
                            case 6: writer.WriteElementString("A", cols[j].InnerText.Trim()); break;
                            case 7: writer.WriteElementString("PlusMinus", cols[j].InnerText.Trim()); break;
                            case 8: writer.WriteElementString("PIM", cols[j].InnerText.Trim()); break;
                            case 9: writer.WriteElementString("PP", cols[j].InnerText.Trim()); break;
                            case 10: writer.WriteElementString("SH", cols[j].InnerText.Trim()); break;
                            case 11: writer.WriteElementString("GW", cols[j].InnerText.Trim()); break;
                            case 12: writer.WriteElementString("OT", cols[j].InnerText.Trim()); break;
                            case 13: writer.WriteElementString("Shots", cols[j].InnerText.Trim()); break;
                            case 14: writer.WriteElementString("ShotPctg", cols[j].InnerText.Trim()); break;
                            case 15: writer.WriteElementString("TOIPerGame", cols[j].InnerText.Trim()); break;
                            case 16: writer.WriteElementString("ShiftsPerGame", cols[j].InnerText.Trim()); break;
                            case 17: writer.WriteElementString("FOWinPctg", cols[j].InnerText.Trim()); break;
                        }
                    }
                    writer.WriteEndElement(); // </Player>
                }
                writer.WriteEndElement(); // </Stats>
                writer.Flush();
            }
            finally
            {
                if (writer != null)
                    writer.Close();
            }
        }
    }
}
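The long switch above maps a column index to an element name, so the same idea can be expressed with a lookup array. The sketch below is an illustration, not the code from the answer: `PlayerXmlWriterSketch`, `Write`, and the `IEnumerable<string[]>` input (standing in for the cell texts already extracted from the HTML) are hypothetical. It writes to a string instead of a file and formats the date with the invariant culture so the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Xml;

public static class PlayerXmlWriterSketch
{
    // Element names for columns 1..17; column 0 becomes the Rank attribute.
    static readonly string[] ElementNames =
    {
        "Name", "Team", "Pos", "GP", "G", "A", "PlusMinus", "PIM", "PP",
        "SH", "GW", "OT", "Shots", "ShotPctg", "TOIPerGame",
        "ShiftsPerGame", "FOWinPctg"
    };

    // Assumption: each string[] holds the cell texts of one table row.
    public static string Write(IEnumerable<string[]> rows, DateTime date)
    {
        // The declaration is omitted because a StringBuilder target would
        // otherwise declare utf-16 rather than utf-8.
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var sb = new StringBuilder();

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Stats");
            writer.WriteAttributeString("Date",
                date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));

            foreach (string[] cells in rows)
            {
                if (cells.Length == 0) continue; // skip rows without cells

                writer.WriteStartElement("Player");
                writer.WriteAttributeString("Rank", cells[0].Trim());

                // Stop at whichever runs out first: cells or known names.
                for (int j = 1; j < cells.Length && j <= ElementNames.Length; ++j)
                    writer.WriteElementString(ElementNames[j - 1], cells[j].Trim());

                writer.WriteEndElement(); // </Player>
            }
            writer.WriteEndElement(); // </Stats>
        } // disposing the writer flushes it into the StringBuilder

        return sb.ToString();
    }

    public static void Main()
    {
        var rows = new[] { new[] { "1", "Sidney Crosby", "PIT", "C", "39", "32", "33" } };
        Console.WriteLine(Write(rows, new DateTime(2011, 1, 1)));
    }
}
```

Bounding the inner loop by both the cell count and the name count means a short or unusually long row cannot cause an out-of-range access, and the `using` block replaces the explicit try/finally with `Close()`.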
This produces the following XML file as output:
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-01">
  <Player Rank="1">
    <Name>Sidney Crosby</Name>
    <Team>PIT</Team>
    <Pos>C</Pos>
    <GP>39</GP>
    <G>32</G>
    <A>33</A>
    <PlusMinus>20</PlusMinus>
    <PIM>29</PIM>
    <PP>10</PP>
    <SH>1</SH>
    <GW>3</GW>
    <Shots>0</Shots>
    <ShotPctg>154</ShotPctg>
    <TOIPerGame>20.8</TOIPerGame>
    <ShiftsPerGame>21:54</ShiftsPerGame>
    <FOWinPctg>22.6</FOWinPctg>
  </Player>
</Stats>
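Note that the value 65 from the class="statBox sorted" cell in the HTML does not appear in this output. In XPath, @class='statBox' compares the whole attribute string, so a cell with two class names is not matched. The sketch below demonstrates the difference with System.Xml on a small well-formed fragment; `ClassMatchSketch` and `Count` are hypothetical names, and it is an assumption that HtmlAgilityPack's SelectNodes evaluates these expressions the same way.

```csharp
using System;
using System.Xml;

public static class ClassMatchSketch
{
    // Counts the nodes that an XPath expression selects in an XML string.
    public static int Count(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        string row =
            "<tr>" +
            "<td class=\"statBox\">33</td>" +
            "<td class=\"statBox sorted\">65</td>" +
            "<td class=\"statBox\">20</td>" +
            "</tr>";

        // Exact comparison: the "statBox sorted" cell is skipped.
        Console.WriteLine(Count(row, "//td[@class='statBox']")); // prints 2

        // Token match: pad the class list with spaces and look for " statBox ".
        Console.WriteLine(Count(row,
            "//td[contains(concat(' ', normalize-space(@class), ' '), ' statBox ')]")); // prints 3
    }
}
```

If the token-matching expression were used in the parser, the extra cell would shift every later column index by one, so the element names in the switch would need a matching entry for that column.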