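Introduction

The RAMS dataset (RAMS: Roles Across Multiple Sentences) was released by Johns Hopkins University in 2020 and is a news-based event extraction dataset. It annotates 9,124 events, covering 139 event types and 65 argument role types. The event types span several domains, such as: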

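  • Life events (life)
  • Conflict events (conflict)
  • Disaster events (disaster)
  • Justice events (justice)
  • Contact events (contact)
  • Government events (government)

The argument role types include, for example:

  • Place (place)
  • Participant (participant)
  • Destination (destination)
  • Origin (origin)
  • Victim (victim)
  • Defendant (defendant)

The dataset is well suited to event extraction and related natural language processing tasks, especially identifying and classifying event structures and event roles.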

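I. Features

  1. Diverse event types: the dataset covers several domains, which broadens event extraction and makes it more challenging.
  2. Detailed role annotation: every event is annotated with its roles, giving rich context for tasks such as building event graphs and causal reasoning.
  3. Structured annotation: beyond the raw text, every event and its participants carry detailed semantic annotation, which makes the dataset suitable for higher-level text analysis.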

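II. Download

  • The latest and earlier versions of the dataset can be downloaded from the official download site.
  • The dataset can also be downloaded from the copy provided on my homepage.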

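III. Dataset

3.1 Data

The data is split into three files: train, dev, and test.

Each line of a data file holds one JSON string.

Each JSON object contains:

  • ent_spans: start and end (inclusive) indices, plus the event/argument/role string.
  • evt_triggers: start and end (inclusive) indices, plus the event type string.
  • sentences: the document text
  • gold_evt_links: (event, argument, role) triples that follow the format above
  • source_url: where the text came from
  • split: which data split the document belongs to
  • doc_key: which individual file the record corresponds to (nw\_ is prepended to all file names)

All other fields are extraneous and exist to allow for future iterations of RAMS.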
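The layout described above can be read with a few lines of Python. This is a minimal sketch rather than an official loader: the helper names `iter_records` and `missing_fields` are made up for illustration, and the miniature record is hypothetical, not taken from RAMS.

```python
import io
import json

# Field names taken from the list above.
DOCUMENTED_FIELDS = (
    "ent_spans", "evt_triggers", "sentences", "gold_evt_links",
    "source_url", "split", "doc_key",
)


def iter_records(lines):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def missing_fields(record):
    """Return the documented fields that a record does not contain."""
    return [name for name in DOCUMENTED_FIELDS if name not in record]


# Hypothetical miniature record, used only to exercise the helpers.
toy_line = json.dumps({
    "doc_key": "nw_toy",
    "split": "train",
    "source_url": "",
    "sentences": [["A", "toy", "sentence", "."]],
    "ent_spans": [],
    "evt_triggers": [],
    "gold_evt_links": [],
})

# A file object opened on train.jsonlines could be passed in the same way.
records = list(iter_records(io.StringIO(toy_line + "\n\n")))
print(len(records), missing_fields(records[0]))
```

Skipping blank lines is a defensive choice for files that end with an extra newline; the source does not say whether the RAMS files do.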

One record (line 1 of train.jsonlines) is shown below:

{"rel_triggers": [],"gold_rel_links": [],"doc_key": "nw_RC000462ebb18ca0b29222d5e557fa31072af8337e3a0910dca8b5b62f","ent_spans": [[42,43,[["evt090arg02victim",1.0]]],[85,88,[["evt090arg01killer",1.0]]],[26,26,[["evt090arg04place",1.0]]]],"language_id": "eng","source_url": "https://www.washingtonpost.com/news/powerpost/paloma/daily-202/2016/06/17/daily-202-more-republicans-ditch-trump-conclude-he-cannot-win/5763a1e0981b92a22d0f8a36/","evt_triggers": [[69,69,[["life.die.deathcausedbyviolentevents",1.0]]]],"split": "train","sentences": [["Transportation","officials","are","urging","carpool","and","teleworking","as","options","to","combat","an","expected","flood","of","drivers","on","the","road","."],["(","Paul","Duggan",")"],["--","A","Baltimore","prosecutor","accused","a","police","detective","of","\u201c","sabotaging","\u201d","investigations","related","to","the","death","of","Freddie","Gray",",","accusing","him","of","fabricating","notes","to","suggest","that","the","state","\u2019s","medical","examiner","believed","the","manner","of","death","was","an","accident","rather","than","a","homicide","."],["The","heated","exchange","came","in","the","chaotic","sixth","day","of","the","trial","of","Baltimore","Officer","Caesar","Goodson","Jr.",",","who","drove","the","police","van","in","which","Gray","suffered","a","fatal","spine","injury","in","2015","."],["(","Derek","Hawkins","and","Lynh","Bui",")"]],"gold_evt_links": [[[69,69],[85,88],"evt090arg01killer"],[[69,69],[42,43],"evt090arg02victim"],[[69,69],[26,26],"evt090arg04place"]]
}

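1. sentences

  • The document is split into five tokenized sentences. Token indices in the other fields count across the whole document, starting at 0 with "Transportation":
    • Sentence 1: "Transportation officials are urging carpool and teleworking as options to combat an expected flood of drivers on the road."
    • Sentence 2: "(Paul Duggan)"
    • Sentence 3: "-- A Baltimore prosecutor accused a police detective of “sabotaging” investigations related to the death of Freddie Gray, accusing him of fabricating notes to suggest that the state’s medical examiner believed the manner of death was an accident rather than a homicide."
    • Sentence 4: "The heated exchange came in the chaotic sixth day of the trial of Baltimore Officer Caesar Goodson Jr., who drove the police van in which Gray suffered a fatal spine injury in 2015."
    • Sentence 5: "(Derek Hawkins and Lynh Bui)"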

2. evt_triggers (event triggers)

  • [69, 69] is the word "homicide" in sentence 3, annotated with the event type "life.die.deathcausedbyviolentevents" (a death caused by a violent event).

3. ent_spans (entity spans: start and end indices, plus the event role)

  • [42, 43] is "Freddie Gray" in sentence 3, with the role "victim".
  • [85, 88] is "Officer Caesar Goodson Jr." in sentence 4, with the role "killer".
  • [26, 26] is "Baltimore" in sentence 3, with the role "place".
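The spans above can be checked by flattening the sentences into one token list and slicing it with inclusive end indices. The sketch below copies the tokens of the first four sentences from the sample record; the helper names `flatten`, `span_text`, and `locate` are illustrative and not part of any RAMS tooling.

```python
# Tokens of the first four sentences of the sample record, copied as-is.
sentences = [
    ["Transportation", "officials", "are", "urging", "carpool", "and",
     "teleworking", "as", "options", "to", "combat", "an", "expected",
     "flood", "of", "drivers", "on", "the", "road", "."],
    ["(", "Paul", "Duggan", ")"],
    ["--", "A", "Baltimore", "prosecutor", "accused", "a", "police",
     "detective", "of", "\u201c", "sabotaging", "\u201d", "investigations",
     "related", "to", "the", "death", "of", "Freddie", "Gray", ",",
     "accusing", "him", "of", "fabricating", "notes", "to", "suggest",
     "that", "the", "state", "\u2019s", "medical", "examiner", "believed",
     "the", "manner", "of", "death", "was", "an", "accident", "rather",
     "than", "a", "homicide", "."],
    ["The", "heated", "exchange", "came", "in", "the", "chaotic", "sixth",
     "day", "of", "the", "trial", "of", "Baltimore", "Officer", "Caesar",
     "Goodson", "Jr.", ",", "who", "drove", "the", "police", "van", "in",
     "which", "Gray", "suffered", "a", "fatal", "spine", "injury", "in",
     "2015", "."],
]


def flatten(sents):
    """Concatenate the sentences into one document-level token list."""
    return [token for sent in sents for token in sent]


def span_text(tokens, start, end):
    """Join tokens[start..end]; the end index is inclusive."""
    return " ".join(tokens[start:end + 1])


def locate(sents, index):
    """Map a document-level token index to (sentence index, local index)."""
    offset = 0
    for sent_id, sent in enumerate(sents):
        if index < offset + len(sent):
            return sent_id, index - offset
        offset += len(sent)
    raise IndexError(index)


tokens = flatten(sentences)
print(span_text(tokens, 69, 69))   # trigger
print(span_text(tokens, 42, 43))   # victim
print(span_text(tokens, 85, 88))   # killer
print(span_text(tokens, 26, 26))   # place
print(locate(sentences, 69))       # 0-based sentence index and local index
```

`locate` returns 0-based sentence indices, so the trigger in "sentence 3" comes back with sentence index 2.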

4. gold_evt_links (event-argument-role triples)

  • First triple: trigger "homicide", argument "Officer Caesar Goodson Jr.", role "killer"
  • Second triple: trigger "homicide", argument "Freddie Gray", role "victim"
  • Third triple: trigger "homicide", argument "Baltimore", role "place"
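The role strings in these triples, such as "evt090arg01killer", appear to pack an event id, an argument number, and a role name. That reading is an assumption inferred from the sample record, not a documented format, and `parse_role` and `links_to_records` are hypothetical helpers. Under that assumption, the triples can be unpacked like this:

```python
import re

# Assumed pattern: "evt" + event id + "arg" + argument number + role name.
ROLE_PATTERN = re.compile(r"^evt(\d+)arg(\d+)(.+)$")


def parse_role(label):
    """Split a role label into (event_id, arg_number, role_name).

    Labels that do not match the assumed pattern are returned unchanged
    as the role name, with None for the other two parts.
    """
    match = ROLE_PATTERN.match(label)
    if match is None:
        return None, None, label
    event_id, arg_number, role_name = match.groups()
    return event_id, int(arg_number), role_name


def links_to_records(links):
    """Turn [trigger_span, argument_span, label] triples into dicts."""
    records = []
    for trigger_span, argument_span, label in links:
        _, arg_number, role_name = parse_role(label)
        records.append({
            "trigger": tuple(trigger_span),
            "argument": tuple(argument_span),
            "arg_number": arg_number,
            "role": role_name,
        })
    return records


# gold_evt_links copied from the sample record.
gold_evt_links = [
    [[69, 69], [85, 88], "evt090arg01killer"],
    [[69, 69], [42, 43], "evt090arg02victim"],
    [[69, 69], [26, 26], "evt090arg04place"],
]

for record in links_to_records(gold_evt_links):
    print(record)
```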

5. source_url

  • The document comes from: https://www.washingtonpost.com/news/powerpost/paloma/daily-202/2016/06/17/daily-202-more-republicans-ditch-trump-conclude-he-cannot-win/5763a1e0981b92a22d0f8a36/

6. split

  • The sample belongs to the training set (train).

7. doc_key

  • The document ID is "nw_RC000462ebb18ca0b29222d5e557fa31072af8337e3a0910dca8b5b62f"; it uniquely identifies the document.

IV. Data Processing

    import json


    def load_data(file_path):
        # Read a JSON Lines file: one JSON document per line.
        data = []
        with open(file_path, 'r', encoding='utf-8') as f:
            for line in f:
                data.append(json.loads(line))
        return data


    def save_to_json(data, file_path):
        with open(file_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=4)


    def extract_event_data(entry):
        # Flatten the sentences into a single token list so that the
        # document-level span indices can be applied to it directly.
        text = [item for sublist in entry["sentences"] for item in sublist]
        # Entities: (start, end, role label)
        ent_spans = [(span[0], span[1], span[2][0][0]) for span in entry["ent_spans"]]
        # Event triggers: (start, end, event type)
        evt_triggers = [(trigger[0], trigger[1], trigger[2][0][0]) for trigger in entry["evt_triggers"]]
        # Event-argument links
        evt_links = entry["gold_evt_links"]
        return text, ent_spans, evt_triggers, evt_links


    def prepare_training_data(entries):
        dataset = []
        for entry in entries:
            text, ent_spans, evt_triggers, evt_links = extract_event_data(entry)
            # Build one training sample per document
            dataset.append({
                'text': text,
                'entities': ent_spans,
                'triggers': evt_triggers,
                'links': evt_links
            })
        return dataset


    if __name__ == '__main__':
        train_data = load_data("./train.jsonlines")
        training_dataset = prepare_training_data(train_data)
        save_to_json(training_dataset, 'train.json')
        print(training_dataset[0])

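4.1 Loading and parsing the data

First, load the data file, which stores one JSON document per line, and parse each record.

    import json


    def load_data(file_path):
        data = []
        with open(file_path, 'r', encoding='utf-8') as f:
            for line in f:
                data.append(json.loads(line))
        return data


    # Load the raw JSON Lines file, not the indented train.json written by
    # save_to_json, which cannot be parsed line by line.
    train_data = load_data('./train.jsonlines')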

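4.2 Preprocessing

Convert the sentences, event triggers, roles, and entities of each document into a form an event extraction model can use. Here we extract the text, the event triggers, and the role information.

    def extract_event_data(entry):
        sentences = [" ".join(s) for s in entry["sentences"]]
        # Joined text for display only: the span indices below are token
        # offsets, not character offsets into this string.
        text = " ".join(sentences)
        # Entities: (start, end, role label)
        ent_spans = [(span[0], span[1], span[2][0][0]) for span in entry["ent_spans"]]
        # Event triggers: (start, end, event type)
        evt_triggers = [(trigger[0], trigger[1], trigger[2][0][0]) for trigger in entry["evt_triggers"]]
        # Event-argument links
        evt_links = entry["gold_evt_links"]
        return text, ent_spans, evt_triggers, evt_links


    # Example extraction
    for entry in train_data:
        text, ent_spans, evt_triggers, evt_links = extract_event_data(entry)
        print(f"Text: {text}")
        print(f"Entities: {ent_spans}")
        print(f"Event triggers: {evt_triggers}")
        print(f"Event-argument links: {evt_links}")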

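4.3 Building the model input

For event extraction, a common input is the text together with its event triggers and roles. We can build a dataset that frames the task as sequence labeling, or as classification over event triggers and arguments.

    def prepare_training_data(entries):
        dataset = []
        for entry in entries:
            text, ent_spans, evt_triggers, evt_links = extract_event_data(entry)
            # Build one training sample per document
            dataset.append({
                'text': text,
                'entities': ent_spans,
                'triggers': evt_triggers,
                'links': evt_links
            })
        return dataset


    training_dataset = prepare_training_data(train_data)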
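For the sequence labeling framing mentioned above, one common option is BIO tagging over the token list. The sketch below is one possible encoding, not a method taken from the RAMS release: `bio_tags` is a hypothetical helper, the toy sentence is made up for illustration, and only the label strings are borrowed from the sample record.

```python
def bio_tags(num_tokens, spans):
    """Encode (start, end, label) spans with inclusive ends as BIO tags.

    Later spans overwrite earlier ones where they overlap.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags


# Toy sentence (hypothetical, not from RAMS); labels reuse the sample's strings.
tokens = ["A", "man", "was", "killed", "in", "New", "York", "."]
triggers = [(3, 3, "life.die.deathcausedbyviolentevents")]
entities = [(1, 1, "evt090arg02victim"), (5, 6, "evt090arg04place")]

trigger_tags = bio_tags(len(tokens), triggers)
argument_tags = bio_tags(len(tokens), entities)

for token, trig, arg in zip(tokens, trigger_tags, argument_tags):
    print(f"{token}\t{trig}\t{arg}")
```

Keeping trigger tags and argument tags in separate sequences avoids conflicts when a token is both a trigger and part of an argument; a single merged sequence would have to pick one label per token.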

