
Requirements Analysis

The dataset must be preprocessed and suitable features selected for clustering; the number of clusters and the initial center points must be determined; Mahout's K-Means implementation is then invoked to perform the clustering, and the accuracy and stability of the result are evaluated. In addition, using Mahout and tuning its parameters requires careful study and practice to ensure that the clustering results are valid and reliable.
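The post does not show the preprocessing step itself, so the snippet below is only a minimal sketch of the kind of feature preparation meant here: min-max scaling each column of the raw data to [0, 1] so that no single dimension dominates the Euclidean distance before the data is handed to Mahout. The class name FeatureScaling and the choice of DenseVector are assumptions for illustration, not part of the original tutorial.

package com;

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FeatureScaling {
    // Min-max normalize each column of the raw feature matrix and wrap the rows as Mahout vectors.
    public static List<Vector> normalize(double[][] rows) {
        int dims = rows[0].length;
        double[] min = new double[dims];
        double[] max = new double[dims];
        Arrays.fill(min, Double.MAX_VALUE);
        Arrays.fill(max, -Double.MAX_VALUE);
        // First pass: find the range of every feature.
        for (double[] row : rows) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], row[d]);
                max[d] = Math.max(max[d], row[d]);
            }
        }
        // Second pass: scale every value into [0, 1].
        List<Vector> vectors = new ArrayList<>();
        for (double[] row : rows) {
            double[] scaled = new double[dims];
            for (int d = 0; d < dims; d++) {
                double range = max[d] - min[d];
                scaled[d] = range == 0 ? 0 : (row[d] - min[d]) / range;
            }
            vectors.add(new DenseVector(scaled));
        }
        return vectors;
    }
}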

System Implementation

     1. Understanding the experiment as a whole:

     The goal of this experiment is to understand the principles of clustering, become familiar with common clustering algorithms, and learn how to carry out K-Means clustering analysis with Mahout.
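For reference, here is a minimal, self-contained illustration (not part of the original experiment) of what a single K-Means iteration does: assign every point to its nearest center, then move each center to the mean of the points assigned to it. Mahout's KMeansDriver repeats essentially this assign/update cycle as MapReduce jobs until the centers stop moving (within the convergence threshold) or the maximum number of iterations is reached.

public class KMeansStep {
    // Squared Euclidean distance between two points.
    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // One Lloyd iteration: assign each point to its nearest center,
    // then recompute every center as the mean of its assigned points.
    static double[][] iterate(double[][] points, double[][] centers) {
        int k = centers.length, dims = points[0].length;
        double[][] sums = new double[k][dims];
        int[] counts = new int[k];
        for (double[] p : points) {
            int best = 0;
            for (int c = 1; c < k; c++) {
                if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) {
                    best = c;
                }
            }
            counts[best]++;
            for (int d = 0; d < dims; d++) sums[best][d] += p[d];
        }
        double[][] next = new double[k][dims];
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dims; d++) {
                // If a center lost all its points, keep it where it was.
                next[c][d] = counts[c] == 0 ? centers[c][d] : sums[c][d] / counts[c];
            }
        }
        return next;
    }
}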

     2. Overall workflow of the experiment:

  • Create the project and import the development dependencies
  • Write the utility class
  • Write the clustering analysis code
  • Output the clustering results
  • Evaluate the clustering quality

     3. Preparation:

  • Create a Maven project in IDEA: mahout_kmeans_demo

 

  • Modify the pom.xml file to pull in the JAR dependencies needed for MapReduce (Hadoop) and Mahout development
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-mr</artifactId>
        <version>0.13.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-math</artifactId>
        <version>0.13.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-hdfs</artifactId>
        <version>0.13.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-integration</artifactId>
        <version>0.13.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mahout</groupId>
        <artifactId>mahout-examples</artifactId>
        <version>0.13.0</version>
    </dependency>
</dependencies>

Maven will download the required dependency packages.

Wait until pom.xml no longer reports any errors.

  • Prepare and download the experiment data (the iris dataset used below)

  • Start the Hadoop cluster.

Run start-all.sh in a terminal.

Use the jps command to check that the cluster daemons have started.

     4. Running the clustering:

  • Write the utility class HdfsUtil, which wraps the basic HDFS operations (a short usage sketch follows the class)
package com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.JobConf;

import java.io.IOException;
import java.net.URI;

public class HdfsUtil {
    private static final String HDFS = "hdfs://master:9000/";
    private String hdfsPath;
    private Configuration conf;

    public HdfsUtil(Configuration conf) {
        this(HDFS, conf);
    }

    public HdfsUtil(String hdfs, Configuration conf) {
        this.hdfsPath = hdfs;
        this.conf = conf;
    }

    public static JobConf config() {
        JobConf conf = new JobConf(HdfsUtil.class);
        conf.setJobName("HdfsDAO");
        return conf;
    }

    public void mkdirs(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        if (!fs.exists(path)) {
            fs.mkdirs(path);
            System.out.println("Create: " + folder);
        }
        fs.close();
    }

    public void rmr(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.deleteOnExit(path);
        System.out.println("Delete: " + folder);
        fs.close();
    }

    public void ls(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        FileStatus[] list = fs.listStatus(path);
        System.out.println("ls: " + folder);
        System.out.println("==========================================================");
        for (FileStatus f : list) {
            System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
        }
        System.out.println("==========================================================");
        fs.close();
    }

    public void createFile(String file, String content) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        byte[] buff = content.getBytes();
        FSDataOutputStream os = null;
        try {
            os = fs.create(new Path(file));
            os.write(buff, 0, buff.length);
            System.out.println("Create: " + file);
        } finally {
            if (os != null)
                os.close();
        }
        fs.close();
    }

    public void copyFile(String local, String remote) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.copyFromLocalFile(new Path(local), new Path(remote));
        System.out.println("copy from: " + local + " to " + remote);
        fs.close();
    }

    public void download(String remote, String local) throws IOException {
        Path path = new Path(remote);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.copyToLocalFile(path, new Path(local));
        System.out.println("download: from " + remote + " to " + local);
        fs.close();
    }

    public void cat(String remoteFile) throws IOException {
        Path path = new Path(remoteFile);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        FSDataInputStream fsdis = null;
        System.out.println("cat: " + remoteFile);
        try {
            fsdis = fs.open(path);
            IOUtils.copyBytes(fsdis, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(fsdis);
            fs.close();
        }
    }
}
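A quick way to sanity-check the utility class is a small driver like the one below. The class name HdfsUtilDemo is hypothetical (it is not part of the original tutorial); the paths simply reuse the ones used later in this demo.

package com;

import org.apache.hadoop.conf.Configuration;

public class HdfsUtilDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        HdfsUtil hdfs = new HdfsUtil("hdfs://master:9000/", conf);
        hdfs.mkdirs("/user/hdfs/kmeans/input");                           // create the input directory
        hdfs.copyFile("/home/data/iris.dat", "/user/hdfs/kmeans/input");  // upload the local data file
        hdfs.ls("/user/hdfs/kmeans/input");                               // confirm the file landed on HDFS
    }
}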
  • Write the KMeansMahout class, which runs the clustering
package com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.canopy.CanopyDriver;
import org.apache.mahout.clustering.conversion.InputDriver;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.HadoopUtil;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.utils.clustering.ClusterDumper;

public class KMeansMahout {
    private static final String HDFS = "hdfs://master:9000";

    public static void main(String[] args) throws Exception {
        String localFile = "/home/data/iris.dat";
        //  Mahout output directory on HDFS
        String outputPath = HDFS + "/user/hdfs/kmeans/output";
        //  Mahout input directory
        String inputPath = HDFS + "/user/hdfs/kmeans/input/";
        //  t1 and t2 thresholds for the Canopy algorithm
        double t1 = 2;
        double t2 = 1;
        //  convergence threshold
        double convergenceDelta = 0.5;
        //  maximum number of iterations
        int maxIterations = 10;

        Path output = new Path(outputPath);
        Path input = new Path(inputPath);
        Configuration conf = new Configuration();

        HdfsUtil hdfs = new HdfsUtil(HDFS, conf);
        hdfs.rmr(inputPath);
        hdfs.mkdirs(inputPath);
        hdfs.copyFile(localFile, inputPath);
        hdfs.ls(inputPath);

        //  delete the previous output directory before each run
        HadoopUtil.delete(conf, output);
        //  run the clustering
        run(conf, input, output, new EuclideanDistanceMeasure(), t1, t2, convergenceDelta, maxIterations);
    }

    private static void run(Configuration conf, Path input, Path output,
                            EuclideanDistanceMeasure euclideanDistanceMeasure, double t1, double t2,
                            double convergenceDelta, int maxIterations) throws Exception {
        Path directoryContainingConvertedInput = new Path(output, "data");
        System.out.println("Preparing  Input");
        //  serialize the input file, using RandomAccessSparseVector to hold the vectors
        InputDriver.runJob(input, directoryContainingConvertedInput,
                "org.apache.mahout.math.RandomAccessSparseVector");

        System.out.println("Running  Canopy  to  get  initial  clusters");
        //  directory for the canopies
        Path canopyOutput = new Path(output, "canopies");
        //  run Canopy clustering to obtain the initial centers
        CanopyDriver.run(conf, directoryContainingConvertedInput, canopyOutput,
                euclideanDistanceMeasure, t1, t2, false, 0.0, false);

        System.out.println("Running  KMeans");
        //  run k-means clustering, seeded from the canopy directory
        KMeansDriver.run(conf, directoryContainingConvertedInput,
                new Path(canopyOutput, Cluster.INITIAL_CLUSTERS_DIR + "-final"),
                output, convergenceDelta, maxIterations, true, 0.0, false);

        System.out.println("run  clusterdumper");
        //  dump the clustering results
        ClusterDumper clusterDumper = new ClusterDumper(new Path(output, "clusters-*-final"),
                new Path(output, "clusteredPoints"));
        clusterDumper.printClusters(null);
    }
}

Right-click the KMeansMahout class and run the program.

The results are written to the output directory on HDFS.
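If you prefer to inspect the output from code, a small hypothetical helper such as the following (not part of the original post) can list what the run produced under /user/hdfs/kmeans/output; the sub-directory names (data, canopies, clusters-N, clusteredPoints) come from the KMeansMahout class above.

package com;

import org.apache.hadoop.conf.Configuration;

public class InspectOutput {
    public static void main(String[] args) throws Exception {
        HdfsUtil hdfs = new HdfsUtil("hdfs://master:9000/", new Configuration());
        hdfs.ls("/user/hdfs/kmeans/output");                  // data, canopies, clusters-*, clusteredPoints
        hdfs.ls("/user/hdfs/kmeans/output/clusteredPoints");  // the per-point cluster assignments
    }
}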

     5. Parsing the clustering results:

  • Extract the required information from Mahout's output directory

  • Write the ClusterOutput class to parse the clustering results
package com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable;
import org.apache.mahout.math.Vector;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;

public class ClusterOutput {
    private static final String HDFS = "hdfs://master:9000";

    public static void main(String[] args) {
        try {
            //   the Mahout output to be parsed
            String clusterOutputPath = "/user/hdfs/kmeans/output";
            //   the parsed clustering result, written to local disk
            String resultPath = "/home/data/result.txt";

            BufferedWriter bw;
            Configuration conf = new Configuration();
            conf.set("fs.default.name", HDFS);
            FileSystem fs = FileSystem.get(conf);

            SequenceFile.Reader reader = null;
            reader = new SequenceFile.Reader(fs, new Path(clusterOutputPath + "/clusteredPoints/part-m-00000"), conf);
            bw = new BufferedWriter(new FileWriter(new File(resultPath)));

            //   the key is the ID of the cluster center
            IntWritable key = new IntWritable();
            WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
            while (reader.next(key, value)) {
                //   get the point's vector
                Vector vector = value.getVector();
                String vectorValue = "";
                //   join the vector's dimensions into one line, separated by \t
                for (int i = 0; i < vector.size(); i++) {
                    if (i == vector.size() - 1) {
                        vectorValue += vector.get(i);
                    } else {
                        vectorValue += vector.get(i) + "\t";
                    }
                }
                bw.write(key.toString() + "\t" + vectorValue + "\n\n");
            }
            bw.flush();
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Right-click the ClusterOutput class and run the program.

The parsed results are saved to /home/data/result.txt, which can be viewed from a terminal.
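As a small optional follow-up (not in the original post), the sketch below re-reads result.txt and counts how many points were assigned to each cluster ID. It relies only on the line format written by ClusterOutput above, where the first tab-separated field of each non-empty line is the cluster ID; the class name ClusterSizes is hypothetical.

package com;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class ClusterSizes {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> sizes = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new FileReader("/home/data/result.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) continue;      // skip the blank separator lines
                String clusterId = line.split("\t")[0];   // first field is the cluster ID
                sizes.merge(clusterId, 1, Integer::sum);
            }
        }
        sizes.forEach((id, n) -> System.out.println("cluster " + id + ": " + n + " points"));
    }
}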

     6. Evaluating the clustering quality:

  • Write the InterClusterDistances class to compute the average inter-cluster distance
package com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.iterator.ClusterWritable;
import org.apache.mahout.common.distance.DistanceMeasure;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InterClusterDistances {
    private static final String HDFS = "hdfs://master:9000";

    public static void main(String[] args) throws Exception {
        String inputFile = HDFS + "/user/hdfs/kmeans/output";
        System.out.println("Clustering result path: " + inputFile);

        Configuration conf = new Configuration();
        Path path = new Path(inputFile + "/clusters-2-final/part-r-00000");
        System.out.println("Input Path:" + path);

        FileSystem fs = FileSystem.get(path.toUri(), conf);
        List<Cluster> clusters = new ArrayList<Cluster>();

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        Writable key = (Writable) reader.getKeyClass().newInstance();
        ClusterWritable value = (ClusterWritable) reader.getValueClass().newInstance();
        while (reader.next(key, value)) {
            Cluster cluster = value.getValue();
            clusters.add(cluster);
            value = (ClusterWritable) reader.getValueClass().newInstance();
        }
        System.out.println("Clusters in total:" + clusters.size());

        DistanceMeasure measure = new EuclideanDistanceMeasure();
        double max = 0;
        double min = Double.MAX_VALUE;
        double sum = 0;
        int count = 0;
        Set<Double> total = new HashSet<Double>();

        // only compute if there is more than one cluster
        if (clusters.size() != 1 && clusters.size() != 0) {
            for (int i = 0; i < clusters.size(); i++) {
                // compare each unordered pair of centers once; skip the zero self-distance
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = measure.distance(clusters.get(i).getCenter(), clusters.get(j).getCenter());
                    min = Math.min(d, min);
                    max = Math.max(d, max);
                    total.add(d);
                    sum += d;
                    count++;
                }
            }
            System.out.println("Maximum Intercluster Distance:" + max);
            System.out.println("Minimum Intercluster Distance:" + min);
            System.out.println("Average Intercluster Distance:" + sum / count);
            for (double d : total) {
                System.out.print("[" + d + "] ");
            }
        } else if (clusters.size() == 1) {
            System.out.println("Only one cluster; clustering quality cannot be judged");
        } else if (clusters.size() == 0) {
            System.out.println("Clustering failed");
        }
    }
}

Again, right-click and run the program to view the inter-cluster distance statistics.
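Inter-cluster distance alone only measures how far apart the centers are. As an optional complement (an assumption of this write-up, not part of the original post), the sketch below estimates cluster compactness: the average distance from each point in clusteredPoints to the center of its assigned cluster, reading the same clusters-2-final and clusteredPoints files used above. Smaller values indicate tighter clusters; the class name IntraClusterDistance is hypothetical.

package com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable;
import org.apache.mahout.clustering.iterator.ClusterWritable;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.Vector;

import java.util.HashMap;
import java.util.Map;

public class IntraClusterDistance {
    private static final String HDFS = "hdfs://master:9000";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String output = HDFS + "/user/hdfs/kmeans/output";

        // Load the final cluster centers, keyed by cluster ID.
        Path clustersPath = new Path(output + "/clusters-2-final/part-r-00000");
        FileSystem fs = FileSystem.get(clustersPath.toUri(), conf);
        Map<Integer, Vector> centers = new HashMap<>();
        SequenceFile.Reader clusterReader = new SequenceFile.Reader(fs, clustersPath, conf);
        Writable ckey = (Writable) clusterReader.getKeyClass().newInstance();
        ClusterWritable cvalue = (ClusterWritable) clusterReader.getValueClass().newInstance();
        while (clusterReader.next(ckey, cvalue)) {
            Cluster c = cvalue.getValue();
            centers.put(c.getId(), c.getCenter().clone());   // clone so later reads cannot mutate it
        }
        clusterReader.close();

        // Walk the clustered points and accumulate point-to-center distances.
        EuclideanDistanceMeasure measure = new EuclideanDistanceMeasure();
        Path pointsPath = new Path(output + "/clusteredPoints/part-m-00000");
        SequenceFile.Reader pointReader = new SequenceFile.Reader(fs, pointsPath, conf);
        IntWritable key = new IntWritable();                  // key is the assigned cluster ID
        WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
        double sum = 0;
        int count = 0;
        while (pointReader.next(key, value)) {
            Vector center = centers.get(key.get());
            if (center != null) {
                sum += measure.distance(center, value.getVector());
                count++;
            }
        }
        pointReader.close();
        System.out.println("Average intra-cluster distance: " + (count == 0 ? 0 : sum / count));
    }
}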

