R: How to geocode a simple address using the Data Science Toolkit

Question

I am fed up with Google's geocoding and decided to try an alternative. The Data Science Toolkit (http://www.datasciencetoolkit.org) lets you geocode an unlimited number of addresses. R has an excellent package that serves as a wrapper for its functions (CRAN: RDSTK). The package has a function called street2coordinates() that interfaces with the Data Science Toolkit's geocoding utility.

However, the RDSTK function street2coordinates() does not work if you try to geocode something simple like City, Country. In the following example I try to use the function to get the latitude and longitude for the city of Phoenix:

> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)

The geocoding utility on the Data Science Toolkit site itself works perfectly. This is the URL request that gives the answer: http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=Phoenix+Arizona+United+States
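For readers who want to build such a request URL from an arbitrary address, here is a minimal sketch in base R. The helper name build_geocode_url() is made up for illustration; the endpoint and the sensor and address parameters are taken from the URL above. It percent-encodes the address instead of joining words with "+", which I am assuming the service accepts equally well.

```r
# Hypothetical helper (not part of RDSTK): build a request URL for the
# Google-style geocoder endpoint quoted above.
build_geocode_url <- function(addr) {
  base <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  # reserved = TRUE also escapes characters such as "," and "&",
  # so the address cannot break the query string.
  paste0(base, "?sensor=false&address=", URLencode(addr, reserved = TRUE))
}

url_phoenix <- build_geocode_url("Phoenix, Arizona, United States")
url_phoenix
```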

I am interested in geocoding multiple addresses (both complete addresses and city names). I know that the Data Science Toolkit URL will work well. How do I interface with the URL and get multiple latitudes and longitudes into a data frame with the addresses?

Here is a sample dataset:

dff <- data.frame(address=c(
  "Birmingham, Alabama, United States",
  "Mobile, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States",
  "Little Rock, Arkansas, United States",
  "Berkeley, California, United States",
  "Duarte, California, United States",
  "Encinitas, California, United States",
  "La Jolla, California, United States",
  "Los Angeles, California, United States",
  "Orange, California, United States",
  "Redwood City, California, United States",
  "Sacramento, California, United States",
  "San Francisco, California, United States",
  "Stanford, California, United States",
  "Hartford, Connecticut, United States",
  "New Haven, Connecticut, United States"
  ))

Recommended answer

Like this?

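library(httr)
library(rjson)

# Build a JSON array of all addresses: ["addr 1","addr 2",...]
data <- paste0("[", paste(paste0("\"", dff$address, "\""), collapse = ","), "]")
url  <- "http://www.datasciencetoolkit.org/street2coordinates"
# One POST request geocodes every address at once
response <- POST(url, body = data)
# as = "text" returns the raw JSON string for rjson to parse
json     <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
# lapply() always returns a list, so rbind works even when no address fails;
# addresses with a null result contribute no row
geocode  <- do.call(rbind, lapply(json,
                                  function(x) c(long = x$longitude, lat = x$latitude)))
geocode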
#                                                long      lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States            -88.10318 30.70114
# La Jolla, California, United States      -117.87645 33.85751
# Duarte, California, United States        -118.29866 33.78659
# Little Rock, Arkansas, United States      -91.20736 33.60892
# Tucson, Arizona, United States           -110.97087 32.21798
# Redwood City, California, United States  -117.88536 35.18713
# New Haven, Connecticut, United States     -72.92751 41.36571
# Berkeley, California, United States      -122.29673 37.86058
# Hartford, Connecticut, United States      -72.76356 41.78516
# Sacramento, California, United States    -121.55541 38.38046
# Encinitas, California, United States     -116.84605 33.01693
# Birmingham, Alabama, United States        -86.80190 33.45641
# Stanford, California, United States      -122.16750 37.42509
# Orange, California, United States        -117.85311 33.78780
# Los Angeles, California, United States   -117.88536 35.18713

This takes advantage of the POST interface to the street2coordinates API (documented on the Data Science Toolkit site), which returns all the results in one request rather than requiring a separate GET request per address.
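Note that the result above has 16 rows for 17 input addresses, and the rows are not in input order. If you need one row per input address, the sketch below shows one way to line the parsed response up with the original vector. align_geocodes() is a hypothetical helper, and it assumes the parsed response is a named list keyed by address whose elements are either NULL or a list with longitude and latitude entries. The mock list stands in for a real response so the snippet runs offline.

```r
# Hypothetical helper: one row per input address, NA where nothing matched.
align_geocodes <- function(addresses, parsed) {
  addresses <- as.character(addresses)
  pick <- function(addr, field) {
    hit <- parsed[[addr]]  # NULL if the address is absent or had no match
    if (is.null(hit) || is.null(hit[[field]])) NA_real_ else as.numeric(hit[[field]])
  }
  data.frame(
    address = addresses,
    long    = vapply(addresses, pick, numeric(1), field = "longitude", USE.NAMES = FALSE),
    lat     = vapply(addresses, pick, numeric(1), field = "latitude",  USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Mock of a parsed response (assumed shape); coordinates copied from the output above.
parsed <- list(
  "Tucson, Arizona, United States"  = list(longitude = -110.97087, latitude = 32.21798),
  "Mobile, Alabama, United States"  = list(longitude = -88.10318,  latitude = 30.70114),
  "Phoenix, Arizona, United States" = NULL
)
addresses <- c("Mobile, Alabama, United States",
               "Phoenix, Arizona, United States",
               "Tucson, Arizona, United States")
aligned <- align_geocodes(addresses, parsed)
aligned
```

With the real data you would call align_geocodes(dff$address, json). The long and lat columns come back numeric, and unmatched addresses stay visible as NA instead of silently disappearing.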

EDIT (in response to OP's comment)

The absence of Phoenix seems to be a bug in the street2coordinates API. If you go to the API demo page and try "Phoenix, Arizona, United States", you get a null response. However, as your example shows, using their "Google-style Geocoder" does give a result for Phoenix. So here's a solution using repeated GET requests. Note that it runs much more slowly, because it sends one request per address.

library(httr)
library(rjson)

# Geocode a single address with the Data Science Toolkit's Google-style geocoder
geo.dsk <- function(addr) {
  url      <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  response <- GET(url, query = list(sensor = "false", address = addr))
  json     <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
  loc      <- json$results[[1]]$geometry$location  # first match only
  # c() coerces to character, so long and lat are returned as strings
  c(address = addr, long = loc$lng, lat = loc$lat)
}
# One GET request per address, stacked into a single data frame
result <- do.call(rbind, lapply(as.character(dff$address), geo.dsk))
result <- data.frame(result)
result
#                                     address         long        lat
# 1        Birmingham, Alabama, United States   -86.801904  33.456412
# 2            Mobile, Alabama, United States   -88.103184  30.701142
# 3           Phoenix, Arizona, United States -112.0733333 33.4483333
# 4            Tucson, Arizona, United States  -110.970869  32.217975
# 5      Little Rock, Arkansas, United States   -91.207356  33.608922
# 6       Berkeley, California, United States   -122.29673  37.860576
# 7         Duarte, California, United States  -118.298662  33.786594
# 8      Encinitas, California, United States  -116.846046  33.016928
# 9       La Jolla, California, United States  -117.876447  33.857515
# 10   Los Angeles, California, United States  -117.885359  35.187133
# 11        Orange, California, United States  -117.853112  33.787795
# 12  Redwood City, California, United States  -117.885359  35.187133
# 13    Sacramento, California, United States  -121.555406  38.380456
# 14 San Francisco, California, United States  -117.885359  35.187133
# 15      Stanford, California, United States    -122.1675   37.42509
# 16     Hartford, Connecticut, United States   -72.763564   41.78516
# 17    New Haven, Connecticut, United States   -72.927507  41.365709
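One caveat: geo.dsk() assumes every request returns at least one result, and indexing results[[1]] fails if the list is empty. The sketch below is one way to make the extraction step tolerant of that. first_location() is a hypothetical helper that works on an already parsed response, and the response shape (results, geometry, location, lng, lat) is taken from the code above. The mock lists are illustrative only, so the snippet runs without network access.

```r
# Hypothetical helper: extract the first location from a parsed Google-style
# geocoder response, returning NA coordinates when there are no results.
first_location <- function(addr, json) {
  results <- json$results
  if (length(results) == 0) {
    return(data.frame(address = addr, long = NA_real_, lat = NA_real_,
                      stringsAsFactors = FALSE))
  }
  loc <- results[[1]]$geometry$location
  data.frame(address = addr,
             long = as.numeric(loc$lng), lat = as.numeric(loc$lat),
             stringsAsFactors = FALSE)
}

# Mock parsed responses (assumed shape): one hit and one empty result set.
hit  <- list(results = list(list(geometry = list(
          location = list(lat = 33.4483333, lng = -112.0733333)))))
miss <- list(results = list())

rows <- rbind(first_location("Phoenix, Arizona, United States", hit),
              first_location("Nowhere, Nowhere", miss))
rows
```

Because each call returns a one-row data frame rather than a character vector, the long and lat columns stay numeric after rbind.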
