This may be a bit unusual, but I was looking for a way to load the results of a Google search into a ListView. Any ideas?
Hi.
Could you clarify the following detail, please?
A) Are you using the official Google API for .NET with a 'Custom Search Engine' (CSE), through the 'Google Custom Search' API?
B) Or are you trying to write a homemade algorithm that searches Google.com and manually parses the HTML document returned by the query?
If you are not using Google's API for .NET, I must warn you that what you are trying to do is forbidden by Google's terms of service (TOS). Still, I think we can set that detail aside: Google is, in practice, a public service and, whatever the TOS says, using it this way is still ethical and public. The last thing we need is to lose the freedom to use Google however we please in our own application...
The point is that if you are not using the Google Custom Search API, which by the way is a paid service (yes, you have to pay to use Google's search engine), the code will be tedious to write and maintain. That is not only because of the manual parsing, but also because Google's services change frequently, so sooner or later something will change and the code will have to be adapted every time a change affects this service...
Do you already have part of the code written?
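For reference, option A comes down to an HTTP GET request against the Custom Search JSON API. The following is only a minimal sketch of building that request URL. The names `BuildCustomSearchUrl`, `apiKey` and `engineId` are hypothetical, and the key and engine ID values are placeholders that you would obtain from your own Google account.

```vbnet
Imports System

Module CustomSearchSketch

    ' Base endpoint of the Custom Search JSON API.
    Private Const Endpoint As String = "https://www.googleapis.com/customsearch/v1"

    ' Builds the request URL for a query.
    ' apiKey and engineId are hypothetical parameter names; the caller supplies real values.
    Public Function BuildCustomSearchUrl(ByVal apiKey As String,
                                         ByVal engineId As String,
                                         ByVal searchTerm As String) As String

        Return String.Format("{0}?key={1}&cx={2}&q={3}",
                             Endpoint,
                             Uri.EscapeDataString(apiKey),
                             Uri.EscapeDataString(engineId),
                             Uri.EscapeDataString(searchTerm))
    End Function

    Sub Main()
        ' Placeholder credentials, for illustration only.
        Console.WriteLine(BuildCustomSearchUrl("YOUR_API_KEY", "YOUR_ENGINE_ID", "elhacker.net"))
    End Sub

End Module
```

The response is JSON; each entry of its `items` array carries `title`, `link` and `snippet` fields, which map naturally onto ListView columns.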
Well, in the end I decided to write a (nearly) complete solution.
Requirements:
- Visual Studio 2015 (or adapt the syntax to an earlier version of VB.NET)
- .NET Framework 4.5, needed for the asynchronous methods (or remove those blocks from the source code)
- HtmlAgilityPack: https://htmlagilitypack.codeplex.com/
Be warned that it is not perfect and may have some limitations, but it works as far as I have tested it.
First, a class that represents (some of) the Google search parameters:
' ***********************************************************************
' Author : Elektro
' Modified : 22-July-2016
' ***********************************************************************
#Region " Imports "
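Imports System.Collections.Specialized
Imports System.Text ' Encoding, StringBuilder
Imports System.Web ' HttpUtility (requires a reference to the System.Web assembly)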
#End Region
Namespace Google.Types
''' <summary>
''' Represents the parameters of a query to <c>Google Search</c> service.
''' </summary>
Public Class GoogleSearchOptions : Inherits Object
#Region " Constant Values "
''' <summary>
''' The maximum number of results that can be included in the search results.
''' </summary>
Protected ReadOnly maxNumberOfResults As Integer = 1000
''' <summary>
''' The maximum number of characters allowed in the <see cref="Website"/> value.
''' </summary>
Protected ReadOnly maxWebsiteLength As Integer = 125
#End Region
#Region " Properties "
''' <summary>
''' The search query. Words are separated by <c>+</c> signs.
''' <para></para>
''' Parameter name: <c>q</c>, Default value: N/A
''' </summary>
''' <remarks>
''' See <see href="https://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/request_format.html#1076993"/> for additional query features.
''' </remarks>
Public Overridable Property SearchTerm As String = ""
''' <summary>
''' Restricts searches to pages in the specified language.
''' If there are no results in the specified language, the search appliance displays results in all languages.
''' <para></para>
''' Parameter name: <c>lr</c>, Default value: Empty string
''' </summary>
''' <remarks>
''' See <see href="https://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/request_format.html#1077312"/> for more information.
''' </remarks>
Public Overridable Property Language As String = ""
''' <summary>
''' Limits search results to documents in the specified domain, host or web directory.
''' <para></para>
''' The specified value must contain less than <c>125</c> characters.
''' <para></para>
''' Parameter name: <c>as_sitesearch</c>, Default value: Empty string
''' </summary>
Public Overridable Property Website As String
<DebuggerNonUserCode>
Get
Return Me.websiteB
End Get
<DebuggerStepperBoundary>
Set(ByVal value As String)
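' Treat Nothing as an empty string to avoid a NullReferenceException.
If (value Is Nothing) Then
value = ""
ElseIf (value.Length > Me.maxWebsiteLength) Then
value = value.Substring(0, Me.maxWebsiteLength)
End If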
Me.websiteB = value
End Set
End Property
''' <summary>
''' ( Backing field )
''' Limits search results to documents in the specified domain, host or web directory.
''' <para></para>
''' The specified value must contain less than <c>125</c> characters.
''' </summary>
Protected websiteB As String = ""
''' <summary>
''' Include only files of the specified type in the search results.
''' <para></para>
''' Parameter name: <c>as_filetype</c>, Default value: Empty string
''' </summary>
''' <remarks>
''' See <see href="https://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/request_format.html#1077199"/> for a list of possible values
''' </remarks>
Public Overridable Property Filetype As String = ""
''' <summary>
''' Sets the character encoding that is used to interpret the query string.
''' <para></para>
''' Parameter name: <c>ie</c>, Default value: <c>utf-8</c>
''' </summary>
''' <remarks>
''' See <see href="https://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/request_format.html#1077479"/> for more information
''' </remarks>
Public Overridable Property InputEncoding As Encoding = Encoding.GetEncoding("utf-8")
''' <summary>
''' Sets the character encoding that is used to encode the results.
''' <para></para>
''' Parameter name: <c>oe</c>, Default value: <c>utf-8</c>
''' </summary>
''' <remarks>
''' See <see href="https://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/request_format.html#1077479"/> for more information
''' </remarks>
Public Overridable Property OutputEncoding As Encoding = Encoding.Default
''' <summary>
''' The maximum number of results to include in the search results.
''' <para></para>
''' The maximum value of this parameter is 1000.
''' <para></para>
''' Parameter name: <c>num</c>, Default value: <c>10</c>
''' </summary>
Public Overridable Property NumberOfResults As Integer
<DebuggerNonUserCode>
Get
Return Me.numberOfResultsB
End Get
<DebuggerStepperBoundary>
Set(ByVal value As Integer)
If (value < 0) Then
value = 1
ElseIf (value > Me.maxNumberOfResults) Then
value = Me.maxNumberOfResults
End If
Me.numberOfResultsB = value
End Set
End Property
''' <summary>
''' ( Backing field )
''' The maximum number of results to include in the search results.
''' <para></para>
''' The maximum value of this parameter is 1000.
''' </summary>
Protected numberOfResultsB As Integer = 10
#End Region
#Region " Constructors "
''' <summary>
''' Initializes a new instance of the <see cref="GoogleSearchOptions"/> class.
''' </summary>
<DebuggerNonUserCode>
Public Sub New()
End Sub
#End Region
#Region " Public Methods "
''' <summary>
''' Returns a <see cref="NameValueCollection"/> that represents the <c>Google</c> query for this instance.
''' </summary>
''' <returns>
''' A <see cref="NameValueCollection"/> that represents the <c>Google</c> query for this instance.
''' </returns>
<DebuggerStepperBoundary>
Public Overridable Function ToNameValueCollection() As NameValueCollection
Dim params As New NameValueCollection()
params.Add("q", HttpUtility.UrlEncode(Me.SearchTerm))
If Not String.IsNullOrEmpty(Me.Filetype) Then
params.Add("as_filetype", Me.Filetype.ToLower())
End If
If Not String.IsNullOrEmpty(Me.Website) Then
params.Add("as_sitesearch", Me.Website.ToLower())
End If
If (Me.InputEncoding IsNot Nothing) Then
params.Add("ie", Me.InputEncoding.WebName.ToLower())
End If
If (Me.OutputEncoding IsNot Nothing) Then
params.Add("oe", Me.OutputEncoding.WebName.ToLower())
End If
If Not String.IsNullOrEmpty(Me.Language) Then
params.Add("lr", Me.Language.ToLower())
End If
If (Me.NumberOfResults > 0) Then
params.Add("num", Me.NumberOfResults.ToString())
End If
Return params
End Function
#End Region
#Region " Operator Overrides "
''' <summary>
''' Returns a <see cref="String"/> that represents the <c>Google</c> query for this instance.
''' </summary>
''' <returns>
''' A <see cref="String"/> that represents the <c>Google</c> query for this instance.
''' </returns>
<DebuggerStepperBoundary>
Public Overrides Function ToString() As String
Dim url As String = "https://www.google.com/search?"
Dim params As NameValueCollection = Me.ToNameValueCollection()
Dim sb As New StringBuilder
sb.Append(url)
For Each key As String In params.AllKeys
sb.AppendFormat("{0}={1}&", key, params(key))
Next
sb.Remove((sb.Length - 1), 1)
Return sb.ToString()
End Function
#End Region
End Class
End Namespace
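Note that `ToString()` above URL-encodes only the `q` value; the other parameters are appended as they are. As a sketch of the same idea with every key and value escaped, the hypothetical helper `BuildQueryUrl` below (it is not part of the class above) builds a URL from any NameValueCollection that holds raw, not yet encoded, values.

```vbnet
Imports System
Imports System.Collections.Specialized
Imports System.Linq

Module QueryStringSketch

    ' Joins the key/value pairs of a collection into a query string,
    ' escaping every key and value. Expects raw (not yet encoded) values.
    Public Function BuildQueryUrl(ByVal baseUrl As String,
                                  ByVal params As NameValueCollection) As String

        Dim pairs As String() =
            params.AllKeys.Select(
                Function(key) Uri.EscapeDataString(key) & "=" & Uri.EscapeDataString(params(key))).ToArray()

        Return baseUrl & "?" & String.Join("&", pairs)
    End Function

    Sub Main()
        Dim params As New NameValueCollection()
        params.Add("q", "visual basic listview")
        params.Add("lr", "lang_en")
        params.Add("num", "10")

        ' Prints: https://www.google.com/search?q=visual%20basic%20listview&lr=lang_en&num=10
        Console.WriteLine(BuildQueryUrl("https://www.google.com/search", params))
    End Sub

End Module
```

Joining the pairs with `String.Join` also avoids having to trim a trailing `&` afterwards.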
Another class, to represent the Google search results:
' ***********************************************************************
' Author : Elektro
' Modified : 22-July-2016
' ***********************************************************************
Namespace Google.Types
''' <summary>
''' Represents a <c>Google Search</c> result.
''' </summary>
Public Class GoogleSearchResult : Inherits Object
#Region " Properties "
''' <summary>
''' Gets the search result title.
''' </summary>
''' <value>
''' The search result title.
''' </value>
Public Overridable ReadOnly Property Title As String
''' <summary>
''' Gets the search result title in the specified text encoding.
''' </summary>
''' <param name="inEnc">
''' The source text encoding.
''' </param>
'''
''' <param name="outEnc">
''' The target text encoding.
''' </param>
''' <value>
''' The search result title in the specified text encoding.
''' </value>
Public Overridable ReadOnly Property Title(ByVal inEnc As Encoding, ByVal outEnc As Encoding) As String
Get
Dim data As Byte() = inEnc.GetBytes(Me.Title)
Return outEnc.GetString(data)
End Get
End Property
''' <summary>
''' Gets the search result description.
''' </summary>
''' <value>
''' The search result description.
''' </value>
Public Overridable ReadOnly Property Description As String
''' <summary>
''' Gets the search result description in the specified text encoding.
''' </summary>
''' <param name="inEnc">
''' The source text encoding.
''' </param>
'''
''' <param name="outEnc">
''' The target text encoding.
''' </param>
''' <value>
''' The search result description in the specified text encoding.
''' </value>
Public Overridable ReadOnly Property Description(ByVal inEnc As Encoding, ByVal outEnc As Encoding) As String
Get
Dim data As Byte() = inEnc.GetBytes(Me.Description)
Return outEnc.GetString(data)
End Get
End Property
''' <summary>
''' Gets the search result Url.
''' </summary>
''' <value>
''' The search result Url.
''' </value>
Public Overridable ReadOnly Property Url As String
''' <summary>
''' Gets the search result <see cref="Uri"/>.
''' </summary>
''' <value>
''' The search result <see cref="Uri"/>.
''' </value>
Public Overridable ReadOnly Property Uri As Uri
<DebuggerStepperBoundary>
Get
Try
Return New Uri(Me.Url)
Catch ex As UriFormatException ' ToDo: Add a proper custom exception handler.
Throw
End Try
End Get
End Property
#End Region
#Region " Constructors "
''' <summary>
''' Prevents a default instance of the <see cref="GoogleSearchResult"/> class from being created.
''' </summary>
<DebuggerNonUserCode>
Private Sub New()
End Sub
''' <summary>
''' Initializes a new instance of the <see cref="GoogleSearchResult"/> class.
''' </summary>
''' <param name="title">
''' The search result title.
''' </param>
'''
''' <param name="url">
''' The search result url.
''' </param>
'''
''' <param name="description">
''' The search result description.
''' </param>
<DebuggerStepperBoundary>
Public Sub New(ByVal title As String, ByVal url As String, ByVal description As String)
Me.Title = title
Me.Url = url
Me.Description = description
End Sub
#End Region
End Class
End Namespace
Finally, this class searches Google and retrieves the search results:
' ***********************************************************************
' Author : Elektro
' Modified : 22-July-2016
' ***********************************************************************
#Region " Imports "
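Imports System.Net ' WebClient
Imports System.Threading ' Interlocked
Imports System.Threading.Thread ' Shared Sleep method
Imports System.Web ' HttpUtility (requires a reference to the System.Web assembly)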
Imports Google.Types
Imports HtmlAgilityPack
#End Region
Namespace Google.Tools
''' <summary>
''' Searches the World Wide Web using <c>Google Search</c> service.
''' </summary>
Public NotInheritable Class GoogleSearcher : Inherits Object
#Region " Constructors "
''' <summary>
''' Prevents a default instance of the <see cref="GoogleSearcher"/> class from being created.
''' </summary>
Private Sub New()
End Sub
#End Region
#Region " Constant Values "
''' <summary>
''' A fake agent to get the expected data to parse in the resulting html string response of <c>Google Search</c> requests.
''' </summary>
Private Shared ReadOnly fakeAgent As String =
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2"
#End Region
#Region " Public Methods "
''' <summary>
''' Searches the World Wide Web using <c>Google Search</c> service with the specified search options.
''' </summary>
''' <param name="searchOptions">
''' The search options to use with the <c>Google Search</c> service.
''' </param>
''' <param name="page">
''' The zero-based index of the search results page.
''' <para></para>
''' A search page contains 100 max. results. Note that max. total search results are 1000 (10 pages x 100 results).
''' </param>
''' <returns>
''' The resulting response as a <c>Html</c> string.
''' </returns>
<DebuggerStepperBoundary>
Public Shared Function GetSearchResultString(ByVal searchOptions As GoogleSearchOptions,
ByVal page As Integer) As String
If (page < 0) Then
Throw New ArgumentOutOfRangeException(paramName:="page")
End If
If (searchOptions.NumberOfResults < 100) AndAlso (page > 0) Then
Return String.Empty
End If
Using wc As New WebClient
wc.Headers.Add("user-agent", GoogleSearcher.fakeAgent)
Return wc.DownloadString(searchOptions.ToString() & String.Format("&start={0}", (100 * page)))
End Using
End Function
''' <summary>
''' Asynchronously searches the World Wide Web using <c>Google Search</c> service with the specified search options.
''' </summary>
''' <param name="searchOptions">
''' The search options to use with the <c>Google Search</c> service.
''' </param>
''' <param name="page">
''' The search results page.
''' <para></para>
''' A search page contains 100 max. results. Note that max. total search results are 1000 (10 pages x 100 results).
''' </param>
''' <returns>
''' The resulting response as a <c>Html</c> string.
''' </returns>
<DebuggerStepperBoundary>
Public Shared Async Function GetSearchResultStringAsync(ByVal searchOptions As GoogleSearchOptions,
ByVal page As Integer) As Task(Of String)
If (page < 0) Then
Throw New ArgumentOutOfRangeException(paramName:="page")
End If
If (searchOptions.NumberOfResults < 100) AndAlso (page > 0) Then
Return String.Empty
End If
Dim uri As New Uri(searchOptions.ToString() & String.Format("&start={0}", (100 * page)))
Using wc As New WebClient
wc.Headers.Add("user-agent", GoogleSearcher.fakeAgent)
Return Await wc.DownloadStringTaskAsync(uri) ' Reuse the Uri built above instead of rebuilding the string.
End Using
End Function
''' <summary>
''' Searches the World Wide Web using <c>Google Search</c> service with the specified search options.
''' </summary>
''' <param name="searchOptions">
''' The search options to use with the <c>Google Search</c> service.
''' </param>
''' <returns>
''' A <see cref="List(Of GoogleSearchResult)"/> that represents the search results.
''' </returns>
<DebuggerStepThrough>
Public Shared Function GetSearchResult(ByVal searchOptions As GoogleSearchOptions) As List(Of GoogleSearchResult)
Dim html As String = GoogleSearcher.GetSearchResultString(searchOptions, page:=0)
Dim pageCount As Integer
Dim results As New List(Of GoogleSearchResult)
Do While True
If String.IsNullOrEmpty(html) Then ' No search-page to parse.
Return results
End If
' Load the html document.
Dim doc As New HtmlDocument
doc.LoadHtml(html)
' Select the result nodes.
Dim nodes As HtmlNodeCollection
nodes = doc.DocumentNode.SelectNodes("//div[@class='g']")
If (nodes Is Nothing) Then ' No search results in this page.
Return results
End If
' Loop through the nodes.
For Each node As HtmlAgilityPack.HtmlNode In nodes
Dim title As String = "Title unavailable."
Dim description As String = "Description unavailable."
Dim url As String = ""
Try
title = HttpUtility.HtmlDecode(node.SelectSingleNode(".//div[@class='rc']/h3[@class='r']/a[@href]").InnerText)
Catch ex As NullReferenceException
End Try
Try
description = HttpUtility.HtmlDecode(node.SelectSingleNode(".//div[@class='rc']/div[@class='s']/div[1]/span[@class='st']").InnerText)
Catch ex As NullReferenceException
End Try
Try
url = HttpUtility.UrlDecode(node.SelectSingleNode(".//div[@class='rc']/h3[@class='r']/a[@href]").GetAttributeValue("href", "Unknown URL"))
Catch ex As NullReferenceException
Continue For
End Try
results.Add(New GoogleSearchResult(title, url, description))
If (results.Count = searchOptions.NumberOfResults) Then
Exit Do
End If
Next node
Sleep(TimeSpan.FromSeconds(2))
html = GoogleSearcher.GetSearchResultString(searchOptions, Interlocked.Increment(pageCount))
Loop
Return results
End Function
''' <summary>
''' Asynchronously searches the World Wide Web using <c>Google Search</c> service with the specified search options.
''' </summary>
''' <param name="searchOptions">
''' The search options to use with the <c>Google Search</c> service.
''' </param>
''' <returns>
''' A <see cref="List(Of GoogleSearchResult)"/> that represents the search results.
''' </returns>
<DebuggerStepThrough>
Public Shared Async Function GetSearchResultAsync(ByVal searchOptions As GoogleSearchOptions) As Task(Of List(Of GoogleSearchResult))
Dim html As String = Await GoogleSearcher.GetSearchResultStringAsync(searchOptions, page:=0)
Dim pageCount As Integer
Dim results As New List(Of GoogleSearchResult)
Do While True
If String.IsNullOrEmpty(html) Then ' No search-page to parse.
Return results
End If
' Load the html document.
Dim doc As New HtmlDocument
doc.LoadHtml(html)
' Select the result nodes.
Dim nodes As HtmlNodeCollection
nodes = doc.DocumentNode.SelectNodes("//div[@class='g']")
If (nodes Is Nothing) Then ' No search results in this page.
Return results
End If
' Loop through the nodes.
For Each node As HtmlAgilityPack.HtmlNode In nodes
Dim title As String = "Title unavailable."
Dim description As String = "Description unavailable."
Dim url As String = ""
Try
title = HttpUtility.HtmlDecode(node.SelectSingleNode(".//div[@class='rc']/h3[@class='r']/a[@href]").InnerText)
Catch ex As NullReferenceException
End Try
Try
description = HttpUtility.HtmlDecode(node.SelectSingleNode(".//div[@class='rc']/div[@class='s']/div[1]/span[@class='st']").InnerText)
Catch ex As NullReferenceException
End Try
Try
url = HttpUtility.UrlDecode(node.SelectSingleNode(".//div[@class='rc']/h3[@class='r']/a[@href]").GetAttributeValue("href", "Unknown URL"))
Catch ex As NullReferenceException
Continue For
End Try
results.Add(New GoogleSearchResult(title, url, description))
If (results.Count = searchOptions.NumberOfResults) Then
Exit Do
End If
Next node
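' Wait without blocking the calling thread, then request the next page asynchronously.
Await Task.Delay(TimeSpan.FromSeconds(2))
pageCount += 1
html = Await GoogleSearcher.GetSearchResultStringAsync(searchOptions, pageCount)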
Loop
Return results
End Function
#End Region
End Class
End Namespace
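The parsing loops above rely on catching NullReferenceException whenever a node is missing. An alternative, shown here only as a sketch, is to test the result of `SelectSingleNode` against `Nothing`. The names `InnerTextOrDefault` and `ParseResults` are hypothetical, and the XPath expressions are copied from the code above, so they assume the same Google markup.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports HtmlAgilityPack

Module SafeParsingSketch

    Private Const LinkXPath As String = ".//div[@class='rc']/h3[@class='r']/a[@href]"
    Private Const DescXPath As String = ".//div[@class='rc']/div[@class='s']/div[1]/span[@class='st']"

    ' Returns the decoded inner text of the first node matching xpath,
    ' or fallback when no node matches.
    Public Function InnerTextOrDefault(ByVal parent As HtmlNode,
                                       ByVal xpath As String,
                                       ByVal fallback As String) As String

        Dim child As HtmlNode = parent.SelectSingleNode(xpath)
        If (child Is Nothing) Then
            Return fallback
        End If
        Return WebUtility.HtmlDecode(child.InnerText)
    End Function

    ' Extracts (title, url, description) tuples; entries without a link are skipped.
    Public Function ParseResults(ByVal html As String) As List(Of Tuple(Of String, String, String))
        Dim results As New List(Of Tuple(Of String, String, String))
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='g']")
        If (nodes Is Nothing) Then ' No search results in this page.
            Return results
        End If

        For Each node As HtmlNode In nodes
            Dim link As HtmlNode = node.SelectSingleNode(LinkXPath)
            If (link Is Nothing) Then
                Continue For
            End If
            Dim title As String = WebUtility.HtmlDecode(link.InnerText)
            Dim description As String = InnerTextOrDefault(node, DescXPath, "Description unavailable.")
            results.Add(Tuple.Create(title, link.GetAttributeValue("href", ""), description))
        Next node
        Return results
    End Function

    Sub Main()
        ' Hand-written sample markup: one complete entry and one entry without a link.
        Dim sample As String =
            "<div class='g'><div class='rc'><h3 class='r'><a href='http://example.com/'>Example &amp; Co</a></h3></div></div>" &
            "<div class='g'><p>no link here</p></div>"

        For Each item As Tuple(Of String, String, String) In ParseResults(sample)
            Console.WriteLine("{0} | {1} | {2}", item.Item1, item.Item2, item.Item3)
        Next item
    End Sub

End Module
```

Checking for `Nothing` keeps the normal "node not found" case out of the exception path, which is both cheaper and easier to debug.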
And here is a usage example:
Imports System.Text ' Encoding
Imports System.Threading ' Interlocked
Imports Google.Tools
Imports Google.Types
Dim searchOptions As New GoogleSearchOptions
With searchOptions
.SearchTerm = "elhacker.net"
.Language = "lang_en"
.InputEncoding = Encoding.GetEncoding("utf-8")
.OutputEncoding = Encoding.GetEncoding("Windows-1252")
.NumberOfResults = 100
End With
Dim searchResults As List(Of GoogleSearchResult) = GoogleSearcher.GetSearchResult(searchOptions)
Dim resultCount As Integer
For Each result As GoogleSearchResult In searchResults
Console.WriteLine("[{0:00}]", Interlocked.Increment(resultCount))
Console.WriteLine("Title: {0}", result.Title)
Console.WriteLine("Desc.: {0}", result.Description)
Console.WriteLine("Url..: {0}", result.Url)
Console.WriteLine()
Next result
Console.WriteLine("Finished.")
Execution output:
[01]
Title: WarZone - elhacker.NET
Desc.: WarZone 1.0. el wargame de elhacker.net. WarZone es un wargame que contiene una serie de pruebas con simulaciones de vulnerabilidades web, pruebas de ...
Url..: http://warzone.elhacker.net/
[02]
Title: Blog elhacker.NET
Desc.: Una red masiva de cámaras de CCTV se pueden utilizar para realizar ataques DDoS a ordenadores de todo el mundo. Circuito cerrado de televisión o CCTV ...
Url..: http://blog.elhacker.net/
[03]
Title: elhacker.INFO
Desc.: 579 Gbps, nuevo récord de transferencia de datos en un ataque DDoS xx Consiguen saltarse los controles de seguridad de Gmail “dividiendo” en dos una.. xx ...
Url..: http://ns2.elhacker.net/
[04]
Title: elhacker.NET (@elhackernet) | Twitter
Desc.: 17.9K tweets • 13 photos/videos • 22.6K followers. "Varios usuarios estarían vendiendo sus cuentas de Pokémon GO en eBay por ... https://t.co/orvE4fhiIA"
Url..: https://twitter.com/elhackernet
[05]
Title: elhacker.NET Youtube - YouTube
Desc.: Canal oficial de elhacker.NET. ... Copia Video - Ataque ddos al foro mas Noob de la red www elhacker net - Duration: 4 minutes, 47 seconds. 523 views; 1 year ...
Url..: https://www.youtube.com/user/elhackerdotnet
[06]
Title: elhacker.net | Facebook
Desc.: elhacker.net. 4455 Me gusta · 9 personas están hablando de esto. Visita nuestro grupo aqui: https://www.facebook.com/groups/elhackerdotnet/
Url..: https://es-es.facebook.com/elhacker.net
[07]
Title: ElHacker.net RSS – Windows Apps on Microsoft Store
Desc.: Obtene las ultimas noticias RSS y de Twitter de ElHacker.net! More. Get the app. Get the app · Get the app · Get the app. To use this app on your PC, upgrade to ...
Url..: https://www.microsoft.com/en-us/store/apps/elhackernet-rss/9nblgggzkps5
[08]
Title: http://elhacker.net - Prezi
Desc.: http://caca.com javascript://http://caca.com. Full transcript. More presentations by xxxxxxxx%22<" xxxx>%22" · Copia de Prezi Business Presentation Res.
Url..: https://prezi.com/clcwzhtsg6eu/httpelhackernet/
[09]
Title: elhacker.net.url and Other Malware Associated Files - Exterminate It!
Desc.: Want to know what kind of malware elhacker.net.url is associated with? Want to know how to get rid of elhacker.net.url?
Url..: http://www.exterminate-it.com/malpedia/file/elhacker.net.url
[10]
Title: Steam Community :: Group :: elhacker.NET
Desc.: Hola gente!! Elhacker.net ya dispone de un servidor de teamspeak para que lo uséis junto con los demás usuarios del grupo y foro. La dirección del servidor ...
Url..: https://steamcommunity.com/groups/elhackernet
etc...
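To come back to the original question, the results can then be loaded into a ListView. Below is a minimal WinForms sketch. The `SearchHit` class is a hypothetical stand-in for `GoogleSearchResult` so that the sketch compiles on its own, and the helper name `FillListView` and the column layout are just illustrative choices.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Module ListViewSketch

    ' Hypothetical stand-in for GoogleSearchResult, so the sketch is self-contained.
    Public Class SearchHit
        Public Property Title As String
        Public Property Url As String
        Public Property Description As String
    End Class

    ' Replaces the ListView content with one row per result: Title | Url | Description.
    Public Sub FillListView(ByVal lv As ListView, ByVal hits As IEnumerable(Of SearchHit))
        lv.BeginUpdate() ' Suspend repainting while rows are added.
        Try
            lv.View = View.Details
            lv.Columns.Clear()
            lv.Items.Clear()
            lv.Columns.Add("Title", 250)
            lv.Columns.Add("Url", 250)
            lv.Columns.Add("Description", 400)

            For Each hit As SearchHit In hits
                Dim item As New ListViewItem(hit.Title)
                item.SubItems.Add(hit.Url)
                item.SubItems.Add(hit.Description)
                lv.Items.Add(item)
            Next hit
        Finally
            lv.EndUpdate()
        End Try
    End Sub

    <STAThread>
    Sub Main()
        Dim hits As New List(Of SearchHit) From {
            New SearchHit With {.Title = "WarZone - elhacker.NET",
                                .Url = "http://warzone.elhacker.net/",
                                .Description = "WarZone 1.0."}
        }

        Dim lv As New ListView With {.Dock = DockStyle.Fill}
        FillListView(lv, hits)

        Dim window As New Form With {.Text = "Search results", .Width = 950, .Height = 400}
        window.Controls.Add(lv)
        Application.Run(window)
    End Sub

End Module
```

With the classes above, the same loop would read `result.Title`, `result.Url` and `result.Description` from the list returned by `GoogleSearcher.GetSearchResult`. In a real form it is probably better to await `GetSearchResultAsync` from an event handler so that the UI stays responsive while the pages are downloaded.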
Regards