-
Publication No.: US11030411B2
Publication date: 2021-06-08
Application No.: US16883184
Application date: 2020-05-26
Inventors: Shaosheng Cao, Jun Zhou
IPC classification: G06F40/295, G06N3/04, G06N3/08
Abstract: Implementations of the present specification disclose a method, apparatus, and device for generating word vectors. The method includes: obtaining words by segmenting a corpus; establishing a feature vector of each obtained word based on n-ary characters; training a convolutional neural network based on the feature vectors of the obtained words and the feature vectors of context words associated with each obtained word in the corpus; and generating a word vector for each obtained word based on the feature vector of the obtained word and the trained convolutional neural network.
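The feature-vector step in this abstract can be illustrated with a minimal sketch. It assumes that "n-ary characters" means character n-grams and that the feature vector is a multi-hot encoding over an n-gram vocabulary; the helper names are hypothetical, and the convolutional network training is not shown.

```python
def char_ngrams(word, n):
    """Return the character n-grams of a word, in order of appearance."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_ngram_index(words, n):
    """Assign each distinct n-gram a column index (hypothetical vocabulary)."""
    index = {}
    for word in words:
        for gram in char_ngrams(word, n):
            index.setdefault(gram, len(index))
    return index


def feature_vector(word, index, n):
    """Multi-hot vector marking which known n-grams occur in the word."""
    vec = [0] * len(index)
    for gram in char_ngrams(word, n):
        if gram in index:
            vec[index[gram]] = 1
    return vec


# Toy corpus standing in for the segmented words.
words = ["graph", "grape"]
index = build_ngram_index(words, 3)
vectors = {w: feature_vector(w, index, 3) for w in words}
```

Words that share n-grams, such as "graph" and "grape" here, end up with overlapping feature vectors, which is what lets a downstream model relate them.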
-
Publication No.: US10970596B2
Publication date: 2021-04-06
Application No.: US16043043
Application date: 2018-07-23
Inventors: Jun Zhou
Abstract: The present disclosure provides logistic regression gradient calculation methods and apparatuses. One exemplary calculation method comprises: acquiring training data, the training data including X-row user data and Y-row click-through data corresponding to the X-row user data; converting the X-row user data into X-column data; segmenting the X-column data and a weight vector to form N X-column data segmentation blocks and N weight vector segmentation blocks; starting N threads respectively to generate N sub-logistic regression gradients according to the N X-column data segmentation blocks, the N weight vector segmentation blocks, and the corresponding Y-row click-through data; and splicing the N sub-logistic regression gradients to form a full logistic regression gradient. With embodiments of the present disclosure, a computing machine can support training of a super-large-scale logistic regression model, which increases the calculation speed, shortens the training time, and greatly reduces the memory usage of the computing machine.
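The segment-and-splice idea can be sketched as follows. This is an illustration under assumptions, not the patented procedure: it assumes the blocks partition the feature dimension, and it adds a first pass in which each thread computes partial dot products so that the per-sample prediction error is available to every block. All function names are hypothetical, and the weight vector is assumed to be non-empty.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_blocks(length, n):
    """Split range(length) into at most n contiguous index blocks."""
    size = -(-length // n)  # ceiling division
    return [range(s, min(s + size, length)) for s in range(0, length, size)]


def partial_scores(rows, w, block):
    """Per-sample partial dot product restricted to one feature block."""
    return [sum(row[j] * w[j] for j in block) for row in rows]


def block_gradient(rows, errors, block):
    """Mean log-loss gradient entries for the features in one block."""
    return [sum(e * row[j] for e, row in zip(errors, rows)) / len(rows)
            for j in block]


def lr_gradient(rows, clicks, w, n_threads):
    blocks = split_blocks(len(w), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Pass 1 (assumed): each thread scores its own data/weight block.
        parts = list(pool.map(lambda b: partial_scores(rows, w, b), blocks))
        scores = [sum(p) for p in zip(*parts)]
        errors = [1 / (1 + math.exp(-s)) - y for s, y in zip(scores, clicks)]
        # Pass 2: each thread produces one sub-gradient.
        grads = list(pool.map(lambda b: block_gradient(rows, errors, b), blocks))
    # Splice the sub-gradients into the full gradient.
    return [g for block in grads for g in block]


rows = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
clicks = [1, 0]
w = [0.0, 0.0, 0.0]
gradient = lr_gradient(rows, clicks, w, n_threads=2)
```

The sketch only shows how the data and weights are partitioned; in CPython, threads running pure Python code do not execute in parallel, so it should not be read as a performance demonstration.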
-
Publication No.: US10798056B2
Publication date: 2020-10-06
Application No.: US16019897
Application date: 2018-06-27
Inventors: Jun Zhou
IPC classification: H04L29/12, G06F16/242, H04L29/06, G06F16/00, G06F16/958, G06F16/955
Abstract: Techniques for navigating webpages requested through short links are provided. In some implementations, a short link uniform resource locator (URL) is received, the short link URL is processed to extract a simplified short link and an address code, and a determination is made as to whether the simplified short link is associated with a long link URL representing an address of a webpage. In response to determining that the simplified short link is associated with a long link URL, the associated long link URL is provided. In response to determining that the simplified short link is not associated with a long link URL, a common long link URL associated with the address code is provided.
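The lookup-with-fallback logic described in this abstract might look like the sketch below. The URL layout `<base>/<key>-<code>`, the example domains, and the dictionary-based stores are all assumptions made for illustration; the abstract does not specify how the address code is embedded.

```python
def split_short_link(url):
    """Split a short link URL into (simplified short link, address code).

    Assumes the hypothetical layout '<base>/<key>-<code>'.
    """
    simplified, _, code = url.rpartition("-")
    return simplified, code


def resolve(url, long_links, common_links):
    """Return the long link URL, or the common URL for the address code."""
    simplified, code = split_short_link(url)
    if simplified in long_links:
        return long_links[simplified]
    return common_links.get(code)


# Hypothetical stores: one specific mapping and one per-code fallback page.
long_links = {"https://s.example/ab12": "https://shop.example/items/12345"}
common_links = {"07": "https://shop.example/region/07/home"}

hit = resolve("https://s.example/ab12-07", long_links, common_links)
miss = resolve("https://s.example/zz99-07", long_links, common_links)
```

Here `hit` resolves to the item page, while `miss` falls back to the common page registered for address code `07`.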
-
Publication No.: US10789321B2
Publication date: 2020-09-29
Application No.: US16019228
Application date: 2018-06-26
Inventors: Jun Zhou
IPC classification: H04L29/12, G06F16/955, G06F16/958, G06F16/242, G06F16/00
Abstract: A server receives a short link application from a requester. The short link application includes a long link uniform resource locator (URL). The server obtains a database identifier based on the long link URL. The server determines whether a database associated with the database identifier is accessible by the server. In response to a determination that the database associated with the database identifier is accessible by the server, the server obtains a short link URL associated with the long link URL from the database, and transmits the short link URL to the requester.
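A minimal sketch of the server-side flow follows. Deriving the database identifier by hashing the long link URL is an assumption, as are the function names and the in-memory dictionaries standing in for databases. The abstract does not say what happens when the database is not accessible, so the sketch simply returns `None` in that case.

```python
import hashlib


def database_id(long_url, n_databases):
    """Map a long link URL to a database identifier (assumed: hash modulo)."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_databases


def handle_application(long_url, local_databases, n_databases):
    """Return the stored short link if the target database is accessible."""
    db = local_databases.get(database_id(long_url, n_databases))
    if db is None:
        return None  # database not accessible from this server
    return db.get(long_url)


# Hypothetical setup: this server can reach only the database the URL maps to.
url = "https://shop.example/items/12345"
db_id = database_id(url, n_databases=4)
local_databases = {db_id: {url: "https://s.example/ab12"}}

short_link = handle_application(url, local_databases, n_databases=4)
```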
-
Publication No.: US10776334B2
Publication date: 2020-09-15
Application No.: US16736673
Application date: 2020-01-07
Inventors: Shaosheng Cao, Xinxing Yang, Jun Zhou, Xiaolong Li
IPC classification: G06F16/00, G06F16/22, G06F16/27, G06F16/28, G06F16/906
Abstract: Embodiments of the present specification disclose a random walking method and a cluster-based random walking method, apparatus, and device. A solution includes: obtaining information about each node included in graph data, generating, according to the information about each node, an index vector reflecting a degree value of a respective node, then generating an element vector reflecting an identifier of an adjacent node of the node, and generating a random sequence according to the index vector and the element vector, to implement random walks in the graph data. The solution is applicable to clusters and individual machines.
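One way to read the index vector and element vector is as a compressed adjacency layout, sketched below for a single machine. Treating the index vector as cumulative degree offsets is an assumption; the abstract only says it reflects each node's degree value. The function names are hypothetical.

```python
import random


def build_vectors(adjacency):
    """adjacency[v] lists the neighbours of node v.

    Returns (index, element): index[v + 1] - index[v] is the degree of v,
    and element[index[v]:index[v + 1]] holds the identifiers of its neighbours.
    """
    index = [0]
    element = []
    for neighbours in adjacency:
        element.extend(neighbours)
        index.append(len(element))
    return index, element


def random_walk(index, element, start, steps, rng):
    """Generate a random node sequence of up to `steps` moves."""
    walk = [start]
    node = start
    for _ in range(steps):
        degree = index[node + 1] - index[node]
        if degree == 0:
            break  # isolated node: the walk cannot continue
        node = element[index[node] + rng.randrange(degree)]
        walk.append(node)
    return walk


adjacency = [[1, 2], [0], [0]]  # toy star graph centred on node 0
index, element = build_vectors(adjacency)
walk = random_walk(index, element, start=0, steps=4, rng=random.Random(0))
```

Each step needs only two reads of the index vector and one read of the element vector, so no per-node list objects are touched during the walk.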
-
Publication No.: US20200097329A1
Publication date: 2020-03-26
Application No.: US16697913
Application date: 2019-11-27
Inventors: Jun Zhou, Xiaolong Li
Abstract: Evaluation results of a plurality of users are received from a plurality of data providers. The evaluation results are obtained by the plurality of data providers evaluating the plurality of users based on evaluation models of the plurality of data providers. A plurality of training samples is constructed by using the evaluation results. Each training sample includes a respective subset of the evaluation results corresponding to a same user of the plurality of users. A label for each training sample is generated based on an actual service execution status of the same user. A model is trained based on the plurality of training samples and the plurality of labels, including setting a plurality of variable coefficients, each variable coefficient specifying a contribution level of a corresponding data provider. Virtual resources are allocated to each data provider based on the plurality of variable coefficients.
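The final allocation step can be sketched as below, assuming the simplest rule: each provider receives a share proportional to its trained coefficient. The abstract does not state the allocation rule, so proportionality, the function name, and the provider names are all assumptions. Model training is not shown.

```python
def allocate(total, coefficients):
    """Split `total` virtual resources in proportion to non-negative coefficients."""
    weight_sum = sum(coefficients.values())
    if weight_sum == 0:
        return {name: 0.0 for name in coefficients}
    return {name: total * c / weight_sum for name, c in coefficients.items()}


# Hypothetical coefficients learned for two data providers.
shares = allocate(100, {"provider_a": 1, "provider_b": 3})
```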
-
Publication No.: US20200034740A1
Publication date: 2020-01-30
Application No.: US16587977
Application date: 2019-09-30
Inventors: Xinxing Yang, Shaosheng Cao, Jun Zhou, Xiaolong Li
Abstract: An N×M dimensional target matrix is generated based on N data samples and M dimensional data features respectively corresponding to the N data samples. Encryption calculation is performed on the N×M dimensional target matrix based on a Principal Component Analysis (PCA) algorithm to obtain an N×K dimensional encryption matrix, where K is less than M. The N×K dimensional encryption matrix is transmitted to a modeling server. The modeling server trains a machine learning model by using the N×K dimensional encryption matrix as a training sample.
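For orientation, the dimension reduction from N×M to N×K in standard PCA can be written as follows. This is the textbook formulation, not necessarily the exact computation claimed; the symbols $X_c$, $C$, $W_K$, and $Z$ are introduced here for illustration, with $X$ the N×M target matrix and $\bar{x}$ its vector of column means.

```latex
X_c = X - \mathbf{1}\,\bar{x}^{\top}, \qquad
C = \frac{1}{N-1}\, X_c^{\top} X_c \in \mathbb{R}^{M \times M}, \qquad
Z = X_c W_K \in \mathbb{R}^{N \times K}, \quad K < M
```

Here the columns of $W_K \in \mathbb{R}^{M \times K}$ are the eigenvectors of $C$ with the $K$ largest eigenvalues, and $Z$ plays the role of the matrix sent to the modeling server.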
-
Publication No.: US20190042982A1
Publication date: 2019-02-07
Application No.: US16132264
Application date: 2018-09-14
Inventors: Shenquan Qu, Jun Zhou, Yongming Ding
Abstract: An automatic multi-threshold feature filtering method and an apparatus thereof are provided. In an iterative process of training a machine learning model, the feature filtering method calculates a feature filtering threshold and feature correlation values of a current round of iteration based on a result of a previous iteration, and performs feature filtering on samples based on the calculated feature filtering threshold and the calculated feature correlation values. The feature filtering apparatus of the present disclosure includes a calculation module and a feature filtering module. The method and apparatus of the present disclosure can automatically generate a different feature filtering threshold at each iteration, which greatly improves the accuracy of the filtering threshold and, compared with the single fixed thresholds in current use, can greatly increase both the training speed of automatic machine learning and the accuracy of the machine learning model.
-
Publication No.: US12026618B2
Publication date: 2024-07-02
Application No.: US18342204
Application date: 2023-06-27
Inventors: Jun Zhou
IPC classification: G06N3/08, G06F17/11, G06F17/16, G06F18/214, G06N20/00
CPC classification: G06N3/08, G06F17/11, G06F17/16, G06F18/2148, G06N20/00
Abstract: The present disclosure provides a method and a system for training a machine learning system. Multiple pieces of sample data are used for training the machine learning system. The method includes acquiring multiple sample sets, each sample set including sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting the corrected sample data into the machine learning system to train the machine learning system.
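The sample-then-correct steps can be sketched as follows. The sketch assumes that each period has a sampling rate in (0, 1] and that the importance value of a sampled set is the reciprocal of its sampling rate, applied as a per-sample weight. Neither choice is stated in the abstract, and the function and period names are hypothetical.

```python
import random


def sample_by_period(sample_sets, rates, rng):
    """Keep each sample with the sampling rate set for its time period.

    sample_sets: {period: [(features, label), ...]}
    rates:       {period: rate in (0, 1]}
    """
    return {
        period: [s for s in samples if rng.random() < rates[period]]
        for period, samples in sample_sets.items()
    }


def corrected_samples(sampled, rates):
    """Attach an importance weight (assumed 1 / rate) to every kept sample."""
    out = []
    for period, samples in sampled.items():
        weight = 1.0 / rates[period]
        out.extend((features, label, weight) for features, label in samples)
    return out


# Hypothetical data: recent samples are all kept, older ones are thinned out.
sample_sets = {
    "recent": [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    "older": [([0.5, 0.5], 1)] * 10,
}
rates = {"recent": 1.0, "older": 0.5}

sampled = sample_by_period(sample_sets, rates, random.Random(0))
training_data = corrected_samples(sampled, rates)
```

The weighted tuples in `training_data` would then be fed to whatever trainer is in use, with the weight scaling each sample's contribution to the loss.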
-
Publication No.: US11720787B2
Publication date: 2023-08-08
Application No.: US16114078
Application date: 2018-08-27
Inventors: Jun Zhou
IPC classification: G06N3/08, G06N20/00, G06F18/214, G06F17/11, G06F17/16
CPC classification: G06N3/08, G06F17/11, G06F17/16, G06F18/2148, G06N20/00
Abstract: The present disclosure provides a method and a system for training a machine learning system. Multiple pieces of sample data are used for training the machine learning system. The method includes acquiring multiple sample sets, each sample set including sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting the corrected sample data into the machine learning system to train the machine learning system.